Job Closed
This listing is no longer active.
Build in a weekend. Scale to millions.
Data Platform Reliability Engineer, Postgres
Location
Worldwide
Posted
106 days ago
Salary
0
Seniority
Senior
Job Description
Supabase
- Manage the lifecycle of Postgres databases - platform RDS clusters and customer project databases.
- Design and execute strategies for low-downtime major version upgrades and database migrations.
- Proactively identify and resolve database performance issues before they impact users.
- Build and maintain comprehensive monitoring, alerting, and observability for database systems.
- Write detailed run books, technical documentation, and operational guides.
- Identify reliability risks and implement preventative measures.
- Participate in on-call rotation to support our global platform.
- Work with development teams to optimize database schema and query patterns.
- Analyze and optimize slow queries, connection pooling, and resource utilization.
- Tune Postgres configurations for different workload patterns.
- Monitor and address database bloat, vacuum strategies, and WAL management.
- Partner with platform engineers, product teams, and SREs to deliver reliable database services.
- Communicate database changes and maintenance windows clearly to stakeholders.
- Share knowledge and mentor team members on Postgres best practices.
Job Requirements
- Deep understanding of Postgres internals, architecture, and advanced features.
- Production experience with replication (logical and physical), backups, and disaster recovery.
- Strong command of query optimization, EXPLAIN plans, indexing strategies, and performance tuning.
- Experience managing Postgres at scale in cloud environments.
- Hands-on experience with AWS RDS for platform infrastructure.
- Familiarity with cloud infrastructure concepts, networking, and storage systems.
- Understanding of IaC tools and automation approaches.
- Experience with other cloud database services (GCP Cloud SQL, Azure Database) is a plus.
- Track record of maintaining high-availability database systems.
- Obsessive about monitoring, observability, and measuring what matters.
- Proactive approach to identifying and mitigating risks.
- Experience with production troubleshooting and supporting live systems.
- You write clear run books and technical documentation.
- You're good at explaining complex database concepts to different audiences.
- You record decisions and share knowledge effectively in async environments.
- You thrive operating independently with high-level guidance.
- You see problems through to resolution, not just escalation.
- You automate repetitive tasks and build tools to make the team more effective.
- Proficiency in TypeScript or Go (we can teach these).
- Experience with Postgres extensions and customization.
- Contributions to Postgres or database-related open source projects.
- Familiarity with backup tools like WAL-G, pgBackRest, or Barman.
- Experience with database migration strategies and tooling.
- Background in SRE or DevOps practices.
Benefits
- Fully Remote
- ESOP
- Tech Allowance
- Health Benefits
- Annual Off-Sites
- Flexible Work
- Professional Development