Build in a weekend. Scale to millions.
Customer Reliability Engineer
Location
United States
Posted
111 days ago
Seniority
Senior
Job Description
Supabase
- Apply SRE principles to Customer Success.
- Detect issues commonly occurring in the platform.
- Proactively find improvements in the platform.
- Work on escalations and longer-running, more complex technical cases.
- Assist those using the Supabase platform with complex and/or long-running issues.
- Deliver on synchronous and asynchronous engagements with Supabase customers.
- Serve as an internal champion for the platform and how customers use it.
Job Requirements
- 6+ years of relevant work experience in Database Engineering, Infrastructure Engineering, Solution Architecture or similar.
- Strong background with relational database management systems such as PostgreSQL or MySQL.
- Background in web application development, with familiarity with Python, TypeScript, and popular JavaScript frameworks (React, Vue, Svelte) as well as Node.js.
- Very strong communication skills, particularly of technical concepts.
- Experience with project management, business analysis and revenue operations tools.
- Previous experience leading customer-facing engagements to deliver meaningful technical and business value.
- Driven and team-focused.
Benefits
- Fully Remote
- ESOP
- Tech Allowance
- Health Benefits
- Annual Off-Sites
- Flexible Work
- Professional Development
More DevOps Engineer Jobs
Site Reliability Engineer – AI & ML Infrastructure, Kubernetes, Terraform
Deepgram
Building foundational AI for speech transcription and understanding.
- Architect and maintain our core computing platform using Kubernetes on AWS and on-premise, providing a stable, scalable environment for all applications and services.
- Develop and manage our entire infrastructure using Infrastructure-as-Code (IaC) principles with Terraform, ensuring our environments are reproducible, versioned, and automated.
- Design, build, and optimize our AI/ML job scheduling and orchestration systems, integrating Slurm with our Kubernetes clusters to efficiently manage GPU resources.
- Provision, manage, and maintain our on-premise bare metal server infrastructure for high-performance GPU computing.
- Implement and manage the platform's networking (CNI, service mesh) and storage (CSI, S3) solutions to support high-throughput, low-latency workloads across hybrid environments.
- Develop a comprehensive observability stack (monitoring, logging, tracing) to ensure platform health, and create automation for operational tasks, incident response, and performance tuning.
- Collaborate with AI researchers and ML engineers to understand their infrastructure needs and build the tools and workflows that accelerate their development cycle.
- Automate the life cycle of single-tenant, managed deployments.
Senior DevOps Software Engineer
eClinical Solutions
We bring people and data together to support tomorrow’s breakthroughs.
- Design, develop, test, and deploy scalable, secure, and highly interactive web applications.
- Own and evolve core platform modules.
- Influence application and system architecture.
- Lead by example through clean, well-tested code.
- Collaborate closely with Product Management, QA, and other engineers.
- Provide technical mentorship and guidance to other engineers.
- Diagnose and resolve complex production issues.
- Ensure solutions meet eClinical Solutions quality standards.
Senior Site Reliability Engineer
Zscaler
Zscaler helps leading organizations in 180+ countries securely transform their networks and applications for a mobile and cloud-first world. Founded in 2008, th
- Expertly navigate networking principles, firewalls, and load balancing solutions to ensure robust infrastructure performance.
- Partner with Software Engineering and Infrastructure teams to design, implement, and deploy comprehensive end-to-end monitoring solutions.
- Execute seamless patches and upgrades, ensuring all administrative tools and utilities remain current and high-performing.
- Proactively monitor applications and services, participating in an on-call rotation to resolve issues and implement strategic prevention measures.
- Troubleshoot complex technical challenges and provide clear, candid communication regarding issues and their resolutions.
- Participate in on-call rotations as the primary technical lead. Act as the Incident Commander during major severity incidents: initiating war rooms, coordinating cross-functional teams, and providing clear status updates.
- Instrument code to expose high-cardinality metrics and distributed traces. Collaboratively define, measure, and defend Service Level Objectives (SLOs) and Error Budgets with product owners.
- Write high-quality, production-ready code (in Java, Go, or Python) to build internal tooling, automation platforms, and self-healing mechanisms that eliminate manual operator intervention.
- Partner with Product Engineering teams during the design phase to ensure new services are built with reliability, scalability, and observability patterns (circuit breakers, rate limiting, backpressure, fallback strategies) from day one.
- Analyze system performance and traffic patterns to model future capacity needs. Conduct load testing and chaos engineering experiments to verify system resilience under failure conditions.




