Real-time database for mobile, web, IoT, and server apps that can magically sync data with or even without the internet.
Site Reliability Engineer
Location: United States
Posted: 180 days ago
Salary: $156K - $288K / year
Seniority: Senior
Job Description
Site Reliability Engineer
Ditto
- Develop and maintain observability solutions using platforms like Datadog, Prometheus, and Grafana
- Take a leading role in incident management, including coordinating response efforts, troubleshooting issues, and identifying follow-up actions
- Partner with product engineering teams to architect reliable systems, recover from incidents, and learn from mistakes
- Work with teams to implement and maintain SLOs, monitoring, and alerting strategies that ensure reliability at scale
- Design and implement automation and support tooling to improve system resilience, maintain operational safety, and reduce operational overhead
- Lead the development and maintenance of runbooks, alert definitions, and incident response procedures
- Participate in on-call rotations to provide 24/7 support for critical production systems
Job Requirements
- 4+ years of experience in Site Reliability Engineering or similar DevOps roles focused on system reliability and incident management
- 2+ years of hands-on experience architecting applications for Kubernetes and managing Kubernetes infrastructure
- Strong experience with modern monitoring stacks including Prometheus, Grafana, and Datadog
- Experience in at least one systems programming language, such as Go, Rust, C, or Java
- Expertise with Infrastructure as Code tools, like Terraform and Helm
- Expertise with at least one major cloud service provider (AWS, GCP, Azure)
- Strong communication skills, with the ability to lead incident response and effectively collaborate across teams
- Experience with, and willingness to participate in, on-call rotations and emergency response procedures
- A high degree of agency and a bias toward action: you identify problems and work autonomously to solve them
- Excellent problem-solving skills and a methodical approach to troubleshooting complex issues
Benefits
- Health insurance
- Dental insurance
- Vision insurance
- Life insurance
- Disability insurance
- 401(k)
- Flexible spending accounts
- Flexible time off
More DevOps Engineer Jobs
Junior Cloud - DevOps Engineer
Implicit: No-Code Knowledge Engine for creating AI Knowledge Navigators.
- Assist with day-to-day AWS operations
- Help deploy Kubernetes workloads (deployments, services, ingress, namespace organization)
- Perform basic troubleshooting across services and AWS resources
- Document technical processes
- Participate in review sessions
- Design, build, and maintain scalable infrastructure automation pipelines using Jenkins and scripted pipelines in Groovy
- Automate infrastructure provisioning, configuration, patching, deployment, and more with Ansible and Terraform
- Develop tooling and automation scripts in Groovy, Ansible, Bash, and Python
- Operate and troubleshoot Linux-based systems, with expertise in performance tuning, remote access, and system hardening
- Work hands-on with a range of AWS services: EC2, ECS, Lambda, CloudWatch, IAM, S3, SNS, and more
- Collaborate across Dev, QA, and IT teams to deliver outcomes driven by business requirements
- Monitor and improve system availability, reliability, and security
- Maintain clear documentation for infrastructure, processes, and standards
- Design, implement, and maintain scalable distributed systems & infrastructure
- Build and manage infrastructure & core services for Honor’s Care Platform
- Provide foundational building blocks for the engineering organization
- Own build and deployment pipelines, maintain environments and supporting infrastructure
- Consult to build the right abstractions for problem patterns
DevOps Engineer
decircle: Talent Partner for decentralized organizations and projects that are building Web3.
- Design, implement, and manage scalable, secure, and reliable cloud infrastructure on AWS using Terraform and other Infrastructure-as-Code (IaC) tools
- Lead the development and evolution of our internal Platform-as-a-Service (PaaS) to support engineering teams and improve developer experience
- Build and maintain CI/CD pipelines to enable rapid, safe, and automated deployments
- Ensure high availability, performance, and cost efficiency of infrastructure through monitoring, tuning, and continuous optimization
- Apply cloud-native security best practices
- Manage containerized workloads using Docker and Kubernetes in production environments
- Collaborate closely with engineering and product teams to integrate DevOps best practices into the software development lifecycle