Mozilla

The Mozilla Corporation was founded in 2005 as a taxable, wholly-owned subsidiary of the Mozilla Foundation, which launched in 2003. The corporation serves the

Senior Site Reliability Engineer

Location

California

Posted

4 days ago

Salary

$123K - $144K / year

Seniority

Senior

Bachelor Degree, 7 yrs exp, English, AWS, Grafana, Kubernetes, Terraform

Job Description

  • Operate and evolve our EKS-based Kubernetes platform, supporting service migrations, platform improvements, and reliability initiatives.
  • Design and develop CI/CD systems supporting websites, services, and Thunderbird desktop releases, contributing to pipeline reliability and OIDC-based authentication across GitHub Actions workflows.
  • Write and maintain infrastructure in Pulumi and/or Terraform/OpenTofu across multiple AWS accounts.
  • Operate and evolve our observability stack (VictoriaMetrics, VictoriaLogs, Grafana, Vector) and partner with engineering teams to incorporate instrumentation and monitoring into service design.
  • Apply security-conscious infrastructure practices, including least-privilege IAM, secrets management via AWS Secrets Manager and External Secrets Operator, and network segmentation.
  • Diagnose and debug production incidents; drive root-cause analysis and post-incident improvements to prevent recurring problems.
  • Participate in on-call rotation and collaborate with SDEs and fellow SREs to ship, maintain, and monitor new builds and support service onboarding.
  • Contribute to runbooks, architecture documentation, and team processes.

Job Requirements

  • 7+ years of experience in infrastructure, platform engineering, or site reliability roles, including hands-on production Kubernetes experience in workload operations, troubleshooting, and cluster management.
  • Hands-on experience with infrastructure-as-code on AWS using Terraform, OpenTofu, or Pulumi.
  • Security awareness in day-to-day infrastructure work: identity, least privilege, secrets hygiene, and network controls.
  • Demonstrated ownership mindset with the ability to proactively identify issues, drive work to completion, and communicate risks early.
  • Excellent async written communication skills; comfortable working with a geographically distributed team.
  • Ability to collaborate effectively with software engineers and non-engineering stakeholders to improve platform reliability and operational efficiency.
  • Ability to learn, evaluate, and responsibly use emerging technologies, including AI-enabled tools, to improve work processes.

Benefits

  • Fully remote work & schedule flexibility
  • Company-provided laptop
  • Annual bonus program
  • Monthly remote work stipend
  • Annual professional development stipend
  • Industry conferences
  • Company all-hands and team gatherings
  • 24 days PTO per year (prorated)
  • Your birthday
  • Year-end company shutdown
  • 9 wellbeing days
  • Public holidays
  • Other paid leave
  • Quarterly wellbeing stipend for personal / family activities
  • 401(k) / RRSP contributions
  • Health, dental, & vision insurance
  • Disability insurance
  • Life insurance
  • Employee assistance program
  • Paid parental leave
  • Paid sick days
