Domino Data Lab

The Enterprise MLOps platform powering over 20% of the Fortune 100

Staff Site Reliability Engineer

DevOps Engineer | Full Time | Remote | Lead | Team 201-500 | Since 2013 | H1B Sponsor

Location

California

Posted

4 days ago

Salary

$200K - $230K / year

Seniority

Lead

Postgraduate Degree | English

Cloud Kubernetes Linux Python Go

Job Description

• Lead the development of Domino's internal AI-assisted reliability tooling, including systems that analyze tickets, logs, traces, and documentation to help teams resolve outages faster with less recurring toil
• Improve the observability coverage and signal quality for our most critical customer-facing systems, so engineers have more to work with throughout the development and support lifecycle
• Own incident response end-to-end, from detection to remediation, and leave each problem space better documented, better understood, and less likely to recur
• Guide the development of customer and user-facing observability tools within our products
• Define and mature SLO/SLI frameworks for priority services, turning abstract reliability goals into measurable, actionable standards
• Scale cloud operations practices for Domino’s single-tenant SaaS offering, and work with engineering teams to improve the reliability and repeatability of customer deployments and upgrades
• Mentor other engineers and shape how SRE is practiced at Domino, including incident response workflows, operational readiness expectations, and post-incident learning culture

Job Requirements

Deep experience in Site Reliability Engineering, platform engineering, or a software engineering role with genuine, hands-on operational ownership
Fluency with Kubernetes, Linux, cloud platforms, and observability tooling, and the ability to use them to investigate complex, real-world production problems
A strong ability to perceive and close reliability gaps in technical products, tools and processes
Strong software engineering skills in Python or Go, with a track record of building internal tools or services that people actually rely on
Comfort leading technically ambiguous work and influencing direction across teams without needing direct authority to get things done
A history of improving reliability through engineering and automation, not just putting out fires manually
Strong communication skills and real experience mentoring engineers or shaping technical decision-making on your team
Sound judgment about AI/LLM tooling: you know where it genuinely helps in operational workflows and where it adds noise instead of signal
Bonus: Experience with LLM-based systems, retrieval workflows, SaaS platform operations, or building tooling for support or developer teams

Benefits

equity
company bonus or sales commissions/bonuses
401(k) plan
medical, dental, and vision benefits
wellness stipends

More DevOps Engineer Jobs

Senior Network Deployment Engineer

CodiLime

A strategic partner for technology-driven companies | Network engineering | Software engineering

DevOps Engineer | 4 days ago

Contract | Remote | Team 201-500 | Since 2011 | H1B No Sponsor

• Leading design, architecture, and optimization for networking infrastructure & devices in large-scale DCs and/or offices
• Overseeing entire site deployment cycles from initial requirements to operational handover
• Collaborating with cross-functional IT, security, and facility teams plus external vendors
• Enforcing and maintaining robust technical documentation and architectural standards
• Developing Python and/or Ansible scripts to drive network automation initiatives

Ansible Linux Python Switching

Poland

zł22K - zł26K / month

Senior Network Deployment Engineer

CodiLime

A strategic partner for technology-driven companies | Network engineering | Software engineering

DevOps Engineer | 4 days ago

Contract | Remote | Team 201-500 | Since 2011 | H1B No Sponsor

• Leading design, architecture, and optimization for networking infrastructure & devices in large-scale DCs and/or offices
• Overseeing entire site deployment cycles from initial requirements to operational handover
• Collaborating with cross-functional IT, security, and facility teams plus external vendors
• Enforcing and maintaining robust technical documentation and architectural standards
• Developing Python and/or Ansible scripts to drive network automation initiatives

Ansible Linux Python Switching

Brazil

Senior DevSecOps Engineer

Kaseya

Kaseya® is the leading provider of IT and security management solutions for managed service providers (MSPs) and SMBs.

DevOps Engineer | 4 days ago

Full Time | Remote | Team 1,001-5,000 | H1B Sponsor

• Design and implement security controls across CI/CD pipelines, cloud infrastructure, and software development workflows
• Integrate security testing tools, including SAST, DAST, dependency scanning, and vulnerability management solutions
• Conduct threat modeling and risk assessments for applications, infrastructure, and platform services
• Implement and maintain security controls for cloud environments, infrastructure-as-code, and containerized workloads
• Develop automated security and compliance checks supporting regulatory and internal security requirements
• Partner with Engineering, Infrastructure, and Security teams to implement secure development practices
• Evaluate, implement, and optimize security tooling supporting application and infrastructure security
• Mentor engineers on secure development practices and DevSecOps methodologies

AWS Azure Cloud Docker Kubernetes Python Terraform

United Kingdom

SRE Specialist (Especialista de SRE)

credsystem

Making new achievements possible.

DevOps Engineer | 4 days ago

Full Time | Remote | Team 201-500 | Since 1996 | H1B No Sponsor

• Defining product infrastructure in line with the architecture definitions
• Environment resilience
• Aligning and tracking SLIs, SLAs, and SLOs
• Troubleshooting the application infrastructure (knows it, participates, proposes solutions)
• Assisting with application troubleshooting at the developers' invitation
• Guiding monitoring, logging, and automation solutions
• Documenting product infrastructure
• Participating in and understanding infrastructure capacity and cost
• Analyzing application trends
• Guiding new solutions for the product
• Taking part in POCs and tests of new solutions
• IaC: infrastructure as code
• Deploying/creating cloud infrastructure (Azure, OCI, AWS, and GCP)
• Requesting and following up on requests to the on-premises infrastructure teams

Ansible AWS Azure Cloud Docker Google Cloud Platform Grafana Kafka Kubernetes Linux Prometheus Terraform

Brazil
