Started in 2010, Alteryx provides companies of all sizes an end-to-end analytics platform that searches by utilizing data science and analytics to offer busines

Lead Site Reliability Engineer

DevOps Engineer, Full Time, Remote, Senior

Location

United States

Posted

65 days ago

Salary

$136K - $177K / year

Seniority

Senior

6 yrs exp, English

Cloud, Distributed Systems, Grafana, Java, JavaScript, Kubernetes, Python

Job Description

• Define and drive reliability strategy across control-plane and data-plane systems, including multi-region resilience, BCDR, and failover design
• Establish and operationalize SLOs, SLAs, and error budgets, ensuring they inform planning and engineering tradeoffs
• Lead initiatives that measurably improve MTTR, incident prevention, and overall service health
• Own incident management end-to-end, driving systemic fixes and long-term reliability improvements beyond immediate response
• Lead architecture and design reviews to ensure systems meet scalability, reliability, and cost efficiency goals
• Champion automation and modernization, including AI-driven reliability improvements
• Establish and enforce code quality and review standards
• Lead cross-functional initiatives and align engineering with product priorities
• Mentor senior engineers and act as a technical leader across teams

Job Requirements

6+ years leading delivery of complex, distributed systems or SaaS platforms
Strong experience with multi-region, split-plane architectures (control-plane / data-plane)
Proven track record improving SLOs, MTTR, and system reliability at scale
Proficiency in languages like Python, Java, C++, or JavaScript
Deep experience with:
Kubernetes (multi-cluster), CI/CD, and GitOps (ArgoCD)
SLO/SLA design, observability, and incident management
Infrastructure as Code and cloud platforms
Disaster recovery, resilience, and security best practices
Strong leadership skills with experience mentoring senior engineers and influencing cross-team decisions

Nice to Have

Experience with chaos engineering and large-scale reliability automation
Background in enterprise SaaS platforms or split-plane architectures
Expertise in navigating, understanding, and using modern observability platforms (Datadog, Grafana, etc.)

Benefits

bonus or commission
medical
retirement
financial
wellness
time off
employee discounts

More DevOps Engineer Jobs

Site Reliability Engineer

Ensono

Ensono delivers complete Hybrid IT solutions, from mainframe to cloud, tailored to each client’s journey.

DevOps Engineer, 65 days ago

Full Time, Remote, Team 1,001-5,000, H1B Sponsor

• We are seeking an experienced Site Reliability Engineer (SRE) with expertise in Infrastructure as Code tools like Terraform, core CI/CD tools such as Azure DevOps, and monitoring tools including DataDog and AWS CloudWatch.
• Strong leadership in client-facing discussions and engagement with third-party suppliers is essential.
• Troubleshooting issues and identifying systemic failings indicated by incidents/failures.
• Implementing fixes.
• Proposing solutions for reducing toil.
• Providing leadership in the Incident resolution process, including creating and maintaining documentation, and providing key input to Post-mortem analysis.
• Improving Service Requests and Change Management processes, both technically and through stakeholder management.
• Participate in the process for, and Proactively mitigate risks in a Security management process (Vulnerabilities in Code, Infrastructure, Dependencies).

AWS, Azure, Cloud, Google Cloud Platform, Kubernetes, Terraform

India

Senior DevOps/SRE Engineer – Monitoring Expert

knowmad mood

growing together

DevOps Engineer, 65 days ago

Full Time, Remote, Team 1,001-5,000, Since 1994, H1B No Sponsor

• Work on observability and monitoring projects
• Collaborate on defining and managing SLOs, SLIs, and SLAs
• Integrate observability with IT processes
• Participate in training and certification activities

Cloud, ITSM

Spain

Job Closed

Java Developer (Freelance)

Netguru

Netguru builds software that lets people do things differently.

DevOps Engineer, 65 days ago

Contract, Remote, Team 501-1,000, Since 2008, H1B No Sponsor

We are looking for a Mid-level Java Developer to join our client’s engineering team and support the development of a SaaS platform used by enterprise customers. This role requires working in close collaboration with both Netguru and our client’s teams, acting as a fully integrated contributor within the client's software organization.

This is a B2B contract position. Hourly rate: up to 30 EUR/h (net, invoiced). Working hours: 14:30 to 22:30 CET (±30 minutes).

Responsibilities

- Perform development work as part of the client’s software engineering teams
- Support the client’s developers in daily tasks and project work
- Provide assistance to the client’s customers during US working hours
- Monitor and resolve critical issues that arise outside of standard European hours
- Participate in regular team meetings and virtual team events
- Integrate with the client’s wider developer community
- Contribute to the ongoing development of the client’s SaaS solution, including:
  - Integrating with customer ERP systems to retrieve data
  - Analyzing data and delivering insights back to customers
  - Building and maintaining features within the SaaS application

Poland

€30 / hour

Director of Site Reliability Engineering – SRE

Backblaze

Backblaze is the cloud storage innovator delivering a modern alternative to traditional cloud providers.

DevOps Engineer, 65 days ago

Full Time, Remote, Team 201-500, Since 2007, H1B Sponsor

• As the Director of SRE, you will be directly accountable for Backblaze’s production infrastructure and performance against key SLOs.
• You will lead and mentor our global Sr. SRE, and SRE Level 1 services teams, ensuring operational excellence.
• Additionally, you will share responsibility for managing demand forecasts and making strategic decisions regarding infrastructure expansion.
• You will also oversee the budget for all operational tooling and observability.
• Lead a globally distributed team of 15+ highly technical teammates.
• Provide 24/7 services for SRE.
• Own the single source of truth for the state of production.
• Centrally manage all aspects of incident and change management.
• Maintain a culture of continuous improvement, leveraging operational data to prioritize work across teams.
• Be customer focused, and have a strong bias to action therein.
• Collaborate closely with Customer Support to provide seamless world-class support.
• Collaborate with Supply Chain to manage proper levels of inventory.
• Lead & coordinate strategic initiatives to evolve and improve production support, incident/change/asset management.
• Liaise with Vendor Management and Legal to manage critical contract renewal cycles.
• Establish department-level objectives, policies, and procedures, creating OKRs or other measurements as applicable.
• Recruit & coach the team to support Backblaze and individual career objectives.
• Build strong cross-functional relationships, most notably with Infrastructure Engineering, Customer Support, and Data Center Operations.
• Manage department budget.
• Be an engaged, visible, and admired leader.

Cloud

United States

Job Closed
