Alteryx

Started in 2010, Alteryx provides companies of all sizes with an end-to-end analytics platform that uses data science and analytics to offer busines…

Lead Site Reliability Engineer

Location

United States

Posted

65 days ago

Salary

$136K - $177K / year

Seniority

Senior

Job Description

  • Define and drive reliability strategy across control-plane and data-plane systems, including multi-region resilience, BCDR, and failover design
  • Establish and operationalize SLOs, SLAs, and error budgets, ensuring they inform planning and engineering tradeoffs
  • Lead initiatives that measurably improve MTTR, incident prevention, and overall service health
  • Own incident management end-to-end, driving systemic fixes and long-term reliability improvements beyond immediate response
  • Lead architecture and design reviews to ensure systems meet scalability, reliability, and cost-efficiency goals
  • Champion automation and modernization, including AI-driven reliability improvements
  • Establish and enforce code quality and review standards
  • Lead cross-functional initiatives and align engineering with product priorities
  • Mentor senior engineers and act as a technical leader across teams

Job Requirements

  • 6+ years leading delivery of complex, distributed systems or SaaS platforms
  • Strong experience with multi-region, split-plane architectures (control-plane / data-plane)
  • Proven track record improving SLOs, MTTR, and system reliability at scale
  • Proficiency in languages such as Python, Java, C++, or JavaScript
  • Deep experience with:
    • Kubernetes (multi-cluster), CI/CD, and GitOps (ArgoCD)
    • SLO/SLA design, observability, and incident management
    • Infrastructure as Code and cloud platforms
    • Disaster recovery, resilience, and security best practices
  • Strong leadership skills, with experience mentoring senior engineers and influencing cross-team decisions

Nice to Have

  • Experience with chaos engineering and large-scale reliability automation
  • Background in enterprise SaaS platforms or split-plane architectures
  • Expertise in navigating, understanding, and leveraging modern observability platforms (Datadog, Grafana, etc.)

Benefits

  • bonus or commission
  • medical
  • retirement
  • financial
  • wellness
  • time off
  • employee discounts

More DevOps Engineer Jobs

Site Reliability Engineer

Ensono

Ensono delivers complete Hybrid IT solutions, from mainframe to cloud, tailored to each client’s journey.

DevOps Engineer | 65 days ago
Full Time | Remote | Team 1,001-5,000 | H1B Sponsor

  • We are seeking an experienced Site Reliability Engineer (SRE) with expertise in Infrastructure as Code tools such as Terraform, core CI/CD tools such as Azure DevOps, and monitoring tools including Datadog and AWS CloudWatch.
  • Strong leadership in client-facing discussions and in engagement with third-party suppliers is essential.
  • Troubleshoot issues and identify systemic failings revealed by incidents and failures.
  • Implement fixes.
  • Propose solutions for reducing toil.
  • Provide leadership in the incident resolution process, including creating and maintaining documentation and providing key input to post-mortem analysis.
  • Improve Service Request and Change Management processes, both technically and through stakeholder management.
  • Participate in the security management process and proactively mitigate risks (vulnerabilities in code, infrastructure, and dependencies).

India
Full Time | Remote | Team 1,001-5,000 | Since 1994 | H1B No Sponsor

  • Work on observability and monitoring projects
  • Collaborate on defining and managing SLOs, SLIs, and SLAs
  • Integrate observability with IT processes
  • Take part in training and certification activities

Spain
Job Closed

Java Developer (Freelance)

Netguru

Netguru builds software that lets people do things differently.

DevOps Engineer | 65 days ago
Contract | Remote | Team 501-1,000 | Since 2008 | H1B No Sponsor

We are looking for a Mid-level Java Developer to join our client’s engineering team and support the development of a SaaS platform used by enterprise customers. This role requires working in close collaboration with both Netguru and our client’s teams, acting as a fully integrated contributor within the client's software organization.

This is a B2B contract position.
Hourly rate: up to 30 EUR/h (net, invoiced).
Working hours: 14:30 to 22:30 CET (±30 minutes).

Responsibilities

  • Perform development work as part of the client’s software engineering teams
  • Support the client’s developers in daily tasks and project work
  • Provide assistance to the client’s customers during US working hours
  • Monitor and resolve critical issues that arise outside of standard European hours
  • Participate in regular team meetings and virtual team events
  • Integrate with the client’s wider developer community
  • Contribute to the ongoing development of the client’s SaaS solution, including:
    • Integrating with customer ERP systems to retrieve data
    • Analyzing data and delivering insights back to customers
    • Building and maintaining features within the SaaS application

Poland
Up to €30 / hour

Director of Site Reliability Engineering – SRE

Backblaze

Backblaze is the cloud storage innovator delivering a modern alternative to traditional cloud providers.

DevOps Engineer | 65 days ago
Full Time | Remote | Team 201-500 | Since 2007 | H1B Sponsor

  • As the Director of SRE, you will be directly accountable for Backblaze’s production infrastructure and its performance against key SLOs.
  • You will lead and mentor our global Sr. SRE and SRE Level 1 services teams, ensuring operational excellence.
  • You will also share responsibility for managing demand forecasts and making strategic decisions about infrastructure expansion.
  • You will oversee the budget for all operational tooling and observability.
  • Lead a globally distributed team of 15+ highly technical teammates.
  • Provide 24/7 SRE services.
  • Own the single source of truth for the state of production.
  • Centrally manage all aspects of incident and change management.
  • Maintain a culture of continuous improvement, using operational data to prioritize work across teams.
  • Be customer-focused, with a strong bias to action.
  • Collaborate closely with Customer Support to provide seamless, world-class support.
  • Collaborate with Supply Chain to maintain appropriate inventory levels.
  • Lead and coordinate strategic initiatives to evolve and improve production support and incident, change, and asset management.
  • Liaise with Vendor Management and Legal to manage critical contract renewal cycles.
  • Establish department-level objectives, policies, and procedures, creating OKRs or other measures as applicable.
  • Recruit and coach the team to support Backblaze's goals and individual career objectives.
  • Build strong cross-functional relationships, most notably with Infrastructure Engineering, Customer Support, and Data Center Operations.
  • Manage the department budget.
  • Be an engaged, visible, and admired leader.

United States
Job Closed