Job Closed

This listing is no longer active.

Stairwell

Staff Software Reliability Engineer

DevOps Engineer | Other | Remote | Lead | Team 11-50

Location

United States + 1 more

Posted

128 days ago

Salary

$195K - $245K / year

Seniority

Lead

Job Description

This description is a summary of our understanding of the job description.

Role Description

We're looking for a Staff SRE who can own the reliability, scalability, and operational excellence of our platform. You'll work at the intersection of infrastructure and software engineering - building the systems, tooling, and practices that let our team ship confidently and operate at scale.

- Set technical direction for infrastructure and reliability - evaluate approaches, make architectural decisions, and establish standards.
- Own and evolve our Kubernetes-based infrastructure on GCP.
- Build and maintain CI/CD pipelines, deployment tooling, and release processes.
- Maintain and simplify our build system (Bazel) for faster, more reliable builds across the org.
- Define and instrument SLIs/SLOs; build dashboards and alerting that surface real problems.
- Drive incident response, post-mortems, and reliability improvements.
- Partner with product engineers to design systems that are reliable and operable from day one.
- Contribute to our engineering culture around AI-augmented development - sharing patterns, workflows, and lessons learned.

Qualifications

- Significant experience in SRE, platform engineering, or infrastructure roles at scale.
- Demonstrated technical leadership: you've driven significant infrastructure or reliability initiatives, not just executed on them.
- Deep hands-on expertise with Kubernetes (GKE preferred) and GCP services.
- Strong programming skills - Go preferred.
- Experience with build systems (Bazel strongly preferred) and CI/CD tooling.
- Practical experience with AI coding assistants as part of your regular workflow - not just experimentation, but daily use.
- Ability to critically evaluate AI-generated code and infrastructure configs: you know when to trust it, when to revise it, and when to write it yourself.
- Track record of improving reliability through automation, observability, and good engineering practices.
- Comfort with ambiguity and ownership; we're a small team where engineers drive decisions.

Nice to Have

- Background in security, malware analysis, or threat detection.
- Experience with large-scale data systems (BigTable, Spanner, BigQuery).
- Deep proficiency in Go.

Benefits

- Hard technical problems with real security impact.
- Small team, huge impact, high autonomy, low process overhead.
- Opportunity to collaborate with world-class experts in cybersecurity.
- Work remotely in the USA or Canada, or use our co-working space in Santa Clara to collaborate with teammates in-person.

Job Requirements

Significant experience in SRE, platform engineering, or infrastructure roles at scale.
Demonstrated technical leadership: you've driven significant infrastructure or reliability initiatives, not just executed on them.
Deep hands-on expertise with Kubernetes (GKE preferred) and GCP services.
Strong programming skills - Go preferred.
Experience with build systems (Bazel strongly preferred) and CI/CD tooling.
Practical experience with AI coding assistants as part of your regular workflow - not just experimentation, but daily use.
Ability to critically evaluate AI-generated code and infrastructure configs: you know when to trust it, when to revise it, and when to write it yourself.
Track record of improving reliability through automation, observability, and good engineering practices.
Comfort with ambiguity and ownership; we're a small team where engineers drive decisions.

Nice to Have

Background in security, malware analysis, or threat detection.
Experience with large-scale data systems (BigTable, Spanner, BigQuery).
Deep proficiency in Go.

Benefits

Hard technical problems with real security impact.
Small team, huge impact, high autonomy, low process overhead.
Opportunity to collaborate with world-class experts in cybersecurity.
Work remotely in the USA or Canada, or use our co-working space in Santa Clara to collaborate with teammates in-person.

More DevOps Engineer Jobs

Senior Site Reliability Engineer, Azure Red Hat OpenShift

Red Hat

The leading provider of enterprise open source solutions.

DevOps Engineer | 128 days ago

Other | Remote | Team 10,001+ | Since 1993 | H1B Sponsor

• Contribute code to increase the scalability and reliability of the service
• Contribute software tests and participate in peer review to increase the quality of our codebase
• Help and develop peers’ capabilities through knowledge sharing, mentoring, and collaboration
• Participate in a regular on-call schedule, including occasional paid weekends and holidays
• Practice sustainable incident response and blameless postmortems
• Resolve customer issues escalated from the Red Hat Global Support team
• Work within a small agile team to develop and improve SRE software, support your peers, plan and self-improve
• Explore and experiment with emerging AI technologies relevant to software development, proactively identifying opportunities to incorporate new AI capabilities into existing workflows and tooling

Ansible, AWS, Azure, Chef, DNS, Java, Linux, Prometheus, Puppet, Python, TCP/IP

California + 1 more

$139.6K - $230.2K / year

Job Closed

Senior Site Reliability Engineer – m/f/d

Famedly GmbH

Famedly is a complete medical collaboration platform delivered as a single decentralized application.

DevOps Engineer | 128 days ago

Full Time | Remote | Team 11-50 | Since 2019 | H1B No Sponsor

• Responsibility for the reliability, observability, and performance of backend systems
• Design and implement SRE practices
• Maintain infrastructure as code
• Work closely with development teams
• Automate incident detection and remediation
• Contribute to architecture and roadmap

Kubernetes

Germany

€60K - €70K / year

Staff Software Engineer I – SRE

Confluent

Set data in motion.

DevOps Engineer | 128 days ago

Full Time | Remote | Team 1,001-5,000 | Since 2014 | H1B Sponsor

• Analyze systemic failure patterns and design improvements that prevent incident recurrence
• Define and maintain SLO/SLA frameworks; use error budgets to guide reliability investments
• Build tooling and automation to reduce incident response toil and scale team impact
• Own Rootly configuration, workflows, and integrations with PagerDuty, Jira, Confluence, and Slack
• Analyze reliability data to identify systemic improvements; build dashboards that drive action
• Explore AI-assisted approaches to documentation quality and incident analysis
• Design scalable reliability standards that reduce reactive workload over time
• Own standards, practices, and continuous improvement of incident response
• Define incident commander eligibility criteria and manage the rotation
• Available as escalation IC when incidents exceed a team's management chain
• Develop and deliver training programs for engineering teams at all levels
• Coach teams through post-mortems and on developing actionable corrective actions
• Edit and review customer-facing incident documents to ensure quality and clarity
• Drive turnaround SLAs while maintaining technical accuracy
• Ensure clear explanation of what happened, why, and how we'll prevent recurrence
• Partner with engineering leaders to elevate reliability practices
• Be the expert who teams proactively engage for guidance

AWS, Azure, Distributed Systems, GCP, Apache Kafka, Kubernetes

India

DevOps Engineer

Veradigm®

Driving value through its unique combination of platforms, data, expertise, connectivity, and scale.

DevOps Engineer | 128 days ago

Full Time | Remote | Team 1,001-5,000 | H1B No Sponsor

• Veradigm is expanding its DevOps Engineering team and is seeking a highly skilled and enthusiastic DevOps Engineer to support and evolve our platforms and systems.
• This role is critical to the success of our VEHR/VPM/VIE products and will be responsible for building and deploying solutions and services in on-premises and hosted environments.
• It will also support Azure environments used by the Dev/QA teams.
• Knowledge of secure DevOps practices (secrets management, compliance, scanning tools).
• Exposure to and understanding of container technologies like Docker and/or Kubernetes.
• Experience with configuration management tools (e.g., Ansible, Chef) is a plus.
• Able to work with developers supporting both modern and legacy applications.
• Comfortable with CI/CD, including debugging build failures and deployment issues.
• Self-driven and motivated, with the ability to work independently and prioritize tasks effectively.
• Strong communication and interpersonal skills, with the ability to collaborate and communicate effectively with cross-functional teams.
• Excellent troubleshooting and problem-solving skills, with keen attention to detail.
• Excellent documentation skills.

Ansible, Azure, Chef, Delphi, Docker, Grafana, Kubernetes, Prometheus, Python, SQL, Terraform, HashiCorp Vault

India

Job Closed
