Job Closed

This listing is no longer active.

Confluent

Set data in motion.

Staff Site Reliability Engineer – Incident Management & Reliability

DevOps Engineer | Full Time | Remote | Lead | Team 1,001-5,000 | Since 2014 | H1B Sponsor

Location

Canada

Posted

128 days ago

Salary

CA$225.1K - CA$264.5K / year

Seniority

Lead

Bachelor Degree | 10 yrs exp | English

AWS | Azure | Cloud | Distributed Systems | Google Cloud Platform | Kafka | Kubernetes

Job Description

• Analyze systemic failure patterns and design reliability improvements that prevent incident recurrence
• Own Rootly configuration, workflows, and integrations with PagerDuty, Jira, Confluence, and Slack
• Define and maintain SLO/SLA frameworks; use error budgets to guide reliability investments
• Own standards, practices, and continuous improvement of incident response across engineering
• Edit and review customer-facing incident documents (CRCAs) to ensure quality and clarity
• Develop and deliver training programs; coach teams through post-mortems
• Partner with engineering leaders to elevate reliability practices org-wide

Job Requirements

10+ years of relevant experience in SRE, incident management, or reliability engineering
Cloud experience with at least one of AWS, GCP, or Azure (we run all three)
Experience navigating reliability/incident programs at organizations with 500+ engineers
Deep expertise with incident management tooling (Rootly, PagerDuty, or similar)
Strong understanding of distributed systems and failure modes at scale
Deep experience with observability: metrics, logging, tracing
Kubernetes and container orchestration experience
Understanding of CI/CD pipelines and release processes
Strong written communication (design docs, runbooks, post-mortems)
Experience driving org-wide process and cultural changes
Kafka/event streaming expertise preferred, or demonstrated rapid mastery of complex systems

Benefits

Offers Equity

More DevOps Engineer Jobs

Build and Release Engineer

HeadSpin

Optimize Digital Experiences with Data Science Capabilities

DevOps Engineer | 128 days ago

Full Time | Remote | Team 201-500 | H1B Sponsor

• Responsible for maintaining and improving the deployment pipeline
• Ensuring that software can be deployed smoothly to cloud infrastructure and on-premise environments
• Collaborating with the engineering team to track dependencies and apply upgrades
• Creating and maintaining documentation for the release process
• Coordinating with stakeholders to ensure a smooth release process
• Setting up and configuring CI/CD pipelines to automate the build and deployment process

Docker | Linux | MySQL | PostgreSQL | Python | Unix

India

Job Closed

Site Reliability Engineer

Dropbox

Dropbox is the one place to keep life organized and keep work moving.

DevOps Engineer | 128 days ago

Full Time | Remote | Team 1,001-5,000 | Since 2007 | H1B Sponsor

• Ensure the reliability, scalability, and performance of Dropbox's infrastructure and services
• Collaborate with cross-functional teams to develop and maintain best practices for monitoring, logging, and incident response
• Build, implement, and maintain automation and infrastructure-as-code tooling, specifically Terraform, Ansible, and GitHub Actions, as well as custom code platforms
• Utilize container orchestration platforms, such as Kubernetes, Amazon ECS, and Red Hat OpenShift, to manage containers at scale
• Manage and optimize monitoring and logging pipelines using tools like Datadog and Cribl LogStream
• Drive improvement projects related to service health and visibility for our stakeholders, ranging from developers to business service owners to C-level
• Develop and maintain custom tooling and automation scripts in Bash, Python, and other scripting languages

Ansible | AWS | Chef | DNS | Docker | Amazon EC2 | Kubernetes | Linux | OpenShift | Python | Terraform

Mexico

Senior Data DevOps Engineer

Digitanity

Bridging your culture with international talent

DevOps Engineer | 128 days ago

Contract | Remote | Team 51-200 | H1B No Sponsor

• Deploy, maintain, and optimize cloud-based data infrastructure on AWS
• Own CI/CD pipelines, infrastructure automation, and monitoring
• Ensure platform stability, observability, and scalability
• Support the transition from single-client to multi-client architecture
• Work closely with the founder and data engineers to move fast and safely

AWS | Docker | Grafana | Kubernetes | Linux | Prometheus | Terraform

Bulgaria

DevSecOps Site Reliability Engineer

System Automation Corporation

Bringing innovative solutions to our regulatory communities. FOLLOW us to be connected to the Evoke Network.

DevOps Engineer | 128 days ago

Other | Remote | Team 51-200 | Since 1973 | H1B No Sponsor

• Design and evolve Azure platform infrastructure with a focus on scalability, reliability, and growth readiness.
• Participate in capacity planning to support growth, peak demand, and seasonal usage patterns.
• Integrate with development resources to implement infrastructure-as-code (e.g., Bicep).
• Troubleshoot production infrastructure issues and lead incident response efforts, including coordination, escalation, and real-time remediation across teams.
• Conduct post-incident reviews (postmortems) focused on root cause analysis, corrective actions, and long-term reliability improvements rather than blame.
• Monitor and operate production systems using Azure Monitor, Application Insights, Sentinel, and related observability tooling.
• Improve system reliability and performance through alerting, error monitoring, SLOs/SLAs, and analysis of performance and capacity trends.
• Collaborate with security analyst to define and implement security controls across Azure resources and pipelines.
• Manage secrets, certificates, and identity integrations.
• Automate security posture checks in CI/CD pipelines.
• Maintain policy-as-code using Azure Blueprints or Defender for Cloud Compliance & Audit Support.
• Support SOC 2 Type II compliance through tooling, automation, and audit readiness.
• Respond to evidence requests and generate reports from observability and security systems.
• Contribute to the documentation of platform controls and best practices.
• Support, maintain, and own CI/CD pipelines (GitHub Actions, Azure DevOps, or equivalent).
• Optimize build, test, and release flows, partnering with engineers to diagnose failures and improve deployment reliability.
• Define and maintain consistent environment standards across development, staging, and production to ensure deployment safety, reliability, and compliance.
• Partner with engineering teams to improve deployment promotion strategies, rollback mechanisms, and release safety practices.

Azure | Python

United States

$130K - $150K / year
