Job Closed

This listing is no longer active.

Censys

The Leader in Attack Surface Management & Cloud Security

Senior Site Reliability Engineer

DevOps Engineer, Full Time, Remote, Senior, Team 51-200, Since 2017, H1B Sponsor

Location

United States

Posted

102 days ago

Salary

$145K - $190K / year

Seniority

Senior

Bachelor Degree, 5 yrs exp, English, Cloud, Google Cloud Platform, Kubernetes, Terraform

Job Description

• Build and maintain tooling to support applications in Kubernetes and Google Cloud Platform.
• Work with development teams to build, ship, and deploy services.
• Ensure smooth operations of production environments.
• Create a self-service platform to accelerate developer velocity.
• Participate in shared on-call rotation schedule.

Job Requirements

5+ years of experience in an SRE role or similar.
Experience deploying, managing, and debugging applications in a Kubernetes environment.
Experience building, securing, and managing container images.
Experience working with cloud-based environments.
Familiarity with infrastructure-as-code tools, such as Terraform, Crossplane, or similar.
Experience with tools to monitor the four golden signals (latency, traffic, errors, and saturation).
Familiarity with a monorepo, trunk-based development model with CI/CD.
Ability to communicate and support developers with empathy.

Benefits

401k match
health
vision
dental
more!

More DevOps Engineer Jobs

Fullstack Developer, DevOps – Laravel TALL Stack

Placing-Me

Looking for staff in optometry or hearing acoustics? Placing-Me: Simple. Fast. Successful.

DevOps Engineer, 102 days ago

Full Time, Remote, Team 1-10, Since 2017, H1B No Sponsor

• Take responsibility for the technical future of Placing‑Me
• Rebuild the platform from scratch and have broad freedom to design it
• Maintain and optimize the existing Laravel web application
• Design and implement the system architecture (backend, frontend, admin panel)
• Select and define the tech stack
• Develop scalable, maintainable structures
• Independently implement key features and core functionalities
• Ensure performance, code quality and long‑term maintainability
• Ensure and further develop the ongoing operation of the existing Laravel web app

Laravel, PHP

Germany

€50K - €62K / year

Senior Site Reliability Engineer, Tenant Services – Geo

GitLab

Build software faster. The One DevOps Platform enables your entire org to collaborate around your code. We're hiring.

DevOps Engineer, 102 days ago

Full Time, Remote, Team 1,001-5,000, Since 2014, H1B No Sponsor

• Execute Dedicated Geo migrations and cutovers end-to-end, including planning, pre-cutover validation, execution, and post-cutover verification and cleanup.
• Join the team’s shift and weekend coverage rotation for Dedicated cutovers across EMEA and US hours, and participate in the SaaS Site Reliability Engineering (SRE) on-call rotation to respond to incidents that impact GitLab.com availability.
• Operate and improve the Geo operational surface for Dedicated, including:
  • Environment preparation and data hygiene checks prior to migrations.
  • Execution of replication, validation, and cutover procedures.
  • Handling Geo-related escalations from Support and internal partners.
• Design, build, and maintain automation, tooling, and runbooks that make migrations, cutovers, and Geo escalations as “boring” and repeatable as possible.
• Run our infrastructure with tools such as Ansible, Chef, Terraform, GitLab CI/CD, and Kubernetes; contribute improvements back to GitLab’s product and infrastructure where appropriate.
• Build and maintain monitoring, alerting, and dashboards that:
  • Detect symptoms early, not just outages.
  • Track migration and cutover success rates, duration, rollback frequency, and related SLOs.
• Collaborate closely with:
  • The core Geo team on improving Geo features and operability.
  • Dedicated migrations and Support on migration planning, customer communications, and escalation handling.
  • Other Infrastructure teams on capacity planning, disaster recovery, and reliability improvements.
• Contribute to readiness reviews, incident reviews, and root cause analyses, turning learnings into changes in automation, process, or product.
• Document every action, including runbooks, architecture decisions, and post-incident reviews, so your findings turn into repeatable practices and automation.
• Proactively identify and reduce toil by automating repetitive operational work and simplifying migration workflows.

Ansible, AWS, Chef, Cloud, Distributed Systems, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Python, Ruby, Terraform, Go

India

Director, Site Reliability Engineer

Cision

DevOps Engineer, 102 days ago

Full Time, Remote, Team 1,001-5,000, Since 2000, H1B Sponsor

• Provide strategic leadership and oversight for four SRE teams, setting clear direction, priorities, and expectations aligned to business and engineering objectives
• Lead, mentor, and develop SRE managers and senior engineers, fostering a culture of accountability, operational ownership, innovation, and psychological safety
• Define and own the SRE and Platform Engineering strategy and roadmap, ensuring alignment with cloud transformation initiatives and long-term organizational goals
• Serve as a key voice in architectural and platform decisions, influencing designs with a focus on scalability, reliability, automation, and operational efficiency
• Partner with executive leadership to communicate reliability posture, risks, and investment needs in clear business terms
• Establish and continuously evolve SRE principles and best practices, including SLIs, SLOs, error budgets, toil management, and reliability-driven prioritization
• Provide technical direction and governance across GCP (preferred) and AWS environments, ensuring consistent reliability and operational patterns
• Drive the evolution of Platform Engineering, enabling self-service infrastructure and guard-railed service delivery for application teams
• Own strategy and standards for Infrastructure-as-Code (IaC) and automation, leveraging tools such as Terraform or equivalent frameworks across cloud environments
• Ensure observability excellence through metrics, logging, tracing, alerting, and proactive capacity and performance management
• Provide executive leadership during large-scale or high-impact incidents, ensuring effective coordination, escalation, and stakeholder communication
• Define, refine, and scale incident management and on-call practices, emphasizing resilience, sustainability, and rapid recovery
• Champion blameless postmortems, ensuring root causes are addressed and learnings are translated into systemic improvements
• Partner with Security and Compliance teams to ensure systems meet security, privacy, and regulatory requirements without compromising reliability
• Own and report on reliability metrics, operational KPIs, and service health for leadership and executive stakeholders
• Drive continuous improvement through reliability reviews, retrospectives, and data-driven decision-making
• Balance reliability, velocity, and cost across platforms, applying error budgets and capacity planning to guide trade-offs

AWS, Azure, Cloud, Google Cloud Platform, Terraform

Kentucky + 1 more

Software Engineering, DevOps AI Rater/Evaluator

LILT AI

Make anything multilingual. Translation, AI data set creation, and human expert evals. For businesses and governments.

DevOps Engineer, 102 days ago

Contract, Remote, Team 201-500, Since 2015, H1B No Sponsor

• Evaluate AI outputs related to software engineering, DevOps, and infrastructure topics
• Perform structured scoring, comparison, classification, and judgment tasks
• Assess technical correctness, completeness, security implications, and best-practice alignment
• Identify hallucinations, incorrect code, unsafe recommendations, or misleading system guidance
• Apply domain-specific engineering and DevOps guidelines consistently across tasks
• Validate and refine evaluation rubrics and edge-case handling
• Perform adjudication where raters disagree
• Conduct error analysis and qualitative reviews of model behavior
• Partner with LILT research, product, and customer teams on evaluation design
• Support red-teaming, security review, and model readiness assessments

Cloud, Distributed Systems

United States
