Job Closed

This listing is no longer active.

Defining what it means to build and deliver the most extraordinary sports & entertainment experiences.

The Crown is Yours

Lead Site Reliability Engineer

DevOps Engineer; Full Time; Remote; Senior; Team 1,001-5,000; Since 2012; H1B No Sponsor

Location

United States

Posted

137 days ago

Salary

$148K - $185K / year

Seniority

Senior

Bachelor Degree; 6 yrs exp; English

Ansible, AWS, Chef, Cloud, Docker, Elixir, Google Cloud Platform, IoT, Java, Kubernetes, Linux, Python, Ruby, Terraform, Go, .NET

Job Description

• Lead SRE initiatives across multiple projects and products, collaborating with cross-functional teams to shape platform and infrastructure engineering efforts across the organization.
• Drive technical excellence by mentoring and guiding engineers, fostering a culture of continuous learning and innovation.
• Architect and automate self-healing, fault-tolerant infrastructure with declarative configurations, GitOps, and event-driven automation for scalable deployments across public clouds and on-premise.
• Design, develop, and maintain software-driven infrastructure automation to build internal tools and eliminate repetitive operational tasks.
• Own and drive decisions on product deployment, performance tuning, monitoring, and alerting to ensure high availability and system efficiency in production.
• Define key metrics and SLAs around new web services being created to support our rapid traffic growth.
• Design and implement monitoring and alerting strategies to enforce application SLAs.

Job Requirements

At least 6 years of experience managing distributed cloud environments (GCP, AWS, vSphere, Nutanix) and platform automation at scale.
Deep expertise in container orchestration (Kubernetes) and container runtimes (Docker, containerd), with the ability to design, scale, and troubleshoot complex workloads.
Expert-level understanding of networking and web concepts, with the ability to debug issues down to the packet level.
Strong experience developing software for automation and infrastructure tooling (Go, Python).
Strong understanding of Linux-based operating systems, including performance tuning, bootloaders, storage, partitioning, kernel debugging, and low-level system optimizations.
Experience with Infrastructure as Code (IaC) and configuration management tools (Terraform, Ansible, Chef, etc.), ensuring scalable and repeatable infrastructure provisioning.
Understanding of applications written in various programming languages (C#/.NET, Java, Elixir, Ruby, etc.).
Experience in AWS Greengrass IoT management and A/B booting.

Benefits

bonus
equity
benefits as applicable

More DevOps Engineer Jobs

Site Reliability Engineering Manager

Flywire

Headquartered in Boston, Massachusetts, Flywire is a privately held financial services company offering international payment solutions to businesses and consum

DevOps Engineer; 137 days ago

Other; Remote; Team 1,200; Since 2011

• Help drive reliability, automation and performance within cloud-based infrastructure
• Coordinate and support daily activities for SREs on the team
• Work on issues of limited scope and execute solutions to routine problems
• Become embedded within an Engineering team advocating for best practices
• Mentor team members and drive initiatives
• Debug production issues across services and levels of the stack
• Identify opportunities both in processes and tools to improve team productivity
• Participate in an on-call shift along with other disciplines to respond to incidents
• Lean into business domain and needs as well as company vision, mission and strategy

Massachusetts

$160K - $200K / year

Senior Site Reliability Engineer

Customer.io

Customer.io helps companies communicate with their customers in a more authentic and human way. Its versatile marketing automation platform helps “bring human

DevOps Engineer; 137 days ago

Other; Remote

• Build and scale infrastructure to support billions of messages per day and real-time events
• Automate deployments, alerting, and incident response
• Make our on-call better - clear alerts, solid documentation, and faster resolution
• Tune MySQL and other datastore performance and improve reliability across distributed systems
• Collaborate across teams to debug, ship, and support systems in production
• Share knowledge and raise the bar through sharing your progress publicly with short videos, thoughtful writing, and mentorship
• Leverage AI tools to prototype, move faster, and make better decisions

Distributed Systems, GCP, MySQL, Terraform

United States

$140K - $180K / year

Job Closed

Senior Site Reliability Engineer

Customer.io

Customer.io helps companies communicate with their customers in a more authentic and human way. Its versatile marketing automation platform helps “bring human

DevOps Engineer; 137 days ago

Full Time; Remote

Distributed Systems, GCP, MySQL, Terraform

Europe

$140K - $180K / year

Job Closed

Senior DevSecOps Engineer, AI Enablement

CACI International Inc

Expertise and Technology for National Security

DevOps Engineer; 137 days ago

Other; Remote; Team 10,001+; Since 1962; H1B No Sponsor

• Join CACI's AI Enablement team as a Senior DevSecOps Engineer delivering rapid GenAI infrastructure and CI/CD capabilities through 1–2 month program engagements.
• Deploy secure pipelines, containerized platforms, cloud environments, and managed AI services while coaching program teams to operate and evolve systems independently.
• Enhance our solution catalog by refining IaC templates and contributing new infrastructure patterns from field experience.
• Rapidly deploy GenAI infrastructure across AWS, Azure, and on-prem using catalog templates.
• Implement and operationalize containerized platforms; train teams on deployment and troubleshooting.
• Establish production readiness standards including observability, reliability, and documentation.
• Build and refine GitLab CI/CD pipelines with security scanning and deployment automation.
• Configure identity and access management (Keycloak or similar) with OIDC/SAML.
• Lead workshops, pair-programming, and reviews to build program team capabilities.
• Develop reusable Terraform modules and IaC patterns for networking, IAM, and GenAI infrastructure.
• Document architecture decisions, lessons learned, and best practices.
• Improve catalog templates and tooling based on recurring field challenges.

AWS, Azure, Distributed Systems, Docker, Grafana, Kubernetes, Microservices, Prometheus, Python, Terraform

United States

$98.5K - $206.8K / year

Job Closed
