Job Closed

This listing is no longer active.

Netflix

Described as the world's top internet television network, Netflix is a publicly-traded entertainment company offering video-on-demand and streaming media. As an

Site Reliability Engineer L5 – Live SRE

Location

United States

Posted

139 days ago

Salary

Not specified

Seniority

Senior

Job Description

  • Support live streaming events by focusing on cloud traffic (API Gateway, IPC between microservices).
  • Prepare and execute various load tests to ensure infrastructure can handle sudden API traffic increases.
  • Implement end-to-end observability and visualize data to achieve desired availability at scale.
  • Drive continual improvement in observability, monitoring, and scalability.
  • Implement, automate, execute, and analyze results from live streaming delivery focused tests.
  • Write and review code, develop documentation, and debug complex problems.
  • Coordinate and collaborate across multiple stakeholders for smooth event execution.
  • Participate in an on-call rotation and work flexible hours based on event schedules.

Job Requirements

  • 5+ years of service reliability/operational experience running large-scale, high-performance systems and internet services, with a focus on traffic at scale.
  • Knowledge of and proven experience with L4 Load Balancer, HTTP cache, and reverse proxy technologies.
  • Expert-level knowledge of Unix or Linux systems and TCP/IP network fundamentals.
  • Proficient understanding of networking principles and of transport and application protocols, especially DNS, TLS, and HTTP(S).
  • Proficiency in a programming language such as Go, Python, or Rust.
  • Experience with real-time and big data analytics processing technologies (Kafka, time series databases, Presto/Trino, Spark SQL, etc.).
  • Ability to work in a highly collaborative environment and to communicate effectively with internal and external partners.
  • Preferred: B.S. in Computer Science, Electrical or Computer Engineering (or equivalent professional experience).

Benefits

  • Inclusion is a Netflix value and we strive to host a meaningful interview experience for all candidates.
  • We are an equal-opportunity employer and celebrate diversity, recognizing that diversity builds stronger teams.

More DevOps Engineer Jobs

DevOps Engineer

Fulfillment IQ

eCommerce Fulfillment Product Studio that supports brands, retailers, and 3PLs with bespoke solutions.

DevOps Engineer | 139 days ago
Full Time | Remote | Team 51-200 | H1B Sponsor

  • Design, build, and maintain CI/CD pipelines to enable faster, reliable, and automated deployments.
  • Manage and optimize cloud infrastructure (AWS/Azure/GCP) for scalability, security, and cost-efficiency.
  • Implement infrastructure as code (Terraform, Ansible, CloudFormation, etc.) for consistent environment provisioning.
  • Monitor system reliability and application performance using modern observability tools (Prometheus, Grafana, ELK, Datadog, etc.).
  • Ensure compliance with security and governance standards (ISO 27001, SOC2).
  • Collaborate with developers to streamline code integration, automated testing, and release processes.
  • Troubleshoot production issues, perform root cause analysis, and implement long-term fixes.
  • Document DevOps workflows, playbooks, and system architectures for knowledge sharing.

India
Job Closed

Senior DevOps Engineer

Great Gray Trust Company

Great Gray delivers CIT solutions and beyond, empowering you with the essential governance and expertise to grow confidently.

DevOps Engineer | 139 days ago
Full Time | Remote | Team 51-200 | Since 2023 | H1B No Sponsor

  • Build and maintain self-service tools and automation that empower engineering teams to ship faster with confidence
  • Design and implement standards, patterns, and best practices for containerized and traditional applications
  • Create documentation and runbooks that enable teams to own their services
  • Champion DevOps culture and practices across engineering teams
  • Lead a migration from Azure DevOps to GitHub Actions, designing robust, scalable pipeline architectures
  • Mature existing CI/CD pipelines with improved testing, security scanning, and deployment strategies
  • Implement blue-green and canary deployment patterns
  • Build reusable pipeline templates and shared workflows
  • Manage and optimize our Azure Kubernetes Service (AKS) clusters
  • Containerize legacy applications and design migration strategies from monoliths to microservices
  • Work with Azure App Services, Virtual Machine Scale Sets, and networking infrastructure
  • Implement and maintain service mesh patterns, monitoring, and observability
  • Write and maintain Terraform modules for Azure infrastructure
  • Automate infrastructure provisioning, configuration management, and scaling
  • Modernize legacy infrastructure with Windows AD dependencies, SSRS, and SSIS workloads
  • Design and implement auto-scaling strategies for applications currently lacking them
  • Enhance monitoring and alerting using Application Insights, Azure Monitor, Prometheus/Grafana, Jaeger, and OpenTelemetry
  • Build dashboards and SLO/SLI frameworks to measure system health
  • Drive incident response processes and post-mortem culture
  • Optimize application and infrastructure performance
  • Complete other related duties as assigned

All locations: California | Colorado | Connecticut | District Of Columbia | Florida | Illinois | Nevada | New Hampshire | New Jersey | New York | North Carolina | Ohio | Maryland | Massachusetts | Michigan | Minnesota | Pennsylvania | Rhode Island | South Carolina | Tennessee | Texas | Virginia
$155K - $180K / year
Job Closed

Site Reliability Engineer – Inference Infrastructure

Cohere

At Cohere, our mission is to build machines that understand the world, and to make them safely accessible to all.

DevOps Engineer | 140 days ago
Full Time | Remote | Team 11-50 | H1B Sponsor

  • Build self-service systems that automate managing, deploying, and operating services, including our custom Kubernetes operators that support language model deployments.
  • Automate environment observability and resilience. Enable all developers to troubleshoot and resolve problems.
  • Take steps required to ensure we hit defined SLOs, including participation in an on-call rotation.
  • Build strong relationships with internal developers and influence the Infrastructure team’s roadmap based on their feedback.
  • Develop our team through knowledge sharing and an active review process.

Canada
Other | Remote | Team 10,001+ | Since 1986 | H1B No Sponsor

  • Ensure reliability, scalability, and performance of services through SLIs/SLOs, capacity planning, and incident response
  • Drive automation of infrastructure operations to minimize toil
  • Develop and support monitoring, alerting, and observability systems to support proactive issue detection
  • Partner with internal engineering teams to define service-level objectives, improve deployment workflows, and integrate infrastructure with development needs
  • Contribute to on-call rotations and incident management, helping ensure high availability of services
  • Drive post-incident reviews and blameless retrospectives to improve reliability
  • Stay current with emerging technologies and recommend improvements to existing systems and practices.

All locations: Colorado | New York | Massachusetts | Missouri
$175K - $185K / year
Job Closed