Job Closed

This listing is no longer active.

Underdog Fantasy describes itself as one of the fastest-growing sports companies on the market, bringing "fun, approachable contests and games to the masses."

Senior Site Reliability Engineer – Infrastructure

DevOps Engineer · Other Remote · Senior

Location

United States

Posted

135 days ago

Salary

$160K - $240K / year

Seniority

Senior

Bachelor Degree · English · AWS, Kotlin, Kubernetes, PostgreSQL, Python, Ruby, Swift, TypeScript

Job Description

• Own and maintain the incident response process, including defining procedures, tools, and best practices
• Guide teams in establishing and monitoring Service Level Objectives (SLOs), including setting up alerting and reporting systems
• Lead capacity planning initiatives, focusing on both short- and long-term scalability while optimizing costs
• Develop and implement disaster recovery plans, including regular testing and regulatory compliance
• Collaborate with teams on architecture decisions to ensure high availability and scalability
• Manage launch and event planning for high-traffic occasions, focusing on infrastructure preparation and capacity management (a.k.a. Launch Readiness)
• Act as an internal expert and consultant for monitoring tools such as Datadog and PagerDuty and for infrastructure such as AWS and Kubernetes
• Emphasize automation and tooling to scale our workload
• Contribute across codebases in Ruby, Python, Go, TypeScript, Swift, and Kotlin as needed to support the initiatives above

Job Requirements

A strong written and verbal communicator
Collaborative by nature
Someone who enjoys using research, data, and experiments to make decisions; you believe “Hope is not a strategy.”
You enjoy working directly with customers (generally engineers or other people inside the company)
You think long-term about what is best for the business and its customers
You are excited to take ownership
You are very comfortable working in an IDE and with multiple languages, multiple web application frameworks, AWS services, Kubernetes, and PostgreSQL
You can independently learn new languages and technologies as needed
You enjoy deploying changes to production quickly, multiple times a week if necessary

Benefits

Unlimited PTO (we're extremely flexible, with the exception of the few weeks before and into the start of the NFL season)
16 weeks of fully paid parental leave
Home office stipend
A connected, virtual-first culture with a highly engaged distributed workforce
5% 401(k) match, FSA, and company-paid health, dental, and vision plan options for employees and dependents

More DevOps Engineer Jobs

Senior Machine Learning Site Reliability Engineer

Prima Power

EVOLVE BY INTEGRATION

DevOps Engineer · 135 days ago

Full Time Remote · Team 1,001-5,000 · H1B Sponsor

• Design, build, and operate reliable and scalable systems by defining and monitoring SLOs/SLIs
• Work directly on production infrastructure
• Collaborate closely with software engineers on system design and reliability improvements
• Actively develop automation for infrastructure and operational workflows to eliminate toil and reduce MTTR
• Participate in and lead incident response
• Drive blameless post-incident reviews with concrete follow-ups implemented in code and tooling
• Continuously analyze and optimize system performance and cost
• Provide data, insights, and recommendations to inform capacity planning
• Support security best practices through hands-on vulnerability remediation and threat mitigation

AWS, DNS, Kubernetes, PySpark, Python, Terraform

Italy

Senior Site Reliability Engineer

Hashgraph

Hashgraph, formerly Swirlds Labs, is a software company home to some of the brightest minds in web3.

DevOps Engineer · 135 days ago

Other Remote · Team 51-200 · Since 2022 · H1B No Sponsor

• Help design, build, and integrate key product features for enterprise businesses built on Hiero, for our private distributed ledger technology
• Leverage distributed systems engineering experience, software development skills, and an understanding of industry-standard SRE and DevOps practices to deliver core platform services
• Contribute to a highly scalable, mission-critical infrastructure product used by some of the largest companies in the finance, supply chain, and healthcare industries

AWS, Azure, Distributed Systems, GCP, Kubernetes, Solidity

United States

Job Closed

DevOps Engineer / Site Reliability Engineer

TWO95 International, Inc

Recruitment and Staffing Solution

DevOps Engineer · 135 days ago

Other Remote · Team 51-200 · Since 1993 · H1B No Sponsor

Job Title: Lead SRE (Site Reliability Engineer)
Location: Remote
Type: 6+ month contract-to-hire
Rate: Open (hourly)
Please forward your updated resume to deivy.malli@two95intl.com and include your rate requirement, your contact details, and a suitable time when we can reach you.

Responsibilities
· Own uptime, SLAs, and overall reliability of the cloud infrastructure and kiosk platform.
· Lead incident response and root-cause analysis, and drive actionable postmortems.
· Automate infrastructure, deployments, and operational tasks using modern IaC and scripting, in collaboration with the Platform Engineering team.
· Maintain and improve monitoring, alerting, and observability (Grafana, Prometheus, New Relic, etc.).
· Manage, operate and recommend improvement of mo
· Execute and continuously improve disaster recovery and business continuity plans.
· Partner with platform engineering, QA, and development teams to ensure operational readiness.
· Establish and maintain runbooks, operational standards, and reliability best practices.
· Provide leadership, mentorship, and clear communication during both normal operations and incidents.
· Optimize cloud and Kubernetes environments for reliability, performance, and scalability.

Grafana, Kubernetes, Prometheus, Python, Terraform

United States

Job Closed

Site Reliability Engineer L5 – Live SRE

Netflix

Described as the world's top internet television network, Netflix is a publicly traded entertainment company offering video-on-demand and streaming media.

DevOps Engineer · 136 days ago

Other Remote

• Support live streaming events by focusing on cloud traffic (API Gateway, IPC between microservices)
• Prepare and execute load tests to ensure the infrastructure can handle sudden increases in API traffic
• Implement end-to-end observability and visualize data to achieve the desired availability at scale
• Drive continual improvement in observability, monitoring, and scalability
• Implement, automate, execute, and analyze results from tests focused on live streaming delivery
• Write and review code, develop documentation, and debug complex problems
• Coordinate and collaborate with multiple stakeholders for smooth event execution
• Participate in an on-call rotation and work flexible hours based on event schedules

DNS, Apache Kafka, Linux, Microservices, Python, Rust, Apache Spark, SQL, TCP/IP, Unix

United States
