Zocdoc

Zocdoc is the beginning of a better healthcare experience for millions of patients every month.

Senior Site Reliability Engineer

DevOps Engineer | Other | Remote | Senior | Team 501-1,000 | Since 2007 | H1B Sponsor

Location

United States

Posted

103 days ago

Salary

$180K - $220K / year

Seniority

Senior

Bachelor Degree | 5 yrs exp | English

AWS, DNS, Docker, Firewalls, GCP, Kubernetes, TCP/IP

Job Description

• Monitoring and maintaining complex cloud-based infrastructure, systems, and services and ensuring their uptime to help millions of patients get the care they need
• Automating and developing our tooling, processes, and infrastructure to speed up development and make them repeatable and error-proof
• Supporting our large product engineering org with their scaling, performance, and uptime needs as well as helping diagnose and debug production-related issues
• Analyzing and performance-tuning systems, code, and networking for scaling and optimal operation
• Working with cutting-edge GenAI tools and technology

Job Requirements

5+ years of supporting consumer-facing web application production environments and systems in a Site Reliability Engineering or Production Engineering role
2+ years of on-call experience in a 24/7 cloud-based production environment
2+ years of experience in managing and supporting modern cloud-based environments and infrastructure like AWS/GCP, Docker, Kubernetes, etc.
Experience with edge technologies such as load balancers, reverse proxies, web application firewalls, routing, etc.
Deep understanding of protocols such as TCP/IP, HTTP/HTTPS, TLS, DNS, NTP
A Bachelor’s degree in Computer Science, Computer Engineering, or equivalent engineering experience is a plus, but not required

Benefits

Flexible, hybrid work environment at our convenient SoHo location (if based in NYC)
Unlimited Vacation
100% paid employee health benefit options (including medical, dental, and vision)
Commuter Benefits
401(k) with employer funded match
Corporate wellness program with Wellhub
Sabbatical leave (for employees with 5+ years of service)
Competitive paid parental leave and fertility/family planning reimbursement
Cell phone reimbursement
Catered lunch every day along with beverages and snacks
Employee Resource Groups and ZocClubs to promote shared community and belonging
Great Place to Work Certified

More DevOps Engineer Jobs

Senior DevOps Engineer

Reveleer

The End-to-End Platform for Risk Adjustment, Quality Improvement, and Member Management

DevOps Engineer | 103 days ago

Other | Remote | Team 51-200 | H1B Sponsor

• Architect, build, and maintain scalable and secure cloud infrastructure across AWS, Azure, and GCP.
• Design and implement multi-region, fault-tolerant architectures that support 24/7 SaaS healthcare operations.
• Lead Infrastructure as Code (IaC) development using Terraform, CloudFormation, Pulumi, or equivalent.
• Build, optimize, and maintain CI/CD pipelines using tools such as Bitbucket, GitHub Actions, GitLab CI, Jenkins, CircleCI, etc.
• Automate repeatable processes, deployments, and operational tasks to increase reliability and reduce human error.
• Implement end-to-end automated testing frameworks integrated into deployment workflows.
• Drive SRE principles, including SLIs/SLOs/SLA management, observability, and proactive reliability improvements.
• Implement and maintain logging, monitoring, alerting, and distributed tracing (e.g., New Relic, Datadog, Prometheus, Grafana, ELK).
• Lead major incident response, root cause analysis, and post-mortem processes.
• Implement DevSecOps best practices, embedding security into CI/CD and infrastructure workflows.
• Collaborate with the Security team to ensure controls meet HIPAA, HITRUST, SOC2, NIST, and CIS requirements.
• Manage secret stores, identity/access controls, certificate management, and vulnerability remediation.
• Architect and maintain cloud networking, including VPCs, Firewalls, WAFs, VPNs, load balancers, service meshes, and hybrid networking.
• Support secure integrations between platforms, SaaS systems, and 3rd-party vendors.
• Partner with Software Engineering to enable rapid development while maintaining operational excellence.
• Work with SRE, Security, QA, and Data teams to optimize performance, automation, and compliance.
• Mentor junior engineers and contribute to team standards, design reviews, and architecture discussions.

AWS, Azure, DNS, Docker, Firewalls, GCP, Grafana, Jenkins, Kubernetes, Prometheus, Python, Terraform

United States

$170K - $180K / year

Job Closed

DevOps Engineer – IST Timezone

testRigor

testRigor is the #1 generative AI-based codeless test automation tool for manual testers and product managers.

DevOps Engineer | 103 days ago

Full Time | Remote | Team 51-200 | Since 2015 | H1B No Sponsor

• The DevOps Engineer (IST Timezone) will be responsible for designing, automating, and maintaining cloud infrastructure and deployment pipelines for a global SaaS platform.
• You will collaborate closely with developers and technical leadership in an agile, cross-functional team, driving best practices in automation, security, and reliability.
• As a key contributor, you will address challenging infrastructure problems and deliver scalable solutions in fast-paced, start-up environments.
• Estimation and planning for infrastructure and automation tasks.
• Analysis of requirements to develop robust, maintainable systems.
• Designing and implementing Infrastructure as Code (IaC) solutions using Terraform and related toolchains.
• Managing cloud infrastructure across Azure, AWS, Google Cloud (GC), and Cloudflare (CF).
• Deploying, maintaining, and optimizing Kubernetes (k8s) clusters and containerized workloads.
• Developing scripts and automation using Python, Bash, and PowerShell.
• Administering MongoDB databases and ensuring high availability and backups.
• Maintaining systems across Linux, macOS, and Windows environments.
• Supporting development teams with CI/CD pipelines, automated testing, and continuous delivery.
• Configuring networks, VPNs, and HTTP reverse proxies for secure and efficient communication.
• Implementing best practices for source control with Git and automating workflows.
• Monitoring systems for reliability, performance, and security.

AWS, Azure, Distributed Systems, Kubernetes, Linux, macOS, MongoDB, Python, Terraform

Kazakhstan

Job Closed

Customer Site Reliability Engineer – OpenShift Managed Cloud Services, Spoken Japanese, Kubernetes/AWS/Azure, Linux

Red Hat

The leading provider of enterprise open source solutions.

DevOps Engineer | 103 days ago

Full Time | Remote | Team 10,001+ | Since 1993 | H1B Sponsor

• Manage large-scale, distributed systems, focusing on minimizing downtime and improving system resilience.
• Maintain customer trust and confidence by ensuring stability and functionality of services.
• Drive continuous enhancement of processes, tools, and methodologies to support the evolving needs of the service.
• Lead the development of code and automation scripts to optimize the scalability, reliability, and performance of services.
• Lead and participate in high-priority customer escalations, adopting a customer-first mindset.
• Coordinate and execute complex incident response procedures, ensuring timely resolution and thorough postmortems.
• Collaborate with cross-functional teams to enhance system robustness.
• Demonstrate a proactive mindset to help preempt escalations and ensure reliable operations.
• Document resolutions, root causes, and best practices to enrich the knowledge base and promote self-service solutions.
• Mentor and coach team members, fostering a culture of continuous learning, knowledge sharing and collaboration.
• Participate in on-call rotation and provide leadership during critical incidents.
• Collaborate on strategic AI and automation projects designed to increase the efficiency of fleet operations and troubleshooting, ultimately delivering a better product experience for customers.

Ansible, AWS, Azure, Cloud, Distributed Systems, Google Cloud Platform, Kubernetes, Linux, OpenShift, Prometheus, TCP/IP, Terraform, Go

Australia

Senior Manager, Site Reliability Engineer

Eltropy Inc.

Eltropy is on a mission to disrupt the way people access financial services. Eltropy enables financial institutions to digitally engage in a secure and compliant way. Using our world-class digital communications platform, community financial institutions can improve operations, engagement, and productivity. CFIs (Community Banks and Credit Unions) use Eltropy to communicate with consumers via Text, Video, Secure Chat, co-browsing, screen sharing, and chatbot technology, all integrated in a single platform bolstered by AI, skill-based routing, and other contact center capabilities.

Eltropy Values:
- Customers are our North Star
- No Fear - Tell the truth
- Team of Owners

Eltropy is an equal opportunity employer. All applicants will be considered for employment without attention to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status.

DevOps Engineer | 103 days ago

Other | Remote | Team 51-200

About the Role

We are seeking a Senior Manager of Site Reliability Engineering to lead and scale our SRE function, ensuring the reliability, availability, performance, and efficiency of our critical systems. This role blends deep technical expertise with strategic leadership, partnering closely with Engineering, Product, Security, and Infrastructure teams to build resilient, scalable platforms that support business growth. As a Senior Manager of SRE, you will define reliability standards, establish operational excellence, and foster a culture of automation, observability, and continuous improvement.

Key Responsibilities

Leadership & Strategy
- Define and execute the SRE vision, strategy, and roadmap aligned with business objectives
- Build, mentor, and lead a high-performing team of SRE managers and engineers
- Establish best practices for reliability, incident management, change management, and capacity planning
- Serve as a senior technical leader and trusted advisor across the organization

Reliability & Operations
- Own system reliability metrics, including SLIs, SLOs, and error budgets
- Lead major incident response, post-incident reviews, and long-term remediation efforts
- Drive improvements in uptime, latency, scalability, and fault tolerance across

Architecture & Engineering Excellence
- Influence system architecture to improve resilience, scalability, and operability
- Champion automation, Infrastructure as Code, and self-service platforms
- Oversee observability strategy (monitoring, logging, tracing, alerting)
- Ensure systems are designed for high availability, disaster recovery, and business continuity

Collaboration & Governance
- Partner with Product, Platform, Security, and Compliance teams to meet operational and regulatory requirements
- Define operational standards, runbooks, and on-call practices
- Communicate reliability risks, tradeoffs, and performance to executive leadership

Required Qualifications
- 8+ years of experience in Site Reliability Engineering, DevOps, or Production Engineering
- 3+ years in engineering leadership roles
- Strong background in distributed systems, cloud platforms (AWS, GCP, Azure), and container orchestration (Kubernetes)
- Hands-on experience with CI/CD, Infrastructure as Code (e.g., Terraform, CloudFormation), and automation
- Proven experience defining and operating SLOs, SLIs, and error budgets
- Excellent incident management and root cause analysis skills
- Strong communication skills with the ability to influence technical and non-technical stakeholders

Preferred Qualifications
- Experience supporting large-scale, high-traffic, or mission-critical systems
- Background in software engineering or systems engineering
- Experience scaling SRE practices in a fast-growing organization
- Familiarity with security, compliance, and regulatory requirements
- Bachelor’s or Master’s degree in Computer Science or a related field (or equivalent experience)

Location: Remote
Compensation: $200,000-$220,000 (Base)

About Eltropy: www.eltropy.com

United States

Job Closed
