HeartFlow

HeartFlow works to enable better care for patients, and allow clinicians to better identify coronary artery disease through its software HeartFlow Analysis. The company is headquar

Staff - Lead Site Reliability Engineer

Location

California

Posted

31 days ago

Salary

$200.8K - $250.9K / year

Seniority

Lead

Job Description

Staff/Lead Site Reliability Engineer (SRE)

San Francisco, California

Heartflow is a medical technology company advancing the diagnosis and management of coronary artery disease, the #1 cause of death worldwide, using cutting-edge technology. The flagship product, an AI-driven, non-invasive cardiac test supported by the ACC/AHA Chest Pain Guidelines called the Heartflow FFRCT Analysis, provides a color-coded, 3D model of a patient’s coronary arteries indicating the impact blockages have on blood flow to the heart. Heartflow is the first AI-driven non-invasive integrated heart care solution across the CCTA pathway that helps clinicians identify stenoses in the coronary arteries (RoadMap™ Analysis), assess coronary blood flow (FFRCT Analysis), and characterize and quantify coronary atherosclerosis (Plaque Analysis). Our pipeline of products is growing and so is our team; join us in helping to revolutionize precision heartcare.

Heartflow is a publicly traded company (HTFL) that has received international recognition for exceptional strides in healthcare innovation, is supported by medical societies around the world, is cleared for use in the US, UK, Europe, Japan and Canada, and has been used for more than 500,000 patients worldwide.

HeartFlow is transforming cardiovascular care with cutting-edge, non-invasive technology. We are launching a massive Platform Modernization initiative to power the next generation of our life-saving medical products.

We're looking for an experienced Site Reliability Engineer (SRE) to join our cloud-native infrastructure team. You will work closely with our Platform engineers and development teams to ensure our critical systems are highly available, scalable, observable, and performant. If you thrive on eliminating toil, automating complex operations, and defining the standards for production excellence, we want to talk to you.

Job Responsibilities

As our Staff SRE, you'll be the primary expert responsible for our entire compute ecosystem. You'll operate at the highest level of technical expertise and influence. You won't just solve problems; you'll prevent them at a fundamental level across organizational boundaries.

Your key responsibilities will include:

- Lead the design, implementation, and operation of reliable, scalable cloud infrastructure.
- Define and begin rollout of SLI/SLO standards across microservices.
- Develop self-service instrumentation tooling enabling engineering teams to own observability.
- Establish observability and monitoring using an OSS toolchain.
- Serve as a technical escalation point for critical incidents, perform deep-dive root cause analyses (RCAs), and implement robust corrective measures to prevent recurrence.
- Enhance our monitoring, logging, and tracing systems to provide comprehensive visibility into system health.
- Set the technical direction and best practices for the entire SRE and engineering organization. Mentor mid-level and senior engineers on design patterns, operational rigor, and reliability principles.

We're looking for a leader and a deep technical expert with a proven track record of solving the hardest scaling and reliability challenges.

Required Qualifications

- 8+ years of progressive experience in Site Reliability Engineering, Production Engineering, or a closely related role.
- Deep expertise with:
  - AWS
  - Kubernetes, Helm
  - Observability stack (Prometheus, Grafana, Mimir, Loki, Pixie, Tempo)
  - CI/CD systems (ArgoCD, Harness)
- Fluency in at least one major scripting/programming language for automation and tooling (e.g., Python, Go, or Java).
- Hands-on engineering mindset: able to instrument services directly, not just configure tooling.
- Track record of building or significantly improving incident detection and response systems.
- Deep technical familiarity with Kubernetes ecosystems, containerization technologies, and modern IaC tooling (e.g., Terraform, Crossplane, or Operators) so you can effectively guide the team's technical decisions.
- Exceptional communication skills, capable of explaining complex technical issues to both technical and non-technical audiences.

Nice-to-Have

- Experience implementing Service Mesh technologies (e.g., Istio, Linkerd).
- A strong understanding of security principles and practices in a cloud environment.
- Certifications such as CKA (Certified Kubernetes Administrator) or CKAD (Certified Kubernetes Application Developer).

A reasonable estimate of the base salary compensation range is $200,750 to $250,922, cash bonus, and equity.

#LI-IB1 #LI-Hybrid

More DevOps Engineer Jobs

Junior DevOps Engineer

Telefónica Tech

Will you join the conversation? #WeAreTelefónicaTech

DevOps Engineer | 31 days ago
Full Time | Remote | Team 1,001-5,000 | H1B No Sponsor

• Design, implement, and maintain Modern Data Platform cloud infrastructure using Azure services
• Deployment of Azure Landing Zones in alignment with Microsoft best practices
• Collaborate with Data Engineering team to deploy and manage modern data platform components
• Collaborate with developers, clients and stakeholders to implement and maintain continuous integration and deployment pipelines
• Automate deployment processes using tools such as Azure DevOps, Terraform, BICEP, YAML, and PowerShell
• Implement and manage monitoring, logging, and alerting systems to ensure maximum availability and reliability
• Continuously optimize the performance and scalability of our cloud infrastructure
• Work with cross-functional teams to troubleshoot and resolve issues related to our cloud infrastructure and Data Platform
• Stay up-to-date with the latest Azure services, trends, and best practices
• Document procedures, configurations, and best practices

United Kingdom

Senior Observability Analyst – SRE, Monitoring

GrooveTech

With IT, we help companies and projects reach their maximum potential for performance and sustainable growth.

DevOps Engineer | 31 days ago
Full Time | Remote | Team 51-200 | Since 2017 | H1B No Sponsor

• Act as the technical observability lead for high-criticality environments.
• Manage and evolve solutions such as Datadog, Zabbix, and Grafana.
• Implement and optimize APM practices, UX monitoring, traces, metrics, and logs.
• Use Azure Monitor and Azure Logs for troubleshooting and event correlation.
• Design and implement alert integrations via PagerDuty.
• Create and maintain playbooks and runbooks for incident response.
• Support root cause analysis and define preventive actions together with infrastructure and application teams.

Brazil
Job Closed

Senior SRE / DevOps Engineer

Verity Group

We are Human. We are Digital. We are Verity!

DevOps Engineer | 31 days ago
Full Time | Remote | Team 51-200 | Since 2010 | H1B No Sponsor

• Design, implement, and evolve CI/CD pipelines
• Provision and maintain infrastructure on Google Cloud Platform (GCP) using Terraform and Ansible
• Operate and scale Kubernetes environments (GKE)
• Define, implement, and monitor SLIs, SLOs, and error budgets
• Build observability, alerts, and APM (Dynatrace is a plus)
• Work closely with squads, promoting platform engineering and reliability best practices

Brazil

Senior DevOps Engineer

MoneySmart Group

Empowering you to reach your financial goals.

DevOps Engineer | 31 days ago
Full Time | Remote | Team 51-200 | Since 2009 | H1B No Sponsor

• Own production reliability outcomes for services (availability, scalability, cost-efficiency, and security) from early design through implementation and operations
• Architect, deploy, and secure AWS cloud infrastructure following best practices
• Lead incident response, RCAs and run post-incident reviews
• Build and maintain “everything as code” infrastructure using Terraform/CloudFormation
• Operate, scale, and evolve the container platform (Kubernetes/EKS/ECS)
• Design, implement, and continuously improve CI/CD pipelines to enable fast, safe, and repeatable releases
• Integrate security tools, practices, and controls into CI/CD and DevOps workflows including container image scanning, runtime security controls, IAM best practices, and secure secret management
• Continuously assess and reduce cloud security risks through automated guardrails, compliance enforcement, and monitoring
• Establish and mature observability practices (metrics, logs, traces, dashboards, alerting) to improve system visibility and reduce incident resolution time
• Drive automation to eliminate operational toil across infrastructure provisioning, deployments, monitoring, access management, and routine maintenance tasks
• Promote an AI-first engineering culture by leveraging AI-assisted tooling to accelerate scripting, documentation, troubleshooting, and operational analysis
• Design and prototype agentic solutions for automating repetitive DevOps workflows
• Mentor peers, define platform standards, and raise the engineering bar to improve consistency, resilience, and knowledge sharing across teams.

Philippines
₱190K - ₱260K / year