Fabric

The national pay range for this role is $165,000.00 - $210,000.00 per year. Actual compensation will be determined by factors such as the candidate's geographic market, experience, skills, and qualifications. Certain roles may also be eligible for additional compensation. If your compensation requirement is greater than our posted range, please still consider applying; a determination can be made based on unique qualifications. Expected compensation ranges for this role may change over time.

Staff Site Reliability Engineer

DevOps Engineer, Other, Remote, Lead, Team 11-50

Location

United States

Posted

77 days ago

Salary

$140K - $170K / year

Seniority

Lead

Job Description

About the Role

As a Staff Site Reliability Engineer, you will own and evolve the infrastructure powering healthcare experiences for millions of patients. This role bridges the gap between traditional infrastructure excellence and the future of AI-driven operations. You will act as a primary architect for our AWS and Kubernetes (EKS) environment, ensuring the platform is resilient, scalable, and compliant while exploring how agentic workflows can modernize SRE practices.

What You'll Do

As a Staff Site Reliability Engineer, you will be a steward of Fabric’s production integrity, leading the strategy for infrastructure automation, observability, and system resilience. Your primary responsibilities include:

Infrastructure & Kubernetes Orchestration

  • Designing, deploying, and maintaining production Kubernetes (EKS) clusters to ensure enterprise-grade availability for our users.
  • Eliminating manual configuration by building and managing a scalable infrastructure state entirely through Terraform.
  • Optimizing the AWS footprint (specifically EC2, RDS, and S3) to balance high performance with cost-efficiency and reliability.

AI-Assisted Operations & Automation

  • Exploring and deploying agentic workflows for AI-assisted runbooks that automate complex operational decisions and repetitive tasks.
  • Building and evolving deployment pipelines using GitHub Actions or Semaphore to ensure delivery is both rapid and safe.
  • Focusing on toil reduction by developing internal tools that replace manual operational work with intelligent, autonomous systems.

Observability & Incident Management

  • Driving the evolution of the observability stack in Datadog by implementing the sophisticated metrics, traces, and logs needed to meet SLOs.
  • Leading incident response efforts and facilitating the blameless postmortems that help systematically reduce recovery time (MTTR).
  • Defining and monitoring the SLIs and SLOs that ensure the platform consistently meets rigorous healthcare performance standards.

Compliance & Collaboration

  • Ensuring every piece of infrastructure remains fully compliant with HIPAA and other critical healthcare regulatory requirements.
  • Mentoring engineers across the company on reliability best practices and contributing a clinical-safety perspective to cross-functional design reviews.

Why You Might Be a Good Fit

  • You are a deeply proficient engineer who excels at the intersection of cloud infrastructure, automation, and system design.
  • You possess a meticulous approach to observability and a passion for finding the "root cause" rather than just applying a patch.
  • You enjoy exploring the "next frontier" of SRE, including how AI and agentic tools can make operations more efficient.
  • You thrive in fast-paced environments where technical rigor is balanced with pragmatism and clinical-grade safety.

This Might Not Be The Right Fit If...

  • You prefer working on static infrastructure rather than evolving systems through code and automation.
  • You are uncomfortable with the "agile" pace of tech-driven platform development or integrating AI tools into your daily workflow.
  • You prefer a siloed role that does not involve active participation in incident response or collaborative postmortems.

Your Qualifications

  • 8+ years of experience in SRE, DevOps, or Platform roles managing production environments at scale.
  • Expert technical depth in AWS (EKS, EC2, RDS, S3) and production-grade Kubernetes management.
  • Proficiency with modern tooling including Terraform (IaC), Datadog (Observability), and CI/CD systems.
  • Deeply proficient coding and scripting skills in Python, Bash, Ruby, or Go.
  • Preferred experience building agentic workflows or AI-assisted tooling to drive operational efficiency.
  • A "rigor-first" mindset with a dedication to HIPAA-compliant, high-availability architecture.

The national pay range for this role is $140,000.00 – $170,000.00 per year. Actual compensation will be determined by factors such as the candidate's geographic market, experience, skills, and qualifications. Certain roles may also be eligible for additional compensation, including a comprehensive benefits package such as medical, dental, vision, unlimited PTO, and a 401(k) plan, stock options and bonuses. If your compensation requirement is greater than our posted range, please still consider applying; a determination can be made based on unique qualifications. Expected compensation ranges for this role may change over time.

Job Requirements

  • 8+ years of experience in SRE, DevOps, or Platform roles managing production environments at scale.
  • Expert technical depth in AWS (EKS, EC2, RDS, S3) and production-grade Kubernetes management.
  • Proficiency with modern tooling including Terraform (IaC), Datadog (Observability), and CI/CD systems.
  • Deeply proficient coding and scripting skills in Python, Bash, Ruby, or Go.
  • Preferred experience building agentic workflows or AI-assisted tooling to drive operational efficiency.
  • A "rigor-first" mindset with a dedication to HIPAA-compliant, high-availability architecture.

Benefits

  • The national pay range for this role is $140,000.00 – $170,000.00 per year.
  • Actual compensation will be determined by factors such as the candidate's geographic market, experience, skills, and qualifications.
  • Certain roles may also be eligible for additional compensation, including a comprehensive benefits package such as medical, dental, vision, unlimited PTO, and a 401(k) plan, stock options, and bonuses.
  • If your compensation requirement is greater than our posted range, please still consider applying; a determination can be made based on unique qualifications.
  • Expected compensation ranges for this role may change over time.

More DevOps Engineer Jobs

Senior DevSecOps Engineer

SkySafe

Securing the airspace, protecting the public, enabling the commercial drone industry.

DevOps Engineer, 77 days ago
Full Time, Remote, Team 11-50, Since 2015, H1B No Sponsor

  • Design, implement, and maintain scalable and secure cloud infrastructure supporting SkySafe’s SaaS platform
  • Build and maintain infrastructure automation using Infrastructure-as-Code tools such as Terraform or similar
  • Develop and maintain CI/CD pipelines to enable safe, reliable, and repeatable deployments
  • Ensure infrastructure and operational practices align with SOC2 and government security requirements
  • Implement and maintain monitoring, logging, and alerting systems for production infrastructure
  • Improve system reliability, observability, and performance across the platform
  • Collaborate with engineering teams to design infrastructure that supports new services and features
  • Manage secrets, identity, and access controls using industry best practices
  • Help define operational standards and DevOps processes across the engineering organization
  • Support incident response and root-cause analysis for production issues

United States
$145K - $200K / year
Job Closed

Senior DevOps Engineer

Healthie

Healthie is the world’s leading API-first, ONC-Certified EHR for healthcare delivery outside of the hospital. We provide the powerful infrastructure every scaling organization needs—EHR, scheduling, patient engagement, billing, and more—all accessible via modern APIs and a white-labeled UI. Today, over 1 billion API calls are made to Healthie every month, as thousands of organizations—working with more than 13 million patients in total—rely on Healthie to deliver care across a spectrum of specialties, from preventative health and wellness to complex chronic care management. Healthie is backed by leading investors, and while we've raised $42M to date, more importantly, we operate with fiscal responsibility and have been profitable for more than half of our time as a company.

DevOps Engineer, 77 days ago
Other, Remote, Team 51-200

Our Mission

We’re building infrastructure for modern healthcare delivery. Traditional healthcare is plagued with outdated, monolithic EHRs designed to maximize billing outcomes. Patient outcomes and provider experiences have been afterthoughts, as these systems have bolted on non-API-first solutions. None of this is built for how clinically excellent healthcare is actually delivered: longitudinally and collaboratively, with the patient at the center.

Healthie is the world’s leading API-first, ONC-Certified EHR for healthcare delivery outside of the hospital. We provide the powerful infrastructure every scaling organization needs (EHR, scheduling, patient engagement, billing, and more), all accessible via modern APIs and a white-labeled UI. Our platform makes it simple for organizations of any size to launch, customize, and scale their care delivery models without reinventing the wheel. Today, over 1 billion API calls are made to Healthie every month, as thousands of organizations, working with more than 13 million patients in total, rely on Healthie to deliver care across a spectrum of specialties, from preventative health and wellness to complex chronic care management.

We believe in the power of technology to improve access to healthcare, and we’re building the rails that make this a reality. We work fast and with quality because we provide business-critical, healthcare-critical software that clinicians and patients need for a better healthcare system. We’re customer-obsessed, operate with lightning-fast processes and responses, make our product roadmap public so customers can see what we’re building, and remain relentlessly focused on how care gets delivered.

Healthie is backed by leading investors, and while we've raised $42M to date, more importantly, we operate with fiscal responsibility and have been profitable for more than half of our time as a company.

Learn more at https://www.gethealthie.com/

About the role

We are hiring a DevOps engineer to join our Platform Engineering team at Healthie! In this role, you’ll partner closely with platform, infrastructure, and core engineering teams to improve the reliability of our CI/CD pipeline, implement developer tooling that helps the rest of the engineering team move faster, and make changes to our product to make it easier to manage. This is a hands-on role, ideal for someone who is excited to improve developer efficiency and quality of life in a fast-moving startup environment and help shape the future of security at Healthie. You should be able to design, scope, and implement tooling independently. If you're passionate about building impactful systems, driving innovation, and making a difference in healthcare, we’d love to hear from you.

What You'll Do

  • Automate infrastructure via tools such as Terraform.
  • Administer our software platform using tools like CircleCI, PostgreSQL, Depot, Shipyard, and GitHub Actions.
  • Work with engineers on the product, customer engineering, data, and platform teams to develop solutions to SDLC slowdowns and inefficiencies.
  • Develop solutions that improve quality of life and reliability for the engineering organization as a whole.
  • Measure performance and make improvements to our ecosystem, evaluate tooling, and propose and implement new tools when needed.

Details, details

  • This is a full-time, remote position.
  • U.S. work authorization is required.
  • The salary range is $180,000 - $200,000 plus equity, annual bonus, & benefits.

About you

  • 5+ years of experience in a DevOps/Infrastructure engineering environment.
  • Familiarity with and experience administering continuous integration and deployment pipelines.
  • You have a desire to problem solve and drive results.
  • You have a working knowledge of and experience with containerization tools such as Docker.
  • You have excellent communication skills, and enjoy working with other teams directly.
  • The ideal candidate will have experience with a RoR ecosystem.
  • Bonus if you have experience with observability platforms such as Prometheus/Grafana.

Interview Process

  • Quick chat with someone from our Talent team (15 minutes)
  • Interview with Chris, Director of Platform Engineering (30 minutes)
  • Pair Coding interview (1hr)
  • Talk with folks from the platform team (30 minutes)
  • Interview with Cavan, CTO + cofounder (20 minutes)
  • Reference checks

To learn more about Working at Healthie & our benefits, click here.

Healthie participates in E-Verify.

United States
$180K - $200K / year

About Clear Labs

Clear Labs (CL) harnesses the power of next-generation sequencing (NGS) to simplify complex diagnostics for clinical and applied markets. By creating a fully automated platform that brings together DNA sequencing, robotics, and cloud-based analytics, Clear Labs democratizes genomics applications to deliver better clarity. Clear Labs’ turnkey platform accelerates outcomes and improves accuracy from food-borne pathogens to infectious diseases.

Position Summary

We are a fast-moving, lean engineering team building complex instrument software that bridges physical lab hardware with the cloud. We are looking for a proactive DevOps Engineer to take ownership of our infrastructure. You won't be starting from scratch, nor will you be left alone. You will be taking the reins from our outgoing Senior Architect (who will remain available in an advisory capacity) and will work closely with our core development team to modernize our CI/CD pipelines, secure our GCP environments, and prepare our infrastructure for SOC2 compliance. If you are a hungry mid-level US engineer looking for the autonomy to own a hybrid edge-to-cloud stack, or an experienced nearshore engineer looking for a direct integration with a US-based hardware/software team, this is your launchpad.

Reports to: Senior Vice President of Engineering

Location: Remote (US / PST Time zone) OR On-Site (San Carlos, CA). Onsite presence is required 3 days per week for employees within a reasonable commuting distance, with additional days onsite possible as business needs require.

Primary Responsibilities

  • Cloud Infrastructure: Manage, monitor, and scale our cloud environment using Infrastructure-as-Code (Terraform) across GCP and GKE (Kubernetes).
  • CI/CD & Developer Velocity: Maintain and optimize our Jenkins build and deployment pipelines to help our developers ship code faster and more reliably. Build and maintain Docker images and manage containerized applications.
  • Security & Compliance: Lead the hardening of our clusters, manage secrets, and implement the technical controls necessary for our upcoming SOC2/ISO 27001 audits.
  • Database & Messaging Operations: Ensure the stability, backup automation, and scaling of our MySQL databases, BigQuery datasets, and message brokers (RabbitMQ, Cloud PubSub).
  • High-Availability Support: Act as the primary point of contact for infrastructure stability, ensuring continuous operational overlap during core PST working hours.
  • Release Management: Handle deployment and version control of multiple systems and releases within our blue-green environment.

Note that job duties and responsibilities may evolve based on company needs and technological advancements.

Travel: Travel to company headquarters may be requested occasionally, typically 2–4 times per year.

Physical Requirements

Able to sit or stand at a computer for extended periods and use monitors and related hardware comfortably.

California
Job Closed

DevOps Engineer

ClearlyAgile

We meet you where you are in your Agile journey.

DevOps Engineer, 77 days ago
Other, Remote, Team 51-200, H1B No Sponsor

We are hiring an experienced DevOps Engineer to help us support existing and new customers. If you're dedicated, ambitious and have a passion for working with the latest leading edge Cloud Native technologies, this is an excellent opportunity for you. In this role you will be responsible for supporting the developers and customers using the infrastructure you are involved with developing, and being our company’s first line of defense in protecting our entire platform against hackers and viruses.

Key Responsibilities:

  • Analyze the company’s current technology stack and develop strategy to continuously improve application delivery
  • Establish milestones to measure and monitor the delivery and reliability of software as strategies are implemented
  • Identify manual processes during the software development lifecycle that can be automated
  • Develop a cloud-migration strategy for software not currently hosted in a cloud environment
  • Periodically review company’s infrastructure and hosting cost to identify opportunities for cost savings
  • Establish best practices and mentor other engineers in enterprise-level DevOps solutions

Environment: Kubernetes, Docker, Google Cloud Platform (GCP), GitLab, Azure DevOps, Jira, Terraform, Auth0

Florida