SmarterDx

SmarterDx, founded in 2020 in New York, New York, is a health technology company focused on clinical AI solutions that enhance hospital revenue integrity and ca

Staff Site Reliability Engineer

Location

United States

Posted

85 days ago

Salary

$230K - $250K / year

Seniority

Lead

Job Description

This description is a summary of our understanding of the job description.

Role Description

We are seeking a Staff Site Reliability Engineer (SRE) to lead the reliability, scalability, and operational excellence of our production systems. This role is responsible for defining and driving SRE practices across the organization, including:

  • SLIs/SLOs
  • Incident management
  • Capacity planning
  • Resilience engineering

You will design and implement automation that reduces toil, improve observability and performance across our Kubernetes and AWS environments, and ensure our systems are highly available and fault-tolerant. The ideal candidate is a deeply technical engineer with strong distributed systems expertise, a passion for operational rigor, and a track record of improving reliability through thoughtful engineering, automation, and data-driven decision-making.

This role is fully remote within the US.

Job Requirements

  • 10+ years of software and software reliability engineering experience, with significant time spent operating and scaling distributed systems in production environments.
  • 3+ years of hands-on experience running cloud-native infrastructure in AWS, including deep familiarity with containers, Kubernetes, monitoring, and alerting in live production systems.
  • Proven experience defining and managing SLIs/SLOs, leading incident response, and driving postmortems and systemic reliability improvements.
  • Strong expertise with Terraform and infrastructure-as-code practices for managing production infrastructure safely and reproducibly.
  • Deep experience with Kubernetes architecture and operations, including workload reliability, cluster scaling, networking, and failure modes.
  • Experience working in security-conscious, compliance-oriented environments where reliability and data protection are first-class concerns.
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field — or equivalent practical experience operating large-scale systems.
  • Reliability engineering experience with production database systems (e.g. Postgres)

Benefits

  • Medical, Dental & Vision – Comprehensive plans with leading insurance providers, covering 75% of your premiums, depending on the plan.
  • Paid Parental Leave – Generous paid leave to support families through birth or adoption: Up to 12 weeks for parents.
  • Remote-First Team – Work from anywhere in the U.S.
  • Unlimited PTO & 10 Holidays – So you can relax and recharge.
  • 401(k) with Traditional & Roth Options – Tax-advantaged retirement savings through Fidelity with a 4% match.
  • Minimal Bureaucracy – A fast-moving, high-impact environment where you can focus on what matters.
  • Incredible Teammates! – Work alongside smart, supportive, and mission-driven colleagues.

More DevOps Engineer Jobs

AWS DevOps Engineer

thinkbridge

thinkbridge is how growth-stage companies turn into tech disruptors, drive growth, and increase their valuations.

DevOps Engineer | 85 days ago
Full Time | Remote | Team 201-500 | Since 2014 | H1B No Sponsor

  • Design, deploy, and manage scalable, secure, and highly available AWS infrastructure.
  • Implement and manage Infrastructure as Code (IaC) using Terraform to provision resources efficiently.
  • Build, deploy, and maintain containerized applications using Docker.
  • Monitor and optimize system performance using DataDog, ensuring proactive issue resolution.
  • Collaborate with development teams to automate CI/CD pipelines using tools like Bitbucket.
  • Write and maintain Bash scripts to automate system-level processes.
  • Ensure the security, compliance, and high availability of infrastructure.
  • Troubleshoot and resolve system and application-level issues promptly.
  • Document processes, configurations, and workflows for operational transparency.

India
Job Closed

DevOps Engineer

Global Alliant Inc

2021 & 2022 Inc. 5000 Company | #1 Fastest-Growing IT Services Company in the Mid-Atlantic

DevOps Engineer | 85 days ago
Full Time | Remote | Team 51-200 | H1B Sponsor

  • Support applications in the AWS cloud.
  • Responsible for the development, testing, and maintenance of automation scripts, infrastructure tools, DSLs, and CI/CD pipelines utilizing Ruby and related technologies.

Maryland
$115K / year

Senior DevOps Program Manager

Keeper Security, Inc.

Manage, protect and monitor all your organization's passwords, secrets and remote connections with zero-trust security

DevOps Engineer | 85 days ago
Other | Remote | Team 501-1,000 | Since 2011 | H1B No Sponsor

  • Lead end-to-end execution of complex DevOps and infrastructure programs, including cloud modernization, CI/CD optimization, automation, and security integrations
  • Partner with Engineering, Security, Compliance, and Product leadership to define program strategy, priorities, and success criteria
  • Oversee large-scale cloud initiatives across AWS and other platforms, ensuring scalability, cost efficiency, and operational resilience
  • Coordinate Infrastructure-as-Code (IaC) initiatives using Terraform and related automation tooling
  • Drive improvements across CI/CD pipelines (GitHub Actions, Jenkins, etc.) to reduce deployment friction and enhance reliability
  • Champion best practices in automated testing, security scanning, and release governance
  • Integrate compliance and security-by-design principles into all DevOps programs, ensuring alignment with FedRAMP, SOC 2, ISO 27001, and similar standards
  • Collaborate closely with security engineering and the CISO to ensure program-level compliance and audit readiness
  • Oversee observability, SRE, and monitoring initiatives to enhance system visibility, performance, and incident response
  • Define SLIs, SLOs, and error budgets in partnership with Engineering and Security teams
  • Serve as a cross-functional liaison, ensuring consistent communication, dependency tracking, and alignment across teams
  • Manage program timelines, risks, and stakeholder expectations across multiple initiatives
  • Work with Agile, Waterfall, or hybrid methodologies to ensure effective delivery depending on program needs
  • Identify and adopt emerging technologies that strengthen Keeper’s cloud, automation, and monitoring capabilities

United States
Job Closed

Senior DevOps Engineer – IL5, FedRAMP High

Keeper Security, Inc.

Manage, protect and monitor all your organization's passwords, secrets and remote connections with zero-trust security

DevOps Engineer | 85 days ago
Other | Remote | Team 501-1,000 | Since 2011 | H1B No Sponsor

  • Design, implement, and manage IL5 / FedRAMP High–compliant infrastructure in AWS GovCloud and/or Azure Government
  • Automate infrastructure provisioning using Terraform and infrastructure-as-code best practices
  • Build and maintain secure CI/CD pipelines that meet IL5 and FedRAMP High compliance requirements
  • Collaborate with security and compliance teams to ensure appropriate controls, monitoring, and reporting
  • Configure logging, alerting, and telemetry in restricted and hardened environments
  • Harden operating systems and container runtimes in accordance with DISA STIGs, CIS benchmarks, and security best practices
  • Support secure secrets management, access controls (RBAC, ABAC), and audit logging
  • Participate in architecture discussions to ensure infrastructure is scalable, resilient, and compliant
  • Assist with documentation, evidence collection, and remediation activities supporting ATO (Authority to Operate) processes

All locations: California | Illinois