Job Closed

This listing is no longer active.

SmarterDx

Improving clinical and financial outcomes with physician-validated AI for documentation and coding.

Staff Site Reliability Engineer

DevOps Engineer · Other · Remote · Lead · Team 11-50 · H1B No Sponsor

Location

United States

Posted

138 days ago

Salary

$230K - $250K / year

Seniority

Lead

Job Description

This description is a summary of our understanding of the job description.

Role Description

We are seeking a Staff Site Reliability Engineer (SRE) to lead the reliability, scalability, and operational excellence of our production systems. This role is responsible for defining and driving SRE practices across the organization, including:

- SLIs/SLOs
- Incident management
- Capacity planning
- Resilience engineering

You will design and implement automation that reduces toil, improve observability and performance across our Kubernetes and AWS environments, and ensure our systems are highly available and fault-tolerant.

The ideal candidate is a deeply technical engineer with strong distributed systems expertise, a passion for operational rigor, and a track record of improving reliability through thoughtful engineering, automation, and data-driven decision-making.

This role is fully remote within the US.

Job Requirements

10+ years of software and software reliability engineering experience, with significant time spent operating and scaling distributed systems in production environments.
3+ years of hands-on experience running cloud-native infrastructure in AWS, including deep familiarity with containers, Kubernetes, monitoring, and alerting in live production systems.
Proven experience defining and managing SLIs/SLOs, leading incident response, and driving postmortems and systemic reliability improvements.
Strong expertise with Terraform and infrastructure-as-code practices for managing production infrastructure safely and reproducibly.
Deep experience with Kubernetes architecture and operations, including workload reliability, cluster scaling, networking, and failure modes.
Experience working in security-conscious, compliance-oriented environments where reliability and data protection are first-class concerns.
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field — or equivalent practical experience operating large-scale systems.
Reliability engineering experience with production database systems (e.g., Postgres).

Benefits

Medical, Dental & Vision – Comprehensive plans with leading insurance providers, covering 75% of your premiums, depending on the plan.
Paid Parental Leave – Generous paid leave to support families through birth or adoption: up to 12 weeks for parents.
Remote-First Team – Work from anywhere in the U.S.
Unlimited PTO & 10 Holidays – So you can relax and recharge.
401(k) with Traditional & Roth Options – Tax-advantaged retirement savings through Fidelity with a 4% match.
Minimal Bureaucracy – A fast-moving, high-impact environment where you can focus on what matters.
Incredible Teammates! – Work alongside smart, supportive, and mission-driven colleagues.

More DevOps Engineer Jobs

AWS DevOps Engineer

thinkbridge

thinkbridge is how growth-stage companies turn into tech disruptors, drive growth, and increase their valuations.

DevOps Engineer · Posted 138 days ago

Full Time · Remote · Team 201-500 · Since 2014 · H1B No Sponsor

• Design, deploy, and manage scalable, secure, and highly available AWS infrastructure.
• Implement and manage Infrastructure as Code (IaC) using Terraform to provision resources efficiently.
• Build, deploy, and maintain containerized applications using Docker.
• Monitor and optimize system performance using Datadog, ensuring proactive issue resolution.
• Collaborate with development teams to automate CI/CD pipelines using tools like Bitbucket.
• Write and maintain Bash scripts to automate system-level processes.
• Ensure the security, compliance, and high availability of infrastructure.
• Troubleshoot and resolve system and application-level issues promptly.
• Document processes, configurations, and workflows for operational transparency.

Ansible AWS Docker Amazon EC2 Jenkins Kubernetes Python Terraform

India

Job Closed

Senior DevOps Program Manager

Keeper Security, Inc.

Manage, protect and monitor all your organization's passwords, secrets and remote connections with zero-trust security

DevOps Engineer · Posted 138 days ago

Other · Remote · Team 501-1,000 · Since 2011 · H1B No Sponsor

• Lead end-to-end execution of complex DevOps and infrastructure programs, including cloud modernization, CI/CD optimization, automation, and security integrations
• Partner with Engineering, Security, Compliance, and Product leadership to define program strategy, priorities, and success criteria
• Oversee large-scale cloud initiatives across AWS and other platforms, ensuring scalability, cost efficiency, and operational resilience
• Coordinate Infrastructure-as-Code (IaC) initiatives using Terraform and related automation tooling
• Drive improvements across CI/CD pipelines (GitHub Actions, Jenkins, etc.) to reduce deployment friction and enhance reliability
• Champion best practices in automated testing, security scanning, and release governance
• Integrate compliance and security-by-design principles into all DevOps programs, ensuring alignment with FedRAMP, SOC 2, ISO 27001, and similar standards
• Collaborate closely with security engineering and the CISO to ensure program-level compliance and audit readiness
• Oversee observability, SRE, and monitoring initiatives to enhance system visibility, performance, and incident response
• Define SLIs, SLOs, and error budgets in partnership with Engineering and Security teams
• Serve as a cross-functional liaison, ensuring consistent communication, dependency tracking, and alignment across teams
• Manage program timelines, risks, and stakeholder expectations across multiple initiatives
• Work with Agile, Waterfall, or hybrid methodologies to ensure effective delivery depending on program needs
• Identify and adopt emerging technologies that strengthen Keeper’s cloud, automation, and monitoring capabilities

AWS Docker Grafana Jenkins Kubernetes Prometheus Terraform

United States

Job Closed

Senior DevOps Engineer – IL5, FedRAMP High

Keeper Security, Inc.

Manage, protect and monitor all your organization's passwords, secrets and remote connections with zero-trust security

DevOps Engineer · Posted 138 days ago

Other · Remote · Team 501-1,000 · Since 2011 · H1B No Sponsor

• Design, implement, and manage IL5 / FedRAMP High–compliant infrastructure in AWS GovCloud and/or Azure Government
• Automate infrastructure provisioning using Terraform and infrastructure-as-code best practices
• Build and maintain secure CI/CD pipelines that meet IL5 and FedRAMP High compliance requirements
• Collaborate with security and compliance teams to ensure appropriate controls, monitoring, and reporting
• Configure logging, alerting, and telemetry in restricted and hardened environments
• Harden operating systems and container runtimes in accordance with DISA STIGs, CIS benchmarks, and security best practices
• Support secure secrets management, access controls (RBAC, ABAC), and audit logging
• Participate in architecture discussions to ensure infrastructure is scalable, resilient, and compliant
• Assist with documentation, evidence collection, and remediation activities supporting ATO (Authority to Operate) processes

AWS Azure Jenkins Python Terraform

California + 1 more

Senior Database Reliability Engineer

Rithum

Rithum is the heartbeat of commerce

DevOps Engineer · Posted 138 days ago

Other · Remote · Team 501-1,000 · Since 1997 · H1B No Sponsor

• Ensure maximum availability and reliability of mission-critical database systems across hybrid infrastructure.
• Design, implement, and maintain SQL Server Always On Availability Groups, clustering, and replication topologies. Constantly improve the observability of all database systems.
• Lead major database upgrade initiatives and modernization efforts. Support other engineers and teams in their use of database systems.
• Continuously enhance observability using telemetry, performance analysis, and proactive monitoring.
• Continuously enhance processes through automation. Automate operational workflows using PowerShell, Python, and CI/CD tooling.
• Ensure all data is protected and secure.
• Participate in our on-call rotation.
• Troubleshoot and tune high-load production systems, including complex performance and replication issues.
• Lead technical response during high-severity incidents and conduct root cause analysis.
• Ensure database security, backup integrity, and disaster recovery readiness.
• Contribute to the development of best practices for database engineering and reliability.
• Collaborate cross-functionally to design scalable, resilient data architectures.
• Mentor team members and contribute to engineering best practices.

Python SQL

Washington

$90K - $140K / year
