Job Closed

This listing is no longer active.

CardioOne

Redefining independence.

Site Reliability Engineer

DevOps Engineer | Other | Remote | Mid Level | Team 1-10 | H1B No Sponsor

Location

United States

Posted

88 days ago

Salary

$130K - $150K / year

Seniority

Mid Level

Job Description

This description is a summary of our understanding of the job description.

Role Description

We are seeking a highly skilled Site Reliability Engineer (SRE) to ensure the reliability, scalability, security, and performance of our production systems and services. The SRE will bridge the gap between software development and operations, implementing automation, monitoring, and best practices to enable rapid, reliable delivery of applications. You will report directly to the Senior Director of Engineering.

What you’ll do:

Reliability & Performance

  • Ensure high availability, scalability, and performance of production systems.
  • Implement and maintain SLIs, SLOs, and SLAs for critical services.
  • Conduct capacity planning and performance tuning.

Automation & Tooling

  • Automate infrastructure provisioning using IaC tools such as Terraform, Terragrunt, and Ansible.
  • Develop automation to minimize manual operations and improve deployment workflows.
  • Build CI/CD pipelines to support rapid and reliable deployments.

Monitoring & Incident Response

  • Design and maintain monitoring, logging, and alerting systems (Datadog).
  • Participate in on-call rotations and lead incident response efforts.
  • Perform root-cause analysis and develop postmortems to prevent recurring issues.

Systems Engineering

  • Manage cloud infrastructure (AWS, Azure) and container orchestration platforms (Kubernetes, ECS).
  • Optimize system architecture for reliability and fault tolerance.
  • Implement best practices for security, networking, and service resilience.

Collaboration & Leadership

  • Work closely with development teams to design reliable microservices and distributed systems.
  • Advocate for SRE principles and drive operational excellence across engineering teams.
  • Mentor engineers on reliability practices, tooling, and automation strategies.

Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
  • 3–7 years of experience in SRE, DevOps, or Systems Engineering roles.
  • Strong proficiency with Linux systems and shell scripting.
  • Experience with cloud platforms (AWS, Azure).
  • Hands-on experience with Kubernetes/ECS and container technologies (Docker).
  • Proficiency in at least one programming language: Python or Java.
  • Experience with CI/CD pipelines and DevOps tooling.
  • Strong understanding of distributed systems, networking, and security fundamentals.

Preferred Qualifications

  • Experience with observability stacks (OpenTelemetry).
  • Knowledge of database management (PostgreSQL).
  • Experience with configuration management tools (Ansible, Chef, Puppet).
  • Familiarity with zero-downtime deployments and chaos engineering practices.

Soft Skills

  • Strong analytical and problem-solving skills.
  • Excellent communication and cross-team collaboration.
  • Ability to thrive in fast-paced, high-stakes environments.
  • A mindset focused on continuous improvement and operational excellence.

Work Location

  • Remote: Colorado, Delaware, Florida, New Hampshire, New Jersey, New York, Pennsylvania, Texas.

Additional Information

  • Full-time base salary range of $130,000 to $150,000 plus medical, dental, and vision benefits and a matching 401K.

Job Requirements

  • Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
  • 3–7 years of experience in SRE, DevOps, or Systems Engineering roles.
  • Strong proficiency with Linux systems and shell scripting.
  • Experience with cloud platforms (AWS, Azure).
  • Hands-on experience with Kubernetes/ECS and container technologies (Docker).
  • Proficiency in at least one programming language: Python or Java.
  • Experience with CI/CD pipelines and DevOps tooling.
  • Strong understanding of distributed systems, networking, and security fundamentals.

Preferred Qualifications

  • Experience with observability stacks (OpenTelemetry).
  • Knowledge of database management (PostgreSQL).
  • Experience with configuration management tools (Ansible, Chef, Puppet).
  • Familiarity with zero-downtime deployments and chaos engineering practices.

Soft Skills

  • Strong analytical and problem-solving skills.
  • Excellent communication and cross-team collaboration.
  • Ability to thrive in fast-paced, high-stakes environments.
  • A mindset focused on continuous improvement and operational excellence.

Work Location

  • Remote: Colorado, Delaware, Florida, New Hampshire, New Jersey, New York, Pennsylvania, Texas.

Additional Information

  • Full-time base salary range of $130,000 to $150,000 plus medical, dental, and vision benefits and a matching 401K.

More DevOps Engineer Jobs

Full Time | Remote | Team 201-500 | Since 2016 | H1B No Sponsor

  • Engineer Secure-by-Default Foundations: Design, build, and maintain hardened, multi-account AWS architectures, "golden" AMIs, and secure-by-default container/Kubernetes (EKS) base images.
  • Automate Security via IaC: Be the expert in "Policy-as-Code." Publish and maintain Infrastructure controls, golden Terraform modules, Helm charts, and admission policies. You will measure adoption, drift detection, and exception aging while preventing misconfigurations before they're deployed.
  • Own the Platform & Edge Defense: Configure and manage runtime security for Kubernetes (e.g., admission controllers, least-privilege policies) and own the safe-change processes for our layered edge defenses (WAF/CDN/anti-Bot), including pre-prod testing, blast-radius limits, rollback patterns, and change metrics.
  • Generate High-Fidelity Signals: Integrate posture signals (CSPM, KSPM, CI/CD, WAF) into centralized dashboards and our SIEM/SOAR with clear routing and ownership, partnering with D&R to ensure signals are high-fidelity and actionable.
  • Enable & Mentor: Lead threat modeling exercises and partner with Platform, SRE, and Product teams to translate risks into actionable backlogs. You'll be mentoring others on prevention-first design.
  • Support Incident Response: Define platform incident playbooks for misconfiguration and drift containment. You will act as the senior subject-matter expert for cloud/platform incidents, providing deep technical expertise to the IR team.

Spain
Job Closed

Sr. Site Reliability Engineer

Element Solutions

Element is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or veteran status, marital status, protected veteran status, or any other legally protected class. We believe in a world where solutions we build improve the lives of those who use them.

DevOps Engineer | 88 days ago

Who is Element?

We serve as a partner at the intersection of innovation and our clients' needs, efficiently crafting meaningful user experiences for government and commercial customers. By breaking down complex problems to their fundamental elements, we create modern digital solutions that drive efficiencies, maximize taxpayer dollars, and deliver essential outcomes that serve the people.

Why Work at Element?

Make an impact that resonates: join our vibrant team and discover how you can improve lives through digital transformation. Our talented professionals bring unparalleled energy and engagement, setting a higher standard for impactful work. Come be a part of our team and shape a better future.

Position Summary

The Senior Site Reliability Engineer (SRE) serves as the Technical Architecture & Stability Assessment Lead responsible for evaluating the reliability, scalability, and operational resilience of complex enterprise infrastructure environments. This role supports a structured 16-week technical assessment and optional implementation phase focused on identifying stability risks, mapping infrastructure dependencies, and strengthening existing architecture to support operational continuity during modernization initiatives.

Element’s approach prioritizes practical stabilization over unnecessary redesign. Rather than introducing large-scale architectural transformations, this role emphasizes reinforcing the current infrastructure to withstand coexistence pressures and operational demands while sequencing improvements responsibly.

Key Responsibilities

  • Conducting current-state infrastructure mapping across application, platform, and hosting layers, documenting and recommending improvements.
  • Performing dependency and integration analysis across interconnected enterprise systems.
  • Identifying single points of failure and systemic reliability risks.
  • Supporting datacenter transition modeling and infrastructure transition sequencing.
  • Conducting Citrix dependency analysis and migration sequencing recommendations.
  • Evaluating system performance, scalability, and operational resilience under high-volume workloads.
  • Providing modularization and decoupling recommendations to reduce operational fragility and support phased modernization and migration efforts.
  • Advising operations and development teams on continuity strategies that allow legacy and modern systems to operate reliably in tandem during modernization and migrations.

Minimum Qualifications

  • Bachelor’s degree in Computer Science, Information Systems, Information Technology, Engineering, or a related technical discipline.
  • 10+ years of professional experience in infrastructure engineering, site reliability engineering, or enterprise platform architecture, including experience supporting complex enterprise environments.
  • Demonstrated experience conducting enterprise architecture or infrastructure assessments.
  • Enterprise infrastructure architecture and systems engineering.
  • Extensive experience in hybrid hosting environments and modernization and cloud migration planning.
  • Virtualization platform management.
  • Performance engineering and scalability assessments for high-volume systems.
  • Experience designing or advising on system resilience and operational continuity strategies.
  • Experience in infrastructure stabilization during large-scale multi-application modernization or migration initiatives.
  • Strong analytical and systems-thinking abilities.
  • Excellent technical documentation and architecture diagramming skills.
  • Strong stakeholder communication and facilitation abilities.
  • US Citizenship or Permanent Residency required.
  • Must reside in the Continental US; located in the state of Pennsylvania a plus, but not required.
  • Depending on the government agency, specific requirements may include public trust background check or security clearance.

Preferred Qualifications

  • Experience with Citrix App and Desktop Virtualization.
  • Certification in any of the following is a plus:
      • AWS Certified Solutions Architect
      • Google Professional Cloud Architect
      • Microsoft Certified: Azure Solutions Architect Expert
      • Certified Kubernetes Administrator
      • ITIL Foundation
  • Experience working with large public sector, healthcare system environments, or within the State/Commonwealth is a plus.
  • Familiarity with enterprise architecture frameworks (e.g., TOGAF or similar).
  • Experience supporting legacy system environments undergoing modernization.

$140,000 - $180,000 a year

The likely salary range for this position is $140,000-$180,000. This is not, however, a guarantee of compensation or salary. Rather, salary will be set based on experience, geographic location and possibly contractual requirements and could fall outside of this range.

Location

Be in your Element. We are a remote-first company based in Washington, DC.

Element is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or veteran status, marital status, protected veteran status, or any other legally protected class. We believe in a world where solutions we build improve the lives of those who use them.

United States
$140K - $180K / year
Job Closed

Junior DevOps Engineer

Oddball

A strangely human digital agency

DevOps Engineer | 88 days ago
Other | Remote | Team 51-200 | H1B No Sponsor

  • Support and maintain CI/CD pipelines using tools such as GitHub Actions and Jenkins.
  • Assist with provisioning and managing cloud infrastructure in AWS, including services like EC2, S3, and RDS.
  • Help automate infrastructure and environment configuration using Terraform or CloudFormation.
  • Support container-based application deployments using Docker.
  • Assist with monitoring and troubleshooting environments using tools such as Datadog and CloudWatch.
  • Write basic automation scripts using Python or Bash to improve deployment and operational workflows.
  • Collaborate with engineering teams operating in Agile development environments.

United States
$80K - $100K / year
Job Closed

Senior Site Reliability Engineer

ClickHouse

ClickHouse is an open-source, column-oriented OLAP database management system.

DevOps Engineer | 88 days ago
Other | Remote | Team 51-200 | Since 2016 | H1B Sponsor

  • Collaborate with various engineering teams in ClickHouse to design and implement scalable, secure, and highly available systems for ClickHouse.
  • Establish and manage service level objectives (SLOs) and service level agreements (SLAs) for ClickHouse Cloud.
  • Ensure all the infrastructure components in ClickHouse Cloud (including Dataplane, Control Plane and ClickHouse Core) have monitoring and alerting in place to ensure timely detection and resolution of incidents.
  • Enhance and refine incident response processes and post-mortem analysis for any outages in ClickHouse Cloud including working with the support team to communicate to the impacted customers.
  • Continuously improve the reliability and performance of our ClickHouse services.
  • Plan, enable, and drive Chaos initiatives across Engineering teams, based upon internal priorities.
  • Manage on-call processes to respond to performance and reliability issues, and establish best practices for coordinating escalation to resolve issues and minimize downtime.

United States
$141K - $208K / year