Penn Mutual

Helping people get stronger is a pretty good business to be in.

Senior Site Reliability Engineer

DevOps Engineer | Other | Remote | Senior | Team 1,001-5,000 | Since 1847 | H1B Sponsor

Location

United States

Posted

101 days ago

Salary

$128K - $165K / year

Seniority

Senior

Bachelor's Degree | 6 yrs exp | English | AWS | Distributed Systems | ITSM

Job Description

• Lead reliability, availability, scalability, and recovery design for critical systems.
• Define and evolve SLOs, SLIs, and error budget practices across services.
• Identify systemic reliability risks and drive cross-team remediation efforts.
• Influence application and platform architecture to improve operational outcomes.
• Act as a technical lead during major incidents and complex outages.
• Drive high-quality root cause analysis and recommend corrective actions.
• Improve incident response processes, tooling, and runbooks.
• Design and implement advanced automation to eliminate operational toil at scale.
• Build and maintain shared SRE tooling and platforms.
• Set engineering standards for reliability-focused code and operational practices.
• Review and improve CI/CD, deployment, and rollback strategies.
• Partner with Release and Change Management to automate release practices.
• Lead risk assessments for high-impact changes and releases.
• Ensure compliance requirements are met without sacrificing engineering velocity.
• Serve as a reliability authority for release readiness decisions.
• Mentor junior SREs and junior engineers through technical guidance and review.
• Lead by example in operational excellence and engineering rigor.
• Influence reliability culture across engineering and product teams.

Job Requirements

Bachelor’s degree in Computer Science, Engineering, or related field.
6–10+ years of experience in SRE, software engineering, platform, or DevOps roles.
Professional experience performing root cause analysis on incidents and documenting SRE systems and their usage.
Strong programming skills with professional experience in multiple languages.
Deep experience with AWS and distributed systems.
Advanced knowledge of observability, ITSM, and reliability engineering principles.
Proven ability to operate effectively in complex, regulated environments.
Experience using and implementing observability tools (metrics, logs, tracing).
Experience with CI/CD pipelines and deployment automation.
Experience with Root Cause Analysis investigation/documentation.
Familiarity with containerization and orchestration technologies.
Strong troubleshooting and analytical skills.

Related Job Pages

DevOps Engineer

Summit Racing Equipment

The World's Speed Shop ®

DevOps Engineer | 101 days ago

Other | Remote | Team 11-50 | Since 1968 | H1B No Sponsor

• Support and facilitate cross team communication
• Ownership of: Systems and Service Monitoring
• Software Deployments
• Organize and execute deployments
• Effectively utilize tools like Jira, Confluence, and Bitbucket to collect issues relating to deployment
• Create release structures
• Deploy code
• Research and create proof-of-concepts for new technologies and processes
• Support and enhance the SDLC
• Act as a knowledge source regarding who should be contacted when problems arise
• Document new and existing processes, technologies, and tools for future supportability

Ansible Chef Docker JavaScript Jenkins Kubernetes Linux Node.js Puppet Python SDLC Selenium TCP/IP .NET

Florida + 4 more

Job Closed

Site Reliability Engineer Intern

Credit Acceptance

Driving Possibility

DevOps Engineer | 101 days ago

Other | Remote | Team 1,001-5,000 | Since 1972 | H1B Sponsor

• Design and develop software
• Write unit-tests and validate your software against acceptance criteria
• Apply team coding, documenting, and testing standards
• Participate in code reviews and communicate application changes
• Document code and projects so others can easily understand, maintain and support
• Debug the problems which arise in production
• Read and write design documents
• Contribute to team's sprint commitments and actively participate in our Agile practices
• Learn the business process domain to better support the business

JUnit Kubernetes SOAP

United States

$23 / hour

Job Closed

Senior OpenShift Engineer

Analytica

Data-driven consulting and technology services

DevOps Engineer | 101 days ago

Full Time | Remote | Team 51-200 | H1B No Sponsor

Role Description

Analytica is seeking a Senior OpenShift Platform Engineer to design, deploy, secure, and maintain Red Hat OpenShift container platforms in a RaaS environment. This role supports multiple clusters and works closely with developers to containerize applications, automate deployments, and ensure platform reliability and security.

Key Responsibilities:

- Deploy, maintain, and secure OpenShift/Kubernetes clusters
- Automate infrastructure and configuration using Ansible or similar tools
- Manage CI/CD pipelines supporting containerized applications
- Troubleshoot cluster, networking, and performance issues
- Collaborate with developers to containerize and deploy applications
- Support SDN networking and persistent storage solutions (NFS, CSI)
- Monitor platform health using Grafana, Zabbix, or similar tools

Qualifications

- Bachelor's degree in computer science or a related field
- 5–8+ years of experience in platform engineering, systems engineering, or container infrastructure
- Hands-on experience administering and supporting production OpenShift environments
- Experience supporting enterprise or mission-critical systems
- Deep expertise in Red Hat OpenShift and Kubernetes
- Strong Linux system administration experience
- Automation and configuration management using Ansible
- Experience managing CI/CD pipelines and container delivery workflows
- Understanding of SDN concepts in containerized environments
- Experience with NFS and persistent storage for Kubernetes workloads
- Monitoring and observability using Grafana and/or Zabbix
- Experience troubleshooting complex cluster and platform issues
- Must Be US Citizen with ability to obtain Public Trust Clearance

Requirements

- Preferred Certification: Red Hat Certified System Administrator (RHCSA)

Benefits

- Competitive compensation with opportunities for bonuses
- Employer-paid health care
- Training and development funds
- 401k match

Company Description

Analytica is a leading consulting and information technology solutions provider to public sector organizations supporting health, civilian, and national security missions. Founded in 2009 and headquartered in Bethesda, MD, the company is an established SBA small business that has been recognized by Inc. Magazine each of the past three years as one of the 250 fastest-growing companies in the U.S.

- Specializes in providing software and systems engineering, information management, analytics & visualization, agile project management, and management consulting services
- Appraised by the Software Engineering Institute (SEI) at CMMI® Maturity Level 3
- ISO 9001:2008 certified provider

OpenShift Kubernetes Ansible CI/CD Linux Grafana NFS

United States

Job Closed

DevOps Engineer

papernest

Less paperwork, greater efficiency

DevOps Engineer | 102 days ago

Full Time | Remote | Team 501-1,000 | H1B No Sponsor

• Designing, building, and operating secure, scalable, and cost-efficient cloud infrastructure
• Improving reliability, automation, and developer experience
• Cloud infrastructure & architecture: design and maintain AWS multi-account, multi-region environments
• Build and evolve VPC networking (subnets, routing, peering, security groups, private endpoints)
• Operate container platforms (ECS/EKS clusters, services, capacity management, scaling strategies)
• Manage data platforms (RDS PostgreSQL, ElastiCache, S3, OpenSearch)
• Develop and maintain CloudFormation and Terraform stacks
• Standardize reusable modules and enforce best practices
• Design and operate deployment pipelines (GitHub Actions, self-hosted runners)
• Implement monitoring, logging, and alerting (Sentry, OpenSearch, CloudWatch, metrics, alarms, log pipelines)
• Troubleshoot production incidents and perform root cause analysis
• Define SLOs and improve system resilience
• Analyze AWS billing and usage patterns
• Manage Google, VPN and SSO access, IAM roles, policies, and least-privilege access
• Participate in on-call rotations and incident management
• Build scripts and internal tools (Python, CLI automation, Airflow)

Airflow AWS DNS Linux PostgreSQL Python TCP/IP Terraform

France