Job Closed
This listing is no longer active.
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team. We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
Senior Site Reliability Engineer
Location: United States
Posted: 76 days ago
Salary: $120K - $125K / year
Seniority: Senior
Job Description
Senior Site Reliability Engineer
Jobgether
Role Description
This role offers a unique opportunity to ensure the reliability, scalability, and performance of critical platform services in a fast-paced, technology-driven environment. The Senior Site Reliability Engineer (SRE) will combine software engineering expertise with operational excellence to automate processes, improve observability, and reduce operational risk across the platform. You will collaborate closely with development, DevOps, release engineering, and security teams to embed reliability and security best practices throughout the software lifecycle. This position emphasizes proactive problem-solving, automation, and continuous improvement, along with mentoring peers and contributing to high-impact projects. The role is ideal for someone who thrives on solving complex technical challenges while shaping the platform's resilience and scalability.
- Define and maintain Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets for critical services.
- Lead capacity planning, performance tuning, design reviews, and disaster recovery exercises to validate platform resilience.
- Automate infrastructure provisioning, patching, and operational tasks using Terraform, Ansible, and CI/CD pipelines to eliminate manual processes.
- Partner with security teams to enforce compliance (SOC 2, CIS benchmarks), implement least-privilege IAM policies, and maintain hardened, secure systems.
- Serve as a Tier-2 escalation point during incidents, lead root cause analysis, and continuously improve incident response playbooks and on-call processes.
- Identify repetitive operational tasks and implement automation or self-service modules to reduce toil and improve developer productivity.
- Measure system performance, track reliability metrics, and collaborate with leadership to drive iterative improvements.
Qualifications
- Bachelor's degree in Computer Science, Engineering, or a related field.
- Minimum of 5 years of experience in Site Reliability Engineering, DevOps, or Systems Engineering roles.
- Strong experience with AWS multi-account environments, Terraform, Ansible, CI/CD tools (GitHub Actions, Bitbucket, Jenkins, AWS CodeBuild/CodePipeline), and observability platforms (New Relic, CloudWatch).
- Background with containerized environments (ECS, Fargate, EKS) and resilient system architectures.
- Preferred certifications: AWS DevOps Engineer or Solutions Architect, Kubernetes, or SRE/DevOps practitioner certifications.
- Excellent analytical, troubleshooting, and problem-solving abilities.
- Strong collaboration skills to work effectively with cross-functional teams, mentor peers, and contribute to continuous improvement.
Benefits
- Competitive salary range: USD 120,000–125,000 per year.
- Day-one medical, dental, and vision coverage with flexible spending options (HSA/FSA).
- 401(k) with company match available from day one.
- Paid sick leave, volunteer time, and parental leave options.
- Employer-paid life and disability insurance.
- Wellbeing on Demand program to support personal health and wellness.
- Flexible work environment with remote opportunities and a casual dress code.
More DevOps Engineer Jobs
Our vision for the future is based on the idea that transforming financial lives starts by giving our people the freedom to transform their own. We have a flexible work environment and fluid career paths. We not only encourage but celebrate internal mobility. We also recognize the importance of purpose, well-being, and work-life balance. Within Empower and our communities, we work hard to create a welcoming and inclusive environment, and our associates dedicate thousands of hours to volunteering for causes that matter most to them. Chart your own path and grow your career while helping more customers achieve financial freedom. Empower Yourself.
Applicants must be authorized to work for any employer in the U.S. We are unable to sponsor or take over sponsorship of an employment visa at this time, including CPT/OPT.
We are seeking a Site Reliability Engineer (SRE) to own the reliability, availability, and operational excellence of our AWS-based data platform. This role is focused on applying core SRE principles (production engineering, incident management, root cause elimination, observability, automation, and capacity planning) to large-scale data infrastructure supporting EMR, EMR Serverless, Redshift, DynamoDB, and S3.
You will treat data pipelines and analytics platforms as production systems, designing and enforcing SLAs/SLOs for uptime, performance, scalability, and data freshness. You will lead incident response, perform deep root cause analysis, implement durable fixes, and eliminate toil through automation and infrastructure-as-code.
What you will do:
- Own and improve the reliability, stability, scalability, and performance of our core data platforms and services
- Provide operational support for large-scale, distributed data systems, ensuring high availability and strong SLAs
- Partner closely with full-stack, data, and platform engineering teams to deliver continuous improvements
- Operate and support EMR and EMR Serverless (Python/Spark) workloads and data pipelines
- Support and optimize Amazon Redshift and DynamoDB in high-throughput, production environments
- Design, build, and evolve monitoring, alerting, and observability frameworks with a focus on symptoms, not just outages
- Lead incident response, troubleshooting production issues across the full stack and coordinating with internal and external stakeholders
- Perform root cause analysis (RCA) and readiness reviews; turn findings into durable fixes and automation
- Create and maintain runbooks, SOPs, and operational documentation
- Collaborate with engineering teams to optimize performance, reliability, and cost
- Participate in an on-call rotation to respond to incidents impacting customer-facing systems
- Recommend and influence the use of AWS managed services and architectural patterns
- Continuously evaluate system performance, capacity, and cost to scale efficiently
What you will bring:
- 4–6 years of experience building or operating systems across multiple architecture domains: application, data, integration, infrastructure, and security
- 4+ years of hands-on AWS experience, with strong production exposure to several of the following:
  - Redshift, DynamoDB, EMR, EMR Serverless, EC2, S3
  - Lambda, Step Functions, EventBridge, RDS, IAM
- Proven experience operating data platforms such as data lakes and data warehouses in production
- Strong SQL skills and experience working with modern databases (e.g., Redshift, DynamoDB, Postgres, MySQL, Oracle)
- 4+ years of Python experience, including scripting, automation, or data workloads
- Experience with CloudWatch, infrastructure monitoring, and alerting
- Hands-on experience with incident management, uptime SLAs, and customer-impacting systems
- Strong understanding of Git-based workflows (GitHub, Git Flow, or similar)
- Experience working in Agile environments (Scrum/Kanban) using tools such as Jira and Confluence
- Bachelor's degree in Computer Science, Information Systems, Data/Analytics, or a related field; equivalent practical experience welcomed.
What will set you apart:
- Experience with Terraform or other Infrastructure-as-Code tools
- Exposure to Snowflake or experience supporting analytics platforms beyond Redshift
- Experience in financial services or other highly regulated environments
- Knowledge of DevOps and CI/CD best practices
- Familiarity with observability tools such as Splunk, AppDynamics, or advanced CloudWatch usage
- Comfort working across Linux/Unix environments
- Strong communication skills during incident response with both technical and non-technical stakeholders
- A security-minded approach to building secure, reliable, and durable systems
- Willingness to support occasional off-hours or weekend incidents as part of on-call responsibilities
- Experience with streaming/event pipelines (Kafka/Kinesis), CDC patterns, and backfill strategies
- Experience with OpenLineage/Marquez and catalog integrations (Collibra/Alation/Purview)
- Prior FinOps or capacity-planning ownership for data platforms
- Familiarity with BI semantic layers and contract enforcement at consumption (Looker/Power BI/Tableau)
Work conditions
Participate in an on-call rotation; occasional change windows outside business hours to support safe releases and resiliency drills.
This job description is not intended to be an exhaustive list of all duties, responsibilities, and qualifications of the job. The employer has the right to revise this job description at any time.
You will be evaluated in part based on your performance of the responsibilities and/or tasks listed in this job description. You may be required to perform other duties that are not included in this job description. The job description is not a contract for employment, and either you or the employer may terminate employment at any time, for any reason.
What we offer you
We offer an array of diverse and inclusive benefits regardless of where you are in your career. We believe that providing our employees with the means to lead healthy, balanced lives results in the best possible work performance.
- Medical, dental, vision, and life insurance
- Retirement savings – 401(k) plan with generous company matching contributions (up to 6%), financial advisory services, potential company discretionary contribution, and a broad investment lineup
- Tuition reimbursement up to $5,250/year
- Business-casual environment that includes the option to wear jeans
- Generous paid time off upon hire – including a paid time off program plus ten paid company holidays and three floating holidays each calendar year
- Paid volunteer time – 16 hours per calendar year
- Leave of absence programs – including paid parental leave, paid short- and long-term disability, and Family and Medical Leave (FMLA)
- Business Resource Groups (BRGs) – BRGs facilitate inclusion and collaboration across our business internally and throughout the communities where we live, work and play. BRGs are open to all.
Base Salary Range: $87,400.00 - $123,400.00
The salary range above shows the typical minimum to maximum base salary range for this position in the location listed. Non-sales positions have the opportunity to participate in a bonus program. Sales positions are eligible for sales incentives, and in some instances a bonus plan, whereby total compensation may far exceed base salary depending on individual performance.
Actual compensation offered may vary from the posted hiring range based upon geographic location, work experience, education, licensure requirements, and/or skill level, and will be finalized at the time of offer.
Equal opportunity employer • Drug-free workplace
We are an equal opportunity employer with a commitment to diversity. All individuals, regardless of personal characteristics, are encouraged to apply. All qualified applicants will receive consideration for employment without regard to age (40 and over), race, color, national origin, ancestry, sex, sexual orientation, gender, gender identity, gender expression, marital status, pregnancy, religion, physical or mental disability, military or veteran status, genetic information, or any other status protected by applicable state or local law.
For remote and hybrid positions, you will be required to provide reliable high-speed internet with a wired connection, as well as a place in your home to work with limited disruption. You must have reliable connectivity from an internet service provider that is fiber, cable, or DSL internet. Other necessary computer equipment will be provided. You may be required to work in the office if you do not have an adequate home work environment and the required internet connection.
Job posting end date: 12:01 am on 03-20-2026
Want the latest money news and views shaping how we live, work and play? Sign up for Empower's free newsletter and check out The Currency.
DevOps Automation Engineer - Web Content Management (WCM)
Peraton Corporation
Peraton Corporation, a national security company headquartered in Herndon, Virginia, supplies solutions for mission-critical programs and systems. Founded in 2017, Peraton's missio...
Responsibilities
Peraton is looking for a DevOps Automation Engineer to support a team of engineers and be responsible for the design, implementation, and operations and maintenance (O&M) of the multi-tenant Web Content Management as a Service (WCMaaS) platform. This includes managing all aspects of the underlying infrastructure, such as servers and storage. The Engineer will work to continuously enhance the platform, leveraging secure, scalable, cost-effective, and operationally sustainable cloud solutions. Aside from technical qualifications, applicants should have effective written and verbal communication skills.
This role requires deep expertise in cloud services, infrastructure automation, DevSecOps, open-source ecosystems, enterprise solution design, and software and infrastructure engineering practices. The ideal candidate brings strong technical leadership, the ability to engage with a diverse set of stakeholders, and the ability to adapt solutions to evolving requirements and priorities.
Location: Remote (but must reside and perform all work within the United States)
Work Hours: This position requires working online from 8:00 AM to 5:00 PM Eastern.
Day-to-Day Roles and Responsibilities:
- DevOps pipeline and infrastructure automation using GitLab and Ansible
- Strong Linux experience for both troubleshooting and system administration tasks
- Apache web platform management
- Develop code using scripting and programming languages
- Investigate and resolve issues across existing pipelines and infrastructure
- Perform root cause analysis to address underlying IaC issues and provide solutions to prevent recurrence
AWS Infrastructure Management
- Ensure that all tenants' AWS resources are secure, FedRAMP-compliant, and optimized for performance
- Collaborate with the Architecture team to implement solutions that align with best practices for AWS cloud infrastructure
- Adhere to Change Management procedures
Collaboration and Knowledge Sharing
- Collaborate with other team engineers to resolve development issues/incidents and implement improvements
- Document solution designs, process procedures, and lessons learned to enhance team knowledge
Qualifications
Basic Qualifications:
- Bachelor's degree and 5 years of experience, a Master's degree and 3 years of experience, an Associate's degree and 7 years of experience, or a high school diploma and 9 years of experience.
- Must be a U.S. citizen and have the ability to obtain/maintain a DHS Public Trust clearance.
- 5+ years of experience in cloud services and infrastructure.
- 3+ years of extensive hands-on experience with automation involving a wide range of AWS services, including but not limited to EC2 instances, ASGs, Lambdas, and other services
- Required certification:
  - AWS Cloud Practitioner
- Required pipeline and infrastructure automation experience:
  - Pipeline orchestration tool experience required, with GitLab preferred
  - Config as Code required, with Ansible preferred
  - Python programming language required
- Extensive knowledge and understanding of AWS GovCloud and deploying in government environments.
- Exemplary communication and analytical skills, and technical knowledge across the client environment.
- Ability to produce concise and clear technical documentation.
Preferred Qualifications:
- Preferred certifications:
  - RHCSA/RHCE
  - Any AWS associate-level certification (SysOps, Developer, Solutions Architect)
Peraton Overview
Peraton is a next-generation national security company that drives missions of consequence spanning the globe and extending to the farthest reaches of the galaxy.
As the world's leading mission capability integrator and transformative enterprise IT provider, we deliver trusted, highly differentiated solutions and technologies to protect our nation and allies. Peraton operates at the critical nexus between traditional and nontraditional threats across all domains: land, sea, space, air, and cyberspace. The company serves as a valued partner to essential government agencies and supports every branch of the U.S. armed forces. Each day, our employees do the can't be done by solving the most daunting challenges facing our customers. Visit peraton.com to learn how we're keeping people around the world safe and secure.
Target Salary Range: $80,000 - $128,000
This represents the typical salary range for this position. Salary is determined by various factors, including but not limited to, the scope and responsibilities of the position, the individual's experience, education, knowledge, skills, and competencies, as well as geographic location and business and contract considerations. Depending on the position, employees may be eligible for overtime, shift differential, and a discretionary bonus in addition to base pay.
EEO: Equal opportunity employer, including disability and protected veterans, or other characteristics protected by law.
DevOps Platform Engineer – US Remote
Goldstone Partners, Inc.
Strategic Talent Scouts | Executive Search Consultants
Company Description
The Regis Company leverages its learning technology platform to create experiential learning programs for some of the world's largest Fortune 500 organizations. Headquartered in beautiful downtown Golden, Colorado, we deliver products with global reach. To date, we've empowered over 1.2 million learners across six continents and earned more than 50 awards, including Best Advance in Leadership Simulation Tools, Excellence in Executive Education, Best Advances in Gaming or Simulation Technology, and so much more. If you're passionate about education and technology, this might be the role for you!
Job Description
As the newest member of our engineering team, you will own the infrastructure and delivery systems that power our SimGate™ SaaS platform. Our engineering foundation is strong, and your goal will be to modernize our CI/CD pipelines, strengthen platform reliability, and improve observability across the environment. You bring 5+ years of DevOps, Platform, or Site Reliability Engineering experience along with strong cloud and Kubernetes expertise. You enjoy making systems more resilient, improving developer workflows, and ensuring software moves from code to production safely and efficiently. If you're excited about helping scale a platform used by global organizations, let's talk!
What your days will look like:
- Owning and evolving our CI/CD pipelines: modernizing workflows, improving reliability, and streamlining the path from code to production
- Managing and scaling our Kubernetes-based cloud infrastructure, including provisioning, cost optimization, and operational stability
- Building monitoring dashboards and alerts so we know about issues before our customers do
- Participating in incident response and leading post-incident reviews to identify root causes and implement improvements
- Embedding security best practices into the delivery pipeline, including dependency scanning, secrets management, and automated safeguards
- Working closely with engineering teams to improve deployment confidence, platform stability, and overall developer experience
- Promoting a culture of continuous improvement in all that you do!
Qualifications
Show us your:
- Bachelor's degree in computer science (or equivalent experience supporting production SaaS platforms) and 5+ years in DevOps, Platform, or Site Reliability roles
- A track record of improving platform reliability, scalability, and engineering productivity
- Strong CI/CD expertise using platforms such as GitHub Actions, GitLab CI, or similar tools
- Hands-on experience with Kubernetes, deploying and managing containerized workloads
- Proficiency with at least one major cloud provider (AWS, Azure, or GCP) and with Infrastructure-as-Code (IaC) using Terraform, Pulumi, or similar tools
- Solid Linux fundamentals along with scripting ability in Bash, Python, or similar languages
- Working knowledge of monitoring and observability platforms such as Datadog, Dynatrace, Grafana, or Prometheus
- Familiarity with security and compliance frameworks such as SOC 2 or ISO standards
- Curiosity, strong problem-solving ability, and a mindset focused on continuous improvement
- Personal ethic that embodies the ideals of diversity and inclusion
Additional Information
Working with us you'll enjoy:
- Salary of $120,000 to $135,000 plus bonus and an impressive suite of benefits
- Ample time off to maintain balance
- An amazing team of brilliant professionals to spend your days with
Goldstone Partners is helping this highly successful company find talented professionals who want to contribute to the development of world-class leaders. Applications are welcome from US citizens and Green Card holders. Principals only, please.
Compensation: USD 120,000 - USD 130,000 yearly
• Deploy and maintain critical applications on a cloud-native microservices architecture
• Implement automation, effective monitoring, and infrastructure-as-code
• Deploy and maintain CI/CD pipelines across multiple environments
• Support and work alongside a cross-functional engineering team on the latest technologies
• Iterate on best practices to increase the quality and velocity of deployments
• Take on-call responsibilities in rotation with the engineering team
• Increase the sophistication of our alerting and escalation mechanisms
• Help increase system performance with a focus on high availability and scalability
• Propose, scope, design, and implement various infrastructure architectures
• Develop and maintain solutions for operational administration, system/data backup, disaster recovery, and security/performance monitoring
• Continuously evaluate existing systems against industry standards and make recommendations for improvement
• Perform root cause analysis for production errors
• Keep the lights on (day-to-day administration)