Job Closed

This listing is no longer active.

AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple Best Place to Work awards.

Site Reliability Engineer

DevOps Engineer | Full Time | Remote | Mid Level | Team 1,001-5,000

Location

Brazil

Posted

66 days ago

Seniority

Mid Level

AWS Kubernetes Terraform Git GitLab Amazon EKS Grafana Linux

Job Description

Role Description

As a Site Reliability Engineer (SRE), you will drive the reliability, scalability, and performance of cloud-native systems, enabling engineering teams to deliver with confidence. Working across AWS, Kubernetes, and modern DevOps practices, you’ll automate infrastructure, enhance observability, and support seamless deployments. This role offers strong ownership and cross-team collaboration, with the opportunity to shape SRE and DevSecOps practices while improving system resilience at scale.

What You Will Do

- Design, build, and deploy solutions to improve system reliability, scalability, and operational efficiency;
- Build and maintain CI/CD pipelines and deployment automation;
- Work with product teams to support application deployments and infrastructure requirements;
- Improve system reliability through root cause analysis, post-mortems, and automation;
- Implement and maintain monitoring, logging, and alerting systems;
- Support security scanning and DevSecOps practices;
- Automate operational and support tasks to reduce manual work;
- Assist support teams in troubleshooting infrastructure and deployment issues;
- Promote SRE and DevSecOps best practices across engineering teams;
- Provide after-hours support when necessary for critical incidents.

Qualifications

- 8+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering;
- Strong experience with AWS cloud infrastructure and architecture;
- Strong experience with Infrastructure as Code using Terraform or AWS CloudFormation;
- Experience building and maintaining CI/CD pipelines;
- Experience with Git and GitLab in multi-team environments;
- Experience with containers and Kubernetes (EKS or similar);
- Experience with monitoring, logging, and observability tools such as Grafana;
- Strong scripting skills in Linux or Windows environments;
- Strong communication skills and ability to work across engineering teams;
- Upper-intermediate English level.

Nice to Haves

- AWS certifications;
- Experience with Artifactory or artifact repositories;
- Experience with DevSecOps and security scanning tools;
- Experience with APM and infrastructure monitoring tools;
- Experience improving automation and operational workflows;
- Experience mentoring engineers or promoting best practices across teams.

Benefits

- Professional growth: Mentorship, TechTalks, and personalized growth roadmaps.
- Competitive compensation: USD-based pay with education, fitness, and team activity budgets.
- Exciting projects: Modern solutions with Fortune 500 and top product companies.
- Flextime: Flexible schedule with remote and office options.

More DevOps Engineer Jobs

Site Reliability Engineer

RingCentral

A leading provider of AI-driven cloud business communications, contact center, video, and hybrid event solutions.

DevOps Engineer | 66 days ago

Full Time | Remote | Team 5,001-10,000 | Since 1999 | H1B Sponsor

• Operate and maintain Linux-based telephony and platform services in production
• Troubleshoot SIP signaling and RTP media flows, including call routing, provisioning, registration, and signaling behavior
• Diagnose issues across the network stack affecting real-time voice and media traffic, including analysis of packet and signaling flows
• Deploy and administer Kubernetes services using Helm and GitOps workflows (e.g. CI/CD, FluxCD), including tracing and debugging configuration through layered rendering pipelines
• Manage stateful and pinned workloads, including understanding of Kubernetes scheduling primitives such as taints, tolerations, and node affinity
• Monitor systems and participate in on-call incident response for production infrastructure
• Implement production changes using testing, rollback planning, and risk mitigation practices
• Contribute to automation, observability, and operational tooling improvements
• Coordinate with infrastructure, network, storage, and platform teams to resolve cross-domain issues and maintain highly available global services

AWS Amazon EC2 GCP Grafana Kubernetes Linux Node.js Prometheus Python Shell TCP/IP Unix VoIP

Colorado

$94.9K - $135.5K / year

Job Closed

Senior Site Reliability Engineer – Metrics and Observability

CVS Health

Bringing our heart to every moment of your health.

DevOps Engineer | 66 days ago

Full Time | Remote | Team 10,001+ | Since 1963 | H1B No Sponsor

• Define, implement, and maintain key performance metrics, SLOs, and SLIs to measure system reliability and performance
• Manage error budgets effectively, collaborating with development teams to balance reliability and feature delivery
• Design and implement comprehensive monitoring solutions to provide real-time visibility into system health
• Develop and implement automated quality gates that ensure all releases meet defined reliability and performance standards
• Assist in incident response efforts by providing insights from metrics and monitoring tools
• Drive initiatives to enhance monitoring, observability, and reliability practices

AWS Azure Docker GCP Grafana Kubernetes Prometheus

Louisiana + 4 more

$83.4K - $203.9K / year

Job Closed

Principal DevOps Engineer

Perforce Software

The DevOps Edge for the Outperformers: Enable teams to build, manage & maintain apps — from code to business-ready.

DevOps Engineer | 66 days ago

Full Time | Remote | Team 1,001-5,000 | Since 1995 | H1B Sponsor

• Build platforms and frameworks to create consistent, verifiable, and automatic management of applications and infrastructure between non-production and production environments
• Mentor engineers and DevOps developers on cloud technology and practice
• Implement application security best practices throughout the agile SDLC
• Foster and advocate for a DevOps culture at Perforce to ensure efficient testing, delivery, and deployment of all software artifacts
• Lead the development and enhancement of our CI/CD pipeline infrastructure and tools
• Establish technical design principles and practices and drive them across all product portfolios to make operation design a must-have phase of the development lifecycle
• Your daily tasks will include developing a technical design for our cloud platform, developing a framework, and working with Dev and DevOps teams to ensure that the product meets our quality standards

AWS Azure Docker GCP Java Kubernetes Puppet Python Ruby SDLC Terraform

Massachusetts

$120K - $150K / year

Job Closed

Senior Autonomy Release Engineer

May Mobility

Transforming cities through autonomous technology to create a safer, greener, more accessible world.

DevOps Engineer | 66 days ago

Full Time | Remote | Team 51-200 | Since 2017 | H1B Sponsor

• Release ownership and release execution end-to-end across:
  • Major autonomy releases
  • Incremental/performance releases
  • Hotfix/safety patches
• Manage branching strategy, versioning, and release cut processes
• Drive release readiness and go/no-go decisions
• Partner cross-functionally with Autonomy, Infra, Validation, and Fleet Ops
• Act as a system owner for release readiness
• Investigate and resolve complex issues arising from:
  • Software/hardware interactions
  • Distributed systems behavior
  • On-vehicle vs simulation discrepancies
• Develop deep understanding of:
  • Sensor stack, middleware, autonomy stack
  • Compute platforms, networking, configurations
• Be the go-to for:
  • “Why does this fail in the real world?”
  • “What changed between releases?”
• Enforce stage-gated release framework:
  • Feature Complete → Code Freeze → Validation → Release Candidate
• Integrate validation signals:
  • Simulation corpus results
  • Regression testing
  • Vehicle testing (HIL / on-road)
• Ensure safety-critical issues are identified, tracked, and gated
• Take initiative to find and permanently solve challenging system level issues caused by the interplay between different software and hardware components.
• Collaborate and lead system-wide improvements when working with other teams without having direct ownership or management responsibility.
• Assess and develop approaches that scale and improve performance in a variety of ways (e.g. CPU performance, memory usage, disk usage, network usage).

Distributed Systems Python

United States

$176K - $253K / year
