Technology and Innovation to revolutionize your business
Senior Site Reliability Engineer
Location
Brazil
Posted
31 days ago
Salary
0
Seniority
Senior
Job Description
Darede
- **Incident Leadership:** Act as Incident Response Lead in War Rooms, coordinating technical remediation and communication with stakeholders.
- **Observability Engineering:** Design and evolve telemetry in Datadog (Logs, APM, Traces and business metrics) to reduce MTTD and the team's cognitive load.
- **Workload Management on AWS Amplify:** Ensure the resilience and scalability of hosted front-end applications and critical APIs.
- **SRE Governance:** Define and monitor SLIs, SLOs and SLAs, managing the Error Budget to balance delivery speed with stability.
- **Mitigation Automation:** Develop auto-healing tools and scripts (automatic rollback, controlled restart, component isolation).
- **Root Cause Analysis:** Lead blameless post-mortem processes and ensure the implementation of structural improvements to prevent recurrence.
- **Systems Modernization:** Work with development teams to implement resilience patterns (Circuit Breakers, Bulkheads and Rate Limiting) in both modern architectures and legacy systems.
- **AI in Operations:** Implement anomaly detection and intelligent response solutions using AIOps (Datadog Bits AI or AWS DevOps Agent).
Job Requirements
- **Proven Seniority in SRE or DevOps:** Solid experience in high-scale, mission-critical environments.
- **Deep AWS Expertise:** Advanced experience with EC2, RDS, S3, IAM, EKS and Amplify.
- **Observability Tools:** Strong experience in monitoring, logging and APM (preferably using Datadog).
- **Containers & Orchestration:** Strong knowledge of Docker and Kubernetes (EKS/GKE).
- **Infrastructure as Code (IaC):** Proficiency in Terraform.
- **Development/Scripting:** Proficient in Python, Go or Shell scripting for automation.
- **Incident Management:** Real experience with on-call rotations and real-time problem resolution.
- **Plus / Nice-to-haves:**
  - **Analytical Profile for Legacy Systems:** Experience troubleshooting .NET Framework applications and Oracle or PostgreSQL databases.
  - **Chaos Engineering:** Experience executing controlled stress and resilience tests.
  - **Certifications:** AWS Certified DevOps Engineer - Professional or official Datadog certifications.
Benefits
- 📚 Educational Incentives (Partnerships with Educational Institutions)
- 🌴 Paid Vacation
- 🏋️ TotalPass
- 🎂 Birthday off
- 🏥 Health Insurance
- 🦷 Dental Insurance
- 🤰 Maternity Leave
- 👨👩👧👦 Paternity Leave
- 🌟 Reimbursement for AWS Certifications
More DevOps Engineer Jobs
Senior DevOps Engineer
Akido Labs
Redesigning healthcare with AI at the core. Our mission is to make exceptional healthcare universal.
- Design, provision, and maintain cloud infrastructure on AWS using Terraform and infrastructure-as-code best practices
- Manage and evolve Kubernetes clusters, ensuring reliability, performance, and security across environments
- Build and improve CI/CD pipelines to accelerate delivery and reduce deployment risk
- Monitor system health, respond to incidents, and drive post-mortems to prevent recurrence
- Collaborate closely with software engineers to support application deployment and platform needs
- Harden infrastructure security posture, including IAM policies, network segmentation, and secrets management
- Contribute to capacity planning, cost optimization, and cloud architecture decisions
- Support a multi-cloud environment, including limited work in Azure
- Document infrastructure, runbooks, and operational procedures to support a growing team
Senior SRE Engineer
Casas Bahia Tecnologia
The Technology of Grupo Casas Bahia - Dedication has never been stronger!
- Manage and evolve Cloud infrastructure practices, principles, services, standards, community, and policies.
- Actively participate in project deployments involving Cloud environments.
- Automate tasks that are currently performed manually (Infrastructure as Code - IaC).
- Define and support cluster architectures for microservices, ensure correct instance sizing, and manage scaling and upgrade processes.
- Facilitate resource configuration.

- Lead the migration of engineering platforms (IDP).
- Design, implement, and evolve platform infrastructure with a focus on operational continuity, security, and resilience.
- Architect and modernize existing CI/CD pipelines, promoting automation, standardization, and fast feedback cycles.
- Ensure high standards of availability, reliability, and performance during transition and modernization processes.
- Manage and optimize containerized environments, driving progressive migration to modern orchestrators and market-leading platforms.
- Drive the adoption of Infrastructure as Code (IaC) in existing environments, promoting governance, traceability, and scalability.
- Develop application templates on platforms such as Backstage or similar.
- Promote DevOps and GitOps practices as the organization’s standard operating model.
- Serve as a technical reference, mentoring engineers and supporting teams’ technical maturity.
- Define and execute platform modernization strategies that balance cost, performance, risk, and business value.
DevOps Tech Lead
Axonius
Control complexity with Axonius. Get an always up-to-date asset inventory, uncover security gaps, and automate action.
- Define and create technological solutions
- Lead design of cloud infrastructure architecture
- Design networking architecture for distributed services
- Lead design and development of Infrastructure as Code frameworks
- Design and maintain CI automation
- Lead governance of Configuration Management architecture
- Architect and govern development environments
- Lead design of monitoring architecture
- Evaluate and introduce new DevOps tools



