To empower the heroes and scale-ups that grow the economy
Engineering Manager – SRE & DevXP
Location
Brazil
Posted
4 days ago
Salary
0
Seniority
Senior
Job Description
RD Station
- Ensure the reliability, scalability, and performance of all platform services.
- Lead the technical evolution of the Engineering Platform to optimize the software lifecycle and delivery speed (Developer Experience - DevXP).
- Define the strategic and operational vision for SRE practices and the Internal Developer Platform (IDP).
- Collaborate with product and architecture teams to deliver resilient, secure, and efficient solutions.
- Establish and ensure adherence to Service Level Objectives (SLOs) and reliability and observability KPIs, driving continuous improvement and reducing MTTR.
- Lead the rollout and adoption of the Internal Developer Platform (IDP) and Golden Paths to optimize developer workflows.
- Develop and execute an annual strategic roadmap for the Engineering Platform aligned with product and architecture goals.
- Increase automation, reduce toil, and optimize infrastructure costs (Cloud FinOps).
- Develop and mentor the SRE and Platform engineering team, fostering a culture of ownership and technical excellence.
Job Requirements
- Proven experience leading software engineering teams with a focus on SRE, Platform, or Cloud.
- Strong background in building and adopting Internal Developer Platforms (IDP) and Developer Experience (DevXP) practices.
- Deep knowledge of SRE practices, observability, Cloud Computing (GCP or AWS), Kubernetes, and automation (CI/CD, IaC).
- Experience operating at high scale and with distributed architectures.
- Ability to manage multidisciplinary technical teams, with a focus on mentoring and talent development.
- Experience with Cloud FinOps and cost optimization practices is a plus.
- Postgraduate degree or relevant certifications in Cloud (GCP/AWS) or SRE are a plus.
- Familiarity with IDP tools such as Backstage and GitOps is a plus.
- Familiarity with infrastructure security practices and DevSecOps is a plus.
- Product-oriented mindset applied to platform development and strong executive and technical communication skills are a plus.
Benefits
- Holistic Well-being: We take care of the people who drive our evolution. We support the overall well-being of each team member through programs and benefits that enable self-care across five pillars: Emotional, Financial, Physical, Occupational, and Social.
- Diversity and Belonging: Diversity is what makes us powerful. We actively promote inclusion and belonging, ensuring TOTVS is a place where you can be yourself. Our expertise is human and alive: we embrace differences to empower businesses inside and outside the company.
More DevOps Engineer Jobs
Role Description
- Operate and maintain on-premise infrastructure environments across DEV, TEST, STAGING, UAT, and PROD.
- Ensure network zoning and environment segregation in line with External / DMZ / Internal architecture.
- Configure and support Web Application Firewalls (WAFs) and controlled traffic flows between zones.
- Operate and maintain External and Internal API Gateways.
- Support enterprise integrations via the Software AG integration platform.
- Operate identity and access management infrastructure, including miniOrange IdP, MFA, and OIDC integrations.
- Design, maintain, and operate CI/CD pipelines using Azure DevOps, including secure release promotion.
- Implement and operate Secure SDLC controls (SAST, SCA, DAST).
- Implement and maintain monitoring, logging, and audit capabilities (Prometheus, Grafana, Graylog, Sentry, SIEM forwarding).
- Support backup, replication, and disaster recovery activities, including DR testing.
Qualifications
- 5+ years of experience in DevOps or Infrastructure Engineering.
- Experience with on-premise infrastructure deployment and operations in enterprise environments.
- Experience managing multiple isolated environments (DEV, TEST, STAGING, UAT, PROD).
- Knowledge of network security zoning architectures (External / DMZ / Internal).
- Hands-on experience working with Web Application Firewalls (WAFs).
- Experience configuring and supporting External and Internal API Gateways.
- Experience working with enterprise integration / ESB platforms (Software AG).
- Experience integrating and operating Identity Providers (IdP) (miniOrange).
- Knowledge of OIDC / OAuth2 authentication flows and Multi-Factor Authentication (MFA).
- Experience with TLS / mTLS secure communication and PKI-based certificate management.
- Hands-on experience with CI/CD pipelines using Azure DevOps.
- Experience implementing Secure SDLC practices, including SAST, SCA, and DAST.
- Knowledge of threat modeling techniques, specifically STRIDE.
- Experience with monitoring and observability tools (Prometheus, Grafana).
- Experience with centralized logging and application monitoring (Graylog, Sentry, SIEM integration).
- Experience supporting backup, replication, and disaster recovery processes, including DR testing.
DevOps Reliability Engineer
Advanced Solutions International, Inc.
We help people achieve great things through innovative solutions.
- Monitor and improve the health, availability, performance, and cost efficiency of Azure-based production systems.
- Use application, database, and infrastructure telemetry to identify performance issues, bottlenecks, and reliability risks.
- Tune Azure services and platform configurations to maximize performance, resilience, and resource efficiency.
- Partner with engineering teams to recommend and implement practical, data-driven improvements to reliability, scalability, and operational effectiveness.
- Create and maintain operational documentation, runbooks, and troubleshooting guides to support consistent incident response and ongoing operations.
- Support Tech Support and Sustained Engineering by executing approved SQL queries and completing database backups and restores for troubleshooting purposes.
- Analyze how partner integrations and customer usage patterns impact system performance and cloud spend.
- Investigate complex production issues, perform root cause analysis, and drive resolution of reliability and performance problems.
- Contribute to continuous improvement across deployment processes, system stability, and operational readiness.
- Perform other job-related duties and responsibilities as assigned.
- Co-lead the design and operation of Tenpo's cloud platform.
- Take ownership of infrastructure projects on GCP.
- Design and implement global security standards.
- Ensure the automation and stability of critical Data platforms on GCP.
- Design, implement, and maintain the networking architecture on GCP.
- Manage infrastructure effectively with respect to cost and timelines.
- Design, build, and maintain CI/CD pipelines.
- Implement and maintain infrastructure as code using Terraform.



