Job Closed
This listing is no longer active.
SRE/DevOps Professional, AWS
Location
Brazil
Posted
54 days ago
Salary
0
Seniority
Senior
Job Description
Internal
- Design and build complex, mission-critical cloud architectures, ensuring security and cost optimization.
- Develop and implement custom monitoring and observability solutions, including creating dashboards in DataDog.
- Provide training and mentoring, and promote SRE best practices to raise the maturity of the technical community.
- Participate in discussions and define standards for Infrastructure as Code (IaC), continuous integration, and automation via GitOps.
- Collaborate on the continuous improvement of the development platform, offering technical feedback and suggestions for enhancement.
- Serve as a technical reference, supporting squads in resolving complex cloud infrastructure and observability issues.
Job Requirements
- Strong expertise in AWS architectures (multi-account setups, isolated VPCs, security, and cost controls).
- Advanced experience with ECS, EKS, and other cloud compute services.
- Infrastructure automation using GitOps practices and tools such as Terraform.
- Development of advanced monitoring and observability solutions (DataDog, structured logging, distributed tracing).
- Implementation of disaster recovery and high-availability strategies in the cloud, including custom instrumentation for application and infrastructure observability.
Benefits
- We value the continuous growth of Zuppers, encouraging each individual to pursue paths that drive their professional development.
More DevOps Engineer Jobs
- Manage SLAs that require sub-hour attention and respond to incidents on a daily basis.
- Respond to, resolve, and escalate incidents as necessary, collaborating with the technical and business teams.
- Build and implement incident processes to ensure the organization's ability to operate our connection at scale, working alongside the business and technology teams.
- Take ownership of, plan, and execute both planned and on-demand maintenance of our systems, collaborating with impacted and dependent technology and business teams.
- Continuously improve our product, including observability and integration with internal services and processes.
Senior DevOps Engineer
Dotmatics
Founded in 2005, Dotmatics is self-described as the world’s largest research and development scientific software platform, used by leading researchers in biopharma, academia, and
- Maintain and enhance our cloud-based infrastructure.
- Spearhead our ISO 27001 effort and maintain compliance with corporate security requirements.
- Work in a small yet highly effective and efficient specialised team.
- Maintain and operate the software build server and continuous integration pipelines for cross-platform desktop and HPC applications.
- Manage release processes and versioned software distribution.
- Support and maintain license control systems and related backend services.
- Manage and configure AWS services, including EC2, RDS, S3, ECR, IAM, WAF, CloudFront, Identity, CloudTrail, and Security Lake.
- Ensure the security, scalability, and reliability of cloud infrastructure.
- Support operational tooling across AWS and other cloud providers.
- Administer a diverse set of services (Office, AWS, GitHub, Mailchimp, etc.).
Database Reliability Engineer – Core Team
ClickHouse
ClickHouse, Inc. is a database management system that allows users to generate analytical reports using real-time SQL queries. The company’s technology works faster than traditio
- Continuously improve the reliability and performance of ClickHouse core.
- Improve and create metrics and alerts for ClickHouse to identify and prevent problems in production before they affect customers.
- Dig deeper into the most common problems customers encounter in ClickHouse core to identify root causes, submit bug fixes and issue reports, and suggest improvements.
- Enhance and refine incident response processes and post-mortem analysis for ClickHouse core-related outages, including working with Support and Cloud teams to communicate with impacted customers.
- Plan, enable, and drive Chaos initiatives across Engineering teams, based upon internal priorities.
- Manage on-call processes to respond to performance and reliability issues, and establish best practices for coordinating escalation to resolve issues and minimize customer impact.


