Job Closed
This listing is no longer active.
CEA is the exclusive distributor of JCB, Atlas Copco, Ditch Witch, & Dynapac equipment.
SRE Specialist
Location
Brazil
Posted
79 days ago
Seniority
Senior
Job Description
- Management and governance of cloud environments on the AWS platform.
- Management of the Kubernetes environment (OpenShift).
- Automation of server provisioning with Terraform.
- Support for test automation and continuous integration.
- Administration of Linux servers.
Job Requirements
- Ability to lead mission-critical strategic projects.
- Advanced knowledge of the Linux operating system.
- Knowledge of API Gateway.
- Knowledge of message queue systems.
- Cloud WAF (Cloudflare).
- APM (Application Performance Monitoring).
- FinOps best practices.
- Scripting in Shell and Python.
- Experience with AWS cloud infrastructure following best practices (Well-Architected, Landing Zone).
- Experience with infrastructure resources (EC2, VPC, S3, EKS, Route 53, SNS, SQS, API Gateway and Lambda).
- Experience with automation (Terraform).
- Agile methodologies.
- Strong ownership mindset and collaborative attitude are required.
- Experience working with containers and major platforms (Docker, Kubernetes, EKS, OpenShift).
- Preferred: Certifications such as AWS Certified Solutions Architect – Associate; CKA – Certified Kubernetes Administrator; Red Hat Certified Specialist in OpenShift Administration.
Benefits
- Medical and dental insurance (employee and dependents).
- Dr. C&A - Telemedicine and teletherapy services.
- Annual bonus.
- Parking or commuter allowance (Work location: Alphaville – Barueri/SP).
- Birthday off — one paid day off during your birthday month.
- Flexible working hours.
- On-site cafeteria.
- Flexible meal benefit (food allowance and/or meal vouchers).
- Gympass.
- Semi-annual vacation.
- Employee discount at C&A stores and e-commerce.



