Diving deep on data, AI & ML for AWS-powered business
DevOps Engineer
Location
Brazil
Posted
84 days ago
Salary
$4.5K / year
Seniority
Senior
Job Description
DevOps Engineer
Metal Toad
- Planning
- Analyzing customer requirements for software components, system availability, security, and performance
- Designing and documenting complete cloud hosting systems, including capacity planning software and instance type selection, allocation, and network design
- Estimating the costs of the recommended system design
- Building systems by executing installation, configuration, and testing of cloud resources
- Using automation and configuration management to ensure repeatability and traceability of changes
- Troubleshooting system hardware, software, networks, and operating systems
- Protecting the integrity and security of systems through proper use of controls and monitoring tools
- Maintaining system performance through system monitoring and analysis, performance tuning, and planning for future growth
- Designing and running load and stress tests, documenting outcomes, and debugging infrastructure issues
- Maintaining internal systems and customer deployment documentation
- Partnering with project managers, technical consultants, software architects, and developers to validate infrastructure deliverables against the requirements
- Responding to support tickets and incidents in a timely manner that corresponds to SLA commitments
- Contributing to the definition of best practices, operational policies, and procedures
- Establishing, documenting, and testing disaster recovery procedures
Job Requirements
- Advanced to fluent English communication skills
- 4+ years of experience with Amazon Web Services (AWS)
- Experience with other cloud providers is a bonus
- Experienced in Linux and/or Windows Systems administration (at least one required)
- Scripting (bash shell and Python preferred, PowerShell acceptable)
- Knowledge of TCP/IP networking and HTTP protocols
- Experience with web accelerators, load balancers, reverse proxies, and CDNs
- Problem solver and willing to work in an agile/fast-paced environment
- Customer-oriented with good communication skills
- Willing to participate in a compensated 24/7 on-call rotation, approximately one shift per month
- AWS certifications, or willingness to get certified
- Interest in Generative AI technologies
Benefits
- 24/7 support
- Remote work options
More DevOps Engineer Jobs
MLOps Engineer – Systems, DevOps, MLOps, Cybersecurity
XideralCode is borderless! Click below to learn more about our international services
- Build, deploy, and maintain ML pipelines for production-ready machine learning models
- Process massive data streams for scalable ML workflows
- Define and develop APIs and MCP servers to support ML solutions
- Collaborate with Data Scientists, Data Engineers, and ML Engineers to coordinate pipelines, maintenance, and training in production
- Process and manage large-scale structured and unstructured datasets
- Apply business knowledge to analyze data, generate insights, and solve complex problems
- Perform ad hoc data analysis based on business needs
- Participate with stakeholders in analyzing and resolving issues related to data flow and content
- Build strong relationships with clients and internal teams, ensuring high customer satisfaction
- Promote best practices, innovation, and continuous improvement in MLOps processes
Role Description
We are hiring a highly experienced Senior Staff SRE Engineer to act as a senior technical authority within our reliability function. This is a deeply hands-on individual contributor role, building and operating SRE practices at scale. You will:
- Design and evolve resilient infrastructure
- Drive reliability across multiple engineering streams
- Ensure our AI-driven products operate with high availability, performance, and security
- Work across platform, product, data, and ML teams
- Help productionise models and standardise customer environments
- Strengthen Kubernetes-based architecture
- Mature our CI/CD pipelines end-to-end
- Collaborate with other Staff engineers and Architects to shape the global product architecture and technology vision
Responsibilities
- Architect, deploy, and operate scalable, secure production environments (AWS preferred)
- Lead reliability improvements across multiple engineering streams
- Design and evolve Kubernetes-based infrastructure, including migration and optimisation initiatives
- Build and enforce strong Infrastructure-as-Code standards
- Define and operationalise SLIs, SLOs, and error budgets
- Strengthen observability across applications, infrastructure, data pipelines, and ML systems
- Work closely with product and data teams to integrate model analytics and product telemetry into reliability insights
- Work across and optimise the entire CI/CD pipeline, from build to deploy to rollback
- Improve release safety, deployment frequency, and predictability of SLAs
- Lead incident response for complex cross-system failures and drive postmortems
- Reduce operational toil through automation and platform engineering improvements
- Design processes and tooling to absorb, standardise, and troubleshoot customer environments
- Support and productionise ML workloads (MLOps practices including model deployment, monitoring, retraining workflows)
- Ensure infrastructure aligns with enterprise-grade security and regulatory requirements
- Mentor engineers and raise the overall reliability bar across teams
Qualifications
- Extensive hands-on experience in SRE or Production Engineering roles
- Demonstrated experience building or scaling SRE practices in high-growth or complex environments
- Deep expertise in AWS or Azure-based cloud infrastructure
- Strong experience with Kubernetes (including migration, scaling, and production hardening)
- Advanced Infrastructure-as-Code experience (Terraform or equivalent)
- End-to-end CI/CD pipeline design and optimisation experience
- Strong experience with observability tooling across distributed systems
- Experience troubleshooting complex multi-tenant or customer-hosted environments
- Experience supporting production data platforms and ML systems
- MLOps experience, including model deployment and monitoring
- Strong understanding of distributed systems, scalability, and fault tolerance
- Systems thinker who understands interactions across infrastructure, product, data, and ML
- Excellent communication skills and ability to work cross-functionally
Preferred Experience
- Experience in large-scale global B2B/B2C products
- Experience working with AI/ML systems, NLP, or LLM-based products
- Experience integrating product analytics and model performance metrics into operational monitoring
- Background in enterprise environments with strong security and compliance requirements
- Experience implementing regulatory controls within cloud infrastructure
- Experience scaling infrastructure during rapid growth phases
- Experience evaluating infrastructure tooling and vendors
- Experience collaborating with large-scale enterprise customers to deploy and operate environments within their accounts and VPCs
Personal Characteristics
- Strong problem solver who anticipates failure modes
- High ownership mentality and accountability
- Comfortable working across streams and influencing without formal authority
- Learning-oriented with a drive for continuous improvement
SRE, Kubernetes
Logicalis Spain: We are Architects of Change; we help organizations succeed in an increasingly digital world.
- Administer and maintain container platforms (Kubernetes, OpenShift, AKS, EKS)
- Operate hybrid environments (public cloud and on-premises) and ensure their availability, stability, and performance
- Automate operations and infrastructure tasks using Helm and GitOps
- Run deployments, upgrades, backups, and troubleshooting across environments
- Maintain and optimize CI/CD pipelines (Jenkins, ArgoCD, etc.)
- Manage configuration and version control (Git)
- Monitor system health and collaborate closely with development and operations teams
- Ensure compliance with good practices for operations, security, and documentation
Senior DevOps Engineer
Imaginary Cloud: Software Development & UX/UI Design | Awarded Best Workplace Europe, Best Quality of Life & 2nd Best Workplace Portugal
- Work with multidisciplinary teams on innovative projects
- Engage in development, problem-solving, management, and human interaction
- Drive innovation by creating innovative projects for top companies
- Daily tasks include cloud management, architecture, and client interaction as required



