Metal Toad

Diving deep on data, AI & ML for AWS-powered business

DevOps Engineer

DevOps Engineer | Contract | Remote | Senior | Team 11-50 | Since 2003 | H1B No Sponsor

Location

Brazil

Posted

84 days ago

Salary

$4.5K / year

Seniority

Senior

Bachelor Degree | 4 yrs exp | English | AWS | Linux | Python | TCP/IP

Job Description

  • Planning
  • Analyzing customer requirements for software components, system availability, security, and performance
  • Designing and documenting complete cloud hosting systems, including capacity planning software and instance type selection, allocation, and network design
  • Estimating the costs of the recommended system design
  • Building systems by executing installation, configuration, and testing of cloud resources
  • Using automation and configuration management to ensure repeatability and traceability of changes
  • Troubleshooting system hardware, software, networks, and operating systems
  • Protecting the integrity and security of systems through proper use of controls and monitoring tools
  • Maintaining system performance through system monitoring and analysis, performance tuning, and planning for future growth
  • Designing and running load and stress tests, documenting outcomes, and debugging infrastructure issues
  • Maintaining internal systems and customer deployment documentation
  • Partnering with project managers, technical consultants, software architects, and developers to validate infrastructure deliverables against the requirements
  • Responding to support tickets and incidents in a timely manner that corresponds to SLA commitments
  • Contributing to the definition of best practices, operational policies, and procedures
  • Establishing, documenting, and testing disaster recovery procedures

Job Requirements

  • Advanced to fluent English communication skills
  • 4+ years of experience with Amazon Web Services (AWS)
  • Experience with other cloud providers is a bonus
  • Experienced in Linux and/or Windows Systems administration (at least one required)
  • Scripting (bash shell and Python preferred, PowerShell acceptable)
  • Knowledge of TCP/IP networking and HTTP protocols
  • Experience with web accelerators, load balancers, reverse proxies, and CDNs
  • Problem solver and willing to work in an agile/fast-paced environment
  • Customer-oriented with good communication skills
  • Willing to participate in a 24/7 on-call rotation, with approximately one compensated shift per month
  • AWS certifications, or willingness to get certified
  • Interest in Generative AI technologies

Benefits

  • 24/7 support
  • Remote work options

More DevOps Engineer Jobs

MLOps Engineer – Systems, DevOps, MLOps, Cybersecurity

Xideral

Code is borderless! Click below to learn more about our international services

DevOps Engineer | 84 days ago
Full Time | Remote | Team 201-500 | Since 2004 | H1B No Sponsor

  • Build, deploy, and maintain ML pipelines for production-ready machine learning models
  • Process massive data streams for scalable ML workflows
  • Define and develop APIs and MCP servers to support ML solutions
  • Collaborate with Data Scientists, Data Engineers, and ML Engineers to coordinate pipelines, maintenance, and training in production
  • Process and manage large-scale structured and unstructured datasets
  • Apply business knowledge to analyze data, generate insights, and solve complex problems
  • Perform ad hoc data analysis based on business needs
  • Work with stakeholders to analyze and resolve problems related to data flow and content
  • Build strong relationships with clients and internal teams, ensuring high customer satisfaction
  • Promote best practices, innovation, and continuous improvement in MLOps processes

Mexico
Job Closed
Other | Remote | Team 51-200

This description is a summary of our understanding of the job description. Click the 'Apply' button to find out more.

Role Description

We are hiring a highly experienced Senior Staff SRE Engineer to act as a senior technical authority within our reliability function. This is a deeply hands-on individual contributor role, building and operating SRE practices at scale. You will:

  • Design and evolve resilient infrastructure
  • Drive reliability across multiple engineering streams
  • Ensure our AI-driven products operate with high availability, performance, and security
  • Work across platform, product, data, and ML teams
  • Help productionise models and standardise customer environments
  • Strengthen Kubernetes-based architecture
  • Mature our CI/CD pipelines end-to-end
  • Collaborate with other Staff engineers and Architects to shape the global product architecture and technology vision

Responsibilities

  • Architect, deploy, and operate scalable, secure production environments (AWS preferred)
  • Lead reliability improvements across multiple engineering streams
  • Design and evolve Kubernetes-based infrastructure, including migration and optimisation initiatives
  • Build and enforce strong Infrastructure-as-Code standards
  • Define and operationalise SLIs, SLOs, and error budgets
  • Strengthen observability across applications, infrastructure, data pipelines, and ML systems
  • Work closely with product and data teams to integrate model analytics and product telemetry into reliability insights
  • Work across and optimise the entire CI/CD pipeline, from build to deploy to rollback
  • Improve release safety, deployment frequency, and predictability of SLAs
  • Lead incident response for complex cross-system failures and drive postmortems
  • Reduce operational toil through automation and platform engineering improvements
  • Design processes and tooling to absorb, standardise, and troubleshoot customer environments
  • Support and productionise ML workloads (MLOps practices including model deployment, monitoring, retraining workflows)
  • Ensure infrastructure aligns with enterprise-grade security and regulatory requirements
  • Mentor engineers and raise the overall reliability bar across teams

Qualifications

  • Extensive hands-on experience in SRE or Production Engineering roles
  • Demonstrated experience building or scaling SRE practices in high-growth or complex environments
  • Deep expertise in AWS or Azure-based cloud infrastructure
  • Strong experience with Kubernetes (including migration, scaling, and production hardening)
  • Advanced Infrastructure-as-Code experience (Terraform or equivalent)
  • End-to-end CI/CD pipeline design and optimisation experience
  • Strong experience with observability tooling across distributed systems
  • Experience troubleshooting complex multi-tenant or customer-hosted environments
  • Experience supporting production data platforms and ML systems
  • MLOps experience, including model deployment and monitoring
  • Strong understanding of distributed systems, scalability, and fault tolerance
  • Systems thinker who understands interactions across infrastructure, product, data, and ML
  • Excellent communication skills and ability to work cross-functionally

Preferred Experience

  • Experience in large-scale global B2B/B2C products
  • Experience working with AI/ML systems, NLP, or LLM-based products
  • Experience integrating product analytics and model performance metrics into operational monitoring
  • Background in enterprise environments with strong security and compliance requirements
  • Experience implementing regulatory controls within cloud infrastructure
  • Experience scaling infrastructure during rapid growth phases
  • Experience evaluating infrastructure tooling and vendors
  • Experience collaborating with large-scale enterprise customers to deploy and operate environments within their accounts and VPCs

Personal Characteristics

  • Strong problem solver who anticipates failure modes
  • High ownership mentality and accountability
  • Comfortable working across streams and influencing without formal authority
  • Learning-oriented with a drive for continuous improvement

United States
Job Closed

SRE, Kubernetes

Logicalis Spain

We are Architects of Change, helping organizations succeed in an increasingly digitalized world.

DevOps Engineer | 84 days ago
Full Time | Remote | Team 1,001-5,000 | H1B No Sponsor

  • Administer and maintain container platforms (Kubernetes, OpenShift, AKS, EKS).
  • Operate hybrid environments (public cloud and on-premise) and ensure their availability, stability, and performance.
  • Automate operations and infrastructure tasks using Helm and GitOps.
  • Run deployments, upgrades, backups, and troubleshooting across the different environments.
  • Maintain and optimize CI/CD pipelines (Jenkins, ArgoCD, etc.).
  • Manage configurations and version control (Git).
  • Monitor system health and collaborate closely with development and operations teams.
  • Ensure compliance with good practices for operations, security, and documentation.

Spain
Job Closed

Senior DevOps Engineer

Imaginary Cloud

Software Development & UX/UI Design | Awarded Best Workplace Europe, Best Quality of Life & 2nd Best Workplace Portugal

DevOps Engineer | 84 days ago
Full Time | Remote | Team 51-200 | H1B No Sponsor

  • Work with multidisciplinary teams on innovative projects
  • Engage in development, problem-solving, management, and human interaction
  • Drive innovation by creating innovative projects for top companies
  • Daily tasks include cloud management, architecture, and client interaction as required

Portugal
€39.7K - €64.5K / year