Job Closed
This listing is no longer active.
Accelerating R&D to untangle the complexity of life
Technical DevOps – Infrastructure Engineer
Location
Brazil
Posted
38 days ago
Salary
0
Seniority
Lead
Job Description
Deep Origin
• Own our cloud infrastructure across AWS and third-party hosting and compute providers; ensure it is reliable, scalable, and cost-efficient
• Own and operate bare-metal compute clusters: node provisioning, configuration management, networking, secure access, and ongoing reliability
• Build and maintain configuration management using Ansible (or similar), ensuring reproducible and scalable server provisioning
• Set up and maintain Slurm for job scheduling across CPU and GPU node pools; ensure researchers can submit, monitor, and manage jobs without DevOps involvement
• Design and manage cluster networking: management and storage networks, inter-node communication, DNS, and secure perimeter access, including bastion/jump host setup
• Deep hands-on experience managing Linux-based infrastructure, including networking, firewalls, VPNs, and performance tuning in distributed environments
• Own disaster recovery and business continuity: define RTO/RPO targets, maintain runbooks, and run regular tests
• Manage and optimize infrastructure spend through capacity planning, right-sizing, and intelligent use of reserved and spot capacity
• Manage Kubernetes clusters, networking, and workload scheduling across cloud and on-premise environments
• Enable infrastructure-as-code practices in Terraform; drive consistency, modularity, and auditability across the codebase
• Evolve our observability platform: improve coverage, reduce alert noise, and ensure engineering teams have the visibility they need to detect and resolve issues quickly
• Own security posture across the platform: IAM policies, secrets management, network segmentation, vulnerability management, and SOC 2 compliance
• Lead incident management: on-call processes, escalation policies, runbooks, and blameless post-mortems
• Drive CI/CD improvements and developer workflow initiatives that meaningfully increase engineering throughput
• Evolve internal tooling and CLI infrastructure that engineering teams depend on daily.
Job Requirements
- 10+ years of infrastructure and DevOps engineering experience, with a proven track record in senior or lead IC roles
- Ability to take end-to-end ownership of complex, multi-team initiatives and drive them from design through to production
- Hands-on experience running HPC or research compute clusters: bare-metal provisioning, Slurm (or equivalent), GPU infrastructure, and shared storage (NFS, Lustre, or similar)
- Comfortable operating in environments with a mix of cloud, VPS, and bare-metal systems, including legacy or non-standard setups
- Experience supporting scientific or R&D teams with mixed workloads: long-running CPU batch jobs, GPU training jobs, and interactive compute
- Deep, hands-on AWS expertise: EKS/Kubernetes, IAM, VPC networking, S3, RDS, and cost management
- Solid Terraform skills and a principled approach to infrastructure-as-code
- Strong Linux fundamentals and experience managing multi-node environments at scale
- Experience owning and improving production observability systems (Prometheus/Grafana, OpenTelemetry, ELK, or similar)
- Strong security fundamentals: threat modeling, least-privilege access design, vulnerability management, and compliance frameworks
- Experience owning incident management end-to-end, including process design and continuous improvement
- Excellent communication skills; able to work directly with researchers and scientists as well as with engineering and leadership
- Fluent English.
More DevOps Engineer Jobs
• Lead the integration and deployment of global DevOps solutions across private and multi-cloud environments
• Design, maintain, and optimize CI/CD pipelines using GitLab CI
• Manage Kubernetes-based platforms, ensuring scalability, reliability, and performance
• Implement Infrastructure as Code (IaC) using Terraform and Ansible to automate deployments
• Build and maintain observability stacks (Prometheus, Grafana, ELK/Loki, OpenTelemetry)
• Troubleshoot and resolve issues across application, infrastructure, and network layers
• Collaborate with cross-functional and international teams to ensure smooth operations
• Drive GitOps automation with ArgoCD / Flux and promote best practices across teams
• Enforce security, compliance, and access controls (Vault, RBAC, network policies)
• Support hybrid and API-driven infrastructure across multiple cloud environments
Senior Azure DevOps Engineer
Spyrosoft
We enable our clients to thrive, thanks to a combination of technical proficiency and domain-specific knowledge.
• Design, build, and maintain Azure infrastructure using Terraform across dev, staging, and production environments
• Own and improve Azure DevOps CI/CD pipelines (YAML), including artefact management, approvals, and automated deployments
• Implement and maintain secure Azure networking (VNets, subnets, Private Endpoints, Private DNS, NSGs) to ensure private access to critical services
• Configure and enforce security best practices across the platform (RBAC, Managed Identities, Service Principals, Key Vault, secrets handling)
• Ensure reliable hosting and operations for platform components (App Service, Functions, Storage, Azure SQL, ACR)
• Monitor platform health and performance; troubleshoot incidents using Azure monitoring, logs, and diagnostics
• Support the AI processing infrastructure needs, including enabling secure and scalable execution for workflows and batch/experimental pipelines (and optionally Azure ML / Azure OpenAI)
Senior AWS DevSecOps
Poland and Eastern Europe
Xebia is a global tech company with a journey in CEE that started with two Polish companies – PGS Software and GetInData. We are a team of 1,000+ experts delivering top-notch work across cloud, data, and software. We work on impactful projects across various sectors including fintech, e-commerce, aviation, logistics, media, and fashion, helping clients build scalable platforms and cutting-edge applications. Our clients include notable names like McLaren, Aviva, Deloitte, Spotify, Disney, ING, UPS, Tesco, Truecaller, AllSaints, Volotea, Schmitz Cargobull, Allegro, and InPost.
Role Description
You will be:
- designing and implementing scalable AWS infrastructure across multi-account environments,
- building and maintaining reusable Infrastructure as Code (Terraform) modules and platform components,
- implementing automation across provisioning, deployment, and governance workflows,
- embedding security, compliance, and operational controls into CI/CD pipelines using policy-as-code,
- improving platform reliability through observability, monitoring, and incident response practices,
- collaborating with platform, security, and application teams to deliver and adopt platform capabilities.
Qualifications
- 8+ years of experience in cloud, infrastructure, or platform engineering roles,
- strong hands-on experience designing and operating AWS environments at scale (multi-account),
- proven expertise in Infrastructure as Code such as Terraform,
- experience implementing CI/CD pipelines and embedding DevSecOps practices,
- strong knowledge of AWS core domains - networking, IAM, container platforms (e.g. EKS), and security principles,
- experience improving platform reliability through observability, monitoring, and incident management,
- familiarity with least-privilege IAM strategies, and secure-by-design cloud architecture,
- practical experience using AI-powered assistants (e.g. Claude Code, GitHub Copilot, Cursor) to improve productivity, quality, or decision-making in software delivery.
Requirements
- Work from the European Union region and a work permit are required.
Benefits
- AWS certifications (e.g. Solutions Architect – Professional, DevOps Engineer – Professional) are nice to have,
- experience implementing AWS landing zones and Control Tower at scale,
- familiarity with policy-as-code and automated compliance controls,
- experience working in large-scale, distributed enterprise environments,
- exposure to FinOps practices and cloud cost optimization strategies,
- experience applying GenAI in a more structured way within the SDLC, including defined workflows, prompt patterns, or tool integrations embedded into daily work,
- interest in and familiarity with emerging AI-driven practices (e.g. agent-based workflows, automation patterns, AI-augmented development), with a willingness to explore and experiment beyond standard approaches.
Recruitment Process
- CV review
- HR call
- Interview
- Client Interview
- Decision
Senior DevOps Engineer
Avenga
A global IT engineering and consulting company specializing in custom software development.
• Design and implement a Golden Path CI/CD pipeline using GitHub Actions and GitOps principles
• Standardize and optimize CI/CD processes across engineering teams
• Build and maintain scalable deployment workflows using ArgoCD and Helm
• Collaborate with development teams to improve developer experience and release efficiency
• Ensure reliability, scalability, and maintainability of CI/CD infrastructure
• Contribute to best practices around infrastructure automation and delivery pipelines
• Work within a complex Kubernetes environment (50+ clusters), ensuring consistency and performance at scale



