Job Closed
This listing is no longer active.
Build software faster. The One DevOps Platform enables your entire org to collaborate around your code. We're hiring.
Intermediate Site Reliability Engineer, Environment Automation
Location
India
Posted
84 days ago
Seniority
Senior
Job Description
Intermediate Site Reliability Engineer, Environment Automation
GitLab
• Contribute to automating operational tasks across many GitLab environments, from initial provisioning and configuration updates to upgrades and routine maintenance, helping reduce manual work and improve reliability at scale under the guidance of senior team members.
• Help build and refine the observability stack for multi-tenant GitLab environments so we monitor the right signals across Kubernetes, cloud services, and GitLab applications, supporting early issue detection and basic capacity tracking.
• Assist in responding to platform alerts and incidents, collaborating with Environment Automation SREs and engineering teams to troubleshoot production issues across multiple tenants and document findings.
• Support planning and implementation of infrastructure changes, capacity expansions, and new service rollouts for Dedicated and other managed GitLab environments, contributing to efforts that improve resource efficiency and environment isolation.
• Develop and maintain scripts, automation tools, and infrastructure-as-code workflows that manage parts of the GitLab environment lifecycle, enabling more repeatable, self-service operations over time.
• Apply and help implement best practices for running GitLab on Kubernetes and cloud platforms, focusing on day-to-day reliability, performance, and security while learning how to keep environments consistent.
• Participate in the on-call rotation for production GitLab environments with appropriate support, helping triage and mitigate incidents across clusters and cloud providers and contributing to post-incident reviews.
• Document operational tasks, runbooks, and lessons learned so they become clear, repeatable processes and can be candidates for future automation, improving shared knowledge and reducing manual toil across the team.
Job Requirements
- Experience working as an SRE or in a similar role operating production infrastructure, with an interest in automating the lifecycle of many environments or tenants in parallel, even if you have not yet done so at large scale.
- Hands-on experience with Golang (required) and the ability to read, understand, and modify infrastructure tools written in Go.
- Hands-on experience running Kubernetes-based workloads in production, including basic understanding of deployments, rollouts, and debugging common issues like crash loops, failed health checks, and scheduling problems.
- Familiarity with infrastructure automation and configuration management tools such as Terraform and Ansible, including experience working with modules, variables, and managing state safely for multiple environments.
- Solid understanding of Git-based workflows and infrastructure-as-code practices, with the ability to contribute to reusable modules, templates, and pipelines that make automation safer and more consistent.
- Experience working in distributed systems or cloud-based production environments, ideally in SaaS or managed service settings, with comfort participating in incident response and on-call rotations under guidance from more senior team members.
- A proactive mindset focused on automation and documentation—you look for opportunities to remove manual steps, improve runbooks, and turn repetitive tasks into reliable, self-service tools.
- Comfort working asynchronously across distributed teams and a desire to contribute to GitLab's values of collaboration, transparency, and iteration.
Benefits
- Benefits to support your health, finances, and well-being
- Flexible Paid Time Off
- Team Member Resource Groups
- Equity Compensation & Employee Stock Purchase Plan
- Growth and Development Fund
- Parental leave
- Home office support
More DevOps Engineer Jobs
DevSecOps Engineer
AP MAX INC
Brello is a wellness-first brand that makes access to science-backed compounded medications feel effortless, never clinical or confusing. We connect individuals to licensed providers through Telegra, with prescriptions fulfilled by trusted 503A pharmacies. Our mission is to simplify, humanize, and demystify wellness solutions for longevity and weight management, with an authentic voice that is friendly, empowering, and transparent.
Role Description
The DevSecOps Engineer plays a critical role in enabling secure, scalable software delivery across Allia Health Group’s cloud infrastructure. This role operates at the intersection of DevOps and security, embedding security controls directly into CI/CD pipelines and engineering workflows. This is a mission-critical hire supporting the organization’s SOC 2 compliance timeline. The ideal candidate brings a balanced skill set across infrastructure, automation, and security tooling, and is comfortable working in a fast-paced, evolving environment.
- Operate as a bridge between DevOps and Security to integrate security into the software development lifecycle
- Implement CI/CD security controls including SAST, DAST, SCA, and container scanning
- Implement controls aligned with SOC 2 change management and vulnerability management requirements
- Manage secrets lifecycle using cloud-native tools
- Build and maintain infrastructure security controls using Terraform
- Generate audit-ready change management evidence
- Integrate vulnerability scanning into compliance workflows
- Enforce secure development practices and pipeline protections
- Collaborate with GRC teams to align technical controls with compliance requirements
Qualifications
- At least 3 years of experience in DevOps, DevSecOps, or platform engineering
- Experience with cloud platforms such as Google Cloud Platform (GCP)
- Strong experience with CI/CD tools such as GitHub Actions
- Hands-on experience with Terraform and infrastructure as code
- Knowledge of application security and container security tools
- Familiarity with SOC 2 or similar compliance frameworks
Requirements
- Experience with compliance platforms such as Drata or similar tools
- Knowledge of HIPAA technical safeguards
- Experience with policy-as-code tools
- Relevant cloud or security certifications
Benefits
- Full benefits package including medical, vision, dental, 401(k) with company match, PTO, Flex days, holidays, and more
- Working in Madeira in a shared office space, remote in Portugal, or remote in a Portuguese time zone-friendly location
- Opportunity to build security-first infrastructure and systems
- High-impact role within a growing technology organization
- Benefits package designed to meet local market standards and legal requirements
• Co-responsibility for system availability: You actively contribute to the availability, reliability, and efficiency of our complex system architecture, which consists of around 70 servers hosted at Hetzner.
• Maintenance and automation: You support the maintenance and automation of our existing infrastructure based on technologies such as Ubuntu, Percona MySQL Cluster, MinIO, Elasticsearch, Redis, NGINX, HAProxy, TiDB, ClickHouse and Kubernetes.
• Monitoring and analysis: You improve our monitoring strategies and perform comprehensive incident and fault analyses.
• High availability: You are prepared, in exceptional cases, to be available outside normal hours, including at night, to ensure our systems run smoothly.
• Software development: Several years of experience in one or more programming languages (e.g., Rust, Java, Go, TypeScript) are required.
• Administer and maintain the company's infrastructure on Google Cloud Platform (GCP).
• Implement and maintain infrastructure-as-code solutions with Terraform and Terragrunt.
• Manage the structure of projects, folders, VPCs, IAM, and resources in GCP.
• Build and maintain Docker containers, Helm charts, and Kubernetes (GKE) orchestration.
• Manage artifacts in GCP Artifact Registry.
• Manage database schemas and migrations with Atlas.
• Manage code versioning with GitLab and Git.
• Implement and maintain CI/CD processes using GitLab CI/CD.
• Monitor and optimize the infrastructure with Cloud Monitoring, Cloud Logging, and Cloud Trace.
• Collaborate closely with the development team on application implementation and deployment.
Senior SRE – Site Reliability Engineer
Raiô Benefícios
A complete ecosystem of corporate benefits.
• Ensure the reliability, stability, performance and cost-efficiency of Raiô's platforms, working closely with engineering and product teams.
• Define, monitor and evolve reliability and performance indicators (SLIs/SLOs), establishing effective alerts and continuous improvement routines.
• Manage production incidents, conducting root cause analysis and implementing corrective and preventive plans, with a focus on learning and system evolution.
• Design, deploy and evolve observability practices (logs, metrics and tracing), improving predictability and reducing time to resolution for failures.
• Develop and maintain automations and infrastructure as code, ensuring consistent, secure and reproducible environments.
• Structure and evolve operational practices: routines, playbooks/runbooks, change management, metrics review and capacity planning.
• Lead, together with engineering teams, technical and architectural decisions aimed at resilience, scalability and cost optimization.
• Drive continuous improvements in operational processes, security, availability and cost control in cloud environments.
• Promote best practices in reliability, operations and engineering, raising the overall technical level of the team.



