GitLab

GitLab, founded in 2011 and based in San Francisco, California, maintains a distributed team of professionals that work remotely across multiple continents. GitLab advocates for pr

Intermediate Site Reliability Engineer, Cloud Cost Utilization

DevOps Engineer | Full Time | Remote | Senior | Team 2,500 | Since 2014

Location

United Kingdom

Posted

37 days ago

Salary

Not specified

Seniority

Senior

Job Description

  • Design and maintain cloud resource tagging and labeling strategies across GCP and AWS to support accurate cost attribution
  • Develop tooling and pipelines to ingest, normalize, and report on cloud billing data using the FOCUS specification
  • Automate cost anomaly detection, forecasting, and alerting so engineering teams can respond quickly to changes in infrastructure spend
  • Contribute to GitLab's observability and monitoring stacks, including Prometheus, LGTM (Loki, Grafana, Tempo, and Mimir), and ELK, with a focus on surfacing cost-efficiency signals
  • Partner with Finance and Engineering leadership to support cloud cost forecasting for planning and budget discussions
  • Act as a subject matter expert for cloud cost attribution, tagging strategy, and FOCUS adoption across GitLab Infrastructure
  • Collaborate with Finance and Compliance teams on audits, certifications, and financial reporting needs related to cloud infrastructure usage
  • Contribute to infrastructure-as-code efforts, including Terraform and Ansible, so that cost controls and tagging requirements are built into provisioning workflows from the start

Job Requirements

  • Hands-on experience with cloud cost management in GCP and/or AWS, including billing data, pricing models, and optimization approaches
  • Familiarity with, or interest in adopting, the FinOps FOCUS specification for multi-cloud cost analysis
  • Experience designing or implementing cloud resource tagging and labeling strategies and improving adoption across teams
  • Comfort working across technical and business functions, including Engineering, Finance, and other stakeholders
  • Experience with infrastructure as code, including Terraform and Ansible
  • Familiarity with observability tooling, including Grafana, and an understanding of how reliability and cost signals can be connected
  • Ability to explain technical cost data clearly to non-engineering audiences and support informed decision-making
  • A self-directed approach to work, with comfort operating in a fully remote and asynchronous environment

Benefits

  • Benefits to support your health, finances, and well-being
  • Flexible Paid Time Off
  • Team Member Resource Groups
  • Equity Compensation & Employee Stock Purchase Plan
  • Growth and Development Fund
  • Parental leave
  • Home office support

More DevOps Engineer Jobs

DevOps Engineering Intern

Sarwa ثروة

Make money work. Sarwa is regulated by the FSRA in the ADGM.

DevOps Engineer | 37 days ago
Internship | Remote | Team 11-50 | Since 2017 | No H-1B sponsorship

  • Audit existing Terraform state and understand Sarwa’s Infrastructure as Code philosophy
  • Provision a temporary Sandbox account within Sarwa’s AWS Organization, deploying core networking (VPC, subnets) via Terraform to validate infrastructure portability across accounts and regions
  • Participate in the DR Automation project, helping automate regional failover and promotion workflows
  • Write and maintain automation scripts using boto3 for tasks that IaC cannot handle dynamically
  • Contribute to documentation: build a comprehensive, code-backed DR Playbook that enables any engineer to trigger a regional failover with high confidence
  • Collaborate with the backend and platform teams to ensure infrastructure changes align with application requirements
  • Learn and apply cross-region peering, multi-account routing, and EKS deployment management

United Arab Emirates
Full Time | Remote | Team 51-200 | No H-1B sponsorship

  • Manage and optimize release pipelines to ensure smooth deployment of software updates
  • Define and maintain versioning strategies, ensuring consistency across multiple environments
  • Coordinate with engineering, QA, and DevOps teams to ensure timely and stable releases
  • Automate and improve build, release, and deployment processes for efficiency and reliability
  • Monitor and troubleshoot release-related issues, ensuring minimal downtime
  • Maintain documentation for release workflows, rollback plans, and deployment strategies
  • Ensure compliance with security, performance, and quality standards in all releases
  • Work with CI/CD tools (e.g., Jenkins, GitHub Actions, GitLab CI, CircleCI) to manage automated releases
  • Implement and maintain feature-flagging strategies to enable controlled rollouts
  • Analyze release performance and drive continuous improvements in deployment processes

Portugal
€54K - €67K / year
Job Closed
Full Time | Remote | Team 51-200 | No H-1B sponsorship

  • Manage and optimize release pipelines to ensure smooth deployment of software updates
  • Define and maintain versioning strategies, ensuring consistency across multiple environments
  • Coordinate with engineering, QA, and DevOps teams to ensure timely and stable releases
  • Automate and improve build, release, and deployment processes for efficiency and reliability
  • Monitor and troubleshoot release-related issues, ensuring minimal downtime
  • Maintain documentation for release workflows, rollback plans, and deployment strategies
  • Ensure compliance with security, performance, and quality standards in all releases
  • Work with CI/CD tools (e.g., Jenkins, GitHub Actions, GitLab CI, CircleCI) to manage automated releases
  • Implement and maintain feature-flagging strategies to enable controlled rollouts
  • Analyze release performance and drive continuous improvements in deployment processes

Spain
€54K - €67K / year
Job Closed
Full Time | Remote | Team 51-200 | No H-1B sponsorship

  • Design and implement scalable, reliable, and fault-tolerant systems across cloud environments
  • Develop and maintain observability tools, including monitoring, logging, and alerting (e.g., Prometheus, Grafana, Datadog, ELK)
  • Automate infrastructure provisioning, deployment, and incident response using Infrastructure as Code (IaC) tools such as Terraform or CloudFormation
  • Optimize system performance, scalability, and incident response workflows to improve uptime
  • Work closely with development and DevOps teams to improve system design for reliability
  • Conduct root cause analysis (RCA) and implement preventive measures to minimize failures
  • Ensure high availability by designing and maintaining load balancing, failover, and disaster recovery strategies
  • Improve CI/CD pipelines to increase deployment speed while maintaining stability
  • Optimize cloud cost and resource utilization on AWS, Azure, or Google Cloud Platform (GCP)
  • Participate in on-call rotations to quickly address system failures and minimize downtime

France
€55K - €68K / year
Job Closed