Intermediate Site Reliability Engineer, Cloud Cost Utilization at GitLab

GitLab, founded in 2011 and based in San Francisco, California, maintains a distributed team of professionals who work remotely across multiple continents.

DevOps Engineer, Full Time, Remote, Senior

Location

United Kingdom

Posted

91 days ago

Salary

Seniority

Senior

English

Ansible AWS Cloud Google Cloud Platform Grafana Prometheus Terraform

Job Description

• Design and maintain cloud resource tagging and labeling strategies across GCP and AWS to support accurate cost attribution
• Develop tooling and pipelines to ingest, normalize, and report on cloud billing data using the FOCUS specification
• Automate cost anomaly detection, forecasting, and alerting so engineering teams can respond quickly to changes in infrastructure spend
• Contribute to GitLab's observability and monitoring stacks, including Prometheus, LGTM (Loki, Grafana, Tempo, and Mimir), and ELK, with a focus on surfacing cost efficiency signals
• Partner with Finance and Engineering leadership to support cloud cost forecasting for planning and budget discussions
• Act as a subject matter expert for cloud cost attribution, tagging strategy, and FOCUS adoption across GitLab Infrastructure
• Collaborate with Finance and Compliance teams on audits, certifications, and financial reporting needs related to cloud infrastructure usage
• Contribute to infrastructure-as-code efforts, including Terraform and Ansible, so cost controls and tagging requirements are built into provisioning workflows from the start

Job Requirements

Hands-on experience with cloud cost management in GCP and/or AWS, including billing data, pricing models, and optimization approaches
Familiarity with, or interest in adopting, the FinOps FOCUS specification for multi-cloud cost analysis
Experience designing or implementing cloud resource tagging and labeling strategies and improving adoption across teams
Comfort working across technical and business functions, including Engineering, Finance, and other stakeholders
Experience with infrastructure as code, including Terraform and Ansible
Familiarity with observability tooling, including Grafana, and an understanding of how reliability and cost signals can be connected
Ability to explain technical cost data clearly to non-engineering audiences and support informed decision-making
A self-directed approach to work, with comfort operating in a fully remote and asynchronous environment.

Benefits

Benefits to support your health, finances, and well-being
Flexible Paid Time Off
Team Member Resource Groups
Equity Compensation & Employee Stock Purchase Plan
Growth and Development Fund
Parental leave
Home office support

More DevOps Engineer Jobs

DevOps Engineering Intern

Sarwa ثروة

Make money work. Sarwa is regulated by the FSRA in the ADGM.

DevOps Engineer, 91 days ago

Internship, Remote, Team 11-50, Since 2017, H1B No Sponsor

• Audit existing Terraform state and understand Sarwa’s Infrastructure as Code philosophy
• Provision a temporary Sandbox account within Sarwa’s AWS Organization, deploying core networking (VPC, Subnets) via Terraform to validate infrastructure portability across accounts and regions
• Participate in the DR Automation project - helping automate regional failover and promotion workflows
• Write and maintain automation scripts using boto3 for tasks that IaC cannot handle dynamically
• Contribute to documentation: build a comprehensive, code-backed DR Playbook that enables any engineer to trigger a regional failover with high confidence
• Collaborate with the backend and platform teams to ensure infrastructure changes align with application requirements
• Learn and apply cross-region peering, multi-account routing, and EKS deployment management

AWS Cloud Kubernetes Linux Python Terraform

United Arab Emirates

Release Engineer

Air Apps

DevOps Engineer, 91 days ago

Full Time, Remote, Team 51-200, H1B No Sponsor

• Manage and optimize release pipelines to ensure smooth deployment of software updates.
• Define and maintain versioning strategies, ensuring consistency across multiple environments.
• Coordinate with engineering, QA, and DevOps teams to ensure timely and stable releases.
• Automate and improve build, release, and deployment processes for efficiency and reliability.
• Monitor and troubleshoot release-related issues, ensuring minimal downtime.
• Maintain documentation for release workflows, rollback plans, and deployment strategies.
• Ensure compliance with security, performance, and quality standards in all releases.
• Work with CI/CD tools (e.g., Jenkins, GitHub Actions, GitLab CI, CircleCI) to manage automated releases.
• Implement and maintain feature flagging strategies to enable controlled rollouts.
• Analyze release performance and drive continuous improvements in deployment processes.

AWS Azure Cloud Docker Google Cloud Platform Gradle Jenkins Kubernetes Python Webpack

Portugal

€54K - €67K / year

Job Closed

Release Engineer

Air Apps

DevOps Engineer, 91 days ago

Full Time, Remote, Team 51-200, H1B No Sponsor

AWS Azure Cloud Docker Google Cloud Platform Gradle Jenkins Kubernetes Python Webpack

Spain

€54K - €67K / year

Job Closed

Site Reliability Engineer – SRE

Air Apps

DevOps Engineer, 91 days ago

Full Time, Remote, Team 51-200, H1B No Sponsor

• Design and implement scalable, reliable, and fault-tolerant systems across cloud environments.
• Develop and maintain observability tools, including monitoring, logging, and alerting (e.g., Prometheus, Grafana, Datadog, ELK).
• Automate infrastructure provisioning, deployment, and incident response using Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
• Optimize system performance, scalability, and incident response workflows to improve uptime.
• Work closely with development and DevOps teams to improve system design for reliability.
• Conduct root cause analysis (RCA) and implement preventative measures to minimize failures.
• Ensure high availability by designing and maintaining load balancing, failover, and disaster recovery strategies.
• Improve CI/CD pipelines to enhance deployment speed while maintaining stability.
• Optimize cloud cost and resource utilization for AWS, Azure, or Google Cloud Platform (GCP).
• Participate in on-call rotations to quickly address system failures and minimize downtime.

AWS Azure Cloud Distributed Systems Docker Google Cloud Platform Grafana Kubernetes Linux Prometheus Python Terraform Go

France

€55K - €68K / year

Job Closed
