FluidStack

NVIDIA H100 & A100 GPUs available on demand at scale. Access thousands of GPUs for AI/LLM/ML, ready for deployment now.

Principal Operations Engineer – Reliability, Data Center Operations

DevOps Engineer | Full Time | Remote | Lead | Team 11-50 | H1B No Sponsor

Location

United States

Posted

6 days ago

Salary

$150K - $250K / year

Seniority

Lead

Bachelor Degree | English

Job Description

• Take the on-call escalation when a site hits trouble and triage it virtually, using real knowledge of the team and the systems to decide what to escalate, when, and how to keep the field crew focused without burying them.
• Get on a plane when it matters: travel site to site (50%+) to work live incidents and post-incident reviews on the floor, and bring the practices that worked elsewhere with you.
• Own root cause analysis on significant events through to closure and track corrective actions to done, killing the underlying class of failure rather than the one instance in front of you.
• Read the patterns across the fleet’s incidents and RCAs, push the few highest-value learnings through to closure, and stay honest about what’s achievable and what to drop instead of boiling the ocean.
• Carry learnings and practices from one campus to the next so a fix at one site becomes the standard everywhere before the failure repeats.
• Write the operational Assessment standard and audit each campus against it, feeding what you find straight back into the corrective-action loop.

Job Requirements

You’ve run a live critical operation and led a team of operators, and you carry the deep, earned judgment that comes from owning the floor when it counts.
You’ve been the person a site calls when something breaks, triaged the problem over the phone, and known exactly when to escalate and when to let the field team work it.
You’ve authored root cause analyses on significant events and tracked corrective actions to closure, and you can show the difference between an RCA that closed a ticket and one that killed a class of failure.
You’ve sat with a pile of RCA actions and cut it to the few that matter, because you know an operation that commits to everything finishes nothing.
You’ve traveled site to site, walked the floor, and left each operation better than you found it, carrying the practices that worked from one into the next.
You’ve written the standard, not just followed it, audited real sites against it without flinching from what you found, and can hold one bar across domains you don’t all live in.
Bonus: Hyperscale or large colocation at hundreds of MW+. Direct exposure to Hardware or Network operations, not only Facilities. Experience with incident.io or equivalent incident tooling, plus DCIM. Building an assessment, audit, qualification, or training program from scratch.

Benefits

Competitive total compensation package (salary + equity).
Retirement or pension plan, in line with local norms.
Health, dental, and vision insurance.
Generous PTO policy, in line with local norms.


More DevOps Engineer Jobs

Senior Consultant – DevOps

3Cloud

Delivering the ultimate Microsoft Azure experience.

DevOps Engineer | 6 days ago

Full Time | Remote | Team 501-1,000 | H1B No Sponsor

• Participate in technical envisioning, technical design, and delivery of assigned projects.
• Work with 3Cloud Architects to support project efforts from a technical perspective.
• Execute the implementation of designed solutions into client deliverables.
• Assist with design and deployment of client workloads into Azure.
• Provide technical expertise and support across the following four areas of specialization:
  • Datacenter Transformation
  • Azure Infrastructure
  • DevOps and CI/CD pipelines
  • Cloud automation

Ansible AWS Azure Chef Cloud Docker Google Cloud Platform Groovy Jenkins Kubernetes Linux Puppet Python Terraform TFS

View details: Senior Consultant – DevOps

Philippines


Apigee Developer, DevOps Engineer

Elfonze Technologies

In a world of quantity, we offer quality...

DevOps Engineer | 6 days ago

Full Time | Remote | Team 201-500 | Since 2020 | H1B No Sponsor

• Design, deploy, and manage Kubernetes clusters
• Implement Infrastructure as Code using Terraform
• Manage Google Cloud IAM and Workload Identity Federation
• Provide incident response and production support
• Develop and manage Apigee proxies and policies

Cloud Kubernetes Terraform

View details: Apigee Developer, DevOps Engineer

India


Lead DevOps Engineer

Profitroom

Empowering hotels directly! Maximize Bookings and Convert Site Visitors into Guests

DevOps Engineer | 6 days ago

Full Time | Remote | Team 201-500 | Since 2008 | H1B No Sponsor

• Collaborate with software development teams, support their releases, and develop and maintain CI/CD pipelines.
• Design and implement automated infrastructure.
• Maintain existing cloud infrastructure/VMs.
• Monitor the environment, and analyse and solve problems as required.
• Utilise your understanding of the Software Development Life Cycle to proactively optimise and automate infrastructure and processes.

Ansible AWS Cloud Docker Google Cloud Platform Grafana JavaScript Kubernetes Linux MariaDB NGINX Node.js Perl PHP Prometheus Python SDLC Terraform Go

View details: Lead DevOps Engineer

Poland

zł15.6K - zł22K / month


Principal AWS DevOps Engineer – AI/ML Platform

ellowtech

Hire faster than ever with pre-vetted remote developers you can trust

DevOps Engineer | 6 days ago

Contract | Remote | Team 201-500 | Since 2020 | H1B No Sponsor

• Design and manage AWS infrastructure supporting AI/ML workloads
• Develop and maintain infrastructure using Terraform
• Automate provisioning and configuration management
• Design and maintain CI/CD pipelines
• Deploy and manage AI/ML services on AWS
• Monitor and ensure platform reliability

AWS Cloud Terraform

View details: Principal AWS DevOps Engineer – AI/ML Platform

India

