Job Closed

This listing is no longer active.

Microsoft

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to any characteristic protected by applicable local laws, regulations, and ordinances.

Principal AI Operations Engineer

DevOps Engineer | Other | Remote | Team 10,001+ | H1B Sponsor

Location

United States

Posted

104 days ago

Job Description

We are seeking a Principal AI Operations Engineer to define the technical direction for the AI Operations group. In this role, you will:

  • Design and architect operational systems
  • Establish standards for branch health, CI/CD pipelines, production deployments, and on-call processes
  • Drive reliability initiatives and maintain production health and uptime
  • Ensure the platform meets its SLOs
  • Be the escalation point for complex incidents
  • Work closely with the Platform team to ensure services are operationally ready

Job Requirements

  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • 6+ years technical engineering experience in DevOps, SRE, or platform operations
  • 6+ years driving complex operational initiatives across teams; demonstrated success leading without authority
  • 4+ years hands-on experience with Kubernetes in production environments
  • 3+ years building and maintaining CI/CD pipelines at scale
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter

Preferred Qualifications

  • Experience with Kubernetes: cluster operations, Helm, troubleshooting, autoscaling, and production management
  • Proficiency with CI/CD platforms: Azure DevOps, GitHub Actions, or similar pipeline tooling
  • Experience with cloud platforms (Azure preferred): AKS, networking, identity management, and resource provisioning
  • Infrastructure as Code: Bicep, Terraform, or Helm chart development
  • Observability tooling: Prometheus, Grafana, OpenTelemetry, and log analytics (Kusto/KQL)

Benefits

  • The typical base pay range for this role across the U.S. is USD $139,900 - $274,800 per year
  • A different range applies to specific work locations within the San Francisco Bay Area and the New York City metropolitan area; the base pay range for this role in those locations is USD $188,000 - $304,200 per year
  • Certain roles may be eligible for benefits and other compensation
