Job Closed

This listing is no longer active.

HPC - AI/ML Platform Engineer

Platform Engineer | Full Time | Remote | Senior | Team 10,001+ | Since 1903 | H1B Sponsor

Location

Michigan

Posted

86 days ago

Salary

$113.6K - $190.5K / year

Seniority

Senior

Job Description

Ford Motor Company

  • Design, implement, and support GPU/Kubernetes clusters and supporting infrastructure
  • Support customers running AI/ML training, simulation, and HPC workloads
  • Develop automation and tooling for cluster provisioning, configuration management, and platform operations
  • Collaborate with application and research teams to optimize workloads running on GPU infrastructure
  • Implement monitoring, observability, and performance tuning across GPU and compute platforms
  • Troubleshoot infrastructure issues across compute, networking, and container platforms (occasional on-call support)
  • Contribute to platform reliability, scalability, and operational best practices
  • Produce clear technical documentation and operational runbooks

Job Requirements

  • 5+ years of Linux systems engineering or infrastructure experience
  • 2+ years working with container platforms such as Kubernetes or OpenShift
  • Familiarity with Kubernetes GPU scheduling and related tooling
  • Familiarity with CI/CD pipelines and platform engineering practices
  • Experience operating compute infrastructure for high-performance workloads or large distributed systems
  • Strong scripting or programming skills (Python, Bash, or similar)
  • Experience building infrastructure automation and operational tooling
  • Strong troubleshooting and problem-solving skills across complex infrastructure systems
  • Ability to communicate clearly with both platform engineers and application teams
  • Demonstrated ability to manage multiple technical initiatives simultaneously

Nice to Have

  • Bachelor’s degree in Computer Science, Engineering, or related field, or equivalent experience
  • Experience with observability platforms such as Prometheus, Grafana, or similar
  • Experience with infrastructure automation tools (Ansible, Terraform, etc.)
  • Experience with high-speed networking technologies such as InfiniBand or RDMA

Benefits

  • Immediate medical, dental, and prescription drug coverage
  • Flexible family care, parental leave, new parent ramp-up programs, subsidized back-up child care and more
  • Vehicle discount program for employees and family members, and management leases
  • Tuition assistance
  • Established and active employee resource groups
  • Paid time off for individual and team community service
  • A generous schedule of paid holidays, including the week between Christmas and New Year’s Day
  • Paid time off and the option to purchase additional vacation time
