Job Closed
This listing is no longer active.
At Ford Motor Company, we believe freedom of movement drives human progress. We also believe in providing you with the freedom to define and realize your dreams. With our incredible plans for the future of mobility, we have a wide variety of opportunities for you to accelerate your career potential as you help us define tomorrow’s transportation.
HPC - AI/ML Platform Engineer
Location
United States
Posted
79 days ago
Salary
$113K - $190K / year
Seniority
Mid Level
Job Description
Ford Motor Company
The selected candidate will join the team responsible for engineering and operating large-scale GPU and compute platforms that power AI/ML and high-performance computing workloads across multiple datacenters. The team manages Kubernetes-based GPU environments, cluster infrastructure, and the supporting systems that enable internal engineering teams to train models, run simulations, and develop advanced software at scale. This role focuses on building reliable, scalable GPU platforms and helping internal users successfully run AI/ML and high-performance workloads on Kubernetes and related compute infrastructure.
- Design, implement, and support GPU/Kubernetes clusters and supporting infrastructure
- Support AI/ML training, simulation, and HPC workload customers
- Develop automation and tooling for cluster provisioning, configuration management, and platform operations
- Collaborate with application and research teams to optimize workloads running on GPU infrastructure
- Implement monitoring, observability, and performance tuning across GPU and compute platforms
- Troubleshoot infrastructure issues across compute, networking, and container platforms (occasional on-call support)
- Contribute to platform reliability, scalability, and operational best practices
- Produce clear technical documentation and operational runbooks
Must Have:
- 5+ years of Linux systems engineering or infrastructure experience
- 2+ years working with container platforms such as Kubernetes or OpenShift
- Familiarity with Kubernetes GPU scheduling and related tooling
- Familiarity with CI/CD pipelines and platform engineering practices
- Experience operating compute infrastructure for high-performance workloads or large distributed systems
- Strong scripting or programming skills (Python, Bash, or similar)
- Experience building infrastructure automation and operational tooling
- Strong troubleshooting and problem-solving skills across complex infrastructure systems
- Ability to communicate clearly with both platform engineers and application teams
- Demonstrated ability to manage multiple technical initiatives simultaneously
Nice to Have:
- Bachelor’s degree in Computer Science, Engineering, or related field, or equivalent experience
- Experience with observability platforms such as Prometheus, Grafana, or similar
- Experience with infrastructure automation tools (Ansible, Terraform, etc.)
- Experience with high-speed networking technologies such as InfiniBand or RDMA
You may not check every box, or your experience may look a little different from what we've outlined, but if you think you can bring value to Ford Motor Company, we encourage you to apply!
As an established global company, we offer the benefit of choice. You can choose what your Ford future will look like: will your story span the globe, or keep you close to home? Will your career be a deep dive into what you love, or a series of new teams and new skills? Will you be a leader, a changemaker, a technical expert, a culture builder…or all of the above? No matter what you choose, we offer a work life that works for you, including:
- Immediate medical, dental, and prescription drug coverage
- Flexible family care, parental leave, new parent ramp-up programs, subsidized back-up child care and more
- Vehicle discount program for employees and family members, and management leases
- Tuition assistance
- Established and active employee resource groups
- Paid time off for individual and team community service
- A generous schedule of paid holidays, including the week between Christmas and New Year’s Day
- Paid time off and the option to purchase additional vacation time
For a detailed look at our benefits, click here: Benefit Summary
This position is a salary grade 8 and ranges from $113,580 to $190,500.
*Visa Sponsorship is not provided for this role*
Candidates for positions with Ford Motor Company must be legally authorized to work in the United States. Verification of employment eligibility will be required at the time of hire.
We are an Equal Opportunity Employer committed to a culturally diverse workforce. All qualified applicants will receive consideration for employment without regard to race, religion, color, age, sex, national origin, sexual orientation, gender identity, disability status or protected veteran status.
In the United States, if you need a reasonable accommodation for the online application process due to a disability, please call 1-888-336-0660.
#LI-Remote #LI-GH2
Job Requirements
- 5+ years of Linux systems engineering or infrastructure experience
- 2+ years working with container platforms such as Kubernetes or OpenShift
- Familiarity with Kubernetes GPU scheduling and related tooling
- Familiarity with CI/CD pipelines and platform engineering practices
- Experience operating compute infrastructure for high-performance workloads or large distributed systems
- Strong scripting or programming skills (Python, Bash, or similar)
- Experience building infrastructure automation and operational tooling
- Strong troubleshooting and problem-solving skills across complex infrastructure systems
- Ability to communicate clearly with both platform engineers and application teams
- Demonstrated ability to manage multiple technical initiatives simultaneously
- Bachelor’s degree in Computer Science, Engineering, or related field, or equivalent experience (Nice to Have)
- Experience with observability platforms such as Prometheus, Grafana, or similar (Nice to Have)
- Experience with infrastructure automation tools (Ansible, Terraform, etc.) (Nice to Have)
- Experience with high-speed networking technologies such as InfiniBand or RDMA (Nice to Have)
Benefits
- Immediate medical, dental, and prescription drug coverage
- Flexible family care, parental leave, new parent ramp-up programs, subsidized back-up child care and more
- Vehicle discount program for employees and family members, and management leases
- Tuition assistance
- Established and active employee resource groups
- Paid time off for individual and team community service
- A generous schedule of paid holidays, including the week between Christmas and New Year’s Day
- Paid time off and the option to purchase additional vacation time
More Platform Engineer Jobs
- Provide customers with configuration advice, training, and problem resolution throughout the setup and installation process of Five9’s call center software.
- Design and configure Five9’s platform for each customer’s unique requirements.
- Troubleshoot software solutions in a wide array of configurations and customer environments both remotely and on-site.
- Provide customized training to ensure customers have a thorough understanding of these solutions.
- Articulate the value of Five9’s Professional Services through demonstrations and open discussion with Customers and prospects.
- Effectively communicate with internal and external stakeholders.
Platform Engineering Manager
Onebrief
Software for rapid military planning: make planning fast enough for today's environment
- Lead, mentor, and grow the team of Platform Engineers by providing direction, removing blockers, and empowering them to own core platform infrastructure and tooling.
- Partner with Cybersecurity, Product, and Engineering teams to ensure the platform supports secure, reliable, high-velocity delivery into cloud-native and air-gapped environments.
- Drive standards and frameworks for Infrastructure as Code, service onboarding, environment management, GitOps workflows, and overall platform releases.
- Ensure observability, reliability, scalability, and cost-effectiveness of the platform; continuously monitor and improve service health, developer feedback, and performance metrics.
- Own the operational aspects of the platform team: incidents, responsiveness, After-Action Reviews (AARs) / post-mortems, runbooks, and continuous improvement of operational maturity.
- Act as a key stakeholder in architecture, influencing decisions about how the platform integrates with product, engineering, and operations, and ensuring the platform meets mission needs.
- The Power Platform Developer will be responsible for designing, developing, and implementing high-impact business solutions using the full Microsoft Power Platform suite.
- This role serves as a technical reference point within the team, leading initiatives in automation, process digitalization, and value generation from data, and aligning technology solutions with the strategic objectives of the business.
- The professional is expected to adapt to changing environments, work both autonomously and collaboratively, and contribute actively to the organization's digital maturity.
- Design and develop business applications with Power Apps (Canvas and Model-Driven).
- Create and optimize complex workflows in Power Automate, including integrations with external systems via connectors and REST APIs.
- Implement chatbots and conversational agents with Copilot Studio (Power Virtual Agents).
- Manage and configure external portals with Power Pages.
- Integrate Power Platform solutions with SharePoint Online, Teams, Dataverse, Azure AD, and Azure services.
- Administer environments, managed/unmanaged solutions, and platform governance in the Power Platform Admin Center.
- Apply ALM (Application Lifecycle Management) best practices with CI/CD pipelines and version control.
- Participate actively in requirements gathering with business stakeholders.
- Propose continuous improvements, development standards, and technical documentation.
- Mentor and support junior members of the team.
- Assess and manage technical risks in the solutions under their responsibility.
Senior Platform Engineer, Data Persistence
New York, NY
Are you a platform engineer who thrives on designing scalable, resilient data systems that support real-world, enterprise workloads? Do you enjoy owning complex persistence challenges—balancing performance, reliability, and cost—while enabling product teams to move faster with confidence? Are you excited to shape the foundation of a cloud-native SaaS platform by building durable, high-availability data architectures that stand up to growth and change? If so, we invite you to be a part of our innovative team.
As a Senior Platform Engineer on the Data Persistence team, you will play a critical role in designing, building, and optimizing the data systems that power Ridgeline’s enterprise SaaS platform. You’ll lead the evolution of our cloud-native, distributed data architecture—solving challenges around scale, performance, availability, and cost—while delivering a best-in-class developer experience.
In this role, you will collaborate closely with application and infrastructure teams to develop efficient, resilient, and high-performance data solutions that directly support customer-facing workloads. You’ll leverage cutting-edge technologies—including AI tools like GitHub Copilot and ChatGPT—to enhance productivity, accelerate problem-solving, and continuously improve how we design and operate our data platforms.
At Ridgeline, how we work matters as much as what we build. Ridgeliners act like owners, choose growth over comfort, and communicate with transparency. We assume positive intent, bias toward action, and bring solutions—not just problems. We celebrate wins, learn from setbacks, and thrive in a resilient, collaborative, high-performing culture. If this excites you, we’d love to meet you.
You must be work authorized in the United States without the need for employer sponsorship.
The impact you will have:
- Design and develop cloud-native data storage solutions that deliver scalable, reliable, and performant persistence across a multi-tenant SaaS platform
- Optimize data access patterns and collaborate with application teams to improve end-to-end performance at the application layer
- Drive RPO and RTO improvements by establishing high-availability architectures, failover strategies, and disaster recovery plans
- Support high-throughput, low-latency workloads through effective data partitioning, caching, and indexing strategies
- Improve observability by investing in database monitoring, automation, and performance telemetry
- Balance performance and efficiency by identifying and implementing cost optimizations across database infrastructure
- Collaborate transparently with cross-functional teams to solve complex data challenges and share best practices
- Mentor engineers, foster technical growth, and contribute to a resilient, inclusive culture of engineering excellence
- Take ownership of critical systems, acting with accountability and a long-term mindset aligned with Ridgeline’s values
What we look for:
- 5+ years of experience in software or infrastructure engineering, with deep expertise in data persistence, distributed systems, or database engineering
- Proven experience building and operating distributed, multi-writer OLTP SQL systems (e.g., SingleStore, CockroachDB) and single-writer systems (e.g., Postgres), with a strong understanding of replication, sharding, and consistency tradeoffs
- Hands-on experience with OLAP or analytical data stores such as ClickHouse and/or Apache Iceberg on S3
- Strong background in high availability and disaster recovery strategies, including cross-region replication, backups and point-in-time recovery, and clearly defined RPO/RTO targets
- Solid experience with AWS services such as Aurora RDS, S3, ECS, Lambda, and related infrastructure
- Proficiency in at least one programming language such as Kotlin, Java, or TypeScript
- Experience using Datadog or similar tooling for database and storage observability
- Proficiency with AI-assisted development tools such as Cursor, Claude, or GitHub Copilot
- Strong problem-solving skills, clear communication, and a genuine interest in learning and continuous improvement
Bonus:
- Experience designing and operating high-throughput JVM services and data-access libraries, with deep knowledge of threading, connection pooling, and saturation behavior (e.g., HikariCP)
- Background in event-driven architectures using Kafka or Pub/Sub, including familiarity with Debezium, Kafka Connect, schema registries, or CDC workflows
- Experience in fintech, investment management, or other highly regulated data environments
The typical starting salary range for new hires in this role is targeted at $146,000 - $172,000. Final compensation amounts are determined by multiple factors, including candidate experience and expertise, and may vary from the amount listed above.
As an employee at Ridgeline, you’ll have many opportunities for advancement in your career and can make a true impact on the product. In addition to the base salary, 100% of Ridgeline employees can participate in our Company Stock Plan subject to the applicable Stock Option Agreement. We also offer rich benefits that reflect the kind of organization we want to be: one in which our employees feel valued and are inspired to bring their best selves to work. These include unlimited vacation, educational and wellness reimbursements, and $0 cost employee insurance plans.
#LI-Hybrid



