Job Closed
This listing is no longer active.
Waymo is a company in the autonomous driving technology space offering self-driving vehicles with the potential to increase mobility and decrease lives lost in
ML Engineer, Foundation Model Infrastructure
Location: United States
Posted: 136 days ago
Salary: $204K - $259K / year
Seniority: Mid Level
Job Description
Waymo
This description is a summary of our understanding of the job description.
Role Description
The mission of the Waymo AI Foundations team is to develop machine learning solutions addressing open problems in autonomous driving, towards the goal of safely operating Waymo vehicles in dozens of cities and under all driving conditions. This role follows a hybrid work schedule and you will report to a Senior Research Scientist.
- Build and operate the petabyte-scale data systems and ML pipelines at the heart of Waymo's foundation model development
- Shepherd cutting-edge foundation models from research prototypes to robust components within the Waymo Driver
- Create the automated infrastructure for rigorously benchmarking, continuously monitoring, and safely releasing models
- Wield large-scale compute and frameworks like Flume and JAX to process massive datasets and train/deploy complex models
- Drive significant leaps in the speed, reliability, and efficiency of the end-to-end ML development lifecycle
- Partner with AI Foundations, ML, and Platform experts to transform model innovations into tangible on-road improvements
Qualifications
- Master's degree in Computer Science, Machine Learning, Robotics, similar technical field of study, or equivalent practical experience
- Proficiency in Python
- Proficiency in C++
- Familiarity with one of the modern deep learning frameworks (e.g., PyTorch, JAX, TensorFlow)
- Experience building or maintaining large-scale data pipelines or ML infrastructure (e.g., Flume, Spark, Borg, Kubeflow)
Requirements
- Strong hands-on SWE skills, able to drive development of large, complex shared codebases
- Experience in AV planning and related research
- Experience designing and building distributed systems or MLOps platforms (e.g., model versioning, experiment tracking, CI/CD for ML)
- Prior work in an industrial or research setting developing methodologies for the evaluation of ML models
Benefits
- Eligible to participate in Waymo’s discretionary annual bonus program
- Equity incentive plan
- Generous Company benefits program, subject to eligibility requirements
Salary Range
$204,000 - $259,000 USD
More Machine Learning Engineer Jobs
Member of Technical Staff, Inference
Runway
Business financials got stuck in the 15th century so we're showing them today’s computers 🖥
This description is a summary of our understanding of the job description.
Role Description
We're looking for an ML infrastructure engineer to bridge the gap between research and production at Runway. You'll work directly with our research teams to productionize cutting-edge generative models: taking checkpoints from training to staging to production, ensuring reliability at scale, and building the infrastructure that enables fast iteration. You'll be embedded within research teams, providing platform support throughout the entire model development lifecycle. Your work will directly impact how quickly we can ship new models and features to millions of users.
A peek at our technical stack
- API endpoints for real-time collaboration and media asset management written in TypeScript, running in ECS containers on AWS Fargate.
- Leverage multiple AWS-native components, such as S3, CloudFront, Lambda, Kinesis, and SQS.
- Inference backend written in Python (PyTorch, TorchScript), deployed across multiple clusters/cloud providers.
- Use Kubernetes for container orchestration, with k8s-native components such as Flyte, Kueue, and Kyverno for efficient job orchestration.
- Invest in Prometheus and Grafana for monitoring, and Terraform to manage infrastructure.
Qualifications
- 4+ years of experience running ML model inference at scale in production environments.
- Strong experience with PyTorch and multi-GPU inference for large models.
- Experience with Kubernetes for ML workloads: deploying, scaling, and debugging GPU-based services.
- Comfortable working across multiple cloud providers and managing GPU driver compatibility.
- Experience with monitoring and observability for ML systems (errors, throughput, GPU utilization).
- Self-starter who can work embedded with research teams and move fast.
- Strong systems thinking and pragmatic approach to production reliability.
- Humility and open-mindedness; at Runway we love to learn from one another.
Requirements
- Experience building custom inference frameworks or serving systems (Nice to Have).
- Deep understanding of distributed training and inference patterns (FSDP, data parallelism, tensor parallelism) (Nice to Have).
- Ability to debug low-level issues: NCCL networking problems, CUDA errors, memory leaks, performance bottlenecks (Nice to Have).
- Experience with diffusion models or video generation systems (Nice to Have).
- Knowledge of real-time or latency-sensitive ML applications (Nice to Have).
Benefits
- Salary range: $240,000 - $290,000.
- Commitment to creating a space where employees can bring their full selves to work and have equal opportunity to succeed.
Company Description
Runway strives to recruit and retain exceptional talent from diverse backgrounds while ensuring pay equity for our team. Our salary ranges are based on competitive market rates for our size, stage, and industry, and salary is just one part of the overall compensation package we provide. There are many factors that go into salary determinations, including relevant experience, skill level and qualifications assessed during the interview process, and maintaining internal equity with peers on the team. The range shared below is a general expectation for the function as posted, but we are also open to considering candidates who may be more or less experienced than outlined in the job description. In this case, we will communicate any updates in the expected salary range. Lastly, the provided range is the expected salary for candidates in the U.S. Outside of those regions, there may be a change in the range, which again, will be communicated to candidates. We're excited to be recognized as a best place to work by Crain's, InHerSight, BuiltIn NYC, and INC.
ML Engineer, Foundation Model Evaluation
Waymo
Waymo is a company in the autonomous driving technology space offering self-driving vehicles with the potential to increase mobility and decrease lives lost in
This description is a summary of our understanding of the job description.
Role Description
The mission of the Waymo AI Foundations team is to develop machine learning solutions addressing open problems in autonomous driving, towards the goal of safely operating Waymo vehicles in dozens of cities and under all driving conditions. This role follows a hybrid work schedule and you will report to a Senior Research Scientist.
- Develop and extend cutting-edge research in robotics and machine learning to advance state-of-the-art methodologies for evaluating the quality, safety, and realism of embodied AI agents
- Partner within and across organizations to land disruptive and innovative tech in production
- Work with a variety of state-of-the-art Foundation Models
- Drive model development through defining evaluation and benchmarks
- Implement and extend large scale data and evaluation pipelines
Qualifications
- Master's degree in Computer Science, Machine Learning, Robotics, similar technical field of study, or equivalent practical experience
- Proficiency in Python
- Familiarity with one of the modern deep learning frameworks (e.g., PyTorch, JAX, TensorFlow)
- Prior work in an industrial or research setting developing methodologies for the evaluation of ML models
Requirements
- Strong hands-on SWE skills, able to design, implement, and extend large distributed pipelines
- Track record of publications in top-tier conferences or leading open source projects in the related fields
- Proficiency in C++
- Experience in AV planning and related research
- Experience in labeling and curating data for ML eval and training
Benefits
- Eligibility to participate in Waymo’s discretionary annual bonus program
- Equity incentive plan
- Generous Company benefits program, subject to eligibility requirements
Salary Range
The expected base salary range for this full-time position across US locations is listed below. Actual starting pay will be based on job-related factors, including exact work location, experience, relevant training and education, and skill level.
Salary Range: $170,000 - $216,000 USD
Principal Machine Learning Engineer
Grace Hill
Helping owners and operators of real estate increase property performance, reduce operating risk and grow top talent.
- Design and implement the statistical models and ML algorithms that drive our market analysis
- Architect how models are trained, versioned, and served in a production environment
- Partner with the product team to design the data foundations for every new feature
- Act as a force multiplier for our full-stack engineers
- Define the HelloData standard for data integrity, pipeline observability, and algorithmic transparency
Senior ML Engineer – Neural Rendering
Torc Robotics
Leading autonomous vehicle technology since 2007, Torc develops automated Level 4, Class 8 trucks with Daimler.
- Implement the latest research advances in Neural Rendering and generative models
- Translate cutting-edge solutions in the domain of autonomous driving for high-quality camera, LiDAR, and radar sensor simulations
- Support implementing a neural rendering framework that allows perception simulation and AV 3.0 training to scale
- Integrate the framework in a cloud environment and automate the pipeline to allow scaling for the target verification and validation of our autonomous trucks
- Own development projects in the team, from research and design to implementation, testing, and deployment
- Design, implement, test, and deploy shippable production-quality software starting from early prototypes using disciplined software development processes
- Work in the cloud machine learning ecosystem alongside other machine learning services existing in the company
- Proactively assess current capabilities to identify areas for improvement, proposing solutions that align with core strategy and operation
- Demonstrate project management skills, serving as project lead guiding less experienced team members in multiple facets of project execution, coaching and mentoring as needed



