Reddit, Inc.

Dive into anything

Senior Machine Learning Systems Engineer

Machine Learning Engineer | Other | Remote | Senior | Team 501-1,000 | Since 2005 | H1B No Sponsor

Location

United States

Posted

87 days ago

Salary

$216.7K - $303.4K / year

Seniority

Senior

Job Description

Role Description

The Machine Learning Platform team at Reddit is a high-impact team that owns the infrastructure powering recommendations, content discovery, and user and content quantification. Its work directly affects other teams such as Growth, Ads, Feeds, and Core Machine Learning. As a Senior ML Infrastructure Engineer, you will lead development of a platform for large-scale ML models at Reddit.

  • Design end-to-end model lifecycle patterns (MLOps) that increase development velocity for ML engineers, covering data preparation, model management, experiment tracking, and more.
  • Lead zero-to-one development and support of a graph ML codebase and platform that abstracts away common patterns and enables greater model scalability and faster iteration.
  • Collaborate with ML engineers on performance tuning, including reducing model training time, improving efficiency, and lowering GPU training costs in a large, distributed ML training environment.
  • Optimize batch data processing within a data warehouse and with tools such as Apache Beam, Apache Spark, Ray Data, and more.
  • Architect pipelines to build and maintain massive graph data structures on the order of billions of nodes and tens of billions of edges.

Qualifications

  • 5+ years of experience in ML infrastructure, including model training and model deployment.
  • Hands-on experience with ML optimization, including memory and GPU profiling.
  • Deep experience with cloud-based technologies for supporting an ML platform, including tools such as GCP BigQuery, Google Cloud Storage, infrastructure-as-code (Terraform), and more.
  • Hands-on experience administering and integrating MLOps tools for experiment tracking, model serving, and model registries (e.g., MLflow or Weights & Biases).
  • Proficiency with common ML programming languages and frameworks, such as Python, PyTorch, and TensorFlow.
  • Deep experience with distributed training frameworks and infrastructure, such as Ray and Kubernetes.
  • Strong focus on scalability, reliability, performance, and ease of use. You are an undying advocate for platform users and have a deep intuition for the machine learning development lifecycle.
  • Strong organizational and communication skills.
  • Experience working with graph databases (Neo4j, JanusGraph, TigerGraph) is a big plus.
  • Experience working with graph neural networks (GNNs) and associated graph ML frameworks (PyTorch Geometric, Deep Graph Library) is a big plus.

Requirements

  • Pay Transparency: This job posting may span more than one career level.
  • In addition to base salary, this job is eligible for equity in the form of restricted stock units and, depending on the position offered, may also be eligible for a commission.
  • Reddit offers a wide range of benefits to U.S.-based employees, including medical, dental, and vision insurance, a 401(k) program with employer match, generous vacation time off, and parental leave.
  • The base salary range for this position is $216,700 - $303,400 USD.

Benefits

  • Medical, dental, and vision insurance.
  • 401(k) program with employer match.
  • Generous time off for vacation.
  • Parental leave.

Company Description

Reddit is proud to be an equal opportunity employer and is committed to building a workforce representative of the diverse communities we serve. Reddit is committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans in our job application procedures.

Related Job Pages

More Machine Learning Engineer Jobs

Machine Learning Engineer, II

Torc Robotics

Leading autonomous vehicle technology since 2007, Torc develops automated Level 4, Class 8 trucks with Daimler.

Other | Remote | Team 501-1,000 | Since 2007 | H1B Sponsor

  • Develop and train deep learning models for camera-based perception, enabling the autonomy stack to detect objects, understand scenes, and estimate geometric information from visual inputs.
  • Implement production-quality machine learning code to support model training, evaluation, and inference for camera-based perception systems.
  • Analyze model performance across multiple driving scenarios, identify failure modes, and improve robustness and generalization.
  • Contribute to the development and optimization of large-scale training pipelines, including dataset preparation, distributed training, and experiment management.
  • Work closely with data teams to curate and improve training datasets drawn from fleet logs, simulation, and annotation pipelines.
  • Collaborate with cross-functional perception, simulation, and validation teams to evaluate model performance and help integrate models into the autonomy stack.
  • Improve experimentation workflows and tooling to accelerate model iteration, reproducibility, and evaluation.
  • Contribute to discussions on model architecture, training strategies, and perception system design.

Michigan
Job Closed

Machine Learning Engineer II – Roads and Lanes

Torc Robotics

  • Develop and train computer vision and deep learning models for lane detection using monocular and multimodal sensor data (cameras, LiDARs, and radars).
  • Design 3D road surface models and lane geometry in bird's-eye view (BEV) space, and integrate them into Torc's autonomy pipeline.
  • Analyze model performance, identify corner cases, and improve robustness across various environmental conditions and long-tail scenarios.
  • Develop and optimize large-scale data processing workflows, including annotation, pseudo-labeling, and data augmentation.
  • Implement adaptive training and evaluation pipelines for lane perception models.
  • Own deployment work to optimize models for real-time execution on automotive-grade hardware.
  • Leverage known SD and HD maps to improve the accuracy and stability of lane estimation.
  • Contribute to architecture discussions, model reviews, and system-level integration efforts.

Michigan
Job Closed

ML Engineer II – Learned Behaviors

Torc Robotics

  • Develop and train machine learning models for learned behavior systems, including approaches such as behavior cloning, imitation learning, and reinforcement learning.
  • Implement production-quality ML code to support model training, evaluation, and inference within the autonomy stack.
  • Analyze model performance, identify failure modes, and propose improvements to increase robustness and generalization across scenarios.
  • Contribute to model training pipelines and data workflows, curating behavior datasets from simulation, fleet logs, and on-vehicle data.
  • Collaborate with simulation, validation, and autonomy engineering teams to test and evaluate learned behavior models across diverse driving environments.
  • Help integrate learned behavior models into simulation and testing workflows, enabling faster iteration and more comprehensive validation.
  • Support the development of tooling and infrastructure that improves experimentation speed, reproducibility, and model iteration.
  • Contribute to technical discussions around model architecture and training strategies within the team.

Michigan

ML Engineer, II – Road & Lane

Torc Robotics

  • Develop and train computer vision and deep learning models for road-lane detection using monocular and multimodal sensor data (camera, LiDAR, radar).
  • Build 3D road surface and lane geometry models in BEV space and integrate them into Torc’s autonomy pipeline.
  • Analyze model performance, identify corner cases, and improve robustness under diverse environmental and long-tail conditions.
  • Develop and optimize large-scale data processing workflows, including annotation, pseudo-labeling, and data augmentation.
  • Implement scalable training and evaluation pipelines for lane perception models.
  • Own deployment-focused work to optimize models for real-time execution on automotive-grade hardware.
  • Leverage SD and HD map priors to improve lane estimation accuracy and stability.
  • Contribute to architectural discussions, model reviews, and system-level integration efforts.

Michigan
$153.2K - $183.3K / year
Job Closed