Reddit is an online platform used by thousands of communities to connect and converse about a wide variety of topics, including TV and movie fan theories, …

Staff Machine Learning Systems Engineer

Machine Learning Engineer · Other · Remote · Lead

Location

United States

Posted

82 days ago

Salary

$230K - $322K / year

Seniority

Lead

Python PyTorch TensorFlow Kubernetes Ray Apache Spark Apache Beam GCP BigQuery Terraform MLflow

Job Description

Reddit is a community of communities. It’s built on shared interests, passion, and trust, and is home to the most open and authentic conversations on the internet. Every day, Reddit users submit, vote, and comment on the topics they care most about. With 100,000+ active communities and approximately 121 million daily active unique visitors, Reddit is one of the internet’s largest sources of information. For more information, visit www.redditinc.com.

Who We Are:

The Machine Learning Platform team at Reddit is a high-impact team that owns the infrastructure powering recommendations, content discovery, and user and content quantification, and it directly impacts other teams such as Growth, Ads, Feeds, and Core Machine Learning.

What You’ll Do:

As a Staff ML Infrastructure Engineer, you will lead development of a platform for large-scale ML models at Reddit.

- Design end-to-end model lifecycle patterns (MLOps) that boost development velocity for ML engineers, covering data preparation, model management, experiment tracking, and more
- Lead zero-to-one development and support of a graph ML codebase and platform that abstracts away common patterns and enables greater model scalability and faster iteration
- Collaborate with ML engineers on performance tuning, including improving model training time, efficiency, and GPU training costs in a large, distributed ML training environment
- Optimize batch data processing within a data warehouse and with tools such as Apache Beam, Apache Spark, Ray Data, and more
- Architect pipelines to build and maintain massive graph data structures on the order of billions of nodes and tens of billions of edges

Who You Might Be:

- 8+ years of experience in ML infrastructure, including model training and model deployment
- Hands-on experience with ML optimization, including memory and GPU profiling
- Deep experience with cloud-based technologies for supporting an ML platform, including tools like GCP BigQuery, Google Cloud Storage, infrastructure-as-code (Terraform), and more
- Hands-on experience administering and integrating MLOps tools for experiment tracking, model serving, and model registries (e.g., MLflow or Wandb)
- Proficiency with the common programming languages and frameworks of ML, such as Python, PyTorch, TensorFlow, etc.
- Deep experience working with distributed training frameworks, including Ray and Kubernetes
- Strong focus on scalability, reliability, performance, and ease of use. You are an undying advocate for platform users and have a deep intuition for the machine learning development lifecycle.
- Strong organizational and communication skills
- Experience working with graph databases (Neo4j, JanusGraph, TigerGraph) is a big plus
- Experience working with graph neural networks (GNNs) and associated graph ML frameworks (PyTorch Geometric, Deep Graph Library) is a big plus

Pay Transparency:

This job posting may span more than one career level. In addition to base salary, this job is eligible to receive equity in the form of restricted stock units, and depending on the position offered, it may also be eligible to receive a commission. Additionally, Reddit offers a wide range of benefits to U.S.-based employees, including medical, dental, and vision insurance, a 401(k) program with employer match, generous time off for vacation, and parental leave. To learn more, please visit https://www.redditinc.com/careers/.

To provide greater transparency to candidates, we share base salary ranges for all US-based job postings regardless of state. We set standard base pay ranges for all roles based on function, level, and country location, benchmarked against similar-stage growth companies. Final offer amounts are determined by multiple factors, including skills, depth of work experience, and relevant licenses/credentials, and may vary from the amounts listed below.
The base salary range for this position is: $230,000 to $322,000 USD.

In select roles and locations, interviews will be recorded, transcribed, and summarized by artificial intelligence (AI). You will have the opportunity to opt out of recording, transcription, and summarization prior to any scheduled interviews. During the interview, we will collect the following categories of personal information: Identifiers, Professional and Employment-Related Information, Sensory Information (audio/video recording), and any other categories of personal information you choose to share with us. We will use this information to evaluate your application for employment or an independent contractor role, as applicable. We will not sell your personal information or disclose it to any third party for their marketing purposes. We will delete any recording of your interview promptly after making a hiring decision. For more information about how we will handle your personal information, including our retention of it, please refer to our Candidate Privacy Policy for Potential Employees and Contractors.

Reddit is proud to be an equal opportunity employer and is committed to building a workforce representative of the diverse communities we serve. Reddit is committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans in our job application procedures. If, due to a disability, you need an accommodation during the interview process, please let your recruiter know.

Related Categories

Machine Learning Engineer AI Engineer AI Research Scientist LLM Engineer Computer Vision Engineer NLP Engineer

More Machine Learning Engineer Jobs

Machine Learning Manager

Pennylane

The Financial OS for accounting firms and business owners

Machine Learning Engineer · 82 days ago

Full Time · Remote · Team 501-1,000 · Since 2020 · H1B No Sponsor

• Lead a team of Machine Learning Engineers and Data Engineers
• Contribute technically to the design and implementation of machine learning solutions
• Collaborate with Product Managers to maximize impact and ensure quality
• Grow your team and establish the right culture and processes
• Work closely with data engineers and software engineers to deploy solutions

France

Job Closed

Machine Learning Ops Engineer II

Sheetz, Inc

Sheetz is committed to the full inclusion of all qualified individuals. Sheetz is committed to considering all applicants regardless of disability who can perform all essential job duties with or without accommodations.

Machine Learning Engineer · 82 days ago

Other · Remote · Team 10,001

This position offers a base salary range of $78,807.00 - $131,346.00 per year, depending on experience and qualifications, plus a bonus based on company performance. One of the MANY work perkz at Sheetz is quarterly employee bonuses based on company performance! And there’s more – A LOT more… like competitive salaries, PTO and parental leave, 401k match and employee stock ownership, limitless professional development and growth opportunities, tuition reimbursement, full medical, vision, and dental coverage, and snack discounts!

A Machine Learning Ops Engineer II at Sheetz ensures that AI models move seamlessly from “working on a laptop” to running reliably across our stores, applications, and systems at scale. This role powers capabilities like smarter inventory management, enhanced customer experiences, and faster decision-making that keeps pace with the way Sheetz operates. The MLOps Engineer designs, builds, and maintains the pipelines, deployment processes, and monitoring systems that allow models to run continuously and perform consistently. Just as Sheetz kitchens operate around the clock to serve customers, this role keeps our AI systems running 24/7, using data as the ingredients and algorithms as the recipes that drive our technology.

This role qualifies for a remote work arrangement within our 7-state footprint (PA, OH, MI, WV, VA, MD, NC).

OVERVIEW

Support the design, development, and deployment of ML solutions and infrastructure to operationalize machine learning models and ensure their performance at scale. Maintain robust, reproducible, and scalable machine learning workflows, monitor model health in production, and assist in implementing MLOps best practices. Apply existing experience, and build technical depth, to contribute to the ongoing maturity of the ML ecosystem across the organization.

RESPONSIBILITIES (other duties may be assigned)

1. Contribute to the design, automation, and maintenance of end-to-end machine learning pipelines, including model training, validation, deployment, and monitoring
2. Write well-structured, testable, and maintainable code to support robust ML systems
3. Apply software engineering best practices to productionize machine learning workflows
4. Collaborate with internal teams to build, integrate, and scale machine learning solutions that align with business and operational requirements
5. Use tools including but not limited to MLflow, TensorFlow, PyTorch, and containerization and orchestration frameworks (e.g., Docker, Kubernetes) to deploy and manage models in production environments
6. Monitor deployed models for drift, latency, and performance degradation; implement alerting and retraining pipelines as needed to maintain reliability, escalating as required
7. Assist in the setup and optimization of CI/CD pipelines for ML workflows to enable fast and safe model iteration and deployment
8. Maintain documentation, version control, and metadata tracking to ensure models are reproducible and auditable
9. Recommend improvements to MLOps practices, frameworks, and tooling, and help define and refine operational standards as the organization’s ML capabilities mature

QUALIFICATIONS (Equivalent combinations of education, licenses, certifications and/or experience may be considered)

Education
• Bachelor’s degree in Computer Science, Management Information Systems, Computer Engineering, or a related discipline is required

Experience
• Minimum 3 years of experience in design, development, and deployment of ML solutions required
• Previous use of programming languages (Python, Bash) or scripting for automation and ML pipeline orchestration preferred
• Previous experience with machine learning pipelines, model lifecycle management, or MLOps concepts (e.g., model deployment, monitoring, CI/CD) preferred
• Previous experience with secure development practices and cloud environments (e.g., AWS, GCP, or Azure) preferred

Licenses/Certifications
• Certifications in cloud platforms (AWS/GCP/Azure), MLOps, or DevOps tools preferred

Tools & Equipment
• General office equipment

ACCOMMODATIONS

Sheetz is committed to the full inclusion of all qualified individuals. Sheetz is committed to considering all applicants regardless of disability who can perform all essential job duties with or without accommodations.

Python Shell Docker Kubernetes MLflow TensorFlow PyTorch CI/CD AWS Observability / Monitoring

United States

$78.8K - $131.3K / year

Job Closed

Senior Machine Learning Engineer, AI Governance

Optro

Optro helps enterprises transform risk into opportunity, redefining GRC for the agentic future of risk management.

Machine Learning Engineer · 82 days ago

Full Time · Remote · Team 501-1,000 · Since 2014 · H1B No Sponsor

• Build, ship, and own product features end-to-end
• Work with designers and product managers to create high-performing product features
• Apply a range of techniques, from classical ML to LLM-based approaches (RAG, prompt engineering, fine-tuning, semantic search), with a strong focus on reliability, performance, and maintainability
• Write well-designed, maintainable, and testable code
• Write clear and well-defined design documentation
• Troubleshoot, debug, and resolve software bugs
• Be product-minded and think about the customer
• Stay up to date on AI/ML advancements and explore new techniques and tools
• Participate in an Agile software development life cycle
• Work with Python, JavaScript, Node.js, Docker, PostgreSQL, Kubernetes, etc.

Docker Java JavaScript Keras Kubernetes Node.js PostgreSQL Python PyTorch scikit-learn SDLC TensorFlow

Canada

CA$140K - CA$180K / year

Senior IA/ML Engineer – Eng/Esp

Plain Concepts

Rediscover the meaning of technology | Spain, USA, UK, Germany, Netherlands, Australia and Romania.

Machine Learning Engineer · 82 days ago

Full Time · Remote · Team 201-500 · Since 2006 · H1B No Sponsor

• Participating in the design and development of AI solutions for challenging projects.
• Building production-level ML/AI solutions grounded in solid software engineering and ML/AI principles.
• MLOps: automated deployment and monitoring (models and infrastructure).
• Data analysis (data cleaning, variable transformation, etc.).
• Developing and training ML models.
• Putting AI models into production, which means parallelizing, optimizing, tuning, and testing the models for deployment in a production environment.

Azure Docker gRPC Python PyTorch SQL TensorFlow

Spain
