Clutch

Expert consulting elevated by human connection

Senior MLOps Engineer

Machine Learning Engineer, Contract, Remote, Senior, Team 51-200, H1B Sponsor

Location

Brazil

Posted

4 days ago

Seniority

Senior

Bachelor Degree, 8 yrs exp, English

AWS, Cloud, Docker, PySpark, Python, Terraform, TypeScript

Job Description

• Take ownership of the ML serving API that serves NBA recommendations, partnering with the data engineer who's been building it, and harden it for low-latency production traffic
• Build the first repeatable deployment pipeline: model artifact → versioned, deployable, rollback-able production service, with infrastructure defined as code
• Stand up the monitoring foundation: latency/error/drift dashboards, alerting, and audit/trace visibility across models and agents
• Build a working relationship with HAL and become the data team's go-to on ML serving and reliability decisions
• Be the primary owner (with data engineer support) of the ML serving platform and deployment pipelines for NBA and our ML models
• Have at least one production model and one production agent fully instrumented, with versioning, monitoring, alerting, and multi-tenant gating in place
• Define the data team's playbook for shipping a new ML model to production, end-to-end
• Drive architectural decisions across APIs, processing pipelines, distributed compute, storage, search, observability, cloud infrastructure, and model-serving workflows
• Mentor the data engineers on MLOps patterns so they can confidently support and extend the systems you own
• Operate as the technical lead within the data team for NBA production ML operations: the person other teams come to when they want to understand how Clutch ships and runs ML reliably
• Have measurably improved cost and latency
• Be shaping the data team's roadmap for the next generation of ML infrastructure, in partnership with the PM and data scientist
• Help us decide what to hire next as the team scales

Job Requirements

8+ years of experience in software, data, or ML engineering, with 4–5+ years running ML systems in production — you've taken models from prototype to production and own what happens after deploy
Strong Python. Most of the work (serving API, pipelines, tooling, data pipelines) is in Python, and you're comfortable in production codebases, not just notebooks. Some TypeScript is involved for integration with our agent runtime; you don't need to be an expert, and comfort with a second language is enough
CI/CD & deployment discipline. You build training and deploy pipelines that take a model artifact to a versioned, deployable, rollback-able production service, with automated testing and reproducible builds. You've implemented CI/CD for ML and maintained the pipelines behind it (GitHub Actions, Bamboo, GitLab CI, or similar)
Infrastructure as code. You manage cloud infrastructure (AWS Lambda, ECS) with Terraform or equivalent — no click-ops, everything reviewable and reproducible
Monitoring & observability discipline. You instrument serving systems for latency, error rates, drift, and cost; you read audit rows and distributed traces; you set up alerting so regressions are caught before users feel them. You treat monitoring as a first-class deliverable, not an afterthought
Reliability rigor. You design for failure: structured error handling, graceful degradation, rollback paths, and runbooks. You have a story about a production incident you handled and how you hardened the system afterward
Experience building and operating low-latency production APIs (FastAPI, BentoML, or equivalent), with opinions on serving, batching, and caching
Comfortable in AWS (Lambda especially), containers (Docker), and GitHub-based workflows
Security & governance. You ensure security and governance across systems: IAM, KMS, access policies, and Secrets Manager/SSM
DevOps / infrastructure knowledge, plus data manipulation and feature engineering
Solid understanding of ML concepts: models, pipelines, metrics, and supervised/unsupervised learning
Experience integrating AI/ML services with the company's other systems and optimizing them
You use AI tooling actively in your engineering workflow — not as a novelty, but as a default. You'll be expected to demonstrate this during the technical evaluation
Experience with Databricks and PySpark

Benefits

Remote Flexibility: Enjoy the freedom of remote work from anywhere, balancing life and career seamlessly.
Unforgettable Off-Sites: Twice a year, bond with colleagues in exciting destinations, fostering teamwork and fresh ideas.
Paid Time Off and National Holidays: Enjoy 20 PTO days per year plus national holidays for relaxation and rejuvenation.
Stock Options: Joining us means having a stake in our success, so you'll receive stock options as part of your compensation package.
Home Office Setup: Create your ideal workspace with a dedicated budget for home office essentials.
Work Trip Budget: Grow personally and professionally with a budget for work-related trips and co-working.
