Clutch

Expert consulting elevated by human connection

Senior MLOps Engineer

Machine Learning Engineer | Contract | Remote | Senior | Team 51-200 | H1B Sponsor

Location

Brazil

Posted

4 days ago

Salary

0

Seniority

Senior

Bachelor Degree, 8 yrs exp, English, AWS, Cloud, Docker, PySpark, Python, Terraform, TypeScript

Job Description

  • Take ownership of the ML serving API that serves NBA recommendations, partnering with the data engineer who's been building it, and harden it for low-latency production traffic
  • Build the first repeatable deployment pipeline: model artifact → versioned, deployable, rollback-able production service, with infrastructure defined as code
  • Stand up the monitoring foundation: latency/error/drift dashboards, alerting, and audit/trace visibility across models and agents
  • Build a working relationship with HAL and become the data team's go-to on ML serving and reliability decisions
  • Be the primary owner (with data engineer support) of the ML serving platform and deployment pipelines for NBA and our ML models
  • Have at least one production model and one production agent fully instrumented, with versioning, monitoring, alerting, and multi-tenant gating in place
  • Define the data team's playbook for shipping a new ML model to production, end-to-end
  • Drive architectural decisions across APIs, processing pipelines, distributed compute, storage, search, observability, cloud infrastructure, and model-serving workflows
  • Mentor the data engineers on MLOps patterns so they can confidently support and extend the systems you own
  • Operate as the technical lead within the data team for NBA production ML operations: the person other teams come to when they want to understand how Clutch ships and runs ML reliably
  • Have measurably improved cost and latency
  • Be shaping the data team's roadmap for the next generation of ML infrastructure, in partnership with the PM and data scientist
  • Help us decide what to hire next as the team scales

Job Requirements

  • 8+ years of experience in software, data, or ML engineering, with 4–5+ years running ML systems in production — you've taken models from prototype to production and own what happens after deploy
  • Strong Python: most of the work (serving API, pipelines, tooling, data pipelines) is in Python, and you're comfortable in production codebases, not just notebooks. Some TypeScript is involved for integration with our agent runtime; you don't need to be an expert, and comfort with a second language is enough
  • CI/CD & deployment discipline. You build training and deploy pipelines that take a model artifact to a versioned, deployable, rollback-able production service, with automated testing and reproducible builds. You've implemented CI/CD for ML and have built and maintained the pipelines behind it (GitHub Actions, Bamboo, GitLab CI, or similar)
  • Infrastructure as code. You manage cloud infrastructure (AWS Lambda, ECS) with Terraform or equivalent — no click-ops, everything reviewable and reproducible
  • Monitoring & observability discipline. You instrument serving systems for latency, error rates, drift, and cost; you read audit rows and distributed traces; you set up alerting so regressions are caught before users feel them. You treat monitoring as a first-class deliverable, not an afterthought
  • Reliability rigor. You design for failure: structured error handling, graceful degradation, rollback paths, and runbooks. You have a story about a production incident you handled and how you hardened the system afterward
  • Experience building and operating low-latency production APIs (FastAPI, BentoML, or equivalent), with opinions on serving, batching, and caching
  • Comfortable in AWS (Lambda especially), containers (Docker), and GitHub-based workflows
  • Security & governance. You ensure security and governance across systems: IAM, KMS, access policies, and Secrets Manager/SSM
  • DevOps / infrastructure knowledge, plus data manipulation and feature engineering
  • Solid understanding of ML concepts: models, pipelines, metrics, and supervised/unsupervised learning
  • Experience integrating and optimizing AI/ML services with the company's other systems
  • You use AI tooling actively in your engineering workflow — not as a novelty, but as a default. You'll be expected to demonstrate this during the technical evaluation
  • Experience with Databricks and PySpark

Benefits

  • Remote Flexibility: Enjoy the freedom of remote work from anywhere, balancing life and career seamlessly.
  • Unforgettable Off-Sites: Twice a year, bond with colleagues in exciting destinations, fostering teamwork and fresh ideas.
  • Paid Time Off and National Holidays: Enjoy 20 PTO days per year plus national holidays for relaxation and rejuvenation.
  • Stock Options: Joining us means having a stake in our success, so you'll receive stock options as part of your compensation package.
  • Home Office Setup: Create your ideal workspace with a dedicated budget for home office essentials.
  • Work Trip Budget: Grow personally and professionally with a budget for work-related trips and co-working.
