iHerb, LLC

Come join the movement. We are a vehicle to healthy living!

Principal Machine Learning Engineer

Machine Learning Engineer | Full Time | Remote | Lead | Team 1,001-5,000 | Since 1996 | H1B No Sponsor

Location

United States

Posted

1 day ago

Salary

$205K - $230K / year

Seniority

Lead

Job Description

  • Partner with the Data Platform team in a two-way exchange of best practices
  • Adopt common patterns and build effective abstractions across different machine learning pipelines that simplify existing machine learning processes and accelerate the modelling process from the business problem’s inception to deploying a model solution into production
  • Develop horizontal solutions to robustly scale the team’s machine learning models and processes
  • Build software with Object-oriented Design Patterns and Analysis (OOA and OOD) with an eye toward reducing technical debt and maintaining services at high availability
  • Participate in requirements reviews, design reviews, and code reviews
  • Research and prototype new technologies to support the rapid growth of the business
  • Interact cross-functionally with a wide variety of technical teams and work closely with data and applied scientists to identify opportunities to improve on iHerb’s platform

Job Requirements

  • Strong coding experience (e.g. Java, C#, Python)
  • Experience with gathering data from multiple sources using big data technologies (Spark, Hadoop, BigQuery, Athena, etc.)
  • Experience building machine learning infrastructure following robust software engineering practices
  • Knowledge of modern software development tools, systems, and practices (design patterns, CI/CD, Git, unit testing, smoke testing, integration testing, job schedulers, cloud technologies like AWS Lambda and Google Cloud Functions, etc.)
  • Exposure to all aspects of the software development life-cycle
  • Experience with messaging technologies (Kafka, Google Pub/Sub, Kinesis, RabbitMQ, etc.)
  • Experience with Docker and Kubernetes
  • High degree of accuracy and attention to detail
  • Excellent organizational skills and ability to multitask

Benefits

  • Health insurance
  • 401(k) plan
  • Time Off and Paid Sick Leave
  • Paid holidays
  • Eligible for Restricted Stock Units and annual bonuses

Related Job Pages

More Machine Learning Engineer Jobs

Full Time | Remote | Team 10,001+ | Since 1912 | H1B Sponsor

  • Own ML production reliability strategy
  • Define and lead the operational strategy for production ML systems, including monitoring, traceability, deployment safety, incident response, and post-deployment validation.
  • Set the standards ML teams use to assess model health, performance, and trustworthiness in production.
  • Own model traceability and governance
  • Ensure every production model has clear lineage (data, features, code, artifacts, validation, deployment history) and drive adoption of model registry and metadata tooling across ML teams.
  • Build end-to-end ML observability
  • Design and implement monitoring across the full ML signal path: data arrival, feature freshness, distribution stability, candidate generation, ranking behavior, model metrics, serving latency, and SLA performance.
  • Define production health metrics
  • Partner with ML, data, product, and business stakeholders to define post-deployment metrics covering model quality, system reliability, business guardrails, and degradation indicators.
  • Detect drift and degradation proactively
  • Detect data drift, feature drift, model behavior changes, and silent failures before they impact customers via thresholding, alerting, anomaly detection, and release-over-release monitoring.
  • Lead diagnostic tooling and root-cause analysis
  • Build dashboards, logs, and diagnostic workflows that progress quickly from 'recommendations look off' to root cause, with context captured across candidates, features, scores, ranking decisions, and downstream outcomes.
  • Own ML deployment safety
  • Define and operate automated gates that prevent bad models or bad data from being promoted to production.
  • Partner with MLEs to establish validation checks, rollback criteria, canary strategies, shadow testing, and release health reviews.
  • Lead ML incident response
  • Own incident response practices for ML systems, including rollback playbooks, hotfix strategies, severity definitions, tradeoff frameworks, communications, and post-mortems.
  • Drive closure of systemic gaps after incidents rather than only resolving the immediate issue.
  • Partner across ML Platform, Data, and ML
  • Partner with DevOps/Platform on infrastructure and observability needs; with Data Engineering on data quality, drift, and freshness; and with ML Engineering to embed operational requirements into development and deployment workflows.
  • Set standards and mentor others
  • Act as the technical lead for ML operations: establish reusable patterns, playbooks, and standards, and mentor engineers on reliability, observability, and operational rigor.

New York
$157K - $235K / year

Deep Learning Engineer

NBCUniversal

NBCUniversal is a media and entertainment company that develops, produces, and markets a variety of entertainment and news programs internationally. NBCUniversal sets out each day

  • Implement core deep-learning, computer vision, and (inverse-)procedural modeling algorithms in Python
  • Apply cutting-edge research in machine learning and computer graphics to solve real-world problems
  • Work closely with our cofounders to understand high-level product vision and translate customer requirements into technical milestones
  • Interact with remote machines via a Unix shell to deploy and test code on large-scale geospatial datasets, ultimately generating 3D content for our customers
  • Use Git to manage source code and modularize complex implementation tasks into manageable, executable components

New York
$160K - $175K / year

Staff Applied Machine Learning Engineer - Intelligent Data, Signals & Systems

Cash App

Initially built to take the pain out of peer-to-peer payments, Cash App has gone from a simple product with a single purpose to a dynamic app, bringing a better way to send, spend, invest, borrow and save to our millions of monthly active users. Its mission is to redefine the world's relationship with money by making it more relatable, instantly available and universally accessible.

Full Time | Remote | Team 3,500 | Since 2013

Block builds simple, powerful tools that make progress towards an economy that's truly open to all. Each of our brands unlocks different aspects of the economy for more people. Square makes commerce and financial services accessible to sellers. Cash App is the easy way to spend, send, and store money. Afterpay is transforming the way customers manage their spending over time. TIDAL is a music platform that empowers artists to thrive as entrepreneurs. Bitkey is a simple self-custody wallet built for bitcoin. Proto is a suite of bitcoin mining products and services. Together, we're helping build a financial system that is open to everyone. Join us.

The Role

As a Staff Applied Machine Learning Engineer focused on Intelligent Data, Signals & Systems, you will build production ML systems that transform customer behavior, product context, model outputs, and feedback loops into trusted signals used by recommendations, ranking, risk-aware decisioning, growth, and customer intelligence systems.

This role centers on customer intelligence and reusable model-derived signal systems: ranking and retrieval, recommendations, search, propensity and churn/LTV, next-best-action decisioning, experimentation, and feedback loops. These systems help product, growth, fraud, and risk teams make better decisions with clear freshness, provenance, confidence, and evaluation guarantees. The work combines production ML systems with composable signal interfaces that can be consumed by product surfaces, decision engines, internal tools, and verified AI-assisted workflows. The role is flexible across Applied ML Engineering domains while still requiring deep expertise.

You Will

  • Build and operate production ML systems that turn customer and product context into trusted signals, rankings, recommendations, and decision capabilities.
  • Design production data and signal contracts that define intended use, freshness, provenance, confidence, eligibility, and calibration for downstream consumers.
  • Own ranking, retrieval, recommendation, search, propensity, and next-best-action systems end to end, from feature and candidate generation through serving, experimentation, monitoring, and feedback loops.
  • Evaluate customer and business impact beyond short-term conversion, including trust, fairness, access, risk, compliance, long-term engagement, and segment-level performance.
  • Partner across product, growth, data, platform, modeling, risk, and compliance to translate ambiguous goals into measurable ML system designs.
  • Use AI and agents to accelerate development, analysis, testing, documentation, and operations while exposing reusable capabilities to product services, internal tools, and AI-assisted workflows.

You Have

  • 12+ years building and operating production software and ML systems for business-critical products.
  • Deep expertise in intelligent systems such as ranking/retrieval, recommendations, search, personalization, growth and lifecycle ML, customer intelligence, propensity/churn/LTV, next-best-action, or model-derived risk signals.
  • Strong production ML judgment across feature pipelines, model serving, experimentation, monitoring, feedback loops, online/offline consistency, and reliable signal interfaces.
  • Ability to evaluate impact beyond short-term conversion, including trust, fairness, access, risk, compliance, and long-term engagement.
  • Experience using AI-assisted engineering tools with appropriate verification, testing, and review for customer-impacting systems.

Nice to Have

  • Experience with semantic retrieval, embeddings, two-tower models, graph features, LLM-powered retrieval or decision systems, entity resolution, or real-time personalization.
  • Experience with experimentation, online evaluation, interleaving, counterfactual evaluation, multi-objective optimization, or long-term holdouts.
  • Experience building reusable feature/signal platforms, decision services, customer intelligence layers, model-derived data products, or agent-assisted operations.

Technologies We Use and Teach

We do not expect candidates to have used our exact stack. We do expect strong production engineering fundamentals, deep domain expertise in intelligent ML systems, and judgment about how ML-derived signals should be used safely in customer-impacting products. Examples of technologies and methods include:

  • Python, Java, Kotlin, SQL.
  • TensorFlow, PyTorch, XGBoost/LightGBM, ranking/retrieval systems, embeddings, semantic search, recommendation frameworks.
  • Event streams, batch pipelines, feature stores, model-serving infrastructure, workflow orchestration, experimentation systems, and data warehouses/lakehouses.
  • Cloud infrastructure, Kubernetes, observability tooling, coding agents, evaluation harnesses, and agent-assisted operations tooling.

We're working to build a more inclusive economy where our customers have equal access to opportunity, and we strive to live by these same values in building our workplace. Block is an equal opportunity employer evaluating all employees and job applicants without regard to identity or any legally protected class. We will consider qualified applicants with arrest or conviction records for employment in accordance with state and local laws and "fair chance" ordinances.

We believe in being fair, and are committed to an inclusive interview experience, including providing reasonable accommodations to disabled applicants throughout the recruitment process. We encourage applicants to share any needed accommodations with their recruiter, who will treat these requests as confidentially as possible. Want to learn more about what we're doing to build a workplace that is fair and square? Check out our I+D page.

While there is no specific deadline to apply for this role, U.S. roles are typically open for an average of 55 days before being filled by a successful candidate. Please refer to the date listed at the top of this job page for when this role was first posted.

Block takes a market-based approach to pay, and pay may vary depending on your location. U.S. locations are categorized into one of four zones based on a cost of labor index for that geographic area. The successful candidate's starting pay will be determined based on job-related skills, experience, qualifications, work location, and market conditions. These ranges may be modified in the future. To find a location's zone designation, please refer to this resource. If a location of interest is not listed, please speak with a recruiter for additional information.

  • Zone A: $276,800 - $415,200 USD
  • Zone B: $276,800 - $415,200 USD
  • Zone C: $276,800 - $415,200 USD
  • Zone D: $276,800 - $415,200 USD

Application Guidelines

Candidates may submit up to 9 active applications within a 60-day period. Reapplications to the same role are accepted 90 days after a previous application has been reviewed.

Use of AI in Our Hiring Process

We may use automated AI tools to evaluate job applications for efficiency and consistency. These tools comply with local regulations, including bias audits, and we handle all personal data in accordance with state and local privacy laws. Contact us here with hiring practice or data usage questions.

Every benefit we offer is designed with one goal: empowering you to do the best work of your career while building the life you want. Remote work, medical insurance, flexible time off, retirement savings plans, and modern family planning are just some of our offerings. Check out our other benefits at Block.

Block, Inc. (NYSE: XYZ) builds technology to increase access to the global economy.

All locations: California | Canada

Machine Learning Manager

AvaSure

AI-enabled virtual care—Purpose-built for every clinical setting

Full Time | Remote | Team 201-500 | Since 2008 | H1B No Sponsor

  • Lead the architecture and end-to-end execution of the ML lifecycle - data strategy, model development, deployment, and continuous operation - primarily for computer vision and LLM/agentic systems
  • Own the MLOps foundation: training and deployment pipelines, model serving, CI/CD for models, and reproducible experimentation
  • Set and enforce standards for model accuracy and quality (evaluation frameworks, offline and online metrics, regression testing of models, and A/B testing) and hold the team to defined targets
  • Ensure production AI systems are scalable and highly available: define service-level objectives for latency, throughput, and uptime, and establish monitoring, drift detection, alerting, and rollback practices
  • Plan and manage team workload: delegate tasks, set daily, weekly, and monthly goals, and track progress against them
  • Partner with product, data engineering, DevOps/infrastructure, and clinical stakeholders to align priorities and drive projects forward

Michigan
$180K - $200K / year