Mozilla

Feel good about your work again.

Senior Machine Learning Engineer, AI Platform

AI Engineer | Machine Learning Engineer | Full Time | Remote | Senior | Team 501-1,000 | Since 1998 | H1B Sponsor

Location

United States

Posted

15 hours ago

Salary

$139K - $218K / year

Seniority

Senior

Bachelor Degree | 4 yrs exp | English | Cloud | Distributed Systems | Python

Job Description

  • Design, build, and operate core AI platform components used to train, deploy, and serve machine learning models in production environments.
  • Own model serving and inference workflows end-to-end, driving improvements in reliability, scalability, performance, and operational excellence.
  • Lead efforts to optimize inference systems for throughput, latency, and cost efficiency across CPU and GPU workloads.
  • Design and manage GPU-based inference and training workloads, including performance tuning, capacity planning, and resource utilization optimization.
  • Own and improve critical parts of the model lifecycle, including packaging, versioning, testing strategies, validation, and deployment automation.
  • Implement and evolve observability practices (metrics, logging, tracing, alerting) to improve visibility and operational resilience of ML services and pipelines.
  • Partner closely with product, infrastructure, security, and data teams to design scalable platform capabilities that enable AI-powered features.
  • Contribute to technical design discussions, propose architectural improvements, and mentor junior engineers through code reviews and knowledge sharing.
  • Participate in and help improve operational processes, including incident response, on-call rotations, and post-incident reviews.

Job Requirements

  • Bachelor’s degree with 4–6 years of relevant industry experience, or Master’s degree with significant hands-on experience building and operating production ML systems, or equivalent work experience.
  • Strong experience developing in Python for machine learning systems, backend services, or distributed data processing.
  • Proven experience deploying and operating ML workloads in cloud environments, including production-grade infrastructure.
  • Solid understanding of model serving architectures, inference pipelines, and performance tradeoffs (latency, throughput, cost, scaling strategies).
  • Hands-on experience working with GPU-based workloads and accelerated computing in production settings.
  • Experience designing CI/CD pipelines and development workflows that support reliable ML system deployment.
  • Ability to independently scope and drive technical initiatives while balancing product and operational priorities.
  • Strong problem-solving skills and the ability to debug performance and reliability issues in distributed systems.
  • Clear and effective communication skills, with experience collaborating across engineering, product, and infrastructure teams.

Benefits

  • Generous performance-based bonus plans for all eligible employees; we share in our success as one team
  • Rich medical, dental, and vision coverage
  • Generous retirement contributions with 100% immediate vesting (regardless of whether you contribute)
  • Quarterly all-company wellness days where everyone takes a pause together
  • Country-specific holidays plus a day off for your birthday
  • One-time home office stipend
  • Annual professional development budget
  • Quarterly well-being stipend
  • Considerable paid parental leave
  • Employee referral bonus program
  • Other benefits (life/AD&D, disability, EAP, etc. - varies by country)

Related Job Pages

More AI Engineer Jobs

Support AI Engineer

Figma

Figma was founded in 2012 to build a collaborative, professional-grade interface design tool for the digital age. Created specifically for interface design and built entirely in th

AI Engineer | 16 hours ago

Role Description

Figma is evolving the Product Support experience, powered by AI, automation, and integrated systems. The AI Infrastructure & Tooling team helps make that possible by building intelligent, resilient, and integrated solutions that automate workflows, connect systems, and streamline support operations. As a Support AI Engineer on this team, you'll be the technical execution layer that brings our support tools, customer and account context, internal systems, and AI workflows together. This role is ideal for someone who can move from ambiguous support problems to working technical solutions:

  • Understanding the workflow
  • Identifying the systems involved
  • Building the integration or automation
  • Validating the data flow
  • Measuring the impact on customer outcomes and Specialist efficiency

This is a full-time role that can be held from one of our US hubs or remotely in the United States.

What you'll do at Figma:

  • Build and operationalize AI-powered workflows that improve Product Support experiences for customers and internal support teams.
  • Design and maintain integrations across Decagon, Zendesk, Figma admin tooling, internal data sources, and adjacent Product Support platforms.
  • Bring relevant customer, account, product, billing, file, or admin metadata into support conversations so chatbots and Specialists have the context they need to resolve issues more effectively.
  • Use LLMs and AI patterns for classification, summarization, routing, recommendations, context enrichment, and workflow automation.
  • Partner with Engineering, Analytics, Security, Programs, Support, and vendor teams to align on requirements, implementation, governance, and rollout.
  • Build quality checks, monitoring, fallback paths, and operational guardrails so AI-powered workflows can be trusted in production.
  • Define success metrics for each workflow, track adoption and impact, and iterate based on customer outcomes, Specialist efficiency, and adoption.

Qualifications

  • 3+ years of experience shipping integrations, automations, or internal tools across customer-facing operational systems.
  • Strong coding or scripting ability, including experience with APIs, webhooks, data flows, and system and workflow data integrations.
  • Hands-on experience with LLM-powered workflows, AI automations, or AI-enabled customer/support experiences, including working with operational data to debug issues, improve workflows, and measure impact.
  • Strong product and stakeholder instincts: you can translate ambiguous support problems into practical, adopted, and measurable technical solutions.
  • Proven track record of designing AI workflows with clear guardrails, fallback paths, and responsible deployment practices.

Requirements

  • Experience with support platforms like Zendesk, Decagon, Sprinklr, Gainsight, Maestro QA/Rippit, Assembled, Salesforce, or similar systems.
  • Familiarity with agent assist tooling, AI support chatbots, copilot tooling, RAG, AI observability, or monitoring AI workflows in production.
  • Experience building internal Slack tooling, workflow automations, or embedded support experiences.
  • Background in Support Engineering, Internal Tools Engineering, Solutions Engineering, Support Operations, CX Systems, or Business Systems.
  • Familiarity with customer support metrics such as containment, deflection, CSAT, first contact resolution, routing accuracy.

Benefits

  • Equity to employees
  • Competitive package of additional benefits, including health, dental & vision
  • Retirement with company contribution
  • Parental leave & reproductive or family planning support
  • Mental health & wellness benefits
  • Generous PTO
  • Company recharge days
  • Learning & development stipend
  • Work from home stipend
  • Cell phone reimbursement
  • Sales incentive pay for most sales roles
  • Annual bonus plan for eligible non-sales roles

United States
$140K - $202K / year
Full Time | Remote | Team 11-50 | Since 2018 | H1B No Sponsor

  • Take complete ownership and deliver major AI engineering features within agreed timelines.
  • Own AI output quality, structure, and predictability across all user-facing AI interactions.
  • Design, implement, and maintain output-type-based AI systems, including segmentation, routing, and enforcement.
  • Ensure consistent output structure and formatting across different LLMs for the same request type.
  • Integrate and orchestrate multiple LLM providers via OpenRouter, managing model selection, fallback strategies, and cost optimisations.
  • Design and orchestrate tool-using and agentic AI workflows, defining clean tool contracts (including MCP-based tools), function-calling interfaces, and reliable AI-to-system integrations.
  • Build and maintain complex, multi-step LLM workflows, including with orchestration frameworks such as LangChain or LlamaIndex, for advanced reasoning, context reuse, and retrieval.
  • Design and manage production prompt systems with dynamic prompting, context injection, and conditional logic.
  • Own the deployment and release of LLM experiments, prompt management, and Langfuse-based evaluation pipelines.
  • Run A/B tests across models, analyse results, and present data-driven impact assessments of AI features and experiments.
  • Monitor AI system metrics, quality signals, latency, and release health using Langfuse and other observability tools.
  • Deep-debug complex LLM chains using Langfuse traces, identifying bottlenecks and optimising for cost, latency, and context-window usage, and build output-scoring systems to root-cause hallucinations and logic errors.
  • Write clean, scalable, and maintainable TypeScript code across the Next.js and Node.js stack.
  • Build reliable backend logic for AI systems, with strong error handling, request validation, fallback flows, and predictable behaviour in production, including reliable tool execution and AI-to-service integrations.
  • Ensure high code quality through testing, code reviews, and clear engineering standards.
  • Monitor, troubleshoot, and improve production performance, reliability, and system health.
  • Drive maintainability and technical quality through solid architecture, refactoring, and disciplined release practices.

Europe
Full Time | Remote | Team 11-50 | Since 2018 | H1B No Sponsor

  • Implement AI-powered product features and workflows based on specifications from senior engineers.
  • Write, iterate on, and manage production prompt templates following established patterns for dynamic prompting and context injection.
  • Participate in prompt experimentation workflows by drafting variants, running A/B tests across models via OpenRouter, and documenting results.
  • Integrate structured output schemas (JSON mode, function calling, Zod/JSON schemas) to ensure AI responses are predictable and application-ready.
  • Build and maintain output enforcement mechanisms such as validators and repair loops under senior guidance.
  • Help implement tool calling and function calling integrations so AI features can fetch data or trigger actions, following patterns established by senior engineers.
  • Contribute to evaluation pipelines and help assess prompt and model quality using Langfuse.
  • Write clean, maintainable TypeScript code within the Next.js and Node.js stack.
  • Participate in code reviews and incorporate feedback.
  • Document technical decisions and system behaviour to support knowledge sharing.
  • Collaborate with product, growth, data, and billing teams to deliver features on time.

Turkey
Full Time | Remote | Team 11-50 | Since 2018 | H1B No Sponsor

  • Implement AI-powered product features and workflows based on specifications from senior engineers.
  • Write, iterate on, and manage production prompt templates following established patterns for dynamic prompting and context injection.
  • Participate in prompt experimentation workflows by drafting variants, running A/B tests across models via OpenRouter, and documenting results.
  • Integrate structured output schemas (JSON mode, function calling, Zod/JSON schemas) to ensure AI responses are predictable and application-ready.
  • Build and maintain output enforcement mechanisms such as validators and repair loops under senior guidance.
  • Help implement tool calling and function calling integrations so AI features can fetch data or trigger actions, following patterns established by senior engineers.
  • Contribute to evaluation pipelines and help assess prompt and model quality using Langfuse.
  • Write clean, maintainable TypeScript code within the Next.js and Node.js stack.
  • Participate in code reviews and incorporate feedback.
  • Document technical decisions and system behaviour to support knowledge sharing.
  • Collaborate with product, growth, data, and billing teams to deliver features on time.

Europe