We power commerce through conversation by enabling brands to sell, market, and engage their customers—all via text.
Senior Machine Learning Engineer – Inference Platform
Location
United States
Posted
77 days ago
Salary
Not specified
Seniority
Senior
Job Description
Senior Machine Learning Engineer – Inference Platform
Wizard
- Own and evolve our multi-engine inference platform, supporting a variety of model types and serving requirements.
- Build and improve production ML pipelines, taking models from experimentation to reliable, high-throughput serving.
- Define and implement model versioning, rollout, rollback, and lifecycle management strategies that ensure reproducibility and operational reliability.
- Define and enforce serving-layer SLAs, including latency, availability, GPU utilization, Time-to-First-Token (TTFT), and Inter-Token Latency (ITL).
- Build observability, monitoring, alerting, and operational tooling for production inference systems.
- Apply software engineering best practices, including testing, CI/CD integration, and reproducibility across ML workflows.
- Optimize inference performance through efficient resource utilization, hardware-aware serving strategies, and cost-conscious infrastructure design.
- Ensure ML serving systems are secure, scalable, and operationally resilient.
- Partner with ML, Data, Product, and DevOps teams to turn ideas into production systems, driving the technical decisions on serving and scale.
Job Requirements
- Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or a related field, or equivalent practical experience.
- 5–8+ years of experience in Software Engineering, ML Engineering, Platform Engineering, or Infrastructure Engineering, with direct ownership of production ML serving systems.
- Hands-on experience running an LLM serving engine (vLLM, TGI, TensorRT-LLM, or SGLang) in production under real load — not just managed or hosted endpoints.
- Strong Python skills and software engineering fundamentals, combined with deep systems and infrastructure knowledge.
- Experience with cloud platforms such as AWS, GCP, or Azure, and familiarity with ML lifecycle tooling, experimentation platforms, and model registries.
- Strong grasp of inference performance — continuous batching, KV-cache and GPU-memory behavior, quantization, and CPU-versus-GPU bottlenecks — with the instinct to profile before tuning.
- Experience serving heterogeneous workloads, including LLMs, embedding models, and extraction models, each with distinct latency, throughput, and scaling requirements.
- Demonstrated ability to balance latency, throughput, reliability, and infrastructure cost while operating production-scale ML systems.
- Experience in high-growth startup environments and comfort operating in fast-moving, evolving technical landscapes.
Benefits
- Health insurance
- Flexible work arrangements
- Professional development opportunities
More Machine Learning Engineer Jobs
- Design, build, and operate Samsara’s end-to-end ML platform spanning training, experimentation, batch and online inference, and edge deployment.
- Partner with product and applied ML teams to design, launch, and iterate ML-powered features (e.g., backend CV models, EcoDriving insights, LLM-based reporting).
- Lead throughput and cost estimation for new ML features, from early-stage exploration to production-scale capacity planning.
- Collaborate on experiment design and evaluation, including defining success metrics and structuring A/B tests or offline evaluations.
- Evolve shared training and experimentation infrastructure (e.g., job orchestration, cluster configuration, environment management).
- Design and operate scalable online and batch inference systems (Ray- and Spark-based).
- Partner with firmware and edge teams to define workflows for packaging, validating, and deploying models to Samsara devices.
- Own the reliability, observability, and security posture of ML systems across cloud and edge environments.
- Provide Staff+/Senior-Staff-level technical leadership by setting architecture and strategy for ML infrastructure.
- Drive strong developer experience through documentation, office hours, and best practices.
- Own or co-own end-to-end technical delivery for high-priority or high-risk initiatives.
- Advance high-performance computer vision architectures for detecting and diagnosing medical findings in complex radiological datasets (MRI, CT).
- Design and implement rigorous validation frameworks to ensure model robustness, clinical efficacy, and conformity with medical device certification standards.
- Develop solutions centered on clinical trust, in particular through model interpretability and uncertainty quantification, to deliver actionable insights to medical professionals.
- Optimize scalable ML pipelines in a modern Docker and AWS environment to ensure a smooth transition from experimental research to production-ready deployments.
Senior AI Quality/Evaluation Engineer
International Data Group
At IDC, your work helps shape how the world understands technology and where it goes next. You collaborate with curious, high-caliber colleagues who value rigor, integrity, and shared success. As the premier global provider of trusted technology intelligence, IDC equips business and technology leaders with the evidence they need to make confident decisions. Our insights inform strategy, investment, and innovation across industries and regions. Recognized by IIAR as Analyst Firm of the Year for five consecutive years, IDC sets the standard for credibility and impact. With more than 1,000 analysts worldwide and a truly global perspective, we combine deep expertise with practical relevance. Here, your ideas matter, your voice is heard, and your contributions provide the insights leaders rely on every day. It is meaningful work, backed by a culture that supports growth, collaboration, and long-term career development with a globally respected brand.
Role Description
IDC is building the next generation of AI-powered intelligence platforms that transform how technology decisions get made. We are looking for a Senior AI Quality/Evaluation Engineer to establish the evaluation function for the platform's AI systems. This is a solo function initially. You will design and build the evaluation infrastructure that ensures the platform produces accurate, well-sourced, high-quality responses. You will be the first hire in this function and must be able to operate independently, defining your own roadmap and building from scratch.
The platform's credibility depends on the quality of its AI-generated intelligence. You will build the automated test suites, regression detection systems, and evaluation frameworks that catch quality issues before they reach users. You will work closely with the product team to translate quality criteria into measurable, automatable test scenarios, and with the AI engineering team to ensure that every pipeline change is evaluated against rigorous standards.
What You’ll Do
- Design and build the evaluation infrastructure that ensures the platform's AI systems produce accurate, well-sourced, high-quality responses.
- Build automated test suites that validate answer quality across agent pipeline changes.
- Develop regression detection systems that catch quality degradation before it reaches users.
- Create evaluation frameworks that measure response accuracy, citation correctness, and source quality.
- Work closely with the product team to translate quality criteria into measurable, automatable test scenarios.
- Build cost and latency monitoring that tracks the operational efficiency of AI pipeline execution.
- Define evaluation standards and practices that scale as the platform and team grow.
Qualifications
- 6+ years of software engineering experience, with significant work in testing infrastructure, ML evaluation, or quality systems.
- Experience building evaluation or testing frameworks for LLM-based or ML-based systems.
- Understanding of how to measure response quality for generative AI: accuracy, groundedness, citation correctness, relevance.
- Proficiency in Python.
- Ability to operate independently and define your own roadmap.
- Experience working at the intersection of engineering and product, translating qualitative quality criteria into quantitative measurements.
- Experience with LLM evaluation frameworks (e.g., RAGAS, DeepEval, or custom).
- Familiarity with LLM observability tools (e.g., Langfuse, LangSmith, Weights & Biases).
- Background in statistical methods for quality measurement (significance testing, distribution analysis).
- Experience building A/B testing or experimentation infrastructure.
- Background in search relevance evaluation or information retrieval metrics.
Benefits
- 15 vacation days per year (increases with tenure; carryover allowed).
- 10 paid sick days per year.
- 1 week paid new parenting leave.
- Flexible work options (remote, part-time, flexible hours).
- Health, dental, vision, and paramedical coverage for you and your family.
- $1,600 annual healthcare spending account.
- Employee Assistance Program for counseling and support.
- Best Doctors medical second opinions.
- Life, AD&D, and long-term disability insurance.
- Retirement savings plan with company match (up to 4% of salary).
- $75/month technology allowance for home office or phone expenses.
- Company-paid cell phone plan.
- Design, build, and scale experimentation and causal inference services.
- Develop and maintain advanced statistical and ML modules.
- Build and extend RESTful APIs using FastAPI.
- Design and optimize large-scale data pipelines using PySpark.



