Principal AI/ML Engineer

Machine Learning Engineer | Full Time | Remote | Lead | Team 1,001-5,000 | H1B No Sponsor

Location

Belgium

Posted

26 days ago

Salary

0

Seniority

Lead

Bachelor Degree | 8 yrs exp | English | Docker | Kubernetes | Python | PyTorch

Job Description

Principal AI/ML Engineer

team.blue

  • Architect and evolve our multi-agent orchestration platform (currently built on Hermes / Multica)
  • Design and implement voice AI pipelines: STT, real-time TTS with streaming, VAD, and SIP/RTP telephony integration
  • Build and maintain RAG pipelines with retrieval quality measurement, re-ranking, and hybrid search over vector + keyword indexes
  • Fine-tune and evaluate LLMs for domain-specific tasks including customer support, classification, and structured extraction
  • Own the AI observability stack: tracing, instrumentation, cost tracking, and quality regression alerting
  • Define and enforce guardrails: hallucination detection, PII redaction, output safety scanning, and rate-limiting across multi-tenant deployments
  • Build data ingestion, preprocessing, and feature pipelines supporting model training and continual learning
  • Set architectural standards for AI systems across the group; conduct design reviews and own ADRs for major decisions
  • Mentor ML engineers and applied scientists; grow the team's capabilities in production AI
  • Engage with external research partners and track emerging work to identify signals worth productionizing

Job Requirements

  • 8+ years in ML Engineering, Applied AI, or Research Engineering with at least 2 years in a lead or staff-level role
  • Deep, hands-on experience with LLMs in production: fine-tuning, RLHF/DPO, prompt engineering, RAG, and tool use
  • Fluent in Python and the core ML stack: PyTorch, Transformers (HuggingFace), PEFT/LoRA
  • Real experience with LLM inference serving — vLLM, TensorRT-LLM, or TGI — in a latency-sensitive production environment
  • Practical knowledge of agentic frameworks: multi-agent coordination, tool-call orchestration, context/memory management, and observability (Langfuse, Opik, or equivalent)
  • Experience with speech AI (ASR/TTS pipelines) or real-time audio systems is a strong plus
  • Solid understanding of MLOps: experiment tracking (MLflow/W&B), model registries, containerization (Docker/Kubernetes), and CI/CD for ML
  • Awareness of LLM-specific risk: hallucination, prompt injection, data leakage, fairness, and privacy — and how to mitigate them in production
  • Strong communication skills: you can write a crisp design doc, run a productive architecture review, and explain tradeoffs to a non-technical stakeholder
  • Nice to have: experience with end-to-end voice pipelines (VAD → ASR → LLM → TTS → SIP/RTP telephony)

Benefits

  • Diversity & Inclusion are at our core
  • ESG efforts and ambitious sustainability goals

Related Job Pages

More Machine Learning Engineer Jobs

Machine Learning Engineer

HRE GROUP

Career | Recruitment | Selection

Full Time | Remote | Team 1-10 | Since 2016 | H1B No Sponsor

  • Contribute to the development and implementation of frameworks to evaluate and monitor the innovative machine learning solutions at Workiva
  • Assist in building the platform and metrics to evaluate and govern the ML/GenAI-based solutions
  • Support the development of tools, systems, infrastructure, and automation to evaluate and monitor the performance of applications
  • Work closely with senior team members to troubleshoot issues related to the accuracy, safety, and latency of ML-based solutions
  • Gain hands-on experience with Workiva’s technical standards and methods, while taking ownership of assigned activities

Portugal
Job Closed

Staff Machine Learning Engineer

Home Depot

Home Depot is a Fortune 500 company and the world's largest specialty retailer of home-improvement products. Founded in 1978 with its first two stores in Atlant

Role Description

The Staff Software Engineer is responsible for leading a team of engineers building and designing a product that our customers and associates love. As a Staff Software Engineer, you will be part of a dynamic team with engineers of all experience levels who help each other build and grow technical and leadership skills while creating, deploying, and supporting production applications.

In this role, you will also provide technical leadership on machine learning systems, including:

  • Model development
  • Production deployment
  • Monitoring
  • Lifecycle management of ML solutions operating at scale

Staff Software Engineers will assist in:

  • Product and tool selection
  • Configuration
  • Security
  • Resilience
  • Performance tuning
  • Production monitoring

They contribute to foundational code elements that can be reused as well as architectural diagrams and other product-related documentation. You will help define best practices for building reliable, explainable, and maintainable ML systems that integrate seamlessly with broader software platforms. As a Staff Software Engineer, you will be a core player on the product team and are expected to build and grow the skillsets of the more junior Engineers.

Qualifications

  • 3 - 6 years of relevant work experience
  • Strong experience designing, training, evaluating, and deploying machine learning models in production environments
  • Experience with ML lifecycle management
  • Experience building and operating ML pipelines using cloud-native services
  • Strong understanding of applied statistics and model evaluation metrics
  • Experience with algorithms such as clustering, forecasting, anomaly detection, and neural networks
  • Experience in advanced machine learning techniques such as NLP, convolutional neural networks, and embeddings generation
  • Experience in training machine learning models with extremely large datasets
  • Experience with Data Analysis and Machine Learning Tools and Libraries
  • Experience in Google Cloud Platform and AI/ML related components
  • Experience in effective data engineering practices and big data platforms
  • Experience in a modern scripting language (preferably Python)
  • Experience in writing SQL queries against a relational database
  • Experience in version control systems (preferably Git)
  • Experience in a Linux or Unix based environment
  • Experience in a CI/CD toolchain
  • Experience in REST and effective web service design
  • Experience in production systems design
  • Experience in cloud computing platform and associated automation patterns
  • Experience in defensive coding practices and patterns for high availability
  • Experience in A/B testing and effective REST design
  • Familiarity with advanced machine learning architectures

Requirements

  • Must be eighteen years of age or older
  • Must be legally permitted to work in the United States

Benefits

  • Typically requires overnight travel 5% to 20% of the time
  • Most of the time is spent sitting in a comfortable position with frequent opportunities to move about
  • Located in a comfortable indoor area

Company Description

The pay range for this position is between $120,000.00 - $190,000.00 for California, Colorado, Connecticut, Rhode Island, Nevada, New York City, Ithaca (NY), Westchester County (NY), and Washington residents.

United States
$120K - $190K / year

Principal Machine Learning

AAA - American Automobile Association

AAA Life Insurance Company is a division of the American Automobile Association that began in 1969 and today supports over 1.8 million active life insurance and

  • Establish engineering standards, best practices, and evaluation frameworks for AI systems
  • Lead technical decision-making for model selection, system design, and deployment strategies
  • Act as the subject matter expert for agentic AI and modern LLM-based systems within the organization
  • Architect and deliver production-grade, multi-step AI agents capable of autonomous reasoning, tool orchestration, task decomposition, memory management, and human-in-the-loop escalation
  • Design and deliver AI systems on enterprise cloud platforms (e.g., AWS, Azure), including LLM services (AWS Bedrock, Azure OpenAI)
  • Own the agent evaluation and observability stack, including benchmarking, tracing, regression testing, and performance monitoring
  • Optimize LLM inference costs and resource utilization for production workloads
  • Partner with business leaders to identify, prioritize, and shape AI-driven initiatives aligned with organizational goals
  • Translate complex business problems into scalable AI solutions with measurable impact
  • Drive roadmap planning and investment decisions related to AI and automation
  • Collaborate with IT, data engineering, and operations teams to integrate AI solutions into enterprise systems
  • Mentor and develop machine learning engineers and data scientists
  • Provide technical guidance and elevate team capabilities in modern AI practices
  • Ensure responsible and compliant use of AI systems, including managing risks related to model behavior, data usage, and regulatory considerations in a highly regulated industry
  • Lead evaluation and integration of external AI platforms and vendors, including assessment of cost, intellectual property, scalability, security, and long-term architectural impact

Michigan
Job Closed
Full Time | Remote | Team 10,001+ | H1B No Sponsor

  • You will be responsible for maintaining the self-hosted n8n automation platform on Amazon Web Services infrastructure, ensuring the environment's availability, security, performance, and updates.
  • You will also administer and operate AWS EKS, performing maintenance routines, version management, capacity adjustments, and support for the applications running on the cluster.
  • Your duties include developing, evolving, and maintaining CI/CD pipelines, supporting the data science teams with continuous delivery, pipeline standardization, and deployment automation.
  • You will likewise be responsible for implementing and maintaining infrastructure as code (IaC) with Terraform, ensuring versioning, reproducibility, and governance of the environments.
  • Finally, you will work heavily on troubleshooting in the cloud environment, investigating incidents, pipeline failures, and problems with networking, access, performance, and service integration, while actively contributing to the continuous improvement of the platform and the reliability of production environments.

Brazil