Google Cloud AI+ML Partner of the Year. We drive business impact through innovative cloud engineering, analytics and AI.
Lead Machine Learning Engineer
Location
Canada
Posted
67 days ago
Salary
CAD$175K - CAD$200K / year
Seniority
Senior
Job Description
Datatonic
- Act as the lead technical authority in high-stakes engagements
- Partner with the commercial team to architect winning solutions
- Lead the delivery of enterprise-grade systems such as GenAI agents, real-time recommendation engines, or predictive maintenance models
- Own the complete technical lifecycle for projects, designing end-to-end ML architectures on GCP and implementing robust MLOps pipelines
- Collaborate with the Head of Delivery to define the technical DNA of ML practice, evolving best practices
- Spearhead the development of internal accelerators and reusable frameworks
- Formally mentor and coach junior and mid-level engineers through code reviews and technical guidance
Job Requirements
- 7+ years of professional experience in machine learning and software engineering
- At least 2 years in a formal or informal leadership capacity (e.g., tech lead, project lead, or senior mentor)
- Proven ability to architect and deploy scalable, production-grade ML solutions on a major cloud platform (GCP is a significant asset)
- Hands-on experience with Infrastructure-as-Code tools (e.g., Terraform) and designing for distributed computing
- Deep, hands-on expertise in Python for backend ML systems
- Mastery of software engineering best practices (e.g., clean architecture, robust testing, CI/CD)
- Ability to design and build REST APIs (e.g., using Flask/FastAPI)
- Proficient in SQL for complex data manipulation
- Exceptional ability to communicate complex technical concepts to diverse audiences (C-level stakeholders to junior engineers)
Benefits
- 20 days of paid vacation per calendar year
- Public Holidays for your Province of Residence
- 5 Wellness days (sickness, personal time, mental health)
- 5 Lifestyle days (religious events, volunteer day, sick day)
- Matching Group Retirement Savings Plan after 3 months
- Competitive Group Insurance plan on Day 1 - individual premium paid 100%!
- Virtual Medicine and Family Assistance Program - 100% employer-paid!
- Home office budget - We are 100% remote!
- CAD $70/month for internet/phone expenses
- CAD $1,500 every 3 years for tech accessories and office equipment (monitor, keyboard, mouse, desk, etc.) starting on Day 1
- Company-supplied MacBook Pro or Air
- CAD $400/year for books, relevant app subscriptions or an e-reader.
- Opportunities for paid certifications
- Opportunities for professional and personal learning through Udemy Business
- Regular company off-sites and meetups
More Machine Learning Engineer Jobs
Machine Learning Developer
Instituto de Pesquisas Eldorado
We are a Research, Development and Innovation Institute unlike any other in Brazil. We are constantly inspired by what's new!
- Lead, develop and support all Machine Learning and data curation activities
- Develop and integrate autonomous agents based on LLMs for complex tasks
- Implement RAG pipelines (indexing, semantic search, integration with vector databases)
- Create scalable architectures for orchestrating multiple agents (LangChain, LangGraph)
- Fine-tune and optimize LLMs for specific contexts
- Ensure security, governance and compliance in AI solutions (LGPD, AI Act)
- Define metrics and continuously validate the quality of responses (accuracy, relevance, consistency)
- Monitor performance and apply techniques to reduce cost and latency (quantization, distillation)
Principal Machine Learning Engineer
Lime
Building a future where transportation is shared, affordable and carbon-free. Join us! www.li.me/careers
- Drive alignment across teams on ML strategy, standards, and long-term technical direction by serving as a technical leader for Lime’s ML Center of Excellence
- Guide recommendations for ML infrastructure, tooling, and architecture (training, serving, feature stores, experimentation, monitoring)
- Define and evolve ML development processes, including model review, experimentation rigor, deployment, optimization, and operations
- Establish best practices for ML monitoring, observability, alerting, and model performance health in production
- Drive reusable feature development patterns and shared ML capabilities that enable teams to move faster and more safely
- Partner with platform, data, and product engineering teams to ensure ML systems are reliable, scalable, and cost effective
- Identify and prioritize opportunities where ML will improve Lime’s product, operations, or efficiency
- Act as a force multiplier by mentoring data scientists and machine learning engineers, raising the quality bar for machine learning across Lime
Staff Machine Learning Engineer
Terra
Terra is the Next Generation Claims and Policy Software for Workers’ Comp.
- Design, train, test, and iterate on diffusion models for 3D geological models
- Design, train, test, and iterate on an approach for conditioning generation on geophysical data and other observations
- Inform the generation of synthetic data to improve model performance
- Adapt the diffusion modeling approach to specific real-world projects in collaboration with project teams

- Design and maintain training systems that can process and learn from petabyte-scale multimodal datasets (e.g., video and point cloud data). This includes ensuring data is efficiently loaded, distributed, and processed across large GPU clusters.
- Identify and resolve bottlenecks in the training pipeline, including data loading, preprocessing, model computation, and inter-node communication, to maximize GPU utilization and reduce training time.
- Work with the ML team to develop and refine neural network architectures suitable for autonomy tasks, particularly those handling high-dimensional and sequential sensor data.
- Create and adjust loss functions and training strategies that help the model learn effectively from complex multimodal inputs and improve autonomy performance.
- Configure, monitor, and maintain large-scale distributed training jobs across multiple machines and GPUs, ensuring stability, fault tolerance, and efficient resource usage.
- Implement scalable systems to preprocess, transform, and augment large robotics datasets so that they are suitable for model training.
- Work closely with ML scientists and other engineers to integrate new models, experiments, and training approaches into the production training pipeline.
- Analyze training metrics, model outputs, and experiment logs to assess model performance and guide improvements in architecture, data usage, or training strategies.
- Develop tools and workflows that allow teams to run experiments, track results, and iterate quickly on new model ideas or training approaches.




