Job Closed

This listing is no longer active.

Building a #BetterWorkingWorld by providing trust through assurance and helping organizations grow, transform & operate.

Senior Manager – ML Ops

Machine Learning Engineer | Full Time | Remote | Senior | Team 10,001+ | Since 1989 | H1B Sponsor

Location

India

Posted

130 days ago

Salary

Seniority

Senior

Postgraduate Degree | 15 yrs exp | English

AWS Azure Docker Grafana Kubernetes Prometheus Python PyTorch TensorFlow Terraform

Job Description

• Design the comprehensive, 5–10-year architectural vision for a unified ML Ops platform that strategically leverages both AWS (SageMaker, EKS) and Azure (Azure ML, AKS) services to maximize resilience and capability.
• Establish and lead the ML/AI Architecture Review Board (ARB), setting global standards for technology stack selection, architectural patterns, and security guardrails for all AI production deployments.
• Direct the enterprise-wide adoption and governance of IaC using Terraform or equivalent tools to ensure consistent, auditable, and secure provisioning of multi-cloud infrastructure (compute, networking, security groups, data plane).
• Architect and oversee the implementation of automated, end-to-end Continuous Integration, Continuous Delivery, and Continuous Training pipelines that facilitate rapid, zero-downtime model deployments and rollbacks across hybrid/multi-cloud environments.
• Design the architecture for containerized ML workloads and inference services using enterprise-scale Kubernetes (AKS/EKS) clusters, focusing on service mesh implementation, efficient autoscaling strategies, and network isolation.
• Ensure the ML platform architecture can handle the massive scale and high throughput required for real-time risk, fraud, and customer interaction models within financial services.
• Architect and enforce robust Model Risk Management (MRM) frameworks, embedding regulatory compliance, audit trails, model versioning, and explainability (XAI) requirements directly into the ML Ops pipelines to meet banking/insurance sector mandates.
• Define the enterprise standard for AI Ops observability, leveraging unified monitoring tools (e.g., Prometheus/Grafana) to track multi-cloud system health, proactively detect and auto-remediate Model Drift, Data Quality issues, and prediction latency.
• Implement strategic architectural patterns and governance policies to drive maximum cost-efficiency and transparency across all Azure and AWS ML/compute resources, including chargeback and budget enforcement.
• Design and mandate secure data governance, Role-Based Access Control (RBAC), and Secrets Management across the multi-cloud architecture, ensuring data isolation and secure cross-cloud communication.

Job Requirements

15+ years of professional experience in Enterprise Architecture, Software Engineering, or Strategic IT Leadership.
7+ years in a dedicated ML Ops Architect or Chief Architect role with direct responsibility for enterprise-wide platform governance.
Deep expertise in designing and implementing enterprise-grade ML Ops platforms, preferably in the banking and insurance sectors.
Expert-level architectural proficiency and hands-on experience in both AWS and Azure:
Azure: Azure Machine Learning, AKS, Azure DevOps, Azure Security Center, Azure Governance.
AWS: AWS SageMaker, EKS, Lambda, S3, IAM, AWS Code Services.
Demonstrated success in designing and deploying highly regulated, production-grade ML Ops solutions at enterprise scale.
Mastery of Infrastructure as Code (IaC), specifically Terraform, for consistent multi-cloud deployment.
Expert knowledge of Kubernetes orchestration and containerization (Docker).
Proven experience implementing Model Risk Management (MRM) and XAI frameworks in a regulated environment.
Strategic understanding of programming, especially Python and major ML frameworks (TensorFlow, PyTorch), sufficient to set and govern enterprise coding and model packaging standards.
Proven experience designing and governing robust monitoring solutions for production ML systems (e.g., Prometheus, Grafana, Datadog) for enterprise-wide AI Ops.
Master’s degree in Computer Science, Engineering, or a related quantitative field.

Benefits

Competitive salary
Flexible working hours

More Machine Learning Engineer Jobs

Staff ML Ops Engineer

Albert Invent

Invent the future, faster.

Machine Learning Engineer | 130 days ago

Other Remote | Team 51-200 | Since 2022 | H1B No Sponsor

• Design, deploy, and maintain Kubernetes infrastructure supporting AI/ML workloads
• Manage containerized services, autoscaling, networking, and resource optimization
• Design and build high-performance Python APIs and services using FastAPI or similar frameworks
• Architect backend systems for scalability, reliability, and low latency
• Build integrations between AI/ML systems and the broader Albert platform
• Build and operate distributed systems that handle compute-intensive and high-throughput workloads
• Design for fault tolerance, graceful degradation, and horizontal scalability
• Implement async workflows, job queues, and task orchestration as needed
• Architect and maintain data pipelines and storage systems supporting AI/ML workflows
• Implement observability including logging, metrics, tracing, and alerting
• Own system reliability: troubleshoot issues, conduct post-mortems, and continuously improve
• Design CI/CD pipelines and promote automation best practices
• Partner closely with ML engineers to understand requirements and deliver production-ready infrastructure
• Translate ML prototypes and research code into scalable, maintainable systems

AWS Azure Distributed Systems Flask GCP Kubernetes Microservices Python

California

Staff Machine Learning Engineer, Perception

Path Robotics

Enabling Robots To Build So That Humans Can Create.

Machine Learning Engineer | 130 days ago

Other Remote | Team 201-500 | Since 2014 | H1B Sponsor

• Lead the development and implementation of advanced algorithms for robotic perception systems tailored to industrial welding tasks, integrating data from diverse vision sensors such as RGB/GigE, LiDAR, and ToF depth sensors.
• Oversee research initiatives to address complex welding-related challenges, utilizing image processing, point cloud data, and 3D sensor fusion, contributing to innovative solutions for domain-specific problems.
• Collaborate with multidisciplinary teams to design and lead experiments evaluating state-of-the-art deep learning models, optimizing machine learning systems for robotic perception in welding.
• Stay at the forefront of advancements in Robotics, Computer Vision, and ML research, driving the integration of cutting-edge technologies into real-world applications, and ensuring these innovations have a high impact on production systems.
• Mentor and guide junior engineers, providing technical leadership and fostering collaboration to enhance team expertise in perception systems and machine learning.
• Contribute to strategic decisions about system architecture and the direction of robotics perception technologies within the company, ensuring alignment with product and business goals.

Python

Ohio

Job Closed

Machine Learning Engineer II – Ad Forecasting

Spotify

Passionate music fans. Innovative tech pros. Perfect harmony. Join our band.

Machine Learning Engineer | 130 days ago

Other Remote | Team 5,001-10,000 | Since 2008 | H1B Sponsor

• Design and implement machine learning systems to predict future ad inventory, demand, and performance
• Research and apply best practices for driving automation with respect to human review processes
• Partner with multiple teams to shape and enhance shared systems and pipelines
• Come up with creative ways to apply AI tools to develop innovative solutions
• Collaborate with and lead backend engineers, data scientists, data engineers, and product managers to establish baselines, inform product decisions, and develop new technologies

Apache HTTP Server Distributed Systems Java Python Scala Apache Spark

New York

$148.9K - $212.7K / year

Job Closed

Director, Machine Learning – Platform

Flex

Flex splits your bills into smaller, stress-free payments throughout the month. Start today with your rent bill!

Machine Learning Engineer | 130 days ago

Other Remote | Team 201-500 | Since 2019 | H1B Sponsor

• Own the end-to-end machine learning platform, including model development workflows, training, deployment, monitoring, retraining, and lifecycle management.
• Define and execute the roadmap for scalable ML infrastructure supporting both real-time and batch use cases.
• Lead applied machine learning initiatives supporting compliance and customer success, including areas such as:
  • Compliance monitoring, alerting, and investigation support
  • Customer success automation, prioritization, and insight generation
  • Internal operational tooling and responsible AI adoption
• Partner with cross-functional stakeholders to translate complex business, customer, and regulatory problems into production ML solutions.
• Build, mentor, and scale a small team of ML engineers and applied scientists, operating as a player-coach when needed.

United States

$280K - $350K / year
