Job Closed
This listing is no longer active.
Building a #BetterWorkingWorld by providing trust through assurance and helping organizations grow, transform & operate.
Senior Manager – ML Ops
Location
India
Posted
130 days ago
Salary
Not disclosed
Seniority
Senior
Job Description
Senior Manager – ML Ops
EY
- Design a comprehensive 5–10-year architectural vision for a unified ML Ops platform that strategically leverages both AWS (SageMaker, EKS) and Azure (Azure ML, AKS) services to maximize resilience and capability.
- Establish and lead the ML/AI Architecture Review Board (ARB), setting global standards for technology stack selection, architectural patterns, and security guardrails for all production AI deployments.
- Direct the enterprise-wide adoption and governance of Infrastructure as Code (IaC), using Terraform or equivalent tools, to ensure consistent, auditable, and secure provisioning of multi-cloud infrastructure (compute, networking, security groups, data plane).
- Architect and oversee the implementation of automated, end-to-end Continuous Integration, Continuous Delivery, and Continuous Training (CI/CD/CT) pipelines that enable rapid, zero-downtime model deployments and rollbacks across hybrid and multi-cloud environments.
- Design the architecture for containerized ML workloads and inference services on enterprise-scale Kubernetes (AKS/EKS) clusters, with a focus on service mesh implementation, efficient autoscaling strategies, and network isolation.
- Ensure the ML platform architecture can handle the scale and throughput required by real-time risk, fraud, and customer-interaction models in financial services.
- Architect and enforce robust Model Risk Management (MRM) frameworks that embed regulatory compliance, audit trails, model versioning, and explainability (XAI) requirements directly into ML Ops pipelines to meet banking and insurance sector mandates.
- Define the enterprise standard for AI Ops observability, using unified monitoring tools (e.g., Prometheus/Grafana) to track multi-cloud system health and to proactively detect and automatically remediate model drift, data quality issues, and prediction latency problems.
- Implement architectural patterns and governance policies that maximize cost efficiency and transparency across all Azure and AWS ML and compute resources, including chargeback and budget enforcement.
- Design and mandate secure data governance, Role-Based Access Control (RBAC), and secrets management across the multi-cloud architecture, ensuring data isolation and secure cross-cloud communication.
Job Requirements
- 15+ years of professional experience in Enterprise Architecture, Software Engineering, or Strategic IT Leadership.
- 7+ years in a dedicated ML Ops Architect or Chief Architect role with direct responsibility for enterprise-wide platform governance.
- Deep expertise in designing and implementing enterprise-grade ML Ops platforms, preferably in the banking and insurance sectors.
- Expert-level architectural proficiency and hands-on experience in both AWS and Azure:
  - Azure: Azure Machine Learning, AKS, Azure DevOps, Azure Security Center, Azure Governance.
  - AWS: AWS SageMaker, EKS, Lambda, S3, IAM, AWS Code Services.
- Demonstrated success in designing and deploying highly regulated, production-grade ML Ops solutions at enterprise scale.
- Mastery of Infrastructure as Code (IaC), specifically Terraform, for consistent multi-cloud deployment.
- Expert knowledge of Kubernetes orchestration and containerization (Docker).
- Proven experience implementing Model Risk Management (MRM) and XAI frameworks in a regulated environment.
- Strategic understanding of programming, especially Python and major ML frameworks (TensorFlow, PyTorch), sufficient to set and govern enterprise coding and model-packaging standards.
- Proven experience designing and governing robust monitoring solutions for production ML systems (e.g., Prometheus, Grafana, Datadog) for enterprise-wide AI Ops.
- Master’s degree in Computer Science, Engineering, or a related quantitative field.
Benefits
- Competitive salary
- Flexible working hours
More Machine Learning Engineer Jobs
- Design, deploy, and maintain Kubernetes infrastructure supporting AI/ML workloads
- Manage containerized services, autoscaling, networking, and resource optimization
- Design and build high-performance Python APIs and services using FastAPI or similar frameworks
- Architect backend systems for scalability, reliability, and low latency
- Build integrations between AI/ML systems and the broader Albert platform
- Build and operate distributed systems that handle compute-intensive and high-throughput workloads
- Design for fault tolerance, graceful degradation, and horizontal scalability
- Implement async workflows, job queues, and task orchestration as needed
- Architect and maintain data pipelines and storage systems supporting AI/ML workflows
- Implement observability, including logging, metrics, tracing, and alerting
- Own system reliability: troubleshoot issues, conduct post-mortems, and continuously improve
- Design CI/CD pipelines and promote automation best practices
- Partner closely with ML engineers to understand requirements and deliver production-ready infrastructure
- Translate ML prototypes and research code into scalable, maintainable systems
Staff Machine Learning Engineer, Perception
Path Robotics
Enabling Robots To Build So That Humans Can Create.
- Lead the development and implementation of advanced algorithms for robotic perception systems tailored to industrial welding tasks, integrating data from diverse vision sensors such as RGB/GigE, LiDAR, and ToF depth sensors.
- Oversee research initiatives that address complex welding-related challenges using image processing, point cloud data, and 3D sensor fusion, contributing innovative solutions to domain-specific problems.
- Collaborate with multidisciplinary teams to design and lead experiments evaluating state-of-the-art deep learning models, optimizing machine learning systems for robotic perception in welding.
- Stay at the forefront of advancements in Robotics, Computer Vision, and ML research, driving the integration of cutting-edge technologies into real-world applications and ensuring these innovations have a high impact on production systems.
- Mentor and guide junior engineers, providing technical leadership and fostering collaboration to enhance team expertise in perception systems and machine learning.
- Contribute to strategic decisions about system architecture and the direction of robotics perception technologies within the company, ensuring alignment with product and business goals.
Machine Learning Engineer II – Ad Forecasting
Spotify
Passionate music fans. Innovative tech pros. Perfect harmony. Join our band.
- Design and implement machine learning systems to predict future ad inventory, demand, and performance
- Research and apply best practices for driving automation with respect to human review processes
- Partner with multiple teams to shape and enhance shared systems and pipelines
- Come up with creative ways to apply AI tools to develop innovative solutions
- Collaborate with and lead backend engineers, data scientists, data engineers, and product managers to establish baselines, inform product decisions, and develop new technologies
Director, Machine Learning – Platform
Flex
Flex splits your bills into smaller, stress-free payments throughout the month. Start today with your rent bill!
- Own the end-to-end machine learning platform, including model development workflows, training, deployment, monitoring, retraining, and lifecycle management.
- Define and execute the roadmap for scalable ML infrastructure supporting both real-time and batch use cases.
- Lead applied machine learning initiatives supporting compliance and customer success, including areas such as:
  - Compliance monitoring, alerting, and investigation support
  - Customer success automation, prioritization, and insight generation
  - Internal operational tooling and responsible AI adoption
- Partner with cross-functional stakeholders to translate complex business, customer, and regulatory problems into production ML solutions.
- Build, mentor, and scale a small team of ML engineers and applied scientists, operating as a player-coach when needed.