Invent the future, faster.
Staff ML Ops Engineer
Location
California
Posted
130 days ago
Salary
0
Seniority
Lead
Job Description
Albert Invent
- Design, deploy, and maintain Kubernetes infrastructure supporting AI/ML workloads
- Manage containerized services, autoscaling, networking, and resource optimization
- Design and build high-performance Python APIs and services using FastAPI or similar frameworks
- Architect backend systems for scalability, reliability, and low latency
- Build integrations between AI/ML systems and the broader Albert platform
- Build and operate distributed systems that handle compute-intensive and high-throughput workloads
- Design for fault tolerance, graceful degradation, and horizontal scalability
- Implement async workflows, job queues, and task orchestration as needed
- Architect and maintain data pipelines and storage systems supporting AI/ML workflows
- Implement observability including logging, metrics, tracing, and alerting
- Own system reliability: troubleshoot issues, conduct post-mortems, and continuously improve
- Design CI/CD pipelines and promote automation best practices
- Partner closely with ML engineers to understand requirements and deliver production-ready infrastructure
- Translate ML prototypes and research code into scalable, maintainable systems
Job Requirements
- A degree in Computer Science or a related field with 7+ years of industry experience (Bachelor's) or 5+ years (Master's or PhD) in software engineering
- Experience supporting AI/ML teams or deploying ML systems in production
- Experience with GPU workloads and scheduling
- Advanced proficiency in Python including async programming and performance optimization
- Deep experience with Kubernetes—cluster management, networking, autoscaling, and troubleshooting
- Strong background in distributed systems and microservices architecture
- Experience with cloud platforms (AWS, GCP, or Azure) and infrastructure-as-code
- Proficiency in REST API development using FastAPI, Flask, or similar
- Experience with containerization and CI/CD pipelines
- Track record of operating production systems at scale
Benefits
- Health insurance
- Flexible working hours
- Professional development opportunities
More Machine Learning Engineer Jobs
Staff Machine Learning Engineer, Perception
Path Robotics
Enabling Robots To Build So That Humans Can Create.
- Lead the development and implementation of advanced algorithms for robotic perception systems tailored to industrial welding tasks, integrating data from diverse vision sensors such as RGB/GigE, LiDAR, and ToF depth sensors.
- Oversee research initiatives to address complex welding-related challenges, utilizing image processing, point cloud data, and 3D sensor fusion, contributing to innovative solutions for domain-specific problems.
- Collaborate with multidisciplinary teams to design and lead experiments evaluating state-of-the-art deep learning models, optimizing machine learning systems for robotic perception in welding.
- Stay at the forefront of advancements in Robotics, Computer Vision, and ML research, driving the integration of cutting-edge technologies into real-world applications, and ensuring these innovations have a high impact on production systems.
- Mentor and guide junior engineers, providing technical leadership and fostering collaboration to enhance team expertise in perception systems and machine learning.
- Contribute to strategic decisions about system architecture and the direction of robotics perception technologies within the company, ensuring alignment with product and business goals.
Machine Learning Engineer II – Ad Forecasting
Spotify
Passionate music fans. Innovative tech pros. Perfect harmony. Join our band.
- Design and implement machine learning systems to predict future ad inventory, demand, and performance
- Research and apply best practices for driving automation with respect to human review processes
- Partner with multiple teams to shape and enhance shared systems and pipelines
- Come up with creative ways to apply AI tools to develop innovative solutions
- Collaborate with and lead backend engineers, data scientists, data engineers, and product managers to establish baselines, inform product decisions, and develop new technologies
Director, Machine Learning – Platform
Flex
Flex splits your bills into smaller, stress-free payments throughout the month. Start today with your rent bill!
- Own the end-to-end machine learning platform, including model development workflows, training, deployment, monitoring, retraining, and lifecycle management.
- Define and execute the roadmap for scalable ML infrastructure supporting both real-time and batch use cases.
- Lead applied machine learning initiatives supporting compliance and customer success, including areas such as:
  - Compliance monitoring, alerting, and investigation support
  - Customer success automation, prioritization, and insight generation
  - Internal operational tooling and responsible AI adoption
- Partner with cross-functional stakeholders to translate complex business, customer, and regulatory problems into production ML solutions.
- Build, mentor, and scale a small team of ML engineers and applied scientists, operating as a player-coach when needed.
Senior Machine Learning Engineer, Research Team
GR8 Tech
Launch, grow, or upgrade your iGaming business with GR8 Tech high-performance Sportsbook and iGaming platform.
- Take technical ownership of core components of recommendation and personalization systems (retrieval, ranking, evaluation).
- Design and evolve two-tower / embedding-based retrieval models and downstream rankers.
- Drive architectural and modeling decisions with a strong understanding of trade-offs between model quality, system complexity, latency, and cost.
- Define and promote best practices for ML system design, experimentation, evaluation, and deployment.
- Review ML designs, pipelines, and code with a focus on correctness, maintainability, and production readiness.
- Act as a technical point of reference for ML-related decisions within the team.
- Develop, train, and improve ML models for retrieval and ranking use cases.
- Work with embedding-based deep learning models and classical ML approaches.
- Perform in-depth data analysis, feature exploration, and systematic error analysis.
- Build reproducible experiments and robust offline evaluation pipelines.
- Optimize models for both offline metrics and online business KPIs.
- Design and operate batch and real-time training and inference workflows in a cloud environment, with awareness of scalability and cost trade-offs.
- Design, run, and analyze offline experiments and online A/B tests.
- Own ML components in production, with a strong focus on reliability, observability, and safe iteration.
- Monitor model performance and data quality in production.
- Collaborate on scalable training and serving infrastructure for ML systems.
- Participate in incident analysis related to ML systems and contribute to root-cause analysis and long-term fixes.
- Design ML systems with failure modes in mind, including fallbacks and graceful degradation.
- Work closely with Data Engineering on data pipelines and feature generation.
- Partner with Product and Analytics to translate business goals into clear ML objectives and success metrics.
- Act as a technical mentor for ML engineers, providing guidance on modeling, experimentation, and production ML.
- Provide constructive feedback through code reviews and design discussions, supporting the growth of the team.




