Albert Invent

Invent the future, faster.

Staff ML Ops Engineer

Machine Learning Engineer | Other | Remote | Lead | Team 51-200 | Since 2022 | H1B No Sponsor

Location

California

Posted

130 days ago

Salary

Not listed

Seniority

Lead

Job Description

  • Design, deploy, and maintain Kubernetes infrastructure supporting AI/ML workloads
  • Manage containerized services, autoscaling, networking, and resource optimization
  • Design and build high-performance Python APIs and services using FastAPI or similar frameworks
  • Architect backend systems for scalability, reliability, and low latency
  • Build integrations between AI/ML systems and the broader Albert platform
  • Build and operate distributed systems that handle compute-intensive and high-throughput workloads
  • Design for fault tolerance, graceful degradation, and horizontal scalability
  • Implement async workflows, job queues, and task orchestration as needed
  • Architect and maintain data pipelines and storage systems supporting AI/ML workflows
  • Implement observability including logging, metrics, tracing, and alerting
  • Own system reliability: troubleshoot issues, conduct post-mortems, and continuously improve
  • Design CI/CD pipelines and promote automation best practices
  • Partner closely with ML engineers to understand requirements and deliver production-ready infrastructure
  • Translate ML prototypes and research code into scalable, maintainable systems

Job Requirements

  • A degree in Computer Science or a related field with 7+ years of industry experience (Bachelor's) or 5+ years (Master's or PhD) in software engineering
  • Experience supporting AI/ML teams or deploying ML systems in production
  • Experience with GPU workloads and scheduling
  • Advanced proficiency in Python including async programming and performance optimization
  • Deep experience with Kubernetes—cluster management, networking, autoscaling, and troubleshooting
  • Strong background in distributed systems and microservices architecture
  • Experience with cloud platforms (AWS, GCP, or Azure) and infrastructure-as-code
  • Proficiency in REST API development using FastAPI, Flask, or similar
  • Experience with containerization and CI/CD pipelines
  • Track record of operating production systems at scale

Benefits

  • Health insurance
  • Flexible working hours
  • Professional development opportunities

Related Job Pages

More Machine Learning Engineer Jobs

Staff Machine Learning Engineer, Perception

Path Robotics

Enabling Robots To Build So That Humans Can Create.

Other | Remote | Team 201-500 | Since 2014 | H1B Sponsor

  • Lead the development and implementation of advanced algorithms for robotic perception systems tailored to industrial welding tasks, integrating data from diverse vision sensors such as RGB/GigE, LiDAR, and ToF depth sensors.
  • Oversee research initiatives to address complex welding-related challenges, utilizing image processing, point cloud data, and 3D sensor fusion, contributing to innovative solutions for domain-specific problems.
  • Collaborate with multidisciplinary teams to design and lead experiments evaluating state-of-the-art deep learning models, optimizing machine learning systems for robotic perception in welding.
  • Stay at the forefront of advancements in Robotics, Computer Vision, and ML research, driving the integration of cutting-edge technologies into real-world applications, and ensuring these innovations have a high impact on production systems.
  • Mentor and guide junior engineers, providing technical leadership and fostering collaboration to enhance team expertise in perception systems and machine learning.
  • Contribute to strategic decisions about system architecture and the direction of robotics perception technologies within the company, ensuring alignment with product and business goals.

Ohio
Job Closed

Machine Learning Engineer II – Ad Forecasting

Spotify

Passionate music fans. Innovative tech pros. Perfect harmony. Join our band.

Other | Remote | Team 5,001-10,000 | Since 2008 | H1B Sponsor

  • Design and implement machine learning systems to predict future ad inventory, demand, and performance
  • Research and apply best practices for driving automation with respect to human review processes
  • Partner with multiple teams to shape and enhance shared systems and pipelines
  • Come up with creative ways to apply AI tools to develop innovative solutions
  • Collaborate with and lead backend engineers, data scientists, data engineers, and product managers to establish baselines, inform product decisions, and develop new technologies

New York
$148.9K - $212.7K / year
Job Closed

Director, Machine Learning – Platform

Flex

Flex splits your bills into smaller, stress-free payments throughout the month. Start today with your rent bill!

Other | Remote | Team 201-500 | Since 2019 | H1B Sponsor

  • Own the end-to-end machine learning platform, including model development workflows, training, deployment, monitoring, retraining, and lifecycle management.
  • Define and execute the roadmap for scalable ML infrastructure supporting both real-time and batch use cases.
  • Lead applied machine learning initiatives supporting compliance and customer success, including areas such as:
      • Compliance monitoring, alerting, and investigation support
      • Customer success automation, prioritization, and insight generation
      • Internal operational tooling and responsible AI adoption
  • Partner with cross-functional stakeholders to translate complex business, customer, and regulatory problems into production ML solutions.
  • Build, mentor, and scale a small team of ML engineers and applied scientists, operating as a player-coach when needed.

United States
$280K - $350K / year
Job Closed

Senior Machine Learning Engineer, Research Team

GR8 Tech

Launch, grow, or upgrade your iGaming business with GR8 Tech high-performance Sportsbook and iGaming platform.

Other | Remote | Team 501-1,000 | H1B No Sponsor

  • Take technical ownership of core components of recommendation and personalization systems (retrieval, ranking, evaluation).
  • Design and evolve two-tower / embedding-based retrieval models and downstream rankers.
  • Drive architectural and modeling decisions with a strong understanding of trade-offs between model quality, system complexity, latency, and cost.
  • Define and promote best practices for ML system design, experimentation, evaluation, and deployment.
  • Review ML designs, pipelines, and code with a focus on correctness, maintainability, and production readiness.
  • Act as a technical point of reference for ML-related decisions within the team.
  • Develop, train, and improve ML models for retrieval and ranking use cases.
  • Work with embedding-based deep learning models and classical ML approaches.
  • Perform in-depth data analysis, feature exploration, and systematic error analysis.
  • Build reproducible experiments and robust offline evaluation pipelines.
  • Optimize models for both offline metrics and online business KPIs.
  • Design and operate batch and real-time training and inference workflows in a cloud environment, with awareness of scalability and cost trade-offs.
  • Design, run, and analyze offline experiments and online A/B tests.
  • Own ML components in production, with a strong focus on reliability, observability, and safe iteration.
  • Monitor model performance and data quality in production.
  • Collaborate on scalable training and serving infrastructure for ML systems.
  • Participate in incident analysis related to ML systems and contribute to root-cause analysis and long-term fixes.
  • Design ML systems with failure modes in mind, including fallbacks and graceful degradation.
  • Work closely with Data Engineering on data pipelines and feature generation.
  • Partner with Product and Analytics to translate business goals into clear ML objectives and success metrics.
  • Act as a technical mentor for ML engineers, providing guidance on modeling, experimentation, and production ML.
  • Provide constructive feedback through code reviews and design discussions, supporting the growth of the team.

United States
Job Closed