Principal MLOps Engineer

Machine Learning Engineer | Full Time | Remote | Lead | Team 201-500

Location

United States

Posted

41 days ago

Salary

$150K - $200K / year

Seniority

Lead

Job Description

Raft Company Website

Role Description

Raft is building mission-critical AI and data platforms for the Department of Defense (DoD). Our systems ingest and process massive volumes of real-time data from hundreds of sensors and operational sources, transform that data into usable intelligence, and deliver it to operators through mission applications and common operational pictures that support time-sensitive decision-making. Our platform operates at scale, processing billions of events per day with low-latency data pipelines and cloud-native infrastructure.

As Raft expands its AI capabilities, we are investing in a more mature end-to-end machine learning platform to support model development, evaluation, deployment, monitoring, and lifecycle management across both cloud and constrained operational environments.

In this role, you will help design, deploy, and mature Raft’s ML platform and MLOps infrastructure. You will work across Kubernetes-based deployment environments, GPU-enabled infrastructure, model serving systems, CI/CD pipelines, and secure production operations to enable rapid and reliable delivery of machine learning capabilities. This role is ideal for someone who understands both the infrastructure needed to run ML systems in production and the practical needs of ML engineers building and deploying models.

What you’ll do:
- Design, build, and maintain secure, scalable MLOps infrastructure and deployment pipelines for production ML systems
- Help mature Raft’s internal ML platform and model lifecycle capabilities, including model packaging, registry/catalog workflows, deployment, monitoring, and operational support
- Deploy and manage machine learning workloads on Kubernetes, including GPU-enabled clusters
- Support model serving and inference infrastructure for a range of ML use cases, including traditional ML, computer vision, speech/audio, and LLM-based systems
- Build and maintain CI/CD workflows for ML services, model artifacts, and platform components
- Partner closely with ML engineers, software engineers, and product teams to move models from experimentation to reliable operational deployment
- Improve observability, reliability, security, and maintainability across ML infrastructure and services
- Help evaluate and standardize runtime patterns, serving frameworks, and deployment architectures for production ML workloads
- Contribute to infrastructure decisions across edge, on-prem, and cloud-hosted deployment environments
- Support compliance-driven deployment practices and secure software supply chain requirements in defense environments
- Get hands-on with customers at the most forward-leaning places in the Department of War

Qualifications:
- 7+ years of relevant hands-on experience in software engineering, platform engineering, DevOps, MLOps, or related technical roles
- 5+ years of experience with Docker and Kubernetes in production environments
- 5+ years of experience supporting enterprise cloud infrastructure or applications in AWS, Azure, or similar environments
- Strong experience provisioning, operating, and troubleshooting Kubernetes clusters in production
- Experience building and maintaining machine learning platforms, infrastructure, or pipelines used by engineering or data science teams
- Practical experience deploying machine learning workloads on Kubernetes
- Experience managing clusters or workloads that use GPUs
- Strong understanding of Helm and Kubernetes deployment patterns
- Strong scripting or programming skills, preferably in Python
- Experience with modern software engineering practices including Git, CI/CD, DevOps, and Agile/Scrum workflows
- Strong troubleshooting, systems thinking, and communication skills
- Ability to work independently and collaboratively in a fast-moving environment
- Ability to obtain and maintain a Top Secret clearance
- Ability to obtain Security+ certification within the first 90 days of employment

Highly preferred:
- Experience with ML model serving and inference platforms such as Triton Inference Server, KServe, Ray Serve, vLLM, or similar technologies
- Experience with secure and compliant deployment practices in regulated or government environments
- Experience with Kubernetes-based ML platforms such as Kubeflow
- Familiarity with service mesh technologies such as Istio
- Experience provisioning and debugging complex CI/CD systems
- Experience with infrastructure as code tools such as Terraform
- Familiarity with software supply chain security, container hardening, vulnerability management, and runtime scanning
- Experience supporting ML systems across multiple deployment environments, including cloud, on-prem, and edge
- Background working with machine learning engineers on model training, evaluation, packaging, and release workflows
- Familiarity with storage and artifact systems used in ML platforms, such as S3-compatible object stores, registries, and metadata/catalog systems

What success looks like:
- You help Raft stand up a more mature and repeatable ML platform for deploying and managing models in production
- ML engineers can move faster because deployment, serving, and platform workflows are clearer, more reliable, and easier to use
- Model deployments become more secure, observable, and supportable across real-world mission environments
- The organization gains stronger infrastructure for model lifecycle management, including deployment standards, runtime patterns, and platform ownership

Clearance Requirements:
- Ability to obtain and maintain a Top Secret clearance

Work Type:
- Remote in DMV; McLean, VA; Boston, MA; San Antonio, TX; Colorado Springs, CO; Tampa, FL; Honolulu, HI
- May require up to 40% travel

Salary Range: $150,000.00 - $200,000.00

What we will offer you:
- Highly competitive salary
- Fully covered healthcare, dental, and vision coverage
- 401(k) and company match
- Take-as-you-need PTO + 11 paid holidays
- Education & training benefits
- Annual budget for your tech/gadget needs
- Monthly box of yummy snacks to eat while doing meaningful work
- Remote, hybrid, and flexible work options
- Team off-site in fun places!
- Generous Referral Bonuses
- And More!

Related Job Pages

More Machine Learning Engineer Jobs

Senior/Staff Machine Learning Engineer

River Financial

Buy and mine Bitcoin. Zero-fee DCA. 100% reserve custody. Built for people who want more Bitcoin. www.River.com

Full Time | Remote | Team 51-200 | Since 2020 | H1B No Sponsor

• Design, build, and iterate on machine learning models and LLM-based systems that power critical decisions across fraud, compliance, growth, and operations
• Work with messy, real-world data to identify signals, build features, and continuously improve model performance
• Make practical tradeoffs between model performance, interpretability, and operational cost
• Partner closely with product and operations to identify and solve problems that directly impact the experience of hundreds of thousands of clients
• Contribute to backend systems and data pipelines that support model training and inference (without being primarily an infrastructure role)
• Write high-quality, tested code and participate in code reviews
• Take long-term ownership of critical systems as River scales

Europe
$150K - $250K / year

Machine Learning Engineer II, Fraud

Affirm

Affirm is a financial services company that is on a mission to provide its customers with “honest financial products that improve lives.” As an employer, Af

• You will develop and iterate on fraud prediction models using a mix of approaches for tabular and behavioral data
• You will build and scale feature pipelines and training datasets from proprietary and third-party signals, partnering with data and platform teams when needed.
• You will prototype new modeling ideas and features, run offline experiments, and drive the best-performing approaches into production with appropriate risk controls.
• You will help productionize models: integrate into batch and/or real-time decision systems, and improve reliability, latency, and operational robustness.
• You will instrument and monitor model and data health, and help define retraining/backtesting workflows as fraud patterns evolve.
• You will collaborate across Engineering, Fraud Analytics, Product, and ML Platform to define requirements, evaluate tradeoffs, and communicate results clearly to both technical and non-technical audiences.

Canada
$125K - $175K / year
Full Time | Remote | Team 501-1,000 | Since 2005 | H1B No Sponsor

• Design, build, and deploy production-grade machine learning models and systems at scale
• Own the full ML lifecycle: from problem definition and feature engineering to training, evaluation, deployment, and monitoring
• Build scalable data and model pipelines with strong reliability, observability, and automated retraining
• Work with large-scale datasets to improve ranking, recommendations, search relevance, prediction, content/user understanding, and optimization systems.
• Partner cross-functionally with Product, Data Science, Infrastructure, and Engineering teams to translate complex problems into ML solutions
• Improve system performance across latency, throughput, and model quality metrics
• Research and apply state-of-the-art machine learning and AI techniques, including deep learning, graph- and transformer-based methods, and LLM evaluation/alignment
• Contribute to technical strategy, architecture, and long-term ML roadmap

Canada
Job Closed

Senior Machine Learning Engineer - Forecasting

Amgen

Founded in 1980, Amgen (short for Applied Molecular Genetics) is a biotechnology firm focused on developing human therapeutics. As an employer, Amgen has been d

Career Category: Information Systems

Job Description

Join Amgen’s Mission of Serving Patients

At Amgen, if you feel like you’re part of something bigger, it’s because you are. Our shared mission, to serve patients living with serious illnesses, drives all that we do. Since 1980, we’ve helped pioneer the world of biotech in our fight against the world’s toughest diseases. With our focus on four therapeutic areas (Oncology, Inflammation, General Medicine, and Rare Disease), we reach millions of patients each year. As a member of the Amgen team, you’ll help make a lasting impact on the lives of patients as we research, manufacture, and deliver innovative medicines to help people live longer, fuller, happier lives.

Our award-winning culture is collaborative, innovative, and science based. If you have a passion for challenges and the opportunities that lie within them, you’ll thrive as part of the Amgen team. Join us and transform the lives of patients while transforming your career.

Senior Machine Learning Engineer - Forecasting

What You Will Do

We are seeking a Senior Machine Learning Engineer, Forecasting to join the Forecasting team within the AI & Data organization. This role will design, build, deploy, and maintain scalable machine learning systems that power forecasting capabilities and uncertainty-aware decision support across the company. This senior member of the team will work cross-functionally to translate advanced forecasting methods into reliable, production-grade solutions that support critical business processes and help Amgen deliver on its “every patient, every time” mandate. The role is particularly well suited to a strong engineer who is excited about building robust ML infrastructure, productionizing state-of-the-art forecasting models, and enabling decision-support solutions that inform multi-horizon planning and business decision-making.

Key Responsibilities
- Design, build, and maintain scalable machine learning systems and forecasting pipelines to support demand forecasting across near-, medium-, and long-term planning horizons.
- Productionize advanced statistical, Bayesian, and machine learning forecasting models, including training, validation, deployment, and lifecycle management.
- Build and optimize data pipelines, feature engineering workflows, and batch and real-time inference systems using large, complex datasets.
- Own the end-to-end ML engineering lifecycle, including solution design, prototyping, model integration, testing, deployment, monitoring, observability, and continuous improvement.
- Develop robust MLOps capabilities, including model versioning, CI/CD, automated retraining, performance monitoring, drift detection, and rollback strategies.
- Partner closely with data scientists and business stakeholders to operationalize forecasting, simulation, and scenario-analysis capabilities that support strategic decision-making.
- Establish and promote software engineering best practices, including code quality, documentation, reproducibility, and system reliability.
- Research and evaluate emerging tools, platforms, and methodologies in machine learning engineering, forecasting, and AI for potential application to business problems.

Basic Qualifications
- Doctorate degree, OR
- Master’s degree and 2 years of experience applying data science in enterprise environments, OR
- Bachelor’s degree and 4 years of experience applying data science in enterprise environments, OR
- Associate’s degree and 8 years of experience applying data science in enterprise environments, OR
- High school diploma / GED and 10 years of experience applying data science in enterprise environments

Preferred Qualifications
- 6+ years of experience in machine learning engineering, software engineering, or a related field, with a demonstrated track record of deploying production ML systems that deliver business value.
- Strong experience building and maintaining end-to-end ML pipelines and production systems for forecasting or other predictive modeling use cases.
- Expertise in model serving and in operationalizing probabilistic, Bayesian, or predictive models in production environments.
- Strong programming skills in Python and SQL, with experience using tools such as scikit-learn, PyTorch, TensorFlow, and orchestration or workflow tools for ML pipelines.
- Experience with cloud platforms, distributed data processing, containerization, and ML deployment patterns.
- Strong understanding of software engineering fundamentals, including system design, testing, performance optimization, and maintainability.
- Strong collaboration and communication skills, with the ability to work effectively across technical and non-technical teams.
- An intellectually curious self-starter who can take ambiguous problems and build scalable solutions from the ground up.
- Experience building and deploying forecasting models for biotech/pharma use cases with knowledge of healthcare commercial concepts such as payer/provider dynamics, formulary access, and coverage.
- Experience partnering closely with data scientists to translate advanced statistical or machine learning models into reliable production services.
- Experience leveraging machine learning and forecasting systems in retail, consumer goods, supply chain, or manufacturing applications.
- Familiarity with model monitoring, explainability, and governance requirements in regulated or high-impact business environments.

What You Can Expect Of Us

As we work to develop treatments that take care of others, we also work to care for your professional and personal growth and well-being. From our competitive benefits to our collaborative culture, we’ll support your journey every step of the way. The expected annual salary range for this role in the U.S. (excluding Puerto Rico) is posted.
Actual salary will vary based on several factors, including but not limited to relevant skills, experience, and qualifications. In addition to the base salary, Amgen offers a Total Rewards Plan, based on eligibility, comprising health and welfare plans for staff and eligible dependents, financial plans with opportunities to save towards retirement or other goals, work/life balance, and career development opportunities that may include:
- A comprehensive employee benefits package, including a Retirement and Savings Plan with generous company contributions, group medical, dental and vision coverage, life and disability insurance, and flexible spending accounts
- A discretionary annual bonus program, or for field sales representatives, a sales-based incentive plan
- Stock-based long-term incentives
- Award-winning time-off plans
- Flexible work models where possible. Refer to the Work Location Type in the job posting to see if this applies.

Apply now and make a lasting impact with the Amgen team. careers.amgen.com

In any materials you submit, you may redact or remove age-identifying information such as age, date of birth, or dates of school attendance or graduation. You will not be penalized for redacting or removing this information.

Application deadline

Amgen does not have an application deadline for this position; we will continue accepting applications until we receive a sufficient number or select a candidate for the position.

Sponsorship

Sponsorship for this role is not guaranteed.

As an organization dedicated to improving the quality of life for people around the world, Amgen fosters an inclusive environment of diverse, ethical, committed and highly accomplished people who respect each other and live the Amgen values to continue advancing science to serve patients. Together, we compete in the fight against serious disease.

Amgen is an Equal Opportunity employer and will consider all qualified applicants for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, protected veteran status, disability status, or any other basis protected by applicable law. We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.

Salary Range: 156,190.05 USD - 211,315.95 USD

United States
$156K - $211K / year