At Capital One, we think and work like a tech company, using our digital fluency to transform everything about the customer experience. We’re bending data to our will, and turning a stodgy industry on its head. That’s reflected in our ranking as the number one business technology innovator in the U.S. in the 2016 InformationWeek Elite 100.
Lead Machine Learning Engineer
Location
California | New York | Virginia
Posted
13 days ago
Salary
$197.3K - $245.6K / year
Seniority
Senior
Job Description
Capital One
- part of an Agile team dedicated to productionizing machine learning applications and systems at scale
- participate in the detailed technical design, development, and implementation of machine learning applications using existing and emerging technology platforms
- focus on machine learning architectural design
- develop and review model and application code
- ensure high availability and performance of machine learning applications
- continuously learn and apply the latest innovations and best practices in machine learning engineering
- design, build, and deliver ML models and components that solve real-world business problems
- inform ML infrastructure decisions using understanding of ML modeling techniques and issues
- solve complex problems by writing and testing application code, developing and validating ML models
- collaborate as part of a cross-functional Agile team
- retrain, maintain, and monitor models in production
- leverage or build cloud-based architectures, technologies, and platforms to deliver optimized ML models at scale
- construct optimized data pipelines to feed ML models
- leverage continuous integration and continuous deployment best practices to ensure successful deployment
Job Requirements
- Bachelor’s Degree
- At least 6 years of experience designing and building data-intensive solutions using distributed computing (Internship experience does not apply)
- At least 4 years of experience programming with Python, Scala, or Java
- At least 2 years of experience building, scaling, and optimizing ML systems
- Master's or Doctoral Degree in computer science, electrical engineering, mathematics, or a similar field (preferred)
- 3+ years of experience building production-ready data pipelines that feed ML models (preferred)
- 3+ years of on-the-job experience with an industry recognized ML framework such as scikit-learn, PyTorch, Dask, Spark, or TensorFlow (preferred)
- 2+ years of experience developing performant, resilient, and maintainable code (preferred)
- 2+ years of experience with data gathering and preparation for ML models (preferred)
- 1+ years of experience leading teams developing ML solutions using industry best practices, patterns, and automation (preferred)
- Experience developing and deploying ML solutions in a public cloud such as AWS, Azure, or Google Cloud Platform (preferred)
- Experience designing, implementing, and scaling complex data pipelines for ML models and evaluating their performance (preferred)
- ML industry impact through conference presentations, papers, blog posts, open source contributions, or patents (preferred)
- Experience leveraging interactive AI tooling to accelerate productivity, utilizing capabilities beyond basic code completion (preferred)
Benefits
- comprehensive, competitive, and inclusive set of health, financial and other benefits that support total well-being
More Machine Learning Engineer Jobs
Senior Machine Learning Ops Engineer
Sheetz, Inc
Sheetz is committed to the full inclusion of all qualified individuals. Sheetz is committed to considering all applicants regardless of disability who can perform all essential job duties with or without accommodations.
Role Description
A Senior Machine Learning Ops Engineer at Sheetz ensures that AI models move seamlessly from “working on a laptop” to running reliably across our stores, applications, and systems at scale. This role powers capabilities like smarter inventory management, enhanced customer experiences, and faster decision-making that keeps pace with the way Sheetz operates. The MLOps Engineer designs, builds, and maintains the pipelines, deployment processes, and monitoring systems that allow models to run continuously and perform consistently. Just as Sheetz kitchens operate around the clock to serve customers, this role keeps our AI systems running 24/7, using data as the ingredients and algorithms as the recipes that drive our technology.
This role qualifies for a remote work arrangement within our 7-state footprint (PA, OH, MI, WV, VA, MD, NC).
Responsibilities
- Lead the end-to-end development and optimization of ML pipelines, including training, validation, deployment, monitoring, and retraining workflows at scale.
- Guide the use of and implement infrastructure for tools such as MLflow, TensorFlow, PyTorch, Docker, and Kubernetes to support scalable production workflows for model deployment and lifecycle management.
- Design and monitor tools for performance monitoring, drift detection, and automated alerting.
- Develop CI/CD pipelines to enable safe, rapid model iteration, deployment, and retraining across environments.
- Write, review, and maintain high-quality, production-ready code, ensuring robust, reproducible, and secure ML systems.
- Apply advanced software engineering and MLOps best practices to operationalize machine learning solutions efficiently and reliably.
- Collaborate with cross-functional teams to align ML solutions with business needs and system requirements and guide integration efforts to embed ML into production applications.
- Maintain thorough documentation, version control, metadata tracking, and lineage to support reproducibility and compliance of ML models.
- Recommend and implement improvements to ML infrastructure, frameworks, and operational standards, elevating the organization’s ML maturity and capabilities.
- Mentor and coach junior engineers, providing guidance on technical challenges, workflow design, and career development.
Qualifications
- Bachelor’s degree in Computer Science, Management Information Systems, Computer Engineering, or related discipline is required.
- Minimum 5 years of hands-on experience in designing, developing, and operationalizing machine learning solutions, with a strong focus on MLOps practices and infrastructure, is required.
- Previous experience working with large databases (both structured and unstructured) to build data pipelines and self-service dashboards for business users required.
- Previous experience in managing machine learning pipelines, lifecycle management, and deployment at scale, including training, validation, serving, and monitoring, required.
- Previous experience with CI/CD pipelines for ML workflows and containerization tools such as Docker and Kubernetes preferred.
- Previous experience with secure and scalable cloud environments (e.g., AWS, GCP, Azure) and infrastructure-as-code and platform-as-a-service (PaaS) offerings preferred.
- Cloud Platforms (AWS, GCP, Azure) preferred.
- MLOps tools and frameworks (e.g., MLflow, Kubeflow, TFX) preferred.
- DevOps certifications (e.g., Docker, Kubernetes, Terraform, CI/CD Tools) preferred.
Machine Learning Scientist, Multimodal AI
Natera
We are a global leader in cell-free DNA (cfDNA) testing, dedicated to oncology, women’s health, and organ health.
- Design, implement, and evaluate deep learning models across biomedical data modalities
- Develop multimodal AI architectures integrating H&E whole-slide imaging data with molecular and clinical data sources
- Build scalable, production-quality ML workflows and pipelines using cloud infrastructure (AWS)
- Apply modern ML techniques including CNNs, vision transformers (ViTs), sequence transformers, representation learning, and foundation model fine-tuning
- Collaborate with technical and clinical teams to translate ML prototypes into validated tools
- Analyze model outputs to generate reproducible biological and clinical insights
- Document pipelines thoroughly and communicate data-driven findings to stakeholders
Senior MLOps Engineer
Guild
At Guild, we unlock opportunity for America’s workforce through education, skilling, and career mobility.
Role Description
Guild is seeking a Senior MLOps Engineer. As a Senior MLOps Engineer, you'll be pivotal in designing and implementing infrastructure and tooling that allows teams to efficiently develop, deploy, and iterate on machine learning models and AI agents. Your contributions will enable rapid innovation, consistent reliability, and effective scaling of Guild's AI capabilities. This role will be pivotal in establishing Guild’s ML/AI platform.
Qualifications
- 5–7 years of experience in MLOps, DevOps, software engineering, or related fields.
- Strong experience in building and maintaining scalable machine learning infrastructure and pipelines.
- Expertise with cloud platforms (AWS, Azure, or GCP), particularly in managed AI/ML services.
- Proficiency with containerization (Docker, Kubernetes) and orchestration tools.
- Experience in MCP (Model Context Protocol); any specific experience with Databricks MCP or AWS MCP is a plus.
- Experience in model deployment frameworks and serving infrastructure (TensorFlow Serving, TorchServe, FastAPI, etc.).
- Skilled in infrastructure-as-code tools like Terraform and familiarity with CI/CD automation (GitHub Actions, Jenkins).
- Deep understanding of ML lifecycle management, monitoring, version control, and experiment tracking tools (e.g., MLflow, Kubeflow, Weights & Biases).
- Strong coding skills, especially in Python, and familiarity with software engineering best practices.
- Knowledge of monitoring, logging, and alerting systems for ML models in production.
Requirements
- Design, implement, and maintain platforms for seamless deployment, management, and monitoring of ML models and AI agents.
- Develop and optimize CI/CD pipelines tailored specifically for AI and machine learning workflows.
- Collaborate closely with data scientists, software engineers, and product teams to streamline ML model productionization.
- Ensure infrastructure is scalable, secure, and adheres to best practices in reliability and observability.
- Provide technical leadership in adopting best practices for model governance, versioning, testing, and validation.
- Continuously improve platform performance, efficiency, and ease-of-use to accelerate development cycles.
- Mentor team members on MLOps standards, practices, and emerging technologies.
Benefits
- Access to low-cost, high-quality health care options through Collective Health and Kaiser (due to coverage limitations, Kaiser is currently only available in CA & CO).
- Access to a 401k to help save for the future.
- Vacation policy to rest and recharge.
- 8 days of fully-paid sick leave, to take the time to heal and/or recover.
- Family-friendly benefits, including 12 weeks of parental leave for non-birthing parents and 18-20 weeks for birthing parents; 2-week ramp-up period for when employees return from a leave of 6 weeks or more; as well as employer-paid short-term and long-term disability, employer-sponsored life insurance, fertility and caregiving benefits.
- Well-rounded wellness benefits including free and low-cost mental health resources and financial wellbeing support services.
- Education benefits and tuition assistance to help your future development and growth.
Senior ML Engineer, LLMs, AWS
Provectus
We help businesses leverage cloud, data, and AI to reimagine the way they operate, compete, and deliver customer value.
- Create ML models from scratch or improve existing models.
- Collaborate with the engineering team, data scientists, and product managers on production models.
- Develop experimentation roadmap.
- Set up a reproducible experimentation environment and maintain experimentation pipelines.
- Monitor and maintain ML models in production to ensure optimal performance.
- Write clear and comprehensive documentation for ML models, processes, and pipelines.
- Stay updated with the latest developments in ML and AI and propose innovative solutions.



