Terra

Terra is the Next Generation Claims and Policy Software for Workers’ Comp.

Staff Machine Learning Engineer

Machine Learning Engineer | Full Time | Remote | Lead | Team 51-200 | H1B Sponsor

Location

United States

Posted

63 days ago

Salary

0

Seniority

Lead

Bachelor Degree | English | PyTorch

Job Description

  • Design, train, test, and iterate on diffusion models for 3D geological models
  • Design, train, test, and iterate on an approach for conditioning generation on geophysical data and other observations
  • Inform the generation of synthetic data to improve model performance
  • Adapt the diffusion modeling approach to specific real-world projects in collaboration with project teams

Job Requirements

  • Extensive PyTorch experience: deep understanding of PyTorch, including writing custom modules, optimizing training, and debugging issues in large-scale models.
  • Expertise in developing large deep learning models from scratch: proven ability to design, implement, and train complex deep learning architectures from the ground up.
  • Data curation skills: hands-on experience in creating, cleaning, and maintaining high-quality datasets tailored for machine learning applications.
  • Strong software engineering and design experience: proficiency in software development best practices, including version control, testing, and code optimization, plus familiarity with designing scalable and maintainable systems.

Bonus points if you have:

  • Experience with generative models: familiarity with generative architectures, particularly diffusion models, with an emphasis on posterior sampling methods.
  • Knowledge of transformer architectures: experience building and training transformers, especially in applications involving 3D data.
  • Experience scaling models across large GPU clusters: expertise in parallelizing models across multiple GPUs and optimizing distributed training pipelines.
  • Cloud infrastructure expertise: experience setting up, managing, and optimizing cloud environments for machine learning workloads, including provisioning resources and managing costs.

Benefits

  • Health insurance
  • Flexible work arrangements
  • Professional development opportunities

Related Job Pages

More Machine Learning Engineer Jobs

Lead Machine Learning Engineer

Serve Robotics

Meet the future of sustainable, self-driving delivery.

Full Time | Remote | Team 51-200 | Since 2017 | H1B Sponsor

  • Design and maintain training systems that can process and learn from petabyte-scale multimodal datasets (e.g., video and point cloud data). This includes ensuring data is efficiently loaded, distributed, and processed across large GPU clusters.
  • Identify and resolve bottlenecks in the training pipeline, including data loading, preprocessing, model computation, and inter-node communication, to maximize GPU utilization and reduce training time.
  • Work with the ML team to develop and refine neural network architectures suitable for autonomy tasks, particularly those handling high-dimensional and sequential sensor data.
  • Create and adjust loss functions and training strategies that help the model learn effectively from complex multimodal inputs and improve autonomy performance.
  • Configure, monitor, and maintain large-scale distributed training jobs across multiple machines and GPUs, ensuring stability, fault tolerance, and efficient resource usage.
  • Implement scalable systems to preprocess, transform, and augment large robotics datasets so that they are suitable for model training.
  • Work closely with ML scientists and other engineers to integrate new models, experiments, and training approaches into the production training pipeline.
  • Analyze training metrics, model outputs, and experiment logs to assess model performance and guide improvements in architecture, data usage, or training strategies.
  • Develop tools and workflows that allow teams to run experiments, track results, and iterate quickly on new model ideas or training approaches.

Canada
$177K - $215K / year
Full Time | Remote | Team 1,001-5,000 | H1B No Sponsor

  • Lead full lifecycle AI/ML system development in Palantir Foundry, including data pipeline integration, feature engineering, model implementation, deployment, and production monitoring
  • Design, develop, and optimize machine learning models and algorithms for performance, scalability, and reliability
  • Collaborate with data scientists and engineers to transition models from research and experimentation into production systems
  • Build and maintain model deployment, versioning, and monitoring workflows
  • Integrate AI/ML solutions into existing platforms and business processes
  • Evaluate and apply emerging AI/ML technologies where they provide clear business value
  • Ensure robustness, maintainability, and performance of deployed AI/ML systems in operational environments

Maryland
$170K - $210K / year
Job Closed

Machine Learning Engineer

Fusion Risk Management

Fusion Risk Management is recognized as the most innovative and fastest growing provider of cloud-based enterprise software for business continuity risk management, IT disaster recovery and crisis management. Fusion is transforming the industry and has been named a leader in Gartner's Magic Quadrant for Business Continuity Management software.

Full Time | Remote | Team 258 | Since 2006

The Role

We're looking for a product-minded Machine Learning Engineer to pioneer the engineering of intelligent resilience systems at Fusion. This role will focus on designing, building, deploying, and operating production-grade machine learning systems, including reinforcement learning and optimization-driven intelligence, to power the next generation of resilience capabilities.

You will architect and deliver scalable ML systems that unify resilience data from some of the world's largest and most systemically important organizations. This includes building robust model pipelines, integrating simulation and optimization engines into production services, and establishing strong ML Ops and AI Ops practices to ensure reliability, performance, and governance at scale.

This is a high-ownership role for someone who thrives at the intersection of software engineering and machine learning: someone who wants to build durable AI/ML infrastructure, ship intelligent product features, and solve complex real-world operational resilience challenges.

Key Responsibilities

  • Design, build, deploy, and maintain production machine learning systems, including reinforcement learning components and intelligent optimization-driven features.
  • Architect scalable ML pipelines for training, validation, deployment, monitoring, and automated retraining.
  • Maintain and expand operations for simulation (Monte Carlo, Bayesian Networks) and optimization engines (linear, constraint, CP-SAT) for continued reliable service.
  • Own ML Ops and AI Ops practices, including CI/CD for models, automated testing, model validation, performance monitoring, drift detection, observability, and governance frameworks.
  • Refactor and harden existing AI systems to improve scalability, latency, cost efficiency, and fault tolerance.
  • Build and maintain data pipelines and feature engineering workflows that support reliable and reproducible model training.
  • Collaborate closely with product and engineering teams to translate resilience use cases into scalable, maintainable ML-powered product capabilities.
  • Contribute to the design of Fusion's ML architecture, infrastructure standards, and long-term intelligent systems roadmap.

Knowledge, Skills, and Abilities

  • Strong software engineering foundation with hands-on experience building and deploying machine learning systems in production environments.
  • Experience designing ML architectures, APIs, and services that integrate with enterprise SaaS platforms.
  • Deep understanding of model lifecycle management: experimentation, validation, deployment, monitoring, retraining, and versioning.
  • Experience with reinforcement learning, decision systems, simulation modeling, or optimization techniques.
  • Strong experience building scalable data and feature pipelines using cloud-native tools (e.g., Azure, Snowflake, dbt, Salesforce integrations, or similar platforms).
  • Proficiency in writing clean, maintainable, well-tested code with version control, CI/CD, and observability best practices.
  • Familiarity with containerization and distributed systems (Docker, Kubernetes, serverless architectures, or similar).
  • Ability to design modular, extensible ML systems that evolve alongside product requirements.
  • Strong communication skills and the ability to explain system behavior, tradeoffs, and architectural decisions to technical and non-technical stakeholders.

Qualifications (Education and Experience)

  • Bachelor's or Master's degree in Computer Science, Machine Learning, Artificial Intelligence, Engineering, or a related field.
  • 3+ years of experience building, deploying, and operating machine learning systems in production environments.
  • Experience with reinforcement learning, decision intelligence systems, or control systems (strongly preferred).
  • Experience with simulation, optimization, constraint programming, or operations research techniques (preferred).
  • Experience building ML pipelines in cloud environments (Azure preferred).
  • Experience implementing ML Ops tooling for testing, validation, monitoring, retraining, and governance (preferred).
  • Experience deploying AI-powered systems within enterprise SaaS environments (nice to have).

Milestones for the First Six Months

In One Month, You Will:

  • Complete onboarding and gain familiarity with Fusion's resilience domain, existing product line, simulation and optimization engines.
  • Contribute code to existing ML systems and participate in production improvements.
  • Review and assess current ML pipeline and deployment practices.

In Three Months, You Will:

  • Design and deploy at least one production-ready ML component or reinforcement learning module.
  • Improve reliability, performance, or scalability of existing intelligent systems.
  • Implement monitoring, validation, and automated testing for one production AI/ML system.

In Six Months, You Will:

  • Own and deliver a production-grade intelligent capability (e.g., adaptive optimization engine, reinforcement-driven decision module, or production-trained GPT workflow).
  • Establish baseline ML Ops standards for model deployment, monitoring, retraining, and governance.
  • Lead architectural improvements to Fusion's ML infrastructure.
  • Propose and prototype new ML-driven product capabilities that extend Fusion's resilience intelligence platform.

Compensation & Benefits

The annual base salary range for this position is $135,000-$155,000, depending on experience, qualifications, and relevant skill set. The position is also eligible for an annual bonus. Fusion offers a comprehensive benefits package including medical, dental, vision, and a 401(k) plan.

Disclaimers

Fusion is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, disability, age, pregnancy, military service or discharge status, genetic information, sex, sexual orientation, gender identity, or national origin. Nothing in this job posting should be construed as an offer or guarantee of employment.

United States
$135K - $155K / year

Lead Machine Learning Engineer

SharkNinja

SharkNinja is a global leader in the housewares and small appliances industry, providing innovative vacuum cleaners via the company’s Shark brand and serving

About Us

SharkNinja is a global product design and technology company, with a diversified portfolio of 5-star rated lifestyle solutions that positively impact people's lives in homes around the world. Powered by two trusted, global brands, Shark and Ninja, the company has a proven track record of bringing disruptive innovation to market, and developing one consumer product after another has allowed SharkNinja to enter multiple product categories, driving significant growth and market share gains. Headquartered in Needham, Massachusetts with more than 4,100 associates, the company's products are sold at key retailers, online and offline, and through distributors around the world.

The Lead Machine Learning Engineer will play a critical hands-on role in designing, building, and deploying machine learning solutions that power SharkNinja's next generation of consumer products, digital experiences, and operational capabilities. This role combines deep technical execution with technical leadership, mentoring, and cross-functional collaboration. You will lead end-to-end ML initiatives, from problem framing and data exploration through production deployment, while helping to establish best practices for scalable, reliable ML at SharkNinja.

Key Responsibilities

  • Lead the design, development, and deployment of production-grade machine learning models across consumer, digital, and operational use cases.
  • Partner with Product, Engineering, and Business stakeholders to translate real-world problems into ML-driven solutions with measurable impact.
  • Own end-to-end ML workflows, including data preparation, feature engineering, model training, evaluation, deployment, and monitoring.
  • Build and maintain scalable ML pipelines and services that integrate seamlessly with SharkNinja's data and software platforms.
  • Provide technical leadership and mentorship to ML engineers and data scientists, raising the bar on model quality, reliability, and performance.
  • Drive best practices in model versioning, experimentation, monitoring, and lifecycle management.
  • Collaborate with Data Engineering and Platform teams to ensure ML solutions are secure, performant, and production-ready.
  • Contribute to technical design reviews and influence architectural decisions related to ML systems.
  • Stay current with emerging ML techniques and technologies, applying them thoughtfully to SharkNinja use cases.

Qualifications

Must-Haves

  • Bachelor's or Master's degree in Computer Science, Engineering, Data Science, or a related field.
  • 6+ years of experience building and deploying machine learning models in production environments.
  • Strong experience with Python and ML frameworks such as TensorFlow, PyTorch, or scikit-learn.
  • Solid understanding of supervised and unsupervised learning, model evaluation, and feature engineering.
  • Experience deploying ML models using cloud-based platforms and services (AWS, GCP, or Azure).
  • Familiarity with MLOps practices, including CI/CD for ML, model monitoring, and retraining strategies.
  • Ability to clearly communicate complex technical concepts to both technical and non-technical partners.
  • Proven ability to work in fast-paced, cross-functional environments with a "progress over perfection" mindset.

Nice-to-Haves

  • Experience with real-time or embedded ML systems.
  • Exposure to recommendation systems, personalization, forecasting, or computer vision.
  • Experience working in consumer products, IoT, or e-commerce environments.
  • Familiarity with data orchestration tools (e.g., Airflow) and ML platforms (e.g., SageMaker, Databricks).

Salary and Other Compensation

The annual salary range for this position is displayed below. Factors which may affect starting pay within this range may include geography/market, skills, education, experience and other qualifications of the successful candidate.

The Company offers the following benefits for this position, subject to applicable eligibility requirements: medical insurance, dental insurance, vision insurance, flexible spending accounts, health savings accounts (HSA) with company contribution, 401(k) retirement plan with matching, employee stock purchase program, life insurance, AD&D, short-term disability insurance, long-term disability insurance, generous paid time off, company holidays, parental leave, identity theft protection, pet insurance, pre-paid legal insurance, back-up child and eldercare days, product discounts, referral bonus program, and more.

Pay Range: $175,500-$214,500 USD

Our Culture

At SharkNinja, we don't just raise the bar; we push past it every single day. Our Outrageously Extraordinary mindset drives us to tackle the impossible, push boundaries, and deliver results that others only dream of. If you thrive on breaking out of your swim lane, you'll be right at home.

What We Offer

We offer competitive health insurance, retirement plans, paid time off, employee stock purchase options, wellness programs, SharkNinja product discounts, and more. We empower your personal and professional growth with high impact Learning Programs featuring bold voices redefining what's possible. When you join, you're not just part of a company; you're part of an outrageously extraordinary community. Together, we won't just launch products; we'll disrupt entire markets.

At SharkNinja, Diversity, Equity, and Inclusion are vital to our global success. Valuing each unique voice and blending all of our diverse skills strengthens SharkNinja's innovation every day. We support ALL associates in bringing their authentic selves to work, making an impact, and having the opportunity for career acceleration. With help from our leadership, associates, and our community, we aim to have equity be a key component of the SharkNinja DNA.

Learn more about us: Life At SharkNinja, Outrageously Extraordinary

SharkNinja Candidate Privacy Notice

  • For candidates based in all regions, please refer to this Candidate Privacy Notice.
  • For candidates based in China, please refer to this Candidate Privacy Notice.
  • For candidates based in Vietnam, please refer to this Candidate Privacy Notice.

We do not discriminate on the basis of race, religion, color, national origin, sex, gender, gender expression, sexual orientation, age, marital status, veteran status, disability, or any other class protected by legislation and local law. SharkNinja will consider reasonable accommodations consistent with legislation and local law. If you require a reasonable accommodation to participate in the job application or interview process, please contact SharkNinja People & Culture at accommodations@sharkninja.com

United States
$175.5K - $214.5K / year
Job Closed