Serverless AI Inference - run any model, at any scale, without managing GPUs

Machine Learning Engineer – Inference Optimization

Machine Learning Engineer, Full Time, Remote, Senior, Team 1-10, Since 2023, H1B No Sponsor

Location

Worldwide

Posted

142 days ago

Seniority

Senior

English, Distributed Systems, PyTorch

Job Description

• Optimize inference latency, throughput, and cost for large-scale ML models in production
• Profile GPU/CPU inference pipelines and identify bottlenecks (memory, kernels, batching, I/O)
• Implement and tune techniques such as:
    • Quantization (fp16, bf16, int8, fp8)
    • KV-cache optimization and reuse
    • Speculative decoding, batching, and streaming
    • Model pruning or architectural simplifications for inference
• Collaborate with research engineers to productionize new model architectures
• Build and maintain inference-serving systems (e.g. Triton, custom runtimes, or bespoke stacks)
• Benchmark performance across hardware (NVIDIA / AMD GPUs, CPUs) and cloud setups
• Improve system reliability, observability, and cost efficiency under real workloads

Job Requirements

Strong experience in ML inference optimization or high-performance ML systems
Solid understanding of deep learning internals (attention, memory layout, compute graphs)
Hands-on experience with PyTorch (or similar) and model deployment
Familiarity with GPU performance tuning (CUDA, ROCm, Triton, or kernel-level optimizations)
Experience scaling inference for real users (not just research benchmarks)
Comfortable working in fast-moving startup environments with ownership and ambiguity
Experience with LLM or long-context model inference
Knowledge of inference frameworks (TensorRT, ONNX Runtime, vLLM, Triton)
Experience optimizing across different hardware vendors
Open-source contributions in ML systems or inference tooling
Background in distributed systems or low-latency services

Benefits

Competitive compensation + meaningful equity at Series A

More Machine Learning Engineer Jobs

Machine Learning Engineer – Platform

Artera.net

Artera is a Swiss ISP that provides premium hosting and cloud services.

Machine Learning Engineer, 142 days ago

Full Time, Remote, Team 11-50, Since 2002, H1B No Sponsor

• Work on the AI Platform team focusing on scalable and efficient pipelines for model training, evaluation, and data processing
• Build and evolve core libraries used by AI scientists to develop, launch, and monitor AI products
• Optimize GPU and CPU efficiency and data throughput of large-scale foundation models
• Ensure Artera’s observability infrastructure provides a clear picture of model performance optimization

AWS Docker Kubernetes Node.js Python PyTorch Ray TensorFlow Terraform

California

$140K - $180K / year

Senior Machine Learning Engineer

A3Data

Machine Learning Engineer, 142 days ago

Full Time, Remote, Team 51-200, H1B No Sponsor

• Define and implement scalable, reproducible, monitorable, production-ready Machine Learning architectures.
• Develop, evolve, and maintain production Machine Learning pipelines and services, ensuring reliability and performance.
• Deploy highly available models and pipelines with a focus on MLOps, CI/CD, and automation.
• Collaborate with data scientists, data engineers, developers, and business stakeholders.
• Diagnose and resolve complex issues related to models and pipelines in production.
• Lead technical discussions and workshops, and support architectural decisions with teams and clients.
• Contribute to raising the client's and A3Data's technical maturity by promoting best practices.

AWS Azure Docker GCP Kubernetes Python PyTorch scikit-learn TensorFlow

Brazil

Machine Learning Engineer – Mid-level

A3Data

Machine Learning Engineer, 142 days ago

Full Time, Remote, Team 51-200, H1B No Sponsor

• Develop, train, and improve Machine Learning models, ensuring reproducibility, scalability, and production monitoring;
• Implement and manage the model lifecycle, with versioning for code, data, metrics, and artifacts, following MLOps best practices;
• Package models as scalable, highly available services integrated into automated pipelines;
• Support and continuously improve ML solutions in production, identifying and fixing issues;
• Collaborate with Data Engineering, Data Science, and business teams in a multidisciplinary environment;
• Perform code reviews and support the technical development of more junior engineers;
• Participate in technical discussions with clients, explaining solutions, architectural decisions, and trade-offs.

Airflow PySpark Python PyTorch scikit-learn TensorFlow

Brazil

Senior Machine Learning Scientist

Matterworks

AI-Powered Tools to Engineer Biology

Machine Learning Engineer, 143 days ago

Other, Remote, Team 11-50, H1B No Sponsor

• Design, adapt, and optimize deep learning architectures for scientific domains and data modalities.
• Own and deliver on complex ML projects, including experiment design, implementation, evaluation, and iteration based on results.
• Write clean, well-tested code in PyTorch and NumPy enabling a high experimentation rate.
• Stay current with deep learning research and its applications in chemistry and biology.
• Propose and prototype new ideas to enhance our modeling capabilities.
• Work closely with scientists and engineers across the team to integrate models into our product and infrastructure.

NumPy Python PyTorch

United States

Job Closed
