Artera is a Swiss ISP that provides premium hosting and cloud services.
Machine Learning Engineer – Platform
Location
California
Posted
141 days ago
Salary
$140K - $180K / year
Seniority
Senior
Job Description
Artera.net
- Work on the AI Platform team focusing on scalable and efficient pipelines for model training, evaluation, and data processing
- Build and evolve core libraries used by AI scientists to develop, launch, and monitor AI products
- Optimize GPU and CPU efficiency and data throughput of large-scale foundation models
- Ensure Artera’s observability infrastructure provides a clear picture of model performance optimization
Job Requirements
- 5+ years of industry software engineering experience
- 4+ years of industry experience using one of PyTorch, TensorFlow, or JAX in Python
- 3+ years of industry experience building with AWS, Docker, and Kubernetes
- 1+ years of industry experience optimizing large-scale, high data-throughput, distributed machine learning training pipelines
- Experience using ML orchestration frameworks such as Flyte, Ray, Kubeflow, Metaflow, MLflow, Dagster, Argo Workflows, or Prefect
- Experience using Terraform and SQLAlchemy
- Experience with multi-node and multi-GPU training
- Experience deploying and maintaining infrastructure for machine learning training and production inference
- Familiarity with TorchScript, ONNX Runtime, DeepSpeed, AWS Neuron, or similar approaches to inference optimization
Benefits
- 401(k) matching
- Unlimited paid time off (PTO)



