Serverless AI Inference - run any model, at any scale, without managing GPUs
Machine Learning Engineer – Multilingual Data
Location
Worldwide
Posted
142 days ago
Salary
Not disclosed
Seniority
Senior
Job Description
Featherless AI
- Design, build, and maintain large-scale multilingual datasets across high- and low-resource languages
- Develop data pipelines for collection, cleaning, normalization, deduplication, and labeling
- Implement quality filters using statistical, heuristic, and model-based methods
- Work with researchers to define language coverage, benchmarks, and evaluation metrics
- Analyze dataset bias, coverage gaps, and failure modes across regions and scripts
- Support training, fine-tuning, and distillation workflows with high-quality multilingual data
- Continuously iterate on datasets based on model performance and real-world usage
Job Requirements
- 3+ years of experience as an ML Engineer, Applied Scientist, or similar role
- Strong experience working with multilingual or non-English datasets
- Solid understanding of NLP fundamentals (tokenization, embeddings, language modeling)
- Experience building scalable data pipelines (Python, Spark, Ray, or similar)
- Familiarity with Unicode, scripts, tokenization challenges, and language-specific quirks
- Comfort collaborating with researchers and translating research needs into production systems
Benefits
- Competitive compensation + meaningful equity at Series A stage