Ikue

The world’s first customer data platform designed by telcos for telcos.

Machine Learning Engineer

Full Time, Remote, Senior, Team 11-50, H1B No Sponsor

Location

South Africa

Posted

63 days ago

Seniority

Senior

Bachelor Degree, 3 yrs exp, English, AWS, Cloud, Python, Spark, SQL

Job Description

• Design and construct Ikue's AI Studio in collaboration with Product owners and Data Scientists
• Design and build machine learning pipelines (model build, evaluation, deploy, monitoring)
• Integrate machine learning outputs into real-time and batch data pipelines
• Ensure machine learning and data pipelines are monitored, reliable and supportable (including expert support when required)

Job Requirements

BSc Computer Science or Engineering
3+ years working experience as a Machine Learning Engineer
Advanced skills developing in Python, Spark, SQL
Experience deploying and maintaining common machine learning models (e.g., binary classification, regression, clustering) in the cloud (AWS ECS and SageMaker preferable)
AWS Associate Developer certification (Machine Learning Speciality preferable)
Excellent problem-solving and analytical skills

Benefits

You will become part of an international environment that embraces diversity and professionalism
A dynamic and motivated team, with a good sense of humour
Freedom to take responsibility and grow within the team

More Machine Learning Engineer Jobs

Machine Learning Engineer – Modeling, Algorithms

TRACTIAN

Artificial Intelligence Quarterbacking Your Maintenance

Machine Learning Engineer, 63 days ago

Full Time, Remote, Team 51-200, H1B No Sponsor

• **Algorithm Development:** Design and train models to solve specific physical problems (e.g., machine uptime detection or production count prediction).
• **Signal Processing:** Apply statistical methods to raw time-series data to extract meaningful features and reduce noise.
• **Validation:** Define and monitor metrics (accuracy, recall, precision) to validate model performance on real-world data before and after deployment.
• **Model Serving:** Develop and maintain RESTful APIs (using frameworks like FastAPI) to expose your models for real-time inference.
• **Production Standards:** Write clean, modular, and testable Python code. You are expected to use version control, write unit tests, and follow software design patterns.
• **Performance Optimization:** Profile and optimize model inference code to ensure low latency and efficient resource usage.

NumPy, Pandas, Python, PyTorch, Scikit-Learn, SQL, TensorFlow

Brazil

Machine Learning Engineer – ML Training Platform

Pluralis Research

Protocol Learning: Multi-participant, low-bandwidth model parallel.

Machine Learning Engineer, 63 days ago

Full Time, Remote, Team 1-10, H1B No Sponsor

• Architect, build, and scale the foundational infrastructure powering our decentralized ML training platform
• Design resource management systems provisioning and orchestrating compute across AWS, GCP, and Azure using infrastructure-as-code (Pulumi/Terraform)
• Handle dynamic scaling, state synchronization, and concurrent operations across hundreds of heterogeneous nodes
• Architect fault-tolerant infrastructure for distributed ML including GPU clusters, health monitoring, and resilient retry strategies
• Build systems that simulate and handle real-world network conditions

AWS, Azure, Cloud, Docker, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Python, Terraform

California

Machine Learning Engineer – Distributed ML Systems

Pluralis Research

Protocol Learning: Multi-participant, low-bandwidth model parallel.

Machine Learning Engineer, 63 days ago

Full Time, Remote, Team 1-10, H1B No Sponsor

• Design and implement large-scale distributed training systems optimized for heterogeneous hardware operating under low-bandwidth, high-latency conditions.
• Develop and optimize model-parallel training strategies (data, tensor, pipeline parallelism) with custom sharding techniques that minimize communication overhead.
• Optimize GPU utilization, memory efficiency, and compute performance across distributed nodes.
• Implement robust checkpointing, state synchronization, and recovery mechanisms for long-running, fault-prone training jobs.
• Build monitoring and metrics systems to track training progress, model quality, and system bottlenecks.
• Architect resilient training systems where nodes can fail, networks can partition, and participants can dynamically join or leave.
• Design and optimize peer-to-peer topologies for decentralized coordination across non-co-located nodes.
• Implement NAT traversal, peer discovery, dynamic routing, and connection lifecycle management.
• Profile and optimize communication patterns to reduce latency and bandwidth overhead in multi-participant environments.

Distributed Systems, gRPC, Python

United States

Machine Learning Engineer – Senior

A3Data

Machine Learning Engineer, 63 days ago

Full Time, Remote, Team 51-200, H1B No Sponsor

• Package and version machine learning models (MLflow, SageMaker Model Registry)
• Select and implement the appropriate AWS services (SageMaker, Lambda, ECS/EKS, API Gateway, among others)
• Build and maintain CI/CD pipelines, ensuring automated testing, build, and deployment
• Automate deployments across multiple environments (dev/staging/prod) with security and rollback
• Expose models for consumption by other services (via endpoints or Lambdas)
• Configure and track production monitoring (CloudWatch, logs, metrics)
• Collaborate with multidisciplinary teams to deliver efficient, secure, and scalable solutions.

AWS, Azure, Cloud, DynamoDB, Python

Brazil
