TRACTIAN

Artificial Intelligence Quarterbacking Your Maintenance

Machine Learning Engineer – Modeling, Algorithms

Machine Learning Engineer | Full Time | Remote | Senior | Team 51-200 | H1B No Sponsor

Location

Brazil

Posted

63 days ago

Salary

0

Seniority

Senior

Job Description

  • **Algorithm Development:** Design and train models to solve specific physical problems (e.g., machine uptime detection or production count prediction).
  • **Signal Processing:** Apply statistical methods to raw time-series data to extract meaningful features and reduce noise.
  • **Validation:** Define and monitor metrics (accuracy, recall, precision) to validate model performance on real-world data before and after deployment.
  • **Model Serving:** Develop and maintain RESTful APIs (using frameworks like FastAPI) to expose your models for real-time inference.
  • **Production Standards:** Write clean, modular, and testable Python code. You are expected to use version control, write unit tests, and follow software design patterns.
  • **Performance Optimization:** Profile and optimize model inference code to ensure low latency and efficient resource usage.

Job Requirements

  • **Education:** Bachelor’s degree in Computer Science, Mathematics, Physics, Statistics, or Engineering.
  • **Modeling Core:** Strong grasp of probability, statistics, and linear algebra. Practical experience with Time-Series Analysis and Signal Processing.
  • **Python Proficiency:** Advanced knowledge of Python and the data science stack (Pandas, NumPy, Scikit-Learn, PyTorch/TensorFlow).
  • **Software Engineering:** Experience writing production-grade software. You must be comfortable with Object-Oriented Programming (OOP) and writing API endpoints.
  • **Testing:** Habitual use of testing frameworks (e.g., PyTest) to ensure algorithmic stability.
  • **Data Handling:** Proficiency with SQL for data querying and analysis.

Related Job Pages

More Machine Learning Engineer Jobs

Machine Learning Engineer – ML Training Platform

Pluralis Research

Protocol Learning: Multi-participant, low-bandwidth model parallel.

Full Time | Remote | Team 1-10 | H1B No Sponsor

  • Architect, build, and scale the foundational infrastructure powering our decentralized ML training platform
  • Design resource management systems provisioning and orchestrating compute across AWS, GCP, and Azure using infrastructure-as-code (Pulumi/Terraform)
  • Handle dynamic scaling, state synchronization, and concurrent operations across hundreds of heterogeneous nodes
  • Architect fault-tolerant infrastructure for distributed ML including GPU clusters, health monitoring, and resilient retry strategies
  • Build systems that simulate and handle real-world network conditions

California

Machine Learning Engineer – Distributed ML Systems

Pluralis Research

Protocol Learning: Multi-participant, low-bandwidth model parallel.

Full Time | Remote | Team 1-10 | H1B No Sponsor

  • Design and implement large-scale distributed training systems optimized for heterogeneous hardware operating under low-bandwidth, high-latency conditions.
  • Develop and optimize model-parallel training strategies (data, tensor, pipeline parallelism) with custom sharding techniques that minimize communication overhead.
  • Optimize GPU utilization, memory efficiency, and compute performance across distributed nodes.
  • Implement robust checkpointing, state synchronization, and recovery mechanisms for long-running, fault-prone training jobs.
  • Build monitoring and metrics systems to track training progress, model quality, and system bottlenecks.
  • Architect resilient training systems where nodes can fail, networks can partition, and participants can dynamically join or leave.
  • Design and optimize peer-to-peer topologies for decentralized coordination across non-co-located nodes.
  • Implement NAT traversal, peer discovery, dynamic routing, and connection lifecycle management.
  • Profile and optimize communication patterns to reduce latency and bandwidth overhead in multi-participant environments.

United States
Full Time | Remote | Team 51-200 | H1B No Sponsor

  • Package and version machine learning models (MLflow, SageMaker Model Registry)
  • Define and implement suitable AWS services (SageMaker, Lambda, ECS/EKS, API Gateway, among others)
  • Build and maintain CI/CD pipelines, ensuring automated testing, build, and deployment
  • Automate deployments across multiple environments (dev/staging/prod) with security and rollback
  • Expose models for consumption by other services (via endpoints or Lambdas)
  • Configure and track production monitoring (CloudWatch, logs, metrics)
  • Collaborate with multidisciplinary teams to ensure efficient, secure, and scalable solutions

Brazil
Full Time | Remote | Team 51-200 | H1B No Sponsor

  • Develop, program, and test machine learning systems
  • Design architectures for reproducible, scalable, and monitored Machine Learning solutions
  • Assist the data science team in designing and building model deployment pipelines
  • Build data engineering processes/pipelines within the data environment
  • Define architecture standards, procedures, and tooling
  • Define data layers and their intended usage
  • Implement, fine-tune, and monitor LLM models (e.g., Bedrock, OpenSearch, HuggingFace)
  • Build inference pipelines for RAG-based responses (Retrieval-Augmented Generation)
  • Ensure versioning and reuse of trained models and embeddings
  • Work on continuous improvement of response relevance and source ranking
  • Work on reducing token costs using optimization techniques
  • Support integrations with AI APIs, vector stores, and document repositories
  • Communicate information clearly with technical and business team members
  • Help identify potential errors and report inconsistencies

Brazil