Job Closed

This listing is no longer active.

We are a technology company that seeks to drive and enable digital commerce in Latin America.

Machine Learning Engineer

Full Time, Remote, Senior, Team 201-500, H1B No Sponsor

Location

Colombia

Posted

99 days ago

Salary

Seniority

Senior

4 yrs exp, English

Airflow AWS gRPC Kubernetes NumPy Pandas PostgreSQL Python PyTorch scikit-learn Apache Spark SQL TensorFlow

Job Description

• Scale Addi’s competitive advantage by building a world-class ML Ops foundation that accelerates the transition from model prototype to production, while ensuring our AI systems from credit scoring to generative agents are resilient, cost-efficient, and seamlessly integrated into our core financial product.
• Ensure ML/AI systems can be served reliably in production, maintaining strong operational excellence for availability, latency, and incident response, in partnership with the Data Scientist role for model/agent logic and iteration.
• Build and maintain the serving and integration layer for ML/AI solutions (APIs, connectors, asynchronous execution patterns), enabling seamless integration with internal systems and Ops tooling.
• Establish clear mechanisms for monitoring and reliability of ML/AI systems in production (dashboards, alerts, core KPIs, regression detection, and data/feature quality checks).
• Enable repeatable delivery for ML/AI services through strong engineering practices (CI/CD, testing, release strategies, rollback, and operational runbooks).
• Make contributions to our Architecture Decision Records repository by evaluating and proposing platform upgrades for ML/AI systems (e.g., feature serving patterns, workflow orchestration, scalable storage) to improve reliability, scalability, and reuse.

Job Requirements

Proven experience in architecting and serving production-grade ML systems
• 4–7 years of experience in software engineering, with at least 3 years focused specifically on ML Ops or Data Engineering in a production environment
• Demonstrates the ability to design high-availability serving layers using APIs (FastAPI, gRPC) and asynchronous execution patterns to handle high-concurrency fintech workloads
• Possesses a deep understanding of the "handshake" between data science and engineering, ensuring models are packaged, versioned, and integrated into internal systems without friction
• Expert-level knowledge of AWS (or similar), Kubernetes, Airflow/Prefect, and Databricks/Spark
• Track record of implementing request batching and model quantization to balance high-performance throughput with infrastructure costs

Possesses strong technical fluency in the Python and Data ecosystem
• Exhibits advanced Python engineering skills, moving beyond simple scripting to build modular, testable, and maintainable codebases
• Expert-level knowledge of core ML libraries (NumPy, Pandas, scikit-learn) and at least one deep learning framework (PyTorch or TensorFlow)
• Solid expertise in data-intensive stacks like Spark or Databricks and the ability to write complex, optimized SQL for feature extraction and data validation

Experienced in establishing mission-critical observability and reliability
• Has a demonstrated ability to build comprehensive monitoring suites (logs, metrics, traces) that detect not just system downtime, but ML-specific failures like data drift or feature quality regressions
• Track record of leading incident response and post-mortems, with a focus on reducing Mean Time to Recovery (MTTR) for model-related production issues
• Proven ability to implement automated alerting and regression detection that prevents degraded models from impacting the end-customer experience

Demonstrates a mastery of ML orchestration and engineering best practices
• Proven experience in building repeatable CI/CD pipelines for ML (MLOps), including automated testing, canary releases, and seamless rollback strategies
• Has solid expertise in workflow orchestration tools (e.g., Airflow, Prefect) and storage patterns (Postgres, Vector DBs) required for complex ML lifecycles
• Experienced in contributing to Architecture Decision Records (ADRs) to standardize feature serving patterns and scalable storage across the engineering org

Track record of building and scaling AI Agentic systems
• Possesses practical experience with the components of modern AI agents, including RAG (Retrieval-Augmented Generation), orchestration frameworks (LangChain/LlamaIndex), and guardrail implementation
• Demonstrates an understanding of the unique operational challenges of LLMs, such as token cost management, prompt versioning, and latency optimization
• Experienced in evaluating and integrating graph-based architectures or graph databases when required for complex data relationship mapping

Exhibits exceptional cross-functional communication and ownership
• Proven ability to translate highly technical infrastructure bottlenecks into clear business risks or opportunities for non-technical stakeholders
• Demonstrates an "Ownership Mentality" by taking end-to-end responsibility for the reliability of the ML platform, from the initial architectural proposal to 2:00 AM incident resolution
• Varies communication style effectively to mentor Data Scientists on engineering best practices while collaborating with Product Managers on roadmap feasibility

Benefits

Work on a problem that truly matters
Be part of something big from the ground up
Unparalleled growth opportunity
Join a world-class team
Competitive compensation & meaningful ownership


More Machine Learning Engineer Jobs

AI/ML Engineer

DOHE

Championing EdTech Startups to improve society

Machine Learning Engineer, posted 99 days ago

Full Time, Remote, Team 11-50, Since 2023, H1B No Sponsor

• Build the end-to-end pipeline: recording → transcription → summarization → coaching report
• Auto-generate Pre-Diagnosis Reports from survey responses and startup data
• Design expert matching algorithms using startup profiles and diagnosis results
• Architect and implement Multi-Agent systems
• Fine-tune LLMs and train domain-specific models
• Design prompts, build evaluation frameworks, manage model versions, and run A/B tests

Python

South Korea

Job Closed

Principal ML Engineer

Agero, Inc.

Agero is a leading provider of driver assistance, accident management, consumer affairs support and connected vehicle services for stakeholders across the automotive industry, including the world’s largest automakers, auto retailers, insurers, rideshare providers and other brands. As the driving force behind mobility support throughout all points in the vehicle ownership journey - from purchase to maintenance and breakdown to resell or trade in - we deliver a suite of powerful, innovative services and technology solutions that enable our 100+ clients to provide their drivers with enhanced communication, safety, and convenience for whatever their vehicle need.

Machine Learning Engineer, posted 99 days ago

Other, Remote, Team 1,001-5,000, Since 1994, H1B No Sponsor

• Architect & ship: Design end-to-end Python services (batch + streaming) that ingest model outputs, run constrained optimization, and surface real-time dispatch decisions.
• Model & simulate: Build/extend ML models (gradient-boosting, deep learning, OR-Tools) and run time-horizon simulations to quantify cost vs. service-level trade-offs.
• Operationalize: Automate training, validation, A/B rollout, and monitoring (SageMaker / Airflow).
• Lead & collaborate: Partner with Product, Ops, and Data Engineering; mentor a small squad of ML engineers; present findings to execs.
• Continuously improve: Instrument NPS / cost telemetry, identify failure modes, and iterate.

Airflow AWS Azure GCP Python PyTorch SQL

Arizona + 12 more

$150K - $200K / year

AI/ML Engineer

NMDP

We save lives through cell therapy.

Machine Learning Engineer, posted 99 days ago

Other, Remote, Team 1,001-5,000, Since 1987, H1B No Sponsor

• Work across diverse GenAI platforms like AWS, Salesforce, Oracle, Snowflake, MS Copilot, and other 3rd party GenAI platforms and libraries.
• Automate workflows involving extraction of complex, multimodal unstructured content from a variety of sources into highly accurate and reliable structured content using platforms like AWS Textract and Bedrock
• Design and build MCP hosts, clients and servers
• Establish and use frameworks for automated LLM testing
• Create regression test suites to detect drift or prompt breakage
• Integrate with internal and external web services using secure authentication and authorization mechanisms
• Adopt and ensure safe practices to protect against prompt injections and jailbreaks, and conform to enterprise security guidelines
• Design, develop, and deploy production-grade traditional ML models (e.g., regression, classification, clustering, recommender systems) for a variety of business use cases.
• Design, maintain, and optimize end-to-end AI/ML pipelines including data ingestion, training, evaluation, deployment, and monitoring on cloud infrastructure (e.g., AWS or equivalent)
• Ensure AI/ML solutions are scalable, reliable, secure, and cost-effective within cloud environments
• Create reusable components, frameworks, and best practices to accelerate AI development
• Partner with data scientists, architects, product managers, business stakeholders and technical teams across the organization to align AI solutions with organizational goals.
• Provide hands-on technical support and mentorship to technical teams across the enterprise.

Airflow AWS Azure Docker Oracle Database Python PyTorch scikit-learn SQL TensorFlow

United States

Job Closed

Head of Machine Learning

Pragmatike

Remote first tech projects

Machine Learning Engineer, posted 99 days ago

Other, Remote, Team 11-50, Since 2022, H1B No Sponsor

• Define the company's ML strategy: where ML should be applied across products, what infrastructure is required, and how to approach build vs. buy decisions.
• Design and build production ML systems end-to-end, including data pipelines, model training workflows, evaluation frameworks, and inference serving.
• Establish rigorous evaluation methodology to measure model quality, detect regressions, and support data-driven iteration.
• Own the data strategy: determine what data is needed, how it should be labeled, how feedback loops are structured, and how models continuously improve.
• Partner closely with product and backend engineers to integrate ML into customer-facing systems.
• Write production-quality code within the existing codebase and contribute to architectural decisions.
• Over time, help recruit, mentor, and lead the ML team as the function expands.

Java Python PyTorch Apache Spark TensorFlow TypeScript

New York + 1 more

$300K - $400K / year

Job Closed
