Job Closed
This listing is no longer active.
We are a technology company that seeks to drive and enable digital commerce in Latin America.
Machine Learning Engineer
Location
Colombia
Posted
99 days ago
Salary
0
Seniority
Senior
Job Description
Addi
• Scale Addi’s competitive advantage by building a world-class ML Ops foundation that accelerates the transition from model prototype to production, while ensuring our AI systems from credit scoring to generative agents are resilient, cost-efficient, and seamlessly integrated into our core financial product.
• Ensure ML/AI systems can be served reliably in production, maintaining strong operational excellence for availability, latency, and incident response, in partnership with the Data Scientist role for model/agent logic and iteration.
• Build and maintain the serving and integration layer for ML/AI solutions (APIs, connectors, asynchronous execution patterns), enabling seamless integration with internal systems and Ops tooling.
• Establish clear mechanisms for monitoring and reliability of ML/AI systems in production (dashboards, alerts, core KPIs, regression detection, and data/feature quality checks).
• Enable repeatable delivery for ML/AI services through strong engineering practices (CI/CD, testing, release strategies, rollback, and operational runbooks).
• Make contributions to our Architecture Decision Records repository by evaluating and proposing platform upgrades for ML/AI systems (e.g., feature serving patterns, workflow orchestration, scalable storage) to improve reliability, scalability, and reuse.
Job Requirements
- Proven experience in architecting and serving production-grade ML systems
- 4–7 years of experience in software engineering, with at least 3 years focused specifically on ML Ops or Data Engineering in a production environment
- Demonstrates the ability to design high-availability serving layers using APIs (FastAPI, gRPC) and asynchronous execution patterns to handle high-concurrency fintech workloads
- Possesses a deep understanding of the "handshake" between data science and engineering, ensuring models are packaged, versioned, and integrated into internal systems without friction
- Expert-level knowledge of AWS (or similar), Kubernetes, Airflow/Prefect, and Databricks/Spark
- Track record of implementing request batching and model quantization to balance high-performance throughput with infrastructure costs
- Possesses strong technical fluency in the Python and Data ecosystem
- Exhibits advanced Python engineering skills, moving beyond simple scripting to build modular, testable, and maintainable codebases
- Expert-level knowledge of core ML libraries (NumPy, Pandas, scikit-learn) and at least one deep learning framework (PyTorch or TensorFlow)
- Solid expertise in data-intensive stacks like Spark or Databricks and the ability to write complex, optimized SQL for feature extraction and data validation
- Experienced in establishing mission-critical observability and reliability
- Has a demonstrated ability to build comprehensive monitoring suites (logs, metrics, traces) that detect not just system downtime, but ML-specific failures like data drift or feature quality regressions
- Track record of leading incident response and post-mortems, with a focus on reducing Mean Time to Recovery (MTTR) for model-related production issues
- Proven ability to implement automated alerting and regression detection that prevents degraded models from impacting the end-customer experience
- Demonstrates mastery of ML orchestration and engineering best practices
- Proven experience in building repeatable CI/CD pipelines for ML (MLOps), including automated testing, canary releases, and seamless rollback strategies
- Has solid expertise in workflow orchestration tools (e.g., Airflow, Prefect) and storage patterns (Postgres, Vector DBs) required for complex ML lifecycles
- Experienced in contributing to Architecture Decision Records (ADRs) to standardize feature serving patterns and scalable storage across the engineering org
- Track record of building and scaling agentic AI systems
- Possesses practical experience with the components of modern AI agents, including RAG (Retrieval-Augmented Generation), orchestration frameworks (LangChain/LlamaIndex), and guardrail implementation
- Demonstrates an understanding of the unique operational challenges of LLMs, such as token cost management, prompt versioning, and latency optimization
- Experienced in evaluating and integrating graph-based architectures or graph databases when required for complex data relationship mapping
- Exhibits exceptional cross-functional communication and ownership
- Proven ability to translate highly technical infrastructure bottlenecks into clear business risks or opportunities for non-technical stakeholders
- Demonstrates an "Ownership Mentality" by taking end-to-end responsibility for the reliability of the ML platform, from the initial architectural proposal to 2:00 AM incident resolution
- Adapts communication style effectively to mentor Data Scientists on engineering best practices while collaborating with Product Managers on roadmap feasibility
Benefits
- Work on a problem that truly matters
- Be part of something big from the ground up
- Unparalleled growth opportunity
- Join a world-class team
- Competitive compensation & meaningful ownership




