Senior ML Engineer
Location
Brazil
Posted
5 days ago
Salary
0
Seniority
Senior
Job Description
Senior ML Engineer
Clutch
- Take ownership of the ML API that serves NBA recommendations, partnering with the data engineer who's been building it, and harden it for low-latency production traffic.
- Ship your first agent tool contract end-to-end: schema design, handler implementation, structured-error contract, unit tests, deployed via HAL's runtime.
- Set up the eval foundation for our agents: golden transcripts, rubric-based judges, regression suites that run on every prompt or model change.
- Build a working relationship with HAL and become the data team's go-to on agent infrastructure decisions.
- Be the primary owner (with data engineer support) of the ML API and the agent tool layer that wraps NBA and our ML models.
- Have shipped at least one production-grade agent (customer-facing or partner-facing) with prompt versioning, evals, observability, and multi-tenant gating in place.
- Define the data team's playbook for shipping a new ML model as an LLM-callable tool, end-to-end.
- Mentor the data engineers on ML/AI patterns so they can confidently support and extend the systems you own.
- Operate as the technical lead within the data team for NBA production AI at Clutch: the person other teams come to when they want to understand how NBA ships ML and agents responsibly.
- Have measurably improved agent cost and latency (target: 30%+ reduction on P95 latency or per-conversation cost on at least one agent).
- Be shaping the data team's roadmap for the next generation of ML and AI products, in partnership with the PM and data scientist.
- Help us decide what to hire next as the team scales.
Job Requirements
- 7+ years of engineering experience, with a proven track record of building and shipping production ML systems — you've taken models from prototype to production and own what happens after deploy.
- Strong Python: most of the work (ML training, evaluation, the ML API, data pipelines) is in Python, and you're comfortable in production codebases, not just notebooks. Some TypeScript is involved for tool contracts and integration with our agent runtime; you don't need to be an expert, and comfort with a second language is enough.
- Tool-design discipline for LLM consumption. Can take an ML model or data source and shape it into an LLM-callable tool with narrow input/output schemas, identity-required and scope-gated dispatch, and structured-error contracts (RATE_LIMITED, UPSTREAM_ERROR, NOT_FOUND) that the agent runtime converts to graceful tool-results instead of crashing.
- Eval discipline for non-deterministic systems. You treat evals as the unit-test equivalent for agents: golden transcripts, rubric-based judges, regression suites that run on every prompt or model change. You understand the difference between offline metrics and online evals, and use both.
- Prompt-shape literacy. You read a system prompt the way another engineer reads code: audience, register, compliance guardrails, template-var allow-list, allowed-tools section. You debug "why did the agent do that?" by reading the prompt and tool descriptions before reaching for model swaps. You've shipped at least one agent where the prompt was version-controlled and reviewed as code.
- Tool implementation rigor. You build handlers behind tool contracts with identity fields read from request context (never from LLM-supplied args), output re-parsed through the tool's schema before return, structured-error throws on every failure path, and unit tests covering both happy path and each named error. You have a story about a tool you shipped, a bug production traffic surfaced, and how you hardened it.
- Experience building and maintaining low-latency production APIs (FastAPI, BentoML, or equivalent), with opinions on serving, batching, and caching.
- Comfortable in AWS (Lambda especially), Docker, and GitHub-based workflows.
- You use AI tooling actively in your engineering workflow — not as a novelty, but as a default. You'll be expected to demonstrate this during the technical evaluation.
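
To make the tool-design and tool-implementation requirements above more concrete, here is a minimal Python sketch of a handler behind a tool contract. It is an illustration under stated assumptions, not Clutch's or HAL's actual code: the names `RequestContext`, `ToolError`, `get_recommendation`, `run_tool`, and the in-memory `_FAKE_MODEL` lookup are all hypothetical. Only the three error codes come from the posting. Scope gating is left out to keep the sketch short.

```python
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Any


class ToolErrorCode(str, Enum):
    # The three named errors from the requirements above.
    RATE_LIMITED = "RATE_LIMITED"  # defined for completeness, not raised in this sketch
    UPSTREAM_ERROR = "UPSTREAM_ERROR"
    NOT_FOUND = "NOT_FOUND"


class ToolError(Exception):
    """Structured error that a runtime can convert into a tool-result."""

    def __init__(self, code: ToolErrorCode, message: str) -> None:
        super().__init__(message)
        self.code = code
        self.message = message


@dataclass(frozen=True)
class RequestContext:
    """Identity taken from the authenticated request, never from LLM args."""
    tenant_id: str


@dataclass(frozen=True)
class RecommendationOutput:
    """Narrow output schema (hypothetical fields)."""
    customer_id: str
    action: str
    score: float


# Hypothetical stand-in for the ML API, keyed by (tenant_id, customer_id).
_FAKE_MODEL: dict[tuple[str, str], dict[str, Any]] = {
    ("tenant-a", "c-1"): {"action": "offer_discount", "score": 0.87},
    ("tenant-a", "c-bad"): {"action": "offer_discount"},  # malformed: no score
}


def parse_output(raw: dict[str, Any], customer_id: str) -> RecommendationOutput:
    """Re-parse the upstream payload through the output schema before return."""
    try:
        return RecommendationOutput(
            customer_id=customer_id,
            action=str(raw["action"]),
            score=float(raw["score"]),
        )
    except (KeyError, TypeError, ValueError) as exc:
        raise ToolError(ToolErrorCode.UPSTREAM_ERROR, "malformed model response") from exc


def get_recommendation(ctx: RequestContext, args: dict[str, Any]) -> RecommendationOutput:
    # Only customer_id is read from LLM-supplied args; any tenant_id there is ignored.
    customer_id = str(args.get("customer_id", ""))
    raw = _FAKE_MODEL.get((ctx.tenant_id, customer_id))
    if raw is None:
        raise ToolError(ToolErrorCode.NOT_FOUND, f"no recommendation for {customer_id!r}")
    return parse_output(raw, customer_id)


def run_tool(ctx: RequestContext, args: dict[str, Any]) -> dict[str, Any]:
    """Runtime-side wrapper: structured errors become results instead of crashes."""
    try:
        out = get_recommendation(ctx, args)
    except ToolError as err:
        return {"ok": False, "error": {"code": err.code.value, "message": err.message}}
    return {"ok": True, "data": asdict(out)}
```

In this sketch, a call such as `run_tool(RequestContext("tenant-b"), {"customer_id": "c-1", "tenant_id": "tenant-a"})` returns a `NOT_FOUND` result, because the tenant comes from the request context and the LLM-supplied `tenant_id` is never consulted. Unit tests would cover the happy path plus one case per named error.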
Benefits
- Remote Flexibility: Enjoy the freedom of remote work from anywhere, balancing life and career seamlessly.
- Unforgettable Off-Sites: Twice a year, bond with colleagues in exciting destinations, fostering teamwork and fresh ideas.
- Paid Time Off and National Holidays: Enjoy 20 PTO days per year, plus national holidays, for relaxation and rejuvenation.
- Stock Options: Joining us means having a stake in our success, so you'll receive stock options as part of your compensation package.
- Home Office Setup: Create your ideal workspace with a dedicated budget for home office essentials.
- Work Trip Budget: Grow personally and professionally with a budget for work-related trips and co-working.
More Machine Learning Engineer Jobs
- ML Model Development: Design, develop, and implement scalable machine learning models to solve complex business problems
- MLOps & Production: Build and maintain robust ML pipelines, ensuring deployment, monitoring, and maintenance of models in production
- Feature Engineering: Create and optimize features using dbt and PySpark, working with large volumes of data
- Workflow Orchestration: Develop and manage data and ML pipelines using Apache Airflow
- Data Processing: Perform large-scale distributed data processing with PySpark
- Collaboration: Work closely with data scientists, data engineers, and product teams to deliver end-to-end solutions
- Optimization: Monitor model performance, identify degradation, and implement continuous improvements
- Documentation: Maintain clear technical documentation of architecture, models, and processes

Role Description:
This role at Arbitration Forums is as unique as it is rewarding because of the AF IPAAL Values (Integrity, Passion, Accountability, Achievement, Leadership) and TRI Model (Trust, Respect, Inclusion). The MLOps Engineer is responsible for closing the gap between machine learning model development and operational deployment. This role ensures that machine learning models run efficiently in the production environment and are continuously monitored for performance. The MLOps Engineer contributes to Arbitration Forums' AI-powered portfolio of products and services by enhancing the scalability and reliability of machine learning applications. This role works closely with data scientists, AI engineers, software development, and DevOps teams to automate and streamline the model lifecycle, from development to deployment and monitoring.

Qualifications:
- Bachelor's or Master's degree in Computer Science, Information Systems, Data Science, or a related field.
- Minimum of 6 years of experience in data science, machine learning, data management, data governance, or a related role.
- Minimum of 6 years as an MLOps Engineer or in a similar role.

Technical Skills:
- Working knowledge of cloud services (e.g., MS Azure, AWS, Google Cloud).
- Experience with AI tools, such as MS Azure ML, Snowflake, Databricks, CortexAI, Dataiku.
- Deep understanding of data science principles, algorithms, and tools.
- Strong knowledge of data governance, data security, and compliance practices.
- Proficiency in programming languages such as Python, R, or Java.
- Experience with containerization tools like Docker and orchestration tools like Kubernetes.
- Proficiency in ML frameworks such as TensorFlow, PyTorch, or Scikit-learn.
- Working knowledge of CI/CD pipelines, DevOps practices, and automation frameworks.
- Deep understanding of data engineering concepts and tools.
- Familiarity with data visualization and reporting tools (e.g., Webfocus, Power BI).

Soft Skills:
- Excellent analytical and problem-solving abilities.
- Strong communication and interpersonal skills to collaborate with cross-functional teams.
- Ability to lead projects and mentor junior staff.
- Auto Insurance claims industry experience preferred.

Requirements:
- Design, implement, and maintain machine learning pipelines and workflows for the continuous deployment and integration of machine learning models.
- Optimize the pipelines for scalability, efficiency, and cost-effectiveness.
- Collaborate with data scientists and AI engineers to understand model requirements and optimize deployment processes.
- Automate the training, testing, and deployment processes for machine learning models.
- Establish and enforce best practices for version control, documentation, and code quality in ML projects.
- Monitor model performance and optimize algorithms for efficiency.
- Conduct regular maintenance and updates to deployed models.
- Collaborate with cross-functional teams to integrate machine learning solutions into business processes and applications.
- Work with go-to-market, product management, and IT functions, as well as stakeholders in AF and its members, to identify the optimal methods for model rollout and adoption.
- Maintain and optimize the cloud-based machine learning infrastructure and make recommendations for improvements.
- Manage and allocate resources effectively, including compute power and storage for model inference.
- Develop practices and utilize tools for data validation, model testing, and versioning.
- Troubleshoot and resolve machine learning operational issues.
- Document processes, workflows, and best practices for ML Operations.
- Provide technical leadership and mentorship to junior data team members.
- Support data observability efforts to ensure the data continuum and enforce governance standards.
- Other duties as assigned by manager or project needs.

Americans with Disability Specifications:
- PHYSICAL DEMANDS: The physical demands described here are representative of those that must be met by an employee to successfully perform the essential functions of this job. While performing the duties of this job, the employee is occasionally required to stand; walk; sit; use hands to finger, handle, or feel objects, tools, or controls; reach with hands and arms; climb stairs; balance; stoop, kneel, crouch or crawl; talk or hear; taste or smell. The employee must occasionally lift and/or move up to 25 pounds. Specific vision abilities required by the job include close vision, distance vision, color vision, peripheral vision, depth perception, and the ability to adjust focus.
- WORK ENVIRONMENT: This is a fully remote position requiring reliable high-speed internet access and a dedicated workspace. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.

- Lead the strategic development and growth of Circuit Check's most significant customer relationships in the AI/ML data center infrastructure market
- Architect and execute multi-year account strategies that drive large-scale program wins, deepen executive-level partnerships, and position the company as the preferred technology partner for hyperscale and enterprise AI customers
- Serve as the primary executive-level interface with strategic accounts, building trusted advisor relationships with VP and C-suite decision-makers at hyperscale cloud providers, AI chipmakers, and data center operators
- Coordinate engineering, product management and operations around customer requirements
- Partner with Product Line Managers to translate customer insights into product roadmap inputs aligned with AI/ML market direction
- Identify patterns in AI/ML technology adoption, supply chain shifts, and customer investment priorities that create strategic openings for Circuit Check

- Design, develop, and implement Artificial Intelligence and Machine Learning solutions in corporate environments.
- Build data pipelines for model training, validation, monitoring, and retraining.
- Develop and operationalize predictive models, classifiers, recommendation systems, and Generative AI solutions.
- Work with LLMs (Large Language Models), intelligent agents, RAG (Retrieval-Augmented Generation), and multi-agent architectures.
- Develop APIs and services to serve models in production.
- Implement MLOps and LLMOps practices to automate the model lifecycle.
- Evaluate the performance, accuracy, bias, and governance of AI models.
- Integrate AI solutions with corporate applications, ERPs, CRMs, and digital platforms.
- Ensure the security, observability, scalability, and compliance of the implemented solutions.
- Support business areas in identifying opportunities to apply AI.
- Produce technical documentation and share knowledge with internal teams.



