Fieldguide

Powering the future of trust with modern software for assurance & advisory firms.

AI Engineer, Quality – Evals

AI Engineer, Machine Learning Engineer, Full Time, Remote, Junior, Team 11-50, H1B Sponsor

Location

California

Posted

50 days ago

Salary

$170K - $220K / year

Seniority

Junior

Bachelor Degree, 1 yr exp, English, PostgreSQL, Python, React, TypeScript

Job Description

• Design and build a unified evaluation platform that serves as the single source of truth for all of our agentic systems and audit workflows
• Build observability systems that surface agent behavior, trace execution, and failure modes in production, and feedback loops that turn production failures into first-class evaluation cases
• Own the evaluation infrastructure stack including integration with LangSmith and LangGraph
• Translate customer problems into concrete agent behaviors and workflows
• Integrate and orchestrate LLMs, tools, retrieval systems, and logic into cohesive, reliable agent experiences
• Build automated pipelines that evaluate new models against all critical workflows within hours of release
• Design evaluation harnesses for our most complex agentic systems and workflows
• Implement comparison frameworks that measure effectiveness, consistency, latency, and cost across model versions
• Design guardrails and monitoring systems that catch quality regressions before they reach customers
• Use AI as core leverage in how you design, build, test, and iterate
• Prototype quickly to resolve uncertainty, then harden systems for enterprise-grade reliability
• Build evaluations, feedback mechanisms, and guardrails so agents improve over time
• Work with SMEs and ML Engineers to create evaluation datasets by curating production traces
• Design prompts, retrieval pipelines, and agent orchestration systems that perform reliably at scale
• Define and document evaluation standards, best practices, and processes for the engineering organization
• Advocate for evaluation-driven development and make it easy for the team to write and run evals
• Partner with product and ML engineers to integrate evaluation requirements into agent development from day one
• Take full ownership of large product areas rather than executing on narrow tasks

Job Requirements

Multiple years of experience shipping production software in complex, real-world systems
Experience with TypeScript, React, Python, and Postgres
Built and deployed LLM-powered features serving production traffic
Implemented evaluation frameworks for model outputs and agent behaviors
Designed observability or tracing infrastructure for AI/ML systems
Worked with vector databases, embedding models, and RAG architectures
Experience with evaluation platforms (LangSmith, Langfuse, or similar)
Comfort operating in ambiguity and taking responsibility for outcomes
Deep empathy for professional-grade, mission-critical software (experience with audit and accounting workflows is not required)

Benefits

Competitive compensation packages with meaningful ownership
Flexible PTO
401k
Wellness benefits, including a bundle of free therapy sessions
Technology & Work from Home reimbursement
Flexible work schedules

More AI Engineer Jobs

AI Engineer

Million Dollar Sellers

A proven network of entrepreneurs with specific eCommerce knowledge, in an on-demand community.

AI Engineer, posted 50 days ago

Full Time, Remote, Team 1-10, H1B No Sponsor

We are hiring an AI Engineer to build the AI and agent systems that run MDS. This is a pure individual contributor role focused on one thing: using Claude and modern agent tooling to replace manual work that currently depends on operator judgment.

You are joining an established tech team. Our Tech Lead owns our app and the broader automation architecture. Our Automations Specialist keeps the existing Make, Zapier, and GHL workflows running. Your role is to sit alongside them as the AI specialist: identifying where a Claude-powered agent beats a traditional automation, designing and shipping those builds, and upgrading existing workflows with AI when it raises the ceiling.

A representative project: take our event registration review workflow (Luma inbound, Airtable lookups, LinkedIn and web verification, outcome emails, currently about 20 minutes of manual work per registrant) and ship a Claude-powered agent that handles the enrichment and qualification end to end, with a reviewer surface for one-click human approval, a custom MCP connector to Luma, full audit logging in Airtable, a test harness, and a runbook. You own it from whiteboard to production to month-six maintenance.

Mexico

$5K - $7.5K / month

AI Engineer

Darwoft

You have just found the top firm for your next successful software development project! 🧠💻📱.

AI Engineer, posted 50 days ago

Full Time, Remote, Team 51-200, Since 2010, H1B No Sponsor

Role Description

This role requires full-time dedication, with clear priority given to Darwoft projects during the established working hours. It is not compatible with other full-time professional engagements. Any additional professional activities must be disclosed in advance and must not interfere with the responsibilities or working hours of this role.

We’re partnering with a fast-growing fintech project focused on building an AI-powered conversational platform used by thousands of users in the United States. The product goes far beyond traditional chatbots, leveraging Large Language Models (LLMs) and autonomous AI agents to handle complex, multi-step workflows related to financial operations.

We’re looking for a Senior AI Engineer to join a core AI initiative, working hands-on on the design, development, and scaling of agentic systems in production. In this role, you’ll help evolve conversational experiences into advanced multi-agent architectures capable of reasoning, planning, and executing actions autonomously. Your work will have direct impact on real users and real business outcomes.

What You’ll Be Doing

- Design, build, test, and deploy autonomous AI agents using Python and modern agentic frameworks.
- Develop LLM-based systems that go beyond simple Q&A, enabling reasoning, planning, and execution across multi-step workflows.
- Implement Retrieval-Augmented Generation (RAG) pipelines using vector databases to ensure accurate, grounded responses.
- Integrate AI agents with internal services, APIs, and production systems in collaboration with engineering and product teams.
- Build evaluation, monitoring, and optimization pipelines for LLM-powered systems, focusing on accuracy, latency, reliability, and cost.
- Apply advanced prompt engineering techniques and tool/function calling to enhance agent capabilities.
- Stay current with the latest advancements in Generative AI, LLMs, and agentic architectures, applying best practices to production systems.

Qualifications

- 5+ years of experience in professional software development.
- 2+ years of hands-on experience building and deploying AI / Generative AI solutions in production.
- Strong proficiency in Python.
- Solid experience working with LLMs and agentic frameworks (OpenAI SDK, LangChain, LlamaIndex, CrewAI, or similar).
- Proven experience with agentic systems, including memory/state management and multi-agent workflows.
- Experience working with vector databases and RAG-based architectures.
- Strong understanding of software engineering fundamentals: Git, testing, CI/CD pipelines.
- Ability to translate business requirements into scalable, maintainable technical solutions.
- Strong communication skills in English within a fully remote environment.

Nice to Have

- Experience in fintech, payments, fraud detection, or financial platforms.
- Experience evaluating and optimizing LLM systems in production (A/B testing, observability).
- Contributions to open-source projects or public technical repositories.
- Experience working in fast-paced, high-growth product environments.

Benefits

- Contractor agreement with payment in USD.
- 100% remote work.
- Argentina's public holidays.
- English classes.
- Referral program.
- Access to learning platforms.

Latin America (LATAM)

Job Closed

AI Engineer

JAMS Software

JAMS orchestrates IT and data processes with control, visibility, and reliability.

AI Engineer, posted 50 days ago

Full Time, Remote, Team 51-200, Since 1987, H1B No Sponsor

• Use AI-powered development tools (e.g., copilots, code assistants, automated testing tools) to design, write, and refactor code
• Rapidly prototype and ship features with the help of AI-assisted workflows
• Translate product requirements into working software with high velocity
• Validate, debug, and improve AI-generated code to production standards
• Build internal tools and automations that leverage AI to improve team productivity
• Continuously evaluate and adopt new AI tools to enhance development workflows
• Collaborate with product and design teams to deliver high-quality features quickly
• Maintain strong code quality, testing, and documentation practices, even when moving fast

React, SQL, .NET

United States

Job Closed

Senior Applied AI Engineer

MAIA

Empowering the Mittelstand through AI-powered SaaS Solutions.

AI Engineer, posted 50 days ago

Full Time, Remote, Team 1-10, Since 2021, H1B Sponsor

• Build and evolve MAIA's core product capabilities
• Focus on backend and AI systems, including RAG pipelines and LLM integrations
• Collaborate closely with Product, DevOps, and customer-facing teams
• Design, implement, and ship features from discovery through production rollout

Cloud, Grafana, Java, PostgreSQL, Rust, SQL, TypeScript, Go

Germany

€75K - €85K / year

Job Closed
