Job Closed

This listing is no longer active.

Fieldguide

Powering the future of trust with modern software for assurance & advisory firms.

AI Engineer, Quality

AI Engineer | Machine Learning Engineer | Other | Remote | Senior | Team 11-50 | H1B Sponsor

Location

California

Posted

110 days ago

Salary

$170K - $220K / year

Seniority

Senior

Bachelor Degree | Experience accepted | English | PostgreSQL | Python | React | TypeScript

Job Description

  • Design and build a unified evaluation platform that serves as the single source of truth for all of our agentic systems and audit workflows
  • Build observability systems that surface agent behavior, trace execution, and failure modes in production, and feedback loops that turn production failures into first-class evaluation cases
  • Own the evaluation infrastructure stack, including integration with LangSmith and LangGraph
  • Translate customer problems into concrete agent behaviors and workflows
  • Integrate and orchestrate LLMs, tools, retrieval systems, and logic into cohesive, reliable agent experiences
  • Build automated pipelines that evaluate new models against all critical workflows within hours of release
  • Design evaluation harnesses for our most complex agentic systems and workflows
  • Implement comparison frameworks that measure effectiveness, consistency, latency, and cost across model versions
  • Design guardrails and monitoring systems that catch quality regressions before they reach customers
  • Use AI as core leverage in how you design, build, test, and iterate
  • Prototype quickly to resolve uncertainty, then harden systems for enterprise-grade reliability
  • Build evaluations, feedback mechanisms, and guardrails so agents improve over time
  • Work with SMEs and ML Engineers to create evaluation datasets by curating production traces
  • Design prompts, retrieval pipelines, and agent orchestration systems that perform reliably at scale
  • Define and document evaluation standards, best practices, and processes for the engineering organization
  • Advocate for evaluation-driven development and make it easy for the team to write and run evals
  • Partner with product and ML engineers to integrate evaluation requirements into agent development from day one
  • Take full ownership of large product areas rather than executing on narrow tasks

Job Requirements

  • Multiple years of experience shipping production software in complex, real-world systems
  • Experience with TypeScript, React, Python, and Postgres
  • Built and deployed LLM-powered features serving production traffic
  • Implemented evaluation frameworks for model outputs and agent behaviors
  • Designed observability or tracing infrastructure for AI/ML systems
  • Worked with vector databases, embedding models, and RAG architectures
  • Experience with evaluation platforms (LangSmith, Langfuse, or similar)
  • Comfort operating in ambiguity and taking responsibility for outcomes
  • Deep empathy for professional-grade, mission-critical software (experience with audit and accounting workflows is not required)

Benefits

  • Health insurance
  • Professional development opportunities
  • Flexible work arrangements

Related Job Pages

More AI Engineer Jobs

AI Product Engineer

PostHog

Product analytics, session replay, feature flags, A/B testing, data warehouse, CDP, surveys. PostHog does that.

AI Engineer | 110 days ago
Other | Remote | Team 11-50 | Since 2020 | H1B No Sponsor

Help us to increase the number of successful products in the world!

  • 🌍 Location: We are full-remote and globally distributed! Our current team is distributed between GMT-8 and GMT+2, so we currently only hire in these timezones.
  • 🎤 Interview process: 1) Call with one of our Talent Partners, 2) 60min technical interview, 3) 15min call with a co-founder, and 4) PostHog SuperDay (paid day of work). Read more about our interview process.
  • 🖥️ Team: New team, Tasks - with Peter
  • 💰 Compensation: Please check our compensation calculator.
  • 🦔 Read more about how we hire and how we think about diversity & inclusion.

About PostHog

We're shipping every product that companies need to run their business from their first day, to the day they IPO, and beyond. The operating system for folks who build software.

We started with open-source product analytics, launched out of Y Combinator's W20 cohort. We've since shipped more than a dozen products, including:

  • A built-in data warehouse, so users can query product and customer data together using custom SQL insights.
  • A customer data platform, so they can send their data wherever they need with ease.
  • PostHog AI, an AI-powered analyst that answers product questions, helps users find useful session recordings, and writes custom SQL queries.

Next on the roadmap are CRM, Workflow, revenue analytics, and support products. When we say every product that companies need to run their business, we really mean it!

We are:

  • Product-led. More than 100,000 companies have installed PostHog, mostly driven by word-of-mouth. We have intensely strong product-market fit.
  • Default alive. Revenue is growing 10% MoM on average, and we're very efficient. We raise money to push ambition and grow faster, not to keep the lights on.
  • Well-funded. We've raised more than $100m from some of the world's top investors. We're set up for a long, ambitious journey.

We're focused on building an awesome product for end users, hiring exceptional teammates, shipping fast, and being as weird as possible.

Things we care about

  • Transparency: Everyone can read about our roadmap, how we pay (or even let go of) people, our strategy, and how we work, in our public company handbook. Internally, we share revenue, notes and slides from board meetings, and fundraising plans, so everyone has the context they need to make good decisions.
  • Autonomy: We don’t tell anyone what to do. Everyone chooses what to work on next based on what's going to have the biggest impact on our customers, and what they find interesting and motivating to work on. Engineers lead product teams and make product decisions. Teams are flexible and easy to change when needed.
  • Shipping fast: Why not now? We want to build a lot of products; we can't do that shipping at a normal pace. We've built the company around small teams - autonomous, highly-efficient groups of cracked engineers who can outship much larger companies because they own their products end-to-end.
  • Time for building: Nothing gets shipped in a meeting. We're a natively remote company. We default to async communication - PRs > Issues > Slack. Tuesdays and Thursdays are meeting-free days, and we prioritize heads down building time over perfect coordination. This will be the most productive job you've ever had.
  • Ambition: We want to solve big problems. We strongly believe that aiming for the best possible upside, and sometimes missing, is better than never trying. We're optimistic about what's possible and our ability to get there.
  • Being weird: Weird means redesigning an already world-class website for the 5th time. It means shipping literally every product that relates to customer data. It means building an objectively unnecessary developer toy with dubious shareholder value. Doing weird stuff is a competitive advantage. And it's fun.

Who we're looking for

We’re looking for a full-stack engineer who knows how to leverage LLMs to make PostHog 10x more powerful. You’ve built and shipped agentic AI applications before and understand it’s more than just hitting an API with a good prompt. You’ll fit right in if you:

  • Ship end-to-end AI apps - from backend infra to frontend UX.
  • Collaborate widely - work across teams, find the right people, and make progress fast.
  • Think like a product builder - care about users and outcomes, not just code.

If you’ve worked in autonomous agents, workflow automation, or AI copilots before, you’ll feel at home - but here you’ll get to build it from scratch, in the open, with massive impact.

What makes this role unique

In this role, you’re not hacking together an AI agent hoping someone will use it - you’re building on top of a firehose of real customer data with immediate impact, whether that means creating new AI-powered tools or building observability features that help others understand how their agents perform in the wild.

  • Impact from day one: You’re building agents on top of real customer data - not toy demos, not “when we get users.”
  • Data advantage: PostHog already collects all the context agents need to be useful: analytics, sessions, events, feature flags, and more. You’ll build on top of it all.
  • Agents that matter: Instead of copilots on the side, you’ll create background agents embedded into engineering workflows - changing how software gets built. Or, you might focus on building the analytics and observability layer around those agents - surfacing insights, clustering behaviors, and measuring model quality so teams can improve their own AI systems with confidence.
  • Build in the open: Work with feedback from the largest open-source product engineering communities in the world.

What you'll be doing

  • Owning products and features from beginning to end. This means originating ideas based on your intuition, talking to users, and understanding our strategy and goals. It means testing MVPs in production with real users. It means iterating on their feedback, owning pricing, and ensuring the ongoing success of your work.
  • Collaborating with design (when necessary). Product engineers at PostHog are full-stack, so we expect you to ship and own the basic UX of your work using our design system. From there, it's up to you to decide when to collaborate with our design team to iterate and polish the experience.
  • Implementing AI features. LLMs, eh? They're getting prettaaay, prettaay good. All our products integrate with PostHog AI, so you'll likely be working with the PostHog AI team to implement AI features in your products. For some teams, that means designing observability and evaluation features for AI products.
  • Talking to users. Good product engineers read feedback from users and iterate quickly. Great product engineers have users they're friendly with, talk with them frequently, bounce ideas off them, and iterate with them when they ship new things.
  • Doing support. Every week, one person in each engineering team is designated the Support hero. Their job is to investigate and resolve issues reported by customers for their product. Giving users support from real engineers and shipping fixes and improvements in real-time is one of the best ways to spark joy in users. This role will also include some on-call time.
  • Writing docs. We have a content team that will collaborate with you on reviewing, polishing, and improving your documentation, but the best person to document a new feature is the person who built it.

Requirements

  • Have worked at a high-growth SaaS company before.
  • Extensive knowledge of Django and/or TypeScript-based React.
  • Experience building AI-native products or integrating AI into existing software.

If you have a disability, please let us know if there's any way we can make the interview process better for you - we're happy to accommodate!

United States
Job Closed
Other | Remote | Since 2017

Valence has built the only first-to-market AI native coaching platform for enterprise, offering personalized, expert, and human-like guidance and support to any leader or employee. We’re not just talking about the future of work - we’re building it now, with the most innovative Fortune 500 companies across healthcare, financial services, manufacturing, and technology.

Our focus is on the problems that actually decide whether AI changes how organizations operate - the ones with no playbook, no obvious answers, and no guarantee of success. If you want to be part of the small group that defines how AI transforms the future at a global scale, this is your chance.

And this isn’t for everyone. We’re not looking for people who want predictability or incremental progress. We only want those who are restless at the edge of what’s possible, who get bored when things feel “done,” and who are driven to redefine what AI can mean for leaders, companies, and the world. Because at Valence, the work worth doing is the kind that redefines work itself.

The Role

As a founding Applied AI Engineer at Valence, you will help define and build the future of AI-powered leadership coaching, working directly with our Head of AI and cross-functional product and engineering teams. You’ll be at the intersection of generative AI research and product execution - helping design, build, and refine intelligent systems that deliver context-aware, personalized coaching at enterprise scale.

This role is purposefully broad and adaptable: we don’t expect any one candidate to check every technical box, but we do look for engineers who are eager to learn, iterate rapidly, and contribute meaningfully across a range of challenges from model behavior to production systems.

You’ll tackle real-world AI problems - from transforming enterprise data into actionable context, to optimizing conversational experiences, to shaping how AI engages with users in meaningful and responsible ways. Your work will directly impact how our platform performs in high-stakes enterprise deployments and how leaders around the world grow through AI-facilitated insights.

About Valence

We're the only company pioneering leadership coaching for large enterprises in an AI-first way. Our mission is to transform how the world's biggest companies approach learning and development, helping teams work better together through AI-powered personalization that adapts to individual goals and organizational culture using the latest advances in machine learning and natural language processing.

We've been featured in Harvard Business Review, TIME, World Economic Forum, Financial Times, and Forbes, and named to the Inc. 5000 list of fastest-growing private companies in America. Our clients represent the most diverse and sophisticated enterprise AI implementations globally, including Coca-Cola, Delta, Nestlé, General Mills, Schneider Electric, Deutsche Telekom, AstraZeneca, Prudential, CVS and Bristol Myers Squibb.

Working at Valence means you'll work directly with Fortune 500 technology leaders, building expertise through the most complex enterprise deployments while gaining insight into diverse organizational approaches to AI transformation. These aren't just any enterprise clients - they're the companies defining what AI-first business transformation looks like across every major industry.

What You'll Do

  • Architect and build enterprise-grade AI and conversational systems that power coaching workflows and user experiences.
  • Develop, evaluate, and refine LLM-based components - balancing performance, scalability, and reliability in real use cases.
  • Integrate and manage diverse sources of structured and unstructured data to improve contextual understanding and output quality.
  • Partner closely with product, engineering, and design to translate user needs into impactful technical solutions.
  • Rapidly prototype and iterate on systems that span backend services, data pipelines, and frontend interactions as needed.
  • Build tooling, tests, and automation to support reliable model deployment, observability, and continuous improvement.
  • Help streamline data and science workflows, enabling fast experimentation and data-driven decisions.

We recognize that not every candidate will have experience in every area above - we value growth potential, curiosity, and the ability to learn on the job.

What We're Looking For

  • Technical foundation: 3+ years of experience in software engineering, data-intensive systems, or AI/ML development (ideally including a Master's or Ph.D. in Computer Science, ML, Data Science, or a related field).
  • AI systems mindset: Familiarity with language systems (e.g., NLP, conversational interfaces, IR) and comfort reasoning about model behavior, context, and evaluation - both theoretical and practical knowledge.
  • Data tooling & analysis: Experience with core data science tools such as NumPy, scikit-learn, Pandas, PySpark, plus SQL and common visualization tools (e.g., matplotlib, Seaborn, Plotly, or BI tools) to explore and communicate insights.
  • Cloud & deployment experience: Comfortable developing and deploying services in cloud environments (AWS, GCP, Azure) and working with containerization/orchestration (Docker, Kubernetes).
  • Engineering excellence: Strong software engineering skills, including writing maintainable code, debugging distributed systems, and collaborating in cross-functional teams.
  • Growth orientation: Eagerness to tackle unfamiliar problems, learn new technologies, and contribute to shaping our platform and culture.
  • Communication: Ability to explain technical ideas clearly and work effectively with both technical and non-technical stakeholders.
  • Nice-to-have (but not required): experience with ML lifecycle tools (e.g., MLflow, Weights & Biases), familiarity with Cloud ML services, or past work building generative AI applications.

What You'll Get

Ownership & Rapid Growth

  • Outsized missions from day one, with direct responsibility for company-defining projects
  • Work alongside the executive team with transparency into strategy and decision-making
  • Influence on direction through real-time customer feedback and market insights

AI-First Operator

  • Work directly with cutting-edge AI models and next-generation platforms
  • Build expertise in enterprise AI implementation across Fortune 500 companies and multiple industries
  • Establish yourself as a recognized leader among peers in shaping how AI transforms work at a global scale

Compensation

  • Competitive salary including base + bonuses
  • Comprehensive health coverage (medical, dental, vision) from day one
  • Generous PTO, company-wide R&R shutdowns, and paid parental leave
  • Retirement plan support for US and global employees

Equity

  • Meaningful ownership in a venture-backed company at a growth inflection point
  • Financial upside that comes from scaling fast
  • Top-up grants as we scale and you deliver exceptional performance - your compensation grows alongside your impact

Top-Performing Culture

  • A culture built for top talent: intensity to win, growth without limits, and a team that solves hard problems and celebrates big wins together

Location and Work Environment

Candidates based in NYC or Toronto can work hybrid in our offices; otherwise, this role can be remote. Candidates must be comfortable working with colleagues in different time zones (UK), and have valid travel documents without work authorization restrictions in the US.

Diversity and Inclusion

We are dedicated to creating a diverse and inclusive environment where everyone feels valued and supported. We encourage applications from candidates of all backgrounds and offer accommodations upon request throughout the hiring process. If you have any questions, please reach out to Allison Langille, Head of People, at jobs@valence.co.

Employment Verification & Commitment

We use third-party services to verify employment history, education, and other information relevant to your candidacy. Employment is contingent upon the successful completion of these verification checks. This is a full-time role that requires a high level of focus, availability, and commitment. Employees may not hold concurrent full-time employment with another organization while employed at Valence. Any outside consulting, advisory, freelance, or other professional work must be disclosed and approved in advance and must not interfere with job responsibilities, availability, performance, or create a conflict of interest - including risks related to confidentiality, intellectual property, or competition.

All locations: New York | California
Job Closed

Lead AI Engineer

Lyric - Clarity in motion.

Simplifying the business of care.

AI Engineer | 110 days ago
Other | Remote | Team 201-500 | H1B No Sponsor

Lyric is an AI-first, platform-based healthcare technology company, committed to simplifying the business of care by preventing inaccurate payments and reducing overall waste in the healthcare ecosystem, enabling more efficient use of resources to reduce the cost of care for payers, providers, and patients. Lyric, formerly ClaimsXten, is a market leader with 35 years of pre-pay editing expertise, dedicated teams, and top technology. Lyric is proud to be recognized as 2025 Best in KLAS for Pre-Payment Accuracy and Integrity, is HITRUST and SOC 2 certified, and is a recipient of the 2025 CandE Award for Candidate Experience.

Interested in shaping the future of healthcare with AI? Explore opportunities at lyric.ai/careers and drive innovation with #YouToThePowerOfAI.

The Lead AI Engineer will drive the development of intelligent systems that extract and structure data from unstructured documents such as PDFs, scanned forms, and free-text content. This role will lead the design and deployment of advanced machine learning and generative AI solutions, with a particular emphasis on language models (SLM/LLMs) and their application to document understanding and data extraction at scale.

ESSENTIAL JOB RESPONSIBILITIES & KEY PERFORMANCE OUTCOMES

  • Lead the architecture, development, and deployment of AI/ML systems for document ingestion, understanding, and data extraction.
  • Build high-quality datasets and a system for validating and verifying data and ML systems.
  • Build and fine-tune LLMs and generative AI models to interpret, summarize, and extract information from complex unstructured content.
  • Develop NLP pipelines leveraging techniques such as OCR, entity recognition, text classification, summarization, and semantic parsing.
  • Integrate LLMs with retrieval systems (RAG), vector databases, and structured outputs suitable for downstream consumption.
  • Collaborate cross-functionally to align technical solutions with product requirements and compliance needs.

REQUIRED QUALIFICATIONS

  • Minimum of seven (7) years of experience in AI/ML engineering, with at least three (3) years in a technical or team leadership role
  • Previous technical leadership experience in the AI/ML space
  • Hands-on experience building and deploying S/LLMs or generative AI applications (e.g., using Llama, DeepSeek, or similar models)
  • Proven track record of extracting structured data from unstructured document sources, including scanned forms, free-text reports, and complex layouts
  • Strong software engineering skills in Python and ML frameworks (e.g., Kubeflow, PyTorch, multi-agentic frameworks)
  • Experience with OCR technologies (e.g., Tesseract, Amazon Textract), NLP techniques, and model deployment in production environments
  • Deep understanding of NLP methods including embeddings, transformers, named entity recognition (NER), and text classification
  • Familiarity with MLOps, version control, CI/CD, and cloud platforms (AWS, GCP, or Azure)

PREFERRED QUALIFICATIONS

  • Experience implementing retrieval-augmented generation (RAG), prompt engineering, or fine-tuning foundation models
  • Familiarity with vector databases (e.g., Postgres with pgvector, Pinecone, FAISS, Weaviate) and semantic search
  • Strong experience shipping production ML systems, with a track record of monitoring and improving them
  • Experience working in regulated domains such as healthcare, legal, or finance

The US base salary range for this full-time position is: $211,551.00 - $317,326.00. The specific salary offered to a candidate may be influenced by a variety of factors including but not limited to the candidate’s relevant experience, education, and work location. Please note that the compensation details listed in US role postings reflect the base salary only, and do not reflect the value of the total rewards compensation.

Lyric is an Equal Opportunity Employer that strives to create an inclusive environment, empower employees and embrace collaborative success.

United States
$211.6K - $317.3K / year
Other | Remote | Team 11-50

RevenueBase:

  • We're building the data infrastructure that makes AI agents trustworthy instead of error-prone.
  • We provide continuously refreshed, verified B2B data for autonomous AI agents and GTM workflows.
  • We've tripled growth while maintaining 100% gross dollar retention and staying cashflow positive.
  • We power AI agents for Clay, ZoomInfo, Dun & Bradstreet, and the next generation of AI GTM tools.

About the Role

We are looking for a Senior Data & AI Platform Engineer to build internal tools and services on top of our large-scale data infrastructure. Your primary focus will be developing systems that leverage vector embeddings, LLM APIs, and semantic search to unlock value from structured and unstructured data. This is a hands-on engineering role for someone who enjoys building practical AI-powered tools - not just experiments - and shipping them into production in a fast-moving startup environment.

What You’ll Do

  • Design and build data-driven tools that operate on large datasets stored in S3 and Snowflake
  • Implement pipelines that:
      • Extract specific columns or datasets from Snowflake
      • Generate vector embeddings via APIs such as OpenAI
      • Store and manage embeddings in vector databases like Pinecone
      • Enable semantic search and similarity-based retrieval
  • Develop enrichment workflows that:
      • Query structured data
      • Use LLM APIs to generate new derived columns
      • Write enriched results back into Snowflake
  • Build reusable internal services and SDKs around embedding generation, prompt orchestration, and data augmentation
  • Optimize performance and cost across AWS infrastructure
  • Work closely with product and data teams to turn use cases into scalable engineering solutions
  • Ensure reliability, observability, and maintainability of AI-powered pipelines

Example Projects

  • Tool to extract a single Snowflake column, generate embeddings, push to Pinecone, and expose a semantic search API
  • Batch enrichment pipeline that queries records from Snowflake, calls OpenAI APIs for structured enrichment, and writes new columns back
  • Internal framework for LLM-based data transformation and validation
  • Query abstraction layer to make AI-enhanced analytics accessible to non-engineering teams

Required Qualifications

  • 5+ years of software engineering experience
  • Strong backend engineering skills (Python preferred; other modern languages acceptable)
  • Solid experience with:
      • AWS (IAM, Lambda, ECS/EKS, S3, networking, security best practices)
      • Data warehousing (Snowflake preferred)
      • API design and distributed systems
  • Hands-on experience working with LLM APIs (e.g., OpenAI) and embedding workflows
  • Experience with vector databases (Pinecone or similar)
  • Strong understanding of data modeling, ETL/ELT patterns, and performance optimization
  • Production experience in at least one startup environment
  • Ability to operate independently and ship high-impact systems end-to-end

Nice to Have

  • Experience building internal developer platforms or data tooling
  • Familiarity with prompt engineering and evaluation pipelines
  • Experience with orchestration frameworks (Airflow, Prefect, Dagster)
  • Exposure to retrieval-augmented generation (RAG) systems
  • Infrastructure-as-code experience (Terraform, CDK)
  • Experience managing large-scale embedding refresh and re-indexing workflows

What Success Looks Like

  • Engineers and analysts can easily leverage AI-powered data enrichment
  • Embedding-based search works reliably at scale
  • New AI use cases can be implemented quickly using shared internal tooling
  • Systems are robust, observable, and cost-efficient

Why Join Us?

  • Work on practical, production-grade AI systems
  • Direct impact on how data is leveraged across the company
  • Startup speed with real ownership and autonomy
  • Opportunity to define the internal AI platform from the ground up

United States
Job Closed