Job Closed

This listing is no longer active.

International Data Group

At IDC, your work helps shape how the world understands technology and where it goes next. You collaborate with curious, high-caliber colleagues who value rigor, integrity, and shared success. As the premier global provider of trusted technology intelligence, IDC equips business and technology leaders with the evidence they need to make confident decisions. Our insights inform strategy, investment, and innovation across industries and regions. Recognized by IIAR as Analyst Firm of the Year for five consecutive years, IDC sets the standard for credibility and impact. With more than 1,000 analysts worldwide and a truly global perspective, we combine deep expertise with practical relevance. Here, your ideas matter, your voice is heard, and your contributions provide the insights leaders rely on every day. It is meaningful work, backed by a culture that supports growth, collaboration, and long-term career development with a globally respected brand.

Senior AI Quality/Evaluation Engineer

Location: Worldwide
Posted: 65 days ago
Salary: C$100K - C$143K / year
Seniority: Senior

Job Description

Role Description

IDC is building the next generation of AI-powered intelligence platforms that transform how technology decisions get made. We are looking for a Senior AI Quality/Evaluation Engineer to establish the evaluation function for the platform's AI systems. This is initially a solo function: as its first hire, you must be able to operate independently, define your own roadmap, and build from scratch.

The platform's credibility depends on the quality of its AI-generated intelligence. You will design and build the evaluation infrastructure (automated test suites, regression detection systems, and evaluation frameworks) that ensures the platform produces accurate, well-sourced, high-quality responses and catches quality issues before they reach users. You will work closely with the product team to translate quality criteria into measurable, automatable test scenarios, and with the AI engineering team to ensure that every pipeline change is evaluated against rigorous standards.

What You’ll Do

- Design and build the evaluation infrastructure that ensures the platform's AI systems produce accurate, well-sourced, high-quality responses.
- Build automated test suites that validate answer quality across agent pipeline changes.
- Develop regression detection systems that catch quality degradation before it reaches users.
- Create evaluation frameworks that measure response accuracy, citation correctness, and source quality.
- Work closely with the product team to translate quality criteria into measurable, automatable test scenarios.
- Build cost and latency monitoring that tracks the operational efficiency of AI pipeline execution.
- Define evaluation standards and practices that scale as the platform and team grow.

Qualifications

- 6+ years of software engineering experience, with significant work in testing infrastructure, ML evaluation, or quality systems.
- Experience building evaluation or testing frameworks for LLM-based or ML-based systems.
- Understanding of how to measure response quality for generative AI: accuracy, groundedness, citation correctness, and relevance.
- Proficiency in Python.
- Ability to operate independently and define your own roadmap.
- Experience working at the intersection of engineering and product, translating qualitative quality criteria into quantitative measurements.
- Experience with LLM evaluation frameworks (e.g., RAGAS, DeepEval, or custom).
- Familiarity with LLM observability tools (e.g., Langfuse, LangSmith, Weights & Biases).
- Background in statistical methods for quality measurement (significance testing, distribution analysis).
- Experience building A/B testing or experimentation infrastructure.
- Background in search relevance evaluation or information retrieval metrics.

Benefits

- 15 vacation days per year (increases with tenure; carryover allowed).
- 10 paid sick days per year.
- 1 week of paid leave for new parents.
- Flexible work options (remote, part-time, flexible hours).
- Health, dental, vision, and paramedical coverage for you and your family.
- $1,600 annual healthcare spending account.
- Employee Assistance Program for counseling and support.
- Best Doctors medical second opinions.
- Life, AD&D, and long-term disability insurance.
- Retirement savings plan with company match (up to 4% of salary).
- $75/month technology allowance for home office or phone expenses.
- Company-paid cell phone plan.
