Job Closed
This listing is no longer active.
Cincinnatus is an enterprise staffing company that partners with leading technology companies to source and employ highly skilled professionals for full-time and long-term contingent roles. Cincinnatus serves as the employer of record for these engagements, providing W-2 employment, payroll, benefits, and compliance, while placing employees directly within client teams to work on high-impact initiatives.

Roles hired through Cincinnatus are not project-based or freelance engagements. They are structured, role-based positions that typically involve full-time or fixed-term commitments, close collaboration with a client's internal teams, and integration into standard enterprise workflows.

Cincinnatus is a legal entity separate from Mercor. While opportunities may be discovered through Mercor's platform, employment, onboarding, payroll, and benefits for these roles are administered by Cincinnatus.

Equal Employment Opportunity
Cincinnatus is proud to be an Equal Employment Opportunity employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, reproductive health decisions, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, genetic information, political views or activity, or any other legally protected characteristic. Cincinnatus is committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans throughout the job application process.
AI Model Evaluation Specialist
Location: Worldwide
Posted: 76 days ago
Salary: $25 - $35 / hour
Seniority: Mid Level
Job Description
AI Model Evaluation Specialist
Mercor
Role Description
- Write realistic prompts reflecting professional and consumer domain-specific guidance.
- Evaluate AI-generated responses for factual accuracy and practical usefulness.
- Identify fabricated claims and misleading reasoning in model outputs.
- Score and rank model responses using structured rubrics.
- Provide written justifications with specific evidence for evaluations.

Qualifications
- Professional experience applying domain expertise in a practitioner or advisory capacity.
- Familiarity with industry-specific standards, regulations, or clinical guidelines.
- Strong written communication and critical reasoning skills.

Company Description
Mercor connects elite creative and technical talent with leading AI research labs. The company is headquartered in San Francisco, and its investors include Benchmark, General Catalyst, Peter Thiel, Adam D'Angelo, Larry Summers, and Jack Dorsey.
More Artificial Intelligence Jobs
• Create AI content (images, videos, characters, etc.) for social media, advertising campaigns, and internal R&D tasks.
• Work with various AI models, select the best tools for each task, and integrate them into production.
• Create and optimize prompts.
• Analyze references and styles, adapting the output to the required tone of voice.
• Collaborate with designers, marketers, R&D, and the content team.
Contract Operations, AI Compliance Analyst
Manning Global AG
Keeping you connected with new worldwide roles in IT, AI, Telecoms & Engineering!
• Download and manage contracts, agreements, and other legal documents in Dutch and German from authorized sources.
• Review, label, and summarize non-English legal documents, prepare an inventory, and create a repository of key clauses and obligations.
• Ensure that information extracted with AI-enabled tools precisely captures legal intent and language-specific nuances while preserving the integrity and accuracy of the original legal meaning.
• Support ongoing process improvements by identifying gaps, inconsistencies, or automation opportunities in contract workflows.
• Keep trackers up to date by regularly recording document status, key deadlines, and completion progress.
• Conduct secondary quality assurance reviews to confirm thoroughness, adherence to standard operating procedures, and uniformity.
AI Safety Evaluator – Malay
Welocalize
Reach, Grow, and Engage Global Audiences with Multilingual Content
• Evaluate AI-generated responses using a structured safety rubric.
• Complete two independent evaluations per item.
• Provide concise, well-structured rationales in English.
• Participate in calibration sessions.
• Support arbitration when evaluation discrepancies occur.
• Maintain quality and throughput targets during the evaluation window.
• Define and execute the Agentic AI & GenAI research and development strategy for HIS products.
• Lead cross-functional AI science teams to deliver NLU-powered and autonomous AI solutions that enhance product capabilities.
• Establish partnerships with academic, research, and healthcare organizations to drive innovation.
• Oversee the development of agent-based AI systems.
• Advance state-of-the-art NLU and LLM technologies for HIS-specific language models, clinical text understanding, and workflow automation.
• Ensure models meet the highest standards for accuracy, fairness, explainability, and compliance with healthcare regulations.
• Build and mentor a team of AI scientists, engineers, and product collaborators.
• Establish and manage AI model evaluation, validation, and monitoring frameworks.
• Work closely with product management and engineering to bring innovations from research into production at scale.