Job Closed
This listing is no longer active.
Cincinnatus is an enterprise staffing company that partners with leading technology companies to source and employ highly skilled professionals for full-time and long-term contingent roles. Cincinnatus serves as the employer of record for these engagements, providing W-2 employment, payroll, benefits, and compliance, while placing employees directly within client teams to work on high-impact initiatives.
Roles hired through Cincinnatus are not project-based or freelance engagements. They are structured, role-based positions that typically involve full-time or fixed-term commitments, close collaboration with a client's internal teams, and integration into standard enterprise workflows.
Cincinnatus is a legal entity separate from Mercor. While opportunities may be discovered through Mercor's platform, employment, onboarding, payroll, and benefits for these roles are administered by Cincinnatus.
Equal Employment Opportunity
Cincinnatus is proud to be an Equal Employment Opportunity employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, reproductive health decisions, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, genetic information, political views or activity, or any other legally protected characteristic. Cincinnatus is committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans throughout the job application process.
Language Model Evaluator
Location
United States | Egypt | United Arab Emirates | Saudi Arabia
Posted
80 days ago
Salary
$23 / hour
Seniority
Mid Level
Job Description
Language Model Evaluator
Mercor
Role Description
Evaluate LLM-generated responses on their ability to effectively answer user queries.
- Conduct fact-checking using trusted public sources and external tools.
- Generate high-quality human evaluation data by annotating response strengths, areas for improvement, and factual inaccuracies.
- Assess reasoning quality, clarity, tone, and completeness of responses.
- Ensure model responses align with expected conversational behavior and system guidelines.
- Apply consistent annotations by following clear taxonomies, benchmarks, and detailed evaluation guidelines.
Qualifications
- Bachelor’s degree
- Native speaker or ILR 5/primary fluency (C2 on the CEFR scale) in Arabic
- Significant experience using large language models (LLMs)
- Excellent writing skills
- Strong attention to detail
- Adaptable and comfortable moving across topics, domains, and customer requirements
- Background or experience in domains requiring structured analytical thinking
- Excellent college-level mathematics skills
Requirements
- Prior experience with RLHF, model evaluation, or data annotation work (preferred)
- Experience writing or editing high-quality written content (preferred)
- Experience comparing multiple outputs and making fine-grained qualitative judgments (preferred)
- Familiarity with evaluation rubrics, benchmarks, or quality scoring systems (preferred)
Benefits
- Compensation: $23/hour
- Type: Full-time or Part-time Contract Work
- Location: Geography restricted to Egypt, Saudi Arabia, UAE, USA
Application Process
- Upload resume
- AI interview based on your resume
- Submit form
Resources & Support
- For details about the interview process and platform information, please check: Interview Process
- For any help or support, reach out to: support@mercor.com
- Our team reviews applications daily. Please complete your AI interview and application steps to be considered for this opportunity.
More AI Engineer Jobs
AI Platform Analyst
World Wide Technology Healthcare Solutions
Founded in 1990, World Wide Technology (WWT) is a global systems integrator with $13.4 billion in annual revenue that provides digital strategy, innovative technology and supply chain solutions to large public and private organizations.
Role Description
World Wide Technology, Inc. is currently seeking an AI Platform Analyst for our IT AI Center of Excellence Workforce AI. AI Platform Analysts within the IT AI Center of Excellence team support the AI platforms and Workforce AI initiatives at World Wide Technology. As an AI Platform Analyst, you play a pivotal role in supporting and optimizing the AI platforms that drive innovation at WWT. This position involves ensuring that the platforms are effectively governed, maintained, and aligned with best practices. The analyst will also be responsible for training and enabling employees to leverage AI technologies effectively.
A strong candidate demonstrates the following competencies:
- Curiosity & Learning Agility: Stays ahead of a rapidly evolving AI landscape and brings new ideas back to WWT.
- Collaboration: Builds strong relationships across IT and business teams.
- Ownership: Takes end-to-end accountability for platform health and user experience.
- Clear Communication: Translates complexity into practical guidance for all levels of the organization.
- Analytical Thinking: Uses data to identify trends, measure adoption, and drive decisions.
- DOER: Favors action over continuous discussion and is not afraid to get hands on keyboard to make ideas a reality.
Key responsibilities include:
- Platform Support:
  - Serve as a point of contact for internal AI platform issues, escalations, and service requests.
  - Monitor platform health, availability, and performance; coordinate with vendors and internal IT teams to resolve incidents.
  - Manage license provisioning, user access, and platform configurations.
  - Maintain documentation for platform operations, known issues, and runbooks.
  - Evaluate and test new platform features and updates before rollout.
- Integrations & Connectivity:
  - Partner with IT, engineering, and business teams to identify and implement integrations between AI platforms and internal tools (e.g., ITSM, collaboration tools, data systems).
  - Ensure AI platforms are connected to relevant data sources, workflows, and enterprise systems in a secure and scalable way.
  - Coordinate with vendors on API capabilities, integration roadmaps, and technical requirements.
- Governance & Compliance:
  - Develop and maintain AI platform governance frameworks, including acceptable use policies, data handling standards, and access controls.
  - Ensure platforms comply with WWT's security, privacy, and data governance requirements.
  - Partner with Legal, Compliance, and Security teams to assess risks associated with AI platform usage.
  - Track and report on platform usage, adoption metrics, and policy adherence.
  - Stay current on emerging AI regulations and industry best practices, recommending policy updates as needed.
- User Enablement & Training:
  - Design and deliver training programs, resources, and onboarding experiences for employees using AI platforms.
  - Create and maintain self-service learning content including guides, videos, FAQs, and prompt libraries.
  - Champion best practices for responsible and effective AI use across the organization.
  - Gather and act on user feedback to continuously improve the platform experience.
  - Partner with business units to identify use cases and drive meaningful adoption.
Qualifications
- 1-3+ years of experience in a technology analyst, platform operations, IT, developer, or related role.
- Experience supporting and administering enterprise SaaS platforms.
- Strong understanding of data privacy, security, and governance principles.
- Excellent communication skills with the ability to translate technical concepts for non-technical audiences.
- Demonstrated ability to manage multiple priorities in a fast-paced environment.
- Experience creating training or enablement content for end users.
- Familiarity with AI/ML platforms and tools (e.g., Microsoft Copilot, Claude, ChatGPT Enterprise, Glean, Google Gemini, or similar).
- Experience with enterprise integrations, APIs, or workflow automation.
- Knowledge of AI governance frameworks and responsible AI principles.
- Experience in change management or organizational adoption programs.
- Applicants must be authorized to work in the United States. We are unable to provide sponsorship for this position.
Requirements
- A reasonable estimate of the current base pay range for this position is $71,200 to $89,000 annually. Actual salary will be based on a variety of factors, including shift, location, experience, skill set, performance, licensure and certification, and business needs.
- The range for this position in other geographic locations may differ.
- Certain positions may also be eligible for variable incentive compensation, such as bonuses or commissions, that are not included in the base pay.
Benefits
- Health and Wellbeing: Health, Dental, and Vision Care, Onsite Health Centers, Employee Assistance Program, Wellness program.
- Financial Benefits: Competitive pay, Profit Sharing, 401k Plan with Company Matching, Life and Disability Insurance, Tuition Reimbursement.
- Paid Time Off: PTO and Sick Leave (starting at 20 days per year) & Holidays (10 per year), Parental Leave, Military Leave, Bereavement.
- Additional Perks: Nursing Mothers Benefits, Voluntary Legal, Pet Insurance, Employee Discount Program.
- Work with other engineers on a wide variety of AI engineering tasks, including prompt engineering, retrieval-augmented generation, and agentic workflow optimization, to improve our existing applied AI systems
- Identify new opportunities to apply emerging AI capabilities to different parts of the Poe product
- Take end-to-end ownership of applied AI systems, from prototyping, data pipelines, and model optimization/evaluation to reliable deployment at scale
Principal AI Application Engineer
Code for America
Government can and should work well for everyone. We're people-centered problem solvers showing that it's possible.
Code for America believes government can work for the people, by the people, in the new digital age, and that government at all levels can and should work well for all people. For more than a decade, we’ve worked to show that with the mindful use of technology, we can break down barriers, meet community needs, and find real solutions. Our employees build and transform government and community tools and services, making them so good they inspire change. We merge the best parts of technology, nonprofit, and government to help support the people who need it most. With a focus on transparency and fairness, and deep empathy for partners in government and community organizations and the people that our partners serve, we’re building a movement of motivated change agents driven by meaningful results and lasting impact.
At Code for America, you contribute to exciting work while learning and developing in a supportive and flexible environment. Our compensation and benefits are holistic and thoughtfully curated to represent our employees and our mission. Help us drive real generational change that lasts.
Code for America is looking for a talented Principal AI Application Engineer who will bridge the gap between complex policy mandates and technical execution. By building modular, model-agnostic systems, you will prove that civic tech can respond to urgent legislative shifts faster and more responsibly than traditional monolithic "black boxes."
About the role:
The mission of New Ventures within Code for America is to imagine and then build a future of radically improved public service delivery. We are looking for two Principal AI Application Engineers to join a high-impact, lean team that is creating the tools, proofs of concept, and durable products that will define how the next generation of government services is delivered. In this role, you will be architecting the connective tissue that allows government agencies to deploy responsible, portable, and effective AI.
This role will report to the VP of the New Ventures team and is expected to travel no more than 10% of the time. Code for America is based in California and can employ those who reside full-time within the United States. This is a remote position.
In this position you will:
- Rapid Prototyping and Experimental Development
  - Build and ship high-impact experiments (e.g., digital lockers or AI-augmented procurement tools) to expand the "impact possibilities frontier."
  - Translate vague policy objectives into robust, working systems.
  - Identify when to use LLMs, agentic orchestration, or RAG patterns versus simple rules engines or traditional ML models.
  - Collaborate with policy and domain experts to co-design civic-sector benchmarks, translating complex regulatory requirements into automated evaluation pipelines that rigorously measure model performance.
  - Identify and define new opportunity spaces by translating emerging policy, technology, and user needs into actionable technical bets.
- Infrastructure and AI System Architecture
  - Architect foundational tools, including declarative task specifications and agentic data layers.
  - Design infrastructure that respects public sector constraints, focusing on portability and explainability.
  - Build data layers that interoperate with legacy systems (COBOL, SQL, etc.) to deliver modern value without multi-year migrations.
- Thought Leadership and Community of Practice
  - Establish Code for America as a leader in responsible AI through external thought leadership, including publications, talks, and open-source contributions.
  - Share demos and earned insights internally to help the organization iterate toward better standards and internal use cases of responsible AI.
  - Document architectural decisions, successes, and failures to create a blueprint for responsible AI in government.
  - Drive alignment across engineering, product, policy, and program teams to ensure solutions are technically sound, policy-compliant, and operationally viable.
  - Partner with and mentor fellow engineers through hands-on code reviews and technical guidance, ensuring the team stays grounded in best practices for responsible AI.
- Active Stewardship and System Integrity
  - Maintain system health through rigorous, hands-on code reviews and the development of shared utilities.
  - Ensure craftsmanship and system explainability for the vulnerable populations we serve.
About you:
- Senior Technical Ownership:
  - 7+ years of experience in high-ownership environments (former technical founders encouraged); ability to take a vague objective to a finished system.
- AI & System-Level Thinking:
  - Hands-on experience building with LLMs, agentic orchestration, and RAG patterns, with the pragmatism to know when not to use them.
- Pragmatic Architecture:
  - Ability to think in "primitives" and "capabilities," preferring modular, reusable frameworks over bespoke scripts.
- Algorithmic Accountability:
  - Passion for bias detection, harm mitigations, and building systems that are explainable to the people they serve.
- Modern Infrastructure Fluency:
  - Mastery of Git, Linux, CI/CD, Infrastructure as Code (Terraform), and container-based workflows.
- Intellectual Humility:
  - A critical eye toward the limitations of AI, especially "black box" logic in high-stakes public services.
- Demonstrated ability to influence technical direction and drive alignment across teams without formal authority.
It’s a bonus if you have:
- Written Communication: Ability to distill complex architecture into compelling prose for policy-makers or the public.
- Regulated Industry Experience: Prior work in Civic Tech, FinTech, or HealthTech where auditability is a core requirement.
What you’ll get
- Salary: Code for America’s salary bands are transparent as a part of our commitment to transparency and fairness. As part of our hiring practices, we aim to target the midpoint of the 2nd quartile of the range for all new hires. Offer targets vary based on market geographic location. The offer targets for this role range from $143,884 to $176,138, annually.
Benefits and perks:
- Values:
  - Leadership and teammates who share a strong work ethic and values, and who respect and care for one another
  - A collaborative, cross-functional, hardworking, and joyful environment
- Employee Enablement Support:
  - Laptop provided
  - A one-time $700 payment for remote environment setup; $200 stipend (in first paycheck) and up to $500 reimbursement, in accordance with our equipment policy
  - Cell phone and/or internet reimbursement of $50 per month
- Professional Development:
  - $500 annual (per calendar year) stipend towards professional development; prorated at time of hire
  - Up to $500 of professional development funds can be rolled over each year, up to a maximum of $1000
  - Training / guidance for staff required to utilize AI as part of their role, plus opportunities for employees to gain AI-related skills to support job and career growth
- Retirement & 401k Plans:
  - Employees receive a 100% employer match on the first 3% of contributions.
  - Employees with 3+ years of service receive an additional 50% match on contributions between 3% and 5%, for a maximum employer contribution of 5%
- Medical:
  - At least one no cost health insurance option for full-time employees for employee-only coverage
  - A minimum of 80% of the cost of dependent coverage
- Remote Work:
  - Code for America employees may work remotely across the US
  - Code for America employees' main residence must be within the US
  - Full-time employees work 40 hours per week, Monday - Friday
  - Collaborative working hours: we aim to hold all internal meetings between 10 AM - 3 PM PT. We expect all Code for America staff to be available during these set working hours
- Time Off:
  - Open personal time off (subject to manager approval), a minimum of 14 paid holidays, and an org-wide closure from Christmas Day through New Year's Day
  - Paid sick time; up to 96 hours annually
  - 17 weeks of paid parental and family leave
  - 3 weeks of paid sabbatical after 5 years of service
Equal Employment Opportunity:
Code for America is an equal opportunity employer. Applicants will not be discriminated against because of race, color, creed, sex, sexual orientation, gender identity or expression, age, religion, national origin, citizenship status, disability, ancestry, marital status, veteran status, medical condition or any protected category prohibited by local, state or federal laws.
Code for America Workers United:
This position is not covered by a Collective Bargaining Agreement between Code for America and Code for America Workers United, affiliated with OPEIU, Local 1010. The agreement was ratified on January 13, 2026, and is currently in effect.
We Are:
At Data Society Group, we provide the highest quality, leading-edge, industry-tailored data and AI training and solutions for Fortune 1,000 companies and federal, state, and local governmental organizations. We partner with our clients to educate, equip, and empower their workforce with the skills they need to achieve their goals and expand their impact. Data Society Group publishes CDO Magazine, the preeminent global publication for Data Officers. Our executive boards include industry leaders, engineers, and data scientists from across the world. We are empowering the workforce of the future, from data literacy for all employees to support for data engineers and data scientists to train up on the most complex AI solutions and Machine Learning skills.
About the Role:
We are seeking an experienced AI Subject Matter Expert (SME) / Instructor to deliver engaging, hands-on virtual instruction in modern full-stack development and AI-assisted coding workflows. This role is ideal for practitioners who are actively building applications and are equally passionate about designing and developing high-quality technical training content. You will help shape both what is taught and how it is taught, creating impactful learning experiences that prepare professionals to work effectively with modern development stacks and AI-powered tools.
As an independent contractor, you’ll enjoy the flexibility to teach subjects you’re passionate about while being supported by a collaborative and communicative team. Your role is not just to instruct, but to inspire, creating a positive and supportive learning environment where learners leave empowered and ready to act.
Key Responsibilities:
- Design, develop, and continuously improve course materials, including slide decks, labs, exercises, and assessments
- Translate real-world development workflows into structured, learner-friendly content
- Deliver high-quality, live instruction on full-stack JavaScript/TypeScript development and AI-assisted coding practices
- Guide learners through hands-on exercises building real-world applications
- Demonstrate best practices for integrating AI tools into the software development lifecycle
- Collaborate with internal teams to refine curriculum based on learner feedback and evolving industry trends
- Mentor learners and provide technical support throughout the training experience
What You Bring:
We are looking for a practitioner-educator who is as skilled at writing code as they are at facilitating impactful learning sessions. The ideal candidate possesses the following attributes:
- The ability to convert complex, real-world technical workflows into educational content that is both structured and scalable.
- A proactive commitment to staying updated on the latest advancements in AI development tools and industry methodologies.
- A teaching style characterized by clarity, high energy, and the flexibility to thrive in live, interactive instructional settings.
Minimum Qualifications:
- Production-level experience building full-stack applications using:
  - JavaScript and TypeScript
  - Backend frameworks such as Node.js/Express or Next.js
  - Frontend frameworks such as React
- Proven experience developing technical training content or curricula, including hands-on labs and project-based learning
- Ability to break down complex technical concepts into clear, structured, and engaging instructional materials
- Active, real-world use of AI coding assistants, such as GitHub Copilot or Cursor, within IDEs like VS Code
- Experience incorporating AI tools into day-to-day development workflows (e.g., code generation, debugging, refactoring)
- Working knowledge of prompt engineering techniques, including:
  - Chain-of-thought prompting
  - Few-shot prompting
  - Contract-first prompting
- Hands-on experience configuring and working with MCP server connections in VS Code
- Strong communication and facilitation skills
Preferred Qualifications:
- Familiarity with Figma, including Dev Mode and component inspection
- Experience delivering training to professional, enterprise, or government audiences
- Background in instructional design, adult learning principles, or curriculum development frameworks


