Applied Research Scientist, LLM Evaluation – Post-Training at Innodata

View details: Evaluation Scenario Writer - AI Agent Testing Specialist

Please submit your CV in English and indicate your level of English proficiency. Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. Participation isproject-based, not permanent employment. What this opportunity involves You’ll create challenging coding test cases that push AI coding systems to their limits: - Review and refine realistic coding tasks based on provided production codebases with realistic scope, requirements and information sources - Write comprehensive functional tests that validate actual end-to-end behavior and edge-cases, not just superficial checks - Craft “fair but hard” challenges where the AI has all the context it needs, but has to work for it (information scattered across files and external sources, complex reasoning required) - Analyze AI failures to understand what the model struggles with vs. what it masters - Iterate based on feedback from expert QA reviewers who score your work on 7 quality criteria What we look for This opportunity is a good fit for experienced developers, software engineers, and/or test automation specialists open to part-time, non-permanent projects. Ideally, contributors will have: - Degree in Computer Science, Software Engineering or related fields - 5+ years in software development, primarily Python (pytest, async/await, subprocess, file operations) - Background in Full-Stack development, with an equal focus on building React-based interfaces and robust Back-end systems - Experience writing tests (functional, integration – not just running them) - Docker containers (running evaluations locally in containers) - CI/CD understanding (GitHub Actions as a user: triggers, labels, reading results) - English proficiency - B2 How it works Apply → Pass qualification(s) → Join a project → Complete tasks → Get paid Effort estimate Tasks for this project are estimated to take 20 hours to complete, depending on complexity. This is an estimate and not a schedule requirement; you choose when and how to work. Tasks must be submitted by the deadline and meet the listed acceptance criteria to be accepted. Payment - Paid contributions, with rates up to $80/hour* - Fixed project rate or individual rates, depending on the project - Some projects include incentive payments *Note: Rates vary based on expertise, skills assessment, location, project needs, and other factors. Higher rates may be offered to highly specialized experts. Lower rates may apply during onboarding or non-core project phases. Payment details are shared per project.

Iowa

Job Closed

Evaluation Scenario Writer - AI Agent Testing Specialist

Mindrift

Apply → Pass qualification(s) → Join a project → Complete tasks → Get paid. Project time expectations: Tasks are estimated to require around 10–20 hours per week during active phases, based on project requirements; This is an estimate, not a guaranteed workload, and applies only while the project is active. Note: Rates vary based on expertise, skills assessment, location, project needs, and other factors. Higher rates may be offered to highly specialized experts. Lower rates may apply during onboarding or non-core project phases. Payment details are shared per project.

Research Scientist105 days ago

View details: Evaluation Scenario Writer - AI Agent Testing Specialist

Please submit your CV in English and indicate your level of English proficiency. Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. Participation isproject-based, not permanent employment. What this opportunity involves You’ll create challenging coding test cases that push AI coding systems to their limits: - Review and refine realistic coding tasks based on provided production codebases with realistic scope, requirements and information sources - Write comprehensive functional tests that validate actual end-to-end behavior and edge-cases, not just superficial checks - Craft “fair but hard” challenges where the AI has all the context it needs, but has to work for it (information scattered across files and external sources, complex reasoning required) - Analyze AI failures to understand what the model struggles with vs. what it masters - Iterate based on feedback from expert QA reviewers who score your work on 7 quality criteria What we look for This opportunity is a good fit for experienced developers, software engineers, and/or test automation specialists open to part-time, non-permanent projects. Ideally, contributors will have: - Degree in Computer Science, Software Engineering or related fields - 5+ years in software development, primarily Python (pytest, async/await, subprocess, file operations) - Background in Full-Stack development, with an equal focus on building React-based interfaces and robust Back-end systems - Experience writing tests (functional, integration – not just running them) - Docker containers (running evaluations locally in containers) - CI/CD understanding (GitHub Actions as a user: triggers, labels, reading results) - English proficiency - B2 How it works Apply → Pass qualification(s) → Join a project → Complete tasks → Get paid Effort estimate Tasks for this project are estimated to take 20 hours to complete, depending on complexity. This is an estimate and not a schedule requirement; you choose when and how to work. Tasks must be submitted by the deadline and meet the listed acceptance criteria to be accepted. Payment - Paid contributions, with rates up to $80/hour* - Fixed project rate or individual rates, depending on the project - Some projects include incentive payments *Note: Rates vary based on expertise, skills assessment, location, project needs, and other factors. Higher rates may be offered to highly specialized experts. Lower rates may apply during onboarding or non-core project phases. Payment details are shared per project.

Florida

Job Closed

Evaluation Scenario Writer - AI Agent Testing Specialist

Mindrift

Apply → Pass qualification(s) → Join a project → Complete tasks → Get paid. Project time expectations: Tasks are estimated to require around 10–20 hours per week during active phases, based on project requirements; This is an estimate, not a guaranteed workload, and applies only while the project is active. Note: Rates vary based on expertise, skills assessment, location, project needs, and other factors. Higher rates may be offered to highly specialized experts. Lower rates may apply during onboarding or non-core project phases. Payment details are shared per project.

Research Scientist105 days ago

View details: Evaluation Scenario Writer - AI Agent Testing Specialist

Please submit your CV in English and indicate your level of English proficiency. Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. Participation isproject-based, not permanent employment. What this opportunity involves You’ll create challenging coding test cases that push AI coding systems to their limits: - Review and refine realistic coding tasks based on provided production codebases with realistic scope, requirements and information sources - Write comprehensive functional tests that validate actual end-to-end behavior and edge-cases, not just superficial checks - Craft “fair but hard” challenges where the AI has all the context it needs, but has to work for it (information scattered across files and external sources, complex reasoning required) - Analyze AI failures to understand what the model struggles with vs. what it masters - Iterate based on feedback from expert QA reviewers who score your work on 7 quality criteria What we look for This opportunity is a good fit for experienced developers, software engineers, and/or test automation specialists open to part-time, non-permanent projects. Ideally, contributors will have: - Degree in Computer Science, Software Engineering or related fields - 5+ years in software development, primarily Python (pytest, async/await, subprocess, file operations) - Background in Full-Stack development, with an equal focus on building React-based interfaces and robust Back-end systems - Experience writing tests (functional, integration – not just running them) - Docker containers (running evaluations locally in containers) - CI/CD understanding (GitHub Actions as a user: triggers, labels, reading results) - English proficiency - B2 How it works Apply → Pass qualification(s) → Join a project → Complete tasks → Get paid Effort estimate Tasks for this project are estimated to take 20 hours to complete, depending on complexity. This is an estimate and not a schedule requirement; you choose when and how to work. Tasks must be submitted by the deadline and meet the listed acceptance criteria to be accepted. Payment - Paid contributions, with rates up to $80/hour* - Fixed project rate or individual rates, depending on the project - Some projects include incentive payments *Note: Rates vary based on expertise, skills assessment, location, project needs, and other factors. Higher rates may be offered to highly specialized experts. Lower rates may apply during onboarding or non-core project phases. Payment details are shared per project.

Colorado

Job Closed

Evaluation Scenario Writer - AI Agent Testing Specialist

Mindrift

Apply → Pass qualification(s) → Join a project → Complete tasks → Get paid. Project time expectations: Tasks are estimated to require around 10–20 hours per week during active phases, based on project requirements; This is an estimate, not a guaranteed workload, and applies only while the project is active. Note: Rates vary based on expertise, skills assessment, location, project needs, and other factors. Higher rates may be offered to highly specialized experts. Lower rates may apply during onboarding or non-core project phases. Payment details are shared per project.

Research Scientist105 days ago

View details: Evaluation Scenario Writer - AI Agent Testing Specialist

Please submit your CV in English and indicate your level of English proficiency. Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. Participation isproject-based, not permanent employment. What this opportunity involves You’ll create challenging coding test cases that push AI coding systems to their limits: - Review and refine realistic coding tasks based on provided production codebases with realistic scope, requirements and information sources - Write comprehensive functional tests that validate actual end-to-end behavior and edge-cases, not just superficial checks - Craft “fair but hard” challenges where the AI has all the context it needs, but has to work for it (information scattered across files and external sources, complex reasoning required) - Analyze AI failures to understand what the model struggles with vs. what it masters - Iterate based on feedback from expert QA reviewers who score your work on 7 quality criteria What we look for This opportunity is a good fit for experienced developers, software engineers, and/or test automation specialists open to part-time, non-permanent projects. Ideally, contributors will have: - Degree in Computer Science, Software Engineering or related fields - 5+ years in software development, primarily Python (pytest, async/await, subprocess, file operations) - Background in Full-Stack development, with an equal focus on building React-based interfaces and robust Back-end systems - Experience writing tests (functional, integration – not just running them) - Docker containers (running evaluations locally in containers) - CI/CD understanding (GitHub Actions as a user: triggers, labels, reading results) - English proficiency - B2 How it works Apply → Pass qualification(s) → Join a project → Complete tasks → Get paid Effort estimate Tasks for this project are estimated to take 20 hours to complete, depending on complexity. This is an estimate and not a schedule requirement; you choose when and how to work. Tasks must be submitted by the deadline and meet the listed acceptance criteria to be accepted. Payment - Paid contributions, with rates up to $80/hour* - Fixed project rate or individual rates, depending on the project - Some projects include incentive payments *Note: Rates vary based on expertise, skills assessment, location, project needs, and other factors. Higher rates may be offered to highly specialized experts. Lower rates may apply during onboarding or non-core project phases. Payment details are shared per project.

Alabama