Job Closed

This listing is no longer active.

Mindrift

Apply → Pass qualification(s) → Join a project → Complete tasks → Get paid.

Project time expectations: Tasks are estimated to require around 10–20 hours per week during active phases, based on project requirements. This is an estimate, not a guaranteed workload, and applies only while the project is active.

Note: Rates vary based on expertise, skills assessment, location, project needs, and other factors. Higher rates may be offered to highly specialized experts. Lower rates may apply during onboarding or non-core project phases. Payment details are shared per project.

Evaluation Scenario Writer - AI Agent Testing Specialist

Location

New York

Posted

108 days ago

Salary

0

Seniority

Mid Level

English

Job Description


Please submit your CV in English and indicate your level of English proficiency.

Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. Participation is project-based, not permanent employment.

What this opportunity involves

You’ll create challenging coding test cases that push AI coding systems to their limits:

- Review and refine realistic coding tasks based on provided production codebases with realistic scope, requirements, and information sources
- Write comprehensive functional tests that validate actual end-to-end behavior and edge cases, not just superficial checks
- Craft “fair but hard” challenges where the AI has all the context it needs but has to work for it (information scattered across files and external sources, complex reasoning required)
- Analyze AI failures to understand what the model struggles with vs. what it masters
- Iterate based on feedback from expert QA reviewers who score your work on 7 quality criteria

What we look for

This opportunity is a good fit for experienced developers, software engineers, and/or test automation specialists open to part-time, non-permanent projects. Ideally, contributors will have:

- Degree in Computer Science, Software Engineering, or related fields
- 5+ years in software development, primarily Python (pytest, async/await, subprocess, file operations)
- Background in full-stack development, with an equal focus on building React-based interfaces and robust back-end systems
- Experience writing tests (functional and integration), not just running them
- Experience with Docker containers (running evaluations locally in containers)
- CI/CD understanding (GitHub Actions as a user: triggers, labels, reading results)
- English proficiency: B2

How it works

Apply → Pass qualification(s) → Join a project → Complete tasks → Get paid

Effort estimate

Tasks for this project are estimated to take 20 hours to complete, depending on complexity. This is an estimate and not a schedule requirement; you choose when and how to work. Tasks must be submitted by the deadline and meet the listed acceptance criteria to be accepted.

Payment

- Paid contributions, with rates up to $80/hour*
- Fixed project rate or individual rates, depending on the project
- Some projects include incentive payments

*Note: Rates vary based on expertise, skills assessment, location, project needs, and other factors. Higher rates may be offered to highly specialized experts. Lower rates may apply during onboarding or non-core project phases. Payment details are shared per project.
