Job Closed

This listing is no longer active.

24-MAG

This opportunity is available through a leading AI-driven work platform.

PhD Rater

Location

United States

Posted

101 days ago

Salary

$50 - $100 / hour

Job Description

This description is a summary of our understanding of the job description. Click the 'Apply' button to find out more.

Role Description

This role involves supporting a frontier-model evaluation initiative focused on advanced STEM reasoning and agentic workflows.

  • Design challenging real-world STEM problems for model evaluation
  • Implement benchmark tasks inside agentic development environments using Python
  • Create reproducible tasks with executable tests and clearly defined specifications
  • Analyse model and agent outputs to identify reasoning gaps and failure modes
  • Evaluate how AI systems perform on complex data science, machine learning, finance, and coding tasks
  • Document benchmark tasks, environments, and evaluation outcomes

Job Requirements

  • Active or recently completed PhD from a top-tier U.S.-based university
  • Deep expertise in data science, machine learning, finance, and/or Python-based programming
  • Strong research background in advanced STEM domains
  • Experience designing complex technical problems or research benchmarks
  • Ability to analyse model reasoning traces and diagnose deeper system behaviour issues
  • Strong analytical and research documentation skills
  • PhD in Computer Science, Data Science, Machine Learning, Finance, or related STEM fields

Nice to Have

  • Experience working with agentic frameworks or LLM tooling ecosystems
  • Familiarity with frameworks such as LangChain, AutoGen, MetaGPT, CrewAI, LlamaIndex, BabyAGI, or related systems
  • Contributions to open-source software or research projects
  • Experience analysing complex model behaviour or agent workflows

Benefits

  • Independent contractor role
  • Fully remote with flexible scheduling
  • Part-time research engagement with expected availability of 30+ hours per week
  • Competitive rates between $50–$100/hour depending on expertise
  • Weekly payments via Stripe or Wise
  • Projects may extend or adjust depending on scope and performance
  • Work does not involve access to confidential or proprietary information from employers or institutions

More Research Engineer Jobs

Senior Threat Research Engineer

Material Security

Material protects accounts even after they’re compromised or harmful messages get through.

Research Engineer · 102 days ago
Other · Remote · Team 11-50 · Since 2017 · No H-1B Sponsorship

As a Senior Threat Research Engineer at Material Security, you will contribute directly to the product by improving its ability to detect email-based threats. Your mission is to use your analytical skills to identify and track threats and adversaries that have slipped past other email security systems, and to help mature our internal detection and response program. You will also improve how we create and maintain our detection system. Your day-to-day work will mix exploration, analysis, triage, and building, directly alongside world-class engineers and security experts.

Responsibilities

  • Improve the processes, tooling, and methodologies used to detect malicious or otherwise dangerous emails.
  • Author detection rules that allow customers to detect email-based threats where other tools have failed.
  • Research attacker campaigns to find ways to fingerprint attacker activity, infrastructure, and tactics.
  • Identify signals and features that are useful for training message classification systems.
  • Ensure a high standard of privacy for our customers’ data.
  • Work with our Security Architects and customers to drive down risk by improving customer email security posture and using their data to help them make better-informed decisions about risk.

What We're Looking For

  • Technical Ability: Solid data analysis skills, including writing SQL queries; experience writing detections and responding to security incidents; and the ability to parse large datasets.
  • Security Domain Expertise: A successful candidate should be intimately familiar with modern adversary behavior and techniques and understand how to use data sources to identify them.
  • Collaboration & Communication: We take pride in being a transparent security team that works hard to find ways to say "yes" and enables Material to grow quickly and securely. As a Security Engineer, you'll work closely with software engineers, data scientists, and product managers. This requires a collaborative spirit and great communication skills.
  • Ownership: We love security engineers who care deeply about the impact of their work and find satisfaction in a job well done. The Security Team at Material is passionate about building things in a first-class manner and avoiding shortcuts that accrue technical debt and increase toil across the team. We expect candidates to know how to build tooling that is robust and resilient.
  • Breadth & Growth: Being a great security engineer means continually learning new and more advanced techniques in your field, but also gaining a breadth of skills to bridge the gaps in getting things done. Whether that means improving your software development abilities, becoming an expert in a specific security subdomain, or learning product management or customer care, you have the drive to keep learning.

Material Security is a remote-first workplace with an office in San Francisco, California. By clicking "Apply for this Job", you acknowledge that you have read the California Candidate Privacy Notice Regarding Use of Personal Information and agree to its terms.

Compensation at Material Security is determined by a range of factors, including but not limited to the individual’s particular combination of knowledge, skills, competencies, and experience. The projected compensation range for this position is $190,000-$235,000.

Equal Opportunity Employer Statement

Material Security is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We do not discriminate on the basis of race, color, religion, creed, national origin, ancestry, sex, gender, gender identity or expression, sexual orientation, age, marital status, veteran status, disability, genetic information, or any other legally protected status. All employment decisions are based on qualifications, merit, and business needs.

United States
$190K - $235K / year

This description is a summary of our understanding of the job description. Click the 'Apply' button to find out more.

Role Description

The Director, Evaluation + Research is responsible for leading the organization’s research, data collection, and evaluation efforts that support our mission of uplifting Native arts, cultures, and communities. This role will play a critical part in assessing program effectiveness, tracking impact, and ensuring that data-driven insights inform strategic decision-making.

  • Bring experience in Indigenous-centered evaluation methodologies and a commitment to ethical and culturally responsive research practices.
  • Collaborate closely with program teams, development staff, and external stakeholders to translate complex data into meaningful insights.

Qualifications

  • Bachelor’s degree (Master’s preferred) in Statistical Analysis, Management Information Systems, Evaluation, Arts Administration, Nonprofit Management, or a related field.
  • 5+ years of experience in program evaluation, impact measurement, or research, preferably in an arts, cultural, or community engagement nonprofit and/or Indigenous organization setting.
  • Broad knowledge of tribal communities and cultures across North America.
  • An understanding and awareness of the Native arts and culture field.
  • Strong analytical skills, including proficiency in statistical analysis and data visualization tools.
  • Proficiency in Google Suite and Adobe Suite; Excel power user; experience with Outcome Tracker, Access, Salesforce, ESRI, Tableau, Submittable, and Kindful. Experience with NetSuite is a plus.
  • Ability to work collaboratively across departments and with external partners.
  • Strong writing and communication skills, with experience preparing reports for diverse audiences.
  • Commitment to ethical research practices and Indigenous data sovereignty principles.

Requirements

  • Develop and oversee the organization’s evaluation and research strategies, ensuring alignment with mission-driven goals.
  • Use Indigenous research methodologies and community-based participatory research to inform program design and strategic planning.
  • Gather qualitative and quantitative data on program effectiveness, community impact, and stakeholder feedback.
  • Design and implement evaluation frameworks to measure program success.
  • Provide reports with findings and recommendations for program improvement.
  • Partner with Advancement & Communications to identify stories and data/analysis for use in communications and fundraising content.
  • Generate impact reports for funders, partners, and internal stakeholders.
  • Facilitate knowledge-sharing sessions to improve program effectiveness.
  • Maintain the Evaluation Operations manual on an ongoing basis and update it annually.
  • Engage with community members to ensure culturally responsive evaluation practices.

Benefits

  • Health, dental, and vision coverage for full-time employees, effective the 1st of the month after hire.
  • Sick leave: 48 hours, available from the date of hire and usable as earned.
  • 11 paid holidays with 1 personal culturally significant day.
  • Annual leave: 70 hours for the first year of employment.
  • 401(k) with company match.

United States
Job Closed

Staff AI Innovation Engineer

Airbnb

Airbnb is a community based on connection and belonging.

Research Engineer · 104 days ago
Other · Remote · Team 5,001-10,000 · Since 2007 · H-1B Sponsor

  • Architect the next generation of Airbnb’s internal operating system
  • Design and build AI-powered solutions for Employee Experience pain points
  • Translate HR and workforce needs into scalable products
  • Rapidly build prototypes and working solutions
  • Partner closely with EX, BizTech, Legal, Privacy, and InfoSec

United States
$180K - $225K / year
Job Closed

Research Engineer – Evaluations, Applied AI

LILT

The complete AI solution for enterprise translation and content creation.

Research Engineer · 104 days ago
Contract · Remote · Team 201-500 · H-1B Sponsor

  • Eval Architecture & Benchmarking: Design and implement automated and human-in-the-loop evaluation frameworks to measure model performance across multiple modalities (text, code, image, etc.).
  • Calibration & Peer Review: Act as the Gold Standard reviewer for other engineers. You will calibrate their data generation and evaluation contributions, providing technical feedback to ensure scientific consistency and high-fidelity output.
  • Frontier Sample Generation: Write and refine complex prompts and golden response pairs for frontier-model training, focusing specifically on edge cases in reasoning and multilingual contexts.
  • Quality Control (End-to-End): Develop the logic for multi-modal QC checks, ensuring that high-volume data samples are correct across diverse domains and languages.
  • Technical Mentorship: Bring new knowledge and best practices on model evaluations to our established delivery and forward-deployed engineering teams.

Argentina