Anyone AI

We invest in people from Latam to bridge the talent gap in AI.

Human Data Evals Lead

Data Scientist | Full Time | Remote | Lead | Team 11-50 | Since 2022 | H1B No Sponsor

Location

Northern America | Latin America (LATAM)

Posted

3 days ago

Salary

Not listed

Seniority

Lead

Job Description

Role Description

You will own Anyone AI's data initiatives and proposals to AI labs, from the initial proposal or the response to a lab request through pilot delivery. You own how we build proposals and develop sample packages and benchmarks. These are frontier-grade packages across reasoning, coding, agents and tool use, multimodal, and other domains. Each is produced in collaboration with subject-matter experts, with expert-verified ground truth, multi-model headroom results, and QC that survives buyer-side scrutiny. You design the samples that demonstrate our quality and convert pilots into production engagements. On a small team, this role is the operational center of the Human Data Division.

Responsibilities

- Proposals & requests:
  - Study public benchmarks and eval targets, and turn them into proposals and sample packages that demonstrate capability and win the work.
  - Respond to lab data requests and pilots.
- Sample & benchmark development:
  - Design and build the sample packages, working with subject-matter experts.
  - Ensure every package meets the bar of our current sample set:
    - Expert-verified, exact-match-checkable ground truth and gold reasoning trajectories.
    - Multi-model evaluation showing real headroom, with proof that the task discriminates between models, not just that it is hard.
    - Rigorous QC structure: calibration layers, severity-weighted rubrics, deterministic verifiers, evidence maps, etc.
- Subject-matter experts:
  - Recruit, brief, calibrate, and review a pool of experts across coding, agentic/tool use, and STEM/reasoning.
  - Raise their output to our standard and keep it there; be the arbiter of what "correct" and "frontier-difficulty" mean.
- Lab relationships:
  - Be a direct point of contact for lab partners on Slack and calls, with support from the CEO and the wider team.
  - Keep senior lab contacts informed, surface what they actually need, and pull in the CEO and subject-matter experts when the conversation calls for it.
- Pilot delivery:
  - Own pilots end to end: scoping, SOW, staffing, production, QC, and delivery.
  - Ensure nothing ships before it is lab-ready, and that nothing comes back rejected as "not frontier-level" without us already knowing why.

Qualifications

- 5+ years in technical delivery, quality, or program management, with recent experience in AI/ML data, model evaluation, or benchmarking.
- Hands-on experience delivering data or evaluation work to AI labs or enterprise ML teams, from scoping through delivery.
- Working fluency in how frontier models are evaluated: benchmarks, rubrics, pass rates, headroom, and what makes a task discriminate between models.
- Proven people/vendor leadership: you have recruited, calibrated, and held a team or expert pool to a quality standard.
- Fluent English. Spanish is a nice-to-have.

Experience

- Originated data or benchmark proposals for AI labs, translated eval targets into sample tasks that demonstrate capability, and owned the engagement through delivery.
- Deep evaluation and quality expertise in LLM benchmarking, with particular strength in code-model evaluation.
- Built QC processes and artifact standards that met enterprise or lab requirements, and set a quality bar that a team of experts was held to.
- Thrives in ambiguous, fast-moving environments where the rules are still being written, and delivers under pressure.

More Data Scientist Jobs

Senior Principal Data Scientist, AEC

Autodesk

How the world gets designed and made. #MakeAnything

Data Scientist | 3 days ago
Full Time | Remote | Team 10,001+ | Since 1982 | H1B No Sponsor

• Design and implement predictive models to analyze and anticipate user behavior
• Define and establish data instrumentation, telemetry, and observability standards
• Develop frameworks and prototypes for analyzing and optimizing user experiences driven by AI agents
• Collaborate with product and engineering teams
• Create analytical models and reporting frameworks
• Guide experimentation strategies, including A/B testing
• Provide technical leadership and recommendations on data architecture and system scalability
• Translate ambiguous product questions into structured analytical programs.

California | New York | Washington
$159K - $285.6K / year

Principal Clinical Data Manager

Worldwide Clinical Trials

As a leading full-service global CRO, we work to create solutions that advance new treatments from discovery to reality.

Data Scientist | 3 days ago
Full Time | Remote | Team 1,001-5,000 | H1B Sponsor

• Oversee, lead, manage, and provide technical expertise within the assigned complex projects/programs
• Provide subject-matter expert support, solution management, and departmental support for project initiatives and training
• Provide fully independent and autonomous leadership of data management services across multiple complex global projects/programs
• Liaise with DM Management at regular intervals
• Collaborate with internal WCT departments working on the same project
• Provide mentorship to other members of the DM department
• Participate in sponsor audits, regulatory authority inspections, and other third-party meetings

North Carolina
$99K - $196K / year

Staff Data Scientist – Payments

Circle

Circle helps businesses and developers harness the power of stablecoins for payments and internet commerce worldwide.

Data Scientist | 3 days ago
Full Time | Remote | Team 501-1,000 | Since 2013 | H1B Sponsor

• Partner with the Payments team to design and develop foundational datasets, metrics, and analytical frameworks that support product strategy and business decision-making.
• Analyze customer behavior, product adoption, and transaction activity to generate actionable insights that improve payment experiences and drive business growth.
• Research onchain payments ecosystems, blockchain transaction data, and competitive payment solutions to inform product development and market positioning.
• Lead strategic analyses that identify new opportunities for product innovation, operational efficiency, and revenue growth.
• Build scalable reporting, dashboarding, and automation solutions using SQL, Python, and business intelligence tools to increase organizational leverage.
• Communicate analytical findings through compelling visualizations and data storytelling tailored to technical and non-technical stakeholders, including senior leadership.
• Influence product and business strategy by partnering cross-functionally with Product, Engineering, Operations, and Business leaders to drive data-informed decision-making.

California
$195K - $257.5K / year

Principal Data Scientist

Brillio

Turning technological disruptions into the advantages. Let's create something Brillian(t) together!

Data Scientist | 3 days ago
Full Time | Remote | Team 1,001-5,000 | H1B Sponsor

• Design and implement robust statistical models and machine learning algorithms for large-scale data analysis and predictive analytics
• Lead end-to-end development of data science projects, including hypothesis testing, regression analysis, classification, and forecasting
• Collaborate with cross-functional teams to define business requirements, translate them into analytical solutions, and drive measurable impact
• Optimize and automate data pipelines using Python, PySpark, and R, ensuring efficient data processing and feature engineering
• Develop, validate, and maintain probabilistic graph models and advanced statistical computing frameworks
• Use industry-leading ML frameworks such as TensorFlow, PyTorch, and scikit-learn to build, train, and deploy models
• Establish rigorous model evaluation and monitoring processes using tools like Great Expectations and Evidently AI
• Mentor and guide junior data scientists, fostering technical excellence and continuous learning within the team

New Jersey
$185K - $190K / year