Job Closed

This listing is no longer active.

you.com

You.com, founded in 2020 by AI experts Richard Socher and Bryan McCann, is a rapidly growing AI-powered productivity platform headquartered in Palo Alto, Califo

Senior AI Scientist

Location

California

Posted

131 days ago

Salary

$160K - $200K / year

Seniority

Senior

Bachelor Degree, 1 yr exp, English, Bootstrap, Python

Job Description

  • Define and own what “good” means for search-augmented and agentic AI systems by designing evaluation frameworks that measure real-world quality, reliability, and user-relevant behavior beyond standard benchmarks.
  • Invent and validate novel evaluation methodologies for non-deterministic systems (LLMs, agents, RAG), including behavioral evals, long-tail and adversarial test sets, and task-specific metrics.
  • Develop rigorous statistical frameworks for model comparison, regression detection, and uncertainty estimation, ensuring evaluation results are defensible and decision-ready.
  • Build and maintain scalable evaluation systems (datasets, gold standards, eval harnesses, scoring pipelines, and analysis tooling) that can be reused across products and customers.
  • Lead customer-facing evaluation research, working directly with enterprise customers to translate domain-specific quality requirements into credible, actionable evals that support product decisions and sales outcomes.
  • Drive competitive evaluations and internal quality reviews, surfacing meaningful performance differences, trade-offs, and failure modes to inform product strategy and prioritization.
  • Partner with engineering and product teams to integrate evals into development loops, release gating, and ongoing quality monitoring.
  • Mentor and set standards for evaluation practice, reviewing eval designs, guiding other scientists, and shaping the long-term evals roadmap as systems become more agentic and complex.
  • End-to-End Project Leadership: Lead the development of new AI-driven projects, encompassing ideation, prototyping, research, infrastructure design, scalability, monitoring, and evaluation.
  • Rapid Iteration: Adapt quickly to user feedback and evolving requirements, ensuring continuous improvement in a fast-paced environment.

Job Requirements

  • Strong grounding in applied ML and statistics, with experience evaluating non-deterministic AI systems (LLMs, agents, RAG, search).
  • Deep experience with AI evaluation, including metric design, gold dataset creation, head-to-head comparisons, slicing, and error analysis.
  • Statistical rigor in model comparison, using methods such as paired tests, bootstrap confidence intervals, and robustness analyses.
  • Proficiency in Python for evaluation and analysis, including building eval harnesses, data pipelines, scoring logic, and reproducible analysis workflows.
  • Ability to translate vague product or customer goals into measurable evaluation criteria, and to challenge metrics or conclusions that don’t reflect real quality.
  • Comfort engaging directly with customers and cross-functional stakeholders, explaining evaluation results, trade-offs, and limitations clearly.
  • Strong written and verbal communication, including documenting methodologies and contributing to external publications or talks.

Benefits

  • Hubs in San Francisco and New York City offering regular in-person gatherings and co-working sessions
  • Flexible PTO with U.S. holidays observed and a one-week shutdown in December to rest and recharge*
  • A competitive health insurance plan covering 100% for the policyholder and 75% for dependents*
  • 12 weeks of paid parental leave in the US*
  • 401k program, 3% match - vested immediately!*
  • $500 work-from-home stipend to be used within a year of your start date*
  • $1,200 per year Health & Wellness Allowance to support your personal goals*
  • The chance to collaborate with a team at the forefront of AI research
