Job Closed
This listing is no longer active.
Leading the artificial intelligence transformation for insurance carriers.
Senior Data Scientist – LLM Evaluation
Location
New York
Posted
145 days ago
Salary
$200K - $240K / year
Seniority
Senior
Job Description
EvolutionIQ
- Establish the Gold Standard: Design and implement comprehensive scorecards and benchmarking suites for LLM-based extraction, summarization, and chat interfaces.
- Bridge the Gap with SMEs: Act as the technical lead in working with Subject Matter Experts (SMEs) to codify their expertise into evaluation datasets and "ground truth" labels.
- Scale Labeling: Design the statistical guardrails to scale both our human and automated labeling efforts.
- Quantify Risk: Provide clear, data-driven "Go/No-Go" recommendations for model deployment based on rigorous error analysis and statistical confidence intervals.
Job Requirements
- 5+ years of experience in Data Science with a strong background in traditional statistics (hypothesis testing, experimental design, regression analysis).
- 2+ years of focused experience working with LLMs, specifically in evaluation, benchmarking, and prompt auditing.
- Master’s or PhD in Statistics, Mathematics, or a related quantitative field.
- Proven ability to work with non-technical SMEs to translate their qualitative feedback into quantitative metrics.
- Proficient in Python (Pandas, Scikit-learn, Statsmodels) and SQL.
- Familiarity with LLM evaluation frameworks (e.g., RAGAS, LangSmith, or proprietary scorecard systems) is a major plus.
Benefits
- Medical, dental, vision, short & long-term disability, life insurance and AD&D, and 401k matching.
- Additional family, wellness, and pet benefits.
- Paid time off and sick leave, 100% paid parental leave (16 weeks for primary caregivers and 12 weeks for secondary caregivers).
- We offer a flexible schedule for new parents returning to work.
- Catered lunches, happy hours, pet-friendly spaces, and a monthly technology stipend.
- $1,000/year for each employee for professional development, as well as opportunities for tuition reimbursement.
- We are open to sponsoring candidates currently in the U.S. who need to transfer their active visa.