Analyze research papers at superhuman speed
Evaluation Engineer
Location
California
Posted
136 days ago
Salary
$140K - $200K / year
Seniority
Senior
Job Description
Evaluation Engineer
Elicit
- You'll build a comprehensive system that runs fast, is easy to use, and supports quickly building new evals:
  - Speed: You’ll build a lightning-fast basic evals infrastructure that schedules tasks to introduce practically no latency; and then you’ll figure out clever ways to solve the fundamental sources of latency (building a version of Elicit, running it on a query, and evaluating it using LMs)
  - Interfaces: ML engineers need evals to kick off automatically on relevant commits, with results they can see at a glance and drill into. Product managers need dashboards showing performance over time and what's going wrong in production.
  - Architecture: Your code must be well-architected so other team members and ML engineers can understand and build on it. An engineer starting on a new feature should be able to quickly add examples and run an eval.
- We need to evaluate how well Elicit actually helps with decision-making in pharma, not just measure what's easy to measure. This requires encoding real knowledge about how pharma customers make decisions (for example, choosing appropriate gold standards).
- You'll provide appropriate statistical tests and confidence intervals so we can trust our results.
- In a typical month, expect to spend:
  - 60% working on the core eval platform
  - 15% working closely with the evals team to build and improve specific evals (e.g., an eval of our paper search within our systematic review flow)
  - 10% mentoring our evals engineering intern
  - The rest on learning how people interact with the eval system so you can make it work better for them, and understanding what our users want from Elicit so evals measure what matters
Job Requirements
- At least 3 years of experience as a professional software engineer, with demonstrated experience building complex backend systems (e.g., backend for a complex website, data pipelines, etc.)
- Aptitude and interest in evaluating how Elicit helps with pharma decision-making. There's no particular experience you must have, but we'll evaluate your aptitude.
- Knowledge of statistics (e.g., for calculating power and credence intervals for evals)
- Experience with advanced Python (asyncio/trio and parallel processing strategies)
- Front-end experience and strong UX sensibility (you'll be building dashboards). TypeScript experience is a plus.
- Experience building developer tools (ML engineers are one of your most important clients)
- Previous experience as a data engineer or working on AI infrastructure
- Knowledge of pharma/biomed
- Experience evaluating ML systems
- Experience building language-model-based systems (helps with understanding Elicit and how to evaluate it)
Benefits
- Flexible work environment: work from our office in Oakland or remotely with time zone overlap (between GMT and GMT-8), as long as you can travel for in-person retreats and coworking events
- Fully covered health, dental, vision, and life insurance for you, plus generous coverage for the rest of your family
- Flexible vacation policy, with a minimum recommendation of 20 days/year + company holidays
- 401K with a 6% employer match
- A new Mac + $1,000 budget to set up your workstation or home office in your first year, then $500 every year thereafter
- $1,000 quarterly AI Experimentation & Learning budget, so you can freely experiment with new AI tools, take courses, purchase educational resources, or attend AI-focused conferences and events
- A team administrative assistant who can help you with personal and work tasks