Job Closed
This listing is no longer active.
Research Engineer, Agentic AI Evals
Location
California
Posted
117 days ago
Salary
$0
Seniority
Mid Level
Job Description
hud
About HUD

HUD (YC W25) is developing agentic evals for Computer Use Agents (CUAs) that browse the web. Our CUA Evals framework is the first comprehensive evaluation tool for CUAs.

Our Mission: People don't actually know if AI agents are working. To make AI agents work in the real world, we need detailed evals for a huge range of tasks. We're backed by Y Combinator, and work closely with frontier AI labs to provide agent evaluation infrastructure at scale.

About the role

We're looking for a research engineer to help build out task configs and environments for evaluation datasets on HUD's CUA evaluation framework.

Responsibilities

- Build out environments for HUD's CUA evaluation datasets, including evals for safety redteaming, general business tasks, long-horizon agentic tasks etc.
- Deliver custom CUA datasets and evaluation pipelines requested by clients
- Contribute to improving the HUD evaluation harness, depending on your interests, skills, and current organizational priorities. (Optional, but highly valued!)

Experience

Technical Skills
- Proficiency in Python, Docker, and Linux environments
- React experience for frontend development
- Production-level software development experience preferred
- Strong technical aptitude and demonstrated problem-solving ability

Strong candidates may have:
- Startup experience in early-stage technology companies with ability to work independently in fast-paced environments
- Strong communication skills for remote collaboration across time zones
- Familiarity with current AI tools and LLM capabilities
- Understanding of safety and alignment considerations in AI systems
- Evidence of rapid learning and adaptability in technical environments (e.g. programming competitions)
- Have hands-on experience with or contributed to LLM evaluation frameworks (EleutherAI, Inspect, or similar)
- Built custom evaluation pipelines or datasets
- Worked with agentic or multimodal AI evaluation systems

We prioritize technical aptitude and learning potential over years of experience. Motivated candidates are encouraged to apply even if they don't meet all criteria.

Representative projects:
- Creating and solving challenging competitive programming problem-sets
- Curating large high-quality datasets, especially for research and evaluation of multimodal AI agents
- Designing complex, functional fullstack applications. Bonus points if they have users / adopters.

We prioritise contributions that show quality and quantity, such as building out large, high-quality datasets. Imagine making about ~10 small puzzles in mock web environments a day.

Team & Company Details

- Team Size: ~15 people currently, mostly full-time in-person, but some remote.
- Our team: Our team includes 4 international Olympiad medallists (IOI, ILO, IPhO), serial AI startup founders, and researchers with publications at ICLR, NeurIPS etc
- Company stage: We have received $2 million in seed funding, plus very strong demand and revenue growth beyond that. We are scaling profitably and fast to meet demand.

Logistics

- Employment: Fulltime preferred, but willing to consider part-time/internship arrangements for exceptional candidates.
- Location: Fully remote-friendly. We already have several fulltime, 100% remote hires. But if you’re in the San Francisco Bay Area or Singapore, we do have an office you can work together in. We do prefer applicants who can show up to meetings in Pacific Time (UTC-7:00/8:00) or China/Singapore Time (UTC +8:00).
- Visa Sponsorship: We provide support for relocation and visas for strong full-time candidates to USA or Singapore. For part-time/contract/internship arrangements, we'll work fully remote (which makes things simpler anyway).
- Timeline: Applications are rolling. The process should involve 1 initial call, 1 five-hour take-home assignment and 1 paid, weeklong work trial before final offer.

Due to high volume, we may not actively respond to every application, but feel free to contact us at recruiting@hud.so or elsewhere if we missed your application!
More Research Engineer Jobs
About Revelare Networks

Revelare Networks is a small defense contractor headquartered in Maryland, with a geographically distributed team across the United States. We specialize in securing Department of Defense (DoD) communications against sophisticated adversaries. Our projects range from hands-on hardware security to advanced software development, focused on research, development, and innovation in the defense industry.

Position Overview

We are seeking a passionate and skilled Research Engineer to support our R&D contracts with the DoD. The ideal candidate will have experience in software development, machine learning, and the 5G system architecture, combined with an interest in contributing to cutting-edge defense and security projects. This position offers exposure to multi-disciplinary engineering efforts, working alongside teams of experts and researchers tackling challenging, high-impact problems.

Salary range: $140-$180K depending on experience

Key Responsibilities:
- Conduct research and development activities focused on improving cellular communication security.
- Modify and deploy 5G mobile core to support new functionality
- Collaborate with cross-functional teams to design, implement, and test innovative solutions.
- Develop software applications using C/C++, Python, Java, or React.
- Create and manage REST APIs for seamless integration of services.
- Utilize Docker and Linux environments for application deployment and management.
We help make autonomous technologies more efficient, safer, and accessible.

Helm.ai builds AI software for autonomous driving and robotics. Our Deep Teaching™ methodology is uniquely data and capital efficient, allowing us to surpass traditional approaches. Our unsupervised learning software can train neural networks without the need for human annotation or simulation and is hardware-agnostic. We work with some of the world's largest automotive manufacturers and we've raised over $100M from Honda, Goodyear Ventures, Mando, and others to help us scale.

Our team is made up of people with a diverse set of experiences in software and academia. We work together towards one common goal: to integrate the software you'll help us build into hundreds of millions of vehicles.

You will:

You will work collaboratively to improve our models and iterate on novel research directions, sometimes in just days. We're looking for talented engineers who'd enjoy applying their skills to deeply complex and novel AI problems. Here, you will:
- Apply and extend the Helm proprietary algorithmic toolkit for unsupervised learning and perception problems at scale
- Carefully execute development and maintenance of tools used for deep learning experiments designed to provide new functionality for customers or address relevant corner cases in the system as a whole
- Work closely with software and autonomous vehicle engineers to deploy algorithms on internal and customer vehicle platforms

You have:
- A sense of practical optimism: not all experiments are successful, but the ones that are more than make up for it!
- Comfort operating in a fast-paced environment to deliver customer projects
- Introspection, thoughtfulness, and detail-orientation
- Experience working with neural networks, TensorFlow and/or PyTorch
- Fluency in Python and working knowledge of C/C++ programming
- A strong interest in unsupervised learning, computer vision, and/or the autonomous vehicle industry
- Master’s or Ph.D. in a related field and/or 5+ years of experience in a related field

The pay range for this position is estimated to fall in the base range of approximately $150,000 to $250,000. Base compensation for this position will vary based on location, qualifications, and relevant experience. The offered base salary may be above or below this range and compensation for the position may include additional compensation in the form of equity or a bonus/commission.
Research Engineer
Oddin.gg
Market-leading esports betting ecosystem: real-time odds, iFrame, risk management, official esports data, and more.
• Our goal is to research and train fast, high-quality SOTA TTS models for realistic and emotional voice generation for entertainment and education applications.
• You will be in charge of maintaining the “current best” TTS model we have: assembling the results of the best experiments into one speech production system, and evaluating it in terms of performance and quality.
• You will collaborate directly with our TTS research team and cooperate closely with product engineering and platform roles to ensure smooth deployment.
Waabi, founded by AI visionary Raquel Urtasun, is the leader in Physical AI. With a world-class team, we're unlocking the next era of autonomous transportation with technology that's powering commercial autonomous trucks and robotaxis. Waabi is backed by and partners with world leaders in AI, automotive, logistics, and deep tech.

With offices in Toronto, San Francisco, Dallas, and Pittsburgh, Waabi is growing quickly and looking for diverse, innovative and collaborative candidates who want to impact the world in a positive way. To learn more visit: www.waabi.ai

As a Research Engineer in Neural Rendering, you will create the next generation of multi-sensor rendering systems for autonomous driving. You will collaborate with our team of world-renowned scientists and engineers to build innovative, practical, and scalable rendering and content creation solutions for self-driving. We value original, high-impact ideas and rigorous experimental validation.

You will…
- Be part of a team of multidisciplinary Research Scientists and Engineers working on building a best-in-class multi-sensor simulation stack.
- Use cutting edge techniques in reconstruction, neural rendering and generative AI to build large-scale, efficient digital twins from real-world camera, LiDAR, and RADAR data. You will work on shipping next-generation simulation software which mixes traditional real-time rendering with NeRF, 3D Gaussian Splatting, diffusion models, etc.
- Collaborate with Waabi’s autonomy and safety teams to improve the realism and diversity of Waabi World.
- Have the opportunity to make contributions to high-impact research papers submitted to top conferences or journals (CVPR, ECCV, ICCV, SIGGRAPH, NeurIPS, ICLR, ICRA, RA-L).

Qualifications:
- The US yearly salary range for this role is: $134,000 - $235,000 USD
- The Canada salary range for this role is: $171,000 - $270,000 CAD

In addition to competitive perks & benefits, Waabi US Inc. and Waabi Canada Inc.'s yearly salary ranges are determined based on several factors in accordance with the Company’s compensation practices. The salary base range is reflective of the minimum and maximum target for new hire salaries for the position across all US and Canada locations.

Note: The Company provides additional compensation for employees in this role, including discretionary equity incentive awards and discretionary annual performance bonus.

Perks/Benefits:
- Competitive compensation and equity awards.
- Health and Wellness benefits encompassing Medical, Dental and Vision coverage (for full-time employees only).
- Unlimited Vacation.
- Flexible hours and Work from Home support.
- Daily drinks, snacks and catered meals (when in office).
- Regularly scheduled team building activities and social events both on-site, off-site & virtually.
- As we grow, this list continues to evolve!

Waabi is a technology start-up building technologies to transform the way the world moves. Join our talented team to be a part of the future and to make an impact!

Waabi is an equal opportunity employer. We celebrate diversity and are committed to creating a supportive, inclusive, and accessible workplace for all our employees. We seek applicants of all backgrounds and identities, across race, color, ethnicity, national origin or ancestry, age, citizenship, religion, sex, sexual orientation, gender identity or expression, military or veteran status, marital status, pregnancy or parental status, caregiver status, disability, or any other characteristic protected by law. We make workplace accommodations for qualified individuals with disabilities as required by applicable law. If reasonable accommodation is needed to participate in the job application or interview process please let our recruiting team know.



