Research institute investigating the trajectory of AI
Software Engineer, Benchmarking
Location: United States
Posted: 1 day ago
Salary: $125K - $200K / year
Seniority: Mid Level
Job Description
Epoch AI
- Implement benchmarks: Implement AI benchmarks within our evaluation infrastructure (primarily using the Inspect library) to expand the suite of capabilities we track. Develop our existing suite of benchmarks so we can quickly and painlessly evaluate new model releases.
- Develop new benchmarks: Contribute to the development of brand new benchmarks. You will have the opportunity to pitch and prototype your own ideas in addition to helping out with existing projects.
- Collaborate: Work closely with researchers, analysts, and other engineers at Epoch AI to ensure evaluation data and outputs are accurate, insightful, and effectively integrated into our research products and publications.
Job Requirements
- Solid engineering skills: A strong software engineering background with more than two years of professional experience building and maintaining complex systems. You are expected to regularly contribute high-quality, robust, and maintainable code and be comfortable diving deep into existing codebases and infrastructure.
- Ideas and creativity: Candidates should be able to generate their own ideas for new benchmarks, experiments, novel things to try, and other projects.
- Mission-driven: You’re motivated by Epoch AI’s mission to provide rigorous, independent insight into key trends in AI. You want to deliver public, trustworthy evaluations of AI capabilities on challenging benchmarks, empowering researchers, policymakers, and the wider public to make well-informed decisions about AI.
- AI domain expertise or cybersecurity experience are strong pluses but not required.
Benefits
- Fully remote environment, including flexible work hours and schedules for most roles.
- Competitive global benefits program, including comprehensive health insurance (with supplemental benefits specific to your country, as available and mandated by local law), life insurance, and a pension plan, if applicable in your country.
- Generous paid time off (PTO), including no specific limit on PTO with 30 days per year protected, unlimited personal and sick leave, and up to 6 months (combination of paid + unpaid) parental leave for permanent staff.
- A flexible and generous expense policy for you to spend on equipment and a large range of productivity tools or learning/development opportunities you might find valuable, subject to regulations and manager approval.
- Paid work trips, including 3 staff retreats per year and relevant conferences.
- Access to our very well-equipped offices in Berkeley, California, including paid meals, snacks, gym, and more. All staff, independently of where they are based, have access to the office for at least 20 days each year.




