Artificial Analysis, Inc.

Remote Jobs

3 open roles
Latest: Jun 11, 2026, 12:00 AM UTC
3 Jobs

Role Description

We're looking for a Senior Full Stack Software Engineer to join our team and lead full stack development projects focusing on our website, artificialanalysis.ai. You'll work closely with our founders to build core parts of the software stack for our early stage start-up. The coming wave of AI scaling is going to change the world in ways we don't yet understand, and we're offering a front row seat.

- Lead full-stack development and optimization of the Artificial Analysis website
- Work with our TypeScript / Next.js stack (hosted on Vercel) to build and maintain website features, including using our PostgreSQL databases on Supabase
- Design and implement user interfaces and data visualizations that distill complex AI benchmarking data into intuitive, interactive experiences for many thousands of daily users
- Push forward our engineering practice more broadly, incorporating industry best practices into our DevOps, infrastructure management and CI/CD
- Mentor and manage a small team of developers to drive a culture of engineering excellence
- Embrace an AI-native workflow, using cutting-edge AI tools to generate leverage in a fast-changing industry

Qualifications

- 3+ years of professional software engineering experience, with a strong focus on modern front-end web development (JavaScript/TypeScript, React, Next.js, Node.js)
- Knowledge of modern AI and experience using LLMs
- Passion for AI and eagerness to work at the forefront of technological innovation
- The ability to balance writing maintainable code and making sound architectural decisions with the need to ship features quickly
- Strong problem-solving skills and ability to distill complex concepts into actionable insights in the face of uncertainty
- Excellent communication and collaboration skills
- Proven ability to lead projects independently in a fast-paced environment

Requirements

- Preferred but not essential: Bachelor's or Master's in Computer Science, Engineering, or related field (e.g. Physics)
- Experience with Python libraries for data analysis (e.g. pandas) and AI/ML frameworks (e.g. PyTorch)
- Experience visualizing & presenting data
- Creation of analytical reports

Benefits

- Shape how AI gets built: The leading AI labs track our benchmarks and use them to guide their development priorities. Your work will directly influence the direction of AI.
- Become a world expert in AI: You will evaluate every major model, across every major capability, as they are released. Very few roles offer this breadth of exposure to frontier AI.
- Work with the most important players in AI: You'll manage relationships with teams at the leading AI labs and major enterprises as a trusted, independent voice.
- Join at a defining moment: We're 35+ people and fast growing, backed by some of the most connected investors in AI. The people who join now will shape the product, the team, and the strategy as we scale.
- Competitive compensation including equity
- Our team is split across San Francisco, Sydney, and Melbourne

Australia

Role Description

Artificial Analysis benchmarks leading image and video generation models, providing the AI industry with independent quality and performance comparisons. Our media generation benchmarks rely on structured human preference evaluations to assess output quality across models. We're hiring a Solutions Engineer to manage our media generation benchmarking pipeline. You'll run image and video generation evaluations, manage human preference studies, and serve as a technical point of contact for media generation model providers. This is a process-driven, operational role suited to someone who is detail-oriented, comfortable with Python, and can manage pipelines reliably day-to-day.

- Generate image and video outputs across models according to standardized evaluation protocols
- Set up and manage human preference evaluation studies, including study design, participant management, and quality control
- Process and analyze preference vote data to produce benchmark results
- Manage the end-to-end pipeline: from prompt execution through to published results
- Serve as a technical point of contact for media generation model providers: communicating results, explaining methodology, and handling queries
- Monitor data quality, flag anomalies, and ensure consistency across evaluation rounds
- Maintain documentation of processes and configurations
- Stay current with new image and video model releases

Qualifications

- 3+ years of experience in a technical operations, data operations, or solutions engineering role
- Comfortable with Python scripting and working with APIs
- Experience managing research studies, data collection pipelines, or crowdsourcing platforms is a strong plus
- Detail-oriented with strong process management skills: you can run recurring workflows reliably without oversight
- Good written and verbal English communication skills
- Responsive, organized, and dependable

Requirements

- Experience with image or video generation models (Midjourney, DALL-E, Stable Diffusion, Runway, Sora, etc.)
- Background in data analysis or research operations
- Familiarity with human evaluation methodologies or preference-based ranking systems
- Experience in B2B SaaS or developer tools

Benefits

- Shape how AI gets built: The leading AI labs track our benchmarks and use them to guide their development priorities. Your work will directly influence the direction of AI.
- Become a world expert in AI: You will evaluate every major model, across every major capability, as they are released. Very few roles offer this breadth of exposure to frontier AI.
- Work with the most important players in AI: You'll manage relationships with teams at the leading AI labs and major enterprises as a trusted, independent voice.
- Join at a defining moment: We're 35+ people and fast growing, backed by some of the most connected investors in AI. The people who join now will shape the product, the team, and the strategy as we scale.
- Competitive compensation including equity
- Our team is split across San Francisco, Sydney, and Melbourne

United States

Role Description

Artificial Analysis maintains one of the most comprehensive language model benchmarking suites in the industry, evaluating frontier models across quality, speed, and pricing for the AI labs and enterprises that rely on our data. We're hiring a Solutions Engineer to own the day-to-day operation of our language model benchmarking stack. This is a hands-on, operational role:

- Add new models to our evaluation pipeline
- Run and debug benchmarks
- Serve as the primary technical point of contact for AI lab customers
- Explain results, field methodology questions, and resolve API endpoint issues over Slack and video calls

This is not a software engineering role focused on building new systems. It's about running a sophisticated existing stack exceptionally well, consistently and reliably, while being the trusted technical face of Artificial Analysis to our customers.

Qualifications

- 5+ years of experience in a client-facing technical role: solutions engineering, support engineering, technical consulting, or similar
- Strong Python proficiency and comfort working with complex codebases you didn't write
- Hands-on experience working with AI/ML model APIs (OpenAI, Anthropic, Google, Meta, etc.)
- Excellent debugging skills: you can trace issues across APIs, data pipelines, and code
- Strong written and verbal English communication skills, with the ability to explain technical concepts clearly to technical stakeholders
- Highly responsive and reliable: you take ownership of customer issues and follow through
- Comfortable with operational, repeatable work: you find satisfaction in running things well rather than building from scratch
- High attention to detail and calm under pressure

Requirements

- Experience with AI evaluation, benchmarking, or testing methodologies (nice to have)
- Familiarity with LLM inference infrastructure (tokenization, latency measurement, throughput metrics) (nice to have)
- Experience working in or with AI labs or model providers (nice to have)
- Background in B2B SaaS or developer tools (nice to have)

Benefits

- Shape how AI gets built: The leading AI labs track our benchmarks and use them to guide their development priorities.
- Become a world expert in AI: You will evaluate every major model, across every major capability, as they are released.
- Work with the most important players in AI: You'll manage relationships with teams at the leading AI labs and major enterprises as a trusted, independent voice.
- Join at a defining moment: We're 35+ people and fast growing, backed by some of the most connected investors in AI.
- Competitive compensation including equity

United States