Apply → Pass qualification(s) → Join a project → Complete tasks → Get paid.
Project time expectations: Tasks are estimated to require around 10–20 hours per week during active phases, based on project requirements. This is an estimate, not a guaranteed workload, and applies only while the project is active.
Note: Rates vary based on expertise, skills assessment, location, project needs, and other factors. Higher rates may be offered to highly specialized experts. Lower rates may apply during onboarding or non-core project phases. Payment details are shared per project.
Senior AI Agent Evaluation Engineer
Location
Worldwide
Posted
19 hours ago
Salary
$50 / hour
Seniority
Senior
Job Description
Mindrift
Role Description
Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. Participation is project-based, not permanent employment.
What this opportunity involves:
- Building a dataset to evaluate AI coding agents - how well a model handles real-world developer tasks.
- Creating challenging tasks and evaluation criteria within realistic simulated environments:
  - Build realistic developer environments - a virtual company with codebase, infrastructure, and context (tickets, docs, conversations) that forms a believable development history.
  - Design tasks from intermediate states of these environments - craft the prompt, define what "solved" means, and ensure the task is solvable by an AI agent.
  - Write tests that verify agent solutions - accept all valid approaches and reject incorrect ones, neither too strict nor too lenient.
  - Iterate on tasks and tests based on QA feedback - review agent solutions, analyze failures, and refine until the evaluation is fair and robust.
What this is NOT:
- Not data labeling.
- Not prompt engineering.
- Not writing code from scratch - the agent writes most of the code; you guide and evaluate.
Qualifications
- 5+ years in software development.
- Core stack: Python (FastAPI), JavaScript/TypeScript (React), Docker, Postgres, Kafka, Redis.
- Experience writing tests (functional, integration).
- English proficiency - B2+.
Requirements
- Deep understanding of where models fail and what scenarios reveal the difference between a good and a bad solution.
- Ability to create tasks that genuinely challenge the best models.
- Writing tests that accept all correct solutions and reject incorrect ones.
Benefits
- Compensation up to $50/hr equivalent, depending on level and pace.
- Tasks are estimated at ~20 hours each; you set your own schedule.
Effort Estimate
Tasks for this project are estimated to take 20 hours to complete, depending on complexity. This is an estimate and not a schedule requirement; you choose when and how to work. Tasks must be submitted by the deadline and meet the listed acceptance criteria to be accepted.
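As an illustration of the listing's point about tests that "accept all valid approaches and reject incorrect ones", here is a minimal Python sketch. The task (case-insensitive email de-duplication that keeps first-seen order), every function name, and the test cases are hypothetical and not taken from Mindrift's project. The idea is only that a checker asserts on observable behaviour rather than on how the solution is written.

```python
from typing import Callable, Iterable

# Hypothetical task: remove duplicate emails case-insensitively,
# keeping the first-seen spelling and the original order.
CASES = [
    ([], []),
    (["a@x.io"], ["a@x.io"]),
    (["a@x.io", "A@X.IO", "b@x.io"], ["a@x.io", "b@x.io"]),
    (["b@x.io", "a@x.io", "B@x.io", "a@x.io"], ["b@x.io", "a@x.io"]),
]


def check_solution(dedupe: Callable[[list[str]], Iterable[str]]) -> bool:
    """Return True if `dedupe` behaves correctly on every case.

    Only the resulting sequence is compared, so any algorithm and any
    iterable return type (list, tuple, generator) is accepted.
    """
    for given, expected in CASES:
        try:
            result = list(dedupe(list(given)))  # pass a copy, normalise output
        except Exception:
            return False  # a crashing solution is an incorrect solution
        if result != expected:
            return False
    return True


def solution_loop(emails):
    """Valid approach 1: explicit loop with a 'seen' set, returns a list."""
    seen, out = set(), []
    for email in emails:
        key = email.lower()
        if key not in seen:
            seen.add(key)
            out.append(email)
    return out


def solution_dict(emails):
    """Valid approach 2: insertion-ordered dict, returns a tuple."""
    first = {}
    for email in emails:
        first.setdefault(email.lower(), email)
    return tuple(first.values())


def wrong_case_sensitive(emails):
    """Incorrect: de-duplicates, but treats 'A@X.IO' and 'a@x.io' as different."""
    return list(dict.fromkeys(emails))


def wrong_loses_order(emails):
    """Incorrect: de-duplicates case-insensitively, but discards input order."""
    return sorted({email.lower() for email in emails})


if __name__ == "__main__":
    for fn in (solution_loop, solution_dict, wrong_case_sensitive, wrong_loses_order):
        print(fn.__name__, check_solution(fn))
```

The two correct variants differ in algorithm and return type, so a test that compared source text or required a `list` would be too strict. The two incorrect variants look plausible on simple inputs, so a test without the mixed-case and reordered cases would be too lenient.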
More AI Engineer Jobs
About Airwallex
Airwallex is the only unified payments and financial platform for global businesses. Powered by our unique combination of proprietary infrastructure and software, we empower over 200,000 businesses worldwide - including Brex, Rippling, Navan, Qantas, SHEIN and many more - with fully integrated solutions to manage everything from business accounts, payments, spend management and treasury, to embedded finance at a global scale. Proudly founded in Melbourne, we have a team of over 2,200 of the brightest and most innovative people in tech across 26 offices around the globe. Valued at US$8 billion and backed by world-leading investors including T. Rowe Price, Visa, Mastercard, Robinhood Ventures, Sequoia, Salesforce Ventures, DST Global, and Lone Pine Capital, Airwallex is leading the charge in building the global payments and financial platform of the future. If you're ready to do the most ambitious work of your career, join us.
Attributes We Value
We hire successful builders with founder-like energy who want real impact, accelerated learning, and true ownership. You bring strong role-related expertise and sharp thinking, and you're motivated by our mission and operating principles. You move fast with good judgment, dig deep with curiosity, and make decisions from first principles, balancing speed and rigor. You're humble and collaborative, you turn zero-to-one ideas into real products, and you "get stuff done" end-to-end. You use AI to work smarter and solve problems faster. Here, you'll tackle complex, high-visibility problems with exceptional teammates and grow your career as we build the future of global banking. If that sounds like you, let's build what's next.
About the team
As the pioneers in our new Generative AI department, you will spearhead the development of AI agents. These AI agents will transform the way we provide support and automation solutions for both our internal teams and external customers. You'll have the unique opportunity to shape the architecture and API design of cutting-edge systems in support of our customers, automation, search, and data analysis. Your role is crucial in driving innovation and setting the standard for future developments in this exciting new field.
What you'll do
As the staff engineer in our new Generative AI department, you will define the technical and strategic vision for next-generation AI solutions. By harnessing cutting-edge language models, you will drive internal efficiency and transform user experience for our customers. Your work will set the standard for AI engineering excellence, paving the way for the next wave of intelligent products and services. This role is based in Singapore.
Responsibilities:
- Spearhead the creation of AI solutions to enable growth and drive internal efficiency across the business.
- Design and implement robust API and system architecture for new AI applications.
- Assist in AI application roadmapping and strategy.
- Collaborate effectively with a diverse, cross-functional team to leverage existing AI technologies, and research and implement new ones, to meet product goals.
- Provide prompt and effective solutions or workarounds in response to application issues or software bugs, maintaining high client satisfaction.
- Embrace the challenges of a fast-paced, high-growth environment with a focus on innovation.
Who you are
We're looking for people who meet the minimum qualifications for this role. The preferred qualifications are great to have, but are not mandatory.
Minimum qualifications:
- 10+ years in backend software development, with a passion for AI and language technologies.
- Proven track record of delivering major impact.
- Proficiency in Python, TypeScript or JavaScript.
- Experience in design and development of large-scale distributed, high concurrency, high load, high availability systems.
- Experience in vector databases.
- A relevant degree in Computer Science, Mathematics or related fields.
Preferred qualifications:
- Familiarity with MLOps practices and deployment pipelines for AI-driven products.
Applicant Safety Policy: Fraud and Third-Party Recruiters
To protect you from recruitment scams, please be aware that Airwallex will not ask for bank details, sensitive ID numbers (e.g. passport), or any form of payment during the application or interview process. All official communication will come from an @airwallex.com email address. Please apply only through careers.airwallex.com or our official LinkedIn page. Airwallex does not accept unsolicited resumes from search firms/recruiters. Airwallex will not pay any fees to search firms/recruiters if a candidate is submitted by a search firm/recruiter unless an agreement has been entered into with respect to specific open position(s). Search firms/recruiters submitting resumes to Airwallex on an unsolicited basis shall be deemed to accept this condition, regardless of any other provision to the contrary.
Equal opportunity
Airwallex is proud to be an equal opportunity employer. We value diversity and anyone seeking employment at Airwallex is considered based on merit, qualifications, competence and talent. We don't regard color, religion, race, national origin, sexual orientation, ancestry, citizenship, sex, marital or family status, disability, gender, or any other legally protected status when making our hiring decisions. If you have a disability or special need that requires accommodation, please let us know.
Operations Associate – AI Developer
Sitetracker
We power the rapid deployment of tomorrow's infrastructure. Deploy what's next.
• Join Sitetracker's Operations team
• Build, test, and deploy AI agents and automations
• Manage Salesforce Flows and automation
• Work on afternoon shift
• Collaborate with the Director of Operations
• Deliver documented builds for quality review
AI, ML Engineer - Python, Agentic AI
UnitedHealth Group
UnitedHealth Group is a healthcare and well-being company that’s dedicated to improving the health outcomes of millions around the world. We are comprised of
Title: AI/ML Engineer - Python / Agentic AI - Remote Nationwide or Hybrid in MN/DC
Requisition number: 2357479
Job category: Technology
Primary location: Minnetonka, MN
Overtime status: Exempt
Travel: No
Job Description:
Optum Tech is a global leader in health care innovation. Our teams develop cutting-edge solutions that help people live healthier lives and help make the health system work better for everyone. From advanced data analytics and AI to cybersecurity, we use innovative approaches to solve some of health care's most complex challenges. Your contributions here have the potential to change lives. Ready to build the next breakthrough? Join us to start Caring. Connecting. Growing together.
We are seeking an AI/ML Engineer with deep expertise in Generative AI technologies to develop and deploy cutting-edge solutions within the Optum Care Strategic Platform (Facets). The ideal candidate will have hands-on experience with Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), Agentic AI, and frameworks such as LangGraph and LangChain. Strong proficiency in Python and a software engineering background in SQL and API development are essential. This role will involve close collaboration with Claims Operations.
You'll enjoy the flexibility to work remotely* from anywhere within the U.S. as you take on some tough challenges. For all hires in the Minneapolis or Washington, D.C. area, you will be required to work in the office a minimum of four days per week.
Primary Responsibilities:
- Understand Facets delivery, client priorities, and requirements
- Adapt to and adopt new technology footprints to optimize business solutions through AI adoption
- Apply a solid understanding of Agentic AI concepts and frameworks such as LangGraph and LangChain to design and implement intelligent, autonomous systems
- Utilize expertise in Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), and Transformer architectures to build scalable and high-performing AI solutions
- Develop sophisticated generative AI algorithms and models to create new data samples, patterns, or content based on existing datasets and user inputs
- Build and maintain robust APIs using frameworks such as FastAPI to enable seamless integration of AI services
- Develop secure, maintainable software using SQL and modern engineering patterns
- Deploy AI/ML solutions in production environments, preferably on cloud platforms like Azure
- Implement machine learning and natural language processing projects from concept to production
- Conduct rigorous model testing, validation, and performance optimization
- Containerize AI/ML projects and implement CI/CD pipelines using tools like GitHub Actions
- Collaborate with stakeholders to preprocess, analyze, and interpret large-scale datasets
- Stay current with emerging trends and advancements in generative AI and ML technologies
- Mentor and lead other engineers to reach their potential
- Ensure code quality and security by identifying and resolving vulnerabilities
- Maintain solid programming practices and standards
- Provide timely updates on tasks to reporting supervisor
- Adhere to defined SLA compliance requirements
- Provide support and leadership in critical production support issue resolution
- Participate in on-call rotations as scheduled
- Work with team to achieve timely resolution of all production issues, meeting or exceeding Service Level Agreements
You'll be rewarded and recognized for your performance in an environment that will challenge you and give you clear direction on what it takes to succeed in your role as well as provide development for other roles you may be interested in.
Required Qualifications:
- Undergraduate degree or equivalent experience
- 3+ years of experience in Python programming
- 2+ years of experience building, deploying, and managing ML models on cloud platforms such as Microsoft Azure or any other cloud platform
- 2+ years of Database Design/Programming
- 1+ years of experience with Terraform scripts
- 1+ years of GitHub Actions and CI/CD experience
Preferred Qualifications:
- 2+ years of software engineering and testing experience in C# / .NET
- Healthcare Claims EDI experience
- Prompt engineering experience
- Experience leveraging Large Language Models (LLMs)
- Experience working with various AI/ML infrastructure, tools and platforms across the full stack of AI/ML technology
- Experience with the challenges of delivering and monitoring LLM-based cloud systems in production
- Proven solid communication skills with the ability to explain complex technical concepts to diverse audiences
*All employees working remotely will be required to adhere to UnitedHealth Group's Telecommuter Policy.
Pay is based on several factors including but not limited to local labor markets, education, work experience, certifications, etc. In addition to your salary, we offer benefits such as a comprehensive benefits package, incentive and recognition programs, equity stock purchase and 401k contribution (all benefits are subject to eligibility requirements). No matter where or when you begin a career with us, you'll find a far-reaching choice of benefits and incentives. The salary for this role will range from $98,500 to $176,000 annually based on full-time employment. We comply with all minimum wage laws as applicable.
At UnitedHealth Group, our mission is to help people live healthier lives and make the health system work better for everyone. We believe everyone, of every race, gender, sexuality, age, location and income, deserves the opportunity to live their healthiest life. Today, however, there are still far too many barriers to good health which are disproportionately experienced by people of color, historically marginalized groups and those with lower incomes. We are committed to mitigating our impact on the environment and enabling and delivering equitable care that addresses health disparities and improves health outcomes - an enterprise priority reflected in our mission.
UnitedHealth Group is an Equal Employment Opportunity employer under applicable law and qualified applicants will receive consideration for employment without regard to race, national origin, religion, age, color, sex, sexual orientation, gender identity, disability, or protected veteran status, or any other characteristic protected by local, state, or federal laws, rules, or regulations.
UnitedHealth Group is a drug-free workplace. Candidates are required to pass a drug test before beginning employment.
Lead AI engineer
Mastercard
Founded in 1966, Mastercard is a worldwide transaction, payment-processing, and consulting company best known for its line of personal and business credit cards. As an employer, Ma
Our Purpose
Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we're helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships and networks combine to deliver a unique set of products and services that help people, businesses and governments realize their greatest potential.
Title and Summary
Lead AI engineer
Who is Mastercard?
Mastercard is a global technology company in the payments industry. Our mission is to connect and power an inclusive, digital economy that benefits everyone, everywhere by making transactions safe, simple, smart, and accessible. Using secure data and networks, partnerships and passion, our innovations and solutions help individuals, financial institutions, governments, and businesses realise their greatest potential. Our decency quotient, or DQ, drives our culture and everything we do inside and outside of our company. With connections across more than 210 countries and territories, we are building a sustainable world that unlocks priceless possibilities for all.
Overview
The CNPF Data & AI organisation is looking for a Lead AI Engineer to drive hands-on delivery of applied AI and agentic capabilities across our platforms. This role sits at the intersection of software engineering, machine learning engineering, and applied data science, with a strong emphasis on building production-grade AI systems. This is a senior individual contributor and technical leadership role. You will lead by example through deep hands-on engineering, influence technical direction, and partner closely with Applied AI, Data Science, and Product teams to take AI solutions from experimentation to secure, scalable production.
Role
• Lead hands-on development of AI and agentic systems from design through production deployment
• Build and operate ML/AI services, pipelines, and APIs using strong software engineering practices
• Design and implement ML engineering capabilities such as model serving, monitoring, evaluation, and retraining
• Partner with data scientists to productionise models and experiments efficiently
• Contribute directly to data preparation, feature engineering, experimentation, and modelling when required
• Drive technical design reviews and provide mentorship to engineers and data scientists
• Ensure AI solutions meet Mastercard standards for performance, reliability, security, and governance
• Collaborate closely with platform, security, and infrastructure teams to ship responsibly at scale
All about you
• Strong experience as a hands-on AI engineer, ML engineer, or senior software engineer working on production AI systems
• Solid foundations in software engineering, system design, and distributed systems
• Proven experience productionising machine learning models and operating them at scale
• Comfortable working across data engineering, ML engineering, and applied data science tasks
• Experience with large-scale data platforms and modern ML/AI tooling
• Strong problem-solving skills and ability to work with ambiguous requirements
• Ability to influence technical direction without formal people management responsibility
• Clear communication skills and comfort collaborating across functions
What Makes You Stand Out
• You have built and operated AI or agentic applications that run in real production environments
• Hands-on experience implementing agent-based or LLM-powered systems beyond simple POCs
• Strong intuition for reliability, observability, and failure handling in AI systems
• Ability to move fluidly between engineering execution and applied modeling when needed
• Track record of raising the technical bar for teams through code, design, and mentorship
Corporate Security Responsibility
Every person working for, or on behalf of, Mastercard is responsible for information security. All activities involving access to Mastercard assets, information, and networks come with an inherent risk to the organisation and therefore it is expected that the successful candidate must:
• Abide by Mastercard's security policies and practices
• Ensure the confidentiality and integrity of the information being accessed
• Report any suspected information security violation or breach
• Complete all mandatory security trainings in accordance with Mastercard's guidelines

