Gramian Consulting

We get talents. You get results.

AI Evaluation Engineer – Software Engineering Domain

Full-stack Engineer | Software Engineer | Contract | Remote | Senior | Team 2-10 | Since 2025 | H1B No Sponsor

Location

Egypt

Posted

8 days ago

Salary

0

Seniority

Senior

Bachelor Degree | 3 yrs exp | English

Job Description

  • Design realistic terminal-based benchmark tasks for AI evaluation systems
  • Create technically deep debugging and investigation scenarios
  • Develop task specifications involving infrastructure, workflows, pipelines, or operational failures
  • Write clear solution approaches and deterministic evaluation criteria
  • Identify realistic edge cases, failure modes, and system constraints
  • Design multi-step reasoning challenges across complex technical environments
  • Contribute expertise across one or more engineering or operational domains
  • Review and refine benchmark quality, difficulty, and validation logic
  • Collaborate with reviewers and researchers on AI evaluation workflows

Job Requirements

  • 3–10 years of experience in software engineering or related technical domains
  • Strong debugging, analytical, and systems reasoning skills
  • Good understanding of system architecture, dependencies, and operational processes
  • Experience with terminal, CLI, automation, or developer tooling workflows
  • Exposure to AI systems, LLMs, benchmarking, or evaluation frameworks is preferred
  • Ability to design technically rigorous and realistic engineering scenarios

Related Job Pages

More Full-stack Engineer Jobs

Tech Lead – AI Lab

Qonto

Qonto is a Paris, Île-de-France, France-based financial services company self-described as the “neobank” for startups, medium businesses, freelancers, and company creators.

  • Join our AI Lab as a Tech Lead and help Qonto build the AI-first product that transforms how 600,000+ businesses manage their finances.
  • Spend most of your time doing what you love, shipping, while providing the technical direction and light mentorship that turns a high-velocity squad into a high-quality one.
  • Working alongside Sophie, our Manager for the AI Lab, you'll set the technical standard for a 13-person cross-functional team as we scale from one AI agent to many.
  • Directly manage 5 engineers across Backend, Web, and Mobile (leading by project and track, not by stack), keeping pace with our ambition: one new agent every 6 weeks.
  • Act as the technical interface between the AI Lab and the broader organisation: understand how features are built across Qonto, bring that knowledge into the Lab, and ensure what we build in agents integrates with the wider product.

All locations: IDF | BE | Louisiana | Maryland | France

GTM Engineer

ShippyPro

Make people work better. The easiest way to ship, track and return your e-commerce orders.

Full Time | Remote | Team 51-200 | H1B No Sponsor

Role Description

You’ve spent years building systems, APIs, automations, and integrations. But you also love GTM, marketing, sales, funnels, and growth. You care about what converts, not just what works. You’re the kind of engineer who opens Claude before Stack Overflow, connects tools for fun, and constantly experiments.

At ShippyPro, you’ll join the Engineering Team and work with Product, Marketing, and Sales to build the systems behind how we acquire, convert, and scale customers globally. This is engineering applied to growth.

Qualifications

  • Several years of engineering experience
  • Strong interest in GTM, marketing, sales, and growth systems
  • Experience with APIs, integrations, automations, and internal tooling
  • Builder mentality: you prefer shipping over overthinking
  • Strong curiosity around AI workflows, agents, MCP ecosystems, and LLM-powered operations
  • Fast execution and experimentation mindset
  • Comfort working in ambiguity and moving quickly

Requirements

  • Build internal GTM tools, automations, and workflows
  • Connect APIs across CRM, product, analytics, marketing, and outbound systems
  • Develop AI-powered workflows and agents
  • Experiment with MCP servers, integrations, and AI tooling
  • Improve attribution, funnel visibility, and lifecycle operations
  • Support rapid experimentation across acquisition and conversion initiatives
  • Identify inefficiencies and solve them through engineering

Benefits

  • Competitive salary between €29,000 and €35,000, calculated through our salary simulator, which is built on objective metrics because we believe in unbiased compensation
  • Meal vouchers (office or remote)
  • Mental healthcare
  • Yearly learning budget and AI tools
  • Remote flexibility with expenses-paid trips to HQ for team meetups
  • No clock-in/out policy and one-time home office allowance
  • Birthday Time Off: one extra day off, just for you
  • Career Growth Program: clear growth paths, structured goals, and continuous feedback
  • An international team that moves fast and cares about building things well

Worldwide
€29K - €35K / year

AI Evaluation Engineer - Software Engineering Domain

Gramian Consulting Group

Gramian Consultancy is a boutique consultancy specializing in IT professional services and engineering talent solutions. With a strong background in software engineering and leadership, we help companies build high-performing teams by matching them with professionals who truly fit their needs.

Role Description

We are looking for highly analytical engineers and technical domain experts to contribute to advanced AI evaluation and benchmarking projects focused on realistic terminal-based and infrastructure-heavy workflows. In this role, you will design technically challenging tasks that evaluate how AI systems reason through debugging, operational failures, complex workflows, and multi-step problem-solving scenarios.

The ideal candidate has strong experience working with production systems, debugging, automation, or large-scale engineering workflows, and can design realistic technical challenges that simulate real-world engineering environments. This role is particularly well suited for professionals with backgrounds in backend engineering, infrastructure, DevOps, data systems, MLOps, cybersecurity, or platform engineering.

  • CONTRACT: Contractor assignment (5 weeks)
  • COMMITMENT: Full-time (40h/week) or Part-time (20h/week) with minimum 4h PST overlap
  • LOCATION: Remote (Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Pakistan, Indonesia, Kenya, Nigeria, Turkey, Vietnam)
  • PROCESS: One technical assessment/interview (~45 min)

Responsibilities

  • Design realistic terminal-based benchmark tasks for AI evaluation systems
  • Create technically deep debugging and investigation scenarios
  • Develop task specifications involving infrastructure, workflows, pipelines, or operational failures
  • Write clear solution approaches and deterministic evaluation criteria
  • Identify realistic edge cases, failure modes, and system constraints
  • Design multi-step reasoning challenges across complex technical environments
  • Contribute expertise across one or more engineering or operational domains
  • Review and refine benchmark quality, difficulty, and validation logic
  • Collaborate with reviewers and researchers on AI evaluation workflows

Qualifications

  • 3–10 years of experience in software engineering or related technical domains
  • Strong debugging, analytical, and systems reasoning skills
  • Good understanding of system architecture, dependencies, and operational processes
  • Experience with terminal, CLI, automation, or developer tooling workflows
  • Exposure to AI systems, LLMs, benchmarking, or evaluation frameworks is preferred
  • Ability to design technically rigorous and realistic engineering scenarios

All locations: India | Brazil | Colombia | Egypt | Pakistan | Indonesia | Bangladesh | Ghana | Kenya | Nigeria

Software Developer

The Hello Team

Managed global staffing across 30 plus countries with enterprise recruiting, oversight, training, and performance manage

Full Time | Remote | Team 1,001-5,000 | Since 2021 | H1B No Sponsor

  • Develop and maintain scalable software features and applications
  • Assist in building and improving management software systems
  • Test, debug, and troubleshoot applications
  • Monitor and maintain system stability and performance
  • Collaborate with the team on product enhancements
  • Participate in code reviews and contribute to development best practices
  • Integrate and maintain APIs and frontend/backend components
  • Document technical processes and workflows

New York