AI Evaluation Engineer – Software Engineering Domain
Location: Egypt
Posted: 8 days ago
Salary: 0
Seniority: Senior
Job Description
Gramian Consulting
- Design realistic terminal-based benchmark tasks for AI evaluation systems
- Create technically deep debugging and investigation scenarios
- Develop task specifications involving infrastructure, workflows, pipelines, or operational failures
- Write clear solution approaches and deterministic evaluation criteria
- Identify realistic edge cases, failure modes, and system constraints
- Design multi-step reasoning challenges across complex technical environments
- Contribute expertise across one or more engineering or operational domains
- Review and refine benchmark quality, difficulty, and validation logic
- Collaborate with reviewers and researchers on AI evaluation workflows
Job Requirements
- 3–10 years of experience in software engineering or related technical domains
- Strong debugging, analytical, and systems reasoning skills
- Good understanding of system architecture, dependencies, and operational processes
- Experience with terminal, CLI, automation, or developer tooling workflows
- Exposure to AI systems, LLMs, benchmarking, or evaluation frameworks is preferred
- Ability to design technically rigorous and realistic engineering scenarios
More Full-stack Engineer Jobs
Tech Lead – AI Lab
Qonto
Qonto is a Paris, Île-de-France, France-based financial services company self-described as the “neobank” for startups, medium businesses, freelancers, and company creators. Mo
- Join our AI Lab as a Tech Lead and help Qonto build the AI-first product that transforms how 600,000+ businesses manage their finances.
- Spend most of your time doing what you love, shipping, while providing the technical direction and light mentorship that turns a high-velocity squad into a high-quality one.
- Working alongside Sophie, our Manager for the AI Lab, you'll set the technical standard for a 13-person cross-functional team as we scale from one AI agent to many.
- Directly manage 5 engineers across Backend, Web, and Mobile (leading by project and track, not by stack), keeping pace with our ambition: one new agent every 6 weeks.
- Act as the technical interface between the AI Lab and the broader organisation: understand how features are built across Qonto, bring that knowledge into the Lab, and ensure what we build in agents integrates with the wider product.
GTM Engineer
ShippyPro
Make people work better. The easiest way to ship, track and return your e-commerce orders.
Role Description
You’ve spent years building systems, APIs, automations, and integrations. But you also love GTM, marketing, sales, funnels, and growth. You care about what converts, not just what works. You’re the kind of engineer who opens Claude before Stack Overflow, connects tools for fun, and constantly experiments.
At ShippyPro, you’ll join the Engineering Team and work with Product, Marketing, and Sales to build the systems behind how we acquire, convert, and scale customers globally. This is engineering applied to growth.
Qualifications
- Several years of engineering experience
- Strong interest in GTM, marketing, sales, and growth systems
- Experience with APIs, integrations, automations, and internal tooling
- Builder mentality: you prefer shipping over overthinking
- Strong curiosity around AI workflows, agents, MCP ecosystems, and LLM-powered operations
- Fast execution and experimentation mindset
- Comfort working in ambiguity and moving quickly
Requirements
- Build internal GTM tools, automations, and workflows
- Connect APIs across CRM, product, analytics, marketing, and outbound systems
- Develop AI-powered workflows and agents
- Experiment with MCP servers, integrations, and AI tooling
- Improve attribution, funnel visibility, and lifecycle operations
- Support rapid experimentation across acquisition and conversion initiatives
- Identify inefficiencies and solve them through engineering
Benefits
- Competitive salary between €29,000 and €35,000, calculated through our salary simulator (built on objective metrics, because we believe in unbiased compensation)
- Meal vouchers (office or remote)
- Mental healthcare
- Yearly learning budget and AI tools
- Remote flexibility with expenses-paid trips to HQ for team meetups
- No clock-in/out policy and one-time home office allowance
- Birthday Time Off: one extra day off, just for you
- Career Growth Program: clear growth paths, structured goals, and continuous feedback
- An international team that moves fast and cares about building things well
AI Evaluation Engineer - Software Engineering Domain
Gramian Consulting Group
Gramian Consultancy is a boutique consultancy specializing in IT professional services and engineering talent solutions. With a strong background in software engineering and leadership, we help companies build high-performing teams by matching them with professionals who truly fit their needs.
Role Description
We are looking for highly analytical engineers and technical domain experts to contribute to advanced AI evaluation and benchmarking projects focused on realistic terminal-based and infrastructure-heavy workflows. In this role, you will design technically challenging tasks that evaluate how AI systems reason through debugging, operational failures, complex workflows, and multi-step problem-solving scenarios.
The ideal candidate has strong experience working with production systems, debugging, automation, or large-scale engineering workflows, and can design realistic technical challenges that simulate real-world engineering environments. This role is particularly well suited for professionals with backgrounds in backend engineering, infrastructure, DevOps, data systems, MLOps, cybersecurity, or platform engineering.
- CONTRACT: Contractor assignment (5 weeks)
- COMMITMENT: Full-time (40h/week) or Part-time (20h/week) with minimum 4h PST overlap
- LOCATION: Remote (Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Pakistan, Indonesia, Kenya, Nigeria, Turkey, Vietnam)
- PROCESS: One technical assessment/interview (~45 min)
Responsibilities
- Design realistic terminal-based benchmark tasks for AI evaluation systems
- Create technically deep debugging and investigation scenarios
- Develop task specifications involving infrastructure, workflows, pipelines, or operational failures
- Write clear solution approaches and deterministic evaluation criteria
- Identify realistic edge cases, failure modes, and system constraints
- Design multi-step reasoning challenges across complex technical environments
- Contribute expertise across one or more engineering or operational domains
- Review and refine benchmark quality, difficulty, and validation logic
- Collaborate with reviewers and researchers on AI evaluation workflows
Qualifications
- 3–10 years of experience in software engineering or related technical domains
- Strong debugging, analytical, and systems reasoning skills
- Good understanding of system architecture, dependencies, and operational processes
- Experience with terminal, CLI, automation, or developer tooling workflows
- Exposure to AI systems, LLMs, benchmarking, or evaluation frameworks is preferred
- Ability to design technically rigorous and realistic engineering scenarios
Software Developer
The Hello Team
Managed global staffing across 30 plus countries with enterprise recruiting, oversight, training, and performance manage
- Develop and maintain scalable software features and applications
- Assist in building and improving management software systems
- Test, debug, and troubleshoot applications
- Monitor and maintain system stability and performance
- Collaborate with the team on product enhancements
- Participate in code reviews and contribute to development best practices
- Integrate and maintain APIs and frontend/backend components
- Document technical processes and workflows



