Job Closed

This listing is no longer active.

Gramian Consulting

We get talents. You get results.

AI Evaluation Engineer – Software Engineering Domain

Full-stack Engineer, Software Engineer, Contract, Remote, Senior, Team 2-10, Since 2025, H1B No Sponsor

Location

Egypt

Posted

59 days ago

Seniority

Senior

Bachelor Degree, 3 yrs exp, English

Job Description

• Design realistic terminal-based benchmark tasks for AI evaluation systems
• Create technically deep debugging and investigation scenarios
• Develop task specifications involving infrastructure, workflows, pipelines, or operational failures
• Write clear solution approaches and deterministic evaluation criteria
• Identify realistic edge cases, failure modes, and system constraints
• Design multi-step reasoning challenges across complex technical environments
• Contribute expertise across one or more engineering or operational domains
• Review and refine benchmark quality, difficulty, and validation logic
• Collaborate with reviewers and researchers on AI evaluation workflows

Job Requirements

3–10 years of experience in software engineering or related technical domains
Strong debugging, analytical, and systems reasoning skills
Good understanding of system architecture, dependencies, and operational processes
Experience with terminal, CLI, automation, or developer tooling workflows
Exposure to AI systems, LLMs, benchmarking, or evaluation frameworks is preferred
Ability to design technically rigorous and realistic engineering scenarios

More Full-stack Engineer Jobs

Tech Lead – AI Lab

Qonto

The finance solution that energizes SMEs and freelancers

Full-stack Engineer, 59 days ago

Full Time, Remote, Team 501-1,000, H1B No Sponsor

• Join our AI Lab as a Tech Lead and help Qonto build the AI-first product that transforms how 600,000+ businesses manage their finances.
• Spend most of your time doing what you love (shipping) while providing the technical direction and light mentorship that turns a high-velocity squad into a high-quality one.
• Working alongside Sophie, our Manager for the AI Lab, you'll set the technical standard for a 13-person cross-functional team as we scale from one AI agent to many.
• Directly manage 5 engineers across Backend, Web, and Mobile, leading by project and track rather than by stack, and keeping pace with our ambition: one new agent every 6 weeks.
• Act as the technical interface between the AI Lab and the broader organisation: understand how features are built across Qonto, bring that knowledge into the Lab, and ensure what we build in agents integrates with the wider product.

Python Ruby Go

IDF + 4 more

Job Closed

GTM Engineer

ShippyPro

Make people work better. The easiest way to ship, track and return your e-commerce orders.

Full-stack Engineer, 59 days ago

Full Time, Remote, Team 51-200, H1B No Sponsor

Role Description

You’ve spent years building systems, APIs, automations, and integrations. But you also love GTM, marketing, sales, funnels, and growth. You care about what converts, not just what works. You’re the kind of engineer who opens Claude before Stack Overflow, connects tools for fun, and constantly experiments.

At ShippyPro, you’ll join the Engineering Team and work with Product, Marketing, and Sales to build the systems behind how we acquire, convert, and scale customers globally. This is engineering applied to growth.

Qualifications

- Several years of engineering experience
- Strong interest in GTM, marketing, sales, and growth systems
- Experience with APIs, integrations, automations, and internal tooling
- Builder mentality: you prefer shipping over overthinking
- Strong curiosity around AI workflows, agents, MCP ecosystems, and LLM-powered operations
- Fast execution and experimentation mindset
- Comfort working in ambiguity and moving quickly

Requirements

- Build internal GTM tools, automations, and workflows
- Connect APIs across CRM, product, analytics, marketing, and outbound systems
- Develop AI-powered workflows and agents
- Experiment with MCP servers, integrations, and AI tooling
- Improve attribution, funnel visibility, and lifecycle operations
- Support rapid experimentation across acquisition and conversion initiatives
- Identify inefficiencies and solve them through engineering

Benefits

- Competitive salary between €29,000 and €35,000, calculated through our salary simulator, which is built on objective metrics because we believe in unbiased compensation
- Meal vouchers (office or remote)
- Mental healthcare
- Yearly learning budget and AI tools
- Remote flexibility with expenses-paid trips to HQ for team meetups
- No clock-in/out policy and one-time home office allowance
- Birthday Time Off: one extra day off, just for you
- Career Growth Program: clear growth paths, structured goals, and continuous feedback
- An international team that moves fast and cares about building things well

Google Tag Manager AI LLM CRM

Worldwide

€29K - €35K / year

Job Closed

AI Evaluation Engineer - Software Engineering Domain

Gramian Consulting Group

Gramian Consultancy is a boutique consultancy specializing in IT professional services and engineering talent solutions. With a strong background in software engineering and leadership, we help companies build high-performing teams by matching them with professionals who truly fit their needs.

Full-stack Engineer, 59 days ago

Contract, Remote

Role Description

We are looking for highly analytical engineers and technical domain experts to contribute to advanced AI evaluation and benchmarking projects focused on realistic terminal-based and infrastructure-heavy workflows. In this role, you will design technically challenging tasks that evaluate how AI systems reason through debugging, operational failures, complex workflows, and multi-step problem-solving scenarios.

The ideal candidate has strong experience working with production systems, debugging, automation, or large-scale engineering workflows, and can design realistic technical challenges that simulate real-world engineering environments. This role is particularly well suited for professionals with backgrounds in backend engineering, infrastructure, DevOps, data systems, MLOps, cybersecurity, or platform engineering.

- CONTRACT: Contractor assignment (5 weeks)
- COMMITMENT: Full-time (40h/week) or Part-time (20h/week) with minimum 4h PST overlap
- LOCATION: Remote (Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Pakistan, Indonesia, Kenya, Nigeria, Turkey, Vietnam)
- PROCESS: One technical assessment/interview (~45 min)

Responsibilities

- Design realistic terminal-based benchmark tasks for AI evaluation systems
- Create technically deep debugging and investigation scenarios
- Develop task specifications involving infrastructure, workflows, pipelines, or operational failures
- Write clear solution approaches and deterministic evaluation criteria
- Identify realistic edge cases, failure modes, and system constraints
- Design multi-step reasoning challenges across complex technical environments
- Contribute expertise across one or more engineering or operational domains
- Review and refine benchmark quality, difficulty, and validation logic
- Collaborate with reviewers and researchers on AI evaluation workflows

Qualifications

- 3–10 years of experience in software engineering or related technical domains
- Strong debugging, analytical, and systems reasoning skills
- Good understanding of system architecture, dependencies, and operational processes
- Experience with terminal, CLI, automation, or developer tooling workflows
- Exposure to AI systems, LLMs, benchmarking, or evaluation frameworks is preferred
- Ability to design technically rigorous and realistic engineering scenarios

India + 9 more

Job Closed

Software Developer

The Hello Team

Managed global staffing across 30 plus countries with enterprise recruiting, oversight, training, and performance manage

Full-stack Engineer, 59 days ago

Full Time, Remote, Team 1,001-5,000, Since 2021, H1B No Sponsor

• Develop and maintain scalable software features and applications
• Assist in building and improving management software systems
• Test, debug, and troubleshoot applications
• Monitor and maintain system stability and performance
• Collaborate with the team on product enhancements
• Participate in code reviews and contribute to development best practices
• Integrate and maintain APIs and frontend/backend components
• Document technical processes and workflows

React TypeScript

New York