Job Closed

This listing is no longer active.

Gramian Consulting

We get talents. You get results.

AI Evaluation Engineer – Mathematics & Algorithms

Artificial Intelligence · Contract · Remote · Senior · Team 2-10 · Since 2025 · H1B No Sponsor

Location

Pakistan

Posted

40 days ago

Salary

0

Seniority

Senior

Bachelor Degree · 5 yrs exp · English · Docker · NumPy · Python

Job Description

• Design and build **multi-agent benchmark tasks** requiring multi-step mathematical reasoning and algorithmic problem-solving
• Create **complex, decomposable problems** across domains such as:
  - Competition mathematics
  - Numerical analysis
  - Combinatorial optimization
  - Statistical inference
• Develop **verification scripts** to validate:
  - Numerical outputs (with tolerance thresholds)
  - Proof correctness and logical steps
  - Algorithmic outputs and constraints
• Write **clear, structured problem statements** with precise notation and defined outputs
• Design **task decomposition strategies** for parallel or multi-agent execution
• Implement computational solutions and validation pipelines using Python
• Work with containerized environments (Docker) for reproducibility and evaluation
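
As an illustration of the tolerance-based numerical checks mentioned above, a minimal sketch might look like the following. The function name `verify_numeric`, the default tolerances, and the sample task (reference answer √2) are assumptions made for this example; the listing does not describe the actual verification tooling.

```python
import math


def verify_numeric(submitted, expected, rel_tol=1e-9, abs_tol=1e-12):
    """Return True when `submitted` matches `expected` within the tolerances.

    Hypothetical helper: the name and default tolerances are illustrative only.
    """
    # NaN and infinite values never count as a valid match.
    if not (math.isfinite(submitted) and math.isfinite(expected)):
        return False
    # math.isclose combines a relative and an absolute tolerance.
    return math.isclose(submitted, expected, rel_tol=rel_tol, abs_tol=abs_tol)


# Hypothetical task whose reference answer is sqrt(2).
reference = math.sqrt(2)
accepted = verify_numeric(1.41421356237, reference)  # differs by about 3e-12
rejected = verify_numeric(1.4142, reference)         # differs by about 1.4e-5
print(accepted, rejected)
```

Combining a relative tolerance with a small absolute tolerance keeps the check meaningful both for large reference values and for values near zero, where a purely relative comparison breaks down.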

Job Requirements

  • 5+ years of experience in mathematics, quantitative research, or computational science, with a background in competition math or university-level mathematics
  • Python programming: NumPy, SciPy, or symbolic computation (SymPy)
  • Experience writing mathematical proofs or formal derivations.
  • Ability to create problems with precise, verifiable answers — not subjective or open-ended.
  • Experience with AI coding benchmarks (SWE-bench, Terminal-bench)
  • Comfortable with Docker — writing Dockerfiles, building images, and debugging container issues.
  • Understanding of numerical methods — floating point tolerance, convergence criteria, error bounds.

Nice to Have

  • Experience creating competition math problems (AMC, AIME, Putnam, IMO)
  • Background in **theoretical computer science or advanced mathematics research**
  • Exposure to **automated theorem proving or formal verification**
  • Familiarity with AI reasoning benchmarks (GSM8K, MATH, AIME, GPQA, ARC-AGI)
  • Experience in **large-scale numerical or scientific computing**
