This opportunity is available through a leading AI-driven work platform.
Machine Learning Research Benchmark Consultant
Location
United States
Posted
1 day ago
Salary
$60 - $75 / hour
Seniority
Mid Level
Job Description
24-MAG
Role Description
We are sharing a specialised part-time consulting opportunity for experienced machine learning engineers and researchers with strong practical expertise in ML research workflows, model training, post-training, dataset curation, reinforcement learning, architecture design, evaluation tasks, and sandboxed technical environments. This role supports current and upcoming remote consulting opportunities focused on benchmarking AI agent performance on realistic machine learning research tasks. Selected professionals will complete self-contained ML research tasks under defined time and compute constraints, provide human reference performance for evaluation workflows, and submit structured work outputs that support technical assessment and research-quality analysis.

Key Responsibilities
- Machine Learning Research Task Execution
  - Attempt open-ended machine learning research tasks under fixed time and compute constraints.
  - Work independently in a sandboxed Linux environment using provided compute resources.
  - Apply practical ML engineering and research judgment to self-contained AI R&D tasks.
  - Use preferred development workflows and tools, including IDEs, coding assistants, notebooks, or command-line workflows where permitted.
  - Submit final work products that reflect clear reasoning, technical execution, and reproducible effort.
- Benchmarking & Human Reference Evaluation
  - Serve as a skilled human reference point for evaluating AI agent performance on realistic ML research tasks.
  - Complete tasks using the same constraints and environment conditions defined for evaluation workflows.
  - Support benchmark quality by producing reliable, interpretable, and technically meaningful task attempts.
  - Document decisions, assumptions, implementation choices, and constraints where relevant.
  - Complete short pre-task and post-task questionnaires as part of the evaluation process.
- Technical Workflow, Recording & Quality Control
  - Work in sandboxed technical environments with Linux, internet access, and provided compute resources.
  - Record full working sessions when required for evaluation and review purposes.
  - Follow task-specific confidentiality, NDA, environment, and submission requirements.
  - Debug issues involving code, packages, model training workflows, data processing, runtime behavior, or environment setup.
  - Submit required materials, including final outputs, recordings, questionnaires, and supporting notes where applicable.

Qualifications
- 3+ years of practical machine learning experience, with time spent in a PhD program counting toward this requirement where relevant.
- Hands-on experience with at least one major ML framework such as PyTorch, JAX, or TensorFlow.
- Strong practical ability to complete open-ended ML research or engineering tasks independently.
- Experience working in Linux environments, debugging technical workflows, and managing research-oriented development tasks.
- Ability to reason under time and compute constraints while producing clear, high-quality technical work.
- Strong written communication skills for explaining methods, decisions, and results.
- High attention to detail and comfort following structured evaluation, recording, and submission requirements.
- Availability for at least 20 hours per week if selected, with additional availability considered helpful depending on project needs.

Educational Background
- Strong academic or professional background in machine learning, artificial intelligence, computer science, data science, statistics, engineering, or a related technical field is highly relevant.
- PhD experience, advanced research experience, or comparable industry experience in machine learning may be especially valuable.
- Candidates with experience from highly selective academic programs, major technology companies, AI research teams, or comparable technical environments may be a strong fit.
- Practical research and engineering experience may be considered alongside formal education depending on project requirements.

Nice to Have
- Deep hands-on expertise in one or more of the following areas:
  - Pretraining transformer language models from scratch.
  - Reinforcement learning, PPO, reward shaping, custom gym or gymnasium environments, and throughput tuning.
  - Full fine-tuning, LoRA, QLoRA, DPO, RLHF, RLAIF, distillation, or post-training workflows.
  - Large-scale corpus filtering, deduplication, subsampling, and benchmark contamination avoidance.
  - Architecture design under strict parameter-count or size constraints.
  - Modifying pretrained architectures, including attention patterns, pooling heads, or training objectives.
  - Contrastive training for embedding or retrieval models.
  - Generative vision or video modeling.
  - Multilingual or low-resource language work.
  - Image or video data pipelines at scale.
  - Balancing competing model objectives such as safety and capability.
- Prior experience as an ML evaluator, red-teamer, benchmark contributor, research engineer, or technical baseliner.
- Experience using AI coding assistants in technical workflows while maintaining strong independent judgment.
- Comfort working with confidential project materials and structured technical review processes.

Why This Opportunity
- Apply advanced ML research and engineering expertise to realistic AI R&D benchmarking tasks.
- Serve as a human reference point for evaluating AI agent performance on open-ended technical challenges.
- Work with provided compute and sandboxed environments without requiring personal GPU resources.
- Use practical ML judgment across training, post-training, dataset curation, architecture, reinforcement learning, or evaluation workflows.
- Remote structure with competitive hourly compensation.

Contract Details
- Independent contractor role.
- Fully remote with project-based technical work.
- Minimum expected availability of approximately 20 hours per week if selected, with greater availability preferred depending on project needs.
- Competitive rates of $60–$75 per hour depending on expertise, ML research depth, task performance, and project scope.
- A work-trial-style baseline task may be required before longer-term selection.
- Each assigned baseline task may be attempted only once per contractor.
- Project work may require screen recording, questionnaires, final work product submission, and adherence to confidentiality or NDA requirements.
- Compute and sandboxed technical environments may be provided depending on task scope.
- Weekly payments via Stripe or Wise.
- Projects may be extended, shortened, or adjusted depending on scope and performance.

About the Platform
This opportunity is available through 24-MAG LLC. We connect experienced professionals with remote consulting opportunities across technical, evaluation, and project-based workstreams.

