Reddit, Inc.

Dive into anything

Staff Research Engineer – Post-training & Evaluation

Research Engineer · Full Time · Remote · Lead · Team 501-1,000 · Since 2005 · H1B No Sponsor

Location

United States

Posted

122 days ago

Salary

$230K - $322K / year

Seniority

Lead

Postgraduate Degree · 6 yrs exp · Experience accepted · English · Python · PyTorch

Job Description

• Define the 'Reddit Benchmark' evaluation standard: Own the methodology (not just the harness) for rigorously measuring model quality across safety, reasoning, representation/retrieval, and Reddit-specific knowledge. Decide what 'Reddit-native' means in measurable terms and set the bar the org trains against.
• Own evaluation reliability and statistical rigor: Establish the science behind trustworthy evals, including judge variance, multi-sample scoring, inter-rater/inter-sample agreement, sampling and temperature effects, and calibration of automated judges. You are accountable for whether a benchmark delta is real or noise. Drive the practice of evaluation as a release gate, both offline against frozen datasets and pre-merge in CI/CD, so regressions are caught before endpoints ship.
• Design model-as-a-judge methodology: Own judge selection, prompt design, calibration, and reliability for automated evaluation using frontier external models, enabling rapid, trustworthy iteration cycles.
• Set post-training recipes and strategy: Design SFT recipes (data mixtures, curriculum, ablation strategy) that convert base models into helpful, well-aligned endpoints; partner with engineering to scale them.
• Evaluate base and CPT checkpoints, not just endpoints: Design checkpoint-selection methodology across CPT experiments and learning-rate (LR) studies, so we pick the right base before committing post-training compute.
• Drive synthetic data generation strategy: Define and curate high-quality instruction and evaluation sets to improve generalization where human data is scarce.
• Partner with Safety Engineering: Translate high-level safety policy into concrete classification metrics, probe sets, and CI/CD unit tests, including precision/recall at threshold, label-noise handling, and false-positive taxonomy for abuse detection (HHV).
• Diagnose post-training instability: Dive into loss curves and eval logs to identify alignment tax and capability degradation, and recommend the fix.
• Lead research direction: Set technical direction for evaluation and post-training across the team, mentor engineers and scientists, and represent the work internally (and externally where appropriate).

Job Requirements

6+ years of professional ML experience (or a PhD plus 4+ years) with a direct focus on LLM post-training and evaluation.
PhD or MS in CS, ML, NLP, IR, or a related quantitative field — or equivalent industry research experience.
Deep expertise in evaluation reliability: judge/sample variance, multi-sample scoring, calibration, statistical significance, and the failure modes of automated evaluation.
Strong experience building custom, domain-specific evaluation harnesses (e.g., lm-eval-harness, Inspect AI, LightEval) — you know the strengths and limits of benchmarks like MMLU and GSM8K and when they don't apply, and you treat eval sets as versioned, frozen, regression-tracked code.
Experience evaluating both generation and representation/classification: model-as-a-judge for generative quality; precision/recall, PR-AUC, and retrieval/MTEB-style metrics for representation and classification; plus gold-label denoising and label-noise handling.
Deep understanding of Continuous Pre-training (CPT), Instruction Tuning (SFT), and how data quality shapes model behavior.
Fluency in Python; strong data-pipeline and eval-harness engineering (e.g., Hugging Face Transformers, vLLM, lm-eval-harness). Working knowledge of PyTorch and distributed training (FSDP2, DeepSpeed ZeRO-3) sufficient to direct and debug post-training runs.

Benefits

Comprehensive Healthcare Benefits and Income Replacement Programs
401k with Employer Match
Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving support
Family Planning Support
Gender-Affirming Care
Mental Health & Coaching Benefits
Flexible Vacation & Paid Volunteer Time Off
Generous Paid Parental Leave

More Research Engineer Jobs

Machine Learning Research Engineer

Miro

We’re a visual workspace for innovation, built for distributed teams of any size.

Research Engineer · 125 days ago

Full Time · Remote · Team 1,001-5,000 · H1B Sponsor

• Design, train, and ship production-grade ML models (including deep learning, NLP, and computer vision systems) that solve complex business problems and power core product features.
• Conduct deep exploratory research on massive datasets to uncover novel patterns in user behavior and content creation, translating raw data insights into new predictive modeling opportunities.
• Apply advanced fine-tuning strategies (e.g., PEFT, LoRA) to adapt state-of-the-art foundation models to specific domain tasks, rigorously experimenting to maximize performance.
• Architect scalable ML pipelines for data processing, feature engineering, training, and evaluation, ensuring high data quality and system reliability.
• Optimize model performance for latency, throughput, and resource utilization, balancing model complexity against generalization (overfitting vs. underfitting) and production constraints such as compute efficiency.
• Collaborate cross-functionally with data engineers, product managers, and software engineers to translate business requirements into technical ML specifications and integrate models into user-facing applications.
• Champion MLOps excellence by automating deployment workflows, implementing CI/CD for ML, and establishing robust monitoring for model drift and health.
• Stay at the forefront of ML research, evaluating novel algorithms and techniques (e.g., Transformer architectures, quantization) to drive innovation and technical strategy.

NumPy · Pandas · Python · PyTorch · scikit-learn · TensorFlow

Denmark

Job Closed

Research Engineer Intern

GenBio AI

GenBio.AI, Inc. (GenBio AI) is an innovative global startup dedicated to developing the world's first AI-driven Digital Organism, an integrated system of multiscale foundation models for predicting, simulating, and programming biology at all levels. Our goal is to achieve a comprehensive, actionable empirical understanding of the mechanisms underlying all organismal physiologies and diseases. This will pave the way for a new paradigm in drug design, bio-engineering, personalized medicine, and fundamental biomedical research, all powered by Generative Biology. Our founding team consists of world-renowned scientists and researchers in AI and Biology from prestigious institutions such as CMU, MBZUAI, and WIS, and the company is backed by prominent financial investors. GenBio AI, a true global effort from day one, is establishing offices in Palo Alto, Paris, and Abu Dhabi.

Research Engineer · 125 days ago

Other · Remote · Team 29 · Since 2024

Headquartered in Silicon Valley, we are a newly established start-up where a collective of visionary scientists, engineers, and entrepreneurs is dedicated to transforming the landscape of biology and medicine through the power of generative AI. Our team comprises leading minds and innovators in AI and biological science, pushing the boundaries of what is possible. We are dreamers who reimagine a new paradigm for biology and medicine. We are committed to decoding biology holistically and enabling the next generation of life-transforming solutions. As the first mover in pan-modal Large Biological Models (LBMs), we are pioneering a new era of biomedicine, with our LBM training leading to ground-breaking advancements and a transformative approach to healthcare. Our robust R&D team and leadership in LLMs and generative AI position us at the forefront of this revolutionary field. With headquarters in Silicon Valley, California, and branch offices in Paris and Abu Dhabi, we are poised to make a global impact. Join us as we embark on this journey to redefine the future of biology and medicine through the transformative power of generative AI.

Job Description: You will work with the team to conduct cutting-edge research in AI, foundation models, and computational biology. Your primary tasks will include improving existing models and exploring new methodologies to advance our AI capabilities in biology. You will collaborate with the team on designing and executing large-scale experiments, analyzing complex datasets, and applying statistical techniques to validate the performance and robustness of AI systems. Additionally, you will work closely with AI/machine learning researchers and computational biologists to develop GenBio AI’s state-of-the-art biology foundation models and drive the research agenda to generate impact.

Qualification:

C · Hugging Face · JAX · Python · PyTorch

California + 2 more

ML Research Engineer - GPUs go Brrr

Achira

Achira is building atomistic foundation simulation models to power the future of drug discovery.

Research Engineer · 125 days ago

Other · Remote · Team 12 · Since 2024

Why Achira

Join a world-class team of scientists, ML researchers, and engineers working together to make the physical microcosm predictable and reshape the future of drug discovery. Move beyond the beaten path: we are actively exploring the next frontier of model architectures for AI x chemistry. Operate at frontier scale: massive compute, massive data, and massive ambition. Own impactful work end-to-end: from ideation to architecture to deployment on large-scale infrastructure. Work in an environment that rewards rigor, speed, execution, and an ownership mindset.

About the Role

Achira is building best-in-class foundation models to solve the most challenging problems in simulation for drug discovery and beyond. Atomistic foundation simulation models (FSMs), which serve as world models of the physical microcosm, span machine learning interatomic potentials (MLIPs), neural network potentials (NNPs), and diverse classes of generative models. We're looking for a rare individual who thrives at the intersection of cutting-edge deep learning architectures and high-performance computing. You will help shape the future of molecular machine learning by engineering high-efficiency implementations of advanced architectures for molecular densities, graph neural networks (GNNs), and beyond, pushing past the limits of what’s currently possible with today's hardware. Foundation simulation models hold immense promise in materials science and drug discovery, but remain underutilized. At Achira, you’ll have the opportunity to change that, enabling models that understand and simulate the physical world at atomic resolution with unprecedented speed and fidelity.

What You’ll Do

Architect & Integrate: Implement state-of-the-art Graph Transformers, GNNs, and similar geometric deep learning architectures into production-ready pipelines.
Optimize Deeply: Drive end-to-end performance, from high-level implementations in PyTorch / JAX down to hand-tuned CUDA kernels, to extract maximum throughput, minimize memory footprints, and eliminate GPU compute bubbles.
Scale Intelligently: Help scale training and inference workloads across thousands (and eventually tens of thousands) of GPUs, maximizing FLOP utilization, saturating caches, and pushing hardware to its limits.
Collaborate Closely: Work alongside scientists and ML researchers to identify, evaluate, and develop novel architectures with superior inductive biases for molecular modeling.
Simulate Precisely: Hone our models to simulate molecular systems with unprecedented speed and accuracy, enabling breakthroughs in drug design, protein modeling, and beyond.
Automate Workflows: Use generative coding tools to accelerate your work and ultimately automate your optimization workflows.

About You

You're equally excited about writing PyTorch / JAX prototype code and tuning custom CUDA kernels for warp-level parallelism. You have strong opinions about cache hierarchies, tensor fusion, and memory-bound vs. compute-bound workloads, and you know when to profile rather than guess. You’re energized by new architectures in the GNN and equivariance space, but never let the code get sloppy; you build for reusability, clarity, and scale. You’re curious about bleeding-edge as well as established technologies like Triton, TensorRT, TorchInductor, and NVIDIA Warp, and not afraid to dive in and try them out. You believe performance is a feature and love seeing models train and infer faster. You have a sense of relentless urgency and are a natural collaborator who values team success. You want to work within a well-funded, bold, talent-dense organization to do your best work and focus on transformational impact against some of the world’s hardest technical problems.

Eligibility

In compliance with United States federal law, all persons hired will be required to verify identity and eligibility to work in the United States and to provide required employment eligibility verification documentation upon hire.

JAX · PyTorch

California + 1 more

Job Closed

Senior Research Engineer

Mem0

The Memory layer for your AI apps and agents.

Research Engineer · 127 days ago

Other · Remote · Team 1-10 · Since 2023 · H1B No Sponsor

Role Summary: Own the end-to-end lifecycle of memory features, from research to production. You’ll fine-tune models for extraction, updates, consolidation/forgetting, and conflict resolution; turn customer pain points into research hypotheses; implement and benchmark ideas from papers; and ship with Engineering at SOTA latency, reliability, and cost. You’ll also build evaluation at scale (offline metrics + online A/B tests) and close the loop with real-world feedback to continuously improve quality.

What You'll Do:
• Fine-tune and train models for memory extraction, updates, consolidation/forgetting, and conflict resolution; iterate based on data and outcomes.
• Read, reproduce, and implement research: quickly prototype paper ideas, benchmark against baselines, and productionize what wins.
• Build evaluation at scale: automated relevance/accuracy/consistency metrics, gold sets, online A/B testing and interleaving, and clear dashboards.
• Work closely with customers to uncover pain points, turn them into research hypotheses, and validate solutions through field trials.
• Partner with Engineering to ship: design APIs and data contracts, plan safe rollouts, and maintain SOTA latency, reliability, and cost at scale.

Minimum Qualifications

Python · PyTorch

California + 1 more

Job Closed
