Firecrawl

Firecrawl is a San Francisco, California-based software company whose core mission is to make web data extraction fast, reliable, and accessible for AI and LLM

Research Engineer (Focused on RL)

Location

United States

Posted

79 days ago

Salary

$180K - $270K / year

Seniority

Mid Level

Job Description

You'll bring reinforcement learning to Firecrawl's core product: building the training infrastructure, reward pipelines, and fine-tuning systems that make our models meaningfully better at extracting, understanding, and structuring web data. This isn't theoretical RL research. You'll build your own training infra, run fast experiments, ship models to production, and bridge the gap between classical RL approaches and modern LLM agent systems. If you care as much about training throughput as you do about reward design, this is the role.

Salary Range: $180,000–$270,000/year (Range shown is for U.S.-based employees. Compensation outside the U.S. is adjusted fairly based on your country's cost of living. You can explore how we calculate this here: https://www.firecrawl.dev/careers/compensation.)
Equity Range: Up to 0.15%
Location: San Francisco, CA (Preferred) OR Remote (Americas)
Job Type: Full-Time
Experience: 3+ years in applied RL, ML engineering, or model training, with production systems
Visa: US Citizenship/Visa required for SF; N/A for Remote

About Firecrawl

Firecrawl is the easiest way to extract data from the web. Developers use us to reliably convert URLs into LLM-ready markdown or structured data with a single API call. In just a year, we've hit 8 figures in ARR and 90k+ GitHub stars by building the fastest way for developers to get LLM-ready data. We're a small, fast-moving, technical team building essential infrastructure super-intelligence will use to gather data on the web. We ship fast and deep.

What You'll Do

- Build training infrastructure and reward pipelines from scratch: Design and operate the systems that train and evaluate Firecrawl's models. You'll own the full loop: data collection, reward modeling, training runs, evaluation, and deployment. You build the infra yourself because you're the one who needs it to work.
- Fine-tune models to achieve state-of-the-art results: Take foundation models and make them dramatically better at web data extraction, content understanding, and structured output generation. You know how to get from "decent fine-tune" to "best-in-class" and you have the patience and rigor to close that gap.
- Bridge LLM agents and classical RL: The most interesting problems at Firecrawl sit at the intersection of modern LLM-based agents and classical RL techniques. You'll design reward signals for agent behaviors, apply RL methods to improve multi-step agent workflows, and figure out where traditional RL approaches outperform prompting, and vice versa.
- Run fast experiments and iterate: You design experiments that test meaningful hypotheses, run them quickly, and make decisions based on results. You don't spend weeks on experiment infrastructure before getting a single result. Speed of iteration is a core part of how you work.
- Communicate clearly to non-RL people: RL can be opaque. You translate your work into language that engineers, product people, and leadership can understand and act on. You know how to explain why a reward function matters without requiring everyone to read the paper.
- Collaborate across the research team: Work closely with the Head of Research and the Search/IR-focused Research Engineer to connect RL improvements with search, ranking, and the broader product strategy.

What We're Looking For

Someone who builds their own training infra and reward pipelines. You don't wait for an ML platform team to set things up. You build the training loops, reward models, data pipelines, and evaluation frameworks yourself, because you understand that the infra choices directly affect the quality of the results. You've operated GPU clusters, managed training runs, and debugged convergence issues in production.

Can fine-tune models to achieve SOTA results. You've taken models from baseline to best-in-class on tasks that matter.
You understand the full fine-tuning lifecycle (data curation, training dynamics, hyperparameter sensitivity, evaluation methodology), and you have the taste to know when a model is actually good versus when the eval is flattering.

Bridges LLM agents and classical RL approaches. You're fluent in both worlds. You understand PPO, RLHF, reward modeling, and policy optimization, and you also understand how modern LLM agents work, where they fail, and how RL techniques can make them better. You see connections between these domains that most people miss.

Runs fast experiments and communicates clearly. You bias toward quick iterations over perfect setups. You'd rather run three rough experiments this week than one polished one next month. And when you have results, you communicate them clearly: to other researchers, to engineers, and to leadership. No one needs to decode your work to understand its impact.

Production-minded. You care about whether your models actually work in production, not just on benchmarks. You've deployed models that serve real traffic and you've made hard tradeoffs between model quality, latency, and cost. Research that doesn't ship isn't research that matters here.

Backgrounds that tend to do well: RL engineers at AI labs or applied ML teams who've shipped models to production. Researchers who've done RLHF or reward modeling for LLM systems. ML engineers who've built training infrastructure at startups and cared as much about the pipeline as the model. People who've worked at the intersection of RL and language models, whether in academic labs with a production bent or at companies building agent systems.

What We're NOT Looking For

Pure theorists. If your best RL work is a proof in a paper and you've never trained a model on real data at real scale, this isn't the role. We need someone who builds and ships, not someone who derives.

Researchers who need a platform team.
If you expect training infrastructure, data pipelines, and evaluation frameworks to be set up for you before you can be productive, you'll be frustrated here. You build the tools you need.

People who only know one paradigm. If you're deep in classical RL but have never worked with LLMs, or if you're an LLM fine-tuner who's never touched RL, you'll be missing half the picture. This role requires fluency in both.

Slow iterators. If your standard experiment cycle is measured in weeks, not days, you'll struggle with the pace here. We need someone who can run a meaningful experiment, interpret results, and decide next steps within a day or two, not someone who needs a month-long study to make a call.

Black-box communicators. If your typical update is a wall of metrics that only another RL researcher can interpret, this isn't the right fit. We need someone who can explain what's working, what's not, and why it matters to people who don't have RL PhDs.

A Note On Pace

We operate at an absurd level of urgency because the window for what we're building won't stay open forever. If that excites you, keep reading. If it doesn't, no hard feelings, but this role probably isn't for you.

Benefits & Perks

Available to all employees

- Salary that makes sense: $180,000–$270,000/year, based on impact, not tenure
- Own a piece: Up to 0.15% equity in what you're helping build
- Generous PTO: 15 days mandatory, anything after 24 days, just ask (holidays excluded); take the time you need to recharge
- Parental leave: 12 weeks fully paid, for moms and dads
- Wellness stipend: $100/month for the gym, therapy, massages, or whatever keeps you human
- Learning & Development: Expense up to $1,000/year toward anything that helps you grow professionally
- Team offsites: A change of scenery, minus the trust falls
- Sabbatical: 3 paid months off after 4 years, do something fun and new

Available to US-based full-time employees

- Full coverage, no red tape: Medical, dental, and vision (100% for employees, 50% for spouse/kids); no weird loopholes, just care that works
- Life & Disability insurance: Employer-paid short-term disability, long-term disability, and life insurance; coverage for life's curveballs
- Supplemental options: Optional accident, critical illness, hospital indemnity, and voluntary life insurance for extra peace of mind
- Doctegrity telehealth: Talk to a doctor from your couch
- 401(k) plan: Retirement might be a ways off, but future-you will thank you
- Pre-tax benefits: Access to FSAs and commuter benefits (US-only) to help your wallet out a bit
- Pet insurance: Because fur babies are family too

Available to SF-based employees

- SF HQ perks: Snacks, drinks, team lunches, intense ping pong, and peak startup energy
- E-Bike transportation: A loaner electric bike to get you around the city, on us

Interview Process

- Application Review: Send us your stuff + a quick note on why this excites you. Show us what you've trained: models, reward systems, training pipelines. Published work is great; shipped production models are better.
- Technical Deep Dive (~60 min): Go deep on RL and model training work you've done: training infrastructure decisions, reward design, fine-tuning approaches, and production deployment. We'll explore a live problem: how you'd approach applying RL to improve an LLM agent workflow at Firecrawl. We're looking for depth across both classical RL and modern LLM techniques, production instincts, and fast reasoning.
- Founder Chat (~30 min): Culture, pace, ownership, and how you like to work. Time for your questions too.
- Paid Work Trial (1–2 weeks): Test drive the real thing: tackle a real RL/fine-tuning problem with production implications. We'll evaluate on technical depth, experiment velocity, and how clearly you communicate results.
- Decision: We move fast after the trial.

If you want to bring RL to one of the most interesting applied problems in AI, making agents better at understanding and extracting web data at scale, this is your shot.

👉 Apply now and let's train something great. 🧠
