NewtonX

Your end-to-end market research partner, built to answer your toughest B2B questions with confidence.

ML Lead, AI Data Labeling

Artificial Intelligence, Full Time, Remote, Senior, Team 51-200, Since 2017, H1B Sponsor

Location

United States

Posted

12 days ago

Salary

$180K - $260K / year

Seniority

Senior

Postgraduate Degree, 5 yrs exp, English, Python

Job Description

  • Serve as the primary technical point of contact for ML, applied science, and product teams at AI-focused clients.
  • Hold your own in technical conversations.
  • Translate ambiguous technical requirements into concrete operational specs.
  • Design and build domain benchmarks for NewtonX-owned domains in high-value verticals.
  • Work directly with the recruiting and operations lead to convert client and benchmark requirements into operational specs.
  • Sit in on client calls alongside Commercial leads.

Job Requirements

  • 5 to 8 years of applied ML experience with substantive evaluation, benchmark, or human data work.
  • Working fluency with modern LLM evaluation: benchmark design, contamination handling, statistical significance, eval harness construction, agentic and tool-use evaluation, RLHF and preference data quality, red-team probe design.
  • Strong programming foundation. You can read and reason about an eval harness, write Python comfortably, work with model APIs, and prototype scoring pipelines.
  • Statistical fluency. You know when an effect is real and when it is noise.
  • Demonstrated client-facing presence.
  • Light commercial instinct.
  • Strong written communication.

Benefits

  • Excellent medical, dental, and vision insurance.
  • 401k match with immediate vesting.
  • Health savings and flexible spending accounts, and pre-tax commuter benefits.
  • Paid time off: vacation, holidays, sick, and parental leave.
  • A diverse, collaborative, and positive culture where we invest in and celebrate each other's success (happy hours, team projects, and retreats).
