Applied Data Scientist, LLM Evaluation
Location
Texas
Posted
51 days ago
Salary
$175K - $275K / year
Seniority
Senior
Job Description
Driver
- Own the LLM evaluation strategy at Driver, from first principles to production infrastructure.
- Define quality metrics and build evaluation datasets.
  - Establish what 'good' looks like for each content type across the pipeline.
  - Build and curate gold-standard evaluation datasets across languages and repo archetypes (monorepos, microservices, libraries, applications).
  - Design rubrics that capture accuracy, completeness, usefulness, and readability.
- Build benchmarking and experimentation infrastructure.
  - Create automated evaluation pipelines that score output against reference datasets.
  - Instrument the content generation pipeline to support A/B comparisons: run the same codebase through two strategies and compare results.
  - Build tooling for LLM-as-judge evaluation and regression detection.
  - Integrate evaluation into CI so pipeline changes come with quality evidence.
- Develop automated quality signals at scale.
  - Build quality checks that flag degraded output without requiring human review of every document.
  - Monitor content quality trends over time.
  - Design sampling strategies for human review that maximize signal with minimal annotation effort.
- Quantify tradeoffs and inform decisions.
  - Run experiments on model selection, context strategies, and pipeline architecture changes.
  - Quantify cost/quality/latency tradeoffs.
  - Partner with the engineering team to turn evaluation insights into shipped improvements.
Job Requirements
- Bachelor's, Master's, or PhD in Statistics, Machine Learning, Data Science, Computational Linguistics, or a related quantitative field.
- At least 3 to 5 years in applied science, ML engineering, or data science roles with a focus on evaluation, NLP, or generative AI. 7+ years of experience preferred.
- Strong statistical foundations: experimental design, hypothesis testing, confidence intervals, effect sizes, power analysis.
- Experience designing and running evaluations for LLM or NLP systems — you've thought carefully about what 'better' means when outputs are open-ended text.
- Proficient in Python and the scientific/data stack (pandas, NumPy, scipy, sklearn).
- Comfortable working in Jupyter notebooks for exploration and prototyping, and turning that work into automated pipelines.
- Experience with LLM-as-judge approaches, inter-annotator agreement, and rubric design for subjective quality assessment.
- Familiarity with the practical challenges of non-deterministic systems: variance decomposition, multi-run methodology, distinguishing signal from noise at scale.
- Strong data storytelling — you can turn experiment results into clear recommendations that drive engineering and product decisions.
Benefits
- Competitive Compensation Packages - Cash & Equity
- Flexible Work Culture
- Unlimited Time Off + 12 Paid Company Holidays
- Insurance - Health, Dental, & Vision
- Life Insurance & FSA Accounts
- 401(k) Retirement Accounts - Traditional, Roth, or Both
- Quarterly Team Offsites
More Data Scientist Jobs
About Airwallex
Airwallex is the only unified payments and financial platform for global businesses. Powered by our unique combination of proprietary infrastructure and software, we empower over 200,000 businesses worldwide (including Brex, Rippling, Navan, Qantas, SHEIN and many more) with fully integrated solutions to manage everything from business accounts, payments, spend management and treasury, to embedded finance at a global scale. Proudly founded in Melbourne, we have a team of over 2,000 of the brightest and most innovative people in tech across 26 offices around the globe. Valued at US$8 billion and backed by world-leading investors including T. Rowe Price, Visa, Mastercard, Robinhood Ventures, Sequoia, Salesforce Ventures, DST Global, and Lone Pine Capital, Airwallex is leading the charge in building the global payments and financial platform of the future. If you're ready to do the most ambitious work of your career, join us.
Attributes We Value
We hire successful builders with founder-like energy who want real impact, accelerated learning, and true ownership. You bring strong role-related expertise and sharp thinking, and you're motivated by our mission and operating principles. You move fast with good judgment, dig deep with curiosity, and make decisions from first principles, balancing speed and rigor. You're humble and collaborative, you turn zero-to-one ideas into real products, and you "get stuff done" end-to-end. You use AI to work smarter and solve problems faster. Here, you'll tackle complex, high-visibility problems with exceptional teammates and grow your career as we build the future of global banking. If that sounds like you, let's build what's next.
Growth Data Science Team
The Growth Data Science team sits at the center of Airwallex's growth engine, turning data and AI into a durable competitive advantage. We build and scale advanced solutions, from predictive modeling and personalization to LLM-powered systems, incentive optimization, and causal inference, to unlock step-change growth across the customer lifecycle.
The Role
We're seeking a Senior Data Scientist to partner closely with the Growth and Marketing teams to design, deploy, and iterate on high-impact data science solutions that drive measurable business outcomes. In this role, you will apply machine learning models to influence how we acquire, engage, and retain customers globally, and your work will directly shape Airwallex's growth strategy. You are naturally curious, business-oriented, and have a proven track record of translating data insights into actionable recommendations that strengthen brand presence and accelerate growth.
Why Join Now
You'll join the Airwallex team during a critical period, where we are building a team that enables both global consistency in process, tooling and methodology and local flexibility and speed in execution. This is a great opportunity to shape the next phase of Airwallex's growth.
Responsibilities
- Develop predictive models to identify high-value users for Marketing and Commercial teams.
- Apply machine learning and/or statistical models to measure the causal impact of marketing initiatives.
- Develop personalization solutions to improve client onboarding efficiency and to optimize referral incentives and product cross-selling.
- Develop churn models to drive the growth retention strategy.
- Partner with Marketing, Commercial, Engineering, and cross-functional teams to inform, influence, support, and execute GTM strategy and investment decisions.
Qualifications
- 5+ years of industry experience and an advanced degree (PhD or MS) in a quantitative field (e.g. Statistics, Engineering, Sciences, Computer Science, Economics).
- Experience communicating the results of analyses to executives and cross-functional teams to influence strategy.
- Expertise in machine learning modeling to drive data-driven growth strategy.
- Hands-on experience building, deploying, and maintaining machine learning models in production-grade systems (e.g., cloud-based services, real-time and batch pipelines).
- Expertise in data querying languages (e.g. SQL) and scripting languages (e.g. Python); experience in schema design and dimensional data modeling is a plus.
- Experience in technology, financial services, and/or a high-growth environment is advantageous.
Applicant Safety Policy: Fraud and Third-Party Recruiters
To protect you from recruitment scams, please be aware that Airwallex will not ask for bank details, sensitive ID numbers (e.g. passport), or any form of payment during the application or interview process. All official communication will come from an @airwallex.com email address. Please apply only through careers.airwallex.com or our official LinkedIn page. Airwallex does not accept unsolicited resumes from search firms/recruiters. Airwallex will not pay any fees to search firms/recruiters if a candidate is submitted by a search firm/recruiter unless an agreement has been entered into with respect to specific open position(s). Search firms/recruiters submitting resumes to Airwallex on an unsolicited basis shall be deemed to accept this condition, regardless of any other provision to the contrary.
Equal opportunity
Airwallex is proud to be an equal opportunity employer. We value diversity and anyone seeking employment at Airwallex is considered based on merit, qualifications, competence and talent. We don't regard color, religion, race, national origin, sexual orientation, ancestry, citizenship, sex, marital or family status, disability, gender, or any other legally protected status when making our hiring decisions. If you have a disability or special need that requires accommodation, please let us know.
Senior Data Scientist – International eKYC, Identity Graph
Socure
The leading provider of digital identity verification and fraud solutions. Salesinfo@socure.com
- Lead the design, development, and deployment of ML and graph-based algorithms for international entity resolution, identity trust scoring, and anomaly detection across heterogeneous, country-specific datasets.
- Architect reusable matching and linking frameworks that work across multiple ID schemes (e.g., national ID numbers, passports, voter IDs, mobile accounts, bank accounts) and local name/address conventions.
- Develop probabilistic and rule-augmented models that handle noisy, sparse, or partially labeled international data while maintaining explainability and regulatory defensibility.
- Define and evolve the international extension of Socure’s identity graph: schema design, linkage strategies, quality tiers, and confidence scoring that can be leveraged by multiple products (Verify, KYC, watchlists, fraud).
- Design and implement robust data quality and monitoring frameworks for international identity data (coverage, stability, drift, regional bias, label quality) and integrate them into modeling and production monitoring workflows.
- Own experimentation strategy for major international eKYC initiatives:
  - Design offline evaluations and online A/B tests that reflect local ground truth constraints and data sparsity.
  - Define success metrics that balance approval rates, fraud capture, and regulatory/operational constraints per market.
  - Analyze lift, stability, and fairness trade-offs and drive go/no-go decisions with Product and Engineering.
- Contribute to model governance documentation and support responses to regulators and large enterprise customers regarding model logic, data provenance, fairness, and monitoring for international markets.
Senior Data Scientist – Big Data R&D, Identity Graph, KYC
Socure
The leading provider of digital identity verification and fraud solutions. Salesinfo@socure.com
- Own the design, development, and evaluation of machine learning, statistical, and graph-based algorithms for entity resolution, identity trust scoring, and anomaly detection on massive datasets.
- Architect and optimize graph-based identity representations (identity graph structure, linkage rules, clustering) to improve match rates, reduce false positives/negatives, and support downstream fraud and KYC models.
- Build and maintain scalable data pipelines and feature stores in Spark/PySpark (or Scala), including data normalization, deduplication, and feature computation across large PII datasets in AWS/Databricks environments.
- Lead A/B tests and offline/online experimentation for new models, features, and data sources; define success metrics, design experiments, and ensure rigorous validation before rollout.
- Evaluate new internal and external data sources: explore signal quality, design backtests, quantify incremental value, and provide clear recommendations on vendor selection and integration.
- Partner closely with product managers and engineers to translate ambiguous business and regulatory requirements (e.g., KYC coverage, watchlist matching) into concrete modeling and data roadmaps.
- Provide deep analytical support to Socure’s compliance and regulatory product suite, including investigative analyses, root-cause analysis for anomalies, and clear narratives for internal and external stakeholders.
- Contribute to model governance and documentation: clearly explain model logic, data dependencies, limitations, and monitoring plans to internal risk/compliance stakeholders.
- Mentor junior data scientists and engineers on best practices in data exploration, feature engineering, experimentation, and code quality.
- Communicate complex technical concepts and trade-offs in a concise, structured way to both technical and non-technical audiences (e.g., product reviews, customer meetings, internal briefings).
- Lead cross-functional initiatives to define, implement, and iterate on measurement and analysis of our Virtual Power Plant (VPP).
- Proactively analyze and interpret complex data to uncover critical business, product, and user insights. Propose strategies for improving both the systems powering our VPP and the tools used by our energy industry partners.
- Synthesize large-scale, disparate datasets (including smart meter data, device telemetry, settlement data, and logs) to aid in increasing the impact of our VPP products and services.
- Build source-of-truth data models in DBT that serve as the foundation for reporting and decision-making.
- Define and monitor KPIs to evaluate the success of new product features and business initiatives.
- Influence product and business strategy by effectively communicating analytical results, recommendations, and tradeoffs to stakeholders across all levels of the organization.
- Partner with engineers to define appropriate data instrumentation for new product features.
- Perform statistical analysis and build predictive models on user interactions and smart device data to drive business recommendations and product strategy.
- Contribute to the culture and workflows of the Product Analytics team: advocate for analytic best practices, introduce new tools and processes, perform code reviews, support your peers, and coach more junior members.



