Ancestry

We connect everyone with their past so they can discover, preserve, and share their unique family stories.

Data Science – AI Document Understanding, Co-op

Data Scientist | Part Time | Remote | Entry Level | Team 1,001-5,000 | Since 1983 | H1B Sponsor

Location

United States

Posted

8 days ago

Salary

0

Seniority

Entry Level

Postgraduate Degree | English | AWS | Cloud | EC2 | Google Cloud Platform | Python

Job Description

  • Innovate with State-of-the-Art AI: Implement cutting-edge AI solutions for key Document Understanding tasks such as OCR/HTR, transcription, Named Entity Recognition (NER), Relation Extraction (RE), Coreference Resolution, Summarization, and Knowledge Graphs, working with diverse genealogical and historical collections spanning newspapers, city directories, family history books, and vital records (i.e., birth, marriage, & death records).
  • Analyze and Optimize Multi-Modal Models: Evaluate the performance of multi-modal models in zero-shot and few-shot learning scenarios for comprehensive document understanding.
  • Architect Agentic Systems: Design and implement multi-agent workflows using frameworks like LangChain, LangGraph, CrewAI, or AutoGen to automate complex multi-step reasoning tasks in historical document analysis.
  • Evaluation & Observability: Establish "LLM-as-a-Judge" frameworks and use tools like Arize Phoenix, DeepEval, or RAGAS to monitor for hallucination, drift, and bias.
  • Collaborate on Cloud Deployment: Partner closely with ML Ops and Data Science Engineers to seamlessly deploy datasets, models, and pipelines in cloud environments.
  • Communicate Insights Effectively: Clearly and confidently present your findings, deliverables, and proposed solutions to technical and non-technical audiences, including teams, stakeholders, and executives.

Job Requirements

  • Currently pursuing an advanced degree (Master's or PhD preferred) in Computer Science, Data Science, Statistics, Mathematics, Linguistics, Engineering or related quantitative field with a strong data focus.
  • Specialization in AI & LLMs including familiarity with foundational models such as GPT, Gemini, Qwen, Llama, Claude, etc.
  • Experience with inference optimization, vLLM, LoRA, QLoRA, quantization, etc.
  • Familiarity with embeddings, vector databases, and transformer models, along with software development experience.
  • Strong proficiency in Python and relevant tools and libraries for transformer models, multi-modal models, and general NLP (e.g., Hugging Face Transformers), as well as agentic frameworks and workflows (e.g., LangChain, LangGraph, CrewAI, AgentCore).
  • Familiarity with cloud platforms and related AI/ML services such as Google Cloud Platform (GCP), Gemini API, Vertex AI, AWS EC2, S3, SageMaker, Model Registry, and Bedrock is a plus.

Benefits

  • Flexible work arrangements
  • Professional development opportunities

More Data Scientist Jobs

Full Time | Remote | Team 10,001+ | Since 1903 | H1B Sponsor

Role Description

The Senior Data Scientist on the Credit AI team at Ford Credit will lead the development and deployment of advanced AI and machine learning solutions that improve customer experience, reduce risk, and drive operational efficiency. This role focuses on delivering scalable, production-ready solutions across conversational AI, fraud detection, forecasting, and intelligent automation initiatives while partnering closely with engineering, product, and business stakeholders.

As a Senior Data Scientist within the Credit AI organization, you will play a critical role in shaping and delivering AI-driven solutions that support strategic business priorities across Ford Credit. You will work across a diverse portfolio of initiatives, including:

  • Conversational AI solutions for customer representatives
  • Fraud detection and risk analytics
  • Forecasting and predictive modeling
  • AI agents that automate business workflows and accelerate software development processes

This role requires strong expertise in machine learning, statistical modeling, generative AI, and production AI systems.

You will collaborate with cross-functional teams to:

  • Translate business challenges into scalable technical solutions
  • Develop and validate models
  • Ensure successful deployment into production environments
  • Establish best practices around model governance, monitoring, explainability, and responsible AI

Success in this role will be measured through measurable business outcomes such as:

  • Reduced fraud losses
  • Improved forecast accuracy
  • Enhanced customer support efficiency
  • Increased automation effectiveness

What you'll do:

  • Design, develop, validate, and deploy machine learning and AI solutions for business-critical applications
  • Build scalable predictive models, anomaly detection systems, forecasting solutions, recommendation systems, and generative AI applications
  • Develop conversational AI and agent-assist solutions leveraging LLMs, NLP, and retrieval-augmented generation (RAG) techniques
  • Create intelligent AI agents for business workflow automation and SDLC acceleration initiatives
  • Develop and optimize fraud detection models using supervised and unsupervised machine learning techniques
  • Analyze structured and unstructured datasets to identify trends, patterns, risks, and business opportunities
  • Partner with engineering teams to productionize AI/ML solutions and integrate them into enterprise applications and workflows
  • Develop reusable ML pipelines, feature engineering frameworks, and model monitoring capabilities
  • Monitor model performance, drift, reliability, and operational effectiveness in production environments
  • Collaborate with product managers, engineers, business stakeholders, and risk/compliance teams to define requirements, success metrics, and implementation strategies
  • Translate technical insights and analytical findings into clear business recommendations and executive-level communications
  • Ensure AI and machine learning solutions comply with data governance, privacy, security, and regulatory standards
  • Develop documentation supporting model explainability, validation, monitoring, and audit readiness
  • Promote responsible AI practices, including fairness, transparency, and risk mitigation
  • Mentor junior team members and contribute to technical standards, best practices, and continuous improvement initiatives

Qualifications

  • Bachelor’s degree in Computer Science, Data Science, Statistics, Mathematics, Engineering, or a related quantitative field
  • 5+ years of experience developing and deploying machine learning or AI solutions in production environments
  • Strong programming experience in Python and experience with ML frameworks such as scikit-learn, PyTorch, TensorFlow, or similar
  • Experience building predictive models, forecasting solutions, anomaly detection systems, NLP applications, or generative AI solutions
  • Experience with large language models (LLMs), prompt engineering, retrieval-augmented generation (RAG), or conversational AI systems
  • Strong SQL and data manipulation skills with experience working on large-scale datasets
  • Experience with cloud platforms such as AWS, Azure, or GCP
  • Understanding of MLOps concepts including model deployment, monitoring, versioning, and CI/CD workflows
  • Strong analytical, problem-solving, communication, and stakeholder management skills

Requirements

  • Master’s degree in Computer Science, Data Science, Statistics, Mathematics, Engineering, or a related quantitative field (preferred)
  • Experience in financial services, credit risk, fraud analytics, or regulated industries (preferred)
  • Experience with AI agents, orchestration frameworks, or automation platforms (preferred)
  • Experience with model explainability and governance tools such as SHAP or LIME (preferred)
  • Knowledge of software engineering workflows and developer productivity tooling (preferred)
  • Experience mentoring or leading technical teams (preferred)

Benefits

  • Immediate medical, dental, vision and prescription drug coverage
  • Flexible family care days, paid parental leave, new parent ramp-up programs, subsidized back-up child care and more
  • Family building benefits including adoption and surrogacy expense reimbursement, fertility treatments, and more
  • Vehicle discount program for employees and family members and management leases
  • Tuition assistance
  • Established and active employee resource groups
  • Paid time off for individual and team community service
  • A generous schedule of paid holidays, including the week between Christmas and New Year’s Day
  • Paid time off and the option to purchase additional vacation time

United States
$99.6K - $192.9K / year

Data Scientist

First Stop Health

We deliver care that people love. Members can talk with doctors or counselors 24/7 via app, website or phone.

Data Scientist | 8 days ago
Contract | Remote | Team 51-200 | Since 2011 | H1B No Sponsor

  • Design and build metrics, experiments, projections, and predictive models
  • Write clean, efficient code in Python and SQL
  • Collaborate with business stakeholders to propose data-driven solutions
  • Test and refine data models and solutions

United States

SAP Data Scientist

Accenture Federal Services

We believe in the power of change, harnessed in ways that matter for our country and communities.

Data Scientist | 8 days ago
Full Time | Remote | Team 10,001+ | Since 2017 | H1B No Sponsor

  • Collect, organize, and wrangle data from SAP and non‑SAP sources; build reusable analytics and data pipelines
  • Clean, validate, and preprocess structured and unstructured datasets to ensure high‑quality inputs for analysis
  • Conduct statistical and exploratory analyses to identify trends, patterns, and business insights
  • Develop predictive and prescriptive models using SAP Analytics Cloud Predictive or external ML frameworks integrated with SAP data
  • Enhance data collection procedures to support analytic system development
  • Create dashboards, KPIs, and data visualizations to clearly communicate findings to business stakeholders
  • Support data reconciliation and KPI validation between source systems and analytics outputs
  • Collaborate with IT, business analysts, and leadership to implement data‑driven solutions
  • Design and deliver AI/ML‑based decision‑making frameworks; measure and justify model value.

All locations: District Of Columbia | Washington
$116.9K - $243.1K / year

Data Scientist II

Radian

Founded in 1977, Radian is a publicly traded company based in Philadelphia, Pennsylvania, that connects homebuyers, lenders, loan providers, and investors with

Data Scientist | 8 days ago

  • Analyze data to support or disprove a thesis
  • Select and implement the right tools for the job
  • Build, train, test, and validate models
  • Engineer models into production
  • Document your work clearly
  • Monitor and improve models in production
  • Explore agentic and reasoning systems
  • Perform other duties as assigned

Maryland
$98K - $148K / year