Principal Data Engineer

Location

United States

Posted

2 days ago

Salary

Not listed

Seniority

Lead

Job Description

Principal Data Engineer

DocuPhase LLC

Role Description

We are looking for a Principal Data Engineer to lead our Document Intelligence initiatives as a core member of our research and engineering organization. This is a high-impact individual contributor and technical leadership role focused on advancing machine learning, data science, NLP, and intelligent document processing within our platform. You will work closely with product, data, and engineering teams, and with the CTO, to design systems that turn unstructured document data into actionable intelligence at scale.

Key Responsibilities

- Lead research and engineering efforts in document intelligence, including OCR post-processing, document classification, information extraction, and layout understanding.
- Design and implement scalable machine learning pipelines and data architectures that support document AI workloads in production environments.
- Define the technical vision and roadmap for document intelligence capabilities across the organization.
- Collaborate with cross-functional teams to translate business requirements into ML system designs, model architectures, and data platform decisions.
- Evaluate, adapt, and extend state-of-the-art NLP and vision-language models for document understanding tasks.
- Establish best practices for ML experimentation, model versioning, evaluation, and deployment (MLOps).
- Mentor and provide technical guidance to engineers and researchers across the team.
- Drive data architecture decisions that support both model training pipelines and downstream analytics and reporting needs.
- Publish or present research findings internally and, where appropriate, externally.

Qualifications

- 10+ years of professional experience in R&D, machine learning, applied research, or data engineering.
- Deep expertise in Document Intelligence, including OCR, document parsing, layout analysis, information extraction, and classification.
- Strong data architecture background, including experience designing data lakes, feature stores, and ML data pipelines.
- Proficiency in Python and relevant ML frameworks (PyTorch, TensorFlow, Hugging Face Transformers, etc.).
- Experience taking ML models from research and prototyping through production deployment at scale.
- Solid understanding of NLP fundamentals and modern large language and vision-language model architectures.
- Experience with cloud-based ML platforms and infrastructure (AWS, GCP, or Azure).
- Strong written and verbal communication skills, including the ability to convey complex technical concepts to both technical and non-technical stakeholders.

Preferred

- PhD or Master's degree in Computer Science, Machine Learning, Computational Linguistics, or a closely related field.
- Experience with document AI frameworks such as LayoutLM, Donut, PaddleOCR, Amazon Textract, or similar.
- Publications or contributions to peer-reviewed research in NLP, computer vision, or document understanding.
- Familiarity with enterprise document workflows such as AP automation, contract processing, medical records, or similar domains.
- Prior experience in a principal, staff, or lead engineer capacity with ownership over a technical domain.

More Data Engineer Jobs

Senior Data Engineer

NeueHealth

Driving value in healthcare for all.

Data Engineer | 2 days ago
Full Time | Remote | Team: 1,001-5,000 | H-1B: No sponsor

• Write traditional code and serverless functions using the language best suited for the task, primarily Scala; this may include development in C# and SQL.
• Build APIs, data microservices, and ETL pipelines to share data with internal and external partners, and write interfaces to public data sets to enrich our analytics data stores.
• Develop and optimize processes for fine-tuning large language models (LLMs) and implementing Retrieval-Augmented Generation (RAG) frameworks to enhance AI-driven solutions.
• Participate in building and owning a culture of DevOps and Quality Assurance.
• Continuously document your code, framework standards, and team processes.
• Build and support data ingestion frameworks deployed in Azure.
• Other duties and responsibilities as assigned.

Locations: Minnesota | Texas

Senior Data Engineer, Risk

Block

Block builds simple, powerful tools that make progress towards an economy that’s truly open to all.

Data Engineer | 2 days ago
Full Time | Remote | Team: 10,001+ | Since 1990 | H-1B: Sponsor

Role Description

As a Data Engineer, you will handle everything from data architecture and modeling to data pipeline tooling and dashboarding. You will enable other compliance and risk teams to make impactful business decisions by laying the foundation of our large and unique datasets, which span multiple products. This role will support either the Compliance or the Risk Product data engineering team.

The Compliance Data Engineering team at Block supports the detection and reporting of suspicious financial-crime activity across Cash App, Square, and Afterpay. We work globally with partners in business, engineering, counsel, and product to provide a safe user experience for our customers while minimizing, and potentially eliminating, bad activity on our platform.

The Risk Product Data Engineering team at Block builds reliable data foundations that help protect customers across Cash App, Square, and Afterpay. We partner with product, engineering, compliance, and legal teams to support risk detection, reporting, instrumentation, and ML-ready datasets across critical domains including identity, account access, eligibility, controls, and disputes.

- Lead the creation of new data models, and the optimization of existing ones, through AI code generation on eventing, customer-level, and process-level data.
- Standardize business and product metric definitions in curated and optimized datasets, and develop data dictionaries and other related documentation.
- Build monitoring to assess data quality and lineage, and build AI agents on top of it for false-positive reduction and automated resolution.
- Participate in the on-call rotation: monitor daily execution, diagnose and log issues, and fix business-critical pipelines so that SLAs with internal stakeholders are met. Automate this process through AI agents so that human effort goes to the highest-priority tasks.
- Work with non-technical partners and product teams to understand their needs, translate business requirements into applicable data requirements, and design automated end-to-end solutions.

Qualifications

- A minimum of 10 years of related experience with a Bachelor’s degree; or 8 years and a Master’s degree; or equivalent experience.
- High proficiency in SQL, Python, and dbt.
- Experience designing medium-to-large data engineering solutions and owning the entire project lifecycle, including scoping, design, development, testing, deployment, and documentation.
- Experience with ETL scheduling technologies with dependency checking, such as Airflow or Prefect.
- Experience in schema design and dimensional data modeling.
- Experience setting up data quality and data lineage monitoring.
- Experience driving data-driven decisions through AI and agentic automation.

Technologies We Use and Teach

- Snowflake
- Databricks
- dbt
- GitHub
- Airflow
- Prefect
- Omni
- Terraform

Benefits

- Remote work
- Medical insurance
- Flexible time off
- Retirement savings plans
- Modern family planning

Company Description

Block, Inc. (NYSE: XYZ) builds technology to increase access to the global economy. Each of our brands unlocks different aspects of the economy for more people.

- Square makes commerce and financial services accessible to sellers.
- Cash App is the easy way to spend, send, and store money.
- Afterpay is transforming the way customers manage their spending over time.
- TIDAL is a music platform that empowers artists to thrive as entrepreneurs.
- Bitkey is a simple self-custody wallet built for bitcoin.
- Proto is a suite of bitcoin mining products and services.

Together, we’re helping build a financial system that is open to everyone.

Locations: California | Worldwide
$168.3K - $297K / year

Intermediate Data Engineer

AB InBev

To a Future With More Cheers

Data Engineer | 3 days ago
Full Time | Remote | Team: 10,001+ | H-1B: No sponsor

• Own high-visibility FinOps and data initiatives tied to the company's most strategic KPIs.
• Drive company-wide cost performance with an Excellence in the Means mindset, ensuring disciplined execution, sustainable efficiency, and measurable impact.
• Lead optimization programs across Databricks workloads, pipelines, clusters, and storage to unlock scalable efficiency.
• Build executive-ready dashboards and narratives that translate cost and performance into business impact.
• Monitor spend in real time, detect anomalies early, and orchestrate swift cross-team action.
• Define and elevate FinOps governance, standards, and accountability across domains.
• Partner with engineering and platform teams to shape cost-efficient architecture decisions from day one.
• Drive showback/chargeback transparency and strengthen unit-economics visibility at the domain level.
• Influence budget planning and forecasting with data-driven insights and ROI-based prioritization.
• Convert technical findings into clear recommendations for product, finance, and leadership stakeholders.
• Collaborate with Data, Finance, Product, Cloud, and Ops to accelerate measurable business outcomes.

Location: Brazil

AI Data Engineer

AspenView Technology Partners

AspenView Technology Partners empowers organizations to thrive with agile, expert-staffed, nearshore IT teams.

Data Engineer | 3 days ago
Contract | Remote | Team: 11-50 | Since 2024 | H-1B: No sponsor

Role Description

AspenView Technology Partners is seeking an AI Data Engineer to work on a contract basis with one of our clients. Reporting directly to the company's executive leadership (SVP and COO), you will create visibility into operations by connecting and unifying data from a mature, federated systems landscape with an AS/400 core, and turning it into reports, dashboards, and AI-ready data products. A central part of the role is building agentic connectors that let AI tools work safely and securely with company data, backed by disciplined management of syntax, schema, and the semantic layer. You will partner closely with the company's IT team while delivering executive-level insight, helping the business move from ad hoc Excel and Claude analysis to dependable, enterprise-wide intelligence.

What you will do:

Executive Reporting & Dashboards
- Build reports and dashboards that give executive leadership clear, timely visibility into operations, working under their direction.
- Translate leadership questions into well-structured, trustworthy data products.

Data Integration & Access
- Connect and unify data from disparate sources across a federated landscape, including the AS/400 core, without a wholesale data-lake migration.
- Reconcile mismatched data from bolt-on acquisitions (e.g., SKUs, warehousing, safety records) to maintain integrity and comparability.

Agentic Connectors & AI Enablement (Security & Governance)
- Build agentic connectors that let AI tools such as Claude work with company data reliably.
- Protect and secure those connectors through scoped access, authentication, and safe handling of sensitive data.

Schema, Syntax & Semantic Layer
- Manage syntax, schema, and a semantic layer so data is consistent, well-defined, and ready for both analysts and AI.
- Establish shared definitions and metrics that hold up across systems and acquisitions.

Collaboration & Delivery
- Partner with AMCON’s IT team for access, cooperation, and knowledge of the existing environment.
- Deliver iteratively, prioritizing the insights leadership needs most, and document what you build.

Qualifications

- Solid data engineering experience, with a track record of building reports and dashboards that stand up to executive scrutiny.
- Practical AI fluency: comfort using tools like Claude, and building and securing agentic connectors (e.g., MCP-style integrations) between AI and enterprise data.
- Strong skills in syntax, schema, and semantic-layer management, and in reconciling disparate data sources.
- Experience integrating data through a federated approach (querying and connecting sources) rather than relying solely on a central data lake.
- Familiarity with accessing data from legacy and enterprise systems, ideally AS/400 (IBM i) / DB2, via ODBC, APIs, or equivalent.
- Strong SQL skills and proficiency in a scripting language (e.g., Python), with sound data modeling fundamentals.
- A security-first mindset for connectors and data access, with good judgment about sensitive information.
- Excellent written and spoken English, the confidence to work directly with C-suite stakeholders, and the ability to deliver in a self-directed way in a lean-IT environment.

Nice if you have:

- Hands-on AS/400 (IBM i) / DB2 data extraction and modernization experience.
- BI platform experience (Power BI or similar) and semantic modeling.
- MCP (Model Context Protocol) or other agentic connector / integration experience.
- Background in wholesale distribution, supply chain, logistics, or retail data.
- Experience unifying data after acquisitions (M&A data integration).
- Data governance and security tooling experience.

Equal Opportunity Employer

AspenView is proud to be an equal opportunity employer. We believe in creating an environment where all employees feel welcome, valued, and empowered to succeed. We celebrate diversity and strive to build a culture of inclusion where all individuals, regardless of their race, color, gender, gender identity or expression, sexual orientation, disability, age, or any other characteristic, can thrive. We encourage applicants from all walks of life to join our team and make a lasting impact.

Location: Argentina