Henry Schein started out as a Queens, New York-based pharmacy in 1932 and is now a Fortune 500 company specializing in healthcare products and solutions for hea
Lead Data Architect
Location
United States
Posted
2 days ago
Salary
$181.0K - $259.1K / year
Seniority
Senior
Job Description
Lead Data Architect
Henry Schein
• Define and implement a scalable, enterprise-wide data architecture aligned with business and technology goals
• Develop a data strategy roadmap, ensuring long-term sustainability, scalability, and efficiency
• Partner with executive leadership, product teams, and engineering to ensure data initiatives drive business value
• Establish enterprise data governance, security, and compliance frameworks leveraging tools like Collibra or Alation
• Oversee the design and evolution of data lakes, data warehouses, and cloud-based analytics platforms using Databricks, Snowflake, BigQuery, or Redshift
• Lead the adoption of modern data architecture patterns, including event-driven architectures, real-time data streaming (Kafka, Pulsar), and AI-driven analytics
• Provide guidance on database optimization, indexing, partitioning, and storage strategies for tools like PostgreSQL, MySQL, and NoSQL solutions like MongoDB or Cassandra
• Evaluate emerging technologies, making recommendations for tools and platforms that enhance data capabilities
• Direct ETL/ELT strategies, ensuring seamless data flow across systems with Python, Apache Airflow, dbt, or Informatica
• Architect cloud-based solutions (AWS, Azure, or GCP) using services such as AWS Glue, Azure Synapse, and Google Cloud Dataflow to support analytics, AI, and operational use cases
• Ensure API-first design for data integration using GraphQL, RESTful APIs, or event-driven architectures (Kafka, AWS Kinesis, Pub/Sub)
• Define and oversee data quality, lineage, and cataloging efforts using Great Expectations, Monte Carlo, or DataHub
• Develop policies for data privacy, access control, and encryption, ensuring compliance with GDPR, CCPA, HIPAA, or other relevant regulations
• Implement enterprise-wide metadata management and data lineage tracking using Collibra, Alation, or Data Catalog solutions
• Drive best practices for data security and compliance audits, leveraging IAM tools and cloud security solutions
• Lead a team of data architects, engineers, and analysts, mentoring them on best practices
• Act as a liaison between business and technical teams, translating business needs into scalable data solutions
• Champion a culture of innovation, ensuring the data team is adopting cutting-edge methodologies
• Conduct data architecture reviews, ensuring alignment with organizational standards
Job Requirements
- 10+ years of experience in data architecture, data engineering, or related fields
- Bachelor’s degree (Master’s preferred) in Computer Science, Applied Mathematics, Statistics, Machine Learning, or a closely related field (or foreign equivalent)
- Proven track record in designing large-scale, enterprise data architectures
- Expertise in SQL, NoSQL, and distributed database technologies such as Snowflake, Databricks, BigQuery, Redshift, PostgreSQL, MongoDB, and Cassandra
- Strong experience with cloud-based data platforms (AWS, Azure, GCP) and services like AWS Glue, Azure Data Factory, and Google Dataflow
- Deep understanding of data modeling, ETL/ELT processes, and data pipeline optimization using dbt, Apache Airflow, Informatica, or Talend
- Experience with real-time streaming technologies (Kafka, Spark Streaming, Apache Flink, AWS Kinesis)
- Strong knowledge of data security, governance, and compliance frameworks
- Excellent verbal and written communication skills and ability to resolve disputes effectively and efficiently
- Outstanding presentation and public speaking skills
- Mastery of independent decision-making, analysis, and problem-solving skills
- Ability to quickly understand and assess complex projects, systems and ecosystems and identify relevant relationships and connections between them
- Mastery of planning and organizational skills and techniques
- Ability to communicate effectively with senior management and key stakeholders
- Ability to influence, build relationships, understand organizational complexities, manage conflict and navigate politics
- Familiarity with the healthcare data domain, including previous experience working with healthcare datasets, is a plus
- Strong Python programming skills, with expertise in data manipulation and pipeline development using Pandas, PySpark, NumPy, and SQLAlchemy
- Experience with AI/ML-driven analytics architectures and MLOps frameworks like MLflow or SageMaker
- Hands-on experience with Infrastructure as Code (Terraform, CloudFormation)
- Familiarity with Graph databases and knowledge graphs (Neo4j, Amazon Neptune)
- Certifications in cloud data services (AWS Certified Data Analytics, Google Professional Data Engineer, Databricks Certified Data Engineer)
Benefits
- Medical, Dental and Vision Coverage
- 401K Plan with Company Match
- PTO
- Paid Parental Leave
- Income Protection
- Work Life Assistance Program
- Flexible Spending Accounts
- Educational Benefits
- Worldwide Scholarship Program
- Volunteer Opportunities
More Data Engineer Jobs
Staff Data Architect
Jellyfish - Orthogonal Networks, Inc.
Jellyfish, also known as Orthogonal Networks, Inc., is self-described as a pioneer in engineering management platforms (EMPs). Founded with the mission to help
Staff Data Architect
Location: Remote - US
Full time
Department: Engineering
Compensation - $200K – $260K • Offers Equity
The posted range represents the possible base pay for this role. Actual compensation will depend on your experience, skills, role scope, and alignment with the position. Some postings may include more than one salary band to reflect different levels.
Jellyfish is the backbone for elite engineering organizations, and our data infrastructure needs to be as high-performing and insightful as the teams we serve. We are looking for a Staff/Lead Data Architect to help us design, automate, and scale the next generation of our Jellyfish data platform. You’ll be responsible for maturing our core data models, automating environment boundaries, and driving advanced observability and cost-attribution deeper into our data pipeline architecture.
If you view manual data intervention as a technical debt to be solved and want to work in an environment where your architectural decisions directly impact how the world’s best engineering leaders measure their productivity, you’re the perfect fit.
What you’ll actually be doing:
- Architectural Evolution & Blueprinting – You’ll own the blueprint for the next-generation Jellyfish data platform. You'll tackle our existing data footprint, refactoring pipelines and structures into highly efficient, scalable patterns (like Medallion-style schemas or unified semantic layers).
- Automated Data Governance – You’ll design and automate strict, code-driven environment isolation boundaries. You'll ensure dev, staging, and production data catalogs (and their underlying cloud storage) never dangerously cohabitate, eliminating the risk of "fat-finger" data drops or PII leakage.
- Orchestration & Compute Scaling – You’ll lead the modernization of our workflow orchestration and distributed compute engines. You’ll focus on slashing engine runtime overhead, eliminating API bottlenecks, and streamlining heavy parallelized or mapped data tasks.
- Modern Integration Middleware – You'll partner with application teams to ensure our React frontends and backend services hit highly secure, cached API and Backend-for-Frontend (BFF) layers rather than querying raw data services directly, protecting our warehouses from concurrency spikes.
- Proactive Data Observability & FinOps – You’ll build and maintain granular data-quality monitors and cost-allocation frameworks. You won't just track overall warehouse spend; you’ll implement systems to map execution cost and token usage directly down to the tenant, team, or user level.
You’re a great fit if:
- Data Tooling Fluency – You have deep, production-level experience with Python, advanced SQL, and modern data stack essentials. You are deeply familiar with programmatic orchestrators (like Prefect, Dagster, or Airflow) and modern data validation engines (like Pydantic v2).
- Catalog & Warehouse Practitioner – You have hands-on mastery of enterprise-scale data platforms and governance layers (e.g., Snowflake, Databricks Unity Catalog, BigQuery) and know exactly how to map environments to catalogs and data quality to schemas.
- Automation Mindset – You look at a manual data backfill or a clicked-together database permission and immediately think about how to automate it via Infrastructure-as-Code (Terraform) or programmatic workflows.
- Collaborative Systems Thinker – You don’t design in a vacuum. You are excellent at documenting data lineage, mentoring data engineers, and collaborating across DevOps and Product teams to align infrastructure with business goals.
- Pragmatic Problem Solver – You know the difference between data quality stages and software development lifecycles. You know when a "perfect" distributed cluster is required and when a "good enough" cached view keeps the business moving.
Bonus Points:
- You’ve survived (and thrived in) a rapidly scaling B2B SaaS startup handling massive multi-tenant data sets.
- You have strong opinions on the future of Git-like data versioning and zero-copy cloning (e.g., Iceberg, Nessie).
- You’ve managed complex cloud-billing attributions or scaled heavy LLM/vector-embedding data workloads and lived to tell the tale.
A list of job experiences and qualification requirements is great, but humility, a performance-driven attitude, and a team-player approach are most important to us. We love to have fun and win in the process. We only hire people who have a passion for building great companies in an environment where a sense of humor is a must.
Occasional travel may be required. Applicants must be authorized to work for any employer in the US. We are unable to sponsor or take over sponsorship of an employment visa at this time.
Let’s talk about us!
This is all about you, but you want to know a little about us. Jellyfish enables leaders to effectively build AI-integrated engineering teams, align engineering decisions with business initiatives and deliver the right software efficiently and on time. AI tools alone won’t transform your org. Jellyfish shows you what’s working, what’s not, and how to build high-performing teams that know how to use AI the right way.
Senior Data Engineer
Jellyfish - Orthogonal Networks, Inc.
Jellyfish, also known as Orthogonal Networks, Inc., is self-described as a pioneer in engineering management platforms (EMPs). Founded with the mission to help
Senior Data Engineer
Location: Remote - US
Full time
Department: Engineering
Compensation - $190K – $240K • Offers Equity
The posted range represents the possible base pay for this role. Actual compensation will depend on your experience, skills, role scope, and alignment with the position. Some postings may include more than one salary band to reflect different levels.
Jellyfish is the backbone for elite engineering organizations, and our data pipelines need to be as high-performing and reliable as the teams we serve. We are looking for a Senior Data Engineer to help us build, automate, and execute the next generation of our Jellyfish data platform. Working closely with our Lead Data Architect, you’ll be responsible for implementing core data models, building production-grade CI/CD for data pipelines, and transforming raw engineering signals into highly optimized analytical layers.
If you view broken pipelines and manual data patches as a technical debt to be solved and want to write code that directly impacts how the world’s best engineering leaders measure their output, you’re the perfect fit.
What you’ll actually be doing:
- Pipeline Execution & Modeling – You’ll maintain our end-to-end data pipelines, writing clean, modular Python and SQL. You will help translate the architectural blueprint into reality, structuring data across our Medallion layers (Bronze > Silver > Gold) for maximum performance and reliability.
- Orchestration Modernization – You’ll take the lead on migrating, optimizing, and maintaining our workflow orchestration engines. You’ll eliminate pipeline bottlenecks, leverage modern fast-paths (like Pydantic v2 and async database clients), and ensure distributed tasks scale seamlessly without hitting API limits.
- Data CI/CD & Infrastructure Automation – You’ll build the "paved road" for data deployments. You’ll use Terraform to provision data resources and write automated tests to validate schemas and data quality before code ever hits our isolated staging or production catalogs.
- API & Caching Integration – You’ll collaborate with product developers to expose data safely. You’ll help design and optimize the application backend tiers, backend-for-frontend (BFF) layers, and Redis caching structures that protect our core data warehouse from frontend concurrency spikes.
- On-Call & Observability Triage – You’ll participate in the data platform's incident response rotation. You won't just patch a failing pipeline; you’ll build deep observability, refine alerts to reduce noise, and write programmatic fixes to ensure the issue never happens again.
You’re a great fit if:
- Data Engineering Fluency – You have solid, production-level experience with Python, advanced SQL, and data transformation frameworks (like dbt or PySpark). You are highly comfortable working with programmatic orchestrators (such as Prefect, Dagster, or Airflow).
- Warehouse & Catalog Practitioner – You know your way around enterprise data platforms (e.g., Snowflake, Databricks, BigQuery). You understand how to safely navigate environment boundaries, manage access keys securely, and write performant queries that don't balloon the cloud bill.
- Automation Mindset – You look at a repeated data backfill, a manual schema fix, or an untracked data quality bug and immediately think about how to script a permanent, automated solution.
- Collaborative Builder – You love working in a team. You write readable code, value thorough documentation and clear data lineage, and enjoy collaborating with application engineers to solve complex data delivery problems.
- Pragmatic Problem Solver – You know when to write a perfectly optimized distributed processing job and when a simple, well-indexed database table or cached view is the smartest move to keep the business moving.
Bonus Points:
- You’ve survived (and thrived in) a rapidly scaling startup handling complex, multi-tenant B2B SaaS data.
- You have strong opinions on data quality testing frameworks (like Great Expectations or Soda) and data-observability patterns.
- You’ve worked extensively with cloud cost allocation or tracked token-level spend for LLM/AI model integrations.
A list of job experiences and qualification requirements is great, but humility, a performance-driven attitude, and a team-player approach are most important to us. We love to have fun and win in the process. We only hire people who have a passion for building great companies in an environment where a sense of humor is a must.
Occasional travel may be required. Applicants must be authorized to work for any employer in the US. We are unable to sponsor or take over sponsorship of an employment visa at this time.
Let’s talk about us!
This is all about you, but you want to know a little about us. Jellyfish enables leaders to effectively build AI-integrated engineering teams, align engineering decisions with business initiatives and deliver the right software efficiently and on time. AI tools alone won’t transform your org. Jellyfish shows you what’s working, what’s not, and how to build high-performing teams that know how to use AI the right way.
Data Engineer
Jellyfish - Orthogonal Networks, Inc.
Jellyfish, also known as Orthogonal Networks, Inc., is self-described as a pioneer in engineering management platforms (EMPs). Founded with the mission to help
Data Engineer
Location: Remote - US
Full time
Department: Engineering
Pay: $165K – $205K + Equity
The posted range represents the possible base pay for this role. Actual compensation will depend on your experience, skills, role scope, and alignment with the position. Some postings may include more than one salary band to reflect different levels.
Jellyfish is the backbone for elite engineering organizations, and our data pipelines need to be as high-performing and reliable as the teams we serve. We are looking for a Data Engineer to join our data platform team and help us execute, automate, and maintain the next generation of our Jellyfish data platform. In this role, you’ll be a core builder: fully autonomous, highly proficient, and responsible for translating architectural blueprints into clean, production-grade pipelines.
If you view manual data patches and unmonitored workflows as bugs to be squashed and want to write code that directly impacts how the world’s best engineering leaders measure their output, you’re the perfect fit.
What you’ll actually be doing:
- Core Pipeline Engineering – You’ll write the clean, modular Python and optimized SQL that drives our daily data transformations. You will be responsible for implementing our Medallion-layer data models (Bronze → Silver → Gold), ensuring high performance and data integrity.
- Modern Orchestration & Tuning – You’ll manage and tune our workflow orchestration engines (like Prefect or Dagster). You’ll hunt down slow execution paths, optimize parameter serialization (e.g., leveraging Pydantic v2), and ensure our distributed processing jobs run efficiently.
- Infrastructure as Code (IaC) – You won't just write data scripts; you'll own your infrastructure deployment. You will use Terraform to manage and provision data warehouse schemas, permissions, and tables across securely isolated staging and production catalogs.
- API & Caching Integration – You’ll collaborate with product developers to expose data safely. You’ll help implement and maintain the application backend tiers, backend-for-frontend (BFF) layers, and Redis caching structures that protect our core data warehouse from frontend concurrency spikes.
- On-Call & Pipeline Observability – You’ll participate in our data platform's incident response rotation. When a pipeline breaks, you won't just fix the data; you’ll refine the Datadog dashboards and alerts to ensure we catch the issue earlier next time.
You’re a great fit if:
- Data Engineering Fluency – You have solid, hands-on production experience with Python, advanced SQL, and data transformation concepts. You are comfortable building and scheduling workflows using programmatic orchestrators (such as Prefect, Dagster, or Airflow).
- Warehouse & Catalog Practitioner – You know your way around enterprise data platforms (e.g., Snowflake, Databricks, BigQuery). You understand how to navigate environment boundaries, manage access keys securely, and write performant queries.
- Automation Mindset – You look at a repeated data backfill, a manual schema fix, or an untracked data quality bug and immediately think about how to script a permanent, automated solution.
- Collaborative Builder – You love working in a team. You write readable code, value thorough documentation and clear data lineage, and enjoy collaborating with application engineers to solve complex data delivery problems.
- Pragmatic Problem Solver – You know when to write a perfectly optimized distributed processing job and when a simple, well-indexed database table or cached view is the smartest move to keep the business moving.
Bonus Points:
- You’ve worked in a rapidly scaling startup handling complex, multi-tenant B2B SaaS data.
- You have experience with data quality testing frameworks (like Great Expectations or Soda).
- You’ve interacted with cloud cost allocation tracking or token-level spend for LLM/AI model integrations.
A list of job experiences and qualification requirements is great, but humility, a performance-driven attitude, and a team-player approach are most important to us. We love to have fun and win in the process. We only hire people who have a passion for building great companies in an environment where a sense of humor is a must.
Occasional travel may be required. Applicants must be authorized to work for any employer in the US. We are unable to sponsor or take over sponsorship of an employment visa at this time.
Let’s talk about us!
This is all about you, but you want to know a little about us. Jellyfish enables leaders to effectively build AI-integrated engineering teams, align engineering decisions with business initiatives and deliver the right software efficiently and on time. AI tools alone won’t transform your org. Jellyfish shows you what’s working, what’s not, and how to build high-performing teams that know how to use AI the right way.
• Work on SAP rollout and data migration projects.
• Plan, execute, and validate master and transactional data migration activities.
• Perform data mapping, transformation, loading, and reconciliation.
• Ensure data quality and integrity throughout the migration process.
• Propose improvements and optimizations to migration processes.
• Work with external systems responsible for master data.
• Produce functional documentation and training materials.

