Transforming behavioral health through technology with a human touch
Senior Data Engineer
Location
United States
Posted
2 days ago
Salary
$128K - $176K / year
Seniority
Senior
Job Description
Senior Data Engineer
Lyra Health
• Join a team of innovative engineers building and scaling the core data infrastructure, pipelines, and services that power our products
• Design and implement a robust data warehouse to support a wide range of analytics and operational use cases
• Develop and maintain efficient data pipelines and curated data sets by working closely with stakeholders to gather requirements and translate them into technical solutions
• And of course, write code every day!
Job Requirements
- 4+ years of experience as a Data Engineer.
- Proven track record of writing high-quality, production-ready Python and delivering impactful, scalable data projects.
- Expertise in SQL-based data modeling and transformation frameworks (e.g. dbt), with a focus on schema management, performance, and data governance.
- Strong experience with modern ingestion tools: configuring and deploying data integration platforms (e.g. Airbyte), commercial ELT pipelines (e.g. Fivetran), and database replication methods such as Change Data Capture (CDC).
- Strong orchestration background: hands-on experience building robust, fault-tolerant pipelines using Apache Airflow (including custom alerting, advanced logging, and automated retry mechanisms).
- Experience with modern data visualization and analytics tools such as Sigma or Tableau.
- Strong knowledge of Snowflake Cloud Data Warehouse Architecture, including native ingestion patterns (e.g. Snowpipe for continuous streaming) and open table formats (e.g. Apache Iceberg).
- Strong knowledge of Snowflake administration, including familiarity with maintaining security compliance, role-based access control (RBAC), external authentication methods, and managing downstream data consumption in external platforms.
- A 'QA-First' engineering mindset: strong experience in end-to-end data quality assurance, regression testing, and data validation. Capable of troubleshooting data discrepancies and performance-tuning complex pipelines.
Benefits
- Comprehensive healthcare coverage (including medical, dental, vision, FSA/HSA, life and disability insurance)
- Lyra for Lyrians; coaching and therapy services
- Equity in the company through discretionary restricted stock units
- Competitive time off with pay policies including vacation, sick days, and company holidays
- Paid parental leave
- 401K with up to 3% matching
- Monthly tech allowance
- We like to spread joy throughout the year with well-being perks and activities, surprise swag, regular community celebrations, and more!
More Data Engineer Jobs
Role Description
We are looking for a skilled GCP Data Engineer to join our EPEO - Data and AI Ops team. In this role, you will play a critical part in designing, developing, and maintaining our Security Data Lake and associated data products. The core requirement for this role is deep technical expertise in Google Cloud Platform (GCP) and hands-on experience building scalable cloud data pipelines (ETL/ELT). Additionally, experience with Cribl (for log stream routing, shaping, and reduction) is considered a strong asset and a major plus.
- Design, develop, and maintain robust data pipelines using GCP native services.
- Build and manage data quality frameworks to ensure the integrity and reliability of security data assets.
- Integrate diverse data sources and security tools via APIs to centralize security oversight.
- Optimize database performance, query efficiency, and storage costs within Google BigQuery.
- (Preferred) Utilize Cribl to route, shape, and enrich incoming security telemetry and log data.
- Develop, deploy, and monitor automated data pipelines (ETL/ELT) using Python, SQL, and GCP services (such as Cloud Functions, Cloud Scheduler, and Dataflow).
- Manage and optimize schema designs, partitioning, and clustering in Google BigQuery to ensure cost-effective and high-performance querying.
- Implement and scale data quality and auditing frameworks using GCP Dataplex with centralized rules metadata configuration.
- Design and maintain robust API integrations (e.g., EAMS, TrendMicro, and other threat detection platforms) to ingest critical security logs.
- [Preferred/Good to Have] Configure and manage Cribl Stream pipelines (sources, destinations, routes, and functions) to parse, mask, enrich, and route security logs.
- [Preferred/Good to Have] Implement log reduction strategies in Cribl to optimize data ingestion and lower downstream storage costs.
- Partner with security teams to deliver actionable data products, reporting views, and tactical dashboards to prevent service outages.
Qualifications
- Bachelor’s degree in Computer Science, Computer Engineering, Data Science, Information Technology, or a related technical field (or equivalent combination of education and experience).
- 8+ years of professional experience in Data Engineering, Cloud Data Warehousing, or software development.
- 5+ years of hands-on experience designing and implementing production-grade solutions on Google Cloud Platform (GCP), specifically utilizing native services such as Google BigQuery, Cloud Run/Functions, Cloud Storage, Dataflow, and Pub/Sub.
- High proficiency in Python and advanced SQL for building, optimizing, and troubleshooting complex ETL/ELT pipelines.
- Excellent written and verbal communication skills, with a proven ability to collaborate effectively with cross-functional teams in an agile environment.
Benefits
- Immediate medical, dental, vision and prescription drug coverage.
- Flexible family care days, paid parental leave, new parent ramp-up programs, subsidized back-up child care and more.
- Family building benefits including adoption and surrogacy expense reimbursement, fertility treatments, and more.
- Vehicle discount program for employees and family members and management leases.
- Tuition assistance.
- Established and active employee resource groups.
- Paid time off for individual and team community service.
- A generous schedule of paid holidays, including the week between Christmas and New Year’s Day.
- Paid time off and the option to purchase additional vacation time.
• Define and drive the product vision, strategy, and roadmap for the Data Curation Pipeline team
• Lead the modernization of healthcare data pipelines that ingest and transform raw claims, prescription, and EHR data into standardized enterprise datasets
• Identify opportunities to improve data quality, scalability, usability, speed, and downstream product readiness
• Partner with internal stakeholders to understand business needs and translate them into product priorities and roadmap decisions
• Own and manage the team backlog, ensuring clear prioritization of high-impact initiatives, technical enablers, defects, and enhancements
• Serve as the primary product point of contact for engineering, architecture, and cross-functional stakeholders
• Define acceptance criteria and participate in user acceptance testing and release validation
• Work closely with software developers, data engineers, architects, QA, analytics teams, and business stakeholders throughout the product lifecycle
• Lead Agile product processes including backlog refinement, sprint planning input, prioritization, and stakeholder communication
• Build strong relationships with internal business stakeholders, product teams, and data consumers across the organization
• Design, build, and manage end-to-end data pipelines across the medallion architecture, specifically the bronze, silver (base vault with DBT and orchestration tools, business vault), and gold layers.
• Ingest and process raw data using Spark and Amazon EMR for scalable, distributed computation.
• Develop and automate data transformations for the base vault using DBT (Data Build Tool) to standardize and model data efficiently.
• Build and maintain ingestion pipelines (REST APIs, SFTP, database replication) into Snowflake
• Write dbt models across bronze/silver/gold layers following established conventions
• Integrate with third-party ESPs (DotDigital, Adestra, FastTrack): push segments, pull back campaign events (opens, clicks, bounces, conversions)
• Develop and maintain AWS Lambda functions for data extraction and reverse ETL
• Build data models for identity resolution, segmentation, and attribution
• Configure and manage Airflow DAGs for orchestration
• Implement data quality checks, monitoring, and alerting
• Manage Snowflake and AWS infrastructure via Terraform
• Investigate and resolve data incidents (pipeline failures, data quality issues, PII exposure)




