Engineering new possibilities with platforms, data, and generative AI
Lead Data Engineer
Location
United States
Posted
17 hours ago
Salary
$143.4K - $168.7K / year
Seniority
Senior
Job Description
Lead Data Engineer
Egen
- Architect and optimize large-scale data platforms on Google Cloud, with BigQuery as the analytical backbone
- Design and build unified batch and streaming pipelines that handle high-volume, mission-critical workloads
- Lead infrastructure-as-code practices, ensuring environments are repeatable, secure, and version-controlled
- Implement open table formats to enable cross-cloud and cross-engine data interoperability
- Establish automated data quality, metadata, and lineage practices across the data estate
- Partner with data scientists, analysts, and product teams to translate business needs into reliable data products
- Mentor engineers, review designs, and raise the bar on engineering standards
Job Requirements
- 7+ years in data engineering, with at least 2 years in a lead or senior individual contributor capacity on Google Cloud-based platforms
- **BigQuery (Advanced):** Deep knowledge of BigQuery architecture, including partitioning, clustering, slot management, storage optimization, and query execution tuning
- **Streaming & Batch Pipelines:** Strong hands-on experience building unified pipelines using **Dataflow** (Apache Beam), **Dataproc**, and **Pub/Sub**
- **Infrastructure as Code:** Production experience developing and managing cloud infrastructure with **Terraform**
- **Open Table Formats:** Working knowledge of **Apache Iceberg**, including its role in enabling cloud and engine interoperability (e.g., across BigQuery, Spark, Snowflake)
- **Data Governance:** Experience with **Dataplex** and **Data Catalog** for automated data quality checks, metadata tagging, and column-level lineage from source to destination
Benefits
- Comprehensive Health Insurance
- Paid Leave (Vacation/PTO)
- Paid Holidays
- Sick Leave
- Parental Leave
- Bereavement Leave
- 401(k) Employer Match
- Employee Referral Bonuses
More Data Engineer Jobs
- This role supports the large-scale modernization of legacy systems by leading the migration of over 10TB of documentation and records into a cloud-based environment.
- The position focuses on transforming document-heavy workflows into structured, digital processes, while enabling seamless data integration across systems using MuleSoft APIs.
- The work directly supports enhanced analytics, reporting, and operational efficiency for a mission-critical enterprise platform.
- Lead end-to-end data migration from legacy systems to cloud platforms, including data mapping, transformation, and validation (Salesforce-centric).
- Analyze and convert unstructured, document-heavy data into structured, analytics-ready formats using Document AI.
- Implement data quality processes (cleansing, deduplication, reconciliation) to ensure accuracy and completeness post-migration.
- Support phased migration and system cutover with minimal operational disruption.
- Design, build, and maintain MuleSoft integrations, including API development for real-time and batch data exchange.
- Apply data transformation logic (e.g., DataWeave) and troubleshoot integration/data flow issues across systems.
- Contribute to Salesforce Data 360 implementation, including data harmonization, identity resolution, and unified profiles.
- Ensure proper data ingestion, lineage, metadata management, and traceability across the ecosystem.
- Ensure compliance with data governance, privacy, and security requirements (e.g., PII).
- Collaborate in Agile teams; support testing, documentation, and audit readiness across migration and integration efforts.
Senior Curriculum Developer – GenAI Applications and Data Engineering
Cloudera
At Cloudera, we believe that data can make what is impossible today, possible tomorrow.
Role Description
At Cloudera, you will bridge the gap between complex data technologies and actionable learning. You will design and develop hands-on technical training, workshops, and certification content focused on Generative AI Applications, Agentic AI, MLOps, and modern Data Engineering for customers, partners, and internal teams.
As a Senior Technical Curriculum Developer, you will:
- Design and develop technical training focused on Generative AI, Agentic AI workflows, and AI application development, including the NVIDIA AI Agent stack.
- Create instructor-led guides, lab manuals, Jupyter notebook-based exercises, and on-demand video content.
- Develop technical assessments, certification questions, and demo environments.
- Develop real-world AI agents and data scenarios across Banking, Telecom, Retail, Healthcare, and other enterprise domains.
Workshop Delivery & Technical Integration:
- Lead and conduct technical ML/AI and Agent development workshops directly for customers and internal teams.
- Build and maintain labs using Python, AI/ML frameworks, Vector databases, and NVIDIA-accelerated data pipelines (e.g., cuDF, Apache Spark on GPUs).
- Partner with Product, Engineering, and Solution Architecture teams to ensure training reflects the latest platform capabilities.
Continuous Improvement:
- Iterate on course content based on learner feedback, instructor input, and emerging industry trends like LLMOps.
- Support internal enablement initiatives and maintain technical walkthroughs.
Qualifications
- Python, React, Node.js, HTML, and Markdown.
- Data & Infrastructure: SQL, Linux, Bash scripting, Kubernetes, and Docker.
- Traditional ML (PyTorch or similar), Generative AI, and building enterprise Agentic AI workflows. Familiarity with the OpenAI v1 API and how to build with it.
- Experience with the NVIDIA AI Agent & Data Engineering stack, including NVIDIA NIM, NeMo Retriever, NeMo Guardrails, and cuDF for accelerated data processing.
- AI Coding agents (Cursor, Claude, Codex, Gemini) and modern IDEs.
Requirements
- Total 6+ years of data and application development experience with 4+ years in curriculum development, technical training, AI/ML/Data Engineering, or MLOps.
- Proven ability to conduct hands-on workshops and simplify complex AI concepts for technical and non-technical audiences.
- Expertise in Agent frameworks (CrewAI, LangChain, NVIDIA AI-Q), Vector databases, RAG pipelines, and Git/version control.
You may also have:
- Advanced Architecture: Knowledge of Data Lakehouse architectures, Apache Spark, Kafka, and Airflow.
- Governance: Familiarity with AI governance, Responsible AI, and NVIDIA OpenShell for secure agent runtimes.
- Platform & Operations: Exposure to Cloudera environments, LMS platforms, and DevOps/CI/CD practices.
- Multimedia: Experience with certification development and video recording/editing tools.
Benefits
- Generous PTO Policy
- Support work life balance with Unplugged Days
- Flexible WFH Policy
- Mental & Physical Wellness programs
- Phone and Internet Reimbursement program
- Access to Continued Career Development
- Comprehensive Benefits and Competitive Packages
- Paid Volunteer Time
- Employee Resource Groups
Role Description
If you have an interest in being part of one of the fastest growing industries in the nation, you may consider wanting to work for Trulieve! If you have a desire to help others in need through your efforts, this may be the role for you! At Trulieve, we strive to bring our patients the relief they need in a product they can trust. Our plants are hand-grown in an environment specially designed to reduce unwanted chemicals and pests, keeping the process as natural as possible at every turn. Our products are designed to alleviate seizures, severe and persistent muscle spasms, pain, nausea, loss of appetite, and other symptoms associated with serious medical conditions such as cancer. Our specially trained staff works hand-in-hand with physicians to provide the right products and the correct dosage to ensure patients get the compassionate care they need.
Requisition ID: 19863
Remote Work Available: Yes
Job Title: Senior Data Engineer
Department: Data Engineering
Reports to: Senior Director of Data Platform Engineer
Location: Remote
Responsibilities
- Design and Implement Snowflake-Native Data Architectures
  - Lead the creation and optimization of modern data architectures on Snowflake, including multi-layer pipelines (bronze/silver/gold), real-time streaming ingestion, and governed analytics layers.
  - Ensure that the architecture supports high-volume data workloads, near-real-time freshness requirements, and integrates seamlessly with upstream operational systems and downstream analytics consumers.
- Lead the Development of Complex, End-to-End Data Pipelines Using Native Snowflake Services
  - Architect and build data pipelines using Dynamic Tables for declarative SQL-based transformations with automated dependency management and incremental refresh.
  - Implement real-time and near-real-time ingestion using Snowpipe and Snowpipe Streaming.
  - Design event-driven and procedural pipeline logic using Streams and Tasks for complex orchestration scenarios (MERGE, SCD patterns, external function calls).
  - Leverage Snowpark (Python) for advanced transformations that require procedural logic beyond SQL.
- Collaborate with Data Scientists, Analysts, and Other Stakeholders
  - Work closely with analytics and data science teams to understand data requirements and translate them into scalable, performant Snowflake solutions.
  - Build and maintain Semantic Views and governed consumption layers that provide self-service access to curated data.
  - Provide technical guidance on Snowflake best practices for data usage, cost optimization, and warehouse sizing.
- Ensure Data Security, Governance, and Compliance Standards Are Met
  - Implement and manage Snowflake data governance features including Dynamic Data Masking, Row Access Policies, Object Tagging, and Data Classification.
  - Establish and maintain data governance frameworks ensuring data quality and compliance with relevant regulations (SOX, GDPR, HIPAA).
  - Manage role-based access control (RBAC), data sharing, and cross-account governance using Snowflake's native security model.
Qualifications
- Deep Understanding of Data Engineering Concepts
  - Extensive knowledge of data modeling, including designing and maintaining relational, dimensional, and semi-structured data models within Snowflake.
  - Proficiency in data warehousing concepts with hands-on experience designing multi-layer transformation pipelines (staging, intermediate, marts).
  - Strong understanding of incremental processing patterns, change data capture (CDC), and slowly changing dimension (SCD) strategies.
- Expertise in Snowflake Platform and Native Data Pipeline Services
  - Deep proficiency with Snowflake Dynamic Tables (TARGET_LAG, REFRESH_MODE, pipeline dependency graphs, incremental vs. full refresh).
  - Hands-on experience with Snowpipe and Snowpipe Streaming for continuous and real-time data ingestion.
  - Strong knowledge of Snowflake Streams and Tasks for event-driven and procedural pipeline orchestration.
  - Experience with Snowpark (Python/Scala) for complex data transformations and UDFs/UDTFs.
  - Familiarity with Snowflake Cortex AI functions for embedding AI/ML capabilities into data pipelines.
- Proficiency in SQL, Python, and Data Engineering Frameworks
  - Advanced SQL skills including window functions, CTEs, recursive queries, semi-structured data handling (VARIANT, OBJECT, ARRAY), and performance optimization.
  - Python proficiency for Snowpark development, automation scripting, and integration work.
  - Experience with modular SQL transformation development, testing, and documentation using native Snowflake patterns and shared engineering standards.
  - Familiarity with orchestration platforms such as Apache Airflow / Astronomer for pipeline scheduling and monitoring.
- Experience with Real-Time Data Processing and Streaming Architectures
  - Hands-on experience designing solutions for real-time and near-real-time data pipelines using Snowpipe Streaming and Kafka connectors.
  - Understanding of event-driven architectures and their integration with Snowflake's continuous data pipeline features.
  - Experience with change data capture (CDC) patterns and tools (Debezium, Fivetran, custom CDC).
- Strong Knowledge of Cloud Infrastructure and Cost Optimization
  - Expertise in cloud platforms (AWS, Azure, or GCP) with a focus on integration with Snowflake (external stages, storage integrations, PrivateLink).
  - Experience with Snowflake cost management including warehouse sizing strategies, auto-suspend/resume, resource monitors, and query optimization.
  - Familiarity with infrastructure-as-code tools (Terraform, Pulumi) for managing Snowflake resources declaratively.
Contributions
- Lead the Technical Design of New Projects
  - Responsible for making critical decisions regarding Snowflake architecture patterns, product direction, and delivery tradeoffs, including when to use Dynamic Tables, Streams/Tasks, Materialized Views, and external orchestration based on business requirements and user needs.
  - Develop and enforce best practices for pipeline development, testing, deployment, and monitoring within the team.
  - Design and document data contracts and SLAs for pipeline freshness (TARGET_LAG), data quality, and availability.
- Mentor and Develop Junior Engineers
  - Provide technical leadership and mentorship to junior and mid-level engineers, fostering a culture of continuous learning and improvement.
  - Lead code reviews, pair programming sessions, and technical workshops focused on Snowflake-native development patterns and engineering best practices.
  - Create reusable patterns, templates, and documentation for common pipeline scenarios (CDC ingestion, SCD management, real-time aggregations).
- Operate as a Product Minded Data Engineering Leader
  - Partner with business stakeholders, analytics teams, and engineering leaders to shape the roadmap for data products and platform capabilities based on business value, user needs, and operational priorities.
  - Translate ambiguous business problems into clearly defined product requirements, success metrics, delivery plans, and prioritized engineering work.
  - Own the lifecycle of key data products and shared platform services, including intake, prioritization, stakeholder alignment, adoption, and continuous improvement.
- Continuously Evaluate and Improve Existing Systems
  - Regularly review current data systems to identify opportunities for migration to modern patterns (for example, migrating legacy Streams/Tasks to Dynamic Tables where appropriate).
  - Monitor pipeline health, refresh performance, and cost efficiency using Snowflake's INFORMATION_SCHEMA, ACCOUNT_USAGE, and alerting capabilities.
  - Implement optimizations including incremental refresh tuning, warehouse right-sizing, and query performance improvements.
Experience
- Substantial Professional Experience in Data Engineering
  - Typically requires 7+ years of experience in data engineering or related fields.
  - Proven track record of designing and implementing large-scale data solutions on Snowflake or similar cloud data platforms.
  - 2+ years of hands-on experience with Snowflake-native pipeline services (Dynamic Tables, Streams/Tasks, Snowpipe).
Training & Certifications
- Snowflake and Cloud Data Engineering Certifications
  - Preferred certifications include SnowPro Advanced: Data Engineer, SnowPro Core, or AWS/Azure/GCP data engineering certifications.
  - Continuous learning through relevant certifications and training to stay current with Snowflake platform releases and modern data engineering practices.
  - Familiarity with Snowflake's release cycle and ability to evaluate new features (for example, Cortex AI, Document AI, Iceberg Tables) for team adoption.
Benefits
- Salary will be commensurate with experience.
- A comprehensive benefits package including paid time off is offered with this position.
Company Description
Trulieve provides equal employment opportunities to all employees and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, pregnancy or any other characteristic protected by federal, state or local laws.
- Lead the migration to Microsoft Fabric.
- Rebuild existing pipelines, workflows, and workspaces from our Azure environment.
- Expand and scale the platform to create value for users.
- Design and build ingestion for new sources, create and validate transformation code, and surface data in intuitive semantic layers.
- Maintain and improve data trust.
- Work in a medallion architecture.
- Put engineering quality first.
- Connect data to consumption.
- Consider storage and lifecycle.




