Ceresti Health

Everyone else treats the patient. We activate the caregiver—because that’s where dementia care really begins.

Senior Data Engineer

Data Engineer | Full Time | Remote | Senior | Team 11-50 | Since 2013 | H1B No Sponsor

Location

United States

Posted

9 days ago

Salary

Not listed

Seniority

Senior

Bachelor Degree | 8 yrs exp | English | Airflow | AWS | Cloud | PostgreSQL | Python | SQL | Vault

Job Description

  • Design and own Ceresti’s end-to-end data architecture: a landing zone with secure cloud object storage for raw partner files and API payloads, validated ingestion pipelines into our transactional Postgres, and a curated analytics layer that decouples reporting and AI workloads from production
  • Build ingestion pipelines for the data we receive today, including partner data files (CSV/JSON/XML/HL7/X12 as applicable) and REST/SFTP API integrations, with schema validation, quarantine of bad records, and full lineage from raw bytes to curated row
  • Stand up and operate the curated layer (data warehouse / lakehouse-lite) so analytics and ML models can consume data without slowing down the transactional system
  • Choose, integrate, and operate the smallest set of tools needed, including object storage, an orchestrator (Dagster, Prefect, Airflow, etc.), dbt or similar for transformations, and a single validation library (Great Expectations / Pandera / Soda)
  • Design and enforce data governance for a HIPAA-regulated environment: PHI/PII classification, encryption in transit and at rest, role-based access, audit logging, retention and minimum-necessary policies, and de-identification where appropriate
  • Partner with backend, ML, product, and clinical stakeholders to define data contracts with our health plan and ACO partners and hold the line on data quality
  • Build and maintain reliable feature data for ML models, including embeddings (e.g., pgvector) and curated feature tables for risk stratification, engagement, and outcomes work
  • Instrument the data platform for observability, including pipeline SLAs, data freshness, schema drift, and quality metrics, and act on what the data tells you
  • Participate fully in our Agile process: backlog grooming, sprint planning, demos, and retrospectives
  • Mentor engineers across the team on SQL, schema design, and the craft of building data systems that are boring in the best possible way

Job Requirements

  • BS/BA degree or higher in Computer Science, Engineering, or a related technical field
  • 8+ years of professional data engineering experience, with a track record of shipping production data systems end-to-end
  • Mastery of PostgreSQL: schema design, indexing, query tuning, partitioning, logical replication, JSONB, extensions (pg_partman, pg_cron, pgvector, etc.), and operating Postgres at scale
  • Strong experience designing and operating data pipelines, including file-based ingestion (SFTP / object storage drops) and API-based ingestion (REST, webhooks)
  • Hands-on experience with one or more cloud platforms (AWS preferred) and their data primitives: object storage (S3), managed Postgres
  • Experience designing data warehouses and/or data lakes and the judgment to know which one a given problem actually needs
  • Strong experience with dbt (or equivalent SQL-based transformation framework) and modern data modeling patterns (Kimball dimensional, Data Vault, One Big Table — and an opinion about when each is right)
  • Experience with at least one orchestration framework (Dagster, Prefect, or Airflow) and a clear point of view on which to use when
  • Strong Python skills for ingestion, validation, and tooling
  • Experience with data validation and data-quality frameworks (Great Expectations, Pandera, Soda, or equivalent)
  • Experience with change-data-capture from Postgres (logical replication, or equivalent)
  • Data governance experience in a HIPAA-regulated environment or, at minimum, demonstrated instincts for protecting PHI and PII (encryption, least privilege, audit, de-identification, BAA-aware vendor selection); HITRUST or SOC 2 experience is a strong plus
  • Comfortable with infrastructure-as-code and CI/CD for data systems
  • Experience supporting ML workloads: building feature tables, managing training data, serving features at inference time; familiarity with embeddings, vector search (pgvector or equivalent), and LLM integration patterns (RAG, prompt-grounded analytics) is a plus
  • Excellent written and verbal communication skills: you can explain a tricky schema decision to a business stakeholder and a data contract to a partner with equal clarity
  • Demonstrated experience working in Agile/Scrum teams

Benefits

  • Competitive salary and benefits package
  • Opportunities for professional growth and development
  • Collaborative and dynamic work environment
  • Flexible work arrangements and remote work options
  • Access to cutting-edge technologies and tools

More Data Engineer Jobs

Data Engineering Team Lead

Ocean Technologies Group

Powering teams that deliver for people & planet, with maritime learning, crew and fleet management and GRC solutions

Data Engineer | 9 days ago | Full Time | Remote | Team 201-500 | Since 2020 | H1B No Sponsor

  • Lead a team of data engineers, ensuring alignment on goals, quality and delivery timelines.
  • Mentor and coach team members to support their technical and professional growth.
  • Drive engineering excellence by promoting best practices in coding, architecture, testing and observability.
  • Plan and manage team capacity, sprints and milestones to ensure predictable delivery.
  • Own the design, evolution and operation of ingestion and transformation pipelines on Apache Airflow and the analytical serving layer on Apache Druid.
  • Make architectural calls on concurrency, partitioning, memory sizing and cost, including JVM heap and direct-memory tuning on the Druid cluster.
  • Collaborate closely with DevOps on the Kubernetes / EKS platform that hosts our Druid and Airflow workloads.
  • Ensure robust data validation, reconciliation and verification so that reporting is trustworthy.
  • Collaborate with other Team Leaders, Development Managers, Architects and Product Owners to align engineering execution with business objectives.
  • Contribute to the evolution of development processes, CI/CD pipelines and DevOps practices.
  • Foster a culture of continuous improvement, innovation and knowledge sharing.

Philippines

Data Engineer

Nuvitek

Speed Up True Modernization

Data Engineer | 9 days ago | Full Time | Remote | Team 51-200 | Since 2012 | H1B No Sponsor

  • Design, develop, and maintain scalable RAG/CAG pipelines for AI-powered applications
  • Build and optimize document ingestion workflows for structured and unstructured data sources
  • Manage and maintain vector stores to support semantic search and retrieval capabilities
  • Develop OCR processing pipelines for historical and modern document collections spanning 1781–2025
  • Optimize retrieval performance, relevance tuning, and ranking strategies for LLM-based systems
  • Build reliable data pipelines that support integrations with large language models and AI services
  • Collaborate with engineers, UX teams, product owners, and stakeholders to deliver scalable AI solutions
  • Ensure data quality, integrity, security, and performance across ingestion and retrieval systems
  • Implement monitoring, logging, and troubleshooting for AI and data processing workflows
  • Contribute to architecture decisions, technical documentation, and engineering best practices
  • Participate in agile pod-based development teams and continuous improvement initiatives

United States
$115K - $125K / year
Job Closed

Enterprise Data Warehouse ETL/Data Engineer

CSpring

Unlocking the power and potential of data.

Data Engineer | 9 days ago | Full Time | Remote | Team 51-200 | H1B Sponsor

  • Design and develop reusable, parameter-driven ingestion and transformation pipelines
  • Build and maintain medallion architecture solutions
  • Develop performant ELT workflows
  • Create and optimize PySpark notebooks and distributed processing jobs
  • Design dimensional data models
  • Implement data vault patterns
  • Optimize distributed SQL workloads
  • Implement CI/CD processes
  • Build monitoring, logging, and auditing solutions
  • Lead or contribute to cloud modernization initiatives

Illinois

Data Engineer

UnitedHealth Group

UnitedHealth Group is a healthcare and well-being company that’s dedicated to improving the health outcomes of millions around the world. We are comprised of

Data Engineer | 9 days ago

Role Description

We are seeking a highly skilled Senior Data Engineer to design, build, and optimize scalable data and AI platforms on Azure. This role will focus on enabling enterprise data pipelines, real-time processing, and AI/ML model integration using Databricks and modern cloud technologies. You will enjoy the flexibility to telecommute* from anywhere within the U.S. as you take on some tough challenges.

  • Design and develop scalable data pipelines using Databricks, Apache Spark, and Python on Azure
  • Build cloud-native solutions leveraging Azure Data Lake, Azure Data Factory, and Delta Lake
  • Collaborate with Data Science and AI teams to operationalize ML models and embed them into production workflows
  • Develop and maintain feature stores, model input pipelines, and real-time/streaming frameworks
  • Ensure data quality, governance, and security across the full data lifecycle
  • Build reusable frameworks, accelerators, and automation scripts to improve engineering efficiency
  • Optimize performance, scalability, and reliability of data workflows and batch/streaming pipelines
  • Participate in Agile development processes, including sprint planning, code reviews, and CI/CD pipelines
  • Provide production support and on-call coverage, ensuring system stability and rapid issue resolution
  • Design, develop, and deploy AI-powered solutions to address complex business challenges with emphasis on responsible use of AI

Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or IT related field
  • 6+ years of experience in Data Engineering with Python/PySpark
  • 6+ years of experience in building ETL/ELT pipelines using Databricks
  • 6+ years of experience working in Agile environments
  • 5+ years of strong experience in SQL / PL-SQL
  • 4+ years of experience with Azure Databricks and Delta Lake architecture
  • 4+ years of hands-on experience with CI/CD (GitHub Actions, Azure DevOps)
  • 3+ years of hands-on experience with Azure cloud services (ADF, ADLS, Databricks)
  • 2+ years of experience with Databricks Delta Live Tables (DLT)
  • 2+ years of experience with unit testing, validation, and pipeline testing frameworks

Requirements

  • Familiarity with medallion architecture and SCD2 implementations
  • AI builder: Design, develop, and deploy AI-powered solutions to address complex business challenges with emphasis on responsible use of AI
  • Experience building enterprise-scale data platforms
  • Strong skills in performance tuning and debugging large-scale pipelines
  • Experience with real-time/streaming frameworks (Structured Streaming)
  • Ability to work in distributed, cross-functional global teams
  • Exposure to GenAI tools (e.g., GitHub Copilot) for engineering productivity
  • Strong understanding of secure coding practices and vulnerability remediation
  • Proven ability to analyze logs, troubleshoot production issues, and optimize performance
  • Demonstrated capability to design and deploy AI-powered solutions responsibly

Benefits

  • Comprehensive benefits package
  • Incentive and recognition programs
  • Equity stock purchase
  • 401k contribution (all benefits are subject to eligibility requirements)

United States
$72.8K - $130K / year
Job Closed