Senior Big Data Engineer

Data Engineer | Full Time | Remote | Senior | Team 5,001-10,000 | No H1B Sponsorship

Location

United States

Posted

5 days ago

Salary

$160.5K - $200K / year

Seniority

Senior

Job Description

Senior Big Data Engineer

Circana

Role Description
The Senior Big Data Engineer is responsible for designing, building, and delivering highly scalable big data and ETL solutions across distributed environments. This role supports projects at various and unanticipated client worksites throughout the United States, with headquarters in Chicago, IL. The engineer works closely with cross-functional teams to develop, test, and deploy data solutions that meet business and operational needs, using modern big data frameworks, cloud platforms, and agile development practices. Telecommuting is permitted.

Job Responsibilities
- Design and implement highly scalable ETL applications on Hadoop and big data ecosystems.
- Develop new scripts, tools, and methodologies to streamline and automate ETL workflows.
- Deliver big data projects using Spark, Python, Scala, SQL, and Hive.
- Design and create use cases and scenarios for functional, integration, and system testing.
- Work closely with Data Science, QA, Operations, and other teams to deliver on tight deadlines.
- Participate in daily agile/scrum meetings and code reviews.
- Coordinate with cross-functional operational teams to manage data delivery.
- Write efficient, reusable, and well-documented code.
- Prepare technical design documents for solutions.
- Identify and address issues encountered in the data factory, and provide timely solutions to incorrect or undesired results or behavior in ILD solutions.

Technical Environment
- Writing ETL Spark applications in PySpark and Scala; Flume.
- Spark architecture, DataFrames, and Spark tuning.
- Relational databases (Oracle, PostgreSQL).
- Python, SQL, HQL, Hive.
- Databricks.
- Managing software systems using Hadoop, MapReduce, HDFS, and all included services.
- Distributed computing principles.
- Big data querying tools (Pig, Hive, Impala).
- Data warehousing and data modeling techniques.
- Core Java, Linux, SQL, scripting languages.
- Cloud platforms (Azure).
- Integration of data from multiple data sources.
- Lambda Architecture.

Qualifications
- Bachelor’s degree in Computer Science, Information Systems, Computer Engineering, or a related field, plus 5 years of progressive experience as a Software Engineer/Developer or in software development (required).
- Experience writing ETL Spark applications in PySpark or Scala.
- Experience with Spark architecture, DataFrames, and Spark tuning.
- Experience with relational databases (Oracle, PostgreSQL).
- Experience with Python, SQL, HQL, and Hive.
- Experience with Databricks.
- Experience managing software systems using Hadoop, MapReduce, HDFS, and all included services.
- Experience with distributed computing principles.
- Experience with big data querying tools (Hive).
- Experience with data warehousing and data modeling techniques.
- Experience with Core Java, Linux, SQL, and scripting languages.
- Experience with cloud platforms (Azure).
- Experience integrating data from multiple data sources.
- Experience with Lambda Architecture.
- Telecommuting permitted.

Circana Behaviors
- Stay Curious: Being hungry to learn and grow, always asking the big questions.
- Seek Clarity: Embracing complexity to create clarity and inspire action.
- Own the Outcome: Being accountable for decisions and taking ownership of our choices.
- Center on the Client: Relentlessly adding value for our customers.
- Be a Challenger: Never complacent, always striving for continuous improvement.
- Champion Inclusivity: Fostering trust in relationships by engaging with empathy, respect, and integrity.
- Commit to Each Other: Contributing to making Circana a great place to work for everyone.

Location
This position can be located in the following area(s): US Remote.

Compensation
The salary range for this role is 160,500 USD to 200,000 USD. This job is also eligible for bonus pay.

Benefits
- Comprehensive benefits package, including paid time off.
- Medical/dental/vision insurance.
- 401(k) for eligible employees.
Application Process
You can apply for this role through the Circana careers website, or through the intranet site if you are an internal candidate.

More Data Engineer Jobs

Senior Data Engineer

Fusion Risk Management

Fusion Risk Management is recognized as the most innovative and fastest growing provider of cloud-based enterprise software for business continuity risk management, IT disaster recovery and crisis management. Fusion is transforming the industry and has been named a leader in Gartner's Magic Quadrant for Business Continuity Management software.

Data Engineer | Posted 5 days ago
Full Time | Remote | Team 258 | Since 2006

The Role
We’re looking for a product-minded Senior Data Engineer to lead the buildout of a new, graph-backed enterprise data platform at Fusion. This is not a maintenance role. You will architect and own a new data platform from the ground up, designing the ingestion layer, graph and relational storage, entity resolution pipelines, and APIs that unify resilience data across customers, systems, and cloud environments. You will define how data is ingested, resolved, modeled as a graph, governed, and exposed across Fusion’s ecosystem. This platform will power dependency analysis, recovery modeling, predictive intelligence, and a new generation of resilience products.

This is a high-ownership opportunity for someone who wants to build something foundational, work with graph and network data structures at scale, and create a platform that becomes core to Fusion’s long-term strategy.

Key Responsibilities
• Architect and build Fusion’s next-generation data platform from the ground up, including a graph database layer, relational storage, and data lake components.
• Design and implement scalable ETL/ELT pipelines to ingest and transform data from customer environments, internal systems, and third-party platforms using managed connector frameworks.
• Build and maintain entity resolution pipelines that match, merge, and link records across disparate sources into a unified graph model.
• Design and implement graph data models that represent operational dependencies, recovery sequences, and organizational relationships, supporting traversal queries across complex, multi-hop networks.
• Develop temporal and bitemporal data models that capture how entities and relationships change over time, enabling historical replay and audit-grade versioning.
• Establish best practices for data governance, quality, observability, lineage, and security across the platform.
• Build backend services and APIs that expose graph queries, entity lookups, and data capabilities to downstream applications and ML systems.
• Support containerized deployment across both managed cloud and customer-hosted (reverse SaaS) environments.
• Partner with product and engineering leadership to shape the long-term data platform roadmap.

Knowledge, Skills, and Abilities
• Strong SQL expertise, with experience designing performant data models and production-grade transformations.
• Experience with graph databases or network-oriented data problems (e.g., dependency mapping, supply chain graphs, knowledge graphs, social network analysis, or similar domains where relationships between entities are central to the data model).
• Familiarity with graph query languages or traversal patterns (e.g., Gremlin, Cypher, SPARQL, or recursive SQL) and an understanding of when graph representations outperform relational models.
• Experience with entity resolution, record linkage, or deduplication at scale, whether using probabilistic matching frameworks, deterministic rules, or ML-assisted approaches.
• Experience building data lakes, warehouses, and distributed data systems from the ground up.
• Strong understanding of ETL/ELT patterns, orchestration (e.g., Airflow, Dagster, dbt, or similar), and pipeline reliability.
• Experience with open-source or self-hosted data infrastructure components and a pragmatic sense for build-vs-buy trade-offs.
• Experience designing and implementing enterprise system integrations, connectors, and APIs.
• Strong engineering fundamentals with a focus on scalability, performance, monitoring, and security.
• Familiarity with containerized deployments and orchestration (Docker, Kubernetes, Helm, or similar) (bonus).
• Experience with temporal or bitemporal data modeling patterns (bonus).
• Experience with Salesforce or ServiceNow data models and integrations (bonus).
• Strong Python or Java skills for building backend services (bonus).
• Familiarity with AI-assisted development tools (e.g., Copilot, Cursor, Claude Code, or similar) and comfort using them to accelerate engineering workflows.
• Product-oriented mindset with the ability to make pragmatic architectural decisions in ambiguous, early-stage environments.

Qualifications (Education and Experience)
• Bachelor’s degree in Computer Science, Engineering, Information Systems, or a related field.
• 5+ years of experience in data engineering, backend data systems, or platform engineering roles.
• Experience building or significantly expanding a data platform or data infrastructure in a production environment.
• Experience working with graph, network, or highly relational data structures in a professional or academic setting.
• Experience working in cloud-native environments (Azure preferred).
• Experience designing enterprise-grade integrations and connectors.
• Experience with entity resolution or record-matching techniques (nice to have).
• Experience with containerized deployments (Docker, Kubernetes) (nice to have).
Milestones for the First Six Months
In one month, you will:
• Complete onboarding and gain deep familiarity with Fusion’s products, data strategy, and long-term platform vision.
• Assess the current state of data infrastructure and evaluate graph database and entity resolution options against platform requirements.
• Align with product and engineering leadership on platform scope and priorities.
In three months, you will:
• Deliver the first foundational components of the new data platform: the core graph storage layer, initial ingestion pipelines, and an entity resolution workflow.
• Implement initial ETL/ELT workflows and at least one production-grade system connector.
• Establish standards for graph data modeling, governance, and observability.
In six months, you will:
• Own and deliver the first production-ready version of Fusion’s new data platform, including graph traversal APIs and entity resolution.
• Have multiple ingestion pipelines and connectors operating reliably in production.
• Serve as the architectural owner of the platform, driving the roadmap and technical direction.
• Propose and lead the next phase of platform expansion: temporal modeling, advanced graph analytics, and ML feature pipelines.

Compensation & Benefits
The annual base salary range for this position is $135,000-$155,000, depending on the candidate’s experience, qualifications, and relevant skill set. The position is also eligible for an annual bonus. Fusion offers a comprehensive benefits package including medical, dental, vision, and a 401(k) plan.

Disclaimers
Fusion is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, disability, age, pregnancy, military service or discharge status, genetic information, sex, sexual orientation, gender identity, or national origin. Nothing in this job posting should be construed as an offer or guarantee of employment.

United States
$135K - $155K / year

Role Description
We are looking for a Data Engineer specializing in Celonis to take part in process-improvement and advanced-analytics projects, working alongside business and technology teams.
- Gathering and documenting business processes.
- Modeling processes in the Celonis platform.
- Defining and tracking KPIs.
- Performing functional analyses.
- Identifying opportunities for automation and optimization.
- Collaborating with business areas to understand their needs.
- Taking part in continuous-improvement and digital-transformation projects.

Qualifications
- Experience working with Celonis.
- Knowledge of business processes.
- Experience in data modeling and functional analysis.
- Strong command of SQL.
- Ability to interact with business users.

Requirements
- Knowledge of Process Mining.
- Experience with ETL tools.
- Knowledge of Python.
- Experience with Agile methodologies.

Benefits
- Joining strategic transformation projects.
- Training and certifications.
- Excellent work environment.
- Flexible working arrangements.
- Professional development and a growth plan.
- Competitive pay.

Spain

Senior Data Engineer

Correlation One

Correlation One is a technology company that is on a mission “to create equal access to data-driven jobs of tomorrow.” As an employer, the company is known

Data Engineer | Posted 5 days ago

• Develop a long-term technical vision for the team
• Design a technical strategy for developing components, frameworks, and libraries that create seamless applications for our customers
• Develop scalable components and applications hands-on
• Mentor engineers on the team, both senior and junior
• Improve and uphold standards for engineering and operational excellence
• Create and encourage comprehensive documentation for code, libraries, and components to facilitate collaboration and knowledge sharing within the team

Latin America

Role Description
We are looking for a Data Engineer with exceptional SQL skills to join our growing team at uMotif. This role is primarily focused on writing, optimizing, and maintaining complex SQL across our clinical data infrastructure on AWS. You will be the go-to person for query performance, data extraction, and SQL-driven pipeline development, ensuring our clinical and product teams always have fast, reliable access to the data they need. You will work closely with TechOps, DevOps, Engineering, and Clinical Operations to build well-crafted SQL solutions that underpin uMotif’s patient engagement and clinical trial platforms.

Please note: this is a remote role; however, you will need to align with US East Coast (EST) working hours so that you can liaise with the team in the UK time zone (BST).

What will you do?
SQL Development & Optimization
- Write and maintain complex SQL queries across large-scale clinical datasets, including multi-table joins, window functions, CTEs, and subqueries.
- Diagnose and tune slow-running queries using execution plans, index analysis, and query profiling tools, delivering measurable performance improvements.
- Establish and enforce SQL best practices, coding standards, and review processes across the data team.
- Optimize SQL for cost and performance, with a deep understanding of how the complete system handles query execution.
- Build and manage indexes, partitioning strategies, and materialized views to support performant analytical and operational queries.
Data Pipeline Development
- Design and build ELT/ETL pipelines with SQL at their core, leveraging AWS services such as:
  - AWS Aurora for structured data processing
  - AWS Lambda and Step Functions for orchestration and transformation triggers
- Write transformation logic using dbt, including tests, documentation, and lineage tracking.
- Ensure pipelines are performant, reliable, and well-monitored, with clear alerting when things go wrong.
Analytics & Reporting Enablement
- Build clean, well-documented SQL datasets and semantic layers that empower self-serve analytics across clinical and product teams.
- Partner with TechOps and clinical stakeholders to translate reporting requirements into robust, reusable SQL data products.
- Support dashboard and reporting tools, including Grafana and Amazon QuickSight, with optimized underlying queries.
Data Quality & Governance
- Implement SQL-based data quality checks and validation frameworks across critical pipelines.
- Support data cataloging, lineage tracking, and access control in line with healthcare data standards.
- Assist with compliance requirements for clinical trial data, including audit trails and row-level security where needed.
Collaboration & Continuous Improvement
- Participate actively in code reviews, with a particular focus on SQL quality, readability, and performance.
- Mentor junior engineers and analysts on SQL patterns, optimization techniques, and data engineering fundamentals.
- Contribute to technical documentation, runbooks, and data engineering best practices.
- Drive root cause analysis for data incidents and improve pipeline reliability over time.

Qualifications
- 4+ years of experience in data engineering or a closely related role, with SQL as a core daily skill.
- Demonstrable expertise in writing complex, production-grade SQL, including window functions, recursive CTEs, lateral joins, and advanced aggregations.
- Proven track record of query optimization: reading execution plans, diagnosing bottlenecks, and delivering significant performance improvements.
- Strong hands-on experience with AWS data services, particularly Aurora, Redshift, Athena, and S3.
- Experience building ELT/ETL pipelines at scale, with SQL transformation at their core.
- Proficiency in dbt for data transformation, testing, and documentation.
- Experience with Python for pipeline orchestration and data processing tasks.
- Familiarity with workflow orchestration tools such as Apache Airflow or AWS MWAA.
- Understanding of data quality principles, access control, and governance (e.g., AWS Lake Formation).
- Experience working in GitLab or a similar CI/CD environment.
- Strong analytical mindset, attention to detail, and excellent communication skills.

Technical Skills
Core SQL & Data Tools
- SQL (expert level): Aurora/PostgreSQL, Athena/Presto, Redshift SQL
- dbt (data build tool)
- Python
- Apache Airflow / AWS MWAA
AWS Data Services
- AWS Aurora (PostgreSQL-compatible)
- Amazon CloudWatch (Data Insights, Performance Insights)
- AWS Lambda & Step Functions
- Amazon S3
- Amazon Redshift, including query tuning, WLM configuration, and distribution strategies
- Amazon Athena: federated queries, partitioning, columnar formats (Parquet, ORC)
- AWS Lake Formation
- AWS Glue (supporting role)
Other Tools
- GitLab CI/CD
- Amazon QuickSight / Grafana
- Terraform (nice to have)

Other Important Skills
- Strong analytical and troubleshooting capabilities, with a systematic approach to query debugging.
- Ability to work independently and collaboratively across cross-functional teams.
- Strong documentation and communication skills, including the ability to explain SQL logic and data decisions to non-technical stakeholders.
- Continuous improvement mindset with a focus on data reliability, performance, and quality.
- Ability to manage multiple priorities in a fast-paced, mission-driven environment.

Nice to have
- Experience in healthcare, life sciences, or clinical trials data environments.
- Familiarity with healthcare data standards such as HL7 or FHIR.
- AWS certifications such as AWS Certified Data Engineer – Associate or AWS Certified Solutions Architect.
- Knowledge of Infrastructure as Code using Terraform.
- Exposure to streaming data pipelines using AWS Kinesis or Apache Kafka.

EST (UTC-5)
$110K - $130K / year