Cummins is an equal opportunity employer. Our policy is to provide equal employment opportunities to all qualified persons without regard to race, sex, color, disability, national origin, age, religion, union affiliation, sexual orientation, veteran status, citizenship, gender identity, or other status protected by law.
Data Architect
Location: United States
Posted: 19 hours ago
Salary: $123.0K - $150.4K / year
Seniority: Senior
Job Description
Data Architect
Cummins Inc.
• Design and automate scalable data ingestion and transformation pipelines across relational, event-based, and unstructured sources.
• Build and maintain frameworks to monitor, detect, and resolve data quality and integrity issues. Implement data governance practices, including metadata management, data access, and retention policies.
• Architect and guide development of reliable, efficient, and scalable ETL/ELT data pipelines with monitoring and alerting.
• Design physical data models and optimize database structures, indexing, and relationships for performance.
• Test, optimize, and troubleshoot data pipelines to ensure stability and performance.
• Develop and manage large-scale data storage solutions using distributed and cloud platforms (e.g., data lakes, Hadoop, NoSQL databases).
• Drive automation and modernization of data infrastructure and integration processes to support agile analytics initiatives.
Job Requirements
- College, university, or equivalent degree in relevant technical discipline, or relevant equivalent experience required.
- This position may require licensing for compliance with export controls or sanctions regulations.
- Intermediate experience in a relevant discipline area is required. Knowledge of the latest technologies and trends in data engineering is highly preferred, including:
- Familiarity with analyzing complex business systems, industry requirements, and/or data regulations
- Background in processing and managing large data sets
- Design and development of a Big Data platform using open-source and third-party tools
- Spark, Scala/Java, MapReduce, Hive, HBase, and Kafka, or equivalent college coursework
- SQL query language
- Experience implementing clustered compute in cloud-based environments
- Experience developing applications that require large file movement in a cloud-based environment, and with other tools and methods for extracting data from a variety of sources
- Experience in building analytical solutions
- Intermediate experience in the following is preferred:
- Experience with IoT technology
- Experience in Agile software development
- Dimensional Modeling Mastery: Deep expertise in designing enterprise-scale dimensional models (star, snowflake, constellation) with strong command of fact table grain definition, surrogate key strategies, slowly changing dimensions (Types 1–6), bridge tables, and late-arriving data handling.
- Advanced SQL Engineering: Highly proficient in writing complex, high-performance SQL, including window functions, CTE-driven transformations, query plan analysis, cost-based optimization, partitioning strategies, and performance tuning across large, distributed datasets.
- Snowflake Architecture & Engineering: Hands-on experience with Snowflake internals, including micro-partitioning, clustering keys, result-set caching layers, warehouse sizing and auto-suspend tuning, Snowpipe/Streams/Tasks orchestration, Time Travel, Zero-Copy Cloning, and secure data sharing patterns.
- Graph Database & Cypher Proficiency: Strong experience with Neo4j or equivalent graph platforms, including graph schema design, Cypher query optimization, graph algorithms (PageRank, community detection, pathfinding), and integration of graph workloads with analytical and relational systems.
- Microsoft Fabric Ecosystem: Practical experience with Fabric Lakehouse architecture, Delta Lake optimization, Data Engineering pipelines, Data Factory orchestration, KQL-based Real-Time Analytics, semantic model creation, and integration with Power BI and OneLake governance.
- SAP S/4HANA Data Structures: Familiarity with SAP S/4HANA data models (FI/CO, MM, SD, PP), CDS views, OData services, SLT/SDI/ODP-based extraction patterns, and harmonization of SAP transactional data into cloud-based analytical platforms.
- Cloud Data Architecture: Strong understanding of distributed data processing, ELT/ETL orchestration, event-driven ingestion (Kafka/Event Hubs), metadata-driven frameworks, schema evolution, and data lifecycle management across cloud environments (Azure preferred).
- Data Governance & Metadata Management: Experience implementing enterprise data catalogs, lineage tracking, data quality rules, master data integration, and security models (RBAC/ABAC, row-level and column-level security).
- Performance Engineering & Optimization: Ability to diagnose bottlenecks across compute, storage, and network layers; optimize workloads for cost and performance; and design scalable, fault-tolerant data architectures.
- Cross-Platform Integration: Experience integrating heterogeneous systems (SAP, Snowflake, Fabric, graph DBs, APIs, streaming platforms) into unified analytical ecosystems with a strong focus on interoperability and data consistency.