Senior Cloud Data Engineer
Location
India
Posted
30 days ago
Seniority
Senior
Job Description
CAI
Role Description
We are seeking a Senior Cloud Data Engineer who will manage the existing cloud data platform to make it more scalable, reliable, and cost efficient. This position will be full-time and remote.

What You’ll Do
- Analyze and understand existing data warehouse implementations to support migration and consolidation efforts.
- Reverse-engineer legacy stored procedures (PL/SQL, SQL) and translate business logic into scalable Spark SQL code within Databricks notebooks.
- Design and develop data lake solutions on AWS using S3 and Delta Lake architecture, leveraging Databricks for processing and transformation.
- Build and maintain robust data pipelines using ETL tools with ingestion into S3 and processing in Databricks.
- Collaborate with data architects to implement ingestion and transformation frameworks aligned with enterprise standards.
- Evaluate and optimize data models (Star, Snowflake, Flattened) for performance and scalability in the new platform.
- Document ETL processes, data flows, and transformation logic to ensure transparency and maintainability.
- Perform foundational data administration tasks including job scheduling, error troubleshooting, performance tuning, and backup coordination.
- Work closely with cross-functional teams to ensure smooth transition and integration of data sources into the unified platform.
- Participate in Agile ceremonies and contribute to sprint planning, retrospectives, and backlog grooming.
- Triage, debug, and fix technical issues related to data lakes.
- Maintain and manage code repositories (e.g., Git).

Qualifications
- 8–12 years of experience in data engineering, data warehousing, or big data platform development.
- 5+ years of hands-on experience with Databricks including Spark, Delta Lake, and performance optimization.
- 4+ years of experience designing and implementing cloud-based data lake or lakehouse architectures on Databricks.
- Strong expertise in Spark technologies including Spark SQL, PySpark, or Scala.
- Advanced SQL and PL/SQL skills with the ability to interpret and refactor legacy stored procedures.
- Strong understanding of data modeling techniques including Star Schema, Snowflake, and modern Lakehouse modeling approaches.
- Proficiency in at least one programming language (Python, Scala, Java).
- Hands-on experience with data modeling and warehouse design principles.
- Experience designing enterprise-scale ETL/ELT pipelines and ingestion frameworks.
- Strong understanding of performance tuning, partitioning strategies, and cost optimization for large-scale data platforms.
- Experience implementing CI/CD pipelines and DevOps practices for data engineering workflows.
- Bachelor’s degree in Computer Science, Information Technology, Data Engineering, or a related field.
- Experience working in Agile environments and contributing to iterative development cycles.

Requirements
- Databricks Professional or Associate Cloud Certification.
- Experience with enterprise data governance, metadata management, and data catalog platforms.

Company Description
CAI is a global services firm with over 9,000 associates worldwide and a yearly revenue of $1.3 billion+. We have over 40 years of excellence in uniting talent and technology to power the possible for our clients, colleagues, and communities. As a privately held company, we have the freedom and focus to do what is right, whatever it takes. Our tailor-made solutions create lasting results across the public and commercial sectors, and we are trailblazers in bringing neurodiversity to the enterprise.
More Data Engineer Jobs
Role Description
We are Corposostenibile, Italy's leading online center for Integrative Nutrition, growing rapidly in the health & wellness sector, with a team of over 150 professionals. We are building a proprietary technology platform with a strong data-driven component: collecting, normalizing, and analyzing data from multiple systems (CRM, marketing, product, AI). We are looking for a Senior Data Engineer responsible for designing and building our data platform on GCP. The focus is not only analysis: the role involves building the system end to end, from data acquisition to the dashboards used across the company. You will have direct ownership of how data is collected, structured, and made available to the business.

Working arrangement:
- Fully remote (Europe)
- Asynchronous collaboration plus periodic alignment meetings
- Required working-hours overlap: GMT-1 / GMT+3

Responsibilities
- Design and develop a data lake on GCP (Cloud Storage + BigQuery)
- Build data ingestion pipelines from heterogeneous sources:
  - APIs (CRM, marketing tools, external SaaS)
  - internal databases
  - application events
- Develop scalable, monitorable ETL/ELT pipelines
- Model data for analytics (star schema, semantic layer)
- Ensure data quality, consistency, and reliability
- Expose data through a query layer, APIs, or BI tools
- Develop dashboards for business and operations
- Optimize cost and performance on BigQuery
- Define standards and best practices for data management

Technology Stack (GCP)
- Storage: Google Cloud Storage (data lake)
- Data Warehouse: BigQuery
- Processing: Python (batch jobs), SQL
- Orchestration: Cloud Composer (Airflow) or equivalent
- Ingestion/Events: Pub/Sub (nice to have)
- BI: Looker / Metabase / Superset
- Infra: Docker, CI/CD

Qualifications
- 5+ years of experience in data engineering or backend engineering with a strong data component
- Experience designing data lakes or data warehouses
- Excellent knowledge of Python and SQL
- Experience with ETL/ELT pipelines and integrating data from external APIs
- Data modeling for analytics (fact tables, dimensions, etc.)
- Experience with relational databases (PostgreSQL or similar)

Requirements
- Experience with distributed or scalable systems
- Code version control (Git)
- Experience with Docker and CI/CD pipelines
- Familiarity with workflow orchestration (Airflow or equivalent)

Nice to have
- Experience with streaming/event-based systems
- Experience with BI tools (Metabase, Superset, Looker, etc.)
- Experience with tracking systems (event analytics)
- Experience in startup / high-growth environments

Soft Skills
- Strong autonomy and sense of ownership
- Ability to work on loosely defined problems
- Pragmatic, results-oriented approach
- Clear communication with technical and non-technical teams

Benefits
- Full ownership of the data platform
- Direct impact on business decisions
- Fast-paced environment without bureaucracy
- Direct collaboration with technical leadership
- Fully remote within Europe
- Compensation commensurate with the candidate's experience and skills, open to discussion during the interview.
Role Description
Clear Fracture is building AI-driven data integration systems that enable organizations to connect, transform, and reason over complex data using agentic workflows. Our platform operates across cloud and on-prem environments and is designed to support multi-tenant, production-scale use cases.

We are looking for a Data Engineer who operates as a software engineer first, with strong experience in data modeling and data systems. You will play a key role in building the core data layer that powers our agentic platform: designing schemas, implementing data services, and enabling reliable, scalable data flows.

In addition to building core data infrastructure, you will also develop real use cases on the platform itself, helping shape how users interact with data. This includes designing data interfaces, abstractions, and tooling that make it easier to understand, model, and work with data across the system.

This is not a traditional ETL-only role. You will write production code, design systems, and help define how data is represented, accessed, and understood across the platform.

Qualifications
- Bachelor’s degree in Computer Science, Engineering, or related field, or equivalent practical experience.
- 6+ years of professional experience in software engineering and/or data engineering roles.
- Due to the nature of the work, U.S. Citizenship and the ability to obtain a Secret Clearance are required.
- Strong programming skills in Python (or similar backend language).
- Experience designing and implementing data models for production systems, with advanced knowledge of dimensional modeling topics like slowly changing dimensions and entity relationship diagrams.
- Proficiency in SQL and experience with relational databases (e.g., PostgreSQL).
- Experience building backend services or APIs that interact with data systems.
- Experience designing and operating data pipelines (ETL/ELT).
- Familiarity with NoSQL databases and different data storage paradigms.
- Experience working with large datasets and performance optimization.
- Experience with Docker and containerized development workflows.
- Familiarity with Kubernetes-based environments.
- Strong understanding of software engineering fundamentals (testing, version control, system design).

Requirements
- Design and implement logical and physical data models for complex, evolving datasets.
- Define schemas and access patterns that support multi-tenant usage and application-level workflows.
- Balance normalization, performance, and flexibility across different storage systems.
- Partner with product and engineering teams to translate requirements into scalable data designs.
- Develop real-world data use cases on top of the platform to validate and extend its capabilities.
- Design and build data interfaces and abstractions that help users understand and work with data.
- Contribute to systems such as data glossaries, semantic layers, and metadata and schema discovery tools.
- Help define how users explore, model, and interact with data within the platform.
- Translate complex data structures into intuitive, usable representations.
- Build backend services and APIs that expose and operate on data models.
- Implement data access layers that are reliable, maintainable, and performant.
- Contribute to core application architecture where data and services intersect.
- Write clean, testable, production-grade code.
- Design and implement pipelines for ingesting, transforming, and validating data.
- Support both batch and near-real-time processing workflows.
- Build systems that handle structured, semi-structured, and unstructured data.
- Enable data flows that support AI-driven and agent-based workflows.
- Work with embeddings, context retrieval, and data representations used in modern AI systems.
- Help design systems that make data accessible and useful for autonomous agents.
- Implement validation, monitoring, and testing for data systems.
- Ensure correctness, consistency, and observability of data pipelines and services.
- Diagnose and resolve data-related issues in production environments.

Benefits
- Engineering mindset: You approach data systems as software systems, not just pipelines.
- Data intuition: You understand how to model real-world complexity into clear, usable structures.
- Product thinking: You care about how users interact with and understand data, not just how it is stored.
- Systems thinking: You see how data flows through services, APIs, and AI systems.
- Ownership: You take responsibility for the reliability and usability of what you build.
- Pragmatism: You balance ideal design with real-world constraints.
- Collaboration: You work effectively across engineering disciplines.
Director, Data Engineering – Automation
Empower
We are an equal opportunity employer with a commitment to diversity. All individuals, regardless of personal characteristics, are encouraged to apply. All qualified applicants will receive consideration for employment without regard to age, race, color, national origin, ancestry, sex, sexual orientation, gender, gender identity, gender expression, marital status, pregnancy, religion, physical or mental disability, military or veteran status, genetic information, or any other status protected by applicable state or local law.
• Lead a team of data engineers transforming data from disparate systems to enable insights and analytics for business stakeholders.
• Create technical roadmaps and recommend strategies for data pipelines and integration.
• Leverage cloud-based infrastructure to implement scalable, resilient, and efficient data engineering solutions.
• Collaborate with data analysts, data scientists, database administrators, cross-functional teams, and business stakeholders to solve problems.
• Influence architectural decisions and design patterns across the data platform.
• Provide technical leadership across the software development lifecycle, from design to deployment, including hands-on contribution.
• Develop project plans, facilitate prioritization timelines, allocate resources, and take ownership of assigned technical projects in a fast-paced environment.
• Perform code reviews and ensure data engineers follow best-practice coding standards.
• Define and validate test cases to ensure data quality, reliability, and a high level of confidence.
• Continuously improve quality, efficiency, and scalability of data pipelines, reducing gaps and inconsistencies.

• Own and deliver impactful data products within WSI’s medallion architecture.
• Transform raw and conformed data into governed, high-quality datasets for analytics, AI, and operational use.
• Design, build, and optimize data solutions on Microsoft Fabric, including pipelines, Lakehouse/Warehouse structures, PySpark notebooks, and semantic models.
• Evolve and implement data architecture patterns (medallion, SCD, CDC, orchestration, CI/CD), adapting them to real-world scale, performance, and business needs.
• Ensure data quality, observability, and performance at scale.
• Implement validation frameworks, monitoring, SLAs, and cost-optimized storage and compute strategies.
• Partner with stakeholders to translate requirements into reusable, scalable data models and curated data products.
• Drive consolidation of legacy reporting and BI tools into a unified, governed analytics platform.
• Embed security, governance, and best practices, including role-based access, cataloging, and release management.
• Act as a technical leader, contributing to standards, mentoring peers, and elevating overall engineering quality.
• Leverage modern dev tools (e.g., GitHub Copilot or similar) to accelerate delivery and engineering efficiency.


