Capco, a Wipro company, is a management & technology consultancy dedicated to the financial services & energy industries
Data Engineer (Databricks + PySpark)
Location: India
Posted: 5 days ago
Salary: 0
Seniority: Senior
Job Description
Capco
Job Title: Data Engineer (PySpark / Databricks)
Experience: 5–9 years
Location: Pune (hybrid, Capco office)
Job Summary
We are looking for a skilled Data Engineer with strong expertise in PySpark, Databricks, and modern data engineering practices. The ideal candidate will have hands-on experience building scalable data pipelines, working with large datasets, and leveraging cloud-based data platforms.
Key Responsibilities
• Design, develop, and maintain scalable ETL/ELT data pipelines
• Work extensively with PySpark and Apache Spark for large-scale data processing
• Build and manage workflows using Apache Airflow
• Develop and optimize data solutions on Databricks (Jobs, Delta Lake)
• Work with cloud-based data lakes (S3 or equivalent)
• Write efficient, complex SQL queries for data transformation and analysis
• Run and manage Spark workloads on EMR Serverless or other managed Spark platforms
• Ensure data quality, reliability, and performance optimization of pipelines
Must-Have Skills
• Strong hands-on experience with PySpark and Apache Spark internals
• Experience with Databricks (Jobs, Delta Lake)
• Proficiency in Apache Airflow for workflow orchestration
• Solid experience building ETL/ELT pipelines at scale
• Strong SQL skills and experience with data warehouse (DWH) systems
• Experience running Spark workloads on EMR Serverless or other managed Spark platforms
• Hands-on experience with cloud data lakes (S3 or equivalent)
Good-to-Have Skills
• Experience with Delta Lake / Apache Iceberg
• Exposure to streaming frameworks (Spark Structured Streaming, Kafka)
• Familiarity with CI/CD pipelines for data engineering workflows
• Knowledge of data governance, cataloging, and lineage tools
More Data Engineer Jobs
Data Engineer
Reply
Reply designs and implements innovative solutions in the areas of Digital Services, Technology, and Consulting.
• Design, develop, and maintain data pipelines and transformations in Palantir Foundry (PySpark, SQL, Code Workbook).
• Build and manage ontologies, object types, and relationships within Foundry's Ontology layer.
• Implement and support AI/ML workflows and agents using Palantir AIP.
• Collaborate with business stakeholders to translate requirements into scalable data solutions.
• Ensure data quality, governance, and security best practices across all pipelines.
• Create and maintain dashboards and data applications within Foundry Workshop.
• Provide technical guidance and documentation for data architecture decisions.
• Design, develop, and maintain scalable data ingestion, transformation, and publishing pipelines utilizing Databricks and AWS services.
• Implement and optimize Databricks Lakehouse capabilities including Unity Catalog, Delta Live Tables, Auto Loader, Databricks SQL, and Delta Sharing.
• Build and maintain governed data products supporting operational, analytical, reporting, and machine learning workloads.
• Develop and support medallion architecture data pipelines and enterprise data quality frameworks.
• Implement data governance controls, metadata management, lineage tracking, and data retention policies.
• Collaborate with cloud engineers, architects, cybersecurity specialists, and business stakeholders to deliver secure, production-ready solutions.
• Optimize platform performance through partitioning, clustering, caching, workload tuning, and query optimization techniques.
• Support analytics enablement through semantic layers, dashboards, reporting solutions, and self-service data access capabilities.
• Participate in architecture reviews, operational readiness activities, platform modernization initiatives, and continuous improvement efforts.
• Create and maintain technical documentation, design artifacts, operational procedures, and engineering standards.
Data Engineer
SuperStaff
Comprehensive BPO, RPO, and Call Center Outsourcing Solutions for Growing Businesses
• Data Ingestion & ETL: Build Python/SQL pipelines to ingest invoices, orders, and catalogs.
• Perform historical backfills via SFTP/API and manage Airflow DAGs.
• Instance Configuration: Set up custom fields, product filtering logic, and sales workflows.
• Manage SSO and user provisioning for new rollouts.
• AI-Augmented Engineering: Leverage AI coding assistants (Copilot, Cursor) and LLMs to accelerate Python/SQL script generation, data mapping, and debugging.
• Customer Communication & Projects: Act as a technical point of contact. Translate complex data issues into clear updates for customers. Own project milestones from kickoff to "Go Live."
• Integration & Automation: Build Workato recipes and connect customer ERPs via APIs/webhooks to ensure real-time data flow.
• QA & Troubleshooting: Triage HubSpot support tickets, debug data discrepancies in large data sets, and deploy production fixes.
• Documentation: Maintain customer data mappings and internal technical runbooks.
• Architect and optimize large-scale data platforms on Google Cloud, with BigQuery as the analytical backbone
• Design and build unified batch and streaming pipelines that handle high-volume, mission-critical workloads
• Lead infrastructure-as-code practices, ensuring environments are repeatable, secure, and version-controlled
• Implement open table formats to enable cross-cloud and cross-engine data interoperability
• Establish automated data quality, metadata, and lineage practices across the data estate
• Partner with data scientists, analysts, and product teams to translate business needs into reliable data products
• Mentor engineers, review designs, and raise the bar on engineering standards