Capco, a Wipro company, is a management & technology consultancy dedicated to the financial services & energy industries

Data Engineer (Databricks + PySpark)

Data Engineer | Full Time | Remote | Senior | Team 1,001-5,000 | Since 1998 | H1B Sponsor

Location

India

Posted

5 days ago

Salary

Seniority

Senior

English

Job Description

Job Title: Data Engineer (PySpark / Databricks)
Experience: 5–9 Years
Location: Pune (Hybrid – Capco Office)

Job Summary

We are looking for a skilled Data Engineer with strong expertise in PySpark, Databricks, and modern data engineering practices. The ideal candidate will have hands-on experience in building scalable data pipelines, working with large datasets, and leveraging cloud-based data platforms.

Key Responsibilities

• Design, develop, and maintain scalable ETL/ELT data pipelines
• Work extensively with PySpark and Apache Spark for large-scale data processing
• Build and manage workflows using Apache Airflow
• Develop and optimize data solutions on Databricks (Jobs, Delta Lake)
• Work with cloud-based data lakes (S3 or equivalent)
• Write efficient and complex SQL queries for data transformation and analysis
• Run and manage Spark workloads on EMR Serverless or other managed Spark platforms
• Ensure data quality, reliability, and performance optimization of pipelines

Must Have Skills

• Strong hands-on experience with PySpark and Apache Spark internals
• Experience with Databricks (Jobs, Delta Lake)
• Proficiency in Apache Airflow for workflow orchestration
• Solid experience building ETL/ELT pipelines at scale
• Strong SQL skills and experience with Data Warehouse (DWH) systems
• Experience running Spark workloads on EMR Serverless or managed Spark platforms
• Hands-on experience with cloud data lakes (S3 or equivalent)

Good to Have Skills

• Experience with Delta Lake / Apache Iceberg
• Exposure to streaming frameworks (Spark Structured Streaming, Kafka)
• Familiarity with CI/CD pipelines for data engineering workflows
• Knowledge of data governance, cataloging, and lineage tools

More Data Engineer Jobs

Data Engineer

Reply designs and implements innovative solutions in the areas: Digital Services, Technology and Consulting.

Data Engineer | 5 days ago

Full Time | Remote | Team 10,001+ | Since 1996 | H1B Sponsor

• Design, develop, and maintain data pipelines and transformations in Palantir Foundry (PySpark, SQL, Code Workbook).
• Build and manage ontologies, object types, and relationships within Foundry's Ontology layer.
• Implement and support AI/ML workflows and agents using Palantir AIP.
• Collaborate with business stakeholders to translate requirements into scalable data solutions.
• Ensure data quality, governance, and security best practices across all pipelines.
• Create and maintain dashboards and data applications within Foundry Workshop.
• Provide technical guidance and documentation for data architecture decisions.

PySpark Python SQL

Brazil

Senior Data Engineer – Databricks, AWS

Booker DiMaio

Engineering Innovation and Transformation

Data Engineer | 5 days ago

Full Time | Remote | Team 11-50 | H1B No Sponsor

• Design, develop, and maintain scalable data ingestion, transformation, and publishing pipelines utilizing Databricks and AWS services.
• Implement and optimize Databricks Lakehouse capabilities including Unity Catalog, Delta Live Tables, Auto Loader, Databricks SQL, and Delta Sharing.
• Build and maintain governed data products supporting operational, analytical, reporting, and machine learning workloads.
• Develop and support medallion architecture data pipelines and enterprise data quality frameworks.
• Implement data governance controls, metadata management, lineage tracking, and data retention policies.
• Collaborate with cloud engineers, architects, cybersecurity specialists, and business stakeholders to deliver secure, production-ready solutions.
• Optimize platform performance through partitioning, clustering, caching, workload tuning, and query optimization techniques.
• Support analytics enablement through semantic layers, dashboards, reporting solutions, and self-service data access capabilities.
• Participate in architecture reviews, operational readiness activities, platform modernization initiatives, and continuous improvement efforts.
• Create and maintain technical documentation, design artifacts, operational procedures, and engineering standards.

Amazon Redshift AWS Cloud Cyber Security ETL Python SQL Unity

Maryland

Data Engineer

SuperStaff

Comprehensive BPO, RPO, and Call Center Outsourcing Solutions for Growing Businesses

Data Engineer | 6 days ago

Full Time | Remote | Team 201-500 | Since 2009 | H1B No Sponsor

• Data Ingestion & ETL: Build Python/SQL pipelines to ingest invoices, orders, and catalogs.
• Perform historical backfills via SFTP/API and manage Airflow DAGs.
• Instance Configuration: Set up custom fields, product filtering logic, and sales workflows.
• Manage SSO and user provisioning for new rollouts.
• AI-Augmented Engineering: Leverage AI coding assistants (Copilot, Cursor) and LLMs to accelerate Python/SQL script generation, data mapping, and debugging.
• Customer Communication & Projects: Act as a technical point of contact. Translate complex data issues into clear updates for customers. Own project milestones from kickoff to "Go Live."
• Integration & Automation: Build Workato recipes and connect customer ERPs via APIs/webhooks to ensure real-time data flow.
• QA & Troubleshooting: Triage HubSpot support tickets, debug data discrepancies in large data sets, and deploy production fixes.
• Documentation: Maintain customer data mappings and internal technical runbooks.

Airflow Apache Cloud ETL Google Cloud Platform Python SQL Go

Colombia

$6,000K / month

Lead Data Engineer

Egen

Engineering new possibilities with platforms, data, and generative AI

Data Engineer | 6 days ago

Full Time | Remote | Team 501-1,000 | Since 2000 | H1B Sponsor

• Architect and optimize large-scale data platforms on Google Cloud, with BigQuery as the analytical backbone
• Design and build unified batch and streaming pipelines that handle high-volume, mission-critical workloads
• Lead infrastructure-as-code practices, ensuring environments are repeatable, secure, and version-controlled
• Implement open table formats to enable cross-cloud and cross-engine data interoperability
• Establish automated data quality, metadata, and lineage practices across the data estate
• Partner with data scientists, analysts, and product teams to translate business needs into reliable data products
• Mentor engineers, review designs, and raise the bar on engineering standards

Apache BigQuery Cloud Spark Terraform

United States

$143.4K - $168.7K / year
