Databricks Data Engineer

Data Engineer | Full Time | Remote | Senior | Team 10,001+ | H1B Sponsor

Location

Brazil

Posted

14 days ago

Salary

0

Seniority

Senior

Job Description

Compass

  • Develop and maintain batch and streaming data pipelines in Databricks using PySpark and Spark SQL;
  • Implement pipelines following the established Medallion architecture (Bronze, Silver, and Gold);
  • Assist in migrating legacy pipelines and jobs from tools such as IBM DataStage, Azure Data Factory, and Azure Synapse Analytics to Databricks Workflows;
  • Help migrate routines and notebooks from Databricks on Azure to AWS;
  • Develop and version notebooks and code using Databricks Repos and Git;
  • Implement basic tests and data quality routines in pipelines;
  • Support dimensional modeling of the Data Warehouse (Star Schema) with the architecture team;
  • Collaborate with the DevOps team to automate deployments and CI/CD;
  • Participate in implementing data governance with Unity Catalog;
  • Monitor and provide support for production data pipelines.

Job Requirements

  • Experience as a Data Engineer or Mid-Level Data Engineer;
  • Knowledge of Databricks and Apache Spark (PySpark and Spark SQL);
  • Experience with Python and advanced SQL;
  • Experience with cloud environments, preferably AWS (S3, Glue Catalog);
  • Knowledge of Lakehouse architecture and Delta Lake;
  • Experience with batch and/or streaming data pipelines;
  • Familiarity with Git and code versioning;
  • Basic knowledge of dimensional modeling (Star Schema / Data Warehouse);
  • Basic knowledge of CI/CD applied to data;
Differentials:
  • Knowledge of IBM DataStage (migration or legacy support);
  • Knowledge of Azure Data Factory and Azure Synapse Analytics;
  • Experience with Databricks on Azure;
  • Knowledge of data governance and catalog solutions (Unity Catalog or similar).

More Data Engineer Jobs

Senior Data Engineer

Bluefish AI

AI Marketing Suite for Brands

Data Engineer | 14 days ago
Full Time | Remote | Team 11-50 | Since 2024 | H1B No Sponsor

  • Design, build, and maintain scalable data pipelines that ingest, transform, and validate large volumes of data across multiple sources and channels.
  • Improve the scalability, reliability, and performance of our data pipelines to support rapidly growing workloads and new data streams.
  • Contribute to the design and implementation of our Data Lake architecture, enabling reliable data storage and reuse across teams.
  • Manage and optimize data ingestion workflows, including data collected from web scrapers, third-party vendors, and internal systems.
  • Monitor pipeline health, investigate incidents, and implement improvements to increase system reliability and observability.
  • Support the onboarding and integration of new AI channels and data sources into the platform.
  • Collaborate with teams across the organization to ensure data generated by different systems can be reused effectively for analytics and business intelligence.
  • Identify and resolve performance bottlenecks in distributed systems, including rate limiting, concurrency, and throughput constraints.
  • Advise engineering and product teams on data architecture, data quality, and best practices for managing scalable data workflows.
  • Continuously evaluate and improve our data platform to support the company’s rapid growth and evolving product needs.

Germany

Underwriting Data Engineer

Capital Rx

Affordable Pharmacy Benefits, Powered by Modern Infrastructure.

Data Engineer | 14 days ago
Full Time | Remote | Team 501-1,000 | Since 2017 | H1B No Sponsor

  • Thoroughly understand all Underwriting and Rebate Administration data sources, fields, and relationships, including historical claims, Medi-Span tables, NCPDP tables, and Capital Rx’s Book of Business data to interpret trends, identify patterns, and conduct complex data analyses in support of Underwriting, Rebate Administration, and Sales business goals
  • Investigate new data sources for the purpose of building & maintaining data pipelines to clean, transform, and aggregate disparate data
  • Use agile software development to create, maintain, and improve back-end systems for data extracts, pricing processes, and analytic tools to advance the department’s technical capabilities
  • Model front-end views and back-end data sources to draw a comprehensive picture of the user experience and analytic pipelines throughout the system and to enable powerful data analysis
  • Troubleshoot and customize infrastructure code in SQL, R, and Python to diagnose and solve data or process issues
  • Work closely with our Underwriting and Clinical teams to help build complex algorithms that provide useful insights into our data
  • Assist in the analysis and development of new models and front-end tools that can be used to make predictions and answer questions for financial modeling & reporting
  • Collaborate with various teams across the company on data sources and analytic methodology in support of Underwriting, Rebate Administration, and company objectives
  • Provide standard and ad-hoc analytics to support the Underwriting and Rebate Administration teams for the successful completion of bid opportunities and rebate payment management

United States
$76.8K - $121K / year

Data Engineer

INflow Federal

Be the Difference. #BeINflow

Data Engineer | 14 days ago
Full Time | Remote | Team 51-200 | Since 2013 | H1B No Sponsor

  • Design, implement, and maintain data pipelines and ETL processes supporting ingestion, transformation, and validation of mission data
  • Develop and optimize data models and schemas across relational and non-relational databases to support system integrations and analytics
  • Collaborate with system architects, integration developers, and data analysts to ensure data consistency, security, and integrity across cloud environments
  • Implement data migration and synchronization between legacy systems, applications, and modern cloud platforms
  • Utilize AWS services (Glue, Lambda, S3, RDS, Redshift, Kinesis) to build and sustain scalable and fault-tolerant data infrastructure
  • Support data validation and reconciliation, performing quality checks and developing reports to ensure accuracy
  • Integrate data from APIs, streaming sources, and file-based systems into centralized repositories or data lakes
  • Automate data workflows using infrastructure-as-code and CI/CD principles to ensure repeatability and efficiency
  • Monitor and troubleshoot data pipeline performance, ensuring adherence to SLAs and operational reliability
  • Implement data encryption, masking, and access controls in compliance with DoD cybersecurity policies and RMF requirements
  • Support development of dashboards and analytics products, enabling data-driven insights for mission stakeholders
  • Maintain documentation and metadata repositories, including data dictionaries, lineage, and technical specifications
  • Participate in Agile sprints, contributing to backlog refinement, testing, and cross-functional collaboration

Virginia

Platform Data Engineer

Runware

Generative media in the blink of an API.

Data Engineer | 14 days ago
Full Time | Remote | Team 11-50 | Since 2023 | H1B No Sponsor

  • Design, build, and maintain schemas and data models
  • Optimize table layout, partitioning, indexing, and compression for high-volume data
  • Ensure fast, efficient querying for logs, requests, metrics, and performance traces
  • Maintain ingestion pipelines for billions of records
  • Build robust pipelines for:
      - API logs
      - Model inference logs
      - Error events
      - Usage & integration events
      - GPU & system metrics
  • Implement ETL/ELT workflows to transform raw data into analytics-ready structures
  • Ensure quality, reliability, and real-time availability of data sources
  • Build tooling to support large-scale log analysis
  • Enable deep investigation into latency, throughput, errors, and bottlenecks
  • Provide the raw data foundation for E2E inference-time monitoring
  • Help debug production issues using logs and traces
  • Work closely with DevOps, ML, and backend engineering
  • Integrate pipelines with monitoring tools (Prometheus, Grafana, Datadog, OpenTelemetry)
  • Automate ingestion and cleanup tasks
  • Build internal libraries or utilities to support monitoring and debugging workflows
  • Provide clean data interfaces for the Data Expert (dashboards, monitoring, analytics)
  • Support engineering teams by exposing the right logs and metrics
  • Contribute to debugging, RCA (root cause analysis), and performance optimization initiatives

United Kingdom