LABEL MAKERS & GLOBAL SERVICES
Data Engineer
Location
Brazil
Posted
139 days ago
Salary
0
Seniority
Senior
Job Description
Data Engineer
IDT BY INDET GROUP
• Design, implement, and validate ETL/ELT data pipelines for batch processing, streaming integrations, and data warehousing, while maintaining comprehensive documentation and testing to ensure reliability and accuracy.
• Maintain end-to-end Snowflake data warehouse deployments and develop Denodo data virtualization solutions.
• Recommend process improvements to increase efficiency and reliability in ELT/ETL development.
• Stay current on emerging data technologies and support pilot projects, ensuring the platform scales seamlessly with growing data volumes.
• Architect, implement, and maintain scalable data pipelines that ingest, transform, and deliver data into real-time data warehouse platforms, ensuring data integrity and pipeline reliability.
• Partner with data stakeholders to gather requirements for language-model initiatives and translate them into scalable solutions.
• Create and maintain comprehensive documentation for all data processes, workflows, and model deployment routines.
• Stay informed about, and be willing to learn, emerging methodologies in data engineering and open-source technologies.
Job Requirements
- 5+ years of experience in ETL/ELT design and development, integrating data from heterogeneous OLTP systems and API solutions, and building scalable data warehouse solutions to support business intelligence and analytics.
- Excellent English communication skills.
- Effective oral and written communication with the BI team and user community.
- Demonstrated experience using Python for data engineering tasks, including transformation, advanced data manipulation, and large-scale data processing.
- Experience designing and implementing event-driven pipelines that use messaging and streaming events to trigger ETL workflows and enable scalable, decoupled data architectures.
- Experience in data analysis and root cause analysis, with proven problem-solving and analytical thinking capabilities.
- Experience designing complex data pipelines that extract data from RDBMS, JSON, API, and flat-file sources.
- Demonstrated expertise in SQL and PL/SQL programming, with advanced mastery of Business Intelligence and data warehouse methodologies, along with hands-on experience in one or more relational database systems and cloud-based database services such as Oracle, MySQL, Amazon RDS, Snowflake, Amazon Redshift, etc.
- Proven ability to analyze and optimize poorly performing queries and ETL/ELT mappings, providing actionable recommendations for performance tuning.
- Understanding of software engineering principles, skills working on Unix/Linux/Windows operating systems, and experience with Agile methodologies.
- Proficiency in version control systems, with experience in managing code repositories, branching, merging, and collaborating within a distributed development environment.
- Interest in business operations and a comprehensive understanding of how robust BI systems drive corporate profitability by enabling data-driven decision-making and strategic insights.
Benefits
- Please attach CV in English.
- The interview process will be conducted in English.
- Only accepting applicants from LATAM.
More Data Engineer Jobs
• Perform data modeling by analyzing and structuring datasets to ensure proper relationships and adherence to industry best practices.
• Handle special data by creating and managing equivalence tables, including specific cases such as “verticals”, aiming for greater standardization and efficiency.
• Build pipelines that ensure full traceability from client file delivery through to loading into the Gold layer in Azure Databricks.
• Identify opportunities for source-side automation to reduce reliance on manual files and increase efficiency.
• Propose and implement efficient migration strategies, such as incremental loads, to avoid high costs from daily full loads.
• Familiarity with code documentation.
• Experience with agile methodologies.
• Familiarity with developing process monitoring and alerts.
• Ability to communicate technical solutions in a simple, pragmatic manner.
• Proactive in supporting the team.

• Organization and Structuring of Data Repositories
• Perform data modeling, analyzing and structuring datasets to ensure proper relationships and adherence to market best practices
• Handle special data cases by creating and managing equivalence/mapping tables, including specific cases such as “verticals,” aiming for greater standardization and efficiency
• Development of Data Pipelines and Flows
• Build pipelines that ensure full traceability, from client file delivery through to loading into the Gold layer on Google Cloud Platform (GCP)
• Identify opportunities for source-side automation to reduce dependence on manual files and increase efficiency
• Migration and Optimization of Databases
• Propose and implement efficient migration strategies, such as incremental loads, avoiding high costs associated with daily full loads
• Familiarity with code documentation (desirable)
• Experience with agile methodologies (desirable)
• Skills in developing and optimizing APIs for data consumption (desirable)
• Familiarity with developing monitoring and alerts for processes implemented on GCP (desirable)
• Ability to communicate technical solutions in a simple and pragmatic way (soft skill)
• Proactive in supporting the team (soft skill)

• Design, implement, and maintain robust, scalable data pipelines using Google Cloud Platform (GCP) and Amazon Web Services (AWS).
• Develop and implement effective data models to support complex analytics and automated reporting.
• Work closely with product teams and squads to understand business requirements and translate them into viable technical solutions.
• Ensure data governance, quality, and compliance with established standards by implementing data engineering best practices.
• Implement efficient ETL/ELT processes and optimize the performance of data pipelines to ensure data accuracy and integrity.
• Provide technical support to resolve data integration issues and ensure continuous availability of data systems.
• Actively participate in agile initiatives, collaborating in sprints and ceremonies to ensure high-quality, on-time deliveries.

• Design and implement robust, production-grade pipelines using Python, Spark SQL, and Airflow to process high-volume file-based datasets (CSV, Parquet, JSON).
• Lead efforts to canonicalize raw healthcare data (837 claims, EHR, partner data, flat files) into internal models.
• Own the full lifecycle of core pipelines, from file ingestion to validated, queryable datasets, ensuring high reliability and performance.
• Onboard new customers by integrating their raw data into internal pipelines and canonical models; collaborate with SMEs, Account Managers, and Product to ensure successful implementation and troubleshooting.
• Build resilient, idempotent transformation logic with data quality checks, validation layers, and observability.
• Refactor and scale existing pipelines to meet growing data and business needs.
• Tune Spark jobs and optimize distributed processing performance.
• Implement schema enforcement and versioning aligned with internal data standards.
• Collaborate deeply with Data Analysts, Data Scientists, Product Managers, Engineering, Platform, SMEs, and AMs to ensure pipelines meet evolving business needs.
• Monitor pipeline health, participate in on-call rotations, and proactively debug and resolve production data flow issues.
• Contribute to the evolution of our data platform, driving toward mature patterns in observability, testing, and automation.
• Build and enhance streaming pipelines (Kafka, SQS, or similar) where needed to support near-real-time data needs.
• Help develop and champion internal best practices around pipeline development and data modeling.


