Senior Data Engineer
Location: Mexico
Posted: 168 days ago
Salary: Not specified (listed as 0)
Seniority: Senior
Job Description
Senior Data Engineer
Arkham Technologies
• Own the high-performance data platform built on a lakehouse architecture.
• Work with Apache Spark, Trino, and Delta Lake.
• Ensure data governance and interoperability across platforms.
• Shape data infrastructure across the entire data lifecycle, from ingestion through transformation to activation.
Job Requirements
- 5+ years in data engineering, data architecture, or a related field.
- Proficiency in Apache Spark, Delta Lake, and Trino.
- Strong experience with Python for scripting and automation.
- Hands-on experience with AWS services, including Glue, S3, and EMR.
- Understanding of distributed data systems and query engines.
- Excellent analytical and debugging skills.
More Data Engineer Jobs
• Design, develop, and maintain scalable batch and streaming data pipelines using Apache Spark and cloud-native services (for example, AWS Glue, EMR, Kinesis, and Lambda).
• Use and optimize Apache Spark (RDDs, DataFrames, Spark SQL) for distributed processing of large datasets in both batch and near-real-time use cases.
• Implement robust ETL/ELT processes that ingest and transform data from databases, APIs, files, and event streams into curated datasets stored in S3 data lakes, data warehouses (such as Amazon Redshift), and data marts.
• Implement data quality checks, validation rules, and governance controls (including schema enforcement, profiling, and reconciliation) to ensure accuracy, completeness, and consistency.
• Develop and maintain logical and physical data models, schemas, and catalog metadata to support analytics, BI, and ML consumption.
• Create and manage data warehouses, data lakes, and data marts on AWS and other cloud platforms (such as Azure or GCP) following modern architectural patterns.
• Collaborate with data analysts, data scientists, and business stakeholders to understand data requirements and translate them into scalable pipeline and modeling solutions.
• Collaborate with DevOps, platform, security, and compliance teams to ensure secure, reliable cloud implementations that adhere to organizational standards.
• Develop cloud and data architecture documentation, including diagrams, guidelines, and best practices, to enable knowledge sharing and reuse.
• Troubleshoot and resolve data pipeline and job issues across development and production environments, minimizing downtime and preserving data integrity.
• Continuously optimize data pipelines for performance, cost, reliability, and data quality using distributed data engineering best practices and cloud resource tuning.
• Build algorithms and prototypes that combine and reconcile raw information from multiple sources, including resolving data conflicts and inconsistencies.
• Provide technical leadership for the analytics data stack: review designs, establish standards for observability and reliability, and guide junior engineers in delivering high-quality solutions.
• Define and manage data and cloud infrastructure with infrastructure-as-code tools such as Terraform (and/or AWS CDK/CloudFormation) to keep development, test, and production environments consistent and repeatable.
• Participate actively in agile ceremonies (backlog refinement, sprint planning, daily stand-ups, reviews), including estimating and updating user stories, tracking progress, and collaborating closely with data product and analytics stakeholders.

• Design, build, and optimize data pipelines and workflows in Azure and Databricks, including Data Lake and SQL Database integrations.
• Implement scalable ETL/ELT frameworks using Azure Data Factory, Databricks, and Spark.
• Optimize data structures and queries for performance, reliability, and cost efficiency.
• Drive data quality and governance initiatives, including metadata management and validation frameworks.
• Collaborate with cross-functional teams to define and implement data models aligned with business and analytical requirements.
• Maintain clear documentation and enforce engineering best practices for reproducibility and maintainability.
• Ensure adherence to security, compliance, and data privacy standards.
• Mentor junior engineers and contribute to establishing engineering best practices.
• Support CI/CD pipeline development for data workflows using GitLab or Azure DevOps.
• Partner with data consumers to publish curated datasets into reporting tools such as Power BI.

Doctoral Research Fellow (Bolsista Doutor): Data Engineering, Generative AI, Multi-Agent Systems, ETL
Sistema Fibra: For the Future of Industry | For the Future of Work
• Multi-agent architecture for forming a Data & Analytics squad.
• Standardized validation methodology for LLMs on complex tasks.
• Software development methodology using AI and LLMs.
• Impact of Data & Analytics multi-agent systems on productivity: a case study in the investment sector.



