Junior Data Engineer – Mobile Apps
Location
Spain
Posted
46 days ago
Salary
0
Seniority
Junior
Job Description
Leadtech Group
- Design, develop, and optimize data infrastructure on Databricks.
- Architect pipelines using BigQuery, Google Cloud Storage, Apache Airflow, dbt, Dataflow, and Pub/Sub.
- Support the development and maintenance of the data platform on GCP, including data warehousing in BigQuery/Databricks.
- Organize data into clear layers and domain-focused Data Marts for analytics and reporting.
- Assist with Terraform-based Infrastructure as Code to provision and manage cloud resources.
- Build, maintain, and improve ETL/ELT pipelines using Apache Airflow.
- Develop and maintain dbt transformations in BigQuery.
- Support data ingestion and processing using Google Dataflow, Apache Beam, or Pub/Sub.
- Monitor scheduled jobs and troubleshoot failures.
- Implement and maintain data quality checks using Great Expectations or dbt tests.
- Support documentation of datasets, metadata, lineage, and audit processes.
- Follow security best practices including IAM, encryption, and data handling.
- Partner with Analytics, Product, and Data Science teams for data support.
Job Requirements
- 1+ year of experience in data engineering or a related data role.
- Exposure to mobile, product, or marketing data is a plus.
- Basic hands-on experience with GCP services such as BigQuery and Google Cloud Storage.
- Familiarity with Apache Airflow for scheduling and orchestrating data workflows.
- Some experience with dbt or similar transformation tools.
- Exposure to Pub/Sub, Dataflow, or other batch/streaming tools is a plus.
- Understanding of Data Mart concepts and interest in Infrastructure as Code tools such as Terraform.
- Good coding skills in Python; Java or Scala is a plus.
- Ability to write scripts for automation and data processing tasks.
- Familiarity with Docker and basic container concepts.
- Exposure to CI/CD and version control workflows such as GitHub Actions, GitLab CI, Jenkins, or similar.
- Understanding of data quality principles and experience with dbt tests, Great Expectations, or similar tools is a plus.
- Basic knowledge of data governance concepts such as lineage, metadata, and access control.
- Awareness of privacy and compliance principles such as GDPR is a plus.
- General understanding of OLTP and OLAP systems.
- Clear communication skills and willingness to work closely with technical and non-technical stakeholders.
- Organized, proactive, and eager to learn.
- Strong problem-solving mindset and attention to detail.
- Interest in machine learning workflows and exposure to tools such as Vertex AI or similar ML platforms.
Benefits
- Growth and career development
- Work-life balance
- Comprehensive benefits
- Unique perks
- Equal Employment Opportunity Employer