Junior Data Engineer – Mobile Apps

Data Engineer | Full Time | Remote | Junior | Team 201-500 | Since 2009 | H1B No Sponsor

Location

Spain

Posted

46 days ago

Salary

0

Seniority

Junior

Job Description

Leadtech Group

  • Design, develop, and optimize data infrastructure on Databricks.
  • Architect pipelines using BigQuery, Google Cloud Storage, Apache Airflow, dbt, Dataflow, and Pub/Sub.
  • Support the development and maintenance of the data platform on GCP, including data warehousing in BigQuery/Databricks.
  • Organize data into clear layers and domain-focused Data Marts for analytics and reporting.
  • Assist with Terraform-based Infrastructure as Code to provision and manage cloud resources.
  • Build, maintain, and improve ETL/ELT pipelines using Apache Airflow.
  • Develop and maintain dbt transformations in BigQuery.
  • Support data ingestion and processing using Google Dataflow, Apache Beam, or Pub/Sub.
  • Monitor scheduled jobs and troubleshoot failures.
  • Implement and maintain data quality checks using Great Expectations or dbt tests.
  • Support documentation of datasets, metadata, lineage, and audit processes.
  • Follow security best practices, including IAM, encryption, and data handling.
  • Partner with Analytics, Product, and Data Science teams for data support.

Job Requirements

  • 1+ year of experience in data engineering or a related data role.
  • Exposure to mobile, product, or marketing data is a plus.
  • Basic hands-on experience with GCP services such as BigQuery and Google Cloud Storage.
  • Familiarity with Apache Airflow for scheduling and orchestrating data workflows.
  • Some experience with dbt or similar transformation tools.
  • Exposure to Pub/Sub, Dataflow, or other batch/streaming tools is a plus.
  • Understanding of Data Mart concepts and interest in Infrastructure as Code tools such as Terraform.
  • Good coding skills in Python; Java or Scala is a plus.
  • Ability to write scripts for automation and data processing tasks.
  • Familiarity with Docker and basic container concepts.
  • Exposure to CI/CD and version control workflows such as GitHub Actions, GitLab CI, Jenkins, or similar.
  • Understanding of data quality principles and experience with dbt tests, Great Expectations, or similar tools is a plus.
  • Basic knowledge of data governance concepts such as lineage, metadata, and access control.
  • Awareness of privacy and compliance principles such as GDPR is a plus.
  • General understanding of OLTP and OLAP systems.
  • Clear communication skills and willingness to work closely with technical and non-technical stakeholders.
  • Organized, proactive, and eager to learn.
  • Strong problem-solving mindset and attention to detail.
  • Interest in machine learning workflows and exposure to tools such as Vertex AI or similar ML platforms.

Benefits

  • Growth and career development
  • Work-life balance
  • Comprehensive benefits
  • Unique perks
  • Equal Employment Opportunity Employer

More Data Engineer Jobs

Data Engineer

Valtech

The experience innovation company.

Data Engineer | 47 days ago
Full Time | Remote | Team 5,001-10,000 | Since 1997 | H1B Sponsor

  • At Valtech, you’ll find an environment designed for continuous learning, meaningful impact, and professional growth.
  • Your work will help transform industries.

Argentina
Job Closed

Senior Data Engineer

Keep IT Simple

Keeping IT Simple Since 1988.

Data Engineer | 47 days ago
Full Time | Remote | Team 11-50 | Since 1988 | H1B No Sponsor

  • Design, build, and operate the data infrastructure that powers AI and analytics initiatives.
  • Build the foundational data layer for LLM applications, RAG systems, and AI-powered products alongside classic data pipelines and analytics infrastructure.
  • Own the full data lifecycle: from ingestion and transformation to quality, governance, and serving, with a particular focus on the emerging data patterns required by modern AI systems.
  • Build and maintain vector databases and RAG infrastructure, design high-performance ETL/ELT pipelines, and ensure data quality at every stage.
  • Enable AI engineers, data scientists, and business analysts to build and deploy AI-powered solutions with confidence in the underlying data.
  • Design and build scalable, fault-tolerant data pipelines for batch and real-time/streaming workloads.
  • Implement modern ELT patterns using dbt, Spark, or Dataflow for transformation within cloud data warehouses.
  • Build data ingestion pipelines from diverse sources: APIs, databases, SaaS platforms, file systems, event streams, and document repositories.
  • Implement incremental processing, CDC (Change Data Capture), and event-driven pipeline architectures for near-real-time data availability.
  • Design pipeline orchestration using Apache Airflow, Prefect, Dagster, or cloud-native workflow services.
  • Build and maintain data contracts between producers and consumers to ensure schema stability and backward compatibility.
  • Design, deploy, and optimize vector database infrastructure for AI applications: Pinecone, Weaviate, ChromaDB, pgvector, Qdrant, or Milvus.
  • Build document ingestion and processing pipelines for RAG: document parsing (PDF, DOCX, HTML, images), chunking strategies (semantic, recursive, sentence-window), and metadata enrichment.
  • Implement and optimize embedding generation pipelines using models from OpenAI, Cohere, Voyage AI, or open-source alternatives (BAAI/bge, Nomic).
  • Design hybrid search architectures combining dense vector search with sparse retrieval (BM25) and metadata filtering for optimal RAG performance.
  • Build and maintain knowledge base management systems: versioned document corpora, incremental indexing, and stale content detection.
  • Implement RAG evaluation infrastructure: retrieval accuracy metrics (MRR, NDCG, Hit Rate), context relevance scoring, and end-to-end RAG benchmarks.
  • Design and implement comprehensive data quality frameworks: validation rules, anomaly detection, freshness monitoring, and schema enforcement.
  • Build data quality pipelines using Great Expectations, Soda, dbt tests, or Monte Carlo for automated data validation at every pipeline stage.
  • Implement data lineage tracking and impact analysis across the data platform.
  • Design and enforce data governance policies: access control, data classification, PII detection and masking, and retention policies.
  • Build data catalogs and discovery tools that enable self-service data access for AI engineers and analysts.
  • Monitor and alert on data quality SLAs: completeness, accuracy, timeliness, and consistency.
  • Design and maintain the core data platform architecture on cloud-native services (AWS, Azure, GCP), optimizing for cost, performance, and reliability.
  • Build and operate data lake/data lakehouse architectures using Delta Lake, Apache Iceberg, or Apache Hudi on cloud object storage.
  • Implement data warehouse solutions using Snowflake, Databricks, BigQuery, or Redshift, with proper partitioning, clustering, and materialization strategies.
  • Design data serving layers for diverse consumers: low-latency APIs (feature stores), analytical dashboards, AI model training, and RAG retrieval.
  • Implement data platform observability: pipeline monitoring, cost tracking, performance dashboards, and capacity planning.
  • Build self-service data infrastructure patterns that enable other teams to create and manage their own data pipelines with guardrails.
  • Build and maintain feature stores for ML model training and serving: offline (batch) and online (real-time) feature computation and storage.
  • Design data pipelines for ML workflows: training data preparation, validation sets, evaluation datasets, and model monitoring data.
  • Implement data versioning and reproducibility for ML experiments using DVC, LakeFS, or Delta Lake time travel.
  • Build feedback loop infrastructure: capturing AI model predictions, user interactions, and ground truth labels for continuous model improvement.
  • Design and implement data infrastructure for AI model monitoring: input drift detection, output quality monitoring, and population stability metrics.

Brazil

Data Architect

Infosys

Transforming Enterprises To Become A Thriving Live Enterprise. AI-Powered. Digital Agility At Scale. Always-On Learning.

Data Engineer | 47 days ago
Full Time | Remote | Team 10,001+ | Since 1981 | H1B Sponsor

About your role: The ideal candidate will have extensive experience in designing and implementing data architectures, with a strong understanding of database management, data modelling, and data governance. This role requires a strategic thinker with strong analytical and problem-solving skills and the ability to work collaboratively with clients and cross-functional teams.

Poland

Senior Data Engineer

Mindera

We craft software with people we love.

Data Engineer | 47 days ago
Full Time | Remote | Team 1,001-5,000 | Since 2014 | H1B Sponsor

  • As a Senior Data Engineer, you will be a key member of our data team, responsible for designing, building, and maintaining the data infrastructure and pipelines that drive our data-driven decision-making processes.
  • You will collaborate with cross-functional teams to ensure the availability, reliability, and accessibility of our data assets, enabling our organization to extract actionable insights and deliver high-impact solutions.
  • Expected national and international travel varies by project/client and organizational needs, estimated at 0%-15%.

Morocco