Job Closed
This listing is no longer active.
We power professionals.
Data Engineer, Level II
Location: Colombia
Posted: 43 days ago
Salary: Not disclosed
Seniority: Mid Level
Job Description
Propelus
- Develop and deploy efficient ETL/ELT processes to extract, transform, and load data from various sources into our cloud data warehouse.
- Own the implementation of data quality checks. You will identify and resolve data discrepancies (duplicates, missing values, formatting errors) before they reach downstream stakeholders.
- Monitor and optimize existing data components to improve system performance, reduce latency, and ensure high system uptime.
- Partner with Data Analysts and Software Engineers to understand their data requirements and build the data structures they need for deep-dive reporting.
- Maintain clear, concise documentation for all pipelines and processes to ensure long-term transparency and maintainability across the team.
- Identify and execute opportunities to automate manual data processes, increasing the speed and reliability of our data delivery.
Job Requirements
- 2–3 years of professional experience in data engineering or a related backend data role.
- High proficiency in SQL for querying, manipulating, and transforming complex datasets.
- Proficiency in Python (preferred) or Java for building and automating data workflows.
- Solid understanding of relational databases and hands-on experience with cloud-based platforms like Snowflake, BigQuery, or AWS Redshift.
- Proficiency using Git for collaboration and maintaining code integrity within a team environment.
- A proven ability to work with Data Science, Business Intelligence, and Software Engineering teams to deliver high-quality data solutions.
- A proven ability to troubleshoot complex data failures and identify root causes efficiently.
- A solid understanding of data validation fundamentals and a commitment to ensuring data quality (checking for duplicates, null values, or formatting errors).
- A self-motivated problem-solver who enjoys troubleshooting complex data-related issues and finding timely resolutions.
Benefits
- Awarded one of BuiltIn's 2025 Best Places to Work and honored as a Silver Stevie® Award Winner in the 2025 Stevie Awards For Great Employers.
- Professional development allowance to help you grow in the ways that mean the most to you.
- Flexibility for balancing work with the rest of life and ample PTO, including paid time off for volunteering, your birthday, and becoming a new parent.
- For US Employees: 401(k) with company matching, as well as financial planning education and resources.
- Employees can choose from HSA, FSA, and traditional insurance options for medical, dental, and vision coverage for themselves and dependents.
- Lifestyle Spending Account (LSA): We support personal well-being by offering an annual lifestyle spending account that you can use for what matters most to you—whether it’s a gym membership, a meditation app, WFH equipment, or fresh produce delivered to your door.
- For LATAM Employees: Your health is our top priority! We cover 100% of your health insurance premiums. Our plans include national and international coverage, so you're protected no matter where you are.
- Propelus Flex Club: Our flexible benefits platform gives you monthly points to redeem on what you need most. Plus, you'll get access to exclusive discounts just for being part of our team.
- We've got you covered with a life insurance policy, paid 100% by the company. You can also add your beneficiaries at an exclusive, discounted rate.