Job Closed

This listing is no longer active.

Propelus

We power professionals.

Data Engineer, Level II

Data Engineer | Full Time | Remote | Mid Level | Team 201-500 | Since 2001 | H1B No Sponsor

Location

Colombia

Posted

43 days ago

Salary

Seniority

Mid Level

Bachelor Degree | 2 yrs exp | English

Amazon Redshift | AWS | BigQuery | Cloud | ETL | Java | Python | SQL

Job Description

• Develop and deploy efficient ETL/ELT processes to extract, transform, and load data from various sources into our cloud data warehouse.
• Own the implementation of data quality checks. You will identify and resolve data discrepancies (duplicates, missing values, formatting errors) before they reach downstream stakeholders.
• Monitor and optimize existing data components to improve system performance, reduce latency, and ensure high system uptime.
• Partner with Data Analysts and Software Engineers to understand their data requirements and build the structural solutions they need for deep-dive reporting.
• Maintain clear, concise documentation for all pipelines and processes to ensure long-term transparency and maintainability across the team.
• Identify and execute opportunities to automate manual data processes, increasing the speed and reliability of our data delivery.

Job Requirements

2–3 years of professional experience in data engineering or a related backend data role.
High proficiency in SQL for querying, manipulating, and transforming complex datasets.
Proficiency in Python (preferred) or Java for building and automating data workflows.
Solid understanding of relational databases and hands-on experience with cloud-based platforms like Snowflake, BigQuery, or AWS Redshift.
Proficiency using Git for collaboration and maintaining code integrity within a team environment.
A proven ability to work with Data Science, Business Intelligence, and Software Engineering teams to deliver high-quality data solutions.
A proven ability to troubleshoot complex data failures and identify root causes efficiently.
A deep understanding of data validation basics and a commitment to ensuring data quality (checking for duplicates, null values, or formatting errors).
A self-motivated problem-solver who enjoys troubleshooting complex data-related issues and finding timely resolutions.

Benefits

Awarded one of BuiltIn's 2025 Best Places to Work and honored as a Silver Stevie® Award Winner in the 2025 Stevie Awards For Great Employers.
Professional development allowance to help you grow in the ways that mean the most to you.
Flexibility for balancing work with the rest of life and ample PTO, including paid time off for volunteering, your birthday, and becoming a new parent.
For US Employees: 401K with company matching, as well as financial planning education and resources.
Employees can choose from HSA, FSA, and traditional insurance options for medical, dental, and vision coverage for themselves and dependents.
Lifestyle Spending Account (LSA): We support personal well-being by offering an annual lifestyle spending account that you can use for what matters most to you—whether it’s a gym membership, a meditation app, WFH equipment, or fresh produce delivered to your door.
For LATAM Employees: Your health is our top priority! We cover 100% of your health insurance premiums. Our plans include national and international coverage, so you're protected no matter where you are.
Propelus Flex Club: Our flexible benefits platform gives you monthly points to redeem on what you need most. Plus, you'll get access to exclusive discounts just for being part of our team.
We've got you covered with a life insurance policy, paid 100% by the company. You can also add your beneficiaries at an exclusive, discounted rate.

More Data Engineer Jobs

Junior Data Engineer – Mobile Apps

Leadtech Group

Data Engineer | 43 days ago

Full Time | Remote | Team 201-500 | Since 2009 | H1B No Sponsor

• Design, develop, and optimize data infrastructure on Databricks.
• Architect pipelines using BigQuery, Google Cloud Storage, Apache Airflow, dbt, Dataflow, and Pub/Sub.
• Support the development and maintenance of data platform on GCP including data warehousing in BigQuery/Databricks.
• Organize data into clear layers and domain-focused Data Marts for analytics and reporting.
• Assist with Terraform-based Infrastructure as Code to provision and manage cloud resources.
• Build, maintain, and improve ETL/ELT pipelines using Apache Airflow.
• Develop and maintain dbt transformations in BigQuery.
• Support data ingestion and processing using Google Dataflow, Apache Beam, or Pub/Sub.
• Monitor scheduled jobs and troubleshoot failures.
• Implement and maintain data quality checks using Great Expectations or dbt tests.
• Support documentation of datasets, metadata, lineage, and audit processes.
• Follow security best practices including IAM, encryption, and data handling.
• Partner with Analytics, Product, and Data Science teams for data support.

Airflow | Apache | BigQuery | Cloud | Docker | ETL | Google Cloud Platform | Java | Jenkins | Python | Scala | Terraform

Spain

Data Engineer

Valtech

The experience innovation company.

Data Engineer | 43 days ago

Full Time | Remote | Team 5,001-10,000 | Since 1997 | H1B Sponsor

• At Valtech, you’ll find an environment designed for continuous learning, meaningful impact, and professional growth.
• Your work will help transform industries.

Azure | Cloud | NoSQL | SQL

Argentina

Job Closed

Senior Data Engineer

Keep IT Simple

Keeping IT Simple Since 1988.

Data Engineer | 43 days ago

Full Time | Remote | Team 11-50 | Since 1988 | H1B No Sponsor

• Design, build, and operate the data infrastructure that powers AI and analytics initiatives.
• Build the foundational data layer for LLM applications, RAG systems, and AI-powered products alongside classic data pipelines and analytics infrastructure.
• Own the full data lifecycle: from ingestion and transformation to quality, governance, and serving, with a particular focus on the emerging data patterns required by modern AI systems.
• Build and maintain vector databases and RAG infrastructure, design high-performance ETL/ELT pipelines, and ensure data quality at every stage.
• Enable AI engineers, data scientists, and business analysts to build and deploy AI-powered solutions with confidence in the underlying data.
• Design and build scalable, fault-tolerant data pipelines for batch and real-time/streaming workloads.
• Implement modern ELT patterns using dbt, Spark, or Dataflow for transformation within cloud data warehouses.
• Build data ingestion pipelines from diverse sources: APIs, databases, SaaS platforms, file systems, event streams, and document repositories.
• Implement incremental processing, CDC (Change Data Capture), and event-driven pipeline architectures for near-real-time data availability.
• Design pipeline orchestration using Apache Airflow, Prefect, Dagster, or cloud-native workflow services.
• Build and maintain data contracts between producers and consumers to ensure schema stability and backward compatibility.
• Design, deploy, and optimize vector database infrastructure for AI applications: Pinecone, Weaviate, ChromaDB, pgvector, Qdrant, or Milvus.
• Build document ingestion and processing pipelines for RAG: document parsing (PDF, DOCX, HTML, images), chunking strategies (semantic, recursive, sentence-window), and metadata enrichment.
• Implement and optimize embedding generation pipelines using models from OpenAI, Cohere, Voyage AI, or open-source alternatives (BAAI/bge, Nomic).
• Design hybrid search architectures combining dense vector search with sparse retrieval (BM25) and metadata filtering for optimal RAG performance.
• Build and maintain knowledge base management systems: versioned document corpora, incremental indexing, and stale content detection.
• Implement RAG evaluation infrastructure: retrieval accuracy metrics (MRR, NDCG, Hit Rate), context relevance scoring, and end-to-end RAG benchmarks.
• Design and implement comprehensive data quality frameworks: validation rules, anomaly detection, freshness monitoring, and schema enforcement.
• Build data quality pipelines using Great Expectations, Soda, dbt tests, or Monte Carlo for automated data validation at every pipeline stage.
• Implement data lineage tracking and impact analysis across the data platform.
• Design and enforce data governance policies: access control, data classification, PII detection and masking, and retention policies.
• Build data catalogs and discovery tools that enable self-service data access for AI engineers and analysts.
• Monitor and alert on data quality SLAs: completeness, accuracy, timeliness, and consistency.
• Design and maintain the core data platform architecture on cloud-native services (AWS, Azure, GCP), optimizing for cost, performance, and reliability.
• Build and operate data lake/data lakehouse architectures using Delta Lake, Apache Iceberg, or Apache Hudi on cloud object storage.
• Implement data warehouse solutions using Snowflake, Databricks, BigQuery, or Redshift, with proper partitioning, clustering, and materialization strategies.
• Design data serving layers for diverse consumers: low-latency APIs (feature stores), analytical dashboards, AI model training, and RAG retrieval.
• Implement data platform observability: pipeline monitoring, cost tracking, performance dashboards, and capacity planning.
• Build self-service data infrastructure patterns that enable other teams to create and manage their own data pipelines with guardrails.
• Build and maintain feature stores for ML model training and serving: offline (batch) and online (real-time) feature computation and storage.
• Design data pipelines for ML workflows: training data preparation, validation sets, evaluation datasets, and model monitoring data.
• Implement data versioning and reproducibility for ML experiments using DVC, LakeFS, or Delta Lake time travel.
• Build feedback loop infrastructure: capturing AI model predictions, user interactions, and ground truth labels for continuous model improvement.
• Design and implement data infrastructure for AI model monitoring: input drift detection, output quality monitoring, and population stability metrics.

Airflow | Amazon Redshift | Apache | AWS | Azure | BigQuery | Cloud | Docker | ETL | Google Cloud Platform | Kafka | Kubernetes | Neo4j | PySpark | Python | Spark | SQL | Terraform | Vault

Brazil

Data Architect

Infosys

Founded in 1981, Infosys is an information technology and services company providing consulting, outsourcing, technology, and next-generation services to clients in over 50 countri

Data Engineer | 43 days ago

Full Time | Remote

About your role: The ideal candidate will have extensive experience in designing and implementing data architectures, with a strong understanding of database management, data modelling, and data governance. This role requires a strategic thinker with strong analytical and problem-solving skills and the ability to work collaboratively with clients and cross-functional teams.

Airflow | Apache | AWS | Azure | Cloud | ETL | Hadoop | Kafka | MongoDB | NoSQL | Oracle | Spark | Splunk | SQL

Poland
