We provide the best software engineering solutions by investing in our people first.
Senior Data Engineer – AWS, RAG Pipelines
Location: Colombia
Posted: 2 days ago
Salary: 0
Seniority: Senior
Job Description
Jalasoft
- Design and operate the cloud data infrastructure powering AI initiatives.
- Architect production-scale data lakes on AWS.
- Build real-time ingestion and observability pipelines.
- Own the vector search and embedding layers that feed RAG systems and autonomous agents.
Job Requirements
- Overall Experience: 7+ years in Data Engineering, Distributed Systems, or Data Architecture
- AWS & Infrastructure: 4+ years architecting production-scale data lakes, storage tiers, and event streaming
- AI/LLM Pipelines: 2+ years building RAG systems, managing embeddings, and orchestrating foundation models
- Proficiency in AWS Data Lake Architecture & Storage
- Proficiency in Real-Time Observability & Log Analytics
- Proficiency in Elasticsearch & OpenSearch Optimization, Vectorization, Embeddings
- Proficiency in Amazon Bedrock & Generative AI Pipelines
- Proficiency in Software Engineering & API Ingestion
- Production-level proficiency in one or more of: C# (.NET Core), Java, Python, or Node.js
- Amazon S3 partitioning strategies, lifecycle policies, and columnar file and table formats (Parquet, Iceberg)
- AWS Glue Data Catalog and Lake Formation for multi-tenant, fine-grained access control
- Query optimization over petabyte-scale datasets using Amazon Athena and Redshift Spectrum
- Distributed OpenTelemetry (OTel) Collector configuration for capturing logs, traces, and metrics and routing them into S3
- High-volume streaming of system logs, Datadog captures, and raw server events into S3
- Real-time CDC from PostgreSQL using Debezium or AWS DMS
- Amazon OpenSearch clusters with simultaneous lexical and high-dimensional vector search
- OpenSearch index lifecycle management, sharding strategies, and dynamic mappings at scale
- Amazon Bedrock foundation model APIs (Claude, Titan) for data enrichment, classification, and semantic parsing
- Knowledge Bases for Amazon Bedrock for automatic chunking, metadata extraction, and vector index syncs from S3
- ETL/ELT pipelines ingesting unstructured event data from SaaS APIs (e.g., Pendo, Hotjar, Google Analytics)
- MCP server development to expose data lake context and utilities to AI agents
Benefits
- Remote work.
- 13 floating holidays.
- 15 vacation days per completed year.
- Good working environment.
More Data Engineer Jobs
Architect, Data Engineer
Quantiphi
Pioneering AI-first solutions, solving complex business challenges through expertise, cloud, data engineering, and AI.
- Lead the architectural vision for a next-generation data layer designed specifically for Agentic AI.
- Design the end-to-end blueprint for a modern data layer that seamlessly integrates structured, unstructured, and relational (Graph) data for AI agents.
- Oversee the health, security, and performance optimization of our data clusters (Snowflake/Kinetica), ensuring 99.9% availability for mission-critical AI workflows.
- Act as the 'Face of Engineering' for the customer. Lead discovery workshops, manage technical expectations, and align the architectural roadmap with their business objectives.
- Establish benchmarks for data latency and retrieval accuracy, ensuring the data layer can keep pace with the real-time demands of agentic execution.
Role Description
- Data Analysis & Insight Generation:
  - Analyze large and complex datasets to extract meaningful insights that drive business outcomes.
  - Communicate findings and recommendations through reports, dashboards, and presentations.
- Data Engineering & Preparation:
  - Clean, preprocess, and transform raw data for analysis and modeling.
  - Collaborate with data engineering teams to ensure data availability and quality.
- Collaboration with Stakeholders:
  - Work closely with product managers, engineers, and business leaders to understand requirements and deliver data-driven solutions.
  - Translate business problems into analytical frameworks.
- A/B Testing & Experimentation:
  - Design and analyze A/B tests to measure the impact of product changes and marketing campaigns.
  - Provide statistical rigor in experimentation and decision-making.
- Research & Innovation:
  - Stay up-to-date with the latest developments in data science, machine learning, and AI.
  - Propose innovative approaches and solutions for complex problems.
- Other duties as assigned
Qualifications
- Bachelor’s or Master’s degree in Computer Science, Statistics, Mathematics, Data Science, or a related field.
- 5+ years of experience in data science or a related field.
- Hands-on experience with data analysis, machine learning, and statistical modeling.
- Proficiency in Python, R or similar technologies for data analysis and modeling.
- Strong experience with data manipulation libraries (e.g., Pandas, NumPy) and machine learning libraries (e.g., Scikit-Learn, TensorFlow, PyTorch).
- SQL proficiency for data extraction and transformation.
- Knowledge of cloud platforms (e.g., AWS, Azure, Google Cloud) and big data technologies (e.g., Spark, Hadoop) is a plus.
Benefits
- Medical, Dental & Vision Benefits
- 401(k) Savings Plan with Company Match
- Flexible Planned Paid Time Off
- Generous Sick Leave
- Inclusive & Welcoming Environment
- Purpose-Driven Culture
- Work-Life Balance
- Commitment to Community Involvement
- Employer-Paid Parental Leave
- Employer-Paid Short-Term Disability
- Remote Work Flexibility
- Senior Data Engineers for various and unanticipated worksites throughout the U.S. (HQ: Chicago, IL).
- Develop large scale end to end data pipeline applications, covering multiple data sources spread across data center and AWS cloud.
- Use developed software applications to locate and analyze source data; create data flows to extract, profile, and store ingested data; define and build data cleansing and imputation; map to a common data model; transform to satisfy business rules and statistical computations; and validate data content.
- Produce software data building blocks, data models, and data flows, such as dimensional data, data feeds, dashboard reporting, and data science research and exploration.
- Produce automated software tests of data flow components and for data content quality.
- Automate orchestration and error handling for use by production operation teams.
- Provide technical expertise to diagnose errors from production support teams.
- Guide junior team members in performance tuning applications in distributed computing environments.
- Perform root cause analysis on all data and processes and identify opportunities for improvement.
- Develop metadata-driven and fully parameterized data processing tools.
- Mentor junior engineers.
- Design and develop complex Power BI semantic models and scalable reporting solutions.
- Write advanced SQL (including Databricks SQL) and DAX to implement complex business logic.
- Architect and maintain shared semantic models and datasets for analytics across reporting solutions.
- Diagnose and resolve performance issues across Databricks and Power BI.
- Collaborate with data engineering teams to define and consume curated Gold-layer datasets.
- Refactor existing reports to transition to governed semantic models built on Databricks-backed data products.
- Implement dataset governance practices including certification, documentation, and metric standardization.
- Develop and validate data quality checks across Silver and Gold layers.
- Design and implement automated analytical workflows integrating Power BI, Python, and the Power Platform.
- Build forecasting, trend analysis, and statistical models supporting advanced analytics use cases.
- Perform code reviews and provide technical guidance to Associate developers.




