Job Closed
This listing is no longer active.
At Cloudera, we believe that data can make what is impossible today, possible tomorrow.
Senior Data Architect
Location
Costa Rica
Posted
8 days ago
Seniority
Senior
Job Description
Cloudera
- Design and implement scalable data warehouse and lakehouse architectures on the Cloudera platform.
- Define enterprise data models, governance frameworks, security standards, and data quality practices.
- Architect and optimize analytics solutions across SQL engines and table formats, including Impala, Hive, and Iceberg.
- Design AI-powered analytics solutions leveraging LLMs, Retrieval-Augmented Generation (RAG), vector databases (such as PostgreSQL, Qdrant, Milvus), and natural language query (NLQ) capabilities.
- Lead the integration of AI/ML capabilities into enterprise data platforms and data pipelines.
- Leverage vibe coding / AI-assisted development tools to accelerate development and improve productivity.
- Build and optimize batch and near real-time data pipelines.
- Collaborate with business stakeholders to translate business requirements into scalable data products and analytics solutions.
- Establish best practices for performance optimization, data architecture, and AI-assisted development.
- Mentor teams on modern data architecture and AI-enabled development methodologies.
- Ensure data security, governance, and compliance within enterprise data platforms.
Job Requirements
- Bachelor’s degree in Computer Science or equivalent and 5-6 years of related experience; OR Master’s degree and 3-5 years of related experience; OR PhD and 0-3 years of related experience
- Deep expertise in enterprise data warehousing, lakehouse architectures, and Cloudera-based data platforms.
- Strong experience with CDP, including HDFS, Hive, Impala, Kudu, and Cloudera data ingestion and processing frameworks.
- Strong understanding of distributed data systems and Hadoop-based architectures.
- Advanced SQL skills, including performance tuning and query optimization.
- Proficiency in Python and data engineering frameworks.
- Experience with dimensional and normalized data modeling.
- Strong understanding of data governance, lineage, metadata management, and enterprise security.
- Experience implementing AI/ML, LLM, vector database, and RAG-based solutions in production environments.
- Familiarity with AI-assisted development tools (e.g., GitHub Copilot and LLM-powered workflows).
- Strong communication, stakeholder management, and problem-solving skills.
- Ability to align enterprise data architecture with business objectives in Finance, Sales, and Revenue Operations.
- Ability to bridge traditional data platforms with modern AI capabilities.
Benefits
- Generous PTO Policy
- Support for work-life balance with Unplugged Days
- Flexible WFH Policy
- Mental & Physical Wellness programs
- Phone and Internet Reimbursement program
- Access to Continued Career Development
- Comprehensive Benefits and Competitive Packages
- Paid Volunteer Time
- Employee Resource Groups
More Data Engineer Jobs
Role Description
The operations team is a highly strategic and analytical team that helps guide and implement strategic initiatives across the company. We are looking for a Data & Analytics specialist to connect and organize our data across the company to drive visibility into performance and strategy across our sales, marketing, product, operations, finance, and engineering efforts. The ideal candidate will have some background in data engineering.
- Design, build, and maintain business-critical data and distributed systems that provide real-time, reliable data to all of our go-to-market tools and internal users.
- Connect our production backend/data to business systems including Salesforce, Marketo, Intercom, Google Analytics, Metabase, etc. This can include working with a data warehouse/data lake, organizing large-scale data (we send 10 billion notifications a day), and building ETLs to business systems.
- Evaluate ways to increase the efficiency of internal data flows and centralize sources of truth.
- Innovate, design, and build data systems, services, and tools using GCP (Google Cloud Platform) that scale with OneSignal’s products and business requirements.
- Work cross-functionally, including with the backend engineering team as well as business teams including operations, product, marketing, sales, customer success, support, finance, etc.
- Analyze the data and create data and business insights that help move the business forward, and assist and empower teams across the company in making data-related decisions.
- Build data science/machine learning models using internal and external data sources to identify potential new customers, those who are at risk of churn, or those with potential upsell opportunities.
- Work with Airflow, DBT, Presto, and Hightouch, and introduce the latest tools into our technology stack. Potentially, figure out how to incorporate artificial intelligence into our technology stack.
Qualifications
- 6+ years of professional experience in a technical area; experience at a high-growth startup is preferred.
- Proficiency with Python; experience with DBT and Airflow is a plus.
- Self-driven, with the ability to identify problems and to design and implement solutions.
- A combination of technical and business acumen. The ideal candidate would have an understanding of SaaS metrics and growth-company infrastructure scaling challenges.
- Strong interpersonal and communication skills and experience working cross-functionally.
- The ideal candidate has had experience growing and managing a smaller but high-functioning team.
Requirements
- The New York and California base salary for this full-time position is between $170,000 and $190,000. Your exact starting salary is determined by a number of factors such as your experience, skills, and qualifications.
- In addition to base salary, we also offer a competitive equity program and comprehensive and inclusive benefits.
Business Context
Support the LATAM Airlines Sustainability team in evolving its data platform. The project consists of integrating and modeling various corporate data sources together with the Sustainability Database, enabling reliable, traceable, and scalable information for analysis, reporting, and decision-making.
About Air Space Intelligence
ASI's mission-critical technology powers decision-making across aviation, defense, energy, and other critical infrastructure domains. Backed by top-tier investors including Andreessen Horowitz, Spark Capital, and Renegade Partners, ASI delivers operational decision superiority, compressing days of analysis into seconds of action. ASI is leading the way and pushing the boundaries of what’s possible.
What You Will Do:
You will own the reality layer that powers ASI's defense products. You will operate, improve, and expand the data flows that move operational data from mission systems into usable, trusted datasets for decision-making. This role is equal parts data engineering, data forensics, and technical communication. You will dig into real-world data, determine what it actually means, and translate findings into clear, actionable pipelines ready to drive mission-critical decisions for end users.
What We Value:
- Strong fluency with modern data engineering tooling and patterns: streaming and batch pipelines, schema evolution, data contracts, and lineage.
- Demonstrated ability to debug data: profiling, anomaly detection, reconciling sources, and separating signal from noise while cross-validating.
- Strong technical communication skills: you can explain what the data is doing, what it means, what is broken, why it matters, and what engineering should change.
- Comfort with distributed messaging and processing (Kafka, Flink, Spark, or equivalents) and modern orchestration (Airflow, Dagster, Temporal, or similar).
- Strong grasp of API design and integration patterns (REST, gRPC, GraphQL) and experience working across a range of data formats and wire-level protocols (JSON, XML, Protobuf, and binary protocols like JREAP-C or CMF-B).
- Working knowledge of modern network protocols, firewalls, system level connections, cross-system authentication, and an enthusiasm to get data flowing.
- Familiarity with defense operational data and mission systems, with an appreciation for delayed reporting, inconsistent identifiers, changing semantics, and obscure edge cases.
- Comfort operating in classified network environments (e.g., SIPR, JWICS) and working within accreditation boundaries (IL5/IL6, ATO processes).
- Experience deploying data infrastructure on Kubernetes and across hybrid cloud and on-prem environments.
- A bias for action and distinct aptitude for problem solving in ambiguous environments.
- Active SECRET or TOP SECRET U.S. Security Clearance.
How We Hire:
We look at the interview process not as a screening or test, but rather as an opportunity to simulate what it would look like working together. We build the interview process around you.
SAP Data Engineer – Freelancer
Mactores
Mactores is a trusted leader among businesses in providing modern data platform solutions.
- Build extraction pipelines from SAP HANA to AWS S3 using SLT, ODP, CDS views, SDI, and native HANA SQLScript, picking the right tool per source and per latency requirement.
- Model raw SAP tables across FI/CO, MM, SD, and adjacent modules into clean, semantically meaningful datasets that the downstream Spark layer and business users can actually use.
- Design and operate delta and CDC patterns so incremental loads stay correct, idempotent, and replayable.
- Write ABAP extractors where standard SAP tooling falls short, and document them so future engineers can change them safely.
- Own the write-back path: load curated data from S3 into SAP BW / BW/4HANA and model it for end-user reporting and analytical querying.
- Land data in S3 as Parquet with sane partitioning, schemas, and IAM scoping, and define the contract with the PySpark engineer at the ingestion-to-transformation boundary.
- Embed with a customer team, ship the pipeline to production, and stay close enough through cutover to know it actually runs.



