Apify Technologies s.r.o.

Data Engineer

Full Time | Remote | Mid Level | Team 51-200

Location

Worldwide

Posted

18 days ago

Seniority

Mid Level

Snowflake, HubSpot, Mixpanel, Segment, CRM, dbt, Tableau, Data Engineering, Salesforce, SQL, Python, BigQuery, Databricks, Amazon Redshift, Airflow, Matillion, ETL, LLM, Observability/Monitoring

Job Description

Role Description

We're looking for a Data Engineer to own the integration layer between Snowflake and the operational tools that run Apify's go-to-market and product motion: HubSpot, Intercom, Mixpanel, and Segment. You'll make sure the right data lands in the right system at the right time, with the right shape, so Sales, Marketing, Customer Success, and Product teams can act on it.

You'll be the 9th member of the data team - joining a mix of analytical engineers, analysts, and data scientists - at the moment Segment is being rolled out as Apify's CDP. That's yours to land end-to-end.

What you'll be working on:

- Own the integration domain end to end - all pipelines, transformations, and Snowflake models that connect HubSpot, Intercom, Mixpanel, and Segment to the rest of the platform, in both directions.
- Design event tracking and the CDP layer with the RevOps team as Segment becomes the source of truth for behavioral data flowing into product, marketing, and CRM systems.
- Build reliable, observable pipelines in Keboola and dbt - with clear data contracts, schema tests, freshness guarantees, and alerting.
- Model integration data in Snowflake so HubSpot, Intercom, Mixpanel, and Segment data lands in well-defined tables that downstream consumers can trust, with documentation that analysts and scientists can actually use.
- Power lifecycle automations - PQA scores back into HubSpot, behavioral campaigns in Intercom and customer.io, product usage signals - by shipping the data they depend on.
- Diagnose and resolve pipeline incidents independently - trace lineage across multiple components, find root causes, fix, and write the runbook so it doesn't bite the next person.

Tech stack

- Snowflake: data warehouse
- Keboola: extractors, writers, and orchestration
- dbt: transformations on Snowflake (orchestrated by Keboola; this is where we're actively migrating existing transformation logic)
- Tableau and Redash: BI
- n8n: workflow automation
- Segment: CDP, currently being rolled out end-to-end

Qualifications

- 3+ years of data engineering experience, with meaningful time spent on integrations between a cloud warehouse and operational SaaS tools (HubSpot, Salesforce, Intercom, Zendesk, Mixpanel, Amplitude, Segment, RudderStack, or similar).
- Fluent in SQL (window functions, CTEs, complex multi-source joins, query optimization) and comfortable in Python for the parts a no-code tool can't handle.
- Production experience with Snowflake (or BigQuery, Databricks, Redshift), and an understanding of the cost, performance, and access-control tradeoffs of a usage-based warehouse.
- Experience building end-to-end pipelines combining an orchestration or ELT platform (Keboola, Fivetran, Airflow, Dagster, Prefect, Matillion) with a transformation framework like dbt.
- Hands-on experience with a CDP (Segment, RudderStack, mParticle) - tracking plans, schemas, identity resolution, downstream consumers - not just installing the snippet.
- You think in data contracts - schema stability, freshness SLAs, documented field definitions - and treat the boundary between your domain and downstream consumers as a first-class interface.
- Comfortable with reverse ETL (Census, Keboola, or hand-rolled), and you understand what it means to write back to a CRM that humans are also editing.
- Pragmatic about tooling - happy to use n8n for the right job, and equally happy to write proper code when that's the right call.
- Able to explain why a dashboard moved and what it means to non-technical stakeholders in Sales, Marketing, and Customer Success, in English, both in writing and in person.

Nice to have:

- Experience with usage-based billing or product-led growth data models.
- Exposure to LLM-assisted workflows in the data stack.
- Prior experience at a SaaS company between 50 and 500 people.

Expectations

By the end of the first month, we expect you to:

- Know the data team, the RevOps and Growth stakeholders who depend on the integration layer, and the workflows that flow through HubSpot, Intercom, Mixpanel, and Segment.
- Work through the existing Keboola components and dbt models to understand what's in place, what's fragile, and where the silent failures live.
- Trace a typical record from each source system through to the Snowflake tables analysts use.

By the end of the first 3 months, we expect you to:

- Have a complete map of the integration domain - what flows where, what's owned by whom, where the silent failures are - and a documented six-month plan for the work ahead.
- Have at least one end-to-end improvement shipped with monitoring in place.
- Be the go-to person on the data team for HubSpot, Intercom, Mixpanel, and Segment data questions.

By the end of the first 6 months, we expect you to:

- Have Segment operating as the durable CDP for Apify, with a published tracking plan and reliable event flows into Snowflake and downstream tools.
- Have core tables from HubSpot, Intercom, Mixpanel, and Segment with documented data contracts - schema, freshness SLA, ownership - and tests and alerting in place.
- Have driven measurable improvements in data freshness, pipeline reliability, and incident response time, tracked publicly, and shipped at least one cross-team initiative where the data integration unlocked a business outcome (conversion lift, churn reduction, ops automation).

Benefits

- Space, support, and autonomy for personal growth, with a direct impact on our success.
- Full-time position in Prague (Lucerna Palace), Brno (Titanium), or fully remote.
- Flexible working hours (perfect for both night owls 🦉 and early birds 🐥).
- Nobody counts holidays as long as the work gets done 💪.
- Unlimited Claude for every Apifier. We don't count tokens. Just use them well 🤖.
- Stock options and profit sharing 💰.
- Free Multisport card.
- We welcome pets, kids, and bikes in the office.
- Epic team buildings and offsites 🚢 with biking, canoeing, and other adventures 🪂.
- Solid education and training budget, conference tickets, internal “Eat & Learn” sessions, and the possibility to work across teams.
- Generous hardware budget 💻.
- Free lunches every day when working from the office 🌮🥡.
- Unlimited supply of ☕ & 🍺 and snacks.
- Free entry to the wonderful Prague and Brno Zoo 🐘.
- Ping-pong, chess, PS5, lightsabers, foosball league after lunch.

More Data Engineer Jobs

Technical Lead - Data Migration & Azure Integration

Enroute

We deliver IT services and solutions through a team of passionate, highly skilled problem solvers.

Data Engineer | Posted 18 days ago

Full Time | Remote | Team 51-200 | H1B Sponsor

Role Description

We are looking for a Technical Lead to guide a migration from an on-premises Oracle and Boomi environment to Azure SQL and Azure Data Factory. This role will provide technical leadership across both the database and ETL workstreams, support architecture and design decisions, provide hands-on technical guidance when needed, and ensure the team remains aligned with the project timeline, scope, and deliverables.

The Technical Lead will also serve as the primary point of contact with the client for day-to-day technical coordination, helping translate client requirements into actionable work for the development team while keeping stakeholders informed of progress, risks, dependencies, and decisions.

Qualifications

- Strong experience leading data migration, ETL, or data platform modernization projects.
- Solid understanding of relational databases, data modeling, SQL development, and data validation.
- Experience with Oracle, SQL Server, and/or Azure SQL.
- Experience with ETL/integration platforms such as Boomi, Azure Data Factory, SSIS, Informatica, Talend, MuleSoft, or similar tools.
- Ability to guide technical architecture and design decisions across database and integration workstreams.
- Strong understanding of source-to-target mapping, data profiling, migration planning, reconciliation, and cutover activities.
- Ability to provide technical support to engineers and help resolve design, development, performance, and data quality issues.
- Experience managing project timelines, deliverables, dependencies, risks, and technical decisions.
- Strong client-facing communication skills.
- Ability to coordinate with business stakeholders, architects, developers, QA resources, and project leadership.
- Strong documentation skills, including technical decisions, risks, action items, and status updates.

Requirements

- Lead the technical execution of the migration from Oracle and Boomi to Azure SQL and Azure Data Factory.
- Guide architecture and design decisions across database migration, ETL migration, integration, validation, and deployment activities.
- Serve as the main technical point of contact with the client.
- Translate client requirements, priorities, and constraints into actionable technical direction for the team.
- Coordinate the work of the database migration and ETL/Azure Data Factory engineers.
- Review and validate technical designs, source-to-target mappings, migration approaches, and implementation plans.
- Provide hands-on technical guidance and troubleshooting support when needed.
- Help resolve complex issues related to data conversion, pipeline design, performance, compatibility, validation, and reconciliation.
- Track project timeline, deliverables, dependencies, risks, and open decisions.
- Ensure the team is aligned on priorities, scope, quality expectations, and delivery commitments.
- Communicate project status, risks, blockers, and key decisions to client stakeholders and internal leadership.
- Support planning for testing, production cutover, rollback considerations, and post-migration stabilization.
- Ensure technical documentation is created and maintained throughout the engagement.
- Promote best practices for data migration, ETL design, security, performance, monitoring, and maintainability.

Benefits

- Monetary compensation
- Year-end Bonus
- IMSS, AFORE, INFONAVIT
- Major Medical Expenses Insurance
- Life Insurance
- Funeral Expenses Coverage
- TDU Membership
- MediAccess
- Health Check-Up Subsidy
- Preferential rates for car insurance
- Vacations
- Official Mexican Holidays
- Life Happens Days
- Bereavement Leave
- Civil Marriage Leave
- English Classes
- Certifications
- Educational Agreements (Talisis, U-ERRE, UNID, TecMilenio, Tec de Monterrey, UDEM, SPIS)
- Corporate Agreements & Discounts (Sorteos Tec, Envia Flores, TopGolf)
- Taquitos Rewards
- Birthday Bonus
- Work-from-home Bonus
- Laptop Policy

Company Description

Enroute is committed to providing equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws. This policy applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation, and training.

Oracle Database, Azure SQL, ETL, Microsoft SQL Server, SSIS, Informatica, Performance Optimization, Observability/Monitoring

Mexico

AI Engineer & Data Engineer

Enroute

We deliver IT services and solutions through a team of passionate, highly skilled problem solvers.

Data Engineer | Posted 18 days ago

Full Time | Remote | Team 51-200 | H1B Sponsor

Role Description

We are seeking a data-driven AI Engineer to join our team at a high-growth advertising technology company. This role focuses on scaling our reporting infrastructure for advertising performance and billing reconciliation, ensuring that financial and operational data is accurate, automated, and actionable.

- Develop robust data pipelines, ensuring data quality and reliability.
- Enable efficient data consumption across the organization.
- Collaborate closely with cross-functional teams including Product, Engineering, Analytics, and Business stakeholders to deliver high-impact data platforms.

The ideal candidate is a proactive problem-solver with strong technical expertise, capable of working with large datasets, modern data architectures, and cloud-based environments. You thrive in fast-paced settings, navigate ambiguity with confidence, and are passionate about turning data into actionable value.

Qualifications

- Databricks & AI Architecture (Must-Have)
  - Strong experience working with Databricks Lakehouse architecture
  - Nice to have: expertise in Databricks Mosaic AI and Unity Catalog for governing AI assets
  - Hands-on experience building RAG (Retrieval-Augmented Generation) pipelines using Vector Search
- SQL & Data Modeling (Must-Have)
  - Advanced SQL development
- AI Engineering & Data Workflows (Must-Have)
  - Experience integrating LLM APIs (OpenAI, Anthropic, etc.) into data workflows
  - Hands-on experience using AI for:
    - Data enrichment
    - Anomaly detection
    - Automated classification
  - Experience with LangChain, LlamaIndex, or similar frameworks
  - Exposure to Model Context Protocol (MCP) or similar approaches to connect AI models with external tools and data sources
  - Strong understanding of Tool Calling / Function Calling: enabling LLMs to interact with SQL databases and external APIs securely.
  - Experience in Prompt Engineering and Guardrailing: designing system prompts that maintain context and hierarchy (e.g., understanding team associations).
- Platform & Engineering Practices (Nice-to-Have / Medium)
  - Experience with GitHub workflows
  - Familiarity with CI/CD pipelines (Jenkins or similar)
  - Experience working with YAML/YML configuration files

Requirements

- Architect AI Agents: Build and deploy agents that can perform NLP-based data generation, automated data enrichment, and complex data reasoning within Databricks.
- Natural Language Interfaces: Develop "Chat with your Data" features, allowing stakeholders to query the data warehouse using natural language.
- Integrate LLMs into data workflows for automation and intelligence.
- Develop scalable data models to support analytics and AI use cases.
- Implement AI-driven enhancements such as anomaly detection and data enrichment.
- Collaborate with data, analytics, and engineering teams to improve data reliability.
- Optimize performance and scalability of data and AI workflows.
- Support automation through CI/CD practices.
- Ensure data quality, traceability, and maintainability across pipelines.

Benefits

- Monetary compensation
- Year-end Bonus
- IMSS, AFORE, INFONAVIT
- Major Medical Expenses Insurance
- Life Insurance
- Funeral Expenses Coverage
- TDU Membership
- MediAccess
- Health Check-Up Subsidy
- Preferential rates for car insurance
- Vacations
- Official Mexican Holidays
- Life Happens Days
- Bereavement Leave
- Civil Marriage Leave
- English Classes
- Certifications
- Educational Agreements (Talisis, U-ERRE, UNID, TecMilenio, Tec de Monterrey, UDEM, SPIS)
- Corporate Agreements & Discounts (Sorteos Tec, Envia Flores, TopGolf)
- Taquitos Rewards
- Birthday Bonus
- Work-from-home Bonus
- Laptop Policy

AI, Databricks, Unity, SQL, LLM, OpenAI API, LangChain, LlamaIndex, GitHub, CI/CD, Jenkins, AI Agents

Mexico

Job Closed

Senior Data Architect – Enterprise Data Strategy

Autodesk

How the world gets designed and made. #MakeAnything

Data Engineer | Posted 18 days ago

Full Time | Remote | Team 10,001+ | Since 1982 | H1B No Sponsor

- Establish and evolve the enterprise data architecture vision aligned to Autodesk’s business strategy
- Define enterprise data domains and ownership models across operational, customer, and analytical ecosystems
- Develop guiding principles and guardrails for how data is created, shared, mastered, governed, and activated
- Shape architectural patterns that support data products, domain-oriented ownership, and modern data paradigms
- Identify enterprise data assets that can be activated to power AI/ML, personalization, automation, and intelligent decisioning
- Design architectural patterns that make high-quality, governed data accessible for AI workflows across the organization
- Ensure machine learning pipelines and AI systems are grounded in authoritative enterprise data domains
- Partner with AI/ML and product teams to define scalable, reusable data foundations for experimentation and production AI systems
- Remove systemic barriers that limit data discoverability, accessibility, and reuse
- Create and maintain high-level conceptual and logical data models spanning CRM, ERP, Finance, subscription systems, marketing platforms, support systems, and analytics environments

Cloud, ERP

Washington

$143K - $256.5K / year

Job Closed

Senior Software Engineer, Data Platform

Function Health

At Function, we celebrate diversity and are committed to building a diverse and inclusive workforce. As an equal opportunity employer, we do not discriminate on the basis of race, color, gender identity, ancestry, religion, age, sexual orientation, national origin, disability, marital status, Veteran status, or any other occupationally irrelevant criteria. Join the Function Health team and become a part of our mission to build a healthier future for all. Discover more about us and how we're changing the face of healthcare at Function Health. Important Notice: Legitimate communication from the Function Health team will always come from an email address ending in @functionhealth.com. Function Health will never request personal information such as banking details or payment during the hiring process. Please be cautious of communications or job offers that come from other email domains, instant messaging platforms, or unsolicited calls. If you ever have doubts about the legitimacy of a communication, please reach out to us directly at talent@functionhealth.com.

Data Engineer | Posted 18 days ago

Full Time | Remote | Team 11-50

Role Description

We are seeking an experienced Senior Software Engineer, Data Platform, to contribute to the design, development, and optimization of our data infrastructure. This role requires deep expertise in GCP, data engineering, Change Data Capture (CDC), ETL, governance, and streaming technologies. You will work closely with data scientists, analysts, and software engineers to ensure seamless data ingestion, processing, and access across the organization. This is a foundational role within a growing team that requires a hands-on architect who can balance speed, excellence, and pragmatism.

Key Responsibilities

- Contribute to the design, development, and scaling of core data infrastructure using GCP, Spark, Databricks, and Fivetran.
- Develop robust and maintainable ETL/ELT workflows that support diverse structured and unstructured data needs across the organization.
- Implement and manage Change Data Capture (CDC) pipelines to enable near real-time data replication and synchronization.
- Define and enforce data governance and compliance standards, including access control, auditability, lineage, and metadata management.
- Build and manage streaming and batch data pipelines to serve high-impact use cases across analytics, product, compliance, and experimentation.
- Act as a strategic partner to cross-functional teams (product, analytics, engineering, clinical) to ensure data is accessible, trustworthy, and impactful.
- Drive the long-term architectural vision of our data platform to support current and future business and product needs.

Qualifications

- 5+ years of experience in software engineering, with a focus on scalable data architectures.
- Strong expertise in GCP (IAM, GCS, Pub/Sub, etc.) and hands-on experience with Spark and Databricks.
- Hands-on experience with CDC technologies like Fivetran, or equivalent.
- Proficiency in ETL/ELT tools and frameworks (dbt, Apache Airflow, Dataform, etc.).
- Deep understanding of data governance principles, including compliance and security best practices.
- Demonstrated success in collaborating across functions to deliver data solutions for analytics, experimentation, or compliance.
- A balance of IC execution and leadership skills; you’re equally comfortable rolling up your sleeves or mentoring others.
- Familiarity with streaming data architecture, real-time ingestion, and delivery frameworks.
- Proficient in SQL and Python for data processing and automation.
- Strong problem-solving skills with the ability to work in a fast-paced environment.
- Excellent communication and technical storytelling skills - you can align technical work with business value.

Nice-to-Have Skills and Experiences

- Experience with Terraform or Infrastructure-as-Code (IaC) for data infrastructure automation.
- Background in HIPAA or other regulated environments with sensitivity to data privacy and compliance.
- Familiarity with the dbt Semantic Layer and modern data modeling best practices.
- Exposure to data observability platforms and practices.
- Familiarity with machine learning data pipelines.
- Exposure to multi-cloud or hybrid-cloud environments.
- Experience building scalable solutions in a 0-1 environment.

To be a strong fit, you also need

- Bias Toward Action: Demonstrated ability to take initiative, make decisions under uncertainty, and move projects forward even in the face of ambiguity.
- Entrepreneurial Spirit: Strong adaptability to changing business needs with a knack for building and optimizing processes.
- Communication: Excellent communication skills, capable of explaining complex technical concepts to non-technical stakeholders.
- Remote Work Adaptability: Comfort with remote work environments, demonstrating the ability to stay productive and connected with the team irrespective of physical location.
- Continuous Improvement: A willingness to question assumptions and a commitment to continuous improvement.

Benefits

- Competitive salary.
- Equity and benefits package.
- Flexible, fully-remote working hours.
- A dynamic work environment where creativity and innovation are encouraged.

GCP, Data Engineering, ETL, Apache Spark, Databricks, Amazon IAM, dbt, Airflow, SQL, Python, Terraform, Infrastructure as Code, Observability/Monitoring, AI/ML

United States
