Data Engineer (Databricks), Czech based
Location
Czechia
Posted
67 days ago
Salary
0
Seniority
Mid Level
Job Description
Whirr Crew
We are looking for a Data Engineer (Databricks) to support the preparation phase for AI implementation by ensuring data is well-structured, reliable, and ready for advanced use cases. The ideal candidate acts as a bridge between business and technology, combining strong data understanding with hands-on technical expertise.

Start Date: ASAP
Location: Remote in Czechia
Language: English, Czech
Contract Type: B2B

Responsibilities:
- Collaborate with business and technical stakeholders to understand data needs and requirements
- Utilize Databricks for data processing and engineering tasks
- Prepare, structure, and ensure high quality of data for AI implementation
- Work hands-on with data, transforming and organizing it for further usage
- Act as a bridge between business and technology, translating requirements into data solutions

Requirements:
- Proven experience as a Data Engineer or in a similar role
- Hands-on experience with Databricks
- Strong understanding of data and ability to work with it hands-on
- Experience working closely with both business and technical stakeholders
- Ability to bridge business needs with technical implementation
More Data Engineer Jobs
Description

About the Role
At NinjaOne, we're looking for a skilled Data Engineer to join our team and help drive the future of our data infrastructure. You'll play a critical role in building, maintaining, and scaling our systems to ensure smooth data flow, accuracy, and security across the organization. This is an exciting opportunity to work on innovative projects, collaborate with cross-functional teams, and help shape how we leverage data to fuel growth, optimize products, and drive business decisions.

Location
We are flexible on remote working from home, if you are located in the USA and reside in one of the following states: CA, CO, CT, FL, GA, *IL, KS, MA, MD, ME, NJ, NC, NY, OR, TN, TX, VA, and WA. We have physical offices in Austin, TX and Tampa, FL, if you prefer a hybrid option.

We hire the best software engineers, but experience in our stack can't hurt: NinjaOne is built on Java, Kotlin, C++, Golang and Postgres; supporting millions of user endpoints and running as a scalable cloud service in AWS. Knowing large-scale datastore bottlenecks, asynchronous application design and client-server architecture will help you.

What You'll be Doing
- Data Pipeline Development: Design and implement scalable data pipelines that move and transform large volumes of data from multiple sources to central data warehouses, transforming data to enable business reporting and advanced analytics.
- Database Management: Manage and optimize the performance of relational databases, ensuring data availability, reliability, and consistency.
- Automation & Optimization: Automate and optimize data workflows to reduce manual processes and improve efficiency in data collection, storage, and processing.
- Monitoring & Maintenance: Ensure the integrity and security of data across systems, monitor performance, and troubleshoot any issues that arise within the data pipeline.
- Data Visualization: Build dashboards and reports in Tableau and Databricks to expose key data points and trends to business stakeholders.
- Collaboration: Work closely with data scientists, analysts, and other teams to gather requirements, understand data needs, and provide solutions that support data-driven decision-making.
- Other duties as needed.

About You
- Bachelor's degree in Computer Science, Computer Engineering, Information Technology or equivalent work experience preferred.
- 3+ years of experience in software development, with a strong focus on data engineering and data science.
- Experience in building data pipelines and managing large-scale data systems using technologies like SQL and Python.
- Proficiency in cloud platforms like AWS, GCP, or Azure, and experience with tools like Airflow, Kafka or dbt for orchestrating data workflows.
- Experience with both relational databases including MySQL, PostgreSQL and NoSQL databases like MongoDB, Cassandra.
- Experience with data warehousing concepts and tools such as Redshift, BigQuery, Snowflake.

About Us
NinjaOne automates the hardest parts of IT to deliver visibility, security, and control over all endpoints for more than 20,000 customers. The NinjaOne automated endpoint management platform is proven to increase productivity, reduce security risk, and lower costs for IT teams and managed service providers. NinjaOne is obsessed with customer success and provides free and unlimited onboarding, training, and support. NinjaOne is #1 on G2 in endpoint management, patch management, remote monitoring and management, and mobile device management.

What You'll Love
We are a collaborative, kind, and curious community.
We honor your flexibility needs with full-time work that is hybrid remote.
We have you covered with our comprehensive benefits package, which includes medical, dental, and vision insurance.
We help you prepare for your financial future with our 401(k) plan.
We prioritize your work-life balance with our unlimited PTO.
We reward your work with opportunity for growth and advancement.

Additional Information
This position is NOT eligible for Visa sponsorship. Due to federal government security requirements associated with our FedRAMP-authorized environment, candidates must be U.S. citizens or lawful permanent residents.

*Due to operational policies, NinjaOne is unable to hire for this role within the city limits of Chicago. We will consider all qualified candidates who reside outside of the city proper or are willing to self-relocate.

Starting pay for the successful applicant depends on a variety of job-related factors, including but not limited to location, market demands, experience, job-related knowledge, and skills. The benefits available for this position include medical, dental, vision, 401(k) plan, life insurance coverage and PTO.

For roles based in California, Colorado, Maryland, New Jersey, or Washington the base salary hiring range for this position is $90,000 to $170,000 per year. For roles based in New York, the base salary hiring range for this position is $90,000 to $170,000 per year.

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, genetic information, marital status, veteran status, or any other status protected by applicable law. We are committed to providing an inclusive and diverse work environment.

#LI-KS2 #LI-Remote #BI-Remote #BI-Hybrid
Role Description
We are looking for a Backend AI & Data Pipeline Engineer to own the end-to-end data processing infrastructure that powers Yuzee's intelligent course and job matching platform. You will design and maintain scalable, event-driven pipelines that process tens of thousands of daily records, generate semantic embeddings, and feed a growing knowledge graph used for personalised career pathway recommendations.

What you'll do
- Design and maintain three distinct processing pipelines (scheduled job ingestion, event-driven course processing, and a periodic knowledge graph builder), each with independent trigger logic and cost controls.
- Generate and manage semantic embeddings via Amazon Bedrock (Titan v2), index them in MongoDB Atlas Vector Search, and calibrate similarity thresholds to ensure match accuracy.
- Build and maintain a knowledge graph linking jobs, courses, skills, and industries using FP-Growth association rules and archetype-to-SOC code mapping.
- Build and improve a two-stage discovery and matching API on AWS Lambda: vector retrieval first, then deep eligibility scoring with LLM re-ranking.
- Right-size Fargate Spot instances and design resumable processing loops that tolerate interruption, keeping infrastructure costs under control as data volume scales.
- Maintain and improve daily job scrapers across multiple sources and build institution data scrapers with robust HTML cleaning pipelines.

Qualifications
- 1+ years of backend engineering experience focused on data pipelines, ML infrastructure, or search systems.
- Hands-on experience with AWS serverless and container services: Lambda, ECS Fargate, EventBridge, and Step Functions.
- Strong Python skills: Pandas, async processing, bulk database operations, and text cleaning.
- Familiarity with vector databases and semantic similarity search; MongoDB Atlas Vector Search experience is a strong plus.
- Cost-conscious infrastructure mindset: you think in per-record compute costs, free tiers, Spot resilience, and right-sizing.
- Ability to document and communicate complex architecture clearly to both technical and non-technical stakeholders.

Requirements
- Degree or proven experience.

Benefits
- You can work from home for the whole internship period.
- A reference letter can be requested upon completion of the internship.
- Some flexibility with working hours aside from the usual 9am to 6pm (e.g., 8am to 5pm or 7:30am to 4:30pm).
- The possibility of retention for part-time or full-time work post-internship based on your performance, even if you are not based in Malaysia.
Role Description
At Ilant Health, data is the cornerstone of our mission. It drives our clinical precision, shapes our business strategy, and provides the measurable ROI necessary to expand access for employers and health plans. We are looking for a Lead Data Engineer to architect the "source of truth" that powers our value-based care models for obesity and cardiometabolic health.

In this role, you will not just build pipelines; you will be the architect of our data platform. You will own the ingestion of complex healthcare datasets (claims, eligibility, clinical labs), the design of our "Single Patient View," and the creation of next-generation internal tools that allow non-technical stakeholders to query our data using natural language.

Key Responsibilities

Data Architecture and Strategy (The “Blueprint”)
- Design the "Single Patient View": Architect a unified data model that stitches together fragmented data sources (e.g., linking a pharmacy claim for Wegovy, a clinical lab result for HbA1c, and user engagement metrics from the Ilant app into a cohesive longitudinal record).
- Scalability Planning: Design a cloud-native infrastructure (likely Snowflake/AWS) capable of handling 100x Member growth without requiring a total refactor.
- Buy vs. Build Decisions: Evaluate and select the right tooling for ingestion (e.g., Fivetran vs. custom Python) and orchestration (e.g., Airflow vs. Dagster) to maintain low engineering overhead while maximizing output.
- Conversational Intelligence Layer (GenAI/LLM): Architect and implement a "Text-to-Data" interface (leveraging LLMs/RAG) that allows business decision-makers to interact with our data via prompts (e.g., similar to Gemini/ChatGPT).

Pipeline Engineering (The “Plumbing”)
- Data Consumption Layer: Ensure the reliability and low-latency availability of the data assets (dbt models, feature stores) consumed by the Data Science and Analytics teams, guaranteeing they always have fresh, trustworthy data for modeling and reporting.
- External Data Integration (Primary Mandate): Own the end-to-end reliability of mission-critical external files. You are responsible for the system that ingests, validates, and standardizes these files from payers/employers.
- Claims Ingestion Engine: Build robust, fault-tolerant pipelines to handle the notoriously messy formats of payer data (EDI 837/835, raw CSVs, JSON) and standardize them into a clean, queryable schema.
- dbt Model Ownership: Oversee the transformation layer (using dbt), creating a "Gold" layer of data that is business-ready for analysts, product features, and the conversational AI layer.

Data Quality and Trust (The “Guardrails”)
- Pipeline Reliability & Operational Uptime: You own the "uptime" of our data platform. Ensure all scheduled ingestion and transformation jobs run successfully and on time. You are the first line of defense when a pipeline fails, leading the root cause analysis (RCA) and resolution to minimize downtime.
- Automated Testing & Observability: Implement "Data Observability" tools (e.g., Great Expectations, Monte Carlo, or custom equivalents) to catch issues before they hit the dashboard (e.g., configure alerts to trigger if an eligibility file arrives with 50% fewer records than the previous month).
- Governance & Compliance: Act as the technical custodian of HIPAA compliance. Ensure all PII/PHI is encrypted, masked, and accessed only via strict Role-Based Access Controls (RBAC).
- Master Data Management (MDM): Implement identity resolution logic to handle conflicts across sources (e.g., ensuring "Jane Doe" in a Cigna claims file is correctly matched to "Jane Doe" in the Ilant app database).

Leadership and Collaboration
- Partner with Product: Work directly with the CPO and Product Managers to assess the technical feasibility of new features (e.g., "Can we accurately calculate 'time to goal weight' given the current data latency?").
- Partner with Data Science: Collaborate to productionize predictive models (e.g., patient risk stratification, weight loss trajectory). You will build the MLOps infrastructure that takes a model from a Jupyter notebook to a scalable, real-time inference API within our product.

Qualifications
- Experience: 7+ years in Data Engineering, with at least 3 years in a Lead or Architectural role.
- Strategic Maturity: Demonstrated ability to make high-stakes "Buy vs. Build" decisions and architect systems for 10x scale, prioritizing long-term stability and maintainability over short-term patches.
- Healthcare Native: Deep familiarity with healthcare data standards (HL7, FHIR, ICD-10, CPT, NDC) and the specific challenges of claims/eligibility ingestion.
- GenAI/LLM Interest: Practical experience or strong interest in building semantic layers for LLM applications (RAG, Vector DBs, or prompt engineering for analytics).

Requirements
- Languages: Python (Advanced), SQL (Expert).
- Cloud: AWS.
- Warehousing: Snowflake, BigQuery, or Databricks.
- Transformation: dbt (Data Build Tool).
- Orchestration: Airflow, Dagster, or Prefect.

Benefits
- Fully remote environment: work from anywhere while maintaining meaningful collaboration with a distributed team.
- Comprehensive health benefits: medical, dental, and vision coverage to support you and your family.
- Paid time off: 2 weeks of PTO to rest, recharge, and take the time you need.
- Flexible floating holiday: one additional day each year to celebrate what matters most to you.
- Paid sick leave: 5 sick days so you can prioritize your health when needed.
- 11 paid company holidays throughout the year.
- 401(k) retirement plan to help you invest in your future.
- Healthcare and Dependent Care FSA options for additional tax-advantaged savings.
Description

About the Role
At NinjaOne, we're looking for a skilled Senior Data Engineer to join our team and help drive the future of our data infrastructure. You'll play a critical role in building, maintaining, and scaling our systems to ensure smooth data flow, accuracy, and security across the organization. This is an exciting opportunity to work on innovative projects, collaborate with cross-functional teams, and help shape how we leverage data to fuel growth, optimize products, and drive business decisions.

Location
We are flexible on remote working from home, if you are located in the USA and reside in one of the following states: CA, CO, CT, FL, GA, *IL, KS, MA, MD, ME, NJ, NC, NY, OR, TN, TX, VA, and WA. We have physical offices in Austin, TX and Tampa, FL, if you prefer a hybrid option.

We hire the best software engineers, but experience in our stack can't hurt: NinjaOne is built on Java, Kotlin, C++, Golang and Postgres; supporting millions of user endpoints and running as a scalable cloud service in AWS. Knowing large-scale datastore bottlenecks, asynchronous application design and client-server architecture will help you.

What You'll be Doing
- Data Pipeline Development: Design and implement scalable data pipelines that move and transform large volumes of data from multiple sources to central data warehouses, transforming data to enable business reporting and advanced analytics.
- Database Management: Manage and optimize the performance of relational databases, ensuring data availability, reliability, and consistency.
- Automation & Optimization: Automate and optimize data workflows to reduce manual processes and improve efficiency in data collection, storage, and processing.
- Monitoring & Maintenance: Ensure the integrity and security of data across systems, monitor performance, and troubleshoot any issues that arise within the data pipeline.
- Data Visualization: Build dashboards and reports in Tableau and Databricks to expose key data points and trends to business stakeholders.
- Collaboration: Work closely with data scientists, analysts, and other teams to gather requirements, understand data needs, and provide solutions that support data-driven decision-making.
- Other duties as needed.

About You
- Bachelor's degree in Computer Science, Computer Engineering, Information Technology or equivalent work experience preferred.
- 10+ years of experience in software development, with a strong focus on data engineering and data science.
- Experience in building data pipelines and managing large-scale data systems using technologies like SQL and Python.
- Expertise in Python.
- Experience in cloud platforms like AWS, GCP, or Azure, and experience with tools like Airflow, Kafka or dbt for orchestrating data workflows.
- Mastery with both relational databases including MySQL, PostgreSQL and NoSQL databases like MongoDB, Cassandra.
- Experience with data warehousing concepts and tools such as Redshift, BigQuery, Snowflake.
- Solid understanding of Microservices Architecture and DevOps principles.
- Experience that will make you a standout candidate:
  - Previous experience working with large-scale data pipelines and machine learning models.
  - Understanding of Generative AI and Deep Learning frameworks.

About Us
NinjaOne automates the hardest parts of IT to deliver visibility, security, and control over all endpoints for more than 30,000 customers. The NinjaOne automated endpoint management platform is proven to increase productivity, reduce security risk, and lower costs for IT teams and managed service providers. NinjaOne is obsessed with customer success and provides free and unlimited onboarding, training, and support. NinjaOne is #1 on G2 in endpoint management, patch management, remote monitoring and management, and mobile device management.

What You'll Love
We are a collaborative, kind, and curious community.
We honor your flexibility needs with full-time work that is hybrid remote.
We have you covered with our comprehensive benefits package, which includes medical, dental, and vision insurance.
We help you prepare for your financial future with our 401(k) plan.
We prioritize your work-life balance with our unlimited PTO.
We reward your work with opportunity for growth and advancement.

Additional Information
This position is NOT eligible for Visa sponsorship. Due to federal government security requirements associated with our FedRAMP-authorized environment, candidates must be U.S. citizens or lawful permanent residents.

*Due to operational policies, NinjaOne is unable to hire for this role within the city limits of Chicago. We will consider all qualified candidates who reside outside of the city proper or are willing to self-relocate.

Starting pay for the successful applicant depends on a variety of job-related factors, including but not limited to location, market demands, experience, job-related knowledge, and skills. The benefits available for this position include medical, dental, vision, 401(k) plan, life insurance coverage and PTO.

For roles based in California, Colorado, Maryland, New Jersey, or Washington the base salary hiring range for this position is $110,000 to $200,000 per year. For roles based in New York, the base salary hiring range for this position is $110,000 to $200,000 per year.

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, genetic information, marital status, veteran status, or any other status protected by applicable law. We are committed to providing an inclusive and diverse work environment.

#LI-KS2 #LI-Remote #BI-Remote #BI-Hybrid
