Transforming Care Through Technology
Senior Research Data Engineer
Location: Canada
Posted: 2 days ago
Salary: C$159.1K - C$176.7K / year
Seniority: Senior
Job Description
PointClickCare
• Own the gold data layer. Transform messy silver tables into curated, semantically rich, clean, and documented gold datasets suitable for AI model development, including datasets and features that can be reused across AI projects.
• Maintain the data as products and needs evolve. To do this, you will:
• Reverse-engineer data semantics. Talk with product engineers and with clinical and workflow experts to learn how the products are used and how data are created in the field.
• Understand SQL queries, stored procedures, technical data definitions, and other code to learn how the products represent and transform data.
• Learn how data are ingested into the data lake, what silver tables and columns actually represent, and how they behave.
• Capture provenance, semantics, clinical event sequencing, cross-module record linkage, and known quirks.
• Bridge semantics with AI needs. Understand researchers' data needs in order to design and build the gold data product, with evolving documentation, so that it serves as a highly efficient, AI-first foundation for applied AI model R&D.
• Curate datasets across modalities. For AI uses such as generative AI, RAG, predictive modeling, and other techniques, support researchers' needs for chunked and tagged unstructured content with rich metadata, point-in-time-correct features, and clean labels. For classical ML and statistical work, deliver model-ready tables.
• Build pipelines for reuse. Develop silver-to-gold transformations in Databricks/Spark as scheduled, observable workloads. Design them so researchers can iterate on new features and data mixes without rebuilding from scratch.
• Automate quality, filtering, and synthesis. Support research needs for programmatic labeling, weak supervision, near-duplicate detection, boilerplate and noise removal, and LLM-API-driven synthetic data generation where ground truth is scarce.
• Version and hand off. Maintain reproducible dataset snapshots. Define clean lineage and semantic definitions so the downstream team can use and reuse gold datasets in AI R&D.
Job Requirements
- 5+ years building production data systems, with at least 2 of those years supporting ML or AI workloads.
- Track record of learning complex new data domains quickly, through reading source code, interviewing experts, and building durable artifacts others rely on.
- Advanced Python, SQL, and PySpark/Databricks for working with large, messy data.
- Expert SQL specifically: comfortable reading complex stored procedures and reverse-engineering business logic from queries.
- Databricks ecosystem depth: Delta Lake, Unity Catalog, Spark/PySpark tuning, MLflow.
- AI domain literacy: working understanding of embeddings, tokenization, feature engineering, point-in-time correctness, train/validation/test splits, data drift, and the differences between what classical ML and generative models need from data.
- Data wrangling across modalities: transforming unstructured content (text, PDFs, transcripts, logs) and structured tabular data into clean, model-ready forms.
- AI-friendly data formats (Parquet, Hugging Face datasets) and storage layout decisions (partitioning, sharding, caching) that keep researcher workflows responsive in Azure, AWS, or other environments.
- Data quality, filtering, and synthesis pipelines: support for programmatic labeling and weak supervision (e.g., Snorkel or equivalent), near-duplicate detection (MinHash/LSH), content and quality filters, and LLM-API-driven synthetic data generation.
- Pipeline orchestration (e.g., Airflow, Databricks Workflows, Dagster, or Prefect) and dataset versioning, including Unity Catalog and feature-store support.
- Experience handling regulated or sensitive data under controlled access (HIPAA or equivalent). Familiarity with general de-identification concepts.
- Git-based version control and CI/CD for data and code.
- Strong written documentation. Skill in eliciting requirements and tacit knowledge from technical and non-technical experts.
- Bachelor’s degree in computer science, data science, engineering, statistics, or related field. Equivalent practical experience considered.
Benefits
- Benefits starting from Day 1!
- Retirement Plan Matching
- Flexible Paid Time Off
- Wellness Support Programs and Resources
- Parental & Caregiver Leaves
- Fertility & Adoption Support
- Continuous Development Support Program
- Employee Assistance Program
- Allyship and Inclusion Communities
- Employee Recognition … and more!
More Data Engineer Jobs
• Requirements gathering, development, functional testing, and validation of data solutions.
• Design and implementation of data loading and integration processes with ETL tools, especially PowerCenter, IDQ, or DEI.
• Promotion to production and maintenance of processes, ensuring they operate correctly.
• Work with databases and platforms such as BigQuery, Cloudera/Hive, and SQL Server, including technical documentation.
• Design, build, and optimize robust ETLs using PySpark and DBT to process large-scale customer datasets.
• Develop tools and frameworks to streamline data integrations and improve scalability.
• Define the technical vision for DC data architecture and mentor engineers.
• Manage external contractors to ensure the team delivers high-quality, practical solutions.
• Partner with product, engineering, and applied science teams to scope work and deliver data solutions that address real-world challenges in customer data quality and product feature requirements.
• Define and drive the data engineering technical strategy, architecture decisions, and platform roadmap aligned to company objectives.
• Lead and deliver large-scale, complex data initiatives spanning multiple teams and iterations, from ambiguous problem definition through production deployment.
• Design robust, scalable data architectures (batch and streaming) that support Kueski's long-term business needs at scale.
• Demonstrated success shaping and executing an AI-centric data strategy that leverages the latest AI technologies to accelerate value delivery, enable trusted self-service data consumption, and strengthen data quality, governance, and organizational decision-making.
• Identify the limits of existing tools or processes; lead the design and build of new capabilities when current solutions fall short.
• Shape, standardize, and champion data engineering methodologies, best practices, and technical standards for the team and department.
• Develop and own CI/CD pipelines and infrastructure-as-code for reliable, automated data platform operations.
• Drive data quality, observability, and governance programs across the data platform.
• Apply data cleansing techniques to facilitate data consumption and quality across the platform.
• Partner cross-functionally with Data Science, ML, Analytics, Platform, and Product teams to deliver data-driven solutions end-to-end.
• Represent data engineering in cross-organizational initiatives; support and lead efforts outside the core area of responsibility.
• Mentor and guide Data Engineers at all levels; constructively challenge assumptions and elevate team quality through code review, pairing, and coaching.
Senior Snowflake Data Engineer
Encora Digital
Encora, a leader in digital engineering, drives innovation by crafting cutting-edge, cloud-first, data-first, and AI-first solutions that redefine industries. S
Role Description
We at Coforge are hiring Senior Snowflake Data Engineers with the following skill set. We are looking for two Senior Data Engineers (21999) to support our client in scaling their Data Engineering capabilities. The team is currently managing a backlog of ~170 projects, ranging from data infrastructure and database development to integrations and reporting initiatives. This is an excellent opportunity to work on high-impact, data-driven projects in a fast-paced, collaborative environment.
- Design scalable and efficient data models and analytical cubes in collaboration with stakeholders.
- Develop and maintain robust data pipelines using SQL and Python.
- Integrate data models and data services through APIs.
- Leverage Snowflake to build optimized, high-performance data solutions.
- Collaborate with cross-functional teams to support data infrastructure, reporting, and analytics initiatives.
- Ensure best practices in data architecture, modeling, and pipeline performance.
- Contribute to continuous improvement of data engineering standards and practices.
Qualifications
- 5+ years of experience in Data Engineering or related roles.
- Strong proficiency in SQL and Python.
- Hands-on experience with Snowflake (minimum 1 year).
- Solid experience in data modeling for analytical and reporting use cases.
- Proven experience in API integrations.
- Strong problem-solving skills and attention to detail.
- Ability to work collaboratively in cross-functional teams.
- Good communication skills and a proactive mindset.
Requirements
- Familiarity with AWS and cloud-based data solutions.
- Experience working in environments with large backlogs or multiple concurrent data projects.
- Exposure to modern data architectures and scalable pipeline design.
Company Description
At Coforge, we hire professionals based solely on their skills and qualifications. We are committed to building an inclusive workplace and do not discriminate based on age, disability, religion, gender, sexual orientation, socioeconomic status, or nationality.