Vulcury invests in early-stage startups and advises companies of all sizes on strategy, growth, and efficiency.
Data Engineer – Pipelines, Structured Markup
Location
India
Posted
123 days ago
Salary
0
Seniority
Senior
Job Description
Vulcury
- Design and maintain ingestion pipelines (Python-based ETL/ELT)
- Design structured transformation workflows using dbt, SQLMesh, or equivalent
- Convert unstructured transcripts and documents into normalized database records
- Maintain PostgreSQL architecture (structured tables, JSONB, indexing strategy)
- Develop attribute extraction frameworks for technical, commercial, and risk signals
- Ensure data quality, consistency, and lineage from raw interaction to structured output
- Collaborate with AI/ML engineers to ensure clean model inputs
Job Requirements
- 4-5 years of experience
- Strong Python (data pipelines, orchestration)
- Advanced SQL (PostgreSQL preferred)
- Experience with ETL/ELT frameworks (dbt, Airflow, SQLMesh, etc.)
- Experience handling semi-structured data (JSON, transcripts, document parsing)
- Strong schema design and normalization skills
- Familiarity with cloud storage systems (S3 or equivalent)
Benefits
- Competitive salary
- Health insurance
- Paid time off
- Flexible work arrangements
- Professional development opportunities
More Data Engineer Jobs
Data Engineer
Koala Health
Koala Health works to provide people with all of the medications and health products their pets need. The company packages medication and health products by date and time to help m
- Own and evolve Koala Health’s end-to-end data infrastructure, including ingestion, transformation, modeling, and delivery.
- Design and maintain reliable data pipelines from production systems (e.g., application databases, third-party tools, vendors).
- Build and manage data models that support analytics, reporting, and operational use cases.
- Establish and enforce best practices for data quality, testing, monitoring, and documentation.
- Partner with stakeholders across product, operations, finance, and marketing to understand data needs and translate them into scalable solutions.
- Improve the reliability, performance, and cost-efficiency of the data stack as the business grows.
- Own incident response and debugging for data issues, proactively identifying and resolving root causes.
- Create and maintain clear documentation so data assets are understandable and usable across the company.
- Evaluate and implement tooling improvements where they meaningfully improve developer velocity or data quality.
- Act as a thought partner to leadership on how data can better support decision-making and operational efficiency.
Role Description
ROIT is a disruptive and innovative company. We develop and bring to market disruptive technology solutions built on human involvement as the exception. We offer intelligent transformation for companies that want to evolve and achieve remarkable results with Robotization, Artificial Intelligence, and Analytics. Come be part of this transformation!
Responsibilities
- Implement monitoring systems and routines (data, applications, queries, etc.)
- Evolve data models, architecture, and data pipeline construction to meet new engineering and business requirements
- Implement data migration, processing, and storage routines (ETL)
- Develop integrations between different data sources (RDS, external APIs, etc.) in order to centralize them in a data lake
- Implement and run load tests
- Monitor running data pipelines
Qualifications
- Experience with Python
- Experience with GCP
- Experience with continuous integration and cloud deployment
- Experience with data lakes, data warehouses, and data marts
- Knowledge of SQL and NoSQL databases
- Knowledge of cloud platform resources
- Knowledge of event-driven architecture
Requirements
- Experience in the following is a plus:
- Streaming projects
- Airflow
- Distributed data processing (Spark or similar)
Data Engineer
Enode
APIs for connecting to EVs, thermostats and other energy hardware. Building the technology behind a green energy system.
- Deliver Flex-critical data needs: build and maintain reliable pipelines and datasets that enable Flex models (e.g., demand/availability signals, aggregations, monitoring).
- Evolve the data platform: assess what we have today and drive pragmatic improvements in architecture, tooling, and operating practices.
- Own data quality and trust: implement testing, lineage/definitions, and guardrails (e.g., dbt tests, anomaly detection, freshness checks) so stakeholders can trust outputs.
- Enable self-serve analytics: produce well-modeled datasets and documentation that make it easy for others to answer questions without bespoke work.
- Partner on data science work: collaborate on data readiness for modelling, feature pipelines, evaluation workflows, and productionization concerns (even if you’re not the primary model builder).
- Make high-leverage tech choices: propose and justify changes (or non-changes) to tools and processes, prioritizing impact and delivery over long platform rewrites.
- Design, build, and maintain scalable data pipelines using Microsoft Fabric and Apache Airflow
- Ingest, transform, and integrate data from a variety of sources, including relational systems, APIs, and MongoDB
- Implement and manage data solutions aligned to Medallion architecture principles (Bronze, Silver, Gold)
- Design and maintain analytical data models, including fact and dimension tables, to support reporting and analytics
- Optimize data storage, performance, and reliability across lakehouse and warehouse environments
- Ensure data quality, observability, and lineage through validation, monitoring, and documentation
- Collaborate with data analysts and BI developers to enable performant, well-modeled datasets for Power BI
- Partner with clinical, operational, and technical stakeholders to understand data requirements and constraints
- Support data governance, security, and compliance efforts, including HIPAA-related controls
- Mentor junior data engineers and contribute to engineering standards and best practices




