Job Closed
This listing is no longer active.
We are a Y-Combinator-backed startup building your AI-powered Recruiter Agent
Senior Data Engineer, Microsoft Fabric
Location
India
Posted
89 days ago
Salary
0
Seniority
Senior
Job Description
Weekday (YC W21)
- Design and implement scalable cloud-native data architectures using Microsoft Fabric and Azure data services.
- Define best practices for data governance, architecture standards, and platform scalability.
- Build robust data models and data warehouse architectures to support analytics and AI workloads.
- Design and develop high-performance ETL and ELT pipelines for large-scale data processing.
- Build and maintain data pipelines using Python and SQL to process and transform complex datasets.
- Ensure reliability, scalability, and performance optimization across data workflows.
- Develop and manage data engineering workflows using Databricks, Spark, and Delta Lake.
- Implement data ingestion frameworks and support large-scale data processing environments.
- Optimize data pipelines for performance, reliability, and cost efficiency.
- Design workflow orchestration using tools such as Airflow or Azure-native orchestration services.
- Automate data processing pipelines and maintain operational reliability across systems.
- Collaborate with machine learning teams to support LLM, NLP, and AI-driven data workflows.
- Enable feature engineering and data pipelines that support advanced analytics and AI models.
- Establish best practices for data architecture, pipeline management, documentation, and security.
- Ensure compliance with enterprise data governance and quality standards.
Job Requirements
- 4+ years of experience in data engineering, data architecture, or ETL development.
- Hands-on experience with Microsoft Fabric data engineering capabilities.
- Strong expertise in ETL/ELT development and data pipeline design.
- Experience working with Databricks, Apache Spark, and Delta Lake.
- Strong programming skills in Python and SQL.
- Experience building scalable data platforms on Azure cloud environments.
- Knowledge of data warehousing, data modeling, and large-scale data processing.
- Familiarity with LLM/NLP workflows or AI-driven data pipelines is an advantage.
- Bachelor’s degree in Computer Science, Information Technology, or related field preferred.
More Data Engineer Jobs
ABOUT KLED
Kled is building the largest opt-in human data network in the world. We are not a labeling firm. We are not a task marketplace. We are a consumer application where people upload their real photos, videos, and documents and get paid continuously. We then filter, standardize, and license that data to frontier AI labs and enterprises that need fresh, rights-aware training data.
Since launching our mobile app in 2026, we have:
- Reached #1 on the App Store (Finance) with 0 paid marketing
- Scaled to 200,000+ active data contributors
- Processed 1.5–3M uploads per day
- Raised $5M+ from investors behind SpaceX, Airbnb, Coinbase, xAI, OpenAI, Anthropic, Spotify, Lyft, Uber, and more
Our mission is to let anyone download the app and earn a real living wage from uploading their data.
ABOUT THE ROLE
Database & Infrastructure Engineer (Full-Stack Systems)
We process millions of files per day and store hundreds of millions of media records. Your job is to make our data layer world-class.
You will:
- Optimize and scale our PostgreSQL (Supabase) infrastructure
- Design indexing, partitioning, and query strategies for large-scale media datasets
- Improve performance across ingestion, enrichment, and retrieval pipelines
- Build internal tools for querying and auditing large datasets
- Create customer-ready dataset sample packs
- Design and automate dataset exports and delivery pipelines (S3, secure transfers, custom formats)
- Work across backend, ML, and product teams to support new features
This is not just DBA work. You’ll help design the systems that move and package the data powering frontier AI labs.
WE’RE LOOKING FOR
- Strong PostgreSQL expertise (indexing, partitioning, performance tuning)
- Experience working with large datasets (100M+ records preferred)
- Deep understanding of storage systems (S3 or similar object storage)
- Strong backend experience (TypeScript, Python, or similar)
- Comfort building internal tooling and automation scripts
- Ability to move between database, backend, and infrastructure work
Bonus:
- Experience with data pipelines (ETL, transformation layers)
- Experience with vector databases (pgvector, FAISS, Pinecone)
- Experience delivering structured datasets to enterprise customers
- DevOps experience (CI/CD, infra automation)
- Experience working with media-heavy systems
CURRENT STACK
Backend
- PostgreSQL (Supabase): 188M+ media files
- S3 storage
- Deno / TypeScript edge functions
- Python ML pipelines
Frontend
- SwiftUI (migrating to Flutter)
COMPENSATION
- Base salary: $150,000 - $250,000
- $150,000 – $350,000 equity
- Benefits
- Relocation support
- SF HQ (SOMA) or remote
We move fast and work hard (9–9 culture). If you're excited to build the world’s largest consumer app, let’s talk!
GROWTH OPPORTUNITY
You’ll join a team operating at the frontier of applied AI data infrastructure. We move fast and work 7 days a week. In this role, you’ll have the opportunity to:
- Own core systems that power one of the largest human data networks in the world
- Design infrastructure that directly influences what data trains next-generation AI models
- Build at real scale: millions of uploads per day, adversarial environments, global contributors
- Ship alongside a team that has built marketplaces, AI systems, and products used by millions
If you’re excited to move fast, build systems that matter, and help define how human data powers frontier AI, let’s talk.
About tvScientific
tvScientific is the first and only CTV advertising platform purpose-built for performance marketers. We leverage massive data and cutting-edge science to automate and optimize TV advertising to drive business outcomes. Our solution combines media buying, optimization, measurement, and attribution in one efficient platform. Our platform is built by industry leaders with a long history in programmatic advertising, digital media, and ad verification who have now purpose-built a CTV performance platform advertisers can trust to grow their business.
We are seeking a Staff Data Engineer to lead the design, implementation, and evolution of our identity services and data governance platform. This role is critical to ensuring trusted, privacy-safe, and well-governed data across the organization. You will work at the intersection of data engineering, identity resolution, privacy, and platform reliability. This is an individual contributor role, where you will work to define and implement a strategic vision for data engineering within the organization.
What you'll do:
- Identity Services:
  - Design and maintain a scalable identity resolution platform
  - Build pipelines and services to ingest, normalize, link, and version identity data across multiple sources
  - Ensure deterministic and probabilistic matching logic that is transparent, auditable, and measurable
  - Partner with product and analytics teams to expose identity data through reliable, well-documented APIs and datasets
  - Build and operate batch and streaming pipelines using modern data stack tools
  - Create clear documentation, standards, and runbooks for identity and governance systems
- Data Governance & Trust:
  - Own data governance foundations including data lineage, quality checks, schema enforcement, and access controls
  - Implement privacy-by-design principles (PII handling, consent enforcement, retention policies)
  - Collaborate with legal, privacy, and security teams to operationalize regulatory requirements (e.g., GDPR, CCPA)
  - Establish monitoring and alerting for data quality, freshness, and integrity
What we're looking for:
- Data engineering experience with a proven track record of building data infrastructure using Spark with Scala for at least 5 years
- Experience in delivering significant technical initiatives and building reliable, large scale services
- Experience in delivering APIs backed by relationship-heavy datasets
- Experience implementing data governance practices, including data quality, metadata management, and access controls
- Strong understanding of privacy-by-design principles and handling of sensitive or regulated data
- Familiarity with data lakes, cloud warehouses, and storage formats
- Strong proficiency in AWS services
- Successful design and implementation of scalable and efficient data infrastructure
- High attention to detail in implementation of automated data quality checks
- Effective collaboration with cross-functional teams
- Excellent written and verbal communication skills
- Bachelor's degree in Computer Science or a related field
In-Office Requirement Statement:
- We recognize that the ideal environment for work is situational and may differ across departments. What this looks like day-to-day can vary based on the needs of each organization or role.
Relocation Statement:
- This position is not eligible for relocation assistance. Visit our PinFlex page to learn more about our working model.
At Pinterest we believe the workplace should be equitable, inclusive, and inspiring for every employee. In an effort to provide greater transparency, we are sharing the base salary range for this position. The position is also eligible for equity. Final salary is based on a number of factors including location, travel, relevant prior experience, or particular skills and expertise. Information regarding the culture at Pinterest and benefits available for this position can be found here.
US based applicants only
$155,584–$320,320 USD
About tvScientific
tvScientific is the first and only CTV advertising platform purpose-built for performance marketers. We leverage massive data and cutting-edge science to automate and optimize TV advertising to drive business outcomes. Our solution combines media buying, optimization, measurement, and attribution in one efficient platform. Our platform is built by industry leaders with a long history in programmatic advertising, digital media, and ad verification who have now purpose-built a CTV performance platform advertisers can trust to grow their business.
As a Staff Data Engineer at tvScientific, you will be a key player in implementing the robust data infrastructure to power our data-heavy company. You will collaborate with our cross-functional teams to evolve our core data pipelines, design for efficiency as we scale, and store data in optimal engines and formats. This is an individual contributor role, where you will work to define and implement a strategic vision for data engineering within the organization.
What you'll do:
- Design and implement robust data infrastructure in AWS, using Spark with Scala
- Evolve our core data pipelines to efficiently scale for our massive growth
- Store data in optimal engines and formats, matching your designs to our performance needs and cost factors
- Collaborate with our cross-functional teams to design data solutions that meet business needs
- Design and implement knowledge graphs, exposing their functionality both via Batch Processing and APIs
- Leverage and optimize AWS resources while designing for scale
- Collaborate closely with our Data Science and Product teams
How we'll define success:
- Successful design and implementation of scalable and efficient data infrastructure
- Timely delivery and optimization of data assets and APIs
- High attention to detail in implementation of automated data quality checks
- Effective collaboration with cross-functional teams
What we're looking for:
- Production data engineering experience
- Proficiency in Spark and Scala, with proven experience building data infrastructure in Spark using Scala
- Experience in delivering significant technical initiatives and building reliable, large scale services
- Experience in delivering APIs backed by relationship-heavy datasets
- Familiarity with data lakes, cloud warehouses, and storage formats
- Strong proficiency in AWS services
- Expertise in SQL for data manipulation and extraction
- Excellent written and verbal communication skills
- Bachelor's degree in Computer Science or a related field
Nice-to-haves:
- Experience in adtech
- Experience implementing data governance practices, including data quality, metadata management, and access controls
- Strong understanding of privacy-by-design principles and handling of sensitive or regulated data
- Familiarity with data table formats like Apache Iceberg, Delta
- Previous experience building out a Data Engineering function
- Proven experience working closely with Data Science teams on machine learning pipelines
In-Office Requirement Statement:
- We recognize that the ideal environment for work is situational and may differ across departments. What this looks like day-to-day can vary based on the needs of each organization or role.
Relocation Statement:
- This position is not eligible for relocation assistance. Visit our PinFlex page to learn more about our working model.
At Pinterest we believe the workplace should be equitable, inclusive, and inspiring for every employee. In an effort to provide greater transparency, we are sharing the base salary range for this position. The position is also eligible for equity. Final salary is based on a number of factors including location, travel, relevant prior experience, or particular skills and expertise. Information regarding the culture at Pinterest and benefits available for this position can be found here.
US based applicants only
$155,584–$320,320 USD
About tvScientific
tvScientific is the first and only CTV advertising platform purpose-built for performance marketers. We leverage massive data and cutting-edge science to automate and optimize TV advertising to drive business outcomes. Our solution combines media buying, optimization, measurement, and attribution in one efficient platform. Our platform is built by industry leaders with a long history in programmatic advertising, digital media, and ad verification who have now purpose-built a CTV performance platform advertisers can trust to grow their business.
As a Senior Data Engineer at tvScientific, you will be a key player in implementing the robust data infrastructure to power our data-heavy company. You will collaborate with our cross-functional teams to evolve our core data pipelines, design for efficiency as we scale, and store data in optimal engines and formats. This is an individual contributor role, where you will work to define and implement a strategic vision for data engineering within the organization.
What you'll do:
- Implement robust data infrastructure in AWS, using Spark with Scala
- Evolve our core data pipelines to efficiently scale for our massive growth
- Store data in optimal engines and formats
- Collaborate with our cross-functional teams to design data solutions that meet business needs
- Build out fault-tolerant batch and streaming pipelines
- Leverage and optimize AWS resources while designing for scale
- Collaborate closely with our Data Science and Product teams
How we'll define success:
- Successful implementation of scalable and efficient data infrastructure
- Timely delivery and optimization of data assets and APIs
- High attention to detail in implementation of automated data quality checks
- Effective collaboration with cross-functional teams
What we're looking for:
- Production data engineering experience
- Proficiency in Spark and Scala, with proven experience building data infrastructure in Spark using Scala
- Familiarity with data lakes, cloud warehouses, and storage formats
- Strong proficiency in AWS services
- Expertise in SQL for data manipulation and extraction
- Excellent written and verbal communication skills
- Bachelor's degree in Computer Science or a related field
Nice-to-haves:
- Experience in adtech
- Experience implementing data governance practices, including data quality, metadata management, and access controls
- Strong understanding of privacy-by-design principles and handling of sensitive or regulated data
- Familiarity with data table formats like Apache Iceberg, Delta
In-Office Requirement Statement:
- We recognize that the ideal environment for work is situational and may differ across departments. What this looks like day-to-day can vary based on the needs of each organization or role.
Relocation Statement:
- This position is not eligible for relocation assistance. Visit our PinFlex page to learn more about our working model.
At Pinterest we believe the workplace should be equitable, inclusive, and inspiring for every employee. In an effort to provide greater transparency, we are sharing the base salary range for this position. The position is also eligible for equity. Final salary is based on a number of factors including location, travel, relevant prior experience, or particular skills and expertise. Information regarding the culture at Pinterest and benefits available for this position can be found here.
US based applicants only
$123,696–$254,667 USD

