Spend is the fuel to help your company deliver performance, profitability, and purpose!
AI Engineer, Data Pipeline
Location
India
Posted
22 days ago
Salary
Not specified
Seniority
Senior
Job Description
Coupa Software
- Build data ingestion pipelines to extract and transform enterprise data.
- Implement data cleansing and normalization routines.
- Write and maintain ETL jobs using Spark/PySpark on cloud infrastructure.
- Implement data validation and quality checks at each pipeline stage.
- Build automated data export jobs for model training datasets.
- Support feature extraction from enterprise schemas.
- Monitor pipeline health, troubleshoot failures, and optimize performance.
- Document data lineage, schemas, and transformation logic.
Job Requirements
- 3+ years of software engineering experience.
- Experience with Python and data processing (pandas, PySpark, or equivalent).
- Familiarity with SQL and relational databases (MySQL, PostgreSQL).
- Experience with cloud data services (object storage, managed Spark, managed ETL, or equivalent).
- Understanding of ETL/ELT patterns and data pipeline design.
- Experience with data formats (Parquet, JSON, Avro).
- Strong attention to data quality and testing.
- BS in Computer Science or equivalent experience.
Benefits
- Pioneering Technology: At Coupa, we're at the forefront of innovation, leveraging the latest technology to empower our customers with greater efficiency and visibility in their spend.
- Collaborative Culture: We value collaboration and teamwork, and our culture is driven by transparency, openness, and a shared commitment to excellence.
- Global Impact: Join a company where your work has a global, measurable impact on our clients, the business, and each other.
More Data Engineer Jobs
- Design, develop, implement, and tune large-scale distributed systems and pipelines that process large volumes of data.
- Focus on scalability, low latency, and fault tolerance in every system built.
- Define and maintain the end-to-end application and data architecture.
- Establish standards for system design, integration patterns (APIs, middleware, eventing), and data models.
- Ensure scalability, performance, and long-term maintainability.
- Act as the technical lead across all external development partners.
- Review and approve solution designs, code architecture, and technical approaches.
- Challenge vendors where needed; no rubber stamping.
- Ensure delivery aligns with architectural standards and business outcomes.
- Define and oversee data architecture, governance, and quality standards.
- Manage integration across systems (ERP, CRM, eCommerce, etc.).
- Ensure data is usable, reliable, and decision-ready.
- Partner with leadership to translate business goals into technical roadmaps and system requirements.
- Simplify complex technical concepts for non-technical stakeholders.
- Evaluate and guide decisions on SaaS vs. custom development, build vs. buy vs. integrate.
- Design integration architecture across ERP, marketing systems, and data platforms.
- Establish architecture governance processes.
- Participate in sprint reviews, backlog prioritization, and delivery checkpoints.
- Ensure proper documentation, testing, and deployment standards.
Lead Consultant, Data Engineer
Lovelytics
Lovelytics is a data, AI, and analytics consultancy. Your Data, Our Expertise. Crafting Data Innovation into Reality.
- Use consulting and technical skills to work independently in a client-facing project environment.
- Be responsible for your own execution and, when assigned, lead individual work streams on client engagements under the supervision of the engagement lead.
- Collaborate with other team members to deliver projects successfully.
- Work effectively and communicate directly with internal, client, and partner teams.
- Develop full ownership of your execution on client engagements.
- Design and implement complex ETL/ELT pipelines with evidence of improved data processing times.
- Lead small data warehousing projects with measurable performance enhancements, under the management of an engagement lead.
- Contribute to real-time data processing solutions and manage streaming data.
- Implement security and compliance measures for data pipelines.
- Design and implement version control and branching strategies, and integrate them into CI/CD for promoting and testing in higher environments.
- Hands-on experience working with SAP data at the table level.
- Strong understanding of SAP data structures and relationships, beyond ETL tooling.
- Ability to interpret SAP data in the context of underlying business processes.
- Build & Operate Large-Scale Feature Pipelines: Design and maintain batch/streaming pipelines (Spark, Flink, Databricks, Airflow) producing ML features for ranking models.
- Ensure Point-in-Time Correctness: Develop feature sets that enable unbiased offline training and credible online inference.
- Develop Embedding & Content Pipelines: Build scalable workflows for metadata, imagery, and multimodal representations; partner with Science teams to operationalize new models.
- Architect Data Foundations: Design Delta/Parquet data models and medallion layers, optimizing storage layout and partitioning for latency and cost.
- Real-Time Engineering: Build Kafka-based systems for real-time features and user-activity aggregations, ensuring robust handling of out-of-order events and exactly-once semantics.
- Governance & Leadership: Define data quality rules and schema evolution processes while collaborating across ML pods to translate model needs into infrastructure.




