Job Closed

This listing is no longer active.

Machinify

Machinify focuses on providing machine learning solutions to businesses and was created to help companies integrate artificial intelligence into everyday practi

Senior Data Engineer – Analytics

Location

California

Posted

140 days ago

Salary

0

Seniority

Senior

Bachelor Degree | 4 yrs exp | English | Airflow | AWS | Apache Kafka | Python | Apache Spark | SQL

Job Description

  • Design and implement robust, production-grade pipelines using Python, Spark SQL, and Airflow to process high-volume file-based datasets (CSV, Parquet, JSON).
  • Lead efforts to canonicalize raw healthcare data (837 claims, EHR, partner data, flat files) into internal models.
  • Own the full lifecycle of core pipelines (from file ingestion to validated, queryable datasets), ensuring high reliability and performance.
  • Onboard new customers by integrating their raw data into internal pipelines and canonical models; collaborate with SMEs, Account Managers, and Product to ensure successful implementation and troubleshooting.
  • Build resilient, idempotent transformation logic with data quality checks, validation layers, and observability.
  • Refactor and scale existing pipelines to meet growing data and business needs.
  • Tune Spark jobs and optimize distributed processing performance.
  • Implement schema enforcement and versioning aligned with internal data standards.
  • Collaborate deeply with Data Analysts, Data Scientists, Product Managers, Engineering, Platform, SMEs, and AMs to ensure pipelines meet evolving business needs.
  • Monitor pipeline health, participate in on-call rotations, and proactively debug and resolve production data flow issues.
  • Contribute to the evolution of our data platform, driving toward mature patterns in observability, testing, and automation.
  • Build and enhance streaming pipelines (Kafka, SQS, or similar) where needed to support near-real-time data needs.
  • Help develop and champion internal best practices around pipeline development and data modeling.

Job Requirements

  • 4+ years of experience as a Data Engineer (or equivalent), building production-grade pipelines.
  • Strong expertise in Python, Spark SQL, and Airflow.
  • Experience processing large-scale file-based datasets (CSV, Parquet, JSON, etc) in production environments.
  • Experience mapping and standardizing raw external data into canonical models.
  • Familiarity with AWS (or any cloud), including file storage and distributed compute concepts.
  • Experience onboarding new customers and integrating external customer data with non-standard formats.
  • Ability to work across teams, manage priorities, and own complex data workflows with minimal supervision.
  • Strong written and verbal communication skills — able to explain technical concepts to non-engineering partners.
  • Comfortable designing pipelines from scratch and improving existing pipelines.
  • Experience working with large-scale or messy datasets (healthcare, financial, logs, etc).
  • Experience building or willingness to learn streaming pipelines using tools such as Kafka or SQS.
  • Bonus: Familiarity with healthcare data (837, 835, EHR, UB04, claims normalization).

Benefits

  • Real impact — your pipelines will directly support decision-making and claims payment outcomes from day one.
  • High visibility — partner with ML, Product, Analytics, Platform, Operations, and Customer teams on critical data initiatives.
  • Total ownership — you’ll drive the lifecycle of core datasets powering our platform.
  • Customer-facing impact — you will directly contribute to successful customer onboarding and data integration.

More Data Engineer Jobs

Manager, Ads Data Engineering

Netflix

Described as the world's top internet television network, Netflix is a publicly-traded entertainment company offering video-on-demand and streaming media. As an

Data Engineer | 140 days ago

  • Lead a team of strong engineers building high-scale, highly reliable data processing systems serving the Ads domain
  • Build a strong team vision and roadmap for the team
  • Partner with stakeholders to enable collaboration among teams
  • Provide direct, constructive feedback grounded in empathy
  • Monitor and proactively address productivity and efficiency
  • Hire and grow a diverse, high-performing team

United States
$360K - $920K / year
Job Closed

Full Stack Software Engineer 5 – Data Architecture, Integrations

Netflix

Described as the world's top internet television network, Netflix is a publicly-traded entertainment company offering video-on-demand and streaming media. As an

Data Engineer | 140 days ago

  • Drive the design, development, and governance of data integrations across HR systems
  • Set integration standards and collaborate with engineering partners
  • Build robust, reusable, well-governed data pipelines and APIs
  • Develop full-scale applications when needed
  • Serve as thought leader and trusted advisor for data integration and architecture

United States
$100K - $600K / year
Job Closed

Data Engineering Consultant

Eton Technologies

ERP | Cloud | Analytics | Integrations | IT Support

Data Engineer | 140 days ago
Other | Remote | Team 51-200 | Since 2016 | H1B No Sponsor

  • Lead and deliver modern data platforms and analytics initiatives for global clients.
  • Blend hands-on data engineering, solution architecture, and client advisory responsibilities.
  • Work closely with business and technical stakeholders to design scalable, secure, and high-performance data solutions.

United States

Principal SE - Big Data Platform

Smartsheet

Founded in 2005, Smartsheet offers collaborative work management and process automation to empower greater enterprise productivity. A leading cloud-based platfo

Data Engineer | 140 days ago

This description is a summary of our understanding of the job description. Click on the 'Apply' button to find out more.

Role Description

As a Principal Data & AI/ML Ops Engineer at Smartsheet, you will have the opportunity to work across multiple teams and disciplines, building a versatile skillset while solving the complex challenges of a global platform.

  • Data Architecture and Design: Designing and overseeing the architecture of scalable and reliable data platforms, including data pipelines, storage solutions, and processing systems.
  • Data Modelling and Management: Developing and implementing data models, ensuring data quality, and establishing data governance policies.
  • Data Pipeline Development: Building and optimising data pipelines for ingesting, processing, and transforming large datasets from various sources.
  • Performance Optimisation: Identifying and resolving performance bottlenecks in data pipelines and systems, ensuring efficient data retrieval and processing.
  • Technology Evaluation and Innovation: Staying abreast of emerging data technologies and exploring opportunities for innovation to improve the organisation’s data infrastructure.
  • Troubleshooting and Problem Solving: Diagnosing and resolving complex data-related issues, ensuring the stability and reliability of the data platform.
  • Data Security and Compliance: Implementing data security measures, ensuring compliance with data governance policies, and protecting sensitive data.
  • Perform other duties as assigned.

Qualifications

  • Enterprise SaaS software solutions with high availability and scalability.
  • Solutions handling large-scale structured and unstructured data from varied data sources.
  • Experience in building and maintaining data platform systems such as distributed compute, data orchestration, distributed storage, and streaming infrastructure, ensuring scalability, reliability, efficiency, and security.
  • Working with the Product engineering team to influence designs with data, AI, and analytics use cases in mind.
  • In-depth experience in system design involving petabytes of data with Databricks Lakehouse.
  • Experience in modern AI/data infrastructure patterns: semantic layer, organizing data for AI agents (metadata, context).
  • AI/MLOps workflows on Databricks, MLflow, Mosaic AI Agent Framework, Unity Catalog, Vector Search, Knowledge Graph.
  • Knowledge of AI/ML frameworks like LangChain and LangGraph for AI/ML Ops pipeline integration.
  • Cloud Platforms: Hands-on experience with at least one major cloud provider (AWS, Azure, or GCP). Experience with an AWS-hosted data platform is preferable.
  • Programming languages like Python, SQL, and potentially Java or Scala.
  • Exposure to Snowflake and data pipeline frameworks like Airbyte/Airflow is preferable.
  • Modern software engineering practices like Kubernetes, CI/CD, IaC tools (preferably Terraform), observability, monitoring, and alerting.
  • Solution cost optimisations and design to cost.
  • Driving engineering excellence initiatives.
  • Legally eligible to work in India on an ongoing basis.

Benefits

  • Your ideas are heard, your potential is supported, and your contributions have real impact.
  • You’ll have the freedom to explore, push boundaries, and grow beyond your role.
  • We welcome diverse perspectives and nontraditional paths.

Equal Opportunity Employer

Smartsheet is an Equal Opportunity (EEO) employer committed to fostering an inclusive environment with the best employees. It is our policy to provide equal employment opportunities to all qualified applicants in accordance with applicable laws in the US, UK, Australia, Germany, Costa Rica, Japan, Bulgaria, and India. All qualified applicants will receive consideration without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, protected veteran or disabled status, or genetic information. If there are preparations we can make to help ensure you have a comfortable and positive interview experience, please let us know.

All locations: United States | United Kingdom | Germany | India | Australia | Japan | Bulgaria | Costa Rica
Job Closed