Job Closed

This listing is no longer active.

Kalibri Labs

Discovery | Analytics | Insights

Machine Learning Data Engineer

Data Engineer | Full Time | Remote | Senior | Team 51-200 | Since 2013 | H1B Sponsor

Location

United States

Posted

57 days ago

Salary

$120K - $160K / year

Seniority

Senior

Postgraduate Degree | 3 yrs exp | English | Airflow | Cloud | Docker | ETL | Jenkins | PySpark | Python | SQL

Job Description

  • Design, build, and maintain production data pipelines using Python, Prefect, Airflow, Jenkins, or any other orchestration framework for multi-phase algorithmic workflows.
  • Build and optimize advanced SQL transformations in Snowflake, including window functions, CTEs, stored procedures, UDFs, and semi-structured data processing.
  • Build and maintain dbt models for data transformation, identity resolution, and slowly changing dimension (SCD Type 2) tracking across 80+ models and multiple pipeline stages.
  • Build and maintain feature engineering pipelines that feed ML models including CatBoost gradient boosting, Prophet time-series decomposition, LightGBM regression, and PuLP linear programming solvers.
  • Operationalize ML model outputs by integrating predicted ADRs, occupancy forecasts, and optimization results into downstream production tables and Parquet file outputs.
  • Integrate and reconcile data from multiple heterogeneous sources including hotel property management systems, rate shop providers, mapping APIs, and market forecast data.
  • Work with PySpark for large-scale daily distribution processing, managing partitioning strategies, memory tuning, and efficient Parquet I/O across millions of records.
  • Implement and monitor data quality frameworks such as dbt and Monte Carlo.
  • Manage CI/CD pipelines using Bitbucket Pipelines for automated testing, linting (SQLFluff), and deployment of dbt projects and Python applications.
  • Containerize pipeline components with Docker for consistent execution across development and production environments.
  • Implement robust retry logic, error handling, and fallback strategies across pipeline phases to ensure reliable daily and monthly production runs.

Job Requirements

  • Master's degree or PhD in Computer Science, Data Science, Statistics, Mathematics, or a related quantitative field (or Bachelor's degree with equivalent experience).
  • 3–5 years of professional experience as an ML Engineer, Quantitative Engineer, or Research Scientist.
  • Strong proficiency in Python for data pipeline development, scripting, and automation.
  • Deep experience with SQL and cloud data warehouses, particularly Snowflake (stored procedures, UDFs, semi-structured data, performance tuning).
  • Hands-on experience with workflow orchestration tools such as Prefect, Airflow, or similar (e.g., Dagster, Luigi).
  • Proficiency with dbt (dbt Core or dbt Cloud) for SQL-based data transformation and testing.
  • Experience working with PySpark or similar distributed computing frameworks for large-scale data processing.
  • Strong understanding of data modeling, ETL/ELT patterns, and data warehouse design principles.
  • Proficiency with Git version control and collaborative development workflows (Bitbucket preferred).
  • Demonstrated ability to operationalize ML models — not just train them — including feature pipelines, model serving, and output validation.
  • Excellent cross-functional collaboration skills with proven ability to work alongside data scientists, analysts, and product managers.

Benefits

  • Fully remote work, with a thriving company culture
  • Robust medical, dental, and vision plans through Blue Cross Blue Shield, including a $0 cost plan for employees and subsidized coverage for dependents
  • 401k plan with employer match
  • Flexible Paid Time Off
  • $250 new hire allowance for home office setup

More Data Engineer Jobs

Senior Associate, Pharmacovigilance - Mexico/Brazil - Remote

Worldwide Clinical Trials

Established in 1986, Worldwide Clinical Trials is a privately held company and leading provider of clinical-trial research studies. As an employer, the company

Data Engineer | 57 days ago

Who we are

We’re a global, midsize CRO that pushes boundaries, innovates and invents because the path to a cure for the world’s most persistent diseases is not paved by those who play it safe. It is built by those who take pioneering, creative approaches and implement them with quality and excellence. We are Worldwide Clinical Trials, and we are a global team of over 3,500 experts, bright thinkers, dreamers and doers and, together, we are changing the way the world experiences CROs – in the best possible way. Our mission is to work with passion and purpose every day to improve lives and we are looking for others who value this same pursuit.

Why Worldwide

We believe everyone plays an important role in making a world of difference for patients and their caregivers. From our hands-on, accessible leaders, to our cohesive and supportive teams, we are committed to enabling professionals from all backgrounds and experiences to succeed. We prioritize cultivating a diverse and inclusive environment that continues to promote collaboration and creativity. We are proud to be a workplace where people thrive by being themselves and are inspired to do their best work every day. Join us!

What the Senior Associate, Pharmacovigilance does at Worldwide

Responsible for the collection, processing, evaluation and reporting of incoming Serious Adverse Event (SAE) data according to applicable regulatory guidelines/requirements, Worldwide Standard Operating Procedures (SOPs) and project specific instructions. Independently serves as Lead PV Associate on large sized studies/programs that are moderate to complex in scope of work.

What you will do

  • Author Safety Management Plan for assigned studies
  • Review incoming SAE data for completeness and accuracy
  • Perform data entry in the Safety Database and/or complete applicable tracking of incoming safety information
  • Generate queries for missing or unclear information and follow-up with sites for resolution
  • Perform QC of SAEs processed by other PV Associates
  • Generate regulatory reports and perform safety submissions as needed

What you will bring to the role

  • Excellent understanding of medical and scientific terminology, of the principles of clinical assessment of adverse drug events, of international regulations and of reporting requirements
  • Excellent understanding of computer technology, and management of relational database systems, including extraction of data
  • Positive attitude and ability to interact diplomatically and professionally with internal and external customers in a global environment
  • Excellent organizational skills and ability to handle multiple competing priorities within tight timelines
  • Consistently demonstrates commitment, dependability, cooperation, adaptability and flexibility in executing daily tasks and responsibilities

Your experience

  • Bachelor’s degree in a science-related field, or nursing, or equivalent
  • Minimum of 5 years of pharmacovigilance experience (pre-approval clinical trials)
  • Equivalent combination of relevant education and experience
  • Computer literacy and strong working knowledge of MS Office applications (Excel, PowerPoint, Word)
  • Excellent written and verbal communication skills
  • Excellent organizational skills and attention to detail
  • Demonstrated ability to handle multiple competing priorities while adhering to applicable timelines
  • Ability to work independently, prioritize work effectively and work successfully in matrix team environment
  • Ability and willingness for potential limited travel (domestic and international) as needed (attend Investigator Meeting, project kick-off meeting and/or bid defense meeting)
  • Fluent in written and verbal English

We love knowing that someone is going to have a better life because of the work we do. To view our other roles, check out our careers page at Discover a world of difference at Worldwide! For more information on Worldwide, visit www.Worldwide.com or connect with us on LinkedIn.

Worldwide is an equal opportunity employer that is committed to enabling professionals from all backgrounds and experiences to succeed and, to that end, we prioritize attracting diverse talent and cultivating an inclusive environment that encourages collaboration and creativity. We know that when our employees feel appreciated and included, they can be more creative, innovative, and successful. We’re on a mission to hire the very best and are committed to creating exceptional employee experiences where everyone is respected and has access to equal opportunity. We provide equal employment opportunities to all employees and applicants regardless of race, color, ethnicity, ancestry, religion, national origin, gender, sex, gender identity or expression, sexual orientation, age, citizenship, marital or parental status, disability, military status, or other class protected by applicable law.

All locations: Brazil | Mexico

Senior Data Analytics Engineer-Network Data Solutions

Zayo Group

Data Engineer | 57 days ago
Full Time | Remote | Team 1,001-5,000

Company Description

Zayo provides mission-critical bandwidth to the world’s most impactful companies, fueling the innovations that are transforming our society. Zayo’s 141,000-mile network in North America and Europe includes extensive metro connectivity to thousands of buildings and data centers. Zayo’s communications infrastructure solutions include dark fiber, private data networks, wavelengths, Ethernet, and dedicated Internet access. Zayo serves wireless and wireline carriers, media, tech, content, finance, healthcare and other large enterprises.

About the Role

We’re looking for a Senior Analytics Engineer to join our Network Data Solutions team. In this role, you will combine telecom data expertise with technical acumen and problem-solving skills to design and deliver high-quality data products that enable our network and engineering teams to make informed, data-driven decisions. You’ll work closely with network operations, engineering, and technology partners to understand their needs, document requirements, and translate them into scalable data solutions. You’ll also analyze existing processes and data pipelines to identify improvement opportunities, proposing smarter, faster, and more reliable ways to deliver insights that drive network performance, reliability, and operational efficiency. You will leverage your expertise in data design, data engineering (ETL/ELT), and business intelligence to develop solutions from beginning to end.

Key Responsibilities

  • Develop standards and ways of working for the Analytics Engineering team for Network Data Solutions, with speed to delivery, quality products, scalability, reliability, and observability in mind.
  • Partner with network and technology stakeholders to understand business objectives, challenges, and data requirements across areas such as network performance, reliability, capacity, and customer experience.
  • Develop and maintain deep knowledge of telecom data domains and source systems, including network inventory, provisioning, assurance, fault, and performance data.
  • Lead the creation of BTS (BRD Technical Specification) documentation, translating business logic into clear technical solutions.
  • Design, build, and maintain robust data products that support operational and strategic reporting, from ingestion through transformation, modeling, and visualization.
  • Evaluate existing data processes and models to identify opportunities for standardization, optimization, and automation.
  • Propose and implement innovative solutions that improve data accessibility, quality, and usability for network teams.
  • Develop scalable data models using dbt and Snowflake aligned to modern data architecture principles.
  • Build and orchestrate ETL/ELT pipelines in Azure Data Factory and Microsoft Fabric, ensuring efficiency and reliability.
  • Write and optimize SQL and Python code for advanced data manipulation, automation, and validation.
  • Facilitate UAT (User Acceptance Testing) with business partners to ensure data solutions align with real-world operational needs.
  • Design and deliver dashboards and reports using Power BI, Sigma, and Tableau that translate complex network data into actionable insights.
  • Collaborate with data engineers, analysts, and architects to maintain alignment across the broader data ecosystem.
  • Serve as a subject matter expert (SME) for network-related data and analytics, mentoring others on telecom-specific concepts, sources, and logic.

Qualifications

Required Skills & Experience:

  • Bachelor’s or Master’s degree in Computer Science, Data Science, Information Systems, Telecommunications, or a related field (or equivalent experience).
  • Minimum of seven (7) years of experience in analytics engineering, data engineering, or BI roles, preferably supporting telecom or network operations.
  • Strong proficiency in SQL, Snowflake, dbt, and Python for data transformation and automation.
  • Hands-on experience with Azure Data Factory and Microsoft Fabric for pipeline orchestration and data lakehouse management.
  • Advanced experience building and optimizing Power BI, Sigma and/or Tableau dashboards.
  • Deep understanding of telecom data domains, including:
      - Network inventory and topology systems (e.g., NMS, OSS, GIS)
      - Fault, performance, and capacity management data
      - Provisioning, activation, and service assurance systems
  • Proven ability to analyze complex business problems, identify gaps in existing processes, and propose scalable data solutions.
  • Strong communication and documentation skills, with the ability to bridge business and technical perspectives.

Preferred Skills:

  • Familiarity with CI/CD tools (e.g., GitHub, Azure DevOps).
  • Experience implementing data quality frameworks and metadata management practices.
  • Background in network engineering analytics, service performance monitoring, or customer experience analytics.
  • Knowledge of telecom industry KPIs and metrics (e.g., uptime, MTTR, capacity utilization, throughput, etc.).

Estimated base salary range: $95,100 - $146,300 USD/annually. The base pay range shown is a guideline and reasonable estimate for this role. It takes into account the wide variety of factors that are considered in making compensation decisions. Actual compensation offered may vary from the posted range based upon geographic location, work experience, skill level, certifications, and other business and organizational needs. Non-sales roles may be eligible to participate in a discretionary annual incentive plan. Sales roles may be eligible to participate in a sales incentive plan. Additionally, this position may be eligible for certain benefits, such as health insurance, life insurance, disability retirement plans, paid time off. The posting will be active for a minimum of 3 days. The active posting will continue to extend by 3 days until the position is filled.

Benefits, Rewards & Wellness

  • Excellent Health, Dental & Vision Insurance
  • Retirement 401(k) Savings Plan
  • Generous paid time off policy including paid parental leave

Zayo provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, provincial or local laws. This policy applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation and training.

United States
$95.1K - $146K / year

Data Engineer - AWS

NTT Group

A global IT innovator founded in 1965, NTT DATA specializes in system integration and networking system services for more than a dozen industries. As an employe

Data Engineer | 57 days ago
Full Time | Remote | Team 55,092 | Since 1988

Req ID: 366941

NTT DATA strives to hire exceptional, innovative and passionate individuals who want to grow with us. If you want to be part of an inclusive, adaptable, and forward-thinking organization, apply now. We are currently seeking a Data Engineer - AWS to join our team in Remote, Karnātaka (IN-KA), India (IN).

Job Duties

Role Overview

We are looking for a skilled Data Engineer to design, build, and maintain scalable, reliable data pipelines and platforms that support analytics, reporting, and operational decision-making. The role’s primary focus is enabling an end-to-end data ingestion and processing pipeline: extracting data preferably from Salesforce, landing it in Amazon S3, and transforming/loading it into Amazon Redshift for analytics-ready consumption. The engineer will also work on SQL modernization (including Oracle SQL development and conversion/optimization for Redshift), data quality, governance, monitoring, and performance tuning.

Primary Focus (Must-Have Outcome)

Build and operate robust ETL/ELT pipelines for:
  • Salesforce → Amazon S3 → Amazon Redshift
  • Automated extraction, secure landing, transformation, load, and publishing for reporting/analytics
  • Strong data quality, reconciliation, monitoring, and scheduling built into the pipeline

Key Responsibilities

Salesforce to AWS Data Pipelines (Core)
  • Build and maintain pipelines that extract data from Salesforce (API-based or connector-based), land data in Amazon S3, and load into Amazon Redshift
  • Implement incremental loads / CDC patterns where applicable; manage full loads and historical backfills as needed
  • Establish scheduling and orchestration for daily/near-real-time jobs with reliability and retry mechanisms

SQL Engineering (Oracle + Redshift)
  • Design, develop, and optimize complex SQL in Oracle
  • Analyze and convert Oracle SQL to Redshift-compatible SQL, optimizing for Redshift performance and cost
  • Tune Redshift queries using best practices such as sort keys, distribution styles, and query patterns

ETL/ELT, Data Modeling, and Warehousing
  • Design and maintain ETL/ELT jobs, transformations, and reusable frameworks
  • Build and optimize data models for warehousing/lakehouse patterns (facts/dimensions, curated layers)
  • Support both batch and (where applicable) near-real-time processing patterns

Data Quality, Governance, and Compliance
  • Implement data quality checks (completeness, accuracy, consistency), reconciliation, and validation rules
  • Ensure data integrity, metadata documentation, lineage, and governance practices
  • Apply security and compliance standards (GDPR/regulatory needs where applicable)

Operations, Monitoring, and Reliability
  • Monitor pipelines and infrastructure using AWS monitoring tools; troubleshoot performance and reliability issues
  • Improve pipeline resilience through alerting, logging, retries, and error handling
  • Contribute to modernization and cloud migration initiatives and automation (DataOps/CI-CD where relevant)

Cross-Functional Collaboration
  • Partner with analytics/reporting and business stakeholders to gather requirements and deliver reliable datasets
  • Work effectively with cross-functional teams and provide clear documentation of pipelines and datasets

Technology Stack (Expected Exposure)

Primary (Must-Have)
  • AWS: Amazon S3, Redshift, IAM, CloudWatch
  • Salesforce Integration: Salesforce APIs / connectors (extraction & ingestion patterns)
  • Programming & Querying: Python, SQL
  • Oracle: Complex SQL, stored procedures (as needed), performance tuning
  • Orchestration/Scheduling: AWS Glue, Lambda, Step Functions, cron-based scheduling (or equivalent)

Data Engineering & Platform (Good-to-Have / Nice-to-Have)
  • ETL tools: Informatica, Talend, Azure Data Factory
  • Warehousing: Snowflake, Azure Synapse (plus Redshift as primary)
  • Big data: Spark, Hadoop
  • Streaming & APIs: Kafka, Event Hub, REST APIs
  • DevOps/DataOps: CI/CD for data pipelines, infrastructure-as-code exposure

Minimum Skills Required

Required Skills & Experience
  • Strong hands-on experience building ETL/ELT pipelines in cloud environments
  • Proven experience integrating Salesforce data into a data platform (extraction, S3 landing, transformat

About NTT DATA

NTT DATA is a $30 billion business and technology services leader, serving 75% of the Fortune Global 100. We are committed to accelerating client success and positively impacting society through responsible innovation. We are one of the world's leading AI and digital infrastructure providers, with unmatched capabilities in enterprise-scale AI, cloud, security, connectivity, data centers and application services. Our consulting and industry solutions help organizations and society move confidently and sustainably into the digital future. As a Global Top Employer, we have experts in more than 50 countries. We also offer clients access to a robust ecosystem of innovation centers as well as established and start-up partners. NTT DATA is a part of NTT Group, which invests over $3 billion each year in R&D.

Whenever possible, we hire locally to NTT DATA offices or client sites. This ensures we can provide timely and effective support tailored to each client’s needs. While many positions offer remote or hybrid work options, these arrangements are subject to change based on client requirements. For employees near an NTT DATA office or client site, in-office attendance may be required for meetings or events, depending on business needs. At NTT DATA, we are committed to staying flexible and meeting the evolving needs of both our clients and employees.

NTT DATA recruiters will never ask for payment or banking information and will only use @nttdata.com and @talent.nttdataservices.com email addresses. If you are requested to provide payment or disclose banking information, please submit a contact us form, https://us.nttdata.com/en/contact-us.

NTT DATA endeavors to make https://us.nttdata.com accessible to any and all users. If you would like to contact us regarding the accessibility of our website or need assistance completing the application process, please contact us at https://us.nttdata.com/en/contact-us. This contact information is for accommodation requests only and cannot be used to inquire about the status of applications.

NTT DATA is an equal opportunity employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or protected veteran status. For our EEO Policy Statement, please click here. If you'd like more information on your EEO rights under the law, please click here. For Pay Transparency information, please click here.

India
Job Closed

Data Engineering & Analytics Lead

Khan Academy

Khan Academy delivers an online learning platform with a mission to provide free, world-class educational tools for people everywhere. Salman Khan founded the platform in 2005 as a

Data Engineer | 57 days ago

ABOUT KHAN ACADEMY

Khan Academy is a fast-paced, nonprofit startup on a mission to provide a free, world-class education for anyone, anywhere. We already reach millions of students every month and are growing rapidly. We’re building a library of world-class instructional and practice resources that empowers learners. Whether they’re studying matrices, mitosis, or multivariable calculus, we want to offer students the resources to realize that they can learn anything.

ABOUT KHAN ACADEMY INDIA

Khan Academy India aims to deliver a world-class user experience that is locally relevant to learners in India and is enabled by a strong on-the-ground team and operations. Our learning system is mastery based, which allows students to master key concepts at a pace that is right for them before moving on to more challenging content. From serving under 500,000 learners in 2016, we are now serving almost 4 million learners a month across our websites, apps and YouTube channels. These learners include both independent learners accessing us at home and teacher-directed learners in schools. Our focus is to reach the underserved by making our content accessible in local languages and by working with large public school systems. Khan Academy is available in Hinglish, Hindi, Gujarati, Assamese, Marathi, Punjabi and Kannada.

ABOUT THE ROLE

We are looking for a person who will help us with data management, transformation, analysis and visualization. The primary focus will be on choosing optimal tools to use for these purposes and then maintaining, implementing, and monitoring them. Optimally you have knowledge of at least one dynamic language and SQL, a good business mindset, and data analysis competencies.

In this role you will:

  • Data Integration: Automate ingestion of data from multiple sources (e.g., Google Sheets, APIs, internal systems) into BigQuery
  • Database Management: Design, build, and maintain scalable data pipelines and workflows in BigQuery; ensure data accuracy, consistency, and reliability
  • Exploratory Analysis: Perform descriptive and diagnostic analyses to uncover trends, patterns, and insights.
  • Business Reporting: Build dashboards, reports, and visualizations (e.g., in Looker Studio, Tableau, or similar tools) to support decision-making.
  • KPI Definition & Tracking: Partner with stakeholders to define key business metrics and create systems for daily, weekly and monthly tracking as needed
  • Data Storytelling: Translate complex datasets into clear, actionable insights for both technical and non-technical audiences

ABOUT YOU

You are someone with:

  • A passion for education and a desire to change the world
  • A willingness to roll up your sleeves and help the team get work done as we are growing
  • 4+ years of hands-on experience in the data engineering and analytics field, ideally in an education setting
  • Knowledge of advanced statistical (e.g. multiple regression, hypothesis testing) and machine learning techniques (e.g. clustering, decision tree learning, etc.) for real-world applications
  • Strong SQL foundations & ability to manipulate data using R or Python
  • Prior experience with the end-to-end analytics chain is a nice to have (e.g. data modeling, BI tools, BigQuery)
  • Strong verbal/written communication & data presentation skills, including an ability to effectively communicate with both business and technical teams; experience with BI tools is a plus
  • Ability to work collaboratively with cross-functional teams (with the product, content, marketing, philanthropy, and analytics teams) of staff that span wide time zones (Delhi, India to California, USA) to research and improve our content and products
  • Being aware of good practices when collaborating in version control (Git)

PERKS AND BENEFITS

We may be a non-profit, but we reward our talented team like a for-profit.

  • Competitive salaries and meritocracy-driven, candid culture
  • A fun, high-caliber team that trusts you and gives you the freedom to be brilliant
  • The ability to put your talents towards a deeply meaningful mission and the opportunity to work on high-impact products that are already defining the future of education
  • Remote work friendly, i.e. option to work from home; flexible schedules

LEARN MORE

  • Sal’s TED talk from 2011
  • Sal’s TED talk from 2015
  • Sal’s TED talk from 2023
  • A glimpse of our team: http://www.khanacademy.org/about/the-team
  • A glimpse of our content created: https://www.youtube.com/watch?v=ED8P8vchQJM
  • Our Hinglish content in action: http://bit.ly/khanacademyyoutube

HOW TO APPLY

  • Attach your resume or LinkedIn URL in the space provided below.
  • Complete the pre-work assignment here and submit your assignments below.
  • Please submit a Google Drive link of your assignment
  • Make sure you have enabled view access for anyone with the link.

We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, or veteran status.

India