Job Closed
This listing is no longer active.
A New Approach to Mission Critical
Senior Data Pipeline Engineer
Location: United States
Posted: 179 days ago
Salary: $45K - $90K / year
Seniority: Senior
Job Description
apiphani
• Design, develop, and maintain scalable batch and streaming data pipelines using Apache Spark and cloud-native services (for example AWS Glue, EMR, Kinesis, and Lambda).
• Utilize and optimize Apache Spark (RDDs, DataFrames, Spark SQL) for distributed processing of large datasets, including both batch and near real‑time use cases.
• Implement robust ETL/ELT processes to ingest and transform data from databases, APIs, files, and event streams into curated datasets stored in S3 data lakes, data warehouses (such as Amazon Redshift), and data marts.
• Implement data quality checks, validation rules, and governance controls (including schema enforcement, profiling, and reconciliation) to ensure accuracy, completeness, and consistency.
• Develop and maintain logical and physical data models, schemas, and metadata in catalogs to support analytics, BI, and ML consumption.
• Create and manage data warehouses, data lakes, and data marts on AWS and other cloud platforms (such as Azure or GCP) following modern architectural patterns.
• Collaborate with data analysts, data scientists, and business stakeholders to understand data requirements and translate them into scalable pipeline and modeling solutions.
• Collaborate with DevOps, platform, security, and compliance teams to ensure secure, reliable cloud implementations and adherence to organizational standards.
• Develop cloud and data architecture documentation, including diagrams, guidelines, and best practices, to enable knowledge sharing and reuse.
• Troubleshoot and resolve data pipeline and job issues across development and production environments, ensuring minimal downtime and preserving data integrity.
• Continuously optimize data pipelines for performance, cost, reliability, and data quality using best practices in distributed data engineering and cloud resource tuning.
• Build algorithms and prototypes that combine and reconcile raw information from multiple sources, including resolving data conflicts and inconsistencies.
• Provide technical leadership for the analytics data stack, including reviewing designs, establishing standards for observability and reliability, and guiding junior engineers in delivering high-quality solutions.
• Define and manage data and cloud infrastructure using infrastructure‑as‑code tools such as Terraform (and/or AWS CDK/CloudFormation) to ensure consistent, repeatable environments across development, test, and production.
• Participate actively in agile ceremonies (backlog refinement, sprint planning, daily stand‑ups, reviews), including estimating and updating user stories, tracking progress, and collaborating closely with data product and analytics stakeholders.
Job Requirements
- Bachelor’s degree in Computer Science, Engineering, Mathematics, or related field, or equivalent work experience.
- 6+ years of experience in data engineering or closely related roles, working with large, complex datasets.
- Demonstrated experience owning production-grade data pipelines end to end, from design and implementation through monitoring, incident response, and continuous improvement.
- Extensive hands-on experience with Apache Spark for large-scale data processing, including RDDs, DataFrames, and Spark SQL.
- Familiarity with big data ecosystem components such as HDFS, Hive, and HBase, and their cloud-native equivalents on AWS and other clouds.
- Experience with SQL and NoSQL databases such as MySQL, PostgreSQL, DynamoDB, or similar technologies.
- Strong proficiency in SQL and at least one programming language such as Python (preferred) for data processing, automation, and orchestration glue code.
- Experience with data pipeline orchestration and scheduling tools such as AWS Step Functions, Amazon Managed Workflows for Apache Airflow (MWAA), or Apache Airflow.
- Experience with cloud-based data platforms and services, ideally AWS (S3, Glue, EMR, Redshift, Kinesis, Lambda), with exposure to Azure or GCP as a plus.
- Experience designing and implementing data warehouses and data lakes, including partitioning, file formats, and performance optimization.
- Experience with data quality, automated data testing, and data governance methodologies and tools; familiarity with lineage, cataloging, and access controls.
- Strong analytical and problem-solving skills, high attention to detail, and clear written and verbal communication.
- Ability to work independently and collaboratively in a fast-paced, agile, and cross-functional environment.
- Experience working with a modern data catalog such as Alation, Collibra, or similar tools is a plus.
- Ability to prepare and curate data for prescriptive and predictive modeling (for example, features for ML models) is a plus.
- Hands‑on experience with infrastructure as code, preferably Terraform (and/or AWS CDK/CloudFormation), to provision and manage data and cloud resources.
- Practical experience working in an agile delivery model, including breaking down work into user stories, sizing and updating them during the sprint, and delivering incrementally.
Benefits
- Medical/dental/vision - 100% paid for employees, 50% paid for dependents
- Life and disability - 100% paid for employees
- 401(k) - 3% employer contribution, no employee contribution necessary
- Education and tuition reimbursement - up to $50K annually
- Employee Stock Options Plan
- Accident, critical illness, hospital indemnity benefits offered through our providers
- Employee Assistance Program
- Legal assistance
- Paid Time Off - up to 6 weeks per year
- Sick Leave - up to 2 weeks per year
- Parental Leave - up to 12 weeks
More Data Engineer Jobs
• Design, build, and optimize data pipelines and workflows in Azure and Databricks, including Data Lake and SQL Database integrations.
• Implement scalable ETL/ELT frameworks using Azure Data Factory, Databricks, and Spark.
• Optimize data structures and queries for performance, reliability, and cost efficiency.
• Drive data quality and governance initiatives, including metadata management and validation frameworks.
• Collaborate with cross-functional teams to define and implement data models aligned with business and analytical requirements.
• Maintain clear documentation and enforce engineering best practices for reproducibility and maintainability.
• Ensure adherence to security, compliance, and data privacy standards.
• Mentor junior engineers and contribute to establishing engineering best practices.
• Support CI/CD pipeline development for data workflows using GitLab or Azure DevOps.
• Partner with data consumers to publish curated datasets into reporting tools such as Power BI.
Data Center Operations Technician
RYZ Labs
RYZ Labs is a startup studio built in 2021 by three lifelong entrepreneurs. The founders of RYZ have worked at some of the world's largest tech companies and some of the most iconic consumer brands. They have lived and worked in Argentina for many years and have decades of experience in Latam.
- Passion for the early phases of company creation
- Attracting the brightest talents to build industry-defining companies in a post-pandemic world
- Remote and distributed teams throughout the US and Latam
- Use of cutting-edge technologies in cloud computing
- Aim to provide diverse product solutions for different industries
- Plans to build a large number of startups in the upcoming years
Our Values and What to Expect
- Customer First Mentality - every decision we make should be made through the lens of the customer.
- Bias for Action - urgency is critical, expect that the timeline to get something done is accelerated.
- Ownership - step up if you see an opportunity to help, even if not your core responsibility.
- Humility and Respect - be willing to learn, be vulnerable, and treat everyone who interacts with RYZ with respect.
- Frugality - being frugal and cost-conscious helps us do more with less.
- Deliver Impact - get things done most efficiently.
- Raise our Standards - always be looking to improve our processes, our team, and our expectations. The status quo is not good enough and never should be.
This description is a summary of our understanding of the job description.
Role Description
RYZ Labs is hiring for a Data Center Operations Technician to provide daily technical support, ensure system health, perform routine maintenance, and respond to incidents/outages. Day/night shifts and on-call required.
- Provide Tier 1/2 support for data center systems and services.
- Monitor performance/alerts; triage and remediate issues.
- Perform routine maintenance, patching, firmware updates, backups, and health checks.
- Handle racking/stacking, cabling, and hardware replacements.
- Respond to incidents/outages; escalate and document RCAs.
- Maintain runbooks, SOPs, and accurate tickets.
Qualifications
- 2+ years in data center/IT operations or NOC.
- Familiar with Linux/Windows, virtualization (VMware/KVM), and basic networking.
- Experience with monitoring and ticketing tools; ITIL a plus.
- Strong troubleshooting and communication skills; able to work shifts/on-call and handle equipment.
Company Description
RYZ Labs is a startup studio built in 2021 by two lifelong entrepreneurs. The founders of RYZ have worked at some of the world's largest tech companies and some of the most iconic consumer brands. They have lived and worked in Argentina for many years and have decades of experience in Latam. What brought them together is the passion for the early phases of company creation and the idea of attracting the brightest talents in order to build industry-defining companies in a post-pandemic world.
- Our teams are remote and distributed throughout the US and Latam.
- They use the latest cutting-edge technologies in cloud computing to create applications that are scalable and resilient.
- We aim to provide diverse product solutions for different industries, planning to build a large number of startups in the upcoming years.
- At RYZ, you will find yourself working with autonomy and efficiency, owning every step of your development.
- We provide an environment of opportunities, learning, growth, expansion, and challenging projects.
- You will deepen your experience while sharing and learning from a team of great professionals and specialists.
- Customer First Mentality - every decision we make should be made through the lens of the customer.
- Bias for Action - urgency is critical, expect that the timeline to get something done is accelerated.
- Ownership - step up if you see an opportunity to help, even if not your core responsibility.
- Humility and Respect - be willing to learn, be vulnerable, and treat everyone who interacts with RYZ with respect.
- Frugality - being frugal and cost-conscious helps us do more with less.
- Deliver Impact - get things done in the most efficient way.
- Raise our Standards - always be looking to improve our processes, our team, and our expectations. The status quo is not good enough and never should be.
Senior Principal Data Architect
Lamb Weston
Seeing possibilities in potatoes and making great fries loved the world over. Join our team of potato experts!
• Design and maintain enterprise data architecture standards leveraging Snowflake.
• Collaborate with SAP data teams to ensure seamless data ingestion from SAP and non-SAP sources.
• Develop scalable data modeling frameworks, including medallion architectures for enterprise data products.
• Establish data performance, cost optimization, and security standards across the Snowflake environment.
• Guide integration with Power BI and other analytics platforms for efficient data delivery.
• Mentor engineering teams on Snowflake data patterns, performance tuning, and governance best practices.
Senior Staff Data Engineer
Scratch Financial
Scratch Financial is the world's simplest patient financing solution.
• Designs, builds, and oversees the deployment and operation of technology architecture, solutions, and software to capture, manage, store, and utilize structured and unstructured data.
• Contributes to the overall Data Product roadmap by working closely with our business partners to understand their challenges and develop analytical tools to help drive business decisions.
• Develops technical tools and programming that leverage artificial intelligence, machine learning, and big-data techniques to cleanse, organize, and transform data.
• Leverages prototyping methodologies to propose and design creative business solutions that exploit our broad toolset of technologies.
• Creates and establishes design standards and assurance processes for software, systems, and applications development.
• Reviews internal and external business and product requirements for data operations and activity, and suggests changes and upgrades to systems and storage.
• Designs, develops, and maintains CI/CD pipelines using GitHub Actions to automate deployment, testing, and monitoring of applications.
• Implements and manages serverless solutions.
• Implements infrastructure as code (IaC) practices.
• Works with development teams to set up automated testing frameworks.
• Understands the basics of relational data modeling.



