Job Closed
This listing is no longer active.
A Partner That Brings Enterprise Cloud Transformation Full Circle
Data Engineer
Location: Virginia
Posted: 6 days ago
Salary: $111K - $204K / year
Seniority: Senior
Job Description
Data Engineer
AIS (Applied Information Sciences)
- Design, build, and maintain scalable batch and near-real-time data pipelines using cloud-native services
- Develop and optimize data ingestion, transformation, and orchestration workflows across diverse data sources
- Build and maintain ELT/ETL frameworks to support analytics, reporting, and data science use cases
- Prepare, transform, and curate raw data into analytics-ready datasets for both technical and non-technical stakeholders
- Develop, deploy, and operate data products within Azure-based analytics platforms (e.g., Databricks, Synapse, Fabric)
- Implement data quality checks, monitoring, and observability to ensure data accuracy, reliability, and integrity
- Apply data governance, security, and privacy controls aligned with enterprise and regulatory standards
- Monitor data platform performance and proactively implement cost and performance optimizations
- Partner with data scientists, analysts, and analytics engineers to ensure trusted and timely access to data
- Design data solutions that are scalable, reusable, automated, and well-governed by default
Job Requirements
- Bachelor’s degree (or equivalent experience) in Computer Science, Information Systems, Engineering, Mathematics, Statistics, or a related field
- Experience working within a modern cloud data platform, with Microsoft Azure strongly preferred
- Hands-on experience with Apache Spark or other distributed data processing frameworks
- Strong SQL skills and experience with relational data modeling and query optimization
- Proficiency in Python, with experience building data pipelines or transformations (PySpark experience a plus)
- Experience with data orchestration and workflow tools (e.g., Azure Data Factory, Airflow, or similar)
- Solid understanding of data modeling, schema design, and analytical data structures
- Familiarity with data governance, security, and quality concepts in enterprise environments
- Strong problem-solving, communication, and collaboration skills
- Ability to work independently while contributing effectively within cross-functional teams
Benefits
- Competitive Salaries
- Qualified Overtime
- Paid Time Off (PTO)
- Flexible Holiday Leave (88 hours per year)
- Parental Leave
- Immediate Healthcare: Medical, Dental, Vision, and Life Insurance
- Employee Stock Ownership Plan (ESOP)
- 401(k) Retirement Plan (5% match on base compensation, immediate 100% vesting)
- Tuition Reimbursement & Learning Allowance
- Referral Bonus Program (up to $5k)
More Data Engineer Jobs
Software Data Engineer, Data Platform
Augury
Founded in 2012, Augury is a computer software and technology company that connects smartphones with ultrasonic sensors and vibrations to detect machine malfunctions before they oc
Our mission is to transform how people and machines work together to push the boundaries of human productivity. A leader in Industrial AI, Augury helps the world’s manufacturers leverage real-time production insights to drive new levels of efficiency. Combining predictive and prescriptive AI technology with industry expertise, production teams can proactively address alerts, minimize downtime, reduce asset costs, and maximize yield and capacity. Our customers achieve payback in six months or less, enabling global scale. We're looking for team members excited to partner with the world's manufacturers and build the future of production together.
You are a Software Data Engineer with deep experience building data-intensive systems, not a traditional ETL or BI-focused Data Engineer. In this role, you will design and build production-grade data services, platforms, and pipelines that power DIH and our AI-driven products. You will combine strong software engineering fundamentals with modern data engineering practices, with a focus on clean architecture, reliability, scalability, observability, and testing.
As a Software Data Engineer, Data Platform, you will:
- Build and evolve Python-based services and pipelines that ingest raw industrial events, store them reliably, and expose clean, well-modeled tables and APIs for downstream consumers, including Digital Twin, Smart Canvas, AI agents, and analytics.
- Design systems that handle duplicates, invalid data, late-arriving events, and reprocessing in a principled, incremental, and reproducible manner.
- Collaborate with platform, machine learning, and product teams across Israel and globally to transform complex data challenges into robust, observable, and scalable software solutions.
A Day in Your Life
Production Data Systems & Pipelines
- Design and implement end-to-end data flows, from raw event ingestion into durable storage to modeled datasets and aggregates that power products, Digital Twin capabilities, analytics, and AI agents.
- Build idempotent pipelines that can safely re-run without corrupting data, using deterministic keys and clearly defined contracts between raw, curated, and modeled datasets.
- Implement incremental aggregations (e.g., machine signal summaries, production metrics, and operational KPIs) that correctly account for late-arriving data, watermarking strategies, and reproducibility requirements.
- Model relationships and context across machines, lines, factories, sensors, work orders, and operational events to support context-aware applications, knowledge graphs, and AI agents.
- Partner with platform teams to define how datasets are stored within our lakehouse, Digital Twin, and context graph architectures and exposed through well-defined APIs and tools.
Software Engineering & Data Quality
- Write clean, maintainable Python services with clear separation of concerns across ingestion, validation, transformation, persistence, aggregation, and orchestration layers.
- Apply strong data modeling and SQL fundamentals, including schema design, indexing strategies, event-time semantics, and scalable aggregation patterns.
- Drive testing discipline across the data platform, including unit tests, data-quality tests, integration tests, and validation frameworks.
- Design for observability through metrics, logging, tracing, and monitoring that simplify debugging, improve data quality visibility, and support production operations.
- Troubleshoot and resolve production data issues, including incorrect aggregations, missing data, duplicate records, schema evolution challenges, and backfill operations.
Streaming, Lakehouse & Scalability
- Build and evolve systems that scale from local development environments to cloud-scale lakehouse architectures using technologies such as Databricks, Delta Lake, and Spark.
- Design and implement data pipelines following modern lakehouse patterns, including Bronze, Silver, and Gold layers, partitioning strategies, and cost-efficient compute utilization.
- Work with streaming and messaging platforms (Kafka, Pub/Sub, or similar) to build reliable, idempotent consumers, replay capabilities, and reprocessing workflows.
- Contribute to multi-tenant data architectures, data contracts, and governance practices that enable secure and efficient access to customer data at scale.
Collaboration & AI-Native Experiences
- Work closely with DIH, Smart Canvas, and AI teams to define how agents interact with structured data, context graphs, APIs, and tools in deterministic and reliable ways.
- Translate product requirements and user needs into technical designs that balance correctness, performance, latency, cost, and long-term maintainability.
- Participate in architecture reviews, design discussions, code reviews, and collaborative development practices that raise the overall engineering bar across the organization.
- Help shape the future of AI-native experiences by building the data foundations that power intelligent applications and agentic workflows.
What You Bring
- Bachelor's degree in Computer Science, Software Engineering, Data Engineering, Information Systems, or a related engineering discipline, or equivalent practical experience.
- 5+ years of professional software engineering experience, including substantial experience building backend systems, distributed systems, or data-intensive applications in production environments.
- Strong Python engineering skills, including modular architecture, dependency management, testing practices, observability, and production-grade code quality.
- Strong SQL and data modeling expertise, including schema design, indexing strategies, event-driven data models, and scalable analytical aggregations.
- Hands-on experience building incremental and idempotent data pipelines that handle duplicate, invalid, and late-arriving events without impacting downstream consumers.
- Experience with at least one major cloud platform (Azure, GCP, or AWS) and modern lakehouse technologies such as Databricks, Delta Lake, Spark, or equivalent architectures.
- Experience with streaming or messaging technologies such as Kafka, Pub/Sub, Event Hubs, or similar event-driven systems.
- Proven ability to diagnose and resolve production data issues, including data quality problems, schema evolution, backfills, replay scenarios, and performance bottlenecks.
- Strong written and verbal communication skills in English and experience collaborating effectively with globally distributed teams.
Nice to Have
- Experience building industrial, IoT, manufacturing, or operational data platforms.
- Familiarity with Digital Twin architectures and industrial data models.
- Experience with graph databases, context graphs, knowledge graphs, or relationship-centric data modeling.
- Exposure to AI/LLM-powered applications, including retrieval-augmented generation (RAG), agents, tool calling, or evaluation frameworks.
- Experience working with Databricks or similar lakehouse platforms from both application and platform perspectives.
- Experience building data products that directly support AI agents, intelligent applications, or machine learning workflows.
Perks
- Stock options
- Paid parental leave
- Flex PTO
Augury is a people-first organization. We believe in fostering an inclusive environment in which employees feel encouraged to share their unique perspectives, leverage their strengths, and act authentically. We know that diverse teams are strong teams, and we welcome those from all backgrounds and varying experiences.
We are committed to providing employees with a work environment free of discrimination and harassment. We believe that diversity is more than just good intentions, and we are committed to creating an inclusive environment for all employees. Augury is a proud equal opportunity employer. We strive to create a work environment in which everyone, including all applicants, employees, customers, guests, and vendors, feels safe and comfortable. We commit to maintaining a workplace that is free of any type of harassment and that does not tolerate anyone intimidating, humiliating, or hurting others. We prohibit willful discrimination based on age, gender, ethnicity, race, color, religion, political opinions, sexual orientation, sexual identity or expression, military or veteran status, disability or any other characteristic protected by law.
AI & Data Engineer
Enroute
We deliver IT services and solutions through a team of passionate, highly skilled problem solvers.
Role Description
We are seeking a data-driven AI Engineer to join our team at a high-growth advertising technology company. This role focuses on scaling our reporting infrastructure for advertising performance and billing reconciliation, ensuring that financial and operational data is accurate, automated, and actionable.
- Develop robust data pipelines, ensuring data quality and reliability.
- Enable efficient data consumption across the organization.
- Collaborate closely with cross-functional teams including Product, Engineering, Analytics, and Business stakeholders to deliver high-impact data platforms.
The ideal candidate is a proactive problem-solver with strong technical expertise, capable of working with large datasets, modern data architectures, and cloud-based environments. You thrive in fast-paced settings, navigate ambiguity with confidence, and are passionate about turning data into actionable value.
Qualifications
- Databricks & AI Architecture (Must-Have)
  - Strong experience working with Databricks Lakehouse architecture.
  - Nice to have: expertise in Databricks Mosaic AI and Unity Catalog for governing AI assets.
  - Hands-on experience building RAG (Retrieval-Augmented Generation) pipelines using Vector Search.
- SQL & Data Modeling (Must-Have)
  - Advanced SQL development.
- AI Engineering & Data Workflows (Must-Have)
  - Experience integrating LLM APIs (OpenAI, Anthropic, etc.) into data workflows.
  - Hands-on experience using AI for:
    - Data enrichment
    - Anomaly detection
    - Automated classification
  - Experience with LangChain, LlamaIndex, or similar frameworks.
  - Exposure to Model Context Protocol (MCP) or similar approaches to connect AI models with external tools and data sources.
  - Strong understanding of Tool Calling / Function Calling: enabling LLMs to interact with SQL databases and external APIs securely.
  - Experience in Prompt Engineering and Guardrailing: designing system prompts that maintain context and hierarchy (e.g., understanding team associations).
- Platform & Engineering Practices (Nice-to-Have / Medium)
  - Experience with GitHub workflows.
  - Familiarity with CI/CD pipelines (Jenkins or similar).
  - Experience working with YAML/YML configuration files.
Requirements
- Architect AI Agents: Build and deploy agents that can perform NLP-based data generation, automated data enrichment, and complex data reasoning within Databricks.
- Natural Language Interfaces: Develop "Chat with your Data" features, allowing stakeholders to query the data warehouse using natural language.
- Integrate LLMs into data workflows for automation and intelligence.
- Develop scalable data models to support analytics and AI use cases.
- Implement AI-driven enhancements such as anomaly detection and data enrichment.
- Collaborate with data, analytics, and engineering teams to improve data reliability.
- Optimize performance and scalability of data and AI workflows.
- Support automation through CI/CD practices.
- Ensure data quality, traceability, and maintainability across pipelines.
Benefits
- Monetary compensation
- Year-end Bonus
- IMSS, AFORE, INFONAVIT
- Major Medical Expenses Insurance
- Life Insurance
- Funeral Expenses Coverage
- TDU Membership
- MediAccess
- Health Check-Up Subsidy
- Preferential rates for car insurance
- Vacations
- Official Mexican Holidays
- Life Happens Days
- Bereavement Leave
- Civil Marriage Leave
- English Classes
- Certifications
- Educational Agreements (Talisis, U-ERRE, UNID, TecMilenio, Tec de Monterrey, UDEM, SPIS)
- Corporate Agreements & Discounts (Sorteos Tec, Envia Flores, TopGolf)
- Taquitos Rewards
- Birthday Bonus
- Work-from-home Bonus
- Laptop Policy
- Design, develop, and maintain scalable data pipelines and ETL/ELT processes across AWS cloud environments.
- Manage and optimize data platforms using services such as AWS Glue, Redshift, Athena, S3, Glue Data Catalog, Lake Formation, and IAM.
- Implement and orchestrate workflows with Apache Airflow for large-scale data processing and analytics.
- Build and optimize data warehouse solutions and business-facing analytical models using dbt, Spark, and SQL.
- Collaborate with cross-functional teams to deliver secure, scalable, and highly available data solutions.
- Support CI/CD pipelines and deployment automation for data platform workflows, especially with CodeBuild, Jenkins, and GitHub Actions.
- Architect and maintain lakehouse storage layers on S3 using Apache Iceberg and/or Delta Lake table formats.
- Manage AWS Glue Data Catalog, crawlers, and Lake Formation permissions for data governance and access control.
Senior Data Product Owner
ProArch
Consulting and technology, enabled by cloud, guided by data, fueled by apps, and secured by design.
Role Description
We are looking for a Senior Data Product Owner to lead the delivery and strategic development of data-driven products and platforms. This role will bridge the gap between business needs and technical implementation by owning the product lifecycle, translating business requirements into data solutions, and ensuring the integrity of data models and analytics pipelines, especially within the Azure ecosystem.
- Define and prioritize the product backlog for data-driven initiatives, focusing on measurable business outcomes.
- Act as a key liaison between business stakeholders, data engineers, data scientists, and analytics teams.
- Lead data product development using Agile methodologies, driving sprint planning, backlog grooming, and stakeholder reviews.
- Analyze complex business requirements and translate them into scalable data models and solutions.
- Drive the design and validation of robust data models, ensuring high data quality and consistency.
- Collaborate with Azure data platform teams to deliver secure, efficient, and scalable cloud-based data solutions.
- Monitor and measure the performance of data products, ensuring alignment with KPIs and business goals.
- Champion data governance, quality, and compliance practices across all stages of the product lifecycle.
Qualifications
- Bachelor’s or Master’s degree in Computer Science, Data Science, Information Systems, or a related field.
- 6+ years of experience in Data/Analytics/BI roles, with 3+ years in a Product Owner or Business Analyst capacity.
- Strong understanding of data modelling (relational, dimensional, and modern NoSQL/data lake concepts).
- Hands-on experience with Azure Data Services (Azure Data Factory, Azure Synapse, Azure SQL, Data Lake, etc.).
- Proven track record of gathering, documenting, and prioritizing business and technical requirements.
- Proficiency in tools like Power BI, SQL, and Azure DevOps/JIRA.
- Strong stakeholder management, analytical thinking, and problem-solving skills.
- Agile/Scrum certification is a plus (e.g., CSPO, SAFe PO/PM).
Company Description
At ProArch, we partner with businesses around the world to turn big ideas into better outcomes through IT services that span cybersecurity, cloud, data, AI, and app development. We’re 400+ team members strong across 3 countries (we call ourselves ProArchians), and here’s what connects us all:
- A love for solving real business problems.
- A belief in doing what’s right.
What’s it like to work here?
- You’ll keep growing. You’ll work alongside domain experts who love to share what they know.
- You’ll be supported, heard, and trusted to make an impact.
- You’ll take on projects that touch industries, communities, and lives.
- You’ll have the time to focus on what matters most in your life outside of work.
At ProArch, you’ll be part of teams that design and deliver technology solutions solving real business challenges for our clients. With services spanning AI, Data, Application Development, Cybersecurity, Cloud & Infrastructure, and Industry Solutions, your work may involve building intelligent applications, securing business-critical systems, or supporting cloud migrations and infrastructure modernization. Every role here contributes to shaping outcomes for global clients and driving meaningful impact.
You’ll collaborate with experts across data, AI, engineering, cloud, cybersecurity, and infrastructure, solving complex problems with creativity, precision, and purpose. You’ll join a culture rooted in technology, curiosity, and continuous learning. A place where we move fast, trust you to make an impact, encourage innovation, and support your growth.




