Senior Data Engineer (AI/ML and AWS Cloud)
Location
United States
Posted
59 days ago
Salary
$140K - $175K / year
Seniority
Senior
Job Description
Pantheon Data
Company Overview
Pantheon Data (a Kenific Holding company) is a private small business based in the Washington, DC, area. Pantheon Data was founded in 2011, initially providing acquisition and supply chain management services to the US Coast Guard. Over the past ten years, our service offerings have grown to include infrastructure resiliency, contact center operations, information technology, software engineering, program management, strategic communications, engineering, and cybersecurity. We have also grown our customer base to include commercial clients. The company has used this experience to expand our service offerings to other agencies within the Department of Homeland Security (DHS), the Department of Defense (DoD), and other Federal Civilian Agencies.
Position Overview
We are seeking a Senior Data Engineer to design, build, and optimize the data foundations for our next-generation Generative AI applications. This role is focused on architecting the Data Enrichment and Vectorization pipelines that power Large Language Models (LLMs). You will be responsible for the end-to-end lifecycle of data, from ingestion in AWS to serving high-context, enriched datasets to AWS Bedrock.
Responsibilities
- LLM Data Pipelines: Design and implement scalable data ingestion and transformation pipelines specifically for RAG (Retrieval-Augmented Generation) architectures.
- AWS Bedrock Integration: Operationalize LLM workflows using AWS Bedrock, managing model invocations and embedding generation.
- Data Enrichment & Quality: Develop advanced Python-based processing jobs to clean and enrich unstructured data with metadata to improve LLM retrieval accuracy.
- Vector Database Management: Architect and maintain vector stores (e.g., OpenSearch Serverless or PostgreSQL with pgvector) to support efficient semantic search.
- Cloud Architecture: Leverage core AWS services (S3, Glue, Lambda, Step Functions) to build resilient, automated data workflows.
- DevSecOps Collaboration: Work with the security team to ensure all data handling meets stringent compliance standards (e.g., FedRAMP/DISA STIGs) through Infrastructure as Code.
Required Skills and Experience
- Python Mastery: Expert-level Python programming with experience in libraries such as Pandas and LLM orchestration frameworks like LangChain or LlamaIndex.
- AWS AI/ML Ecosystem: Hands-on experience with AWS Bedrock and Amazon SageMaker.
- Data Engineering Foundations: Proven track record with AWS Glue (ETL), Athena, and Redshift.
- Certifications: Must hold a recognized Data Science Certification (e.g., AWS Certified Data Engineer, Databricks Certified Data Scientist).
- Database Expertise: Proficiency in both SQL and NoSQL, with specific experience in Vector Databases.
- Ability to work effectively remotely in cross-functional teams.
- Ability to meet deadlines and produce quality work.
- Proficient in Microsoft Suite software including Outlook, Word, Excel, SharePoint, and PowerPoint.
Preferred Skills and Experience
- Bachelor's Degree
Clearance Requirements
U.S. Citizenship with the ability to obtain and maintain a DoD Secret clearance.
Work Location: United States - Remote
- Our company prioritizes the benefits of flexibility and collaboration, whether that happens in person or remotely.
- If the position is remote or hybrid, you may periodically work from a Pantheon Data office location or client site.
- If this position is assigned to a Pantheon Data office location or client site, you'll work with colleagues and clients in person, as needed for specific client requirements.
Compensation
The salary range for this position is $140,000 - $175,000. This is not, however, a guarantee of compensation or salary. Rather, salary will be set based on experience, geographic location and possibly contractual requirements and could fall outside of this range.
Benefits Overview
We are always looking for good people!
Pantheon Data is committed to providing its employees with competitive salaries and benefits in order to increase employee satisfaction and productivity. In addition to our benefits, we also offer SmartBenefits through the Washington Metropolitan Area Transit Authority, where you specify an amount of your pre-tax wages to be paid directly to your SmarTrip account. In some cases, tuition assistance may be available for continuing education expenses and certifications related to your position. Additional details may be found at https://pantheon-data.com/careers/
Pantheon Data Important Information
All qualified applicants will be considered for employment without regard to disability, status as a protected veteran, or any other status protected by applicable federal, state, local, or international law.
As part of the application process, you are expected to be on camera during interviews and assessments. We reserve the right to take your picture to verify your identity and prevent fraud.
If you require reasonable accommodation in completing this application, interviewing, completing any pre-employment testing, or otherwise participating in the employee selection process, please direct your inquiries to our Talent Team at Recruiting@pantheon-data.com or by phone at (571) 363-4020.
This company uses E-Verify to confirm each employee's work authorization. For more information, see the E-Verify Participation Poster.
More Data Engineer Jobs
• Build and maintain data ingestion and processing pipelines from multiple sources.
• Develop extract, transform, and load (ETL/ELT) processes using Python, Apache Airflow, and Azure Data Factory.
• Structure and manage data in a Data Lake, ensuring the organization and governance of information.
• Develop data transformations and processing using Spark / SparkSQL on Databricks.
• Support, monitor, and evolve the client's data platform, ensuring stability and performance.
• Work with different databases and storage technologies, such as SQL Server, Oracle, MongoDB, and Hadoop.
• Collaborate with multidisciplinary teams in a dynamic, data-driven environment.
Senior Data Engineer
CRB Group
Founded in 1984, CRB is a privately held global engineering, architecture, construction, and consulting firm that delivers sustainable and integrated solutions
Company Description
CRB's over 1,100 expert professionals drive innovative, life-changing and life-saving solutions for manufacturers in the life sciences and food and beverage industries. Our mission, vision, and core values put client satisfaction and employee experience at the center of everything we do. As an AEC firm we proudly specialize in industries that inherently carry important social responsibility - we recognize our impact and influence in the communities we serve and pursue corporate responsibility through the lens of people, community, and planet. From oncology and rare disorders to COVID-19 or alternative proteins, our design and construction projects are pioneering solutions addressing important issues such as food scarcity and global health.
Job Description
CRB is looking for an energetic, self-motivated, proactive, organized, and well-rounded individual who has a good understanding of Microsoft’s data, reporting, and business intelligence tools to assist in delivering on our internal data and business intelligence initiatives. The successful candidate must possess the skills required to successfully execute project tasks, have a strong work ethic, and be a dynamic team player. This person will be expected to plan and execute multiple, simultaneous, large or long duration projects. In addition to project work, this individual will work with peers to assist in the identification, planning, and delivery of strategic infrastructure and analytical capabilities that will provide actionable information to the business.
- Elicit data related business requirements from stakeholders and develop or approve associated technical specifications.
- Design, implement, and maintain ETL processes to facilitate warehousing and systems integration needs.
- Develop and enhance data models to deliver value to the organization.
- Maintain, monitor, optimize, and troubleshoot/resolve issues with existing processes.
- Contribute to the development of standard methods and best practices for the transformation and preparation of data for reporting, analytics, and self-service activities.
- Contribute to reporting and visualization projects as needed.
- Apply data team best practices regarding documentation, access, and security.
- Participate in discipline related internal and external project communication and coordination.
- Implement data quality checks and validation processes.
- Incorporate data classification into ETL processes and data masking techniques to ensure model security.
- Elicit and participate in code reviews with other members of the team.
- Ensure the accuracy, completeness, and reliability of data used for analysis.
- Design and implement robust data architectures to support scalable and maintainable data pipelines.
- Research and adopt new technologies and methodologies to keep the data infrastructure modern and efficient.
- Develop strategies for integrating data from disparate sources into a cohesive and comprehensive data ecosystem.
- Automate data processes to improve efficiency and reduce manual intervention.
- Ensure compliance with data protection regulations and the confidentiality of sensitive information.
- Mentor junior data engineers and contribute to their professional growth within the organization.
- Travel as required for project development and execution.
Qualifications
- Bachelor’s Degree or equivalent experience.
- 7+ years of experience working with Microsoft SQL databases and warehousing tools, including SQL Server, Azure SQL, and Synapse SQL.
- 7+ years of experience developing ETL processes using Microsoft technologies, such as SSIS, Azure Data Factory, Azure Synapse Analytics, and Microsoft Fabric.
- 7+ years of experience developing Microsoft Analysis Services Tabular, Power BI, or Fabric semantic data models.
- Experience leveraging medallion architecture design patterns.
- Proficiency using Azure Gen2 Storage, Synapse, Fabric, Spark Notebooks, REST APIs, Databricks and related technologies.
- Expert proficiency in Data Warehousing methodologies and concepts to store and model the data (Kimball).
- Expert proficiency authoring SQL queries, tables, views, stored procedures, and functions.
- Experience using software development management tools such as Azure DevOps.
- Experience using Python and PowerShell required.
- Experience with provisioning data related resources and components in Azure preferred.
- Experience with Redgate Monitoring or Azure Log Analytics is a plus.
- Familiarity with D365 ERP modules (e.g., Finance and Operations, Customer Engagement) preferred.
- Understanding of D365 data structures, entities, and integration points preferred.
- Experience with R language preferred.
- Extreme attention to detail and accuracy a must.
- Excellent verbal/written communication skills required.
- Comfortable working and collaborating within a fully remote team.
- Microsoft DP-700 Certification is a plus.
Additional Information
All your information will be kept confidential according to EEO guidelines.
CRB is committed to hiring and retaining a diverse workforce. We are proud to be an Equal Opportunity Employer and it is our policy to provide equal opportunity to all people without regard to race, color, religion, national origin, ancestry, marital status, veteran status, age, disability, pregnancy, genetic information, citizenship status, sex, sexual orientation, gender identity or any other legally protected category. Employment is contingent on background screening.
CRB does not accept unsolicited resumes from search firms or agencies. Any resume submitted to any employee of CRB without a prior written search agreement will be considered unsolicited and the property of CRB. Please, no phone calls or emails.
CRB offers a complete and competitive benefit package designed to meet individual and family needs.
If you are unable to complete this application due to a disability, contact this employer to ask for an accommodation or an alternative application process.
- Compensation: USD 100,000 - USD 140,000 per year
• Design, build, and maintain scalable data pipelines and infrastructure using our core data stack
• Implement data quality checks, monitoring, and contribute to data governance practices
• Develop and optimise data models using dbt for analytics and operational use cases
• Troubleshoot data issues, potentially assisting with tracking implementation problems
• Ensure data infrastructure supports downstream consumers, including analytics and potentially future AI/ML initiatives
• Collaborate with analysts and stakeholders to ensure data accessibility for visualisation and operational syncing (Hightouch)
• Optimise performance and cost-efficiency of the data platform
• Mentor team members and promote data engineering best practices.

