Founded in 1969, ICF is a global advisory and technology services company headquartered in Reston, Virginia. It delivers data-driven solutions across energy, en

Data Engineer - Data Warehouse Architect

Data Engineer | Full Time | Remote | Mid Level

Location: United States

Posted: 59 days ago

Salary: $74.1K - $125K / year

Seniority: Mid Level

ETL, SQL, Python, Apache Spark, Performance Optimization, Unix, Linux, Shell, Oracle Database, Java, J2EE, REST API, Microservices, Apache Kafka, OAuth, Spring, GCP, Git, CI/CD, Docker/Containers, Databricks, AI

Job Description

This position focuses on developing, implementing, and maintaining architecture solutions across a large enterprise data warehouse to support effective and efficient data management and enterprise-wide business intelligence analytics.

Responsibilities:
- Implement and optimize data pipeline architectures for sourcing, ingestion, transformation, and extraction processes, ensuring data integrity and compliance with organizational standards.
- Develop and maintain scalable database schemas, data models, and data warehouse structures; perform data mapping, schema evolution, and integration between source systems, staging areas, and data marts.
- Automate data extraction workflows and create comprehensive technical documentation for ETL/ELT procedures; collaborate with cross-functional teams to translate business requirements into technical specifications.
- Establish and enforce data governance standards, including data quality metrics, validation rules, and best practices for data warehouse design and architecture.
- Develop, test, and deploy ETL/ELT scripts using SQL, Python, Spark, or other relevant languages; optimize code for performance and scalability.
- Tune data warehouse systems for query performance and batch processing efficiency; apply indexing, partitioning, and caching strategies.
- Perform advanced data analysis, validation, and profiling using SQL and scripting languages; develop data models, dashboards, and reports in collaboration with stakeholders.
- Conduct testing and validation of ETL workflows to ensure data loads meet SLAs and quality standards; document testing protocols and remediation steps.
- Troubleshoot production issues, perform root cause analysis, and implement corrective actions; validate data accuracy and consistency across systems.

Basic Qualifications:
- Minimum of 3 years of experience in data analysis.

Additional Qualifications:
- Strong analytical and problem-solving skills with attention to detail.
- Proficiency in SQL and ability to develop complex queries (e.g., multi-join), tune performance, and troubleshoot.
- Experience with Unix/Linux shell scripting for ETL automation.
- Familiarity with database tools and platforms (e.g., Teradata, Oracle, Non-Relational).
- Excellent verbal and written communication skills; ability to collaborate across all levels.
- Ability to prioritize and multi-task in a fast-paced environment.
- Knowledge of Java/J2EE, REST APIs, Web Services, and event-driven microservices.
- Experience with Kafka streaming, schema registry, OAuth authentication.
- Familiarity with Spring Framework, GCP services, Git, CI/CD pipelines, containerization, and data ingestion/data modeling.

Preferred Qualifications:
- Experience with Databricks concepts and terminology (e.g., workspace, catalog).
- Proficiency in Python and Spark.
- Background in architecting real-time data ingestion solutions using microservices and Kafka.

Working at ICF

ICF is a global advisory and technology services provider, but we’re not your typical consultants. We combine unmatched expertise with cutting-edge technology to help clients solve their most complex challenges, navigate change, and shape the future.

We can only solve the world's toughest challenges by building a workplace that allows everyone to thrive. We are an equal opportunity employer. Together, our employees are empowered to share their expertise and collaborate with others to achieve personal and professional goals. For more information, please read our EEO policy.

We will consider for employment qualified applicants with arrest and conviction records.

Reasonable Accommodations are available, including, but not limited to, for disabled veterans, individuals with disabilities, and individuals with sincerely held religious beliefs, in all phases of the application and employment process. To request an accommodation, please email Candidateaccommodation@icf.com and we will be happy to assist. All information you provide will be kept confidential and will be used only to the extent required to provide needed reasonable accommodations.

Read more about workplace discrimination rights or our benefit offerings which are included in the Transparency in (Benefits) Coverage Act.

Candidate AI Usage Policy

At ICF, we are committed to ensuring a fair interview process for all candidates based on their own skills and knowledge. As part of this commitment, the use of artificial intelligence (AI) tools to generate or assist with responses during interviews (whether in-person or virtual) is not permitted. This policy is in place to maintain the integrity and authenticity of the interview process.

However, we understand that some candidates may require accommodation that involves the use of AI. If such an accommodation is needed, candidates are instructed to contact us in advance at candidateaccommodation@icf.com. We are dedicated to providing the necessary support to ensure that all candidates have an equal opportunity to succeed.

Pay Range

There are multiple factors that are considered in determining final pay for a position, including, but not limited to, relevant work experience, skills, certifications and competencies that align to the specified role, geographic location, education and certifications as well as contract provisions regarding labor categories that are specific to the position. The pay range for this position based on full-time employment is: $74,090.00 - $125,954.00

Nationwide Remote Office (US99)

More Data Engineer Jobs

Data Engineer

LeoLabs

Persistent Orbital Intelligence 📡 🛰️

Data Engineer | 59 days ago

Full Time | Remote | Team 51-200 | Since 2016 | H1B Sponsor

- Play a key role in building and operating data pipelines and analytics infrastructure
- Work closely with software engineers, radar and catalog teams, and data scientists
- Ensure reliable extraction, transformation, and loading (ETL) of mission-critical datasets
- Develop scalable batch and streaming data workflows
- Enable advanced analytics and support machine learning initiatives
- Help transform large volumes of sensor and orbital data into actionable intelligence
- Engage in hands-on development with opportunities to grow into increased ownership of data platform design and optimization

Airflow, Apache, AWS, Cloud, ETL, Kafka, Python, Spark, SQL

United States

Job Closed

Lead Metadata Specialist

McGraw Hill LLC.

The work you do at McGraw Hill will be work that matters. We are collectively designing content that will build the future of education. Play your part and experience a sense of fulfilment that will inspire you to even greater heights.

Data Engineer | 59 days ago

Full Time | Remote | Team 1,001-5,000

Role Description

We are seeking a Lead Metadata Specialist to guide metadata strategy and implementation across McGraw Hill’s Higher Education business unit. This role leads projects that advance the design, management, and use of educational metadata (especially around competencies, objectives, and assessment and other types of content) to improve discoverability, personalization, and analytics across products and platforms. The Lead Metadata Specialist will collaborate with curriculum, product, and technical teams to ensure alignment across metadata workflows and business goals. This role also provides mentorship and leadership for metadata specialists and coordinates with the Enterprise Metadata team to support enterprise-level initiatives.

This is a remote position open to applicants authorized to work for any employer within the United States.

What You'll Do
- Lead the design and execution of metadata projects that enhance the creation, delivery, and management of higher education content while enabling robust personalized learning services.
- Partner with learning and data scientists to explore opportunities for metadata inference and enrichment to support adaptive learning and personalization.
- Develop and maintain learning ontologies and controlled vocabularies for higher education academic disciplines, content, and learning services.
- Collaborate with curriculum, product, and technical teams to define metadata strategies that align with business unit goals and emerging learning technology trends.
- Provide technical leadership in implementing scalable metadata workflows and quality assurance processes using standard tools and platforms.
- Lead and collaborate with Metadata Specialists, fostering consistency, capacity, and growth across metadata initiatives.
- Collaborate cross-functionally to develop documentation, training materials, and communications that promote understanding and effective use of metadata across design and product teams.
- Monitor external developments in metadata standards and best practices for higher education and integrate relevant frameworks to ensure alignment and innovation.

Qualifications
- 6+ years in education or educational technology, with at least 3 years direct experience managing educational metadata, learning objective frameworks, and/or competency structures in Higher Education.
- Master’s degree in education, learning sciences, information science, or a related field (required or equivalent experience).
- Advanced understanding of metadata standards and interoperability frameworks used in Higher Education.
- Experience with e-book, assessment, and other educational content design for Higher Education.
- Highly organized, self-motivated, able to manage multiple complex projects simultaneously.
- Strong leadership and mentoring capabilities.
- Growth mindset and openness to change, with a positive attitude and interest in improving over time.
- Excellent communication skills, able to bridge technical and non-technical audiences.

Preferred
- Experience working with AI tools and awareness of their potential for educational content and interactions.
- Experience designing and implementing educational recommendation systems.
- Proficiency with collaboration and project tools such as JIRA, Confluence, Teams, and Slack.

Benefits
- The pay range for this position is between $125,000 and $155,000 annually.
- Base pay offered may vary depending on job-related knowledge, skills, experience, and location.
- An annual bonus plan may be provided as part of the compensation package.
- A full range of medical and/or other benefits, depending on the position offered.

AI, JIRA, Confluence, Slack

United States

$125K - $155K / year

data entry

Marion Counseling Services

Data Engineer | 59 days ago

Full Time | Remote

Role Description

Join Marion Counseling Services as a vital member of our team. This position offers an exciting opportunity to support our operations by accurately inputting and managing data essential for our counselling services.
- Enter and maintain data in various systems with a high degree of accuracy.
- Assist in the preparation of reports and documentation as required.
- Ensure confidentiality and security of sensitive information.
- Collaborate with team members to improve data management processes.
- Respond to inquiries regarding data entries and assist in troubleshooting issues.

Qualifications
- Proven experience in data entry or a related field.
- Strong attention to detail and accuracy.
- Proficiency in using data entry software and Microsoft Office Suite.
- Excellent organisational skills and ability to manage multiple tasks.
- Effective communication skills, both written and verbal.

Requirements
- Experience in the healthcare or counselling sector.
- Familiarity with data management systems.
- Ability to work independently and as part of a team.

Microsoft Office

United States

£40K - £50K / year

Data Engineer

Auerbach Grayson

Data Engineer | 59 days ago

Full Time | Remote

Role Description

We're looking for a Data Engineer with a strong foundation in data pipelines and a meaningful edge in AI-native data infrastructure, specifically RAG pipelines, vector search, embedding workflows, and semantic retrieval systems. You'll work on two interconnected problem sets:
- Consolidating eight legacy systems into a unified, reliable data platform: ETL pipelines, a data warehouse, and cross-system client identity resolution.
- Transforming three decades of institutional research into an intelligent, searchable, interactable knowledge layer that clients can query in ways that weren't possible two years ago.

This is a small, senior team. You'll work directly with the CTO, have real architectural ownership, and build systems that are in production.

Qualifications
- Strong foundation in data pipelines.
- Experience with AI-native data infrastructure.
- Familiarity with RAG pipelines, vector search, embedding workflows, and semantic retrieval systems.

Requirements
- Lead the data engineering work for our research portal migration: extracting, transforming, and loading data from legacy systems into modern cloud infrastructure.
- Build and maintain ETL/ELT pipelines across multiple integration points: CRM, research distribution platforms, trading systems, and third-party APIs.
- Design and implement our “Golden Record” initiative: cross-system client identity resolution across eight legacy databases with no unified identifiers.
- Implement event-driven data flows using AWS EventBridge as the central routing layer, treating each source system as a swappable adapter.
- Design and build production-grade RAG (Retrieval-Augmented Generation) pipelines over AGCO's research archive: ingestion, chunking strategy, embedding generation, vector storage, and retrieval.
- Implement hybrid search approaches that combine semantic (vector) search with keyword and metadata filtering, appropriate for structured financial research queries.
- Build and maintain embedding pipelines that keep the vector store current as new research is published, with full observability and freshness guarantees.
- Evaluate and implement emerging retrieval strategies as the space evolves:
  - Re-ranking with cross-encoders
  - Hypothetical Document Embeddings (HyDE)
  - Query expansion and decomposition
  - Graph-based retrieval (e.g., GraphRAG) for analyst relationship mapping
  - Structured metadata retrieval for faceted financial queries
- Wire retrieval layers into LLM interfaces for research summarization, analyst Q&A, and recommendation-change tracking across the archive.
- Apply DataOps practices across all pipelines: version control, CI/CD, environment parity across dev/staging/production, and infrastructure as code.
- Monitor pipeline health, embedding freshness, retrieval quality, and LLM call latency; build alerting that catches problems before users do.
- Work within our AWS environment (App Runner, EventBridge, CDK) and contribute to IaC best practices.
- Partner with the CTO, product team, and application developers to translate business requirements into sound data and retrieval architecture decisions.
- Document data flows, schema designs, chunking strategies, and retrieval logic so systems are maintainable and not a black box.
- Contribute to evaluation frameworks for retrieval quality (precision, recall, answer faithfulness) so we know when the system is actually working.

AI, ETL, Data Engineering, CRM, AWS, Observability/Monitoring, LLM, CI/CD, Infrastructure as Code

United States

Job Closed