G2i serves enterprises with remote staff augmentation for developer teams. The company provides talented web and mobile developers to help companies grow and re
Data Engineer
Location
Europe
Posted
28 days ago
Salary
$62 / hour
Seniority
Senior
Job Description
Data Engineer
G2i
- Own the data stack end-to-end: ingestion → transformation → modeling → serving → monitoring
- Build and maintain ETL/ELT pipelines from APIs, webhooks, and operational systems
- Design resilient data models that handle evolving and imperfect source systems
- Implement monitoring and alerting for data quality, freshness, and pipeline failures
- Ensure high reliability and observability across the data layer
- Lead improvements, migrations, and infrastructure decisions
- Collaborate with engineering leadership on architecture, including modern data access patterns
Job Requirements
- Strong experience building and operating production-grade data systems
- Solid fundamentals in data modeling and pipeline design
- Ability to work autonomously and take ownership of a domain
- High attention to detail and commitment to data accuracy
- Strong communication skills in async, distributed environments
- Comfortable working in fast-moving, ambiguous environments
Nice to Have
- Experience building internal tools or reusable data systems
- Familiarity with modern/experimental data patterns (e.g., real-time or agent-driven data workflows)
- Interest in working close to product and infrastructure layers
Benefits
- High ownership and autonomy over the data domain
- Opportunity to build and scale systems from early stages
- Exposure to modern data and AI-driven workflows
- Remote-first environment with a small, high-performing team
More Data Engineer Jobs
- Develop production-grade ETL workflows using Python and Microsoft-based frameworks to ingest, transform, and validate large-scale structured and unstructured data.
- Implement schema enforcement, data validation, and quality checks to maintain integrity across diverse sources.
- Optimize pipelines for performance, scalability, and fault tolerance using open-source and cloud-native patterns.
- Manage Azure-based data solutions, including Data Lake Storage, Azure SQL, and cloud storage access from Python services.
- Deploy workflow orchestration using Azure Data Factory or Foundry for scheduling, monitoring, and automation.
- Ensure secure integration of APIs and services within the Microsoft ecosystem for seamless data exchange.
- Build Python-based data services leveraging libraries such as pandas, PyTorch, and other open-source frameworks for high-performance processing.
- Implement logging, monitoring, and performance tuning for robust operational reliability.
- Develop API endpoints and microservices to enable interoperability with analytics and ML platforms.
- Work closely with data scientists, analysts, and cloud architects to deliver clean, reliable data for predictive modeling and real-time dashboards.
- Apply data governance best practices, ensuring compliance, reproducibility, and auditability across workflows.
- Work in Agile teams; drive iterative delivery, joint problem-solving, and continuous improvement.
- Engage closely with project managers, technical leads, client representatives, and cross-functional teams to provide timely updates, resolve issues, and ensure alignment with business goals.
- Translate technical specifications into code and design documents.
- Architect data infrastructure roadmap centered on a Lakehouse architecture
- Establish data governance, cataloging, and lineage frameworks
- Partner with OT and Engineering teams to operationalize IIoT and supply chain data
- Oversee transition from legacy BI tools to modern analytics platforms
- Lead the transition to MLOps and DataOps methodologies
- Collaborate with business unit leaders to identify data products for growth
- Build and mentor a high-performing team of data engineers, ML engineers, and data architects
Senior Data Specialist
Alex Staff Agency
We help the best professionals and companies find each other despite borders.
Role Description
We need someone who understands data deeply and uses Python to wrangle it: not a platform engineer, not a pure pipeline builder, but a data specialist who's comfortable with research, investigation, and the unglamorous work of making messy energy market data actually usable.
You'll spend significant time on tasks like:
- Mapping BM units to power plants and fuel types
- Reconciling legacy data formats with current ones
- Ensuring consistency between different Elexon message types
- Cleaning time-series data (outliers, gaps, overlaps)
Some of this requires genuine investigation: cross-referencing sources, making judgment calls, documenting edge cases. There's no API that solves these problems for you. Python is your primary tool (pandas, NumPy, standard libraries) to minimise manual effort, but you should be comfortable that some detective work is unavoidable.
If you find satisfaction in truly understanding a dataset's structure and quirks, rather than just piping data through and hoping for the best, this role is for you.
Qualifications
- Strong Python skills for data work: fluent with pandas, comfortable writing clean, testable code, and able to build reusable data processing logic.
- Solid SQL skills: complex queries, window functions, CTEs in PostgreSQL.
- Experience with messy, real-world data: reconciliation, cleaning, or mapping work.
- Methodical and detail-oriented: you notice inconsistencies and want to understand root causes.
- Good documentation habits: you understand that undocumented mappings and assumptions are technical debt.
- Self-directed: able to own ambiguous problems, do research, and communicate findings clearly.
Requirements
- Map BM units from Elexon to their corresponding power plants, substations, and fuel types.
- Map substations to ETYS zones and grid supply points.
- Build and maintain reference/master datasets that link identifiers across disparate sources.
- Document mappings, assumptions, and known limitations clearly for downstream users.
- Reconcile legacy data formats with current formats.
- Ensure consistency between different Elexon message types.
- Investigate discrepancies between data sources and determine authoritative values.
- Clean time-series data: detect outliers, fill gaps appropriately, resolve overlapping or duplicate timestamps.
- Develop reusable Python-based cleaning routines that can be applied across datasets.
- Write and maintain Python data grabbers for energy market APIs.
- Build dbt models to transform raw data into clean, analysis-ready datasets.
- Orchestrate workflows via GitHub Actions.
- Design PostgreSQL schemas that reflect your understanding of the domain.
Benefits
- Remote-first with async collaboration (Slack, GitHub, documented decisions).
- Core overlap with UK business hours expected (at least 4 hours daily).
- Competitive compensation based on location and experience.
- Plenty of opportunities for learning and professional growth.
- B2B contract with a paid vacation.
- Design and implement enterprise data architecture leveraging Snowflake for structured and unstructured data
- Develop scalable data models, data lakes, and data warehouses to support analytics, AI/ML, and reporting
- Define and enforce data governance, security, and compliance frameworks (HIPAA, FDA, GDPR)
- Architect data pipelines and ETL/ELT processes using modern tools (e.g., Snowpipe, dbt, Informatica, Azure Data Factory)
- Collaborate with cross-functional teams (Engineering, Clinical, Regulatory, Quality, and Business stakeholders)
- Ensure data integrity, lineage, and traceability aligned with medical device standards (e.g., ISO 13485, GxP)
- Optimize Snowflake performance, cost management, and workload management
- Support cloud data architecture (AWS, Azure, or GCP) and hybrid environments
- Enable advanced analytics, including AI/ML and real-world evidence (RWE) use cases
- Establish best practices for data lifecycle management (ingestion, storage, archival, and retention)


