EVOLVE BY INTEGRATION
Junior Data Engineer
Location
United Kingdom
Posted
1 day ago
Salary
0
Seniority
Junior
Job Description
Junior Data Engineer
Prima Power
• Shape the architecture of data products designed for data analytics and data science, focusing specifically on use cases such as forecasting, feature engineering, customer behaviour, and integration of new data sources.
• Support the data transformation by setting up best practices in areas such as data modelling, performance optimisation, and data governance, ensuring that the data used within Prima is consistent, available, and reliable.
• Build reusable technology that enables teams to ingest, store, transform, and serve their own data products.
• Engage with data scientists and machine learning engineers to explore the product landscape and refine data requirements for enhanced data infrastructure.
Job Requirements
- Strong academic background in STEM disciplines;
- Ability to break down problems, learn quickly, and test different approaches;
- Programming foundations (the language is not important; we value clean, maintainable code above all);
- Curiosity for data and software — whether from coursework, projects, or personal initiatives;
- Interest in building systems end-to-end: from ingesting and transforming data, to creating models, to deploying services;
- Motivation and eagerness to learn from more experienced teammates.
- Participation in hackathons, open-source contributions, or side projects;
- Knowledge of Python;
- Experience with data pipelines or distributed data systems (e.g., Spark, Airflow).
Benefits
- Private healthcare
- Gym discounts
- Wellbeing programs
- Mental health support
More Data Engineer Jobs
Senior Data Engineer
Kearney
Role Description
Sr. Data Engineer (Boston, MA) wanted for business analytics and planning co. Requires Master's in Computer Science or Electrical Engineering. Remote employment possible, work from anywhere. To apply, please mail your resume to the Cervello office at: 155 Seaport Blvd, Floor 2, Boston, MA 02210.
Base salary range: $140,130 - $160,000. It is important to note that at Cervello, Inc. it is not typical for an individual to be hired at the top of the range for their role. Individual salaries within each range are determined through a wide variety of factors, including but not limited to education, experience, knowledge, and skills. Cervello, Inc. reviews compensation regularly and may adjust base salaries to reflect market competitiveness. In addition to salary, individuals may be eligible for a discretionary performance bonus.
Qualifications
- Master's in Computer Science or Electrical Engineering
Requirements
- Remote employment possible, work from anywhere
Benefits
- Paid time off
- Paid sick leave
- 401(k) match and profit sharing
- Medical, dental, and vision coverage
- Healthcare concierge services
- Backup child/adult care
- Annual employer HSA contributions
- Home office stipend
- Subsidized gym membership
- Annual wellness program
- Leaves of absence when needed to support employees’ physical, mental, and emotional well-being
Company Description
Cervello, Inc. prides itself on providing a culture where our employees belong and thrive equally, which means our people feel comfort, confidence, and joy as they do great things for our firm, our colleagues, and our clients. That’s why Cervello, Inc. is committed to building a diverse workforce and inclusive environment. Cervello, Inc. is an equal opportunity employer; we recruit, hire, train, promote, develop, and provide other conditions of employment without regard to a person’s gender identity or expression, sexual orientation, race or ethnicity, religion, age, national origin, disability, marital status, pregnancy status, veteran status, genetic information, or any other differences consistent with applicable laws. This includes providing reasonable accommodation for disabilities or religious beliefs and practices. We encourage everyone to apply, including those who may not feel historically represented in consulting.
• Architect and own the enterprise AI data platform: the unified, governed layer that ingests, transforms, stores, and serves all data consumed by AI systems across the organisation.
• Design multi-domain data models (lakehouse, data mesh, event-driven) that are structured from day one to serve AI workloads: clean lineage, versioned schemas, well-documented contracts, and low-latency serving APIs.
• Own the full data stack: real-time streaming (Kafka, Spark Structured Streaming), batch processing (Databricks, PySpark, Delta Lake), cloud storage and compute (AWS, Azure), and data quality/metadata management.
• Ensure this platform is the single, authoritative data source for all downstream consumers (conversational AI, dashboard assistants, autonomous agents, ML models, and reporting), eliminating data silos and conflicting truths.
• Drive modernisation of legacy pipelines (on-prem ETL, batch DWH) to cloud-native, AI-ready architectures with measurable improvements in cost, latency, and delivery velocity.
• Design the semantic layer that sits above raw data (business-aligned ontologies, entity relationships, domain taxonomies, and knowledge graphs) so AI systems understand context, not just tokens.
• Build and maintain knowledge graphs (Neo4j or equivalent) that capture relationships between business entities, policies, KPIs, hierarchies, and domain rules, enabling structured reasoning alongside unstructured retrieval.
• Define and govern a feature store and semantic data contracts that serve both classical ML models and LLM-based applications from a single, well-versioned, trusted source.
• Own metadata management, data lineage, and audit trails across the semantic layer, ensuring every AI system can trace its outputs back to source data with full accountability.
• Design and enforce a comprehensive data governance model that governs access for both human users and AI agents, with role-based access control (RBAC), attribute-based policies, and agent-specific permission scopes that prevent privilege escalation.
• Build, test, and maintain production pipelines (batch and real-time) on Snowflake, PySpark, Delta Lake, and Kafka.
• Implement data quality checks, schema validation, and alerting at every pipeline stage.
• Migrate legacy ETL/DWH to cloud-native AWS/Azure architectures with measurable latency and cost improvements.
• Maintain CI/CD pipelines: automated testing, deployment, rollback, and IaC (Terraform, GitHub Actions).
• Build end-to-end retrieval infrastructure: document ingestion, embedding pipelines, vector store management (Pinecone, FAISS, ChromaDB, OpenSearch), and hybrid retrieval layers.
• Implement chunking, metadata filtering, and re-ranking, tuning for precision, recall, and latency.
• Maintain data freshness and index consistency; instrument with context relevance and faithfulness metrics.
• Implement and maintain business entity mappings, ontologies, and knowledge graphs (Neo4j) per the Architect's design.
• Build and version the feature store and semantic data contracts serving both ML models and LLM applications.
• Manage metadata, data lineage, and audit trail instrumentation across the platform.
• Build ML data infrastructure: training curation, feature engineering, MLflow experiment tracking, dataset versioning.
• Support LLM fine-tuning workflows: corpus curation, quality filtering, dataset formatting.
• Implement automated evaluation pipelines: factual accuracy, hallucination detection, regression tracking.
• Maintain production monitoring dashboards for pipeline health, model metrics, and alerting.
• Build and maintain data APIs, tool schemas, and memory/state stores that autonomous agents depend on.
• Implement agent observability: capture inputs, retrieved context, tool calls, reasoning traces, and outputs.
• Maintain text-to-SQL layers, semantic query interfaces, and context APIs for conversational AI consumers.
• Implement RBAC, attribute-based access, PII detection/masking, data classification, and audit logging.
• Enforce data contracts and schema governance with automated breaking-change detection and versioned migrations.
• Build data quality monitoring (completeness, freshness, consistency) with automated alerting and root-cause tooling.
• Support compliance readiness: audit trails, data provenance, and regulatory documentation.
• Design, develop, and maintain scalable data pipelines and integration solutions within the Azure ecosystem
• Build and support enterprise data platforms that consolidate information from multiple source systems
• Develop ETL/ELT processes to ingest, transform, validate, and distribute business-critical data
• Contribute to data modelling, data quality, and governance initiatives
• Support Master Data Management (MDM) processes and ensure consistency across systems
• Work with structured and semi-structured data from ERP, business applications, APIs, databases, and external sources
• Collaborate with business and technical stakeholders to translate requirements into scalable data solutions
• Participate in deployment, testing, monitoring, and continuous improvement activities
• Contribute to CI/CD pipelines, automation, and DevOps best practices
• Actively participate in Agile ceremonies and team collaboration


