BHFT

Step into the state-of-the-art algorithmic trading solutions

Market Data Engineer

Data Engineer, Full Time, Remote, Senior, Team 11-50, H1B No Sponsor

Location

Spain

Posted

6 days ago

Salary

Not specified

Seniority

Senior

Job Description

  • Capture & Ingestion. Own the full capture path from wire to lake: decode and normalize raw exchange feeds (pcap, multicast UDP / ITCH / FIX) and vendor sources (OneTick, Refinitiv, Bloomberg, ICE) into a unified canonical model with nanosecond timestamps. Build batch + stream pipelines (Airflow, Spark, dbt) for tick and reference data. Own L2/L3 order-book reconstruction with gap handling. Provide Python and Rust producer SDKs for internal feed handlers.
  • Storage & Modeling (Apache Iceberg). Own the Iceberg-over-S3 lakehouse: design partitioning, sort orders, and row-group layout for fast scans; manage schema evolution, snapshots, time travel, compaction, and TTL. Maintain reference data as slowly-changing tables with point-in-time correctness for backtests. Drive storage cost optimisation via compaction, tiering, and snapshot expiry.
  • Tooling & Libraries. Build libraries for schema management, data contracts, validation, and lineage on top of the Iceberg catalog. Develop shared access services (Spark + Polars) so Research, backtesting, and trading share one normalized data layer, including gap detection and pcap-vs-lake reconciliation.
  • Reliability & Observability. Embed monitoring, alerting, SLAs/SLOs, and CI/CD across capture and pipeline layers on Kubernetes (EKS). Own data-quality dashboards and incident runbooks for the capture fleet.
  • Collaboration. Partner with Quant Research, Data Science, Backend, and DevOps to translate requirements into platform capabilities and champion market-data engineering best practices.

Job Requirements

  • 5+ years building production-grade data systems, with proven expertise architecting and launching data lakes / lakehouses from scratch.
  • Hands-on experience with Apache Iceberg (or comparable table formats — Delta / Hudi): partitioning, schema evolution, snapshots, compaction, and catalog operations; familiarity with Apache Arrow for zero-copy, columnar in-memory interchange.
  • Experience with market data and/or network packet capture — decoding pcap, exchange feed protocols (ITCH, FIX/FAST, multicast UDP), order-book reconstruction, and time-series at scale (strong plus; willingness to learn required).
  • Experience normalizing market data from multiple vendors — e.g. OneTick, Refinitiv/Reuters, Bloomberg, ICE — into a unified schema and symbology (strong plus).
  • Expert-level Python (incl. Polars and/or PySpark); Rust a strong plus (relevant for high-performance capture/decoding).
  • Modern orchestration (Airflow) and distributed processing (Apache Spark).
  • Advanced SQL: complex aggregations, window functions, query optimization, partition pruning.
  • Solid fundamentals in Linux, containerization (Docker, Kubernetes / EKS), and cloud object storage (AWS S3).
  • DevOps & observability: CI/CD, infrastructure-as-code (Terraform), GitOps (ArgoCD), and metrics/dashboards/alerting (Grafana, Prometheus).
  • Strong grasp of structured + unstructured / binary data, and storage optimization — partitioning, compression, cost management.
  • English fluency for documentation and collaboration in an international team.

Benefits

  • Compensation for health insurance, sports, professional development, and more.

More Data Engineer Jobs

Marketing Data and Analytics Engineer

Fortune Media

Fortune Media is a magazine and digital platform dedicated to delivering in-depth, high-impact business reporting that informs and inspires audiences globally.

Data Engineer, 6 days ago

Title: Marketing Data and Analytics Engineer
Location: New York, NY, USA
Job Description: Full time, job requisition ID JR100045

About the Role

Fortune Media is building a modern, data-driven marketing technology practice from the ground up. This is an early and foundational hire on a new martech team: a rare opportunity to define how data flows, behaves, and is governed across our customer-facing systems.

You'll sit at the intersection of data engineering and marketing operations: designing the pipelines, definitions, and governance structures that connect Salesforce Sales Cloud and Marketing Cloud today, and that will scale into a broader CDP and ESP ecosystem over the next 12–24 months. You won't just maintain the systems; you'll help architect what they become.

This role reports to the Director of Martech & CRM and works closely with marketing, technology, editorial, events, and revenue operations stakeholders.

What You'll Own

Data Architecture & Flows

  • Design, build, and maintain data pipelines between Salesforce Sales Cloud, Marketing Cloud and any future ESPs, and adjacent systems (event platforms, subscription tools, analytics layers)
  • Ensure reliable and performant data syncs within the integration architecture between Sales Cloud and Marketing Cloud Connect and any future ESP and tools
  • Evaluate and help implement future ESP and CDP tooling, defining how data enters, moves through, and exits each system
  • Document all data flows, field mappings, and transformation logic in a centralized, accessible way

Data Governance & Definitions

  • Establish and maintain a data dictionary: canonical definitions for contacts, leads, accounts, subscribers, and audience segments across systems
  • Enforce data quality standards (deduplication rules, field validation, consent flags, suppression logic) with an eye toward a unified customer "golden record"
  • Partner with the Salesforce Developer to ensure custom objects, fields, and automations conform to governance standards
  • Champion compliance hygiene: CAN-SPAM, GDPR, and opt-in/opt-out data integrity across all marketing sends

Analytics & Measurement

  • Build and maintain reporting frameworks for email performance, audience health, list growth, and data quality metrics
  • Surface actionable insights to marketing and revenue stakeholders: deliverability trends, segment performance, data decay, contact coverage gaps
  • Support attribution modeling and funnel reporting as Sales Cloud and Marketing Cloud mature

Stakeholder Partnership

  • Serve as the internal expert and translator between marketing operations, engineering, and business stakeholders on data questions
  • Work with the Marketing Cloud Solutions Architect on technical implementation decisions
  • Contribute to vendor evaluation and RFP processes as the team considers CDP and ESP investments

What You Bring

Required

  • Demonstrated ability to solve complex, high-scale engineering problems through clean, efficient code. Strong proficiency in at least one major programming language (e.g., Python, Go, Java) beyond basic scripting for data manipulation.
  • System Design & Distributed Architecture:
      • Expertise in designing scalable, fault-tolerant distributed systems.
      • Deep understanding of system design trade-offs (e.g., CAP theorem, latency vs. throughput, synchronous vs. asynchronous processing) in the context of enterprise data pipelines.
  • Advanced Data Modeling: Beyond SQL proficiency, candidates should have a deep understanding of NoSQL vs. Relational schemas and how to optimize data structures for high-performance retrieval in a CDP/CRM environment.
  • 2–6 years of experience in a data, data engineering, marketing operations, or martech-focused role
  • Hands-on experience with Salesforce Sales Cloud and Marketing Cloud; you understand how the two systems connect and where they break
  • Fluency in SQL; ability to query, transform, and validate data independently
  • Working knowledge of data governance concepts: data dictionaries, lineage, deduplication, and master data management principles
  • Experience designing or maintaining data integrations (APIs, connectors, ETL/ELT pipelines)
  • Strong documentation instincts; you leave systems better-understood than you found them
  • Comfort operating in ambiguity; this team is new and the roadmap is actively being shaped

Preferred

  • Experience with Marketing Cloud Connect and Synchronized Data Extensions
  • Knowledge of email deliverability fundamentals (sender reputation, list hygiene, ISP feedback loops)
  • Exposure to ESP migration or evaluation projects
  • Familiarity with CDP platforms (Salesforce Data Cloud, Segment, or similar) preferred
  • Salesforce certification (e.g., Marketing Cloud Email Specialist, Administrator, or Data Cloud) preferred
  • Experience in a media, publishing, or subscription business preferred

The Stack (Today)

  • CRM: Salesforce Sales Cloud
  • Marketing Automation: Salesforce Marketing Cloud
  • Integration: Marketing Cloud Connect
  • On the roadmap: CDP evaluation and integration, ESP consolidation/migration

Compensation: $120-$150K + 10% target bonus
Location: New York, NY (hybrid schedule)

A Few Of Fortune’s Perks And Benefits

  • 20 vacation days and 2 personal days on top of 11 company holidays and an honour-based sick leave policy
  • Health, dental, and vision coverage (90% paid for individuals and families), along with flexible spending accounts where Fortune contributes to your HSA
  • 401(k) plan
  • Generous parental leave
  • Dependent care, commuter, and cell phone benefits
  • Tuition reimbursement program
  • A commitment to an open, inclusive, and diverse work culture

Why Fortune

Fortune is one of the world's most recognized business media brands, and we're in an active, well-resourced period of rebuilding our technology and data foundations. This role offers unusual scope: you'll work on problems that matter to the business, with visibility to senior leadership, on a team where your contributions are directly traceable to outcomes.

New York
$120K - $150K / year

Data Engineer

VivSoft

Solving complex Public Sector Use cases using emerging technologies - SBIR Phase III Awardee

Data Engineer, 6 days ago
Full Time, Remote, Team 51-200, Since 2011, H1B Sponsor

  • Design, develop, maintain, and optimize data pipelines, data services, and data integration solutions within AWS GovCloud environments.
  • Support migration of legacy data into modernized platform environments.
  • Develop and maintain APIs, ETL/ELT processes, and file-based integrations with internal and external government systems.
  • Perform data modeling, database design, schema modifications, and database performance optimization.
  • Ensure data quality, accuracy, integrity, and consistency across Personnel Vetting Management, Adjudication, and Appeals datasets.
  • Investigate and resolve data quality issues, coordinate remediation efforts with data providers, and implement preventative data quality controls.
  • Support data governance activities, data requests, reporting requirements, and audit readiness efforts.
  • Develop automated data validation, monitoring, and reconciliation processes.
  • Support database replication, backup/recovery, disaster recovery, and continuity of operations activities.
  • Collaborate with software engineers, cloud architects, DevSecOps engineers, product owners, and security teams within a SAFe Agile environment.
  • Support CI/CD pipelines, automated testing, and secure software delivery practices.
  • Create and maintain technical documentation, data dictionaries, architecture artifacts, and database design documentation.
  • Support compliance with RMF, Zero Trust, NIST, DoD IL5, and cybersecurity requirements.

United States
Job Closed

Data Annotation

Innodata

Innodata, with over 35 years of expertise, is a trusted leader in data solutions and AI innovation. The company specializes in training and deploying generative

Data Engineer, 6 days ago

Title: Data Annotation
Location: Remote - United States

Job Description

Innodata (Nasdaq: INOD) is a global data engineering company. We believe that data and Artificial Intelligence (AI) are inextricably linked. Our mission is to enable the responsible advancement of artificial intelligence by providing the data, evaluation frameworks, and human expertise required to build AI systems that can be trusted at scale. We provide a range of transferable solutions, platforms, and services for Generative AI / AI builders and adopters. In every relationship, we honor our 36+ year legacy delivering the highest quality data and outstanding outcomes for our customers.

Scope of the Role

At Innodata, we’re partnering with the world’s leading technology companies to build the future of generative AI and large language models (LLMs). We’re on the lookout for smart, savvy, and curious Data Annotators to join our global contributor community as part of our Subject Matter Expert (SME) on Demand program.

This is not a traditional full-time role. It’s a part-time, remote, flexible, project-specific opportunity designed for those who want to make a real impact on their schedule. Whether you're a writer, linguist, educator, researcher, or just deeply passionate about language and logic, this role lets you contribute to cutting-edge AI development while maintaining control over your time.

You’ll be helping LLMs learn the intricacies of language and reasoning: not just how to write, but how to think. If you’ve ever dreamed of shaping the intelligence behind tomorrow’s technology, this is your chance. This is more than just a gig; it’s a rare chance to help shape the future of AI from anywhere in the world, on your own terms.

What You’ll Own

  • Rating/assessing the performance of AI models or algorithms based on their output or behavior through a set of evaluative questions.
  • Labeling elements of a piece of content rather than the content as a whole.
  • Assigning predefined categories or labels to items.
  • Evaluating the perceived quality and/or appropriateness of content.
  • Generating labels to advance understanding of a concept, trend, etc.
  • Creating additional training data for machine learning models by applying transformations to the original data, such as modifying images (rotation, flipping, cropping), generating new text (paraphrasing, summarization), or altering audio/video signals (speed modification, pitch shifting) to reduce overfitting and increase dataset diversity.
  • Reviewing data and identifying whether or not a product feature works as intended based on the project's guidelines.
  • Labeling model outputs to identify if a piece of content is or isn't something. Examples: identifying clickbait; identifying gaming videos; identifying branded content.
  • Ordering or ranking items based on a set of preferences or criteria.
  • Creating prompts or questions that will be used to generate responses from a language model or other AI system.
  • Evaluating the relevance of content based on a relevancy scale (1-3, 1-5, etc.).
  • Generating responses to prompts or questions using a language model or other AI system.
  • Rewriting existing text while preserving the original meaning, often to improve clarity or style and adherence to guidelines.
  • Producing concise summaries of longer pieces of text or data.
  • Converting spoken language or audio content into written text.
  • Converting text or spoken language from one language to another.
  • Gathering and compiling various forms of data to be used for training, evaluating, or fine-tuning the AI models. This may include text, images, videos, audio files, or other types of digital content.

You’ll Thrive in This Role If You Have

  • A High School Diploma or higher is required.
  • Professional or Expert level proficiency (C1/C2) in English

The expected hourly salary range for this position is $9 per hour, based on experience, skills, and qualifications.

United States
$9 / hour

Data Annotation

Innodata

Innodata, with over 35 years of expertise, is a trusted leader in data solutions and AI innovation. The company specializes in training and deploying generative

Data Engineer, 6 days ago

Title: Data Annotation
Location: Remote - New Mexico

Job Description

Innodata (Nasdaq: INOD) is a global data engineering company. We believe that data and Artificial Intelligence (AI) are inextricably linked. Our mission is to enable the responsible advancement of artificial intelligence by providing the data, evaluation frameworks, and human expertise required to build AI systems that can be trusted at scale. We provide a range of transferable solutions, platforms, and services for Generative AI / AI builders and adopters. In every relationship, we honor our 36+ year legacy delivering the highest quality data and outstanding outcomes for our customers.

Scope of the Role

At Innodata, we’re partnering with the world’s leading technology companies to build the future of generative AI and large language models (LLMs). We’re on the lookout for smart, savvy, and curious Data Annotators to join our global contributor community as part of our Subject Matter Expert (SME) on Demand program.

This is not a traditional full-time role. It’s a part-time, remote, flexible, project-specific opportunity designed for those who want to make a real impact on their schedule. Whether you're a writer, linguist, educator, researcher, or just deeply passionate about language and logic, this role lets you contribute to cutting-edge AI development while maintaining control over your time.

You’ll be helping LLMs learn the intricacies of language and reasoning: not just how to write, but how to think. If you’ve ever dreamed of shaping the intelligence behind tomorrow’s technology, this is your chance. This is more than just a gig; it’s a rare chance to help shape the future of AI from anywhere in the world, on your own terms.

What You’ll Own

  • Rating/assessing the performance of AI models or algorithms based on their output or behavior through a set of evaluative questions.
  • Labeling elements of a piece of content rather than the content as a whole.
  • Assigning predefined categories or labels to items.
  • Evaluating the perceived quality and/or appropriateness of content.
  • Generating labels to advance understanding of a concept, trend, etc.
  • Creating additional training data for machine learning models by applying transformations to the original data, such as modifying images (rotation, flipping, cropping), generating new text (paraphrasing, summarization), or altering audio/video signals (speed modification, pitch shifting) to reduce overfitting and increase dataset diversity.
  • Reviewing data and identifying whether or not a product feature works as intended based on the project's guidelines.
  • Labeling model outputs to identify if a piece of content is or isn't something. Examples: identifying clickbait; identifying gaming videos; identifying branded content.
  • Ordering or ranking items based on a set of preferences or criteria.
  • Creating prompts or questions that will be used to generate responses from a language model or other AI system.
  • Evaluating the relevance of content based on a relevancy scale (1-3, 1-5, etc.).
  • Generating responses to prompts or questions using a language model or other AI system.
  • Rewriting existing text while preserving the original meaning, often to improve clarity or style and adherence to guidelines.
  • Producing concise summaries of longer pieces of text or data.
  • Converting spoken language or audio content into written text.
  • Converting text or spoken language from one language to another.
  • Gathering and compiling various forms of data to be used for training, evaluating, or fine-tuning the AI models. This may include text, images, videos, audio files, or other types of digital content.

You’ll Thrive in This Role If You Have

  • A High School Diploma or higher is required.
  • Professional or Expert level proficiency (C1/C2) in English

The expected hourly salary range for this position is $13 per hour, based on experience, skills, and qualifications.

New Mexico
$13 / hour