poolside

World's most capable AI for software development

Member of Engineering – Pre-training, Data Engineering

Data Engineer | Other | Remote | Senior | Team 51-200 | Since 2023 | H1B No Sponsor

Location

United States

Posted

124 days ago

Salary

Not listed

Seniority

Senior

Job Description

  • Build and maintain high-performance pipelines that process trillions of tokens.
  • Deliver diverse, high-quality datasets for pre-training foundation models.
  • Work closely with other teams, such as Pretraining, Posttraining, Evals, and Product, to ensure alignment on the quality of the models delivered.

Job Requirements

  • Strong background in building production-grade, distributed data systems for machine learning, with experience in:
    • Orchestration: Slurm, Airflow, or Dagster
    • Observability & reliability: CI/CD, Grafana, Prometheus, etc.
    • Infrastructure: Git, Docker, Kubernetes (k8s), cloud-managed services
    • Batched inference (e.g., vLLM)
  • Performance obsession, especially with large-scale GPU clusters and distributed pipelines
  • Expert-level Python knowledge and the ability to write clean, maintainable code
  • Strong algorithmic foundations
  • Proficiency with libraries such as Polars, Dask, or PySpark
  • Nice to have:
    • Experience building trillion-scale SOTA pre-training datasets
    • Experience translating research to production at scale
    • Experience with OCR, web crawling, or evals
    • Prior experience pre-training LLMs

Benefits

  • Fully remote work & flexible hours
  • 37 days/year of vacation & holidays
  • Health insurance allowance for you and dependents
  • Company-provided equipment
  • Wellbeing, always-be-learning and home office allowances
  • Frequent team get-togethers
  • A diverse, inclusive, people-first culture

More Data Engineer Jobs

Data Engineer

OneImaging

Helping employers and employees save up to 80% on health plan and out-of-pocket medical imaging costs.

Data Engineer | 125 days ago
Other | Remote | Team 1-10 | H1B No Sponsor

  • Design, build, and maintain end-to-end data pipelines using Databricks (SparkSQL, PySpark) for data ingestion, transformation, and processing.
  • Integrate data from various structured and unstructured sources, including medical imaging systems, EMRs, change data capture (CDC) from SQL databases, and external APIs.
  • Collaborate with the analytics team to create, optimize, and maintain dashboards in Looker.
  • Deploy and manage cloud-based solutions on AWS (e.g., S3, EMR, Lambda, EC2) using IaC tooling (Terraform and Databricks Asset Bundles) to ensure scalability, availability, and cost-efficiency.
  • Oversee MongoDB and PostgreSQL databases, including schema design, indexing, and performance tuning.
  • Adhere to healthcare compliance requirements (e.g., HIPAA) and best practices for data privacy and security.
  • Work cross-functionally with data scientists, product managers, and other engineering teams to gather requirements and define data workflows.

United States
Job Closed

Senior Data Engineer

Tava Health

A mental health benefit for every employee. Because healthy minds matter.

Data Engineer | 125 days ago
Other | Remote | Team 11-50 | H1B No Sponsor

  • Build and maintain robust data pipelines using tools like Airflow, Airbyte, and BigQuery.
  • Manage and evolve our data warehouse architecture.
  • Integrate third-party platforms including Salesforce, HubSpot, Iterable, ZocDoc, and Metabase.
  • Lead the implementation of centralized data governance, defining sources of truth and managing data flow standards.
  • Partner with cross-functional teams to implement clean, reporting-ready data models in dbt.
  • Write clean, production-grade code in a general-purpose language (e.g., Python).
  • Monitor data quality and ensure reliability across systems.

All locations: Alabama | Arizona | California | Connecticut | Florida | Idaho | Maine | Nevada | New Jersey | New York | North Carolina | Oregon | Maryland | Massachusetts | Tennessee | Texas | Utah | Virginia
Job Closed

Senior Manager, Data Engineering

Circle

Circle helps businesses and developers harness the power of stablecoins for payments and internet commerce worldwide.

Data Engineer | 125 days ago
Other | Remote | Team 501-1,000 | Since 2013 | H1B Sponsor

  • Lead, mentor, and grow a team of data engineers
  • Partner with cross-functional teams to deliver trustworthy, actionable data
  • Drive the long-term vision for data engineering practices
  • Enable AI-driven approaches to improve development efficiency

California
$225K - $290K / year
Job Closed

Senior Data Scientist/Data Engineer

Nomic Bio

The Protein Profiling Company.

Data Engineer | 125 days ago
Full Time | Remote | Team 11-50 | H1B No Sponsor

  • Design, build, iteratively improve, and fully automate the data pipelines and algorithms we use to process raw flow cytometry data from our highly multiplexed bead-based assays into quantitative protein measurements.
  • You will leverage your fundamental knowledge of biosensors, fluorescence data, and bioengineering R&D to act as an expert in the interpretation and analysis of nELISA experimental data when challenges arise in R&D and day-to-day Lab Operations.
  • You will also support the R&D and Lab Operations teams by developing additional data features and algorithms that enable Nomic's future growth.
  • This role involves substantial communication, teamwork, and attention to detail, especially when identifying and troubleshooting issues related to nELISA data and ensuring we build the right tools and the right abstractions.
  • When tooling does not yet exist, you will leverage your technical and bioscience domain expertise to develop new data analysis pipelines.

Canada