Job Closed

This listing is no longer active.

Speechify

Get your reading done faster, easier, and on the go. Listen to any book, document, or website with Speechify.

Software Engineer, Data Infrastructure & Acquisition - Austin, USA

Data Engineer | Other Remote | Team 51-200 | H1B Sponsor

Location

United States

Posted

90 days ago

Salary

$140K - $200K / year

Python Shell Linux Docker Terraform GCP

Job Description

The mission of Speechify is to make sure that reading is never a barrier to learning.

Over 50 million people use Speechify’s text-to-speech products to turn whatever they’re reading – PDFs, books, Google Docs, news articles, websites – into audio, so they can read faster, read more, and remember more. Speechify’s text-to-speech reading products include its iOS app, Android App, Mac App, Chrome Extension, and Web App. Google recently named Speechify the Chrome Extension of the Year and Apple named Speechify its 2025 Design Award winner for Inclusivity.

Today, nearly 200 people around the globe work on Speechify in a 100% distributed setting – Speechify has no office. These include frontend and backend engineers, AI research scientists, and others from Amazon, Microsoft, and Google, leading PhD programs like Stanford, and high-growth startups like Stripe, Vercel, and Bolt, as well as many founders of their own companies.

Overview

We're looking to hire for the Data side of our AI team at Speechify. This role is responsible for all aspects of data collection to support our model training operations. We are able to build high-quality datasets at petabyte scale and low cost through a tight integration of infrastructure, engineering, and research work.

We are looking for a skilled Software Engineer to join us.

What You’ll Do

- Be scrappy in finding new sources of audio data and bringing them into our ingestion pipeline.
- Operate and extend the cloud infrastructure for our ingestion pipeline, currently running on GCP and managed with Terraform.
- Collaborate closely with our Scientists to shift the cost/throughput/quality frontier, delivering richer data at bigger scale and lower cost to power our next-generation models.
- Collaborate with others on the AI Team and Speechify Leadership to craft the AI Team’s dataset roadmap to power Speechify’s next-generation consumer and enterprise products.

An Ideal Candidate Should Have

- BS/MS/PhD in Computer Science or a related field.
- 5+ years of industry experience in software development.
- Proficiency with bash/Python scripting in Linux environments.
- Proficiency in Docker and Infrastructure-as-Code concepts, and professional experience with at least one major Cloud Provider (we use GCP).
- Experience with web crawlers and large-scale data processing workflows is a plus.
- Ability to handle multiple tasks and adapt to changing priorities.
- Strong communication skills, both written and verbal.

What We Offer

- A fast-growing environment where you can help shape the company and product.
- An entrepreneurial-minded team that supports risk, intuition, and hustle.
- A hands-off management approach so you can focus and do your best work.
- An opportunity to make a big impact in a transformative industry.
- Competitive salaries, a friendly and laid-back atmosphere, and a commitment to building a great asynchronous culture.
- Opportunity to work on a life-changing product that millions of people use.
- Build products that directly impact and support people with learning differences like dyslexia, ADD, low vision, concussions, autism, and more.
- Work in one of the fastest-growing sectors of tech, the intersection of artificial intelligence and audio.

Compensation

The United States base salary range for this full-time position is $140,000-$200,000 + bonus + equity, depending on experience.

Think you’re a good fit for this job? Tell us more about yourself and why you're interested in the role when you apply. And don’t forget to include links to your portfolio and LinkedIn.

Not looking but know someone who would make a great fit? Refer them!

Speechify is committed to a diverse and inclusive workplace. Speechify does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status.

More Data Engineer Jobs

Software Engineer, Data Infrastructure & Acquisition - San Francisco, USA

Speechify

Get your reading done faster, easier, and on the go. Listen to any book, document, or website with Speechify.

Data Engineer | 90 days ago

Other Remote | Team 51-200 | H1B Sponsor

Python Shell Linux Docker GCP Terraform

United States

$140K - $200K / year

Job Closed

Healthcare Data Architect & Manager

Harbor Health Austin

Harbor Health of Austin, Texas, also known as Harbor Health Team, Inc., is a new model of cocreated health in Austin. The company helps its patients make the ri

Data Engineer | 90 days ago

Other Remote

This description is a summary of our understanding of the job description. Click on 'Apply' button to find out more.

Role Description

Harbor Health is rebuilding the operating system of healthcare. The HDS team contributes to this by building "HarborOS", the central nervous system of our clinical operations, tying together information from different clinical sources to drive optimal outcomes for our members. Data is our product.

We are looking for a Healthcare Data Architect & Manager to lead the design of foundational data structures and ontologies that will help us transform healthcare. You will not just move data; you will define how our business understands it. You will take raw chaos (claims, ADT, disparate EMR feeds) and architect the trusted data models and master indexes that power our patient care, risk adjustment, and financial decisions.

What You Will Do

- Architect the Healthcare Data Model: Design the core schemas and data models that define our business. You will decide how we model complex, systemic concepts like "Attribution," "Risk," and "Claims Adjudication" to ensure historical accuracy, scalability, and query performance.
- Define the Technology Standards & Strategy: Set the overarching strategy for our data stack (Snowflake, realtime streams, etc.) and dbt environments. You will dictate our approach to data schema, reporting schema, stream and transactional use of data, dimensional modeling (Kimball/Star Schema), denormalization vs. normalization, and Medallion architecture progression.
- Data Usability and Access: Design and implement the deterministic and probabilistic matching logic to assign unique identifiers to patients coming from highly fragmented external sources (HIEs, external EMRs, payers).
- Bridge Engineering and Clinical Operations: Act as the primary strategic interface between Clinical/Business Operations and Engineering. You will interview stakeholders to uncover the "Why" behind their requests and translate complex, ambiguous business needs into precise architectural specifications.
- Domain Leadership & Mentorship: Act as the deep subject matter expert on healthcare data for the engineering team. Mentor our engineers on healthcare domain nuances (e.g., why a reversal claim behaves differently than a void) and raise the collective technical bar for system design.
- Technical Ownership & Execution: Drive complex initiatives from inception to delivery. Leveraging your deep healthcare expertise, you will interpret asks from clinical & business stakeholders and work with the engineering team to design and deliver the optimal data architecture.

Qualifications

- Healthcare “Payvider” Domain Mastery: You bring deep subject matter expertise in healthcare data domains on both the provider and payer sides. You are fluent in EDI, FHIR, ADT, and other standards used in healthcare, and in how this data is aggregated & modeled to support execution.
- Data Architecture Fundamentals: Proven experience designing highly scalable data platforms from scratch. You deeply understand how column-oriented databases (Snowflake) and realtime processing work under the hood, and how to design schemas that optimize partition pruning and compute costs at scale.
- dbt & Transformation Fluency: You know how to structure a massive dbt project for enterprise scale, including macroarchitecture, incremental strategies, and dependency management, while trading off quality management rigor with velocity.
- Technical Influence: You can deliver hands-on work, writing performant, readable, and complex SQL and reading/reviewing Python pipelines, as well as influence & coach other engineers through systemic standards and architectural reviews.

Strongly Preferred

- You bring deep subject matter expertise in healthcare data domains. You are fluent in processing 837 files, modeling the claim lifecycle (Adjudication, Reversal, Denial), and handling provider data nuances (NPI vs. TIN).
- Proven experience designing Dimensional Models (Kimball/Star Schema) from scratch. You know when to denormalize for performance and when to normalize for integrity.

Our Stack

- Transformation: dbt (Core/Cloud)
- Warehouse: Snowflake
- Ingestion: Fivetran, Python (Custom)
- Orchestration: Airflow / Dagster
- BI/Semantic Layer: Omni

Benefits

- Opportunity to shape the financial foundation of an innovative healthcare model
- Collaborative and dynamic work environment
- An organization made of people who are passionate about changing the healthcare landscape
- Competitive salary, company equity and robust benefits package
- Professional development and growth opportunities
- A transparent and unique culture

Harbor Health is an Equal Opportunity Employer. We do not discriminate on the basis of race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, veteran status, or any other characteristic protected by law. We are committed to creating an inclusive environment for all clinicians and teammates and actively encourage applications from people of all backgrounds.

Snowflake dbt SQL Python Airflow

United States

Senior Data Engineer – Data Vault, Databricks

Keyrus

#MakeDataMatter #HumanizingTheFuture

Data Engineer | 90 days ago

Full Time Remote | Team 1,001-5,000 | Since 1996 | H1B Sponsor

• Design and implement data pipelines that integrate Warehouse Management Systems (WMS) into an existing enterprise data platform
• Extend and maintain a Data Vault 2.0 architecture (Raw Vault, Business Vault, and Information Mart layers)
• Develop data transformations using dbt within a Databricks Lakehouse environment
• Build persistent staging and ingestion pipelines to address operational system data retention limitations
• Integrate data from multiple operational and reference sources to enable consolidated cross-system reporting
• Contribute to the creation of scalable Information Marts that support high-frequency analytical reporting
• Follow and enhance existing automation frameworks and development standards used in the platform
• Collaborate with data architects, engineers, and business stakeholders to ensure solutions are robust, scalable, and future-ready
• Contribute to designing architectures that can evolve from batch ingestion to real-time data processing

SQL HashiCorp Vault

Portugal

€50K - €60K / year

Job Closed

Senior Data Engineer – Microsoft Fabric

Upstart 13

Bringing down borders in technology.

Data Engineer | 90 days ago

Other Remote | Team 51-200 | H1B No Sponsor

• Own the design, build-out, and evolution of our data platform on Microsoft Fabric
• Design and implement end-to-end ELT pipelines using Data Factory, Spark notebooks (PySpark/Spark SQL), and Dataflows Gen2
• Build and maintain a medallion architecture (Bronze → Silver → Gold)
• Implement data validation, schema enforcement, and automated quality checks; partner with governance on lineage
• Manage Fabric platform operations, capacity monitoring, Git-based CI/CD for notebooks and pipelines
• Deliver well-structured Gold-layer tables optimized for Direct Lake
• Design and implement RTM Isolation strategies using CDC
• Collaborate with BI stakeholders to deliver AI-Ready Gold-layer tables

ETL PySpark Apache Spark SQL

United States

Job Closed