Software Engineer, Data Infrastructure & Acquisition - Boston, USA

Get your reading done faster, easier, and on the go. Listen to any book, document, or website with Speechify.

Other Remote - Team 51-200 - H1B Sponsor

The mission of Speechify is to make sure that reading is never a barrier to learning. Over 50 million people use Speechify’s text-to-speech products to turn whatever they’re reading – PDFs, books, Google Docs, news articles, websites – into audio, so they can read faster, read more, and remember more. Speechify’s text-to-speech reading products include its iOS app, Android App, Mac App, Chrome Extension, and Web App. Google recently named Speechify the Chrome Extension of the Year and Apple named Speechify its 2025 Design Award winner for Inclusivity.

Today, nearly 200 people around the globe work on Speechify in a 100% distributed setting – Speechify has no office. These include frontend and backend engineers, AI research scientists, and others from Amazon, Microsoft, and Google, leading PhD programs like Stanford, and high-growth startups like Stripe, Vercel, and Bolt, as well as many founders of their own companies.

Overview

We're hiring for the Data side of our AI team at Speechify. This role is responsible for all aspects of data collection to support our model training operations. We are able to build high-quality datasets at petabyte scale and low cost through a tight integration of infrastructure, engineering, and research work. We are looking for a skilled Software Engineer to join us.

What You’ll Do

- Be scrappy in finding new sources of audio data and bringing them into our ingestion pipeline.
- Operate and extend the cloud infrastructure for our ingestion pipeline, currently running on GCP and managed with Terraform.
- Collaborate closely with our Scientists to shift the cost/throughput/quality frontier, delivering richer data at bigger scale and lower cost to power our next-generation models.
- Collaborate with others on the AI Team and Speechify Leadership to craft the AI Team’s dataset roadmap to power Speechify’s next-generation consumer and enterprise products.

An Ideal Candidate Should Have

- BS/MS/PhD in Computer Science or a related field.
- 5+ years of industry experience in software development.
- Proficiency with bash/Python scripting in Linux environments.
- Proficiency in Docker and Infrastructure-as-Code concepts, and professional experience with at least one major Cloud Provider (we use GCP).
- Experience with web crawlers and large-scale data processing workflows is a plus.
- Ability to handle multiple tasks and adapt to changing priorities.
- Strong communication skills, both written and verbal.

What We Offer

- A fast-growing environment where you can help shape the company and product.
- An entrepreneurial-minded team that supports risk, intuition, and hustle.
- A hands-off management approach so you can focus and do your best work.
- An opportunity to make a big impact in a transformative industry.
- Competitive salaries, a friendly and laid-back atmosphere, and a commitment to building a great asynchronous culture.
- Opportunity to work on a life-changing product that millions of people use.
- Build products that directly impact and support people with learning differences like dyslexia, ADD, low vision, concussions, autism, and more.
- Work in one of the fastest-growing sectors of tech, the intersection of artificial intelligence and audio.

Compensation: The United States base salary range for this full-time position is $140,000-$200,000 + bonus + equity, depending on experience.

Think you’re a good fit for this job? Tell us more about yourself and why you're interested in the role when you apply. And don’t forget to include links to your portfolio and LinkedIn.

Not looking but know someone who would make a great fit? Refer them!

Speechify is committed to a diverse and inclusive workplace. Speechify does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status.

Python Shell Docker Terraform GCP

United States

$140K - $200K / year

Job Closed

Software Engineer, Data Infrastructure & Acquisition - Salt Lake City, USA

Speechify

Data Engineer - 79 days ago

Other Remote - Team 51-200 - H1B Sponsor

Python Shell Linux Docker Terraform GCP Infrastructure as Code

United States

$140K - $200K / year

Job Closed

Software Engineer, Data Infrastructure & Acquisition - Seattle, USA

Speechify

Data Engineer - 79 days ago

Other Remote - Team 51-200 - H1B Sponsor

Shell Python Linux Docker GCP Terraform

United States

$140K - $200K / year

Job Closed

Senior Scientific Data Platform Engineer (Joint Genome Institute)

Lawrence Berkeley National Laboratory

Data Engineer - 79 days ago

Other Remote - Team 1,001-5,000

Berkeley Lab’s (LBNL) Joint Genome Institute (JGI) has an opening for a Senior Scientific Data Platform Engineer to play a critical role in transforming raw scientific outputs into high-value, AI-ready data assets that directly support JGI's mission.

In this exciting role, you will analyze scientific use-cases and data management challenges, and design robust, automated, and cost-effective solutions to address them. Working at the intersection of scientific data and the data lakehouse platform, this position will contribute to the development and maintenance of JGI’s Data Lakehouse effort, leading ongoing integration efforts to ensure scientific data is well-structured, accessible, and optimized for use by domain scientists and downstream AI applications.

The JGI’s mission is to provide the global research community with access to the most advanced integrative genome science capabilities in support of the DOE’s research mission to solve the world’s evolving energy and environmental challenges. The JGI supports projects in genome sequencing, synthesis, transcriptomics, metabolomics, and natural products in plants, fungi, algae, and microorganisms.

This position is headquartered on the Lab’s main site at the Integrative Genomics Building (IGB). This position has an anticipated start date of May 1, 2026.

We’re here for the same mission, to bring science solutions to the world. Join our team and YOU will play a supporting role in our goal to address global challenges! Have a high level of impact and work for an organization associated with 17 Nobel Prizes!

Why join Berkeley Lab?

We invest in our employees by offering a total rewards package you can count on:

- Exceptional health and retirement benefits, including pension or 401K-style plans
- A culture where you’ll belong - we are invested in our teams!
- In addition to accruing vacation and sick time, we also have a Winter Holiday Shutdown every year.
- Parental bonding leave (for both mothers and fathers)
- Pet insurance

What You Will Do:

- Analyze and evaluate complex scientific use-cases and design automated system solutions.
- Provide technical expertise in identifying, evaluating, and developing cost-effective systems and procedures that meet user requirements.
- Lead the design and implementation of data integration processes for the JGI's Data Lakehouse, ensuring large scientific datasets are structured for efficient querying and analysis.
- Design, build, and maintain fault-tolerant, scalable, and efficient Extract, Transform, Load (ETL) data pipelines to ingest, transform, and load genomic data and associated metadata into the Data Lakehouse.
- Configure system settings and options.
- Plan and perform unit, integration, and acceptance testing.
- Create system specifications aligned with business requirements.
- Provide consultation and guidance to domain scientists and other users on the use of automated systems.
- Collaborate closely with cross-functional teams to resolve business and system-related issues.

What Is Required:

- A Bachelor’s Degree (or equivalent knowledge/training) in Computer Science, Data Engineering, or a related technical field and a minimum of 8 years of demonstrated experience structuring large-scale datasets for efficient use in Data Lakehouse environments, leveraging technologies such as Parquet, Iceberg, Dremio, Spark, or similar lakehouse and data warehousing platforms, or an equivalent combination of education and experience.
- Demonstrated proficiency with modern Extract, Transform, Load (ETL) and Extract, Load, Transform (ELT) tools and frameworks.
- Strong scripting skills in data engineering languages, including Python (with Pandas and PySpark) and advanced SQL for data manipulation and performance optimization.
- Strong analytical skills, including the ability to identify problems, troubleshoot, and demonstrate good judgement in selecting methods and techniques for obtaining solutions.
- Excellent oral and written communication skills, including experience organizing and presenting technical information to varying audiences.
- Demonstrated interpersonal skills, including experience collaborating with an interdisciplinary research team.

Desired Qualifications:

- A Master’s Degree (or equivalent knowledge/training) in Computer Science, Data Engineering, or a related technical field.
- Experience with Data Lakehouse technologies like Dremio or Spark.
- Domain knowledge of genomics data.

Additional Information:

- Application Date: Priority consideration will be given to candidates who apply with a resume and cover letter by March 31, 2026. Applications will be accepted until the job posting is removed.
- Appointment Type: This is a full-time, exempt from overtime pay (monthly paid), 2-year (benefits eligible) Term appointment with the possibility of extension or conversion to Career appointment based upon satisfactory job performance, continuing availability of funds, and ongoing operational needs.
- Salary Range: This position has a budgeted salary range of $139,440 - $174,312 annually, which fits within the full salary range of $139,440 - $235,308 annually for job code C71.3. It is not typical for an individual to be offered a salary at or near the top of the range for a position. Salary will be commensurate with the final candidate’s qualification and experience, including skills, knowledge, relevant education, and certifications, and aligned with the internal peer group.
- Background Check: This position is subject to a background check. Any convictions will be evaluated to determine if they directly relate to the responsibilities and requirements of the position. Having a conviction history will not automatically disqualify an applicant from being considered for employment.
- Work Modality: This position offers flexibility in work mode, including onsite, hybrid, full-time telework, or remote work, provided the individual resides within the contiguous United States. Onsite work will take place at Lawrence Berkeley National Lab, located at 1 Cyclotron Road, Berkeley, CA 94720. Work schedules are dependent on business needs. A REAL ID or other acceptable form of identification is required to access Berkeley Lab sites.
- Relocation Assistance: This position is eligible for relocation assistance.
- Work Authorization: Applicants must be legally authorized to work in the United States. Berkeley Lab does not provide visa sponsorship for this position.

Want to learn more about working at Berkeley Lab? Please visit: careers.lbl.gov

Equal Employment Opportunity Employer: The foundation of Berkeley Lab is our Stewardship Values: Team Science, Service, Trust, Innovation, and Respect; and we strive to build community with these shared values and commitments. Berkeley Lab is an Equal Opportunity Employer. We heartily welcome applications from all who could contribute to the Lab's mission of leading scientific discovery, excellence, and professionalism. In support of our rich global community, all qualified applicants will be considered for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, age, protected veteran status, or other protected categories under State and Federal law.

Misconduct Disclosure Requirement: As a condition of employment, the finalist will be required to disclose if they are subject to any final administrative or judicial decisions within the last seven years determining that they committed any misconduct, are currently being investigated for misconduct, left a position during an investigation for alleged misconduct, or have filed an appeal with a previous employer.

Apache Iceberg Apache Spark ETL Python Pandas PySpark SQL

United States

$139K - $174K / year

Job Closed

Software Engineer, Data Infrastructure & Acquisition - Portland, USA
