poolside

World's most capable AI for software development

Member of Engineering – Pre-training, Synthetic Data

Software Engineer · Other · Remote · Senior · Team 51-200 · Since 2023 · H1B No Sponsor

Location

United States

Posted

124 days ago

Salary

0

Seniority

Senior

Bachelor Degree · English · Python

Job Description

  • You’ll be working on our data team, which focuses on the quality of the datasets delivered for training our models.
  • This is a hands-on role where your #1 mission is to improve the quality of the pretraining datasets by leveraging your previous experience, intuition and training experiments.
  • This role particularly focuses on generating synthetic data at scale and determining the best strategies for using such data in training large models.
  • You’ll closely collaborate with other teams like Pretraining, Post-training, Evals, and Product to define high-quality data needs that map to missing model capabilities and downstream use cases.
  • Staying in sync with the latest research in synthetic data generation and pretraining is key to success in this role.
  • You will constantly lead original research initiatives through short, time-bounded experiments while deploying highly technical engineering solutions into production.
  • With the volumes of data to process being massive, you'll have a performant distributed data pipeline together with a large GPU cluster at your disposal.
  • The goal is to deliver large, high-quality, and diverse synthetic datasets mixing natural language and code modalities to train best-in-class coding agents.

Job Requirements

  • Strong machine learning and engineering background
  • Experience with Large Language Models (LLM)
  • Understanding of how LLMs learn
  • Data ablations and scaling laws
  • Post-training techniques
  • Training reasoning and agentic models
  • Experience implementing cost-efficient, complex pipelines to generate synthetic datasets at scale, optimizing for data quality, correctness, diversity, etc.
  • Experience with evals tracking model capabilities (general knowledge, reasoning, math, coding, long-context, etc.)
  • Experience in building trillion-scale pretraining datasets, and familiarity with concepts like data curation, deduplication, data mixing, tokenization, curriculum, impact of data repetition, etc.
  • Excellent programming skills in Python
  • Strong prompt engineering skills
  • Experience working with large-scale GPU clusters and distributed data pipelines
  • Strong obsession with data quality
  • Research experience (nice to have): authorship of scientific papers on topics such as applied deep learning, LLMs, or source code generation
  • Can freely discuss the latest papers and go into fine detail
  • Is reasonably opinionated

Benefits

  • Fully remote work & flexible hours
  • 37 days/year of vacation & holidays
  • Health insurance allowance for you and dependents
  • Company-provided equipment
  • Wellbeing, always-be-learning and home office allowances
  • Frequent team get-togethers
  • Great, diverse & inclusive people-first culture
