The Next Chapter

IT & Technology recruitment - contingency or "Recruiter as a Service" - we're your recruiter

Senior Linux Kernel Engineer – High-Performance Computing

Full-stack Engineer, Software Engineer, Full Time, Remote, Senior, Team 1-10, Since 2021, H1B No Sponsor

Location

Netherlands

Posted

35 days ago

Salary

$200K / year

Seniority

Senior

Bachelor Degree, 5 yrs exp, English, Kubernetes, Linux, Python, PyTorch, TensorFlow, Go

Job Description

  • Tuning the performance of clusters and InfiniBand networks to ensure optimal operation in HPC and GPU-based environments.
  • Analyzing and troubleshooting the root cause of issues related to GPUs and InfiniBand networks, and proposing corrective actions.
  • Integrating new hardware into the existing infrastructure, including support for new GPU hardware through software stacks like Kubernetes, QEMU, and KVM.
  • Enhancing automation systems for proactive monitoring, detecting, and resolving issues in GPU and InfiniBand environments.
  • Configuring and managing GPU devices and InfiniBand fabrics, ensuring efficient and reliable operation.

Job Requirements

  • 5+ years of professional experience in system-level software development, focused on performance optimization and low-level programming.
  • 3+ years of hands-on experience with Linux systems (administration, troubleshooting, and/or performance tuning).
  • Experience with the relevant "tools of the trade" for kernel profiling and tuning: perf, ftrace, (e)BPF, etc.
  • In-depth understanding of server architecture, including PCIe devices, NICs, and the Linux OS/kernel.
  • Strong proficiency in one or more performance-oriented programming languages (C/C++, Go, Python).
It would be a plus (but not essential) if you have:
  • Experience with GPU end-to-end testing in a cluster environment using InfiniBand networking.
  • Proven track record of analyzing and optimizing the performance of HPC workloads (e.g., simulations, data analysis, AI/ML workloads).
  • Familiarity with RDMA, RoCE, and InfiniBand protocols for high-performance communication.
  • Background in Software-Defined Networking (SDN) and experience with HPC cluster networking.
  • Understanding of QEMU/KVM virtualization and managing virtualized environments.
  • Experience with deep learning frameworks such as PyTorch and TensorFlow, and their integration with HPC systems.
  • Familiarity with collective communication libraries like MPI and NCCL for distributed computing.

Benefits

  • Flexible working arrangements
  • A dynamic and collaborative work environment that values initiative and innovation.
