Senior Software Engineer, NCCL, CUDA
Location
California | Texas | Washington
Posted
35 days ago
Salary
$184K - $287.5K / year
Seniority
Senior
Job Description
NVIDIA
- Engage with our CSPs to root-cause functional and performance issues in the NCCL and CUDA libraries.
- Analyze and improve multi-GPU workload performance through profiling, benchmarking, and tuning.
- Understand and solve NCCL and NVSHMEM data movement issues in multi-node clusters.
- Understand and solve CUDA porting issues for customer workloads.
- Apply datacenter-specific scheduling and topologies for optimal performance.
- Debug and resolve complex issues related to GPU computation, memory, and transports.
- Collaborate with customers to understand the specific challenges of integrating their workloads with the NCCL and CUDA libraries, and suggest tailored solutions aligned with the NVIDIA ecosystem.
- Collaborate with AEs, FAEs, and solution architects to deliver integrated customer solutions and technical documentation.
- Collaborate with internal teams to help customers use the latest advancements in CUDA and NCCL.
Job Requirements
- 8+ years of system software validation experience.
- Excellent C/C++ programming and debugging skills, with experience in CUDA development.
- Deep understanding of operating systems and data-center system architecture.
- Experience with performance optimization and profiling tools (e.g., Nsight, nvprof).
- Good exposure to PCIe and NVLink.
- Knowledge of high-performance networking such as InfiniBand and RoCE.
- Proficient understanding of compute, networking, and cloud deployment, specifically on bare metal and VMs.
- Familiarity with containers, cloud provisioning and scheduling tools such as Docker, Kubernetes, SLURM, and Ansible.
- BS or MS in Computer Engineering, Computer Science, or related field (or equivalent experience).
- Ability to communicate effectively and collaborate with partner and customer teams.
Benefits
- equity
- benefits