Job Closed

This listing is no longer active.

Senior Software Developer, AI Networking

Full-stack Engineer | Software Engineer | Other | Remote | Senior | Team 10,001+ | Since 1993 | H1B Sponsor

Location

All locations: California | Texas | Washington

Posted

81 days ago

Salary

$152K - $241.5K / year

Seniority

Senior

Bachelor Degree | 3 yrs exp | English | Distributed Systems | Python | PyTorch | TensorFlow

Job Description

NVIDIA

  • Characterizing AI workloads and deep learning models aimed at large-scale LLM training and inference on NVIDIA supercomputers
  • The role centers on distributed systems with a focus on high-performance networking and NVIDIA communication libraries
  • Benchmarking, profiling, and analyzing performance to find bottlenecks and identify opportunities for optimization, with a strong emphasis on networking
  • Developing a PyTorch trace-based profiling, analysis, and replay toolset to aid in benchmarking, debugging, and co-designing network systems for LLM workloads
  • Collaborating with multiple teams, from hardware to software, to provide performance analysis insights
  • Defining performance test plans, setting performance expectations for new technologies and solutions, and working to achieve performance targets.

Job Requirements

  • B.Sc in Computer Science or Software Engineering or equivalent experience
  • 3+ years of experience with high-performance networking (RDMA, MPI, NCCL, SHARP)
  • Demonstrated ability in performance evaluation techniques and approaches
  • Experience with NVIDIA GPUs and the CUDA library
  • Knowledge of deep learning frameworks like TensorFlow or PyTorch
  • Expertise in networking collective communication libraries such as NCCL and protocols like RoCE and RDMA
  • Ability to learn quickly and independently, with strong analytical and problem-solving skills
  • Proficiency in programming languages: Python, Bash, and C++
  • Experience with a container-based development environment.

Benefits

  • Competitive salaries
  • Generous benefits package
  • Equity opportunities

Related Job Pages

More Full-stack Engineer Jobs

Full Time | Remote | Team 1,001-5,000 | H1B Sponsor

  • Take technical ownership of medium to large-scale web platforms, contributing hands-on while guiding architectural and delivery decisions.
  • Translate business goals into robust, scalable technical solutions.
  • Provide technical leadership through mentoring, code reviews, and pairing.
  • Design and evolve front-end and back-end architectures with a focus on performance, accessibility, security, and maintainability.
  • Identify technical risks, delivery bottlenecks, and systemic issues proactively.
  • Champion high-quality engineering practices such as automated testing, CI/CD, and observability.
  • Operate within Agile, Lean, or hybrid delivery models, adapting processes pragmatically.

Sweden
Job Closed

Product Engineer, DoiT Labs

DoiT International

DoiT International is a computer software company that is on a mission to help clients “focus on building the best products for their own customers.” As an

  • Full-lifecycle problem solving
  • Own problems end-to-end: from understanding user pain, through solution design, implementation, release, measurement, and iteration - not just the coding step.
  • Engage directly with customers and internal domain experts to build deep empathy for the workflows and challenges of cloud operators and FinOps practitioners.
  • Translate ambiguous problem spaces into clear, thin-sliced increments that can be shipped, measured, and learned from quickly.
  • Use AI tools daily to amplify your own engineering work - coding, analysis, research, and prototyping.
  • Design and build AI-powered features as a default approach: intelligent recommendations, automated insights, natural-language interfaces, and predictive capabilities for cloud cost optimization.
  • Make informed decisions on model selection, prompt engineering, latency/accuracy/cost tradeoffs, and responsible AI considerations as a core part of your engineering practice.
  • Operate with a bias toward action: prototype rapidly, ship frequently, and validate ideas through real customer usage rather than prolonged planning cycles.
  • Build experiments and MVPs that generate measurable learning - and use those learnings to decide what to invest in next.
  • Maintain high engineering standards without letting perfection slow down delivery; know when to take deliberate shortcuts and when to invest in durability.
  • Build across the full stack - backend services, APIs, data pipelines, and frontend interfaces - whatever the problem demands.
  • Work with cloud-native billing, usage, and operational data from AWS, GCP, and Azure to build cost optimization and governance capabilities.
  • Develop solutions that operate across Kubernetes environments, data cloud platforms, and broader multi-cloud infrastructure.
  • Build state-of-the-art solutions for Generative AI observability and FinOps - enabling customers to understand, monitor, and optimize the cost and performance of their AI/ML workloads across cloud environments.
  • Take full ownership of the solutions you ship - including reliability, user experience, and measurable outcomes.
  • Define what success looks like for your work using clear metrics: adoption, activation, workflow improvement, cost savings delivered, and customer-reported impact.
  • Participate in customer conversations and feedback loops to continuously validate direction and surface new opportunities.

Portugal
Job Closed

Product Engineer, DoiT Labs

DoiT International

DoiT International is a computer software company that is on a mission to help clients “focus on building the best products for their own customers.” As an

  • Full-lifecycle problem solving
  • Own problems end-to-end: from understanding user pain, through solution design, implementation, release, measurement, and iteration - not just the coding step.
  • Engage directly with customers and internal domain experts to build deep empathy for the workflows and challenges of cloud operators and FinOps practitioners.
  • Translate ambiguous problem spaces into clear, thin-sliced increments that can be shipped, measured, and learned from quickly.
  • Use AI tools daily to amplify your own engineering work - coding, analysis, research, and prototyping.
  • Design and build AI-powered features as a default approach: intelligent recommendations, automated insights, natural-language interfaces, and predictive capabilities for cloud cost optimization.
  • Make informed decisions on model selection, prompt engineering, latency/accuracy/cost tradeoffs, and responsible AI considerations as a core part of your engineering practice.
  • Operate with a bias toward action: prototype rapidly, ship frequently, and validate ideas through real customer usage rather than prolonged planning cycles.
  • Build experiments and MVPs that generate measurable learning - and use those learnings to decide what to invest in next.
  • Maintain high engineering standards without letting perfection slow down delivery; know when to take deliberate shortcuts and when to invest in durability.
  • Build across the full stack - backend services, APIs, data pipelines, and frontend interfaces - whatever the problem demands.
  • Work with cloud-native billing, usage, and operational data from AWS, GCP, and Azure to build cost optimization and governance capabilities.
  • Develop solutions that operate across Kubernetes environments, data cloud platforms, and broader multi-cloud infrastructure.
  • Build state-of-the-art solutions for Generative AI observability and FinOps - enabling customers to understand, monitor, and optimize the cost and performance of their AI/ML workloads across cloud environments.
  • Take full ownership of the solutions you ship - including reliability, user experience, and measurable outcomes.
  • Define what success looks like for your work using clear metrics: adoption, activation, workflow improvement, cost savings delivered, and customer-reported impact.
  • Participate in customer conversations and feedback loops to continuously validate direction and surface new opportunities.

Poland
Job Closed

Product Engineer

DoiT International

DoiT International is a computer software company that is on a mission to help clients “focus on building the best products for their own customers.” As an

  • Full-lifecycle problem solving
  • Own problems end-to-end: from understanding user pain, through solution design, implementation, release, measurement, and iteration - not just the coding step.
  • Engage directly with customers and internal domain experts to build deep empathy for the workflows and challenges of cloud operators and FinOps practitioners.
  • Translate ambiguous problem spaces into clear, thin-sliced increments that can be shipped, measured, and learned from quickly.
  • Use AI tools daily to amplify your own engineering work - coding, analysis, research, and prototyping.
  • Design and build AI-powered features as a default approach: intelligent recommendations, automated insights, natural-language interfaces, and predictive capabilities for cloud cost optimization.
  • Operate with a bias toward action: prototype rapidly, ship frequently, and validate ideas through real customer usage rather than prolonged planning cycles.
  • Build experiments and MVPs that generate measurable learning - and use those learnings to decide what to invest in next.
  • Maintain high engineering standards without letting perfection slow down delivery; know when to take deliberate shortcuts and when to invest in durability.
  • Build across the full stack - backend services, APIs, data pipelines, and frontend interfaces - whatever the problem demands.
  • Work with cloud-native billing, usage, and operational data from AWS, GCP, and Azure to build cost optimization and governance capabilities.
  • Develop solutions that operate across Kubernetes environments, data cloud platforms, and broader multi-cloud infrastructure.
  • Build state-of-the-art solutions for Generative AI observability and FinOps - enabling customers to understand, monitor, and optimize the cost and performance of their AI/ML workloads across cloud environments.
  • Take full ownership of the solutions you ship - including reliability, user experience, and measurable outcomes.
  • Define what success looks like for your work using clear metrics: adoption, activation, workflow improvement, cost savings delivered, and customer-reported impact.
  • Participate in customer conversations and feedback loops to continuously validate direction and surface new opportunities.

Romania
Job Closed