NVIDIA

Principal Software Engineer – Large-Scale LLM Memory and Storage Systems

Full-stack Engineer, Software Engineer, Other, Remote, Lead, Team 10,001+, Since 1993, H1B Sponsor

Location

California + 2 more

Posted

166 days ago

Salary

$272K - $425.5K / year

Seniority

Lead

Postgraduate Degree, 15 yrs exp, English, Distributed Systems, Python

Job Description

• Design and evolve a unified memory layer that spans GPU memory, pinned host memory, RDMA-accessible memory, SSD tiers, and remote file/object/cloud storage to support large-scale LLM inference
• Architect and implement deep integrations with leading LLM serving engines (such as vLLM, SGLang, TensorRT-LLM), with a focus on KV-cache offload, reuse, and remote sharing across heterogeneous and disaggregated clusters
• Co-design interfaces and protocols that enable disaggregated prefill, peer-to-peer KV-cache sharing, and multi-tier KV-cache storage (GPU, CPU, local disk, and remote memory) for high-throughput, low-latency inference
• Partner closely with GPU architecture, networking, and platform teams to exploit GPUDirect, RDMA, NVLink, and similar technologies for low-latency KV-cache access and sharing across heterogeneous accelerators and memory pools
• Mentor senior and junior engineers, set technical direction for memory and storage subsystems, and represent the team in internal reviews and external forums (open source, conferences, and customer-facing technical deep dives)

Job Requirements

Master's or PhD, or equivalent experience
15+ years of experience building large-scale distributed systems, high-performance storage, or ML systems infrastructure in C/C++ and Python, with a track record of delivering production services
Deep understanding of memory hierarchies (GPU HBM, host DRAM, SSD, and remote/object storage) and experience designing systems that span multiple tiers for performance and cost efficiency
Experience with distributed caching or key-value systems, especially designs optimized for low latency and high concurrency
Hands-on experience with networked I/O and RDMA/NVMe-oF/NVLink-style technologies, and familiarity with concepts like disaggregated and aggregated deployments for AI clusters
Strong skills in profiling and optimizing systems across CPU, GPU, memory, and network, using metrics to drive architectural decisions and validate improvements in TTFT and throughput
Excellent communication skills and prior experience leading cross-functional efforts with research, product, and customer teams

Benefits

Equity
Benefits


