Member of Technical Staff, TPU & AMD GPU Performance Engineering

Full-stack Engineer | Software Engineer | Full Time | Remote | Lead | Team 11-50 | Since 2025 | H1B No Sponsor

Location

United States

Posted

4 days ago

Salary

$200K - $400K / year

Seniority

Lead

Job Description

Inferact

Role Description

We're looking for a TPU and AMD GPU performance engineer to make vLLM a first-class inference engine across non-NVIDIA accelerators. Frontier inference cannot be locked to one hardware stack. As AMD GPUs, TPUs, and other accelerators become increasingly important, vLLM needs backend paths that are fast, correct, benchmarked, and maintainable across heterogeneous hardware platforms.

- Build and optimize AMD GPU and TPU backends, kernels, compiler integrations, runtime paths, and benchmarking infrastructure.
- Work at the boundary of inference systems, kernels, compilers, and hardware architecture.
- Improve paths such as attention, GEMM, sampling, KV-cache, communication-heavy operations, and model serving on non-NVIDIA hardware.
- Your work will directly impact how broadly and efficiently the world can run AI inference with vLLM.

Qualifications

- Bachelor's degree or equivalent experience in computer science, engineering, machine learning systems, hardware systems, compilers, or a similar field.
- Hands-on experience optimizing workloads on AMD GPUs, TPUs, or another non-NVIDIA accelerator stack.
- Experience with AMD ecosystem tools such as ROCm, HIP, Triton, CK, AITER, or equivalent GPU performance libraries and tooling.
- Experience with TPU, XLA, JAX, Pallas, or related compiler and runtime tooling for accelerator workloads.
- Ability to optimize ML inference paths such as attention, GEMM, sampling, KV-cache, fused kernels, backend runtimes, or communication-heavy operations.
- Strong performance profiling and benchmarking discipline, including tokens/second, latency, throughput, correctness parity, hardware counters, and reproducible measurement methodology.
- Ability to navigate immature tooling, incomplete documentation, backend-specific rough edges, and cross-platform performance differences without getting stuck.

Requirements

- Experience with vLLM, SGLang, TensorRT-LLM, ATOM, JAX-based serving frameworks, or other LLM inference systems.
- Deep understanding of inference architecture and serving tradeoffs, including batching, KV-cache, decoding, prefill/decode scheduling, and backend performance constraints.
- Experience with compiler technologies such as XLA, MLIR, LLVM, Triton, Pallas, or other compiler/kernel DSLs, including lowering, fusion, and backend code generation.
- Knowledge of quantization techniques such as MXFP8, MXFP4, mixed precision, or hardware-specific numeric formats, and the ability to reason about accuracy/performance tradeoffs.
- Experience with distributed inference performance, including communication, memory movement, hardware topology, and scale-out bottlenecks across multi-accelerator workloads.
- Open-source contributions to vLLM, JAX/XLA, ROCm, Triton, PyTorch, compiler projects, or related ML systems infrastructure.

Benefits

- Generous health, dental, and vision benefits.
- 401(k) company match.

Logistics

- Location: This role is based in San Francisco, California. We will consider remote work within the US for exceptional candidates.
- Compensation: Depending on background, skills, and experience, the expected annual salary range for this position is $200,000 - $400,000 USD + equity.
- Visa sponsorship: We sponsor visas on a case-by-case basis.
