Senior DGX Cloud AI Infrastructure Software Engineer

LLM Engineer | Machine Learning Engineer | Other | Remote | Senior | Team 10,001+ | Since 1993 | H1B Sponsor

Location

California | Oregon | Texas | Washington

Posted

135 days ago

Salary

$184K - $287.5K / year

Seniority

Senior

Bachelor Degree | 8 yrs exp | English | Distributed Systems | Prometheus | Python

Job Description

NVIDIA

  • Develop infrastructure software and tools for large-scale pre-training, post-training, and inference.
  • Develop and optimize tools and libraries to improve infrastructure efficiency and resiliency.
  • Co-design and implement APIs for integration with NVIDIA's resiliency stacks.
  • Enhance infrastructure and products underpinning NVIDIA's AI platforms.
  • Define meaningful and actionable reliability metrics to track and improve system and service reliability.
  • Apply problem-solving, root cause analysis, and optimization skills.
  • Root-cause, analyze, and triage failures from the application level to the hardware level.

Job Requirements

  • 8+ years of experience developing software infrastructure for large-scale AI systems.
  • Bachelor's degree or higher in Computer Science or a related technical field (or equivalent experience).
  • Strong debugging skills and experience in analyzing and triaging AI applications from the application level to the hardware level.
  • Experience with observability platforms for monitoring and logging (e.g., ELK, Prometheus, Loki).
  • Proven track record in building and scaling large-scale distributed systems.
  • Experience with AI training and inference infrastructure services.
  • Proficiency in programming languages such as Python and C/C++, and in scripting languages.
  • Experience in quality software engineering practices, including test development, defensive programming, version control, and CI.
  • Excellent communication and collaboration skills; a culture of diversity, intellectual curiosity, problem solving, and openness is essential.

Benefits

  • equity
  • benefits
