Job Closed

This listing is no longer active.

NVIDIA

Systems Software Engineer, AI Infrastructure

LLM Engineer | Machine Learning Engineer | Full Time | Remote | Senior | Team 10,001+ | Since 1993 | H1B Sponsor

Location

India

Posted

160 days ago

Salary

Seniority

Senior

Bachelor Degree | 5 yrs exp | Experience accepted | English
AWS | Azure | GCP | Linux | Perl | Prometheus | Python | Ruby | Terraform

Job Description

• Develop and maintain large-scale systems supporting critical use-cases including frontier model training for AI Infrastructure, driving reliability, operability, and scalability across global public and private clouds.
• Collaborate on tooling for HPC, GPU Training, and AI Model training workflows.
• Build tools and frameworks to improve observability, define actionable reliability metrics, and enable fast issue resolution, driving continuous improvement in system performance.
• Establish frameworks for operational maturity, lead sustainable incident response protocols, and conduct blameless postmortems to improve team efficiency and system resilience.
• Implement SRE fundamentals, including incident management, monitoring, and performance optimization, while designing automation tools to reduce manual processes and operational overhead.
• Work with engineering teams to deliver innovative solutions, uphold high standards for code and infrastructure, and contribute to hiring for a diverse, high-performing team.

Job Requirements

Degree in Computer Science or related field, or equivalent experience with 5+ years in Software Development, SRE, or Production Engineering.
Proficiency in Python and at least one other language (C/C++, Go, Perl, Ruby).
Expertise in systems engineering within Linux or Windows environments and cloud platforms (AWS, Azure, GCP, or OCI).
Strong understanding of SRE principles, including error budgets, SLOs, SLAs, and Infrastructure as Code tools (e.g., Terraform CDK).
Hands-on experience with observability platforms (e.g., ELK, Prometheus, Loki) and CI/CD systems (e.g., GitLab).
Strong communication skills with the ability to convey technical concepts effectively to diverse audiences.
Commitment to fostering a culture of diversity, curiosity, and continuous improvement.

Benefits

highly competitive salaries
comprehensive benefits package

More LLM Engineer Jobs

Senior AI/ML Engineer (LLM)

iBusiness Funding

Helping to provide capital in an efficient and transparent manner to every small and medium-sized business in America.

LLM Engineer | 160 days ago

Other | Remote | Team 201-500 | Since 2013 | H1B No Sponsor

About iBusiness Funding

iBusiness Funding is a software and lender service provider specializing in small business lending. Our technology, team, and process enable us to support loans from $10,000 to $25 million for our lending partners. Our technology solutions have been proven to quickly scale our clients’ portfolios without the need for additional overhead. Our flagship product, LenderAI, features end-to-end lending functionality from sales all the way through servicing. To date, we’ve processed over $11 billion in SBA and non-SBA volume and handle more than 1,000 business loan applications daily. Our team is driven by our core values of innovation, integrity, enjoyment, and family. Join us and be part of a team that’s transforming the finance industry and empowering businesses to thrive!

Position Description

We are seeking an experienced Senior LLM Engineer to join our team. You will play a key role in designing and implementing workflows that leverage large language models (LLMs, LAMs, LMMs, LVLMs, etc.) to automate the process and drive innovation in our products. The ideal candidate will have a deep understanding of NLP, experience with foundational models, and a flexible, problem-solving mindset. You will collaborate closely with cross-functional teams, contributing to the development of scalable AI-driven solutions.

Major Areas of Responsibility

• Design, implement, and optimize workflows that incorporate large language models to automate and enhance product features.
• Leverage existing foundational models and adapt them to fit various product requirements, ensuring alignment with business goals.

AWS | AI / ML | Python

Florida

$170K - $210K / year

Job Closed

Technical Director, Media & Entertainment AI Infrastructure

Nebius Group

LLM Engineer | 166 days ago

Full Time | Remote | Team 1,001-5,000 | H1B No Sponsor

• Own the Technical Blueprint: Personally architect the infrastructure solutions for our most strategic M&E partnerships: studio-scale content production pipelines, agency data consolidation plays, and generative AI model deployments.
• The Physics to P&L Narrative: Fluently demonstrate to executive stakeholders how infrastructure decisions (data lake locality, storage tiering, inference optimization) directly impact their business model and operability.
• Deconstruct the Bottleneck: Go beyond the stated problem to find the technical truth. Translate vague business goals (e.g., “We need lower rendering costs”) into precise engineering requirements (e.g., “We need to optimize the inference batch size on L40s to reduce cost-per-token by 30%”).
• Map the Transition: Identify exactly where a customer sits on the curve from legacy service bureau to AI-native tech platform and prescribe the specific infrastructure intervention needed to move them forward.
• Build and Validate the Integration Layer: Identify, engage, and technically validate relationships with the most critical ISVs in the media and entertainment landscape, from rendering and VFX toolchains to generative AI platforms.
• Define the Standard: It is not enough to support these tools. You will define the reference architectures for how they run best on Nebius infrastructure, and work directly with ISV engineering teams to build and publish those standards.
• Decide What’s Worth Doing: In partnership with the GM, evaluate ISV and partner opportunities on their technical merit and strategic leverage, and be equally rigorous about what not to pursue.
• Shape the M&E Roadmap: Use forensic evidence from the field to prioritize and justify the M&E vertical roadmap. You will work directly with Nebius’s global Head of Product and Head of Engineering to translate partner and customer needs into product direction.
• Lead the M&E Product Summit: Chair a quarterly summit with Core Engineering leadership, using field evidence to drive roadmap decisions and maintain vertical momentum.

Cloud | Distributed Systems | Kubernetes | Go

California + 1 more

$295K - $365K / year

AI/LLM Engineer

Ostro

Knowledge is the best medicine.

LLM Engineer | 166 days ago

Other | Remote | Team 51-200 | H1B No Sponsor

• Develop performant, scalable, and high-quality APIs and backend processes for Ostro's SaaS platform, with a strong emphasis on LLM integration.
• Collaborate with cross-functional teams to implement new features and refine existing ones, particularly those involving AI/LLM capabilities.
• Provide feedback on roadmap and features for your team, contributing to the strategic direction of Ostro’s AI/LLM initiatives.
• Ensure code quality and compliance through thorough reviews, unit testing, and adherence to best practices for LLM-powered applications and Ostro engineering.
• Optimize application performance and scalability to meet user demands, especially for LLM inference and data processing.
• Stay informed about emerging AI/LLM technologies, prompt engineering techniques, and industry trends.
• Troubleshoot and resolve production issues, ensuring performance, reliability, and scalability of LLM-driven features.

Django | Python

United States

$159.3K - $202.4K / year

Job Closed

AI/LLM Engineering – Working Student

Cognitx

Empowering your vision with transformative AI solutions.

LLM Engineer | 173 days ago

Part Time | Remote | Team 51-200 | H1B No Sponsor

• Build and improve agentic workflows (tool/function calling, planning, self-checks) for analytics, summaries, visualizations, and task automation.
• Implement adapters and tools to connect LLMs with internal and external services.
• Contribute to our FastAPI backend with clean interfaces, Pydantic validation, and tests.
• Develop evaluation metrics to measure accuracy, latency, and cost.
• Optimize prompts, retrieval/contexting, and execution strategies for privacy, reliability, and performance.
• Ship services in containers (Docker) and collaborate on deployments (Kubernetes), CI, and observability.
• Document technical decisions and share learnings with the team.

Docker | Kubernetes | Microservices | PostgreSQL | Python | React | Redis | TypeScript

Germany
