Job Closed

This listing is no longer active.

Andromeda

Where technology meets empathy – pioneering the future of human-robot interaction.

Performance Engineer – AI Infrastructure

LLM Engineer | Machine Learning Engineer | Other | Remote | Senior | Team 11-50 | H1B Sponsor

Location

California

Posted

156 days ago

Salary

Seniority

Senior

Bachelor Degree | English | Kubernetes | Python | PyTorch | Rust | TensorFlow

Job Description

• Conduct end-to-end profiling of training workloads to identify bottlenecks across GPU kernels, NCCL communication, and storage I/O
• Collaborate with systems engineers to improve scheduling efficiency, collective communication performance, and kernel execution
• Build and maintain high-fidelity tooling to monitor and visualize MFU, throughput, and cluster uptime
• Design technical processes that help the team operate effectively and avoid repeating performance regressions

Job Requirements

Proven experience running distributed training jobs on multi-GPU systems or HPC clusters
Strong programming skills in Python and C++ (Rust or CUDA experience is a major plus)
Solid understanding of PyTorch, JAX, or TensorFlow, and large-scale training loops
Familiarity with modern cloud infrastructure, including Kubernetes and Infrastructure as Code
Passion for measuring efficiency rigorously and translating raw profiling data into practical engineering improvements

Benefits

Ownership and autonomy to shape how systems run
A culture that celebrates diversity and fosters an inclusive environment

More LLM Engineer Jobs

Senior Conversational AI Engineer

Miratech

Helping Visionaries Change the World

LLM Engineer | 156 days ago

Full Time | Remote | Team 501-1,000 | Since 1989 | H1B No Sponsor

• Design, develop, and scale agentic AI systems using Google Agent Development Kit (ADK), ensuring enterprise-grade performance, security, and scalability.
• Architect and implement multi-agent workflows, tool orchestration, and stateful conversational systems integrated with Dialogflow CX/ES.
• Develop production-grade Python services (FastAPI, Flask, or equivalent) to support middleware, APIs, and enterprise integrations.
• Design and deploy scalable solutions on Google Cloud Platform (GCP), leveraging services such as CCAI, Cloud Run, Cloud Functions, Pub/Sub, and BigQuery.
• Implement advanced prompt engineering strategies, NLP/NLU best practices, context management, and robust error handling to optimize conversational experiences.
• Integrate conversational agents with enterprise platforms (CRM systems, contact centers, databases) while ensuring observability through logging, monitoring, and performance optimization.
• Provide technical leadership through architecture reviews, mentorship, best-practice enforcement, and cross-functional collaboration with product, DevOps, and business stakeholders.

BigQuery Django Flask GCP Python Terraform

India

Job Closed

Senior Datacenter Architect – AI Infrastructure

ePlus Technology Solutions

Dedicated, capable, growing, reaching far, ...

LLM Engineer | 158 days ago

Other | Remote | Team 51-200 | Since 2015 | H1B No Sponsor

• Design and deliver end-to-end data center solutions covering compute, storage, and networking
• Deploy and manage GPU-based systems (NVIDIA DGX, HGX, or similar) for AI and HPC workloads
• Implement and support virtualization platforms (VMware ESXi, vCenter, vSAN, NSX)
• Build and manage containerized environments using Kubernetes or related platforms
• Automate infrastructure provisioning and operations using Ansible, Terraform, or scripting (Bash/Python)
• Conduct infrastructure assessments, capacity planning, and performance tuning
• Work closely with networking, storage, and DevOps teams to ensure smooth integration and delivery
• Create and maintain technical documentation for customers and internal teams

Ansible Kubernetes Python Terraform VMware

United States

$125K - $170K / year

Job Closed

Director, Data Center Energy Strategy – AI Infrastructure

EQL Tech (sales & engineering talent)

Tech recruitment specialists, scaling AI-native startups by hiring top 1% Sales, GTM & Engineering talent globally.

LLM Engineer | 158 days ago

Other | Remote | Team 1-10 | Since 2025 | H1B No Sponsor

• Define the Standard: Establish technical and operational frameworks for solar + storage, fire safety, and water usage in next-gen data centers.
• Drive the Narrative: Reframe solar as critical infrastructure for national security and economic competitiveness.
• Build the Coalition: Engage directly with Frontier AI labs, hyperscalers, and energy experts to move solar-first design from concept to pilot.
• Navigate Siting: Work with federal and local authorities to define permitting pathways for industrial and public land (e.g., BLM).
• Publish the Manifesto: Author and gain external validation for a "Data Center Manifesto" defining best practices for the industry.

United States

Job Closed