AI Benchmarking and Telemetry Engineer

Artificial Intelligence | Other | Remote | Lead | Team 10,001+ | Since 1993 | H1B Sponsor

Location

California | Texas | Virginia

Posted

117 days ago

Salary

$184K - $287.5K / year

Seniority

Lead

Job Description

NVIDIA

  • Formulate benchmarking methods for high-performance computing and AI tasks.
  • Execute these methods end to end on large-scale GPU clusters.
  • Analyze performance metrics to identify optimization opportunities and architectural improvements.
  • Develop and maintain telemetry infrastructure to capture performance data.
  • Collaborate closely with hardware engineering, software development, and customer-facing teams to define performance requirements, fix bottlenecks, and validate configurations against real-world workloads.
  • Deploy and manage observability stacks, including monitoring tools like Prometheus, visualization platforms such as Grafana, NVIDIA's DCGM, and custom telemetry solutions, to provide actionable insights into cluster health, utilization, and performance trends.
  • Work directly with engineering and internal partners to understand their performance requirements, conduct on-site benchmarking engagements, and deliver detailed analysis and recommendations for workload optimization.
  • Maintain extensive knowledge of industry-standard HPC and machine learning benchmarks such as HPL, HPCG, MLPerf, and NCCL tests.
  • Contribute to developing new benchmarking methodologies for emerging workloads.

Job Requirements

  • Bachelor's degree in Computer Science, Electrical Engineering, Computer Engineering, or a related field (or equivalent experience).
  • 8+ years of direct experience working with HPC and/or AI infrastructure, including cluster deployment, performance analysis, and benchmarking.
  • Deep expertise in Linux system administration, including kernel tuning, process scheduling, storage I/O optimization, and solving performance issues at scale.
  • Proven experience crafting and implementing telemetry and monitoring solutions for large-scale distributed systems, with proficiency in tools such as Prometheus, Grafana, DCGM, collectd, or similar observability platforms.
  • Solid grasp of GPU architectures, CUDA programming principles, and GPU performance characteristics in high-performance computing and artificial intelligence workloads.
  • Familiarity with job schedulers (Slurm, PBS, LSF) and container orchestration platforms (Kubernetes, Docker) in HPC/AI environments.
  • Proficiency in Python, Bash, and other scripting languages for automation, data analysis, and workflow orchestration.
  • Excellent analytical and problem-solving skills with the ability to interpret complex performance data and communicate findings to both technical and non-technical audiences.

Benefits

  • Competitive salaries
  • Equity and benefits
