Senior HPC AI Cluster Engineer

Artificial Intelligence | Full Time | Remote | Senior | Team 10,001+ | Since 1993 | H1B Sponsor

Location

Germany

Posted

8 days ago

Salary

0

Seniority

Senior

Job Description

NVIDIA

  • Design, implement, and maintain large-scale HPC/AI clusters with monitoring, logging, and alerting
  • Manage Linux job/workload schedulers and orchestration tools
  • Develop and maintain continuous integration and delivery pipelines
  • Develop tooling to automate deployment and management of large-scale infrastructure environments, to automate operational monitoring and alerting, and to enable self-service consumption of resources
  • Deploy monitoring solutions for servers, network, and storage
  • Troubleshoot bottom-up, from bare metal through the operating system and software stack to the application level
  • As a technical resource, develop, refine, and document standard methodologies to share with internal teams
  • Support Research & Development activities and engage in POCs/POVs for future improvements

Job Requirements

  • A degree in Computer Science, Engineering, or a related field and 8+ years of experience
  • Knowledge of HPC and AI solution technologies, from CPUs and GPUs to high-speed interconnects and supporting software
  • Experience with job scheduling and workload orchestration tools such as Slurm and Kubernetes (K8s)
  • Excellent knowledge of Windows and Linux (Red Hat/CentOS and Ubuntu) networking (sockets, firewalld, iptables, Wireshark, etc.) and internals, ACLs, OS-level security protection, and common protocols such as TCP, DHCP, and DNS
  • Experience with multiple storage solutions such as Lustre, GPFS, Weka.io. Familiarity with newer and emerging storage technologies.
  • Python programming and bash scripting experience.
  • Comfortable with automation and configuration management tools such as Jenkins, Ansible, and Puppet/Chef
  • Deep knowledge of networking protocols such as InfiniBand and Ethernet
  • Deep understanding and experience with virtual systems (for example VMware, Hyper-V, KVM, or Citrix)
  • Familiarity with cloud computing platforms (e.g. AWS, Azure, Google Cloud)

Benefits

  • We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, sex, gender, gender expression, sexual orientation, age, marital status, veteran status, or disability status. We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.
