Senior HPC AI Cluster Engineer
Location
Germany
Posted
8 days ago
Salary
0
Seniority
Senior
Job Description
NVIDIA
• Design, implement, and maintain large-scale HPC/AI clusters with monitoring, logging, and alerting
• Manage Linux job/workload schedulers and orchestration tools
• Develop and maintain continuous integration and delivery pipelines
• Develop tooling to automate deployment and management of large-scale infrastructure environments, to automate operational monitoring and alerting, and to enable self-service consumption of resources
• Deploy monitoring solutions for servers, network, and storage
• Troubleshoot from the bottom up: bare metal, operating system, software stack, and application level
• As a technical resource, develop, refine, and document standard methodologies to share with internal teams
• Support Research & Development activities and engage in POCs/POVs for future improvements
Job Requirements
- A degree in Computer Science, Engineering, or a related field and 8+ years of experience
- Knowledge of HPC and AI solution technologies, from CPUs and GPUs to high-speed interconnects and supporting software
- Experience with job scheduling and workload orchestration tools such as Slurm and Kubernetes (K8s)
- Excellent knowledge of Windows and Linux (Red Hat/CentOS and Ubuntu) networking (sockets, firewalld, iptables, Wireshark, etc.) and internals, ACLs, OS-level security protection, and common protocols such as TCP, DHCP, and DNS
- Experience with multiple storage solutions such as Lustre, GPFS, Weka.io. Familiarity with newer and emerging storage technologies.
- Python programming and bash scripting experience.
- Comfortable with automation and configuration management tools such as Jenkins, Ansible, and Puppet/Chef
- Deep knowledge of networking technologies such as InfiniBand and Ethernet
- Deep understanding and experience with virtual systems (for example VMware, Hyper-V, KVM, or Citrix)
- Familiarity with cloud computing platforms (e.g. AWS, Azure, Google Cloud)
Benefits
- We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, sex, gender, gender expression, sexual orientation, age, marital status, veteran status, or disability status. We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.



