Job Closed
This listing is no longer active.
Senior HPC DevOps Engineer
Location
Germany
Posted
152 days ago
Seniority
Senior
Job Description
NVIDIA
• Innovate and Implement: Design, implement, and maintain large-scale HPC/AI clusters with state-of-the-art monitoring, logging, and alerting systems.
• Infrastructure as Code (IaC): Use and develop tools to manage infrastructure as code, ensuring scalable and repeatable deployments.
• Streamline CI/CD Pipelines: Develop and maintain continuous integration and continuous delivery (CI/CD) pipelines to automate and streamline deployment processes.
• Automate Everything: Develop scripts and tools to automate deployment, configuration management, and operational monitoring.
• Develop complex networking automation.
• Troubleshoot Complex Issues: Perform comprehensive troubleshooting from bare metal to application level, ensuring system reliability and efficiency.
• Lead and Educate: Serve as a technical resource, developing and sharing best practices with internal teams.
• Drive Innovation: Support R&D activities and engage in proofs of concept (POCs) and proofs of value (POVs) for future improvements.
Job Requirements
- B.Sc. in Computer Science, Engineering, or a related field with 5+ years of experience.
- Deep knowledge of HPC and AI solution technologies, including CPUs, GPUs, high-speed interconnects, and supporting software.
- Advanced proficiency in programming and scripting languages, with a solid understanding of object-oriented programming principles.
- Familiarity with Jenkins, Ansible, Puppet/Chef.
- Excellent knowledge of Windows and Linux (Red Hat/CentOS and Ubuntu), networking, and OS-level security.
- Deep understanding of networking protocols such as InfiniBand and Ethernet.
- Experience with workload scheduling and orchestration tools such as Slurm and Kubernetes.
- Background with multiple storage solutions like Lustre, GPFS, ZFS, and XFS.
- Expertise with virtual systems (VMware, Hyper-V, KVM, Citrix).
- Familiarity with cloud platforms (AWS, Azure, Google Cloud).
Benefits
- Health insurance
- Retirement plans
- Paid time off
- Flexible work arrangements
- Professional development
More DevOps Engineer Jobs
• Design, build, and operate the core infrastructure that powers Owner’s engineering organization, with an emphasis on reliability, security, and ease of use.
• Own and evolve our Kubernetes-based platform on AWS, improving how services are deployed, scaled, monitored, and secured in production.
• Build and maintain CI/CD pipelines that are fast, reliable, and easy to reason about by reducing deploy risk while increasing developer confidence and velocity.
• Focus deeply on developer experience: identifying pain points in local development, testing, deployment, and observability, then replacing manual or error-prone workflows with self-service tooling and automation.
• Partner closely with application engineers to set clear patterns and golden paths for how services are built and run at Owner, balancing flexibility with strong, opinionated defaults.
• Strengthen our approach to operational excellence by improving monitoring, alerting, incident response, and post-incident learning.
• Help embed security into our infrastructure and delivery pipelines, ensuring best practices are automated and invisible.
Azure DevOps Engineer
3form
A leading manufacturer of architectural resin, glass products, acoustic solutions, markerboards and light fixtures.
• Work at the heart of our digital transformation projects.
• Design robust infrastructure, shape CI/CD pipelines, and ensure smooth delivery of products.
• Collaborate directly with clients and understand their operational challenges.
• Translate operational challenges into practical, secure, and maintainable DevOps solutions.
• Optimize cloud environments and build deployment automation.
Senior Azure DevOps Engineer, .NET
Xebia
Creating Digital Leaders. Digital Transformation Consultancy Services and Solutions
• Designing and evolving cloud platforms with Azure Cloud services
• Leading on-premises to cloud migration initiatives
• Implementing automated deployment pipelines
• Driving platform security testing
• Partnering with Product, Architecture, and Software Engineering teams
• Contributing to platform engineering and security research
• Supporting continuous improvement of operational processes
• Aligning work with company-wide OKRs
• Designing and evolving cloud platforms with a focus on Azure Cloud services
• Leading on-premises to cloud migration initiatives
• Implementing automated deployment pipelines
• Partnering with Product, Architecture, and Software Engineering teams to deliver secure SaaS products
• Supporting continuous improvement of operational processes




