Developing the best generative AI models
Systems Engineer, HPC – APAC
Location: Singapore
Posted: 1 day ago
Salary: not specified
Seniority: Senior
Job Description
Mistral AI
- Operate and maintain large-scale Linux environments (bare metal, clusters, cloud)
- Monitor system health, troubleshoot incidents, and ensure high availability
- Support production and research workloads across multiple environments
- Help scale clusters to hundreds or thousands of nodes
- Work on systems handling petabyte-scale storage
- Improve performance, reliability, and resource utilisation
- Automate operational tasks using tools such as Python, Bash, Ansible, or Terraform
- Improve deployment, provisioning, and system lifecycle management
- Contribute to system design and architecture decisions
- Work closely with HPC/infrastructure teams, platform/DevOps engineers, and research teams
Job Requirements
- Strong Linux systems administration experience (core requirement)
- Experience working in large-scale environments, such as HPC clusters or cloud infrastructure
- Experience with job schedulers (e.g. Slurm)
- Solid troubleshooting skills across systems, hardware, and networks
- Nice-to-have (any of these):
  - Containers / orchestration (e.g. Kubernetes)
  - Storage systems (e.g. Ceph, Lustre, NFS)
  - Networking fundamentals (Ethernet; InfiniBand is a plus)
  - Infrastructure as Code / automation tooling
  - GPU or AI/ML experience
Benefits
- 💰 Competitive salary and equity (stock options)
- 🧑‍⚕️ Health insurance
- 🚴 Transportation allowance
- 🥎 Sport allowance
- 🥕 Meal vouchers
- 🍼 Generous parental leave policy