Senior AI Infrastructure Engineer – Virtualisation
Location
Australia
Posted
80 days ago
Salary
0
Seniority
Senior
Job Description
Firmus Technologies
- Design and implement a highly scalable, multi-tenant control plane that supports Firmus’ growing AI and infrastructure needs
- Contribute to the development of exabyte-scale, S3-compatible object storage, distributed file systems, and high-performance filesystems
- Work with bare-metal provisioning tools such as Base Command Manager, Warewulf, Ironic, MaaS, and similar platforms
- Apply a deep understanding of operating systems, computer networks, software-defined storage, and high-performance applications
- Work with technologies including RDMA, GPU Direct Storage, RoCE, InfiniBand, DPDK, Ceph, Weka, DAOS, and others
- Collaborate with operations teams to monitor, analyse, and optimise internal clusters and storage platforms
- Document architecture designs, operational procedures, and performance results
- Collaborate with L2 SRE engineers, site operations, and networking teams to ensure platform reliability, reproducibility, and performance
- Contribute to continuous improvement in cluster validation, CI/CD automation, and provisioning and testing frameworks
- Apply knowledge of Kubernetes and composable storage clusters
- Contribute to the development of custom Kubernetes operators and intelligent orchestration frameworks to optimise AI workload performance for large-scale GPU cluster commissioning
Job Requirements
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field
- 6–10 years of experience in infrastructure engineering and/or storage engineering
- Hands-on experience with bare-metal provisioning
- Ability to operate software-defined storage platforms such as Ceph, Weka, Vast Data, DAOS, or Lustre
- Solid understanding of cloud-native infrastructure, Kubernetes, and scalable system architectures
- Strong debugging and problem-solving skills in distributed, high-performance environments
- Practical Linux systems engineering experience (kernel, cgroups, system services, networking, drivers)
- Strong automation mindset using tools such as Ansible, Helm, Terraform/OpenTofu, or equivalent
- Understanding of firmware, BIOS, BMC/IPMI/Redfish, and low-level system tuning
- Proficiency in one or more programming languages such as Go, Bash, Rust, or Python
- Excellent documentation skills with strong attention to detail
- Experience participating in an on-call rotation supporting production services
- Proactive self-starter with a drive for continuous technical improvement
Benefits
- Professional development opportunities
- Flexible working hours




