Senior AI Infrastructure Engineer – Virtualisation

Infrastructure Engineer | Full Time | Remote | Senior | Team 51-200 | H1B No Sponsor

Location

Australia

Posted

80 days ago

Salary

0

Seniority

Senior

Bachelor Degree | 6 yrs exp | English | Ansible | Kubernetes | Linux | Python | Rust | Terraform

Job Description

Firmus Technologies

  • Design and implement a highly scalable, multi-tenant control plane that supports Firmus’ growing AI and infrastructure needs
  • Contribute to the development of exabyte-scale, S3-compatible object storage, distributed file systems, and high-performance filesystems
  • Work with bare-metal provisioning tools such as Base Command Manager, Warewulf, Ironic, MaaS, and similar platforms
  • Apply a deep understanding of operating systems, computer networks, software-defined storage, and high-performance applications
  • Work with technologies including RDMA, GPU Direct Storage, RoCE, InfiniBand, DPDK, Ceph, Weka, DAOS, and others
  • Collaborate with operations teams to monitor, analyse, and optimise internal clusters and storage platforms
  • Document architecture designs, operational procedures, and performance results
  • Collaborate with L2 SRE engineers, site operations, and networking teams to ensure platform reliability, reproducibility, and performance
  • Contribute to continuous improvement in cluster validation, CI/CD automation, and provisioning and testing frameworks
  • Apply knowledge of Kubernetes and composable storage clusters
  • Contribute to the development of custom Kubernetes operators and intelligent orchestration frameworks to optimise AI workload performance for large-scale GPU cluster commissioning

Job Requirements

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field
  • 6–10 years of experience in infrastructure engineering and/or storage engineering
  • Hands-on experience with bare-metal provisioning
  • Ability to operate software-defined storage platforms such as Ceph, Weka, Vast Data, DAOS, or Lustre
  • Solid understanding of cloud-native infrastructure, Kubernetes, and scalable system architectures
  • Strong debugging and problem-solving skills in distributed, high-performance environments
  • Practical Linux systems engineering experience (kernel, cgroups, system services, networking, drivers)
  • Strong automation mindset using tools such as Ansible, Helm, Terraform/OpenTofu, or equivalent
  • Understanding of firmware, BIOS, BMC/IPMI/Redfish, and low-level system tuning
  • Proficiency in one or more programming languages such as Go, Bash, Rust, or Python
  • Excellent documentation skills with strong attention to detail
  • Experience participating in an on-call rotation supporting production services
  • Proactive self-starter with a drive for continuous technical improvement

Benefits

  • Professional development opportunities
  • Flexible working hours

More Infrastructure Engineer Jobs

Senior Infrastructure Engineer

dbt Labs

The creators and maintainers of dbt

Full Time | Remote | Team 51-200 | H1B Sponsor

  • Design, operate, and support infrastructure systems with parity across tenancy models (single vs multi) and public clouds (AWS, Azure, and GCP) - and work with engineering teams to get their services consistently deployed to those environments
  • Bring cloud infrastructure expertise to the team, helping us strengthen and scale our infrastructure as we expand dbt Cloud’s multi-cloud capabilities
  • Help create a great developer experience while working with our close partners in Architecture, Release Engineering, Product Engineering and Security
  • Leverage tools and languages such as Terraform, Kubernetes, Python, Bash, Helm, ArgoCD, Go, and DataDog
  • Design and build automation to eliminate manual toil and streamline infrastructure operations at scale
  • Identify and implement infrastructure optimizations that reduce cloud spend without sacrificing reliability
  • Participate in a balanced on-call rotation in an environment that values continuous improvement, and help to upgrade our tooling and reduce toil

India

Other | Remote | Team 11-50

  • Implement and maintain robust infrastructure security across hybrid environments
  • Contribute to system and platform-level infrastructure architecture for performance, security, and reliability
  • Build and maintain onboard compute environments as self-contained, fault-tolerant micro–data centers
  • Develop and support secure cloud infrastructure for fleet orchestration, telemetry ingestion, observability, and software deployment
  • Manage bare-metal provisioning and life-cycle management for shipboard hardware
  • Build and optimize CI/CD and release processes for autonomy software deployment
  • Work closely with mechanical, electrical, and autonomy engineers to navigate constraints
  • Implement monitoring, logging, and remote debugging capabilities for distributed systems
  • Support system integration and troubleshoot field operations

Massachusetts
$150K - $190K / year
Job Closed

Full Time | Remote | Team 201-500 | Since 2013 | H1B No Sponsor

  • Interact with various teams within our company that develop and maintain our network infrastructure
  • Configure network equipment (Juniper, Brocade, Extreme)
  • Arrange maintenance for network infrastructure in data centers
  • Create technical tasks for on-site engineers to install and replace network equipment
  • Conduct business correspondence with service providers

Serbia

Full Time | Remote | Team 1,001-5,000 | H1B Sponsor

  • Managing the planning, scheduling and coordination of all client installations including but not limited to software, hardware, server migrations and other projects
  • Planning, scheduling, coordinating and communicating all elements of delivering site and non-site installations and projects to a high level of client satisfaction
  • Communicating with all stakeholders including external clients, account managers and engineers to ensure the efficient delivery of projects and installations
  • Coordinating and communicating the correct allocation of skilled staff within the time constraints to complete projects within client expectations
  • Providing status reports on all active projects, including any risks associated with completing projects on time and on budget, as part of a regular reporting cycle
  • Accurately reflecting engineers' service calls in the service delivery calendar and effectively communicating client instructions
  • Effectively communicating schedule changes to all relevant stakeholders
  • Accurately recording relevant data in all systems
  • Developing and maintaining effective working relationships with personnel from all departments
  • Demonstrating and upholding exceptional safety standards at all times in accordance with any workplace health and safety requirements, to ensure your own safety and the safety of others
  • Other duties as required

Philippines
Job Closed