
Firmus Technologies
Remote Jobs
1 Job
• Design and implement a highly scalable, multi-tenant control plane that supports Firmus’ growing AI and infrastructure needs
• Contribute to the development of exabyte-scale, S3-compatible object storage, distributed file systems, and high-performance filesystems
• Work with bare-metal provisioning tools such as Base Command Manager, Warewulf, Ironic, MaaS, and similar platforms
• Apply a deep understanding of operating systems, computer networks, software-defined storage, and high-performance applications
• Work with technologies including RDMA, GPU Direct Storage, RoCE, InfiniBand, DPDK, Ceph, Weka, DAOS, and others
• Collaborate with operations teams to monitor, analyse, and optimise internal clusters and storage platforms
• Document architecture designs, operational procedures, and performance results
• Collaborate with L2 SRE engineers, site operations, and networking teams to ensure platform reliability, reproducibility, and performance
• Contribute to continuous improvement in cluster validation, CI/CD automation, and provisioning and testing frameworks
• Apply knowledge of Kubernetes and composable storage clusters
• Contribute to the development of custom Kubernetes operators and intelligent orchestration frameworks to optimise AI workload performance for large-scale GPU cluster commissioning