Job Closed

This listing is no longer active.

Senior / Staff Software Engineer - Infrastructure

Location

United States

Posted

79 days ago

Salary

Not listed

Seniority

Lead

Job Description

Boundless Networks, Inc.

The Role

As an Infrastructure Engineer, you'll build and deploy massive computational infrastructure that positions Boundless as the leading decentralized proving network. You'll architect GPU clusters at unprecedented scale, orchestrate proving across every major blockchain, and manage the complex systems that power billions of cycles of ZK proofs daily. This role demands expertise in both bare-metal optimization and cloud-native architectures.

What You'll Do

  • Build Massive Proving Clusters: Design and deploy proving infrastructure with thousands of GPUs across both on-premises data centers and cloud services (AWS, GCP, Azure)
  • Orchestrate Multi-Chain Proving: Build infrastructure that coordinates proving workloads across every major blockchain, ensuring optimal resource allocation and throughput
  • Optimize Container Topology: Design and refine the topology of complex containerized services, maximizing efficiency and minimizing latency in proof generation
  • Bare Metal Engineering: Work at the hardware level, optimizing GPU performance, managing CUDA installations, and tuning kernel parameters for maximum throughput
  • Cloud Infrastructure: Architect highly available, auto-scaling cloud infrastructure that can dynamically respond to proving demand across multiple regions
  • Release Management: Manage deployment pipelines and release schedules for complex distributed software, ensuring zero-downtime upgrades
  • Performance Monitoring: Build comprehensive monitoring and alerting systems to track GPU utilization, proof generation metrics, and system health
  • Cost Optimization: Implement strategies to minimize infrastructure costs while maintaining performance, including spot instance management and resource scheduling

Job Requirements

  • 5+ years of infrastructure/DevOps experience with 2+ years managing large-scale GPU clusters
  • Experience with both on-premises compute operations and cloud platforms (AWS/GCP/Azure)
  • Proficiency in infrastructure-as-code tools (Terraform, Ansible, Pulumi)
  • Deep expertise in Kubernetes, Docker, and container orchestration at scale
  • Experience with GPU computing infrastructure (CUDA)
  • Experience releasing complex software to communities, including building and packaging AMIs, Docker images, binaries, and maintaining distribution channels
  • Track record of managing mission-critical, high-throughput systems
  • Strong Linux systems administration and bare-metal optimization skills
  • Proficiency in Rust and low-level systems programming

Nice to Have

  • Familiarity with ZK proof generation or blockchain infrastructure
  • Experience operating cryptocurrency mining or ML training infrastructure
  • Knowledge of network optimization and topology design
  • Experience with multi-region, globally distributed systems

Benefits

At Boundless, we take care of our people, because building the future of decentralized computing starts with an empowered team. Here’s what you can expect when you join us:

  • Competitive salary + equity/token allocation
  • Health, dental, vision (for U.S. employees; region-adjusted globally)
  • Flexible PTO + home-office/equipment stipend
  • Professional development and conference travel budget
  • Remote-first with regular off-sites and a high-trust, high-velocity team environment

More Infrastructure Engineer Jobs

Senior Infrastructure Engineer

MLabs

We are a Haskell, Rust, Blockchain and AI consultancy.

Other, Remote, Team 51-200, H1B No Sponsor

Role Description

Our client is a venture-backed financial technology firm dedicated to transforming the global movement of money through stablecoin infrastructure. They are currently seeking a Senior Infrastructure Engineer to design and build an internal platform that empowers product teams to deploy software with confidence, reliability, and speed.

  • Platform Architecture: Own and evolve core platform components, including a TypeScript-based Pulumi codebase and Kubernetes-based runtime environments.
  • Engineering Standards: Enforce high engineering standards through code, architecting scalable systems that prioritize reliability and security.
  • Developer Experience: Improve developer productivity by building internal development platforms focused on self-service and "golden paths."
  • System Observability: Design and maintain the monitoring stack, defining SLIs/SLOs, error budgets, and operational dashboards.
  • Infrastructure as Code: Build reusable systems and infrastructure primitives rather than one-off scripts to ensure a scalable and maintainable environment.
  • Operational Excellence: Participate in the maintenance and operations of production-grade systems, including on-call rotations and incident response tooling.

Qualifications

  • 5+ years of software engineering experience, with a significant focus on infrastructure and cloud domains.
  • Strong programming skills in TypeScript or another strictly typed language.
  • Deep understanding of AWS architecture and proven experience designing, operating, and scaling Kubernetes workloads.
  • Strong grasp of distributed systems fundamentals, including availability, consistency, and fault tolerance.
  • Familiarity with GitOps patterns, deployment automation, and Infrastructure as Code (IaC) systems.
  • Ability to operate in short feedback loops and a desire to build foundational infrastructure for the future of digital finance.
  • Ability to provide significant overlap with Eastern Time business hours.

Requirements

  • Opportunity to build foundational infrastructure for a next-generation financial fabric.
  • Work alongside a small, mission-driven group of builders from high-performance finance and crypto backgrounds.
  • Engagement with a fast-moving, well-funded startup during a period of high growth.
  • Commitment to fostering a diverse and equitable workplace regardless of race, religion, gender identity, or veteran status.

Interview Process

  • Recruiter / HR Initial Screening: Candidates will be asked to respond to a set of questions via video recording.
  • Hiring Manager Interview: A technical and background discussion with the Head of Engineering.
  • Technical Interview I: Deep dive into engineering capabilities and systems design.
  • Technical Interview II: Further assessment of infrastructure expertise and coding proficiency.
  • Final Interview: Comprehensive review and cultural alignment.

Commitment to Equality and Accessibility

At MLabs, we are committed to offering equal opportunities to all candidates. We ensure no discrimination, accessible job adverts, and information provided in accessible formats. Our goal is to foster a diverse, inclusive workplace with equal opportunities for all.

United States
$170K - $220K / year
Job Closed
Full Time, Remote, Team 51-200, H1B No Sponsor

  • Design and implement a highly scalable, multi-tenant control plane that supports Firmus’ growing AI and infrastructure needs
  • Contribute to the development of exabyte-scale, S3-compatible object storage, distributed file systems, and high-performance filesystems
  • Work with bare-metal provisioning tools such as Base Command Manager, Warewulf, Ironic, MaaS, and similar platforms
  • Apply a deep understanding of operating systems, computer networks, software-defined storage, and high-performance applications
  • Work with technologies including RDMA, GPU Direct Storage, RoCE, InfiniBand, DPDK, Ceph, Weka, DAOS, and others
  • Collaborate with operations teams to monitor, analyse, and optimise internal clusters and storage platforms
  • Document architecture designs, operational procedures, and performance results
  • Collaborate with L2 SRE engineers, site operations, and networking teams to ensure platform reliability, reproducibility, and performance
  • Contribute to continuous improvement in cluster validation, CI/CD automation, and provisioning and testing frameworks
  • Apply knowledge of Kubernetes and composable storage clusters
  • Contribute to the development of custom Kubernetes operators and intelligent orchestration frameworks to optimise AI workload performance for large-scale GPU cluster commissioning

Australia

Senior Infrastructure Engineer

dbt Labs

The creators and maintainers of dbt

Full Time, Remote, Team 51-200, H1B Sponsor

  • Design, operate, and support infrastructure systems with parity across tenancy models (single vs multi) and public clouds (AWS, Azure, and GCP), and work with engineering teams to get their services consistently deployed to those environments
  • Bring cloud infrastructure expertise to the team, helping us strengthen and scale our infrastructure as we expand dbt Cloud’s multi-cloud capabilities
  • Help create a great developer experience while working with our close partners in Architecture, Release Engineering, Product Engineering and Security
  • Leverage tools and languages such as Terraform, Kubernetes, Python, Bash, Helm, ArgoCD, Go, and DataDog
  • Design and build automation to eliminate manual toil and streamline infrastructure operations at scale
  • Identify and implement infrastructure optimizations that reduce cloud spend without sacrificing reliability
  • Participate in a balanced on-call rotation in an environment that values continuous improvement, and help to upgrade our tooling and reduce toil

India
Other, Remote, Team 11-50

  • Implement and maintain robust infrastructure security across hybrid environments.
  • Contribute to system and platform-level infrastructure architecture for performance, security, and reliability.
  • Build and maintain onboard compute environments as self-contained, fault-tolerant micro–data centers.
  • Develop and support secure cloud infrastructure for fleet orchestration, telemetry ingestion, observability, and software deployment.
  • Manage bare-metal provisioning and life-cycle management for shipboard hardware.
  • Build and optimize CI/CD and release processes for autonomy software deployment.
  • Work closely with mechanical, electrical, and autonomy engineers to navigate constraints.
  • Implement monitoring, logging, and remote debugging capabilities for distributed systems.
  • Support system integration and troubleshoot field operations.

Massachusetts
$150K - $190K / year
Job Closed