boxxe

Making tech human

Infrastructure Engineer

Full Time | Remote | Senior | Team 51-200 | H1B No Sponsor

Location

United States

Posted

5 days ago

Salary

Not disclosed

Seniority

Senior

Job Description

The IT Infrastructure Engineer is responsible for the deployment, configuration and ongoing support of enterprise infrastructure across both virtual and physical environments. The role focuses on Windows Server deployments, clustering technologies, storage integration (MPIO), identity and access solutions (PingFederate, UserLock MFA), and application onboarding within a secure, highly available environment.

  • Deploy and configure Windows Server environments across VMware vCenter (virtual builds) and physical servers.
  • Perform system provisioning, including OS installation, patching and baseline security hardening.
  • Manage the virtual machine lifecycle (provisioning, resizing, decommissioning).
  • Monitor and optimise resource utilisation across clusters.
  • Document onboarding processes and standards.
  • Maintain technical documentation, including SOPs and runbooks.
  • Administer and maintain VMware vSphere / vCenter environments.
  • Design and implement Windows Failover Clustering solutions.
  • Configure and support Multipath I/O (MPIO) for SAN storage.
  • Ensure resilience and high availability of critical systems.
  • Lead or support application onboarding, including authentication integration and infrastructure provisioning.
  • Collaborate with application owners to meet security and connectivity requirements.
  • Provide support for infrastructure incidents and requests.
  • Troubleshoot complex issues across Windows, VMware and identity platforms.
  • Participate in on-call support where required.
  • Ensure systems meet organisational security and compliance standards.
  • Support patching cycles and vulnerability remediation.
  • Work with cyber and compliance teams as required.
  • Contribute to continuous service improvement initiatives.

Job Requirements

  • Strong experience with Windows Server (2016/2019/2022)
  • Experience with VMware vSphere / vCenter
  • Knowledge of Windows Failover Clustering and MPIO
  • Experience configuring and supporting PingFederate for SSO and application federation
  • Experience managing UserLock MFA, including policy configuration and access control
  • Experience supporting secure authentication integration across enterprise platforms
  • Understanding of Active Directory, DNS, DHCP, and networking fundamentals
  • Experience with F5 load balancers, Citrix, and Linux (RHEL/OEL)
  • Knowledge of ITIL processes and PowerShell scripting
  • Strong problem-solving skills and ability to work under pressure
  • Clear communication with stakeholders
  • Proactive approach to documentation
  • Secure and reliable infrastructure deployments
  • High availability of critical systems
  • Successful application onboarding
  • Timely incident resolution and audit-ready documentation
  • Participation in scheduled on-call and out-of-hours OS patching (remunerated)
  • Holder of, or able to obtain, BPSS and SC clearance

Benefits

  • As an equal opportunity employer, we are committed to building a team that represents a variety of backgrounds, perspectives, and skills.

More Infrastructure Engineer Jobs

Full Time | Remote | Team 201-500 | Since 2024 | H1B No Sponsor

Role Description

At Nscale, our AI Infrastructure Operations team is responsible for the reliability and scalability of one of the most demanding AI platforms in the industry. We value engineers who think in systems, lead through influence, and raise the bar for operational excellence across the organisation.

We’re looking for a Principal Site Reliability Engineer (SRE) to provide technical leadership across our AI Infrastructure Operations domain. This is a senior, highly impactful role focused on setting reliability strategy, designing foundational systems, and driving cross-team improvements at scale. You will operate as a technical authority for reliability, automation, and operational architecture across Nscale’s GPU, network, and control-plane platforms.

  • Owning and evolving the long-term reliability strategy for Nscale’s AI and HPC infrastructure
  • Designing and leading the development of large-scale control-plane systems, automation frameworks, and operational tooling
  • Defining reliability standards, SLO frameworks, and operational best practices used across multiple teams
  • Acting as a senior technical escalation point during critical incidents, guiding resolution and ensuring systemic fixes
  • Identifying structural reliability risks and driving cross-functional initiatives to address them at the architectural level
  • Partnering with Engineering, Network Operations, and Fleet Operations leadership to influence platform design and operational maturity
  • Mentoring senior and mid-level engineers, raising the overall quality and effectiveness of SRE practices
  • Driving measurable improvements in availability, MTTR, cost efficiency, and operational scalability

Qualifications

  • 10+ years of experience in Site Reliability Engineering, Systems Engineering, or Software Engineering roles operating complex, large-scale infrastructure
  • Expert-level software engineering skills, with a strong track record of building production-grade automation and systems
  • Deep expertise in Linux, networking, and distributed systems design at scale
  • Extensive experience debugging and resolving failures across hardware, OS, networking, and application layers
  • Proven ability to lead technical initiatives across teams without direct authority
  • Strong systems-thinking mindset, with the ability to balance reliability, velocity, and cost

Requirements

  • Deep hands-on experience with AI or HPC platforms, including GPUs, high-speed interconnects (InfiniBand/RDMA), and workload schedulers (e.g. SLURM)
  • Experience designing observability systems for high-cardinality, high-throughput environments
  • Familiarity with Kubernetes at scale and hybrid or bare-metal cloud architectures
  • A history of driving step-change improvements in reliability, scalability, or operational efficiency

Benefits

  • Collaborative, supportive, and innovative environment where your contributions spark real impact
  • Highly competitive package (base + equity) with reviews every 12 months
  • Opportunity to join the fastest-growing tech startup, pushing boundaries and collaborating with brilliant minds
  • Dynamic progression plan tailored to your ambitions
  • Human-First Flexibility: autonomy to shape your day around life's moments
  • Thriving remote-first team with seamless virtual collaboration

Worldwide
$150K - $2,150K / year
Full Time | Remote | Team 1,001-5,000 | Since 1994 | H1B No Sponsor

  • Lead the administration and evolution of Microsoft Active Directory in a complex enterprise environment.
  • Own Exchange and Exchange Hybrid (on-premises and Exchange Online), ensuring reliability, security, and seamless coexistence.
  • Design, operate, and maintain Public Key Infrastructure (PKI), including certificate lifecycle management.
  • Administer and develop Microsoft 365 / Entra ID identity services, roles, and access models.
  • Implement and support ADFS / SSO and federation scenarios for internal and external applications.
  • Ensure secure access control, authentication, and authorization across platforms.
  • Collaborate with network, platform, and application teams on identity-related integrations.
  • Drive continuous improvement of identity and messaging architectures and operational practices.

Ukraine
Full Time | Remote | Team 10,001+ | Since 1954 | H1B Sponsor

  • Supports the Case Management Modernization (CMM) Program for the U.S. Courts by designing, implementing, and managing secure authentication and authorization frameworks
  • Ensures compliance with federal identity governance, FedRAMP, and Zero Trust Architecture (ZTA) principles
  • Collaborates with architecture, security, and DevSecOps teams to ensure access control and credential management are integrated across all layers of the CMM application ecosystem
  • Designs and maintains the identity architecture utilizing Keycloak
  • Implements federated identity and single sign-on (SSO) solutions using modern protocols (SAML, OAuth2.0, OIDC)
  • Configures directory services and identity providers (AWS Cognito, AWS IAM Identity Center, Azure AD, etc.)
  • Conducts access audits, user entitlement reviews, and anomaly detection to ensure least-privilege compliance

United States
$153K - $207K / year
Full Time | Remote | Team 10,001+ | Since 1993 | H1B Sponsor

  • Craft creative, scalable cloud solutions for running millions of jobs, thousands of systems, and petabytes of storage.
  • Address exciting challenges in infrastructure such as Kubernetes, job scheduling, multi-region services, resource management, and automated recovery.
  • Create agentic workflows for infrastructure.
  • Collaborate with customers to understand their needs and develop innovative solutions that cater to their requirements.

California
$184K - $356.5K / year