Member of Technical Staff, Cluster Administration
Location
United States
Posted
3 days ago
Salary
$200K - $400K / year
Seniority
Lead
Job Description
Inferact
Role Description
We're looking for a hands-on cluster administration engineer to own and operate the high-performance GPU compute infrastructure that keeps Inferact engineering productive. Inferact runs on expensive, high-performance GPU and HPC clusters across neo-cloud and dedicated compute providers. Your job is to make sure that infrastructure is healthy, available, observable, and usable around the clock.
- Take ownership of cluster health, GPU availability, monitoring, alerting, scheduling, access, diagnostics, and incident response.
- Work closely with engineering leadership and infrastructure owners to standardize provisioning, operation, debugging, and scaling of compute across providers.
- Directly impact how fast Inferact can build, test, and improve the systems powering vLLM.

Qualifications
- Bachelor's degree or equivalent experience in computer science, engineering, systems administration, or similar.
- Hands-on experience administering large compute clusters, HPC environments, university or research clusters, supercomputing systems, or production GPU clusters.
- Strong Linux systems administration fundamentals across networking, processes, storage, package management, shell scripting, logs, access control, and system debugging.
- Experience operating GPU servers, including driver management, GPU health monitoring, node failures, memory errors, scheduler issues, and hardware diagnostics.
- Experience with cluster scheduling and resource allocation using SLURM, Kubernetes, or equivalent tooling.
- Ability to own urgent infrastructure incidents end-to-end when compute issues are blocking engineering teams.
- Ability to automate operational workflows using Bash, Python, Ansible, Terraform, Helm, or similar tooling.

Requirements
- Experience operating GPU compute across providers such as Lambda, CoreWeave, Crusoe, Nebius, Together, Fireworks, RunPod, or similar environments.
- Experience improving cluster utilization, reducing idle or unavailable GPU capacity, and debugging scheduling or resource contention issues.
- Familiarity with high-performance GPU networking such as InfiniBand, RoCE, NVLink / NVSwitch, RDMA, NCCL, or equivalent systems.
- Experience with storage for HPC or ML workloads, including NFS, Lustre, Ceph, distributed filesystems, or other high-throughput storage systems.
- Experience managing secure access, identity, permissions, SSH, VPNs, bastion hosts, secrets, and basic infrastructure security hygiene.
- Background in research computing, scientific computing, ML infrastructure, SRE, platform engineering, or infrastructure operations for engineering-heavy teams.

Benefits
- Generous health, dental, and vision benefits.
- 401(k) company match.
More Administration Jobs
Disaster Recovery Grants Administrator – Reservist On Call
ICF
We are not a typical consulting firm and our people are not typical consultants.
• Review of the current obligated project worksheets for validation of the eligible scope of work and cost.
• Validation of the submitted procurement, invoice and supporting documentation against the project scope of work to ensure alignment.
• Research and review relevant regulations and policies that support eligibility or denial.
• Interact with technical team as necessary to discuss concerns or challenges with recommendations.
• Work on cost analyzation and reconciliation, including insurance alignment with client assets and funding.
• Keep Project Manager and required stakeholders informed on issues, problems & resolutions.
• Measure performance with key metrics.
• Maintain and track data as required in reporting management information systems.
• Apply analytical and evaluative techniques to the identification, consideration and resolution of a wide variety of grant and financial issues and problems.
• Superior customer service skill set, ability to listen, facilitate and negotiate problems.
• Carry out timely and accurate duties as requested with expertise in the area you are assigned.
• As requested by ICF, travel to subrecipient site to assist in document collection and upload to DRS.
• Coordinate and participate in resolution of project related issues and concerns.
• Optimize procedures and maintain communication and focus.
• Maintain and track each case as required in project report management information system.
• Measure performance with key metrics.
• Keep management team informed on issues, problems & resolutions.
• Superior customer service skill set, ability to listen, facilitate and negotiate problems.
• Expertise in area in which you are assigned. Carry out timely and accurate duties as requested.
• Travel as required to client recovery and ICF sites as required and requested by ICF management.
Disaster Recovery Lead, Grants Administrator
ICF
We are not a typical consulting firm and our people are not typical consultants.
• Provide guidance, supervision and oversight to: Disaster Recovery Specialists, Closeout Specialists, and Grant Administrators as requested. May include training and presentations.
• Lead FEMA grants review, reporting and documentation.
• Review of the current obligated project worksheets for validation of the eligible scope of work and cost.
• Perform QA/QC review and oversight as requested.
• Validation of the submitted procurement, invoice and supporting documentation against the project scope of work to ensure alignment.
• Research and review relevant regulations and policies that support eligibility or denial.
• Interact with technical team as necessary to discuss concerns or challenges with recommendations.
• Work on cost analyzation and reconciliation, including alignment with client assets and funding, measuring performance with key metrics.
• Keep Project Manager, immediate supervisor and required stakeholders informed on issues, problems & resolutions.
• Maintain and track data as required in reporting management information systems.
• Apply analytical and evaluative techniques to the identification, consideration and resolution of a wide variety of grant and financial issues and problems.
• As requested by ICF, travel to subrecipient site to assist in document collection and upload to DRS.
• Coordinate and participate in resolution of project related issues and concerns.
• Optimize procedures and maintain communication and focus.
• Maintain and track each case as required in project report management information system.
• Superior customer service skill set, ability to listen, facilitate and negotiate problems.
• Expertise in area in which you are assigned. Carry out timely and accurate duties as requested.
• Travel as required to client recovery and ICF sites as required and requested by ICF management.
Senior SAP Application Administrator
Enterprise Horizon Consulting Group
Enterprise Horizon solves complex IT and business challenges for the DoD, Federal, and Private sectors.
• Configure, monitor, tune, and troubleshoot the SAP technical environment, ensuring optimal system performance and reliability.
• Manage and execute the SAP transport system, including scheduling, transport migration, and issue resolution.
• Collaborate with technical teams to diagnose and resolve SAP transport and source code issues.
• Lead the installation, upgrade, patching, and ongoing maintenance of SAP systems and related components.
• Evaluate, design, and maintain interfaces between SAP and external systems, ensuring data integrity and seamless integration.
• Maintain the SAP Data Dictionary, database objects, and related system architecture.
• Oversee the migration of SAP database and application configurations into production environments.
• Analyze, develop, and maintain data architectures and process models within SAP.
• Produce and update system documentation, technical procedures, and architectural diagrams.
• Partner with IT and business stakeholders to modernize and optimize the SAP environment, supporting long‑term system evolution.
SmartPlant Electrical SME / Administrator
ReVisionz Inc.
Helping process companies reduce millions of dollars lost every year due to deficient asset and process safety data
Role Description
The SmartPlant Electrical (SPEL) SME / Administrator plays a critical role in enabling reliable, standards-driven electrical engineering execution across capital projects and operational environments. This role is accountable for configuring, administering, and supporting electrical engineering design systems so that project teams and owner-operators can trust their data, drawings, and deliverables throughout the asset lifecycle. Acting as a subject matter expert and client-facing technical advisor, this role bridges engineering standards, business processes, and digital tools. Success is measured by stable system performance, high-quality electrical data, confident end-user adoption, and consistent alignment with client engineering and operational requirements.
This role will support ReVisionz on a U.S. nuclear project. As such, applicants must meet all “Must Have Requirements” in order to be eligible for consideration.

Core Responsibilities
- SPEL Design System Administration & Configuration: Configure and administer SPEL in alignment with client standards, project requirements, and operational workflows, including reference data, catalogs, templates, reports, and system services.
- Data Integrity, Quality & Lifecycle Support: Maintain the integrity, accuracy, and completeness of electrical engineering data through audits, validations, bulk data operations, and corrective actions across projects and operations.
- User Enablement & Adoption: Support engineers, designers, and operations users through responsive system support, training, and documentation that enable confident and consistent use of electrical design tools.
- Documentation, Standards & Governance: Develop and maintain system documentation, work instructions, and procedures that support engineering standards, data governance, and repeatable delivery practices.
- Integration & Process Enablement: Collaborate with solution architects and data specialists to support integrations between electrical design tools and related engineering, asset, or information management systems.
- Client Engagement & Delivery Support: Serve as a trusted technical advisor to client stakeholders, supporting defined scopes of work and contributing to delivery outcomes that align with schedule, quality, and operational objectives.

Qualifications
- Hands-on experience administering electrical engineering design tools within Owner/Operator and/or EPC environments.
- Demonstrated exposure to electrical engineering data, drawings, reports, and deliverables across project and operations lifecycles.
- Experience supporting engineering standards, work processes, and data governance practices.
- Background working with relational databases (e.g., SQL Server, Oracle) and performing data validation or integrity checks.
- Experience developing user documentation, procedures, and training materials.
- Exposure to system integrations, data migration, or multi-tool engineering environments.
- Industry experience in Energy, Chemicals, Mining, or related asset-intensive sectors.

Requirements
- Must be a Canadian or U.S. citizen and resident. Candidates with citizenships from other countries must be listed in Appendix A to Part 810, Title 10 Generally Authorized Destinations to be eligible.
- Must pass a 5-panel drug and alcohol test and a NATO security/background check.
- Willing to complete an SPEL skills assessment following a successful interview.
- Have strong and recent (last 3 years) proficiency in SPEL Administration.

What You Bring
- A calm, methodical presence that builds confidence in complex technical environments.
- A client-first mindset that balances engineering rigor with practical delivery needs.
- A collaborative approach that elevates teams, strengthens adoption, and improves outcomes.
Agency / Search Firm Notice
Thank you for your interest in supporting ReVisionz. At this time, we are managing this search directly and are not accepting unsolicited candidate submissions from external search firms or recruitment agencies for this role. Any unsolicited resumes or profiles submitted without a prior written agreement will be considered the property of ReVisionz, and no placement fee will be paid. We respectfully ask that agencies do not contact our hiring team or employees regarding this posting.

At ReVisionz, we embrace diversity and welcome applications from all qualified candidates. While we appreciate every submission, only those under consideration will receive further communication. We foster a respectful, supportive, and inclusive environment where accessibility, diversity, and equal opportunity are paramount. Applicants requiring accommodations throughout the recruitment process are encouraged to communicate their needs.

As a proud advocate of diversity and equal opportunity, ReVisionz does not discriminate on the basis of race, color, religion, sex, national origin, age, sexual orientation, disability, or any other characteristic protected by applicable laws. Our selection decisions are solely based on job-related factors.



