Nscale is the Hyperscaler engineered for AI.
Staff HPC Systems Software Engineer
Location: United States
Posted: 4 days ago
Salary: $225K - $275K / year
Seniority: Lead
Job Description
Role Description
We’re hiring a Staff HPC Systems Software Engineer to define the technical direction and evolution of a core HPC platform domain at Nscale. In this role, you will operate beyond a single team, shaping how multiple teams build, automate, and run Slurm-based capabilities within Nscale’s wider cloud-native platform. You’ll work across engineering boundaries to bring coherence to architecture, interfaces, lifecycle models, and operational approaches, while partnering closely with teams working on platform tooling, infrastructure APIs, identity systems, and Kubernetes-adjacent systems.
This is a high-impact staff-level role for someone who combines deep hands-on software engineering with strong systems judgement. Your work will help ensure Nscale’s HPC services are robust, supportable, and maintainable, while creating leverage through shared patterns, reusable implementations, and clear technical direction across ambiguous, business-critical problem spaces.

What you'll be doing
- Domain Architecture & Technical Direction
  - Own and evolve the technical direction for a defined HPC systems domain, such as Slurm platform architecture, scheduler integrations, cluster lifecycle, workload environments, or service automation.
  - Make architectural decisions that balance software quality, operational realities, customer needs, and long-term maintainability.
  - Define how proven Slurm implementations should be packaged, automated, and exposed as a service.
  - Resolve ambiguity around ownership, interfaces, lifecycle boundaries, and operating models across teams.
  - Act as the technical escalation point for the most complex issues within the domain.
- Cross-Team Engineering Leverage
  - Establish shared patterns and standards for automation, service lifecycle management, observability, reliability, and supportability across the HPC platform.
  - Drive cross-team design for integrations between Slurm, Kubernetes-adjacent systems, infrastructure APIs, identity systems, and platform tooling.
  - Create reusable modules, automation, deployment patterns, and reference implementations that increase engineering leverage.
  - Identify and correct avoidable technical divergence, duplicated effort, and fragile operating models.
  - Ensure domain designs reflect the realities of GPU scheduling, HPC networking, performance isolation, and production operations.
- Delivery, Reliability & Influence
  - Lead technically critical initiatives spanning 2–4 teams or a defined HPC platform area.
  - Unblock delivery by clarifying technical direction and reducing ambiguity in complex system design problems.
  - Contribute hands-on where needed to de-risk or accelerate critical work.
  - Influence engineering teams without formal authority through strong judgement, design clarity, and practical solutions.
  - Partner with adjacent cloud-native software engineers so HPC implementations build on shared platform patterns rather than separate ones.

KPIs
- Technical direction across a defined HPC domain
- Delivery of critical initiatives across 2–4 teams
- Reduction in technical divergence and duplicated effort
- Reliability and supportability of Slurm-based HPC services

Qualifications
- Extensive experience designing and building production software and automation for HPC systems, especially Slurm-based environments.
- Strong track record of writing maintainable, testable, and resilient software in Go, Python, or similar languages.
- Proven ability to define technical direction across a domain spanning multiple teams or services.
- Strong understanding of Slurm internals, scheduler behaviour, cluster lifecycle concerns, and operational trade-offs.
- Strong practical understanding of GPU-backed infrastructure and HPC networking, including InfiniBand, RoCE, RDMA, and performance-sensitive workload characteristics.
- Experience integrating HPC systems with cloud-native platforms, APIs, or service delivery models.
- Experience creating engineering leverage through standards, reusable patterns, shared tooling, and architectural clarity.
- Strong judgement in balancing short-term delivery with long-term platform health and supportability.
- Strong written and verbal communication skills, with the ability to align multiple teams around a coherent technical direction.
- Experience with other schedulers or batch systems such as Kueue is valuable.

Benefits
- Highly competitive US compensation package (base + bonus + equity), with performance reviews every 12 months.
- Join one of the fastest-growing AI infrastructure companies: your chance to directly shape how global AI capacity is planned and deployed.
- Expect a dynamic progression plan tailored to your ambitions. Grow by leading critical cross-functional initiatives and shaping capital strategy, always with our full support.
- Human-First Flexibility: We treat you as humans first. Our flexible workplace trusts Nscalers to deliver, giving you the autonomy to shape your day around life's moments.




