Nscale is the Hyperscaler engineered for AI.
Principal Network Architect - AI Infrastructure
Location: United States
Seniority: Lead
Job Description
Nscale
Role Description
Nscale is seeking a Principal Network Architect to lead the evolution, reliability, and operational excellence of our global AI networking infrastructure. This role sits at the core of Nscale's platform, where network performance directly impacts AI training outcomes. You will act as a technical authority across large-scale RDMA fabrics, both InfiniBand and RoCE, driving automation, availability improvements, and system-level design across a globally distributed GPU cloud. You will combine deep protocol-level networking expertise with strong software and automation skills to operate and scale one of the most demanding AI networking environments in the industry.

What You'll Do

Technical Leadership & Strategy
- Own the technical direction and operational lifecycle management of Nscale's high-performance RDMA network fabrics.
- Define long-term architecture, reliability strategy, and operational standards for AI interconnect networks.
- Lead availability and performance improvement initiatives across globally distributed GPU clusters.
- Act as the networking subject-matter expert (SME), influencing platform-wide decisions.

Network Engineering & Operations
- Support the design, build-out, and evolution of large-scale InfiniBand and RoCE fabrics.
- Drive deep debugging and resolution of complex cross-layer issues spanning hardware, firmware, kernel, and distributed workloads.
- Lead incident response and postmortems, ensuring systemic fixes and long-term improvements.
- Define and enforce standards for:
  - Congestion control and traffic engineering.
  - Routing (BGP, ECMP, and fabric-level routing strategies).
  - Firmware lifecycle and change management.
  - Network observability and telemetry.

Automation & Systems Development
- Develop and scale automation frameworks for network provisioning, validation, and operations.
- Build tooling that supports high-reliability, low-touch network operations at scale.
- Improve operational efficiency across hundreds of thousands of endpoints and high-throughput links.

Cross-Functional Leadership
- Lead complex technical initiatives across the Network, SRE, Compute, and Platform teams.
- Serve as technical lead on critical programs, coordinating engineers and stakeholders.
- Influence product and infrastructure roadmaps based on operational insights and customer needs.
- Mentor senior engineers and raise the bar for technical rigor and execution.

Qualifications
- 10+ years of experience in network engineering in hyperscale, AI, or HPC environments.
- Deep expertise in RDMA, InfiniBand, and/or large-scale RoCE fabrics.
- Strong understanding of:
  - RDMA internals and performance tuning.
  - Congestion control and fabric failure modes.
  - Distributed system communication patterns.

Requirements
- Expert-level knowledge of data center networking protocols and techniques (BGP, OSPF, ECMP).
- Proven ability to debug multi-layer issues across network, system, and application layers.
- Strong programming/scripting skills for automation (e.g., Python, Go).
- Experience designing high-scale, highly available network systems.

Leadership & Impact
- Demonstrated ability to lead complex technical programs without direct authority.
- Experience acting as a senior escalation point for critical production issues.
- Strong ability to drive cross-team alignment and execution.
- Systems-level thinking that balances performance, reliability, scalability, and cost.

Nice to Have
- Experience with NVIDIA / Mellanox networking platforms.
- Familiarity with distributed AI training frameworks and GPU communication patterns.
- Experience building network observability systems at scale.
- Background influencing infrastructure strategy in high-growth environments.
Equal Opportunities Statement
We strongly encourage applications from people of colour, the LGBTQ+ community, people with disabilities, neurodivergent people, parents, carers, and people from lower socio-economic backgrounds. If there's anything we can do to accommodate your specific situation, please let us know.

The responsibilities outlined in this job description are not exhaustive and are intended to provide a general overview of the position. The employee may be required to perform additional duties, tasks, and responsibilities as assigned by management, consistent with the skills and qualifications required for the role.