Nscale is the Hyperscaler engineered for AI.
Senior Software Engineer - Fleet Management
Location
Worldwide
Posted
4 days ago
Salary
Not specified
Seniority
Senior
Job Description
Nscale
Role Description
We're hiring a Senior Software Engineer to build our Fleet Manager platform: the workflow automation system that provisions, tests, and remediates GPU nodes and network switches at scale. You'll build foundational Python-based automation systems that manage the entire lifecycle of our compute infrastructure:
- Device enrolment
- Burn-in testing
- Network configuration
- GPU health monitoring
- Self-healing capabilities
This role is for someone obsessed with distributed systems at scale, infrastructure reliability, scalability, security, and continuous improvement.
What you'll do:
- Build workflow automation systems for GPU node and network switch lifecycle management at scale
- Design foundational platform components, using established software patterns, that others build on
- Implement device provisioning, burn-in testing, network configuration, and hardware health validation workflows
- Integrate with datacenter infrastructure management systems, cloud orchestration platforms, and bare metal provisioning tools
- Build distributed workflow orchestration systems to coordinate complex automation tasks across the fleet
- Drive technical strategy for reliability, observability, incident response, and operational excellence
- Partner with Infrastructure, Platform, and SRE teams to automate hardware lifecycle operations
- Use AI tools to accelerate delivery while maintaining architectural coherence
Qualifications
- 5+ years of software engineering experience building and operating production systems, with a focus on infrastructure automation or workflow tooling
- Strong proficiency in Python (Fleet Manager is built entirely in Python)
- Driven by building distributed systems at scale, infrastructure reliability, scalability, security, and continuous improvement
- Technical expertise: quickly grasping system design trade-offs and keeping track of rapidly evolving software systems
- Use AI tools such as Claude or Cursor as a core part of your development workflow
- Track record of taking automation systems from ambiguous requirements to production, with hands-on day-2 operations experience (monitoring, incident response, performance optimisation)
- Strong problem-solving skills and the ability to work independently in a fast-paced, high-agency environment
- Excellent communication skills to build consensus with stakeholders, both internal and external
Requirements
- Experience with workflow orchestration tools such as Temporal, Airflow, Prefect, or similar
- Hands-on experience with infrastructure tooling: DCIMs, NetBox, OpenStack, or ERP systems
- Bare metal provisioning and automation: MAAS, Ironic, IPMI, PXE boot, or network automation
- Experience building hardware lifecycle automation: provisioning, validation, testing, or remediation workflows
- GPU infrastructure experience: health monitoring, burn-in testing, or cluster management
- HPC and networking: datacenter topology, high-performance interconnects (InfiniBand, RoCE)
- Deep knowledge of Kubernetes, Infrastructure as Code (Terraform, Pulumi), AWS, and GCP
- Open-source contributions in infrastructure automation or cloud-native tooling
Benefits
- Collaborative, supportive, and innovative environment where your contributions have real impact
- Highly competitive package (base + equity) with reviews every 12 months
- Dynamic progression plan tailored to your ambitions
- Human-First Flexibility: autonomy to shape your day around life's moments
- Thriving remote-first team with seamless virtual collaboration
More Full-stack Engineer Jobs
• Help design the systems behind our Performance Medicine product, from the first technical plan through to the code that ships.
• Build the Ruby services that capture and move athlete medical data, where accuracy and privacy carry real weight.
• Take on the reliability and scaling challenges of a platform used by thousands of teams across the world's top leagues.
• Work directly with product, design, and the medical and performance staff who rely on the product, and shape it around how they actually work.
Role Description
We're looking for a new colleague to join the Believer team through Y Soft and help shape the product's next stage of growth.
WHAT WE OFFER:
- A startup environment backed by a stable technology company.
- A product that real customers already use.
- A small team where your work is visible and your decisions have impact.
- End-to-end ownership, from idea to production.
- Direct collaboration with the people who set the product's direction.
- Modern development practices, including AI-assisted development.
- The chance to influence both the product and its technical direction.
- Working from home isn't a perk; it's the standard.
WHAT YOU'LL WORK ON:
- Developing features across the full stack in TypeScript.
- Frontend in Next.js and React.
- Designing and implementing serverless APIs in Azure Functions.
- Features around donations, subscriptions, communication, events, onboarding, and community management.
- Improving the platform's reliability, monitoring, and production readiness.
- Introducing AI features into the product.
- Taking an active part in product discussions, not just implementation.
- Some of the topics on the roadmap:
  - Disaster recovery and monitoring
  - Recurring donations and subscription management
  - Email and SMS communication integration
  - Improvements to service scheduling
  - AI features and AI agent integration
Qualifications
- Very good knowledge of TypeScript.
- Experience with React and/or Next.js.
- Ability to work independently and take ownership of assigned areas.
- Comfort in a small team where everyone contributes more than just code.
- Interest in building products, not just closing tickets.
- Willingness to discuss ideas, challenge assumptions, and take part in decision-making.
- Ability to bring AI tools into development so that they genuinely speed up the work without compromising security, code quality, or the long-term maintainability of the solution.
- Experience with AI-assisted development and modern developer tools.
- Curiosity and eagerness to learn new things.
Requirements
- Experience with Azure or serverless architecture.
- Experience with Progressive Web Apps (PWA).
- Payment gateway integration or experience with Stripe.
- Experience with CI/CD.
- Infrastructure as Code (Bicep, Terraform, etc.).
- Product thinking and an interest in why individual features get built.
Technologies We Use
- Frontend: Next.js, React, TypeScript, Fluent UI, Redux Toolkit, react-intl, Progressive Web App (PWA)
- Backend: Node.js, TypeScript, Azure Functions, Azure Table Storage, Blob Storage, Cognitive Search, Auth0, Stripe, Mailersend, QuickBlox
- Infrastructure and tooling: Azure, Bicep (Infrastructure as Code), GitHub Actions, Auth0, monorepo architecture
- Quality: Jest, Playwright, CodeRabbit AI-powered code reviews
- Development: Cursor, Claude, AI-assisted development
If you're looking for a place where you can influence the product from day one, work on real problems, and see your code used by real users every week, get in touch.
Remote: PC, Modem, Headset, Electricity Bonus, Monday to Saturday, No Experience Required - Apply Now!
Covisian Perú
As a major multinational, we continue to grow.
Role Description
We're looking for #COVISIANlovers eager to grow! Join one of the best places to work in Peru, ranked 16th in GPTW and a leader in call center services. As a major multinational, we continue to grow and are currently looking for:
- REMOTE WORK, MONDAY TO SATURDAY, 9 AM TO 6 PM!
- Working hours: 48 hours per week
- Schedule: 9:00 am to 6:00 pm (1-hour daily break)
- Fixed day off: Sundays
- Compensable public holidays
Qualifications
- With or without experience
- Aged 18 or older
- Completed secondary school
- A PC or laptop for the virtual training
Requirements
- Availability to attend in person for only 6 days, under contract, at the Encarnación site in Centro de Lima, one block from Plaza San Martín.
Benefits
- Fixed salary: S/ 1130
- Electricity bonus: S/ 90
- Attendance bonus: S/ 100
- Uncapped commissions (over S/ 700)
- Family allowance: S/ 113
- Incentives, prizes, and recognition
- We send you the PC or laptop, internet modem, and headset
- Full payroll registration with all statutory benefits (CTS, statutory bonuses, final settlement, family allowance, Essalud health insurance, etc.)
- Scholarships and educational discounts
- Discounts at various brands
- Financial partnership agreements
- Short virtual training (only 7 days)
- DIRECT CONTRACT SIGNING
- S/ 700 subsidy for your studies
- Free medical care at the Auna clinic
- Free medication
- Salary advances with no fees
- Additional days off
- Career path
Grow with Covisian!

