Principal Platform Software Engineer – RAS
Location
California
Posted
138 days ago
Salary
$272K - $431.3K / year
Seniority
Lead
Job Description
NVIDIA
- Drive next-generation fleet management solutions for scaling AI infrastructure using NVIDIA GPUs and Grace solutions
- Work with customers, product management, and other architects to narrow down requirements for implementation
- Bring clarity to the architecture for fleet health monitoring and fault-remediation solutions at scale
- Work with customers and other architects to understand their health monitoring requirements
- Develop detailed architecture and build POCs to validate it
- Educate customers about the product architecture and gather their feedback
- Write architecture specs and design documents, and own end-to-end delivery of the product
- Review the code produced from the architecture specs
- Ensure the product is properly tested by working with the development team
- Drive product life cycles with QA teams to productize the code, and take responsibility as product owner
- Articulate requirements in Jira and bug management tools, and work out an end-to-end execution plan
- Contribute to all phases of product development, from product definition, architecture, and design through implementation, debugging, testing, and early customer support
Job Requirements
- BS, MS, or PhD in EE/CS or related field of education (or equivalent experience)
- 15+ years hands-on coding experience
- Strong knowledge of time series databases such as InfluxDB and Prometheus
- Strong knowledge of building and consuming REST APIs (Redfish is a big plus)
- Strong knowledge of telemetry visualization solutions such as Grafana and Influx
- Strong knowledge of firmware architecture, including optimizing firmware for low-latency APIs
- Strong knowledge of analyzing algorithms for time and space complexity and projecting system resource requirements
- Proven record of solutions for scalability
- Strong and demonstrable skill in C/C++ and Python
- Experience programming and debugging on server platforms
- Experience with SCM (e.g., Git, Perforce) and project management tools such as Jira
Benefits
- Equity
- Benefits