Senior Systems Software Engineer, Data Center Infrastructure Management – EngOps
Location
California | Texas
Posted
72 days ago
Salary
$152K - $287.5K / year
Seniority
Senior
Job Description
NVIDIA
- Take ownership of daily cluster failures and issues, troubleshooting them promptly to maintain optimal cluster availability and performance.
- Manage updates to the site controller management nodes.
- Manage the rollout and rollback of cluster software and firmware updates, ensuring smooth transitions and minimal disruptions.
Job Requirements
- BS or MS in Computer Science, Computer Engineering, Electrical Engineering, or a related field, or equivalent experience.
- 5+ years of hands-on experience deploying and administering clusters, servers, switches, and related infrastructure.
- Experience with deployment and configuration of operating systems, computer networks, and high-performance applications.
- Proven ability to work effectively with developers and test engineers across different teams and time zones.
- Experience deploying services in Kubernetes.
- Datacenter or computer architecture experience is required—you should understand server, rack, and network topologies, as well as hardware/firmware/software interactions.
- Background with hardware management protocols (Redfish, IPMI, BMC) and firmware update automation.
- Experience configuring and debugging complex data center networks.
- Experience developing scripts to automate recovery actions for management controllers and datacenter systems.
Benefits
- Equity
- Benefits
More Full-stack Engineer Jobs
Software Engineer III, Community Builders
Reddit
Reddit is an online platform utilized by thousands of communities to connect and converse about a wide variety of topics, including TV and movie fan theories, s
- Design, develop, and maintain backend application services while ensuring the performance, security, and scalability of our systems.
- Work collaboratively with product managers, designers, data scientists, and other engineers to deliver high-quality products.
- Contribute to the full development cycle: technical design, development, test, experimentation, analysis, and launch. You'll write design docs and code, and get valuable feedback on your work.
- Continuously learn and improve your technical and non-technical abilities.
Senior Full-stack Engineer
Qode
Qode is a unified hiring platform designed for enterprise scale, centralizing the entire hiring lifecycle in one environment. Key features include: Candidate sourcing and outreach. AI-led interviews in 30+ languages. Applicant tracking and job distribution to 200+ platforms. Outcome-based pricing: clients pay for results, not software seats. 800M+ candidate profiles indexed. 200,000+ applicants processed per month. 56+ enterprise clients worldwide.
Our fast-growing remote working company is seeking a highly talented and experienced Senior Java Full-stack Engineer (required experience in Java/React/GraphQL) to join our team. In this role, you will be responsible for leading the development of software applications and systems that meet the needs of our clients.

As a Senior Java Full-stack Engineer, you will:
- Provide technical leadership in the design, integration, implementation, and transition of an enterprise SaaS human services system
- Take a flexible architectural design approach to deliver configurability across data creation, integrations, workflows, notifications, and data persistence
- Understand and synthesize integration requirements, and develop recommendations based on business objectives, product roadmap, solution architecture, and technical considerations
- Contribute to the development of our platform functionality using state-of-the-art frameworks and tools
- Specify and troubleshoot API integrations in an ecosystem of multiple systems
- Work closely with client stakeholders, partners, product managers, creative designers, platform architects, and other software engineers
- Constantly learn and use leading-edge technologies
- Implement a system that will address the needs of vulnerable populations
- Contribute to the highest security, extensibility, reusability, and testing standards in system architecture and in software, interface, component, data structure, and algorithm specifications

The ideal candidate will have:
- 8+ years of software engineering experience, with proficiency in Java for multi-tier web app development
- Proven, deep hands-on experience with React and TypeScript in production environments
- A proven track record of building and consuming GraphQL APIs, including schema design and client integration
- Skill in building microservices and adhering to OpenAPI standards
- Experience with Elasticsearch, PostgreSQL, Redis, S3, Redshift, Apache Kafka, Lambda, and EMR
- Experience leading major IT application implementations
- Strong problem-solving skills and the ability to identify roadblocks
- Familiarity with Agile, Git, IT security architecture, and testing methodologies
- Knowledge of AWS/cloud deployment and scaling
- Proficiency in REST APIs, GraphQL, SQL, NoSQL, and web development concepts
- An emphasis on clean, efficient, and documented code
- Practice in TDD, CI/CD, and time management
- Strong proficiency in English communication
- A commitment to continuous learning

Working time: Must work with US clients from 8-11 PM or 9 PM-12 AM VNT (3 hours/day; the remaining 5 hours are flexible)

👉 Our Benefit Packages:
- Attractive salary range, and we are open to negotiation if you're a strong fit
- Hybrid/remote-friendly culture: work where you grow best
- Flexible hours, async teamwork (we respect your focus time)
- Work equipment support
- Allowance for certification and skill development
- Year-end bonus and performance-based rewards
- 22 paid leaves from your 5th year - take a full month off
- Career growth with personal coaching sessions
- Open, collaborative team culture - no micromanagement, only trust
- Tools and AI-powered workflows that make remote work easier

About CoderPush
CoderPush is a remote-first technology company that partners with startups and global businesses to build scalable, high-quality software products. We focus on long-term collaboration, clear communication, and delivering real impact through strong engineering and product thinking. Please find more at: https://coderpush.com/
SR SOFTWARE DEVELOPER
Lumen Technologies
Lumen Technologies is self-described as a global company of 40,000+ professionals empowering businesses, government, and communities to “produce amazing things.” Driven by the
Lumen is the trusted network for AI. We’re transforming how businesses connect, secure, and scale in an AI-driven world. By connecting people, data, and applications quickly, securely, and effortlessly, we help organizations move faster and unlock what’s next. At Lumen, people power progress. Our culture is built on teamwork, trust, and transparency, giving you the flexibility, support, and opportunity to make a lasting impact. We’re looking for top-tier talent ready to take on the challenge. Join us in building the future.

The Role
We are seeking an experienced Senior Software Engineer to join our growing team. In this pivotal role, you will be instrumental in designing, developing, and deploying highly scalable and resilient AI solutions and AI Agents within our cloud-native environment.

Location: Remote (Poland) or Hybrid

Benefits: private health care (Medicover), Multisport card/Multicafeteria, lunchpass card, life insurance (voluntary), PPK (voluntary), English & Polish classes, working abroad policy, CSBF (voluntary), Wellness Day

The Main Responsibilities
- Design, develop, and implement complex software systems using Python, ensuring high performance, scalability, and reliability in a cloud environment.
- Develop robust AI Agents, APIs, microservices, and RAG pipelines to support various applications and services.
- Collaborate closely with business partners, architects, and other engineering teams to define technical requirements and deliver innovative solutions.
- Drive technical excellence, promoting clean code, test-driven development, and continuous integration/continuous deployment (CI/CD) practices.
- Stay abreast of emerging technologies and industry trends, particularly in cloud computing, Python, DevOps, and AI, and advocate for their adoption where appropriate.
- Troubleshoot and resolve complex technical issues, ensuring the stability and availability of our production systems.

What We Look For in a Candidate
- Bachelor's or master's degree in Computer Science, Software Engineering, or a related technical field.
- 7+ years of professional software development experience, with at least 3 years in a senior engineer capacity at a large technology company.
- Deep expertise in Python using FastAPI is mandatory, with a proven track record of building and deploying production-grade applications.
- Extensive experience with cloud platforms (AWS, Azure, or GCP), containerization (Docker, Kubernetes), databases (relational and NoSQL), and messaging queues is required.
- Strong understanding of software engineering best practices, including design patterns and secure coding principles.
- Excellent problem-solving skills, with the ability to analyze complex technical challenges and propose effective solutions.

Preferred Qualifications:
- Experience using a vector database such as Pinecone or Azure AI Search for AI development.
- Well versed in context and prompt engineering techniques.
- Experience with multi-agent frameworks like Pedantic / Crew AI, or cloud-centric frameworks like Microsoft Agent framework or AWS agent core, for agentic development.
- Well versed in agentic protocols like A2A and MCP to drive agentic development.
- Experience with distributed systems and microservices architectures.
- Familiarity with agile development methodologies.

What to Expect Next
Requisition #: 341564

The above job definition information has been designed to indicate the general nature and level of work performed by employees within this classification. It is not designed to contain or be interpreted as a comprehensive inventory of all duties, responsibilities, and qualifications required of employees assigned to this job. Job duties and responsibilities are subject to change based on changing business needs and conditions.

We are committed to making reasonable adjustments to the recruitment process for people with disabilities. If there is anything we can do to help you, please let us know. We are committed to providing equal employment opportunities to all persons regardless of race, religion, colour, sex, age, disability or sexual orientation or any other status protected by local or national law. We do not tolerate unlawful discrimination in any employment decisions, including recruiting, hiring, compensation, promotion, benefits, discipline, termination, job assignments or training.

Join a diverse and inclusive culture where everyone is welcome and every voice is heard. A culture where people feel they belong, can be themselves and feel inspired to share different perspectives. Our culture, shared values and behaviours truly make Lumen a fantastic place to work and provide an environment where people can genuinely thrive.

Privacy Notice
Lumen is committed to protecting the privacy and security of personal information collected during the recruitment and hiring process. Our Privacy Notice explains how we collect, use, disclose, and protect applicant information, as well as how individuals may request access to or deletion of their personal data. To review Lumen’s Privacy Notice, please visit: https://jobs.lumen.com/global/en/privacy-notice
Full Stack Engineer
Idenfo Direct Global
We're here for all your identity verification, AML and KYC needs!
- Design, develop, and maintain scalable web applications using Next.js, React, Node.js, and other modern web technologies.
- Implement RESTful APIs and/or GraphQL endpoints for seamless frontend-backend communication.
- Write clean, maintainable, and efficient code following best practices and coding standards.
- Optimize application performance for speed, responsiveness, and scalability.
- Collaborate with designers, product managers, and other developers to turn product ideas into reality.
- Participate in code reviews, technical discussions, and agile ceremonies (stand-ups, sprint planning, retrospectives).
- Troubleshoot and resolve bugs, performance issues, and production incidents.