A distributed marketplace for compute
Senior Storage Engineer
Location
United States
Posted
1 day ago
Salary
$150K - $200K / year
Seniority
Senior
Job Description
Senior Storage Engineer
Hydra Host
Role Description
As a Storage Engineer, you will be responsible for designing and building Hydra Host’s first production-grade storage platform from the ground up, supporting the company’s rapidly expanding network of bare-metal GPU clusters. You’ll own the architecture, technology selection, implementation, and evolution of this platform, defining how Hydra Host manages data for large-scale, distributed AI workloads across global data centers. This is a senior, hands-on role for an engineer who has built storage systems for GPU clusters before, with deep expertise in both block and object storage and a strong understanding of parallel file systems, performance optimization, and large-scale orchestration.
- Define, architect, and implement Hydra Host’s first production storage platform tailored for bare-metal GPU clusters and AI/HPC workloads.
- Lead all technical decisions around storage stack design, from hardware infrastructure to parallel file system orchestration and performance tuning.
- Select, build, and maintain storage solutions spanning both block (NVMe, SAN, Ceph, etc.) and object storage (S3-compatible, custom, or Ceph Object Gateway) layers.
- Design for high-throughput, low-latency access, supporting large datasets, rapid checkpointing, and parallel access for distributed AI training workloads.
- Integrate and optimize parallel file systems such as Lustre, BeeGFS, Spectrum Scale, WekaIO, or CephFS, ensuring maximum performance and fault tolerance.
- Ensure compatibility across Hydra’s diverse GPU/OEM ecosystem, accounting for unique firmware, BMC/Redfish APIs, and hardware configurations.
- Develop automation, observability, and management tooling for storage, focusing on reliability, scalability, and efficiency.
- Act as a builder and architect: deeply hands-on in deployment, troubleshooting, and optimization, while guiding the long-term storage roadmap.
- Collaborate cross-functionally with GPU, HPC, and platform engineering teams to integrate storage with compute and network layers.
- Interface with customers and product leadership to define feature priorities, performance benchmarks, and future enhancements.
Qualifications
- 8+ years of progressive, hands-on experience designing and implementing high-performance storage systems for compute clusters in HPC, AI, or bare-metal cloud environments.
- Proven track record building storage infrastructure from scratch, not just operating existing systems.
- Deep expertise in block storage (NVMe, SAN, Ceph, distributed block systems) and object storage (S3, MinIO, Ceph Object Gateway, etc.).
- Strong background in parallel file systems (WekaIO, BeeGFS, Lustre, Spectrum Scale, or similar) supporting GPU or AI cluster workloads.
- Solid foundation in Linux systems engineering, automation, and scripting for distributed environments.
- Familiarity with BMC, Redfish APIs, and OEM server firmware for bare-metal management.
- Deep understanding of AI/ML data pipelines: model checkpointing, data locality, and multi-tiered storage optimization.
- Excellent problem-solving, debugging, and communication skills, able to translate technical decisions into clear architectural direction.
Preferred Qualifications
- Experience building storage solutions for large-scale GPU or HPC infrastructure.
- History of technical leadership or mentorship, growing teams or owning a product roadmap.
- Experience evaluating and managing vendor relationships and negotiating storage hardware/software contracts.
- Contributions to open-source HPC or storage projects (Ceph, Lustre, BeeGFS, etc.).
- Familiarity with confidential computing, secure data handling, or high-availability architectures.
More Engineer Jobs
Senior Full Stack Engineer
Pavago
Pavago specializes in connecting businesses with top-tier offshore talent in operations, sales, and marketing, offering a comprehensive recruitment solution designed to reduce cost
Role Description
At Pavago, one of our clients is hiring a Senior Full Stack Engineer to architect and build a modern tax filing platform from the ground up. This isn’t just another software engineering role. You’ll own the architecture, backend services, frontend experience, and technical decisions behind a mission-critical platform where security, accuracy, and reliability are essential. You’ll work extensively with AI-assisted development tools such as Cursor and Claude Code to accelerate development while maintaining exceptional code quality. If you enjoy solving complex technical problems, building scalable systems, and taking ownership from architecture through deployment, this role is for you.
Key Responsibilities
- Full-Stack Application Development
  - Design, build, and maintain production-grade web applications.
  - Develop secure backend services using Python and FastAPI.
  - Build intuitive, responsive frontend interfaces for complex workflows.
  - Design APIs and scalable service architectures.
  - Integrate frontend and backend systems seamlessly.
- System Architecture & Design
  - Own end-to-end application architecture.
  - Design scalable data models and service boundaries.
  - Build systems designed for long-term maintainability.
  - Drive technical decisions that improve scalability and reliability.
  - Identify technical risks early and recommend practical solutions.
- Backend Development
  - Develop secure REST APIs using FastAPI.
  - Build business logic for complex workflow-driven applications.
  - Implement third-party integrations and external services.
  - Optimize application performance and API responsiveness.
  - Build reliable background processing workflows.
- Frontend Development
  - Develop user-friendly interfaces for forms, dashboards, and workflow management.
  - Build responsive, well-structured UI components.
  - Integrate frontend applications with backend APIs.
  - Improve usability while maintaining performance and accessibility.
- AI-Assisted Software Development
  - Leverage Cursor and Claude Code throughout the development lifecycle.
  - Build AI-assisted workflows that improve engineering productivity.
  - Use AI to accelerate implementation, debugging, testing, and documentation.
  - Continuously refine development workflows using emerging AI tools.
- Testing, Quality & Reliability
  - Write and maintain unit tests, integration tests, and end-to-end tests.
  - Maintain high code quality standards.
  - Improve application stability and reliability.
  - Refactor code where necessary to reduce technical debt.
- Security & Data Integrity
  - Implement robust validation and error handling.
  - Ensure secure handling of sensitive user information.
  - Follow security best practices for authentication, authorization, and input validation.
  - Build systems where data accuracy and integrity are critical.
- Documentation & Collaboration
  - Document architecture decisions and implementation details.
  - Collaborate with product, engineering, and stakeholders across time zones.
  - Participate in sprint planning, technical discussions, and code reviews.
  - Proactively communicate risks, blockers, and recommendations.
Qualifications
- 5+ years of professional Full Stack Software Engineering experience.
- Strong expertise in:
  - Python
  - FastAPI
  - REST API development
- Experience designing and shipping production applications.
- Strong frontend development skills with modern JavaScript frameworks.
- Experience building workflow-driven or form-heavy applications.
- Hands-on experience using:
  - Cursor
  - Claude Code
  - AI-assisted development tools
- Experience with:
  - PostgreSQL or similar relational databases
  - Git
  - API integrations
- Strong written and verbal English communication skills.
Preferred Qualifications
- Experience in:
  - FinTech
  - Tax software
  - Compliance platforms
  - Financial systems
- Experience building highly secure applications.
- Familiarity with asynchronous collaboration in distributed engineering teams.
- Experience designing scalable system architecture.
What Makes You Successful
- You think like an owner, not just a contributor.
- You proactively solve problems without waiting for detailed specifications.
- You thrive in complex, detail-oriented environments.
- You communicate clearly and proactively.
- You embrace feedback and continuous improvement.
- You care deeply about code quality, scalability, and maintainability.
Typical Day
- Review overnight updates and sprint priorities.
- Build backend APIs and frontend features.
- Design and improve application architecture.
- Develop workflow logic and complex business rules.
- Use Cursor and Claude Code to accelerate development.
- Collaborate with product and engineering teams.
- Write tests and review pull requests.
- Document technical decisions and deployment updates.
Key Performance Indicators (KPIs)
- High-quality features delivered on schedule.
- Secure, reliable, and scalable application performance.
- Low production error rates and technical debt.
- Strong automated test coverage.
- Well-documented, maintainable code.
- Positive collaboration across engineering and product teams.
Why Join Us?
- Architect and build a mission-critical platform from the ground up.
- High ownership with direct technical impact.
- Work with modern AI-assisted engineering workflows.
- Solve complex engineering challenges.
- Fully remote environment with flexible collaboration.
- Opportunity to grow into:
  - Staff Engineer
  - Technical Lead
  - Engineering Manager
  - Software Architect
Interview Process
- Initial Phone Screen
- Video Interview with Pavago Recruiter
- Technical Assessment
- Client Interview with Engineering Team
- Offer & Background Verification
Apply Now
If you’re passionate about building scalable applications, leveraging AI to improve software development, and taking ownership of complex engineering challenges, we’d love to hear from you.
• Define the backend architecture of the solution
• Build scalable and resilient APIs
• Design and implement integration with AI and multi-agent systems
• Ensure data security and governance
• Support technical decisions and mentor mid-level and junior developers
• Work closely with the frontend team to define contracts (BFF when necessary)
• Act as the technical lead on customer engagements (including enablement programs, onboarding sessions, mentoring, and consulting projects), with responsibility for running simulations, reviewing designs, interpreting results, and recommending improvements based on findings and experience.
• Stay up to date with Ansys platforms and expand knowledge through ongoing training and self-guided learning to broaden Rand Simulation's enablement and consulting capabilities.
• Collaborate closely with Rand Simulation's sales team to provide technical guidance, credibility, and subject-matter expertise in both pre-sales and post-sales engagements.
• Analyze customer technical needs, infrastructure, and workflows to ensure application fit and develop recommendations for both immediate and long-term simulation success.
• Articulate the value of Ansys technology and Rand Simulation's services (including training, mentoring, and consulting) in helping customers achieve their design and productivity goals.
• Partner with RandSim marketing to generate blog posts, white papers, webinars, and other content related to key initiatives for your team and customers.
• Provide customized Ansys training and mentoring sessions that help customers improve simulation proficiency, increase productivity, and achieve early success with new tools.
• Occasionally assist with technical support overflow and follow-up to ensure customer satisfaction and retention.
Full Stack Engineer
ICF
We are not a typical consulting firm and our people are not typical consultants.
Role Description
ICF is seeking a Full Stack Engineer for a Rural Health Program to design, build, and maintain full-stack features supporting a scalable Workflow & Platform Solution for large-scale public sector programs, including intake, workflow management, and performance tracking.
- Build configurable workflows with stage progression, role-based tasking, status tracking, and automated notifications aligned to program operations.
- Develop and integrate automation and AI-assisted capabilities to improve efficiency and structure unstructured inputs, while preserving appropriate human review.
- Implement reporting and dashboard features that provide clear visibility into participation, progress, and outcomes for internal teams and external stakeholders.
- Develop data inputs and integrations, including structured forms, document management, and connections to approved external data sources.
- Translate business requirements into technical solutions, communicate feasibility and timelines, support demos and training, and drive continuous improvement through user feedback.
- Up to 25% travel may be required.
Qualifications
- Minimum 3 years of professional experience in full stack software development.
- Strong experience with modern full-stack web development, including a current JavaScript/TypeScript front-end framework and a server-side runtime.
- Experience designing and building applications backed by relational databases.
- Experience integrating RESTful APIs and external services into applications.
- Hands-on experience integrating large language model (LLM) capabilities into applications, such as structured extraction, summarization, or evaluation against defined criteria.
- Experience building multi-step, stateful processes with reliable execution, including human-in-the-loop steps and audit history.
- Strong written and verbal communication skills, with the ability to adapt technical explanations to non-technical audiences including program staff and state agency stakeholders.
- Demonstrated ability to manage ambiguity and ship useful software when requirements are still taking shape.
Requirements
- Experience in a client-facing, embedded, or consulting engineering role.
- Experience building case management, workflow, or grant and program management systems.
- Experience working with Federal, state or local government clients.
- Experience handling personally identifiable information (PII) responsibly and supporting data governance.
- Domain interest or experience in health, rural health, or public sector program delivery.
- Proficiency with AI-assisted coding tools in a professional setting.
- Experience with CI/CD practices and automated testing.
Benefits
- ICF is a global advisory and technology services provider, combining unmatched expertise with cutting-edge technology.
- We are an equal opportunity employer.
- Reasonable accommodations are available for disabled veterans, individuals with disabilities, and individuals with sincerely held religious beliefs.
- Pay range for this position based on full-time employment is: $108,476.00 - $184,409.00.



