Johns Hopkins University

Department name: IT@JH Networking, Telecom and Data Ctr Personnel area: University Administration

HPC Scientific Software Engineer

Location

United States

Posted

31 days ago

Salary

$85.5K - $149.8K / year

Seniority

Mid Level

Job Description

Role Description

IT@JH Research Computing is seeking an HPC Scientific Software Engineer to support faculty, researchers, and students engaged in high-performance and AI-driven research across Johns Hopkins University. The position is responsible for deploying, optimizing, and maintaining scientific software and computational workflows on advanced HPC systems and related infrastructure. Working primarily within Linux-based environments, the engineer manages and troubleshoots complex software stacks, containerized applications, and GPU-accelerated workloads using tools such as SLURM, EasyBuild, and Spack. The role combines ticket-based user support with long-term project work, collaborating closely with interdisciplinary research groups to enhance system performance, streamline data-intensive workflows, and integrate cutting-edge technologies. The position operates with significant independence while coordinating regularly with systems engineers and research computing leadership to ensure reliable, high-efficiency computing resources that advance the university’s scientific mission.

Specific Duties & Responsibilities

- Software Deployment and Design (15%)
  - Develop and refine deployment strategies for scientific software on HPC and AI systems.
  - Design computational workflows, select optimal software configurations, and use tools like Ansible for automation.
  - Assist teams in implementing, tuning, and optimizing AI models and gateway applications (e.g., XDMoD, ColdFront, Open OnDemand, CryoSPARC Live, SBGrid, AI Agents).
- Performance Optimization (20%)
  - Analyze and optimize the performance of AI models and HPC applications, focusing on GPU-enabled computing.
  - Implement parallel processing, distributed computing, and resource management techniques for efficient job execution.
- Integration and Optimization (15%)
  - Develop, debug, and maintain software tools, libraries, and frameworks supporting HPC and AI workloads.
  - Collaborate with the system team and software vendors (e.g., NVIDIA, Intel, MATLAB) to optimize systems for maximum performance.
  - Utilize CUDA, DNN, TensorRT, and Intel Compilers to enhance system performance.
- HPC Scientific Software Support (30%)
  - Manage and support scientific software deployment across HPC, cloud-based, and colocation facilities.
  - Oversee installation, configuration, and maintenance of HPC packages with tools like CMake, Make, EasyBuild, Spack, and Lua module files.
- Collaboration and Mentorship (5%)
  - Work closely with cross-functional teams, including researchers, data scientists, and software developers, to address complex HPC/AI challenges.
  - Mentor junior engineers and foster a culture of continuous learning.
- Technical Support, Training Workshops, and Troubleshooting (15%)
  - Resolve complex technical issues and perform root cause analysis for HPC/AI software challenges.
  - Implement effective solutions to prevent recurrence and improve system reliability.
  - Provide training workshops for researchers and students, focusing on troubleshooting, optimizing workflows, and effectively using HPC systems.
- Learning and Development (5%)
  - Stay current with advances in HPC and AI technologies and methodologies.
  - Incorporate new research findings into existing systems to improve performance and capabilities.
- Container Orchestration (5%)
  - Develop and manage container orchestration strategies to ensure scalability, reliability, and security of applications.
  - Oversee the container lifecycle from creation and deployment to scaling and removal.
- Documentation and Compliance (5%)
  - Create comprehensive documentation for system designs, performance metrics, and project status.
  - Ensure compliance with security and regulatory standards for all HPC and AI systems.
- Other duties as assigned.

Qualifications

- Master’s Degree in computer science or a closely related quantitative discipline.
- Five years of experience in HPC user support, software deployment, and performance optimization within an academic or research environment.
- Experience in scientific computing environments and applications.
- Hands-on experience with SLURM for job scheduling.
- Proficiency in Python, Perl, C/C++, and Shell scripting for automation and system management.
- Advanced knowledge of Linux systems and proficiency in scripting languages such as Python, Perl, and Shell.
- Familiarity with containerization and scientific application management tools such as Lua modules, CMake, Spack, and EasyBuild.
- Additional education may substitute for required experience, and additional related experience may substitute for required education beyond a high school diploma/graduation equivalent, to the extent permitted by the JHU equivalency formula.

Preferred Qualifications

- PhD in a quantitative discipline, such as Computer Science, Engineering, Physics, Bioinformatics, or related fields, with advanced training in scientific computing.

Requirements

- Classified Title: HPC Scientific Software Engineer
- Job Posting Title (Working Title): HPC Scientific Software Engineer (IT@JH Research Computing)
- Role/Level/Range: ATP/04/PF
- Starting Salary Range: $85,500 - $149,800 Annually (Commensurate w/exp.)
- Employee group: Full Time
- Schedule: Mon-Fri, 8:30am-5pm
- FLSA Status: Exempt
- Location: Remote
- Department name: IT@JH Research Computing
- Personnel area: University Administration
