Advanced scientific computing services that help accelerate the development of your next scientific breakthrough.
HPC Engineer
Location
United States
Posted
2 days ago
Seniority
Senior
Job Description
HPC Engineer
RCH Solutions
• Work closely with customer stakeholders, scientists, and IT professionals to deliver Compute at Scale
• Develop, evolve, and administer HPC platforms along with support for Scientific applications, workflows, and other related infrastructure both on-prem and Cloud hosted
• Drive architecture, roadmaps, and execution of projects to establish and operate IT infrastructure best practices for customers
• Full stack support - design and evolution of platforms, application administration, supporting customer workflows, profiling and performance tuning
• Monitoring and maintenance of scoped systems, platform and systems administration, troubleshooting hardware, software, and networking related issues
• Solution architecting and hands-on engineering (on-prem + Cloud)
• Documentation
• Collaborating with cross-discipline team members and customers
• Supporting internal and customer Architecture and Design efforts
• Supporting customers with their workflow pipelines (advisory and hands-on)
• Comprehensively documenting new and existing computational assets
• Maintaining the flexibility to pivot as engagement scopes may evolve
• Support for AWS & GCP Cloud applications, migrations, and modernization
• CloudOps / IaC for on-going platform management
• Setup and configuration of AWS & GCP Cloud infrastructure for new platform builds
• Ensuring system compliance with company security standards and applicable regulatory requirements
• Transition support for modernized services to operational teams
• Provide engineering level troubleshooting and services restoration for operational issues as they arise on supported platforms
• Provide training/mentorship for junior level team members
• Escalation point on multiple engagements to ensure resolution
Job Requirements
- A bachelor’s degree or master’s degree in Computer Science or related field
- 5+ years of experience administering HPC clusters and systems
- Experience with SLURM and Grid Engine scheduling software preferred
- 5+ years of professional experience in Solution Architecture or Cloud Infrastructure deployment and support
- 5+ years of professional experience developing or administering compute solutions for Scientific / Research IT domains, Life Sciences being preferred
- Experience with Posit products (Package Manager, Connect, Workbench) in either an end-user or administrator capacity
- Experience developing scientific workflows on HPC systems using Nextflow
- Extensive command-line system administration experience: User and group management
- Advanced knowledge of Active Directory, DNS, DHCP, LDAP, NFS, SMB
- Building applications from source code, installing, maintaining, and troubleshooting application-level Linux and scientific software in line with industry best practices
- Installation and fine-tuning of Linux operating systems
- Familiarity with leveraging and maintaining Linux package management systems
- Intermediate OS level networking knowledge
- Experience with scripting tools, automation tools, and configuration management tools
- Ansible, Terraform, and CloudFormation experience preferred
- Experience administering and integrating Scientific / Research applications.
- Strong time-management skills; able to complete projects in a timely manner, plan and prioritize tasks while keeping leadership and stakeholders updated regularly on status
- Excellent communication skills, including preparation of written documentation for IT colleagues and end users
- Proactive thinking skills to identify potential issues and solution options prior to incidents occurring
- Extreme attention to detail, needed to interface with multiple clients simultaneously
- Ability to understand and analyze complex technical problems and situations
- Candidates must be passionate engineers with a strong vision and a desire to stay on top of trends in the Scientific Computing sector
- Ability to work independently or with a team
- Ability to take a project from start to finish with minimal supervision
Benefits
- Comprehensive health and wellness benefits, including Medical, Dental, and Vision Insurance
- Company-provided Life and Long-Term Disability Insurance
- Company-sponsored 401(k) Plan
- Company-provided continuing education benefit
- Team-focused culture and unlimited opportunity for advancement




