
RCH Solutions
Remote Jobs
Advanced scientific computing services that help accelerate the development of your next scientific breakthrough.
4 Jobs
Cloud Engineer – Workflow Automation Specialist
• Design, build, and support solutions on GCP, including services such as BigQuery, Bigtable, Cloud Run, Cloud Storage, Pub/Sub, Dataflow, Agent Platform (formerly Vertex AI), and related cloud-native services.
• Develop and maintain ETL and data pipeline automation for structured and unstructured data workflows.
• Build automation using Python and shell scripting to support workflow execution, data movement, platform integrations, monitoring, and operational tasks.
• Support AI/ML and agent-based workloads, including model integration, agent workflows, and AI-enabled automation where applicable.
• Containerize applications, scripts, services, and workflow components using Docker.
• Deploy containerized workloads to cloud-native platforms such as Cloud Run or other managed GCP runtimes.
• Build and maintain CI/CD pipelines for application deployment, infrastructure changes, workflow automation, and data pipeline releases.
• Develop and maintain Terraform modules for repeatable GCP infrastructure deployments.
• Support and integrate workflow or lab orchestration platforms such as Mendix, NAN, Nextflow, or similar tools.
• Collaborate with application teams, data engineers, lab systems teams, and business stakeholders to translate workflow requirements into scalable cloud solutions.
• Contribute to best practices around cloud security, logging, monitoring, reliability, automation, and cost optimization.
Principal Cloud Platform Engineer
• Design, operate, and continuously improve production-grade Kubernetes clusters at the platform level.
• Lead complex cluster lifecycle management, including:
  - Version upgrades and dependency coordination
  - Failure recovery and incident resolution
  - Non-trivial maintenance and system evolution
• Build and maintain highly reliable, scalable, multi-tenant infrastructure.
• Build and maintain end-to-end observability for LLM-based systems using Grafana, LangFuse, and LangSmith, covering performance, latency, token usage, and alerting.
• Architect and operate shared infrastructure across multiple teams and use cases.
• Implement and enforce:
  - RBAC and access control models
  - Tenant isolation and security boundaries
  - Resource management and fairness at scale
• Ensure platform stability under diverse and competing workloads.
• Operate and optimize vector database systems (Weaviate preferred) in production environments.
• Support and scale Retrieval-Augmented Generation (RAG) systems.
• Drive improvements in:
  - Query performance and latency
  - Cluster tuning and resource efficiency
  - Operational stability of retrieval pipelines
• Take technical ownership of production systems over time.
• Build and maintain strong practices in:
  - Observability (metrics, logs, tracing)
  - Incident response and root cause analysis
  - Long-term system health and resilience
• Proactively identify and resolve reliability risks.
• Work closely with backend and GenAI engineers to ensure seamless integration with the platform.
Senior Cloud & Container Infrastructure Engineer
About Us
RCH Solutions is a rapidly growing global provider of computational science expertise within Life Sciences and Healthcare. At RCH, our team rallies around a culture crafted for learning and achieving. We’re relentless in our pursuit of innovation and demand of ourselves a ground-breaking computing experience for our clients, so that they can deliver life-saving science to humanity.

Core Values
At RCH, our Core Values are more than just words: they represent the threads that weave together the fabric of our culture. Used as a guide when interviewing new team members, as a barometer when evaluating our performance as individuals and teams, and even when deciding which customers to work with, RCH’s Values embody the behaviors by which we measure our success and create a framework for our growth as people and professionals.

Our Core Values:
- Embrace Excellence: We strive for best-in-class delivery of innovation and service.
- Be Accountable: Integrity, ownership, and accountability are non-negotiable.
- Adventure Together: We are committed to fostering a culture that embraces continuous improvement.
- Succeed as a Team: We believe harnessing the power of a team drives outcomes not achievable by individuals.
- Boundaries and Balance: Work-life balance is a core facet of our culture.

If you share our core values, we encourage you to continue reading this posting, as you may have found a great home for your career.

Job Description
RCH Solutions is seeking multiple Senior Cloud & Container Infrastructure Engineers to join our team of scientific computing experts. You will design, implement, automate, and operate scalable, secure, and highly reliable infrastructure that powers mission-critical applications and services.
This is a hands-on senior individual contributor role with a strong emphasis on Google Kubernetes Engine (GKE), container-native architectures, infrastructure as code, observability, and security best practices in Google Cloud Platform (GCP). You will serve as a GCP subject-matter expert within the team, mentor engineers, and drive platform improvements that enable developer velocity and business scale. If you're passionate about building reliable, scalable, developer-friendly platforms on Google Cloud and solving hard container and infrastructure problems at scale, we'd love to hear from you.

Key Responsibilities:
- Design, deploy, and operate containerized workloads on GKE across enterprise-scale environments.
- Manage GCP compute resources (Compute Engine, Cloud Run, GKE Autopilot) for high availability and cost efficiency.
- Operate and scale Weaviate vector database clusters to support production AI and semantic search workloads.
- Optimize indexing, query performance, and storage configurations as data volumes grow.
- Collaborate with AI/ML teams to define schema strategies and ingestion pipelines.
- Build and maintain monitoring dashboards and alerting pipelines using Grafana.
- Integrate LLM observability tooling (LangFuse / LangSmith) to track model performance, latency, and usage across AI services.
- Drive incident response, root cause analysis, and continuous reliability improvements.
- Implement infrastructure as code (Terraform / Deployment Manager) for reproducible, auditable deployments and CI/CD integration.
- Define and enforce multi-tenant GKE architecture: cluster security, namespace/tenant isolation, RBAC, network policies, maintenance, and scaling.
- Mentor engineers and drive platform adoption and best practices.
- Automate end-to-end provisioning, deployment pipelines, and day-2 operations using CI/CD tools (Cloud Build, GitHub Actions, ArgoCD, etc.).
- Design and implement observability stacks using Google Cloud Operations Suite (formerly Stackdriver), Prometheus/Grafana, Cloud Logging, Cloud Monitoring, and distributed tracing (Cloud Trace).
- Troubleshoot complex production issues spanning compute, networking, storage, and Kubernetes layers.

Essential Qualifications:
- 6+ years of hands-on experience building and operating production cloud infrastructure
- 4+ years of deep, production experience with GCP, particularly in a senior or lead capacity
- 3+ years of strong expertise with Kubernetes in production (preferably GKE), including cluster design, upgrades, troubleshooting, and scaling
- Expert-level proficiency with Terraform for GCP infrastructure provisioning
- Strong experience with container technologies: Docker, container registries (Artifact Registry), container security scanning
- Solid understanding of GCP core services: Compute Engine, Cloud Run, Cloud SQL / AlloyDB, Cloud Storage, BigQuery, Pub/Sub, Cloud Functions, VPC, Cloud Load Balancing, Cloud Interconnect
- Experience implementing secure IAM strategies, organization policies, and security controls in GCP
- Proficiency in Linux systems administration, networking fundamentals, and scripting (Bash, Python, Go preferred)
- Experience with modern CI/CD and GitOps practices in cloud environments
- Experience supporting or using HPC environments leveraging Slurm
- Containerization/orchestration (Docker, Kubernetes/GKE)
- Strong understanding of data governance, cataloging, and lineage tools; basic familiarity with regulated environments (GxP, HIPAA)
- Experience assessing existing code and workflows and identifying bottlenecks and optimization opportunities
- Experience in software requirements gathering, documentation, design, and development

Preferred Qualifications:
- Google Cloud Professional certifications (e.g., Professional Cloud Architect, Professional Cloud DevOps Engineer, Professional Kubernetes Engineer)
- Experience with Anthos, Config Management, Policy Controller, or multi-cluster management
- Familiarity with service mesh (Istio/Envoy), ingress controllers (GKE Gateway API / Ingress), and microservices observability

Additional Information:
Great talent should benefit from a great work environment. If you join our team, you’ll have access to:
- A competitive salary and bonus package based on experience
- Comprehensive health and wellness benefits, including Medical, Dental, and Vision Insurance
- Company-provided Life and Long-Term Disability Insurance
- Company-sponsored 401(k) Plan
- Company-provided continuing education benefit
- Team-focused culture and unlimited opportunity for advancement

**This is a remote position, and the candidate is expected to be able to work on an East Coast (US) time schedule.
**Role is only open to applicants not needing sponsorship now or in the future; no third parties, please.
HPC Engineer
• Work closely with customer stakeholders, scientists, and IT professionals to deliver Compute at Scale and support our customers' scientific initiatives.
• Develop, evolve, and administer HPC platforms, along with support for scientific applications, workflows, and other related infrastructure, both on-prem and cloud-hosted.
• Drive architecture, roadmaps, and execution of projects to establish and operate IT infrastructure best practices for customers.
• Provide full-stack support: design and evolution of platforms; application administration; support for customer workflows; profiling and performance tuning; monitoring and maintenance of scoped systems; platform and systems administration; troubleshooting of hardware, software, and networking issues; solution architecting and hands-on engineering (on-prem and cloud); and documentation.
• Collaborate with cross-discipline team members and customers to deliver HPC and peripheral Compute at Scale services.
• Maintain a thorough understanding of related industry best practices.
• Support internal and customer architecture and design efforts.
• Support customers with their workflow pipelines (advisory and hands-on).
• Comprehensively document new and existing computational assets.
• Maintain the flexibility to pivot as engagement scopes evolve.
• Support AWS and GCP cloud applications, migrations, and modernization.
• Provide CloudOps / IaC for ongoing platform management.
• Set up and configure AWS and GCP cloud infrastructure for new platform builds.
• Ensure system compliance with company security standards and applicable regulatory requirements.
• Support the transition of modernized services to operational teams.
• Provide engineering-level troubleshooting and service restoration for operational issues as they arise on supported platforms.
• Provide training and mentorship for junior-level team members.