Biotechnology and Life Science
DevOps Engineer
Location
New York
Posted
19 hours ago
Salary
Not specified
Seniority
Senior
Job Description
DevOps Engineer
THEMIS Waste Recovery Technology
- Build and operate the infrastructure that keeps Themis secure, reliable, and fast
- Own the systems for cloud infrastructure, CI/CD pipelines, observability, and security controls
- Automate provisioning, configuration, scaling, and routine operational tasks
- Manage containerized workloads and orchestration
- Build monitoring, logging, alerting, and dashboards to ensure system health and performance
- Define and improve incident response processes
- Drive reliability improvements, capacity planning, and performance tuning
- Implement and maintain security controls and access management
Job Requirements
- Experience operating production infrastructure on a major cloud provider (AWS, GCP, or Azure)
- Proficiency with infrastructure-as-code (Terraform)
- Experience building and maintaining CI/CD pipelines
- Experience with containers and orchestration (Docker, Kubernetes)
- Strong scripting or programming skills (e.g., Python, Go, or Bash)
- Solid understanding of networking, security, and observability fundamentals
- Ability to own systems end to end and manage incidents calmly
Benefits
- Remote-first team
- Flexible working hours
- Meaningful daily overlap for collaboration
- Shared on-call rotation with fair scheduling
- Flexibility for availability during core hours
More DevOps Engineer Jobs
Senior DevOps Engineer
CodiLime: A strategic partner for technology-driven companies | Network engineering | Software engineering
- Design, provision, and maintain cloud infrastructure using Terraform and Terraform Cloud.
- Manage Azure networking, including VNets, subnets, Private Endpoints, DNS zones, and NSGs.
- Manage Azure Kubernetes Service (AKS) clusters.
- Implement and optimize CI/CD pipelines using GitHub Actions.
- Manage container build, deployment, and release processes.
- Implement monitoring and observability solutions.
- Support incident analysis and root-cause investigations.
- Collaborate with architects, developers, and security teams.
- Promote an automation-first and DevOps culture across engineering teams.
- Participate in technical discovery, proof-of-concepts, and architecture discussions.
Senior Site Reliability Engineer
Remote: The easier way to employ globally. Remote builds belonging for your team with payroll, benefits, & compliance solutions.
- As a Senior SRE at Remote, you'll work with a high degree of autonomy on complex reliability and platform problems, owning the plan and execution of features and projects within our SRE/Platform domain.
- You'll contribute to the platform's architecture and reliability strategy, translating ambiguous requirements into robust, maintainable solutions and raising the technical bar of the engineers around you while collaborating closely with product and security teams in an async-first, fully remote environment.
- You'll work AI-natively day to day and build reusable AI workflows that make the whole team faster and more reliable, not just yourself.
- Lead solution discovery and delivery for reliability and infrastructure problems with real ambiguity, complexity, or scope, working autonomously and coordinating with other contributors where needed.
- Contribute to the platform's architecture, tooling, and roadmap. Influence team priorities and advocate for technical initiatives.
- Help define and operate reliability practices for our platform: SLOs/SLIs, error budgets, alerting, observability. Take responsibility for the team's operational stance, using support/incident metrics to shape technical strategy.
- Resolve cross-team requests, identify systemic issues, and turn recurring ones into reusable fixes and runbooks rather than one-off answers.
- Work AI-natively and operationalise it for the team: use agentic workflows by default; build reusable prompts, skills, and tooling embedded in the codebase so others ship faster and safely; design agent-ready systems (clean interfaces, good observability) that make AI-assisted changes easy to review. Establish shared standards and domain-level guardrails (secure-by-default patterns, CI protections, AI-assisted review practices).
- Mentor and give timely, actionable feedback to less-senior engineers; participate in hiring, onboarding, and RFC discussions.
- Collaborate with Security on platform hardening and threat mitigation; contribute to capacity and cost-efficiency of the infrastructure.
- Participate in incident response and on-call rotations to rapidly resolve issues and maintain system reliability.
- Own and optimize Noxtua's infrastructure across OTC and our self-hosted GPU servers, ensuring efficient architecture, reliable operation, and cost control.
- Lead and grow a team of 4–5 DevOps engineers, setting technical direction, supporting their development, and having a strong ownership mindset.
- Operate our self-managed GPU server fleet (provisioning, driver installation, hardening, and connectivity via Ansible) and manage provider SLAs to keep heavy AI workloads running reliably.
- Build and maintain infrastructure automation using Infrastructure as Code (Terraform & Ansible).
- Run our container platform on Kubernetes, support teams with Docker, and keep our services (APIs) stable, accessible, and secure.
- Set up and maintain monitoring and alerting (e.g., Prometheus, Grafana) to ensure system reliability and performance.
- Develop and maintain CI/CD pipelines and collaborate with the development and AI teams to automate deployments and support AI-driven workloads.



