Job Closed

This listing is no longer active.

GetBlock

GetBlock provides developers with instant connection to 40+ blockchain nodes via JSON-RPC, REST and WebSockets APIs.

SRE Lead

DevOps Engineer | Other | Remote | Senior | Team 11-50 | Since 2020 | H1B No Sponsor

Location

United States

Posted

160 days ago

Salary

Seniority

Senior

Bachelor Degree | 3 yrs exp | English

Consul Kubernetes Linux Node.js Prometheus Terraform HashiCorp Vault Blockchain / Web3

Job Description

• Lead and grow the SRE team: hiring, onboarding, 1:1s, performance reviews, and career development.
• Own SRE operating cadence: prioritization, planning, execution, and visibility of reliability work.
• Maintain high standards for production readiness: runbooks, operational checklists, change management, and quality gates.
• Own production reliability end-to-end across gateways, clusters, and blockchain node fleets.
• Define and evolve SLIs/SLOs for uptime, response time, RPS, and time-to-resolve; partner with engineering teams to meet targets.
• Own incident management standards: alerting strategy, escalation, incident coordination, and communications.
• Run and improve postmortems: ensure follow-ups are executed and reliability debt is reduced over time.
• Lead capacity planning and performance work across regions and chains; balance reliability, speed, and cost.
• Lead design reviews and set engineering standards for reliability, scalability, and operational excellence.
• Drive architecture decisions across Nomad + Kubernetes environments, gateways, and observability stack.
• Build and evolve internal tooling that improves reliability and operational efficiency (automation, health systems, diagnostics, self-service).

Job Requirements

3+ years in SRE / infrastructure / production engineering, including 1+ year leading people
Strong Linux, networking, and production incident debugging skills
Experience running and scaling distributed, multi-region, high-load systems
Hands-on with orchestration (Nomad and/or Kubernetes) and modern gateways/proxies
Solid observability practices (metrics, logs, traces, alerting, incident response)
Experience using AI agents to improve operational efficiency and reliability automation
Strong communication and ability to lead technical decisions end to end
Nice to have: Web3 / RPC infrastructure and blockchain node operations
HashiCorp stack (Nomad, Consul, Vault), Prometheus ecosystem
Terraform / IaC, capacity & cost modeling, DDoS and abuse protection
Building internal platforms: self-service tools, runbooks, reliability automation.

Benefits

20 days of annual leave, plus an additional 12 days off to use for your holidays or personal days.
Well-being programs to support your health and balance.
Coworking space compensation for a productive work environment.
Paid sick leave to ensure you can rest when needed.
A company that invests in your growth, with personalized roadmaps to guide your professional development.
An actively growing company with great opportunities for both horizontal and vertical career development.
Opportunity to shape the initiatives you’re working on and make a real impact.

More DevOps Engineer Jobs

Senior Site Reliability Engineer – Golang, OpenShift, AWS, Linux

Red Hat

The leading provider of enterprise open source solutions.

DevOps Engineer | 160 days ago

Full Time | Remote | Team 10,001+ | Since 1993 | H1B Sponsor

• Develop, scale, and operate OpenShift managed cloud services
• Enable customer self-service and improve monitoring systems
• Eliminate work through automation
• Participate in a regular on-call schedule, including occasional paid weekends and holidays
• Resolve customer issues escalated from the Red Hat Global Support team
• Work within a small agile team to develop and improve SRE software

Ansible AWS Azure Chef Distributed Systems DNS Docker GCP Java Kubernetes Linux OpenShift Prometheus Puppet Python TCP/IP

Australia

Job Closed

Engineer III – CICD DevOps

CrowdStrike

CrowdStrike has redefined security with the world’s most advanced cloud-native platform that protects and enables the people, processes and technologies that drive modern enterprise. Tested and proven, the world's largest organizations trust CrowdStrike to stop breaches with unparalleled protection against the most sophisticated cyberattacks. The CrowdStrike culture has been built upon our Core Values since the day we began. We are Fanatical About the Customer, Relentlessly Focused on Innovation and believe that our Limitless Passion drives Unlimited Potential for every CrowdStriker. As a purpose-built remote-first company, we believe cultivating a connected culture for every employee, no matter where they are in the world, is a key ingredient in building a high-performing, diverse team. We don’t have a mission statement. We’re on a mission—to stop breaches. Ready to join a mission that matters?

DevOps Engineer | 160 days ago

Other | Remote | Team 5,001-10,000 | Since 2011 | H1B Sponsor

• On-premise provisioning and administration
• Experience with Kubernetes (k8s), ArgoCD, FluxCD, and containers
• Jenkins with JCasC
• Artifact repository services such as JFrog Artifactory, Nexus, or Quay.io
• Atlassian stack (Jira, Confluence, Bitbucket)
• IaaS provisioning tools such as Ansible, Chef, Salt, Puppet, etc.
• Experience with common scripting languages: Python, REST APIs, Groovy
• Experience with Linux and Windows server administration in hybrid environments
• Knowledge of proper monitoring, maintenance, and disaster recovery of critical services
• Ability to document processes and procedures

Ansible Chef Groovy Jenkins Kubernetes Linux Puppet Python SaltStack

United States

$120K - $180K / year

Job Closed

Senior DevOps Engineer

ScaleUP Week

Four transformational days of best practices, impact and inspiration.

DevOps Engineer | 160 days ago

Full Time | Remote | Team 11-50 | Since 2024 | H1B No Sponsor

• Own and scale ZayZoon’s AWS infrastructure to ensure reliability, performance, and security.
• Design and automate infrastructure provisioning using CloudFormation, build CI/CD pipelines, and manage existing infrastructure stacks.
• Optimize our PostgreSQL databases for high availability and performance.
• Improve monitoring and observability wherever possible, ensuring issues are detected and resolved proactively.
• Design secure (SOC-2 and cybersecurity compliance), scalable cloud services in AWS.
• Evaluate and remediate Critical and High CVEs across all our services at a moment’s notice.
• Work cross-functionally with teams such as development, data, testing, and security to convey concepts, build understanding, and improve deployment processes.
• Apply DevOps best practices in everything you do: there are many ways to do things, and you bring the best of them to our environment.

Ansible AWS Chef Cloud Cyber Security Google Cloud Platform PostgreSQL Python Ruby SQL Terraform Go

Canada

Job Closed

Senior Site Reliability Engineer – Chaos Engineering

Articul8 AI

Solving the world's toughest problems with Generative AI.

DevOps Engineer | 160 days ago

Full Time | Remote | Team 11-50 | H1B No Sponsor

• Architect and maintain scalable, highly available infrastructure for our GenAI platform.
• Design and implement robust monitoring, alerting, and observability solutions to proactively ensure system health and performance.
• Automate deployment, scaling, and management of our cloud-native infrastructure, reducing toil and improving efficiency.
• Define, measure, and improve Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to deliver outstanding service quality.
• Participate in on-call rotations and provide rapid response to production incidents, minimizing downtime and user impact.
• Collaborate closely with development teams to build reliable, scalable, and efficient systems for complex AI workloads.
• Lead incident response efforts, conduct thorough post-mortems, and champion continuous improvement initiatives.
• Optimize infrastructure for performance, scalability, and cost-effectiveness, especially for high-demand AI workloads.
• Implement and enforce security best practices across all systems and environments.
• Create and maintain comprehensive documentation, including runbooks and knowledge base articles, to foster a culture of shared knowledge.

AWS Azure Distributed Systems Docker GCP Grafana Kubernetes NoSQL Prometheus Python SQL Terraform

Brazil
