Cribl

Cribl, the Data Engine for IT and Security, empowers organizations to transform their data strategy.

Senior Site Reliability Engineer

DevOps Engineer · Full Time · Remote · Senior · Team 501-1,000 · Since 2017 · H1B Sponsor

Location

Poland

Posted

15 days ago

Salary

Not specified

Seniority

Senior

Job Description

  • Engage with teams to improve service delivery and reliability across the entire service lifecycle
  • Measure and monitor all production systems with an eye toward availability, latency, and overall system health
  • Find the causes of errors and instability in our production cloud services and drive teams toward better operational excellence
  • Work with product and platform teams to improve and evolve systems by advocating for changes that improve reliability, resilience, and observability
  • Help identify and drive down toil through creative innovation and automation
  • This position requires standby, on-call, or off-hours duties

Job Requirements

  • Proven experience designing, implementing, and operating observability systems for complex cloud-based platforms
  • Experience with configuration management and Infrastructure as Code tools such as Terraform (preferred) or Ansible
  • Knowledge of cloud platforms (AWS and Azure preferred)
  • Experience with APM and observability tools such as New Relic, Splunk, CloudWatch, Prometheus, Grafana/Kibana, Sentry, etc.
  • Extensive experience with enterprise scale continuous delivery environments
  • Development experience with JavaScript/Node.js/TypeScript in a Linux/Mac environment
  • Experience with sustainable incident response in a blameless environment
  • Background in Linux Systems Engineering
  • Experience with incident response tools such as PagerDuty, FireHydrant, Blameless, etc.
  • Comfortable with a high level of autonomy and working with a distributed team
  • Knowledge of Cloud and application security best practices
  • Strong knowledge of cloud design patterns for scale, data management, resiliency, etc.
  • A love for high quality and a knack for testing
  • Opinions about business metrics and SLOs

Benefits

  • Diversity drives innovation and better decisions
  • Remote-first culture
  • Welcoming and valuing differences

More DevOps Engineer Jobs

Full Time · Remote · Team 51-200 · Since 2009 · H1B No Sponsor

  • Own production PostgreSQL reliability: HA design, Patroni, PgBouncer, replication, failover, upgrades, vacuum/bloat control, query tuning, locks, indexes, capacity, backups, PITR, and restore validation.
  • Improve disaster recovery and operational evidence: tested restores, documented recovery paths, measurable RTO/RPO targets, runbooks, and safe maintenance plans.
  • Support the wider database estate: ClickHouse, MongoDB, and Redis. You will troubleshoot incidents, review access and data-safety changes, improve monitoring, and learn the production ClickHouse patterns already in use.
  • Automate DBA workflows with Ansible, Terraform/OpenTofu, GitLab CI/CD, scripts, and reproducible runbooks for provisioning, grants, backups, restores, health checks, and ownership metadata.
  • Help build DBaaS-style self-service capabilities so engineering teams can request databases, access, credentials, and operational checks with less manual DBA intervention.
  • Improve observability and incident response through Grafana, metrics, logs, SLOs, alert rules, Opsgenie routing, and clear communication during production issues.

Poland

Senior DevOps

Resilient Co.

WE ARE RESILIENT CO. We adapt to your needs.

DevOps Engineer · 15 days ago
Contract · Remote · Team 11-50 · Since 2020 · H1B No Sponsor

  • Design and implement infrastructure-as-code using Terraform for Azure services including AKS, Blob Storage and App Services.
  • Build, maintain and optimize CI/CD pipelines and mobile/web build pipelines.
  • Operate, troubleshoot and tune Kubernetes and Docker-based workloads running on AKS.
  • Implement and manage SSO and External ID flows using Microsoft Entra.
  • Create reusable templates, Terraform modules and pipeline templates to enable developer self-service.
  • Collaborate directly with technical leads to define platform direction and deployment patterns.
  • Mentor engineers on deployment best practices, observability and platform usage.
  • Own platform-level decisions and improvements, prioritizing strategic work over ticket-level execution.
  • Write clear, async-friendly documentation and communicate effectively in AI-augmented workflows.
  • Manage and support PostgreSQL-related deployment and operational concerns as they relate to platform infrastructure.

Argentina

Site Reliability Engineer

SupplyHouse.com

Plumbing, Heating & HVAC Supplies. Real People. Real Service.

DevOps Engineer · 15 days ago
Full Time · Remote · Team 501-1,000 · Since 2004 · H1B Sponsor

  • Design, build, and maintain scalable, reliable systems on GCP (Compute Engine, GKE, Cloud Storage, Cloud SQL)
  • Develop automation for infrastructure provisioning using Terraform, Ansible, or Deployment Manager
  • Build and maintain observability platforms (monitoring, logging, tracing) using tools such as Stackdriver (Cloud Monitoring), Prometheus, or Grafana
  • Manage incident response, conduct postmortems, and implement improvements to reduce recurrence
  • Partner with DevOps and engineering teams to enhance CI/CD pipelines for resilient deployments
  • Define and monitor SLAs, SLOs, and SLIs to ensure application availability and performance
  • Implement disaster recovery (DR) and backup strategies across cloud services
  • Continuously optimize performance, capacity, and cost-efficiency of GCP resources

India
$29K - $36K / year

DevOps Engineer, AWS, Terraform

HRM Group

Accelerating Digital Evolution

DevOps Engineer · 15 days ago
Full Time · Remote · Team 201-500 · H1B No Sponsor

  • Manage, automate and optimize cloud environments, with a particular focus on AWS.
  • Implement Infrastructure as Code, manage CI/CD pipelines, and support continuous delivery of applications.
  • Collaborate with development and operations teams to ensure system reliability, scalability and performance.
  • Contribute to platform evolution and process automation.

United States