ClickHouse, Inc. develops a database management system that allows users to generate analytical reports using real-time SQL queries.

Senior Site Reliability Engineer

DevOps Engineer, Other, Remote, Senior

Location

United States

Posted

91 days ago

Salary

$141K - $208K / year

Seniority

Senior

Bachelor Degree, 8 yrs exp, English

Ansible AWS Azure Docker GCP Kubernetes Puppet Python SQL Terraform

Job Description

• Collaborate with various engineering teams in ClickHouse to design and implement scalable, secure, and highly available systems for ClickHouse.
• Establish and manage service level objectives (SLOs) and service level agreements (SLAs) for ClickHouse Cloud.
• Ensure all the infrastructure components in ClickHouse Cloud (including Dataplane, Control Plane and ClickHouse Core) have monitoring and alerting in place to ensure timely detection and resolution of incidents.
• Enhance and refine incident response processes and post-mortem analysis for any outages in ClickHouse Cloud, including working with the support team to communicate to the impacted customers.
• Continuously improve the reliability and performance of our ClickHouse services.
• Plan, enable, and drive Chaos initiatives across Engineering teams, based upon internal priorities.
• Manage on-call processes to respond to performance and reliability issues, and establish best practices for coordinating escalation to resolve issues and minimize downtime.

Job Requirements

Bachelor’s or Master’s degree in Computer Science or a related field.
At least 8 years of experience in Site Reliability Engineering or a related field.
Previous experience using ClickHouse in production.
Hands-on experience with Go and/or Python.
Strong knowledge of cloud computing platforms such as AWS, Azure, or Google Cloud Platform.
Excellent understanding of distributed databases and SQL; experience with ClickHouse in particular is a major plus.
Hands-on experience with container orchestration tools such as Kubernetes or Docker Swarm.
Strong experience with automation and configuration management tools such as Ansible, Terraform, or Puppet.
You are a strong problem solver and have solid production debugging skills.
You are passionate about efficiency, availability, scalability, and data governance.
You thrive in a fast-paced environment and see yourself as a partner to the business, with the shared goal of moving it forward.
You have a high level of responsibility, ownership, and accountability.
Excellent communication and interpersonal skills.

Benefits

Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries.
Healthcare - Employer contributions towards your healthcare.
Equity in the company - Every new team member who joins our company receives stock options.
Time off - Flexible time off in the US, generous entitlement in other countries.
Home office setup - $500 if you’re a remote employee.
Global Gatherings - We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites.

More DevOps Engineer Jobs

IT Operations Engineer I

Aledade

Self-described as "a new company with an old-fashioned goal," Aledade aims to put healthcare control back into the hands of doctors. The company is headquartered in Bethesda, Maryland.

DevOps Engineer, 92 days ago

Other Remote

This description is a summary of our understanding of the job description.

Role Description

As an IT Operations Engineer I, you are a vital contributor to the health, stability, and efficiency of our production environments. Sitting at the intersection of traditional systems administration and modern DevOps, you are responsible for deploying standard infrastructure components and ensuring that our systems remain reliable and secure. While this role focuses on the execution of foundational IT operations, you will work closely with Senior Engineers to automate manual processes and uphold rigorous compliance standards. You will be expected to understand how server and cloud uptime impacts the broader business, ensuring that every task, from server patching to incident resolution, is performed with accuracy, documentation, and a culture of continuous improvement in mind.

Primary Duties

- Hybrid Infrastructure & Identity Support: Deploy standard infrastructure components; assist in cloud computing architectures and identity migrations (e.g., AD to Microsoft Entra).
- Automation & Modernization: Execute infrastructure tasks using scripting (PowerShell, Python); assist in managing VDI and computing infrastructure in Azure.
- System Reliability & Incident Management: Resolve alerts/tickets in a timely fashion; participate in the On-Call rotation and support root-cause analysis (RCA) activities.
- Security, Compliance & Audit: Maintain firewalls, automated patching, and security monitoring to ensure audit-readiness (ITGC, SOX, SOC II Type II).
- Documentation & Standardization: Contribute to the team Wiki/SOP library; accurately estimate time for server configs and notify leads of potential risks.

Qualifications

- Education: Bachelor’s degree in Information Technology, Computer Science, or a related field.
- Experience: 6+ years of experience in IT operations or similar roles, with demonstrated expertise in system administration and cloud network management.
- Technical Skill: Strong analytical and problem-solving skills, with a focus on system efficiency and user satisfaction.

Requirements

- Proficiency in managing IT infrastructure, including security, networking, and systems administration.
- Familiarity with IT compliance frameworks (ITGC, SOX, SOC II Type II, NIST) and security protocols.
- Strong communication skills for effective collaboration across departments.
- Experience identifying infrastructure gaps and contributing to complex project solutions.
- Experience with Mobile Device Management tools.

Physical Requirements

- Environment: Prolonged periods of sitting; extensive use of computers and keyboards.
- Physicality: Occasional walking and lifting may be required.
- Availability: Must be available for on-call duties as necessary to maintain system uptime.

Benefits

- Flexible work schedules and the ability to work remotely are available for many roles.
- Health, dental and vision insurance paid up to 80% for employees, dependents and domestic partners.
- Robust time-off plan (21 days of PTO in your first year).
- Two paid volunteer days and 11 paid holidays.
- 12 weeks paid parental leave for all new parents.
- Six weeks paid sabbatical after six years of service.
- Educational Assistant Program and Clinical Employee Reimbursement Program.
- 401(k) with up to 4% match.
- Stock options.
- And much more!

United States

Job Closed

Senior DevOps Engineer

Javelo

DevOps Engineer, 92 days ago

Full Time, Remote, Team 11-50, Since 2015, H1B No Sponsor

• Own and drive infrastructure projects end-to-end, from breaking down the problem into subtasks, through implementation, to communicating results to stakeholders.
• We don't just "do tasks"; we solve problems and explain how and why.
• Evolve Kubernetes (with Argo) and cloud infrastructure, mostly GCP.
• Take part in cloud infrastructure unification.
• Develop and maintain Terraform configurations for scalable, reliable systems.
• Build and optimize CI/CD pipelines using GitHub Actions.
• Strengthen observability with OpenTelemetry and Datadog.
• Integrate and act on insights from AIkido and other security tools to detect and mitigate issues within workloads.
• Support and tune PostgreSQL and other managed databases used by our applications.
• Collaborate with engineering teams: proactively communicate progress, share context, and manage expectations.
• Troubleshoot and resolve production issues as part of our on-call rotation.
• Participate in internal and external security audits, ensuring our systems meet compliance and resilience standards.
• Drive SRE and GitOps principles, from post-mortems to automation and clear documentation.

AWS Distributed Systems Elixir GCP JavaScript Kubernetes Linux Node.js PostgreSQL Prometheus Terraform

Poland

zł22K - zł30K / month

Job Closed

DevOps Engineer

Javelo

DevOps Engineer, 92 days ago

Full Time, Remote, Team 11-50, Since 2015, H1B No Sponsor

• Work on Kubernetes (with Argo) and cloud infrastructure, mostly GCP
• Contribute to cloud infrastructure unification
• Write and maintain Terraform configurations
• Build and improve CI/CD pipelines using GitHub Actions
• Help strengthen observability with OpenTelemetry and Datadog
• Learn to integrate and act on insights from AIkido and other security tools
• Support PostgreSQL and other managed databases used by our applications
• Collaborate with engineering teams
• Participate in troubleshooting production issues
• Contribute to internal and external security audits

AWS Elixir GCP Grafana JavaScript Kubernetes Linux Node.js PostgreSQL Prometheus Python Terraform

Poland

PLN16K - PLN22K / month

Job Closed

Mid DevOps Engineer

INDG | Grip

Every Product Playable, Beauty at Scale

DevOps Engineer, 92 days ago

Full Time, Remote, Team 201-500, H1B No Sponsor

• Own and Evolve CI/CD Infrastructure
• Manage Multi-Cloud Infrastructure as Code
• Strengthen Kubernetes & Container Operations
• Elevate Monitoring, Observability & Incident Response
• Embed DevSecOps and Operational Discipline

AWS Azure Docker Grafana Kubernetes Prometheus Python

Netherlands

