ClickHouse

ClickHouse, Inc. is a database management system that allows users to generate analytical reports using real-time SQL queries. The company’s technology works faster than traditio

Database Reliability Engineer – Core Team

Location

United Kingdom

Posted

54 days ago

Salary

0

Seniority

Senior

Bachelor Degree · 5 yrs exp · English · AWS · Azure · Cloud · Google Cloud Platform · Python · SQL

Job Description

  • Continuously improve the reliability and performance of ClickHouse core.
  • Improve and create metrics and alerts for ClickHouse to be able to identify and prevent problems in production before they affect customers.
  • Dig deeper into the most common problems encountered by customers in ClickHouse Core to identify the root cause of problems and submit bug fixes, issue reports and suggest improvements.
  • Enhance and refine incident response processes and post-mortem analysis for ClickHouse core related outages including working with support and Cloud teams to communicate to the impacted customers.
  • Plan, enable, and drive Chaos initiatives across Engineering teams, based upon internal priorities.
  • Manage on-call processes to respond to performance and reliability issues, and establish best practices for coordinating escalation to resolve issues and minimize customer impact.

Job Requirements

  • Bachelor’s or Master’s degree in Computer Science or a related field.
  • At least 5 years of experience in Reliability Engineering, QA or customer facing engineering.
  • Previous experience operating ClickHouse or other SQL databases in production.
  • Excellent understanding of distributed database internals and SQL; ClickHouse expertise in particular is a major plus.
  • Scripting experience with Shell or Python, and ability to read and understand C++ code.
  • Knowledge of cloud computing platforms such as AWS, Azure, or Google Cloud Platform.
  • You are a strong problem-solver and have solid production debugging skills.
  • You thrive in a fast-paced environment as part of a global team, and you see yourself as a partner with the business with the shared goal of moving the business forward.
  • You have a high level of responsibility, ownership, and accountability.
  • Excellent communication skills.

Benefits

  • Flexible work environment - ClickHouse is a globally distributed company and remote-friendly. We currently operate in 20 countries.
  • Healthcare - Employer contributions towards your healthcare.
  • Equity in the company - Every new team member who joins our company receives stock options.
  • Time off - Flexible time off in the US, generous entitlement in other countries.
  • Home office setup - $500 if you’re a remote employee.
  • Global Gatherings – We believe in the power of in-person connection and offer opportunities to engage with colleagues at company-wide offsites.

More DevOps Engineer Jobs

Staff DevOps Engineer

Scratch Financial

Scratch Financial is the world's simplest patient financing solution.

DevOps Engineer · 54 days ago
Full Time · Remote · Team 11-50 · Since 1912 · H1B Sponsor

Company Description

NBCUniversal is one of the world's leading media and entertainment companies. We create world-class content, which we distribute across our portfolio of film, television, and streaming, and bring to life through our global theme park destinations, consumer products, and experiences. We own and operate leading entertainment and news brands, including NBC, NBC News, NBC Sports, Telemundo, NBC Local Stations, Bravo, and Peacock, our premium ad-supported streaming service. We produce and distribute premier filmed entertainment and programming through our powerhouse film and television studios, including Universal Pictures, DreamWorks Animation, and Focus Features, and the four global television studios under the Universal Studio Group banner, and operate industry-leading theme parks and experiences around the world through Universal Destinations & Experiences, including Universal Orlando Resort, home to Universal Epic Universe, and Universal Studios Hollywood. NBCUniversal is a subsidiary of Comcast Corporation. Visit www.nbcuniversal.com for more information.

Our impact is rooted in improving the communities where our employees, customers, and audiences live and work. We have a rich tradition of giving back and ensuring our employees have the opportunity to serve their communities. We champion an inclusive culture and strive to attract and develop a talented workforce to create and deliver a wide range of content reflecting our world.

Job Description

As the DevOps Lead Engineer, you will be responsible for spearheading our DevOps initiatives. You will foster a culture of automation, continuous integration, observability and delivery. Your efforts will support consumer data driven advertising and marketing products, standardized consumer identity solutions, and machine learning initiatives for NBCUniversal and its brands.

You will collaborate with cross-functional teams to optimize our cloud infrastructure, ensuring high availability, scalability, and security. Your expertise in AWS services, containerization technologies, monitoring tools, and cloud architecture will be pivotal in designing and implementing robust DevOps solutions that streamline our development, testing, and deployment processes.

Responsibilities:

  • Develop and lead the implementation of DevOps strategies and best practices to improve the efficiency, reliability, and scalability of our cloud-based applications.
  • Design, build, and maintain robust continuous integration and continuous delivery pipelines to automate the software development and deployment lifecycle.
  • Utilize your in-depth knowledge of AWS services to architect, deploy, and manage scalable and resilient cloud infrastructure solutions.
  • Implement containerization technologies (e.g., Docker, Kubernetes) to orchestrate application deployment and ensure consistent environments across various stages of development.
  • Implement effective monitoring and logging solutions to proactively identify performance bottlenecks, security issues, and system anomalies. Develop auto-scaling solutions to meet fluctuating demand.
  • Design and optimize cloud architecture to ensure high availability, disaster recovery, and cost-effectiveness.
  • Implement security measures and best practices to safeguard our cloud infrastructure and applications against potential threats and vulnerabilities.
  • Lead and mentor a team of DevOps engineers, fostering a collaborative and innovative work environment.
  • Promote automation in all aspects of DevOps and maintain detailed documentation of infrastructure, processes, and procedures.

Qualifications

  • Bachelor's degree in Computer Science, Software Engineering, or a related field.
  • Proven experience of 6+ years in DevOps and cloud engineering, with at least 2 years in a leadership or senior role.
  • Expertise in building and managing CI/CD pipelines using tools like Jenkins, GitLab CI/CD, or AWS CodePipeline.
  • Strong proficiency in AWS services, including EC2, S3, RDS, Lambda, IAM, and VPC.
  • Solid understanding of containerization technologies (e.g., Docker, Kubernetes) and container orchestration.
  • Experience with infrastructure-as-code tools (e.g., CloudFormation, Terraform).
  • Familiarity with monitoring and logging tools such as Prometheus, Grafana, ELK stack, Splunk, Datadog and CloudWatch.
  • Knowledge of cloud security best practices and compliance standards (e.g., CIS benchmarks, CCPA, GDPR).
  • Strong problem-solving skills and the ability to troubleshoot complex issues in a cloud environment.
  • Excellent communication and leadership skills to effectively collaborate with cross-functional teams.

Additional Requirements:

  • Fully Remote: This position has been designated as fully remote, meaning that the position is expected to contribute from a non-NBCUniversal worksite, most commonly an employee's residence.

This position is eligible for company sponsored benefits, including medical, dental and vision insurance, 401(k), paid leave, tuition reimbursement, and a variety of other discounts and perks. Learn more about the benefits offered by NBCUniversal by visiting the Benefits page of the Careers website.

Salary range: $130,000 - $160,000 (bonus eligible)

We are accepting applications for this position on an ongoing basis.

Additional Information

As part of our selection process, external candidates may be required to attend an in-person interview with an NBCUniversal employee at one of our locations prior to a hiring decision.

NBCUniversal's policy is to provide equal employment opportunities to all applicants and employees without regard to race, color, religion, creed, gender, gender identity or expression, age, national origin or ancestry, citizenship, disability, sexual orientation, marital status, pregnancy, veteran status, membership in the uniformed services, genetic information, or any other basis protected by applicable law.

If you are a qualified individual with a disability or a disabled veteran, you have the right to request a reasonable accommodation if you are unable or limited in your ability to use or access nbcunicareers.com as a result of your disability. You can request reasonable accommodations by emailing AccessibilitySupport@nbcuni.com.

For LA County and City Residents Only: NBCUniversal will consider for employment qualified applicants with criminal histories, or arrest or conviction records, in a manner consistent with relevant legal requirements, including the City of Los Angeles' Fair Chance Initiative For Hiring Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act, where applicable.

California
$130K - $160K / year
Job Closed

Manager Site Reliability Engineering

Renishaw

LexisNexis® Risk Solutions provides customers with solutions and decision tools that combine public and industry specific content with advanced technology and analytics to assist them in evaluating and predicting risk and enhancing operational efficiency. We use the power of data and advanced analytics to help our customers make better, timelier decisions. By bringing clarity to information, we ultimately help make communities safer, insurance rates more accurate, commerce more transparent, business decisions easier and processes more efficient. You can learn more about LexisNexis Risk at the link below: LexisNexis Risk Solutions

DevOps Engineer · 54 days ago
Full Time · Remote · Team 5,001-10,000

We’re looking for a Manager, DevOps & Site Reliability Engineering who blends people leadership with credible, hands‑on engineering. You’ll lead a global team of SREs and DevOps engineers, from junior to principal, across the US, Australia, and Europe, while staying close to the code, tooling, and AWS infrastructure. About 60% of your time is leadership: growing people, steering priorities, enforcing accountability, and partnering with the tech leaders and architects who drive deep technical direction. About 40% is hands‑on: guiding IaC patterns, improving CI/CD and observability, participating in complex incident response and DR exercises, and contributing to high‑impact automation and reliability improvements. You will ensure our infrastructure, operations, and practices meet stringent KPIs in security, resiliency, cost efficiency, and operational maturity.

This position is remote, with occasional in-office presence in Sydney for team events and collaboration sessions.

Responsibilities

People Leadership & Team Growth (≈60%)

  • Lead, mentor, and coach a distributed DevOps/SRE team of ~9 engineers; drive career progression, performance evaluations, morale, and hiring.
  • Remove roadblocks, clarify priorities, and maintain a predictable operational cadence across incidents, maintenance, and change windows.
  • Reinforce RELX Leadership Excellence (RLE) “Accelerating Leader” expectations and foster a blameless, learning‑oriented reliability culture.
  • Partner closely with technical leads, architects, and project management to drive consistent execution across teams.

Hands‑on Expectations (≈40%)

  • Contribute to infrastructure‑as‑code (CDK modules), CI/CD pipeline improvements, and automation.
  • Participate in complex incident response: triage, mitigation, recovery, and high‑quality postmortems with actionable, long‑term fixes.
  • Improve metrics, logging, tracing, dashboards, and meaningful, actionable alerts aligned to SLOs.
  • Contribute to disaster recovery planning and testing, resilience patterns, and readiness reviews for major changes.
  • Champion cloud security (zero‑touch prod, vulnerability & secrets management, audit/compliance workflows) leveraging Wiz, Snyk, and other tooling; help land fixes.
  • Drive FinOps improvements: right‑sizing, auto‑shutdown, cost forecasting/variance explanations, and cross‑team cost hygiene.

Cloud Infrastructure Ownership (AWS)

  • Oversee the AWS foundations and guardrails that power IDVerse; ensure they are secure, scalable, compliant, and well‑architected.
  • Maintain environment hygiene, patching baselines, dependency updates, and EOL remediation with clear exception handling.

Incident, Risk, and Change Management

  • Set the bar for incident management in partnership and fast coordination across stakeholders.
  • Ensure timely RCAs with durable long‑term actions and transparent tracking; coach engineers on risk reduction and reliability trade‑offs.

Cross‑Functional Collaboration

  • Work with Engineering, Security, Product, Compliance, Finance/Cloud Business Office, SMC, and broader RELX partners to align on KPIs, risks, capacity, and costs.
  • Communicate proactively and translate complex operational topics into clear, outcome‑oriented language for diverse stakeholders.

Preferred Tech Stack

We don’t expect you to know everything, but we value curiosity, persistence, and deep problem‑solving over surface‑level fixes.

AWS, CloudFormation, DynamoDB, Documenting, CDK, RDS, GraphQL, Bash, Typescript, Lambda, Rust, EC2, Git, Automation, API Gateway, Unix/Linux, Packer, REST APIs, SSO.

Requirements

  • Proven success managing DevOps/SRE/Cloud Infrastructure teams in fast‑moving environments; comfortable with ~40% hands‑on contribution.
  • Experience in security‑sensitive and compliance‑aware operations: vulnerability/secrets management, patching baselines, audit readiness.
  • Demonstrated FinOps mindset with practical cost optimization and forecasting experience.
  • Excellent leadership, communication, and stakeholder management skills across geographies and time zones; embodies RLE behaviors for Accelerating Leaders.
  • Balanced judgment: able to validate and contribute to technical direction (with a Principal Engineer and Architect leading most decisions) while driving execution.

We know your well-being and happiness are key to a long and successful career. We are delighted to offer country specific benefits. Click here to access benefits specific to your location.

We are committed to providing a fair and accessible hiring process. If you have a disability or other need that requires accommodation or adjustment, please let us know by completing our Applicant Request Support Form or please contact 1-855-833-5120.

Criminals may pose as recruiters asking for money or personal information. We never request money or banking details from job applicants. Learn more about spotting and avoiding scams here. Please read our Candidate Privacy Policy.

We are an equal opportunity employer: qualified applicants are considered for and treated during employment without regard to race, color, creed, religion, sex, national origin, citizenship status, disability status, protected veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other characteristic protected by law. USA Job Seekers: EEO Know Your Rights.

Australia
Job Closed

DevOps Engineer – International Remote Hire

MEMX

MEMX is an exchange operator and market technology platform dedicated to delivering transparent, efficient, and cost-effective securities trading services designed to revolutionize

DevOps Engineer · 54 days ago

  • Collaborate with development and operations teams to build and maintain infrastructure which supports MEMX operated platforms
  • Monitor scalable solutions which run infrastructure, both on-premises and cloud
  • Help debug build system and continuously improve build performance through metrics and analysis
  • Monitor systems capacity and performance to allow for scaling of high performance as necessary in addition to performing root cause analysis for incidents
  • Work with information security team to mitigate software and hardware vulnerabilities in the environment
  • Performs other duties as required and any other duties as assigned

Georgia

DevOps Engineer

VELZI.AI LIMITED

Revolutionizing Accounting; AI Powered Business Solutions

DevOps Engineer · 54 days ago
Full Time · Remote · Team 1-10 · H1B No Sponsor

Role Description

As a DevOps Engineer at VELZI AI, you’ll build, maintain, and scale a secure, cloud-native infrastructure across GCP, Azure, and AWS. You’ll work closely with our engineering team to ensure seamless deployment pipelines, reliable environments, and high-performance systems that support our AI and fintech applications. You’ll be the backbone of our infrastructure enabling fast iteration, smooth deployments, and resilient architectures.

Qualifications

  • Proven experience as a DevOps Engineer, Cloud Engineer, or Site Reliability Engineer
  • Strong experience with GCP, Azure, and/or AWS (multi-cloud experience is a huge plus)
  • In-depth knowledge of Kubernetes, Docker, Cloud Run, and container-based application delivery
  • Experience with CI/CD tooling and automation frameworks
  • Solid understanding of networking, firewalls, DNS, and cloud security models
  • Experience with VM deployment, Cloud Run (or similar), autoscaling, and infra optimization
  • Strong understanding of cloud storage solutions (Blob, S3, Cloud Storage)
  • Experience with Authentication, IAM, or identity systems is a bonus
  • Understanding of API architecture, microservices, and distributed systems
  • Experience with Terraform, Pulumi, or other infrastructure-as-code tools
  • Bonus: Experience with observability stacks, FinOps cost optimization, or AI infrastructure
  • You’re a problem-solver who thrives in fast-moving environments
  • You’re comfortable owning systems end-to-end and improving processes proactively
  • You're collaborative, communicative, and enjoy cross-functional teamwork

Requirements

  • Design, deploy, and maintain cloud infrastructure across GCP, Azure, and AWS
  • Manage and optimize compute environments including Virtual Machines, Cloud Run, and container orchestration platforms
  • Architect scalable, multi-region solutions with high availability, redundancy, and strong security practices
  • Manage and monitor cloud storage solutions such as GCP Cloud Storage, Azure Blob Storage, and AWS S3
  • Build, deploy, and manage containers using Docker
  • Deploy and manage services on Kubernetes (GKE / AKS / EKS)
  • Implement auto-scaling strategies, load balancing, and autoschedulers
  • Optimize resource utilization and cost efficiency across environments
  • Build and maintain CI/CD pipelines (GitHub Actions, GitLab, or similar)
  • Automate deployments, updates, rollbacks, and environment provisioning
  • Create infrastructure-as-code using tools like Terraform or Pulumi
  • Push our platform toward full automation and self-healing systems
  • Configure VPCs, VPNs, firewalls, DNS, subnets, and secure routing
  • Ensure secure API communication between our Next.js app and backend services
  • Implement monitoring, logging, and alerting systems (Prometheus, Grafana, Cloud Logging, Azure Monitor, CloudWatch, etc.)
  • Ensure compliance, data protection, and identity management across clouds
  • Work with Firebase Authentication, secure API gateways, and IAM policies
  • Collaborate with engineers to support API creation and deployments
  • Optimize performance of server environments running in Cloud Run or VMs
  • Implement failover systems, disaster recovery, and backup strategies
  • Ensure reliable operation of services including SES / SendGrid, databases, and microservices
  • Work closely with backend, AI, and product teams to support rapid iteration
  • Establish DevOps best practices, guidelines, and internal documentation
  • Conduct root-cause analysis, implement fixes, and prevent future incidents
  • Help shape the engineering culture by improving processes, scalability, and system observability

Benefits

  • Remote-first & flexible – Work from wherever you do your best work
  • Cutting-edge tech – Work with AI, automation, and the latest dev tools
  • Rapid growth – Be part of an early-stage rocket ship
  • Collaborative culture – Join a team that values innovation, autonomy, and teamwork
  • Competitive salary & equity – Get well-compensated with generous stock options
  • Career advancement – As we scale, so do your opportunities

Worldwide