Senior Database Reliability Engineer – Worldwide Remote
Location
Poland
Posted
10 days ago
Salary
0
Seniority
Senior
Job Description
CloudLinux
- Own production PostgreSQL reliability: HA design, Patroni, PgBouncer, replication, failover, upgrades, vacuum/bloat control, query tuning, locks, indexes, capacity, backups, PITR, and restore validation.
- Improve disaster recovery and operational evidence: tested restores, documented recovery paths, measurable RTO/RPO targets, runbooks, and safe maintenance plans.
- Support the wider database estate: ClickHouse, MongoDB, and Redis. You will troubleshoot incidents, review access and data-safety changes, improve monitoring, and learn the production ClickHouse patterns already in use.
- Automate DBA workflows with Ansible, Terraform/OpenTofu, GitLab CI/CD, scripts, and reproducible runbooks for provisioning, grants, backups, restores, health checks, and ownership metadata.
- Help build DBaaS-style self-service capabilities so engineering teams can request databases, access, credentials, and operational checks with less manual DBA intervention.
- Improve observability and incident response through Grafana, metrics, logs, SLOs, alert rules, Opsgenie routing, and clear communication during production issues.
Job Requirements
- Deep hands-on PostgreSQL experience in business-critical production environments, typically 5+ years or equivalent depth.
- Strong understanding of PostgreSQL internals and operations: MVCC, WAL, transactions, locks, indexes, query planning, replication, autovacuum, bloat, major upgrades, backups, PITR, and restore testing.
- Proven experience with highly available databases and the ability to reason about quorum, split-brain risk, failover, rollback, and recovery.
- Strong Linux and infrastructure fundamentals: systemd, networking, storage, filesystems, CPU/memory/disk bottlenecks, TLS, DNS, firewalls, and root-cause troubleshooting.
- Automation skills with Ansible and scripting. Terraform/OpenTofu, GitLab CI/CD, and merge-request based delivery are strong advantages.
- Ability to support more than one database engine. You do not need to be a ClickHouse expert on day one, but you must be ready to learn it quickly and take responsibility for it.
- Practical use of AI engineering assistants such as Claude and Codex. We expect you to use them to improve speed and quality, while personally verifying generated SQL, commands, scripts, and operational conclusions.
- Clear written English for asynchronous work in Jira, Slack, GitLab, Slite, and runbooks.
Benefits
- A focus on professional development.
- Interesting and challenging projects.
- Fully remote work with flexible working hours, which allows you to schedule your day and work from any location worldwide.
- 24 days of paid vacation per year, 10 days of national holidays, and unlimited sick leave.
- Compensation for private medical insurance.
- Co-working and gym/sports reimbursement.
- Budget for education.
- The opportunity to receive a reward for the most innovative idea that the company can patent.




