Job Closed

This listing is no longer active.

CloudLinux

Senior Database Reliability Engineer

DevOps Engineer, Full Time, Remote, Senior, Team 51-200, Since 2009, H1B No Sponsor

Location

Worldwide

Posted

57 days ago

Seniority

Senior

Job Description

Role Description

We are hiring a Senior Database Reliability Engineer to join the Infrastructure DBA cell. This is a hands-on production ownership role, not a narrow ticket-processing DBA position. You will keep critical database services reliable, automate repeated work, support engineering teams, and reduce single-person dependency in our PostgreSQL, ClickHouse, MongoDB, and Redis operations.

PostgreSQL is the main requirement. ClickHouse experience is a strong plus, but it is not a day-one blocker. We need a senior engineer with enough database, Linux, automation, and incident-response depth to learn our ClickHouse environment quickly and operate it safely.

Your Responsibilities

- Own production PostgreSQL reliability: HA design, Patroni, PgBouncer, replication, failover, upgrades, vacuum/bloat control, query tuning, locks, indexes, capacity, backups, PITR, and restore validation.
- Improve disaster recovery and operational evidence: tested restores, documented recovery paths, measurable RTO/RPO targets, runbooks, and safe maintenance plans.
- Support the wider database estate: ClickHouse, MongoDB, and Redis. Troubleshoot incidents, review access and data-safety changes, improve monitoring, and learn the production ClickHouse patterns already in use.
- Automate DBA workflows with Ansible, Terraform/OpenTofu, GitLab CI/CD, scripts, and reproducible runbooks for provisioning, grants, backups, restores, health checks, and ownership metadata.
- Help build DBaaS-style self-service capabilities so engineering teams can request databases, access, credentials, and operational checks with less manual DBA intervention.
- Improve observability and incident response through Grafana, metrics, logs, SLOs, alert rules, Opsgenie routing, and clear communication during production issues.

What Success Looks Like

- PostgreSQL clusters have tested backup and restore paths, useful dashboards, clear ownership, and documented failover procedures.
- Repeated DBA tickets become automation or self-service workflows.
- ClickHouse operational knowledge is no longer a single-person dependency.
- Database incidents have owners, runbooks, evidence, and measurable recovery paths.
- Product and engineering teams get database help faster without sacrificing safety, auditability, or reliability.

What We Expect From You

- Deep hands-on PostgreSQL experience in business-critical production environments, typically 5+ years or equivalent depth.
- Strong understanding of PostgreSQL internals and operations: MVCC, WAL, transactions, locks, indexes, query planning, replication, autovacuum, bloat, major upgrades, backups, PITR, and restore testing.
- Proven experience with highly available databases and the ability to reason about quorum, split-brain risk, failover, rollback, and recovery.
- Strong Linux and infrastructure fundamentals: systemd, networking, storage, filesystems, CPU/memory/disk bottlenecks, TLS, DNS, firewalls, and root-cause troubleshooting.
- Automation skills with Ansible and scripting. Terraform/OpenTofu, GitLab CI/CD, and merge-request based delivery are strong advantages.
- Ability to support more than one database engine, and readiness to learn ClickHouse quickly and take responsibility for it.
- Practical use of AI engineering assistants such as Claude and Codex to improve speed and quality, while personally verifying generated SQL, commands, scripts, and operational conclusions.
- Clear written English for asynchronous work in Jira, Slack, GitLab, Slite, and runbooks.

Nice to Have

- ClickHouse operations: replication, Keeper/ZooKeeper, MergeTree engines, distributed DDL, grants, row policies, backups, query troubleshooting, and cluster recovery.
- MongoDB replica sets and Percona Backup for MongoDB.
- Redis/Sentinel and broker/cache failure modes.
- Database observability, SLOs, golden signals, alert tuning, and executable incident runbooks.
- Building internal platforms, self-service portals, or DBaaS workflows for engineering teams.

Benefits

- A focus on professional development.
- Interesting and challenging projects.
- Fully remote work with flexible working hours, allowing you to schedule your day and work from any location worldwide.
- 24 days of paid vacation per year, 10 days of national holidays, and unlimited sick leave.
- Compensation for private medical insurance.
- Co-working and gym/sports reimbursement.
- Budget for education.
- The opportunity to receive a reward for the most innovative idea that the company can patent.

More DevOps Engineer Jobs

Senior Site Reliability Engineer

GoReel

GoReel is an iGaming tech provider and game developer.

DevOps Engineer, 57 days ago

Full Time, Remote, Team 51-200, Since 2015, H1B No Sponsor

• We are looking for an experienced and motivated Senior Site Reliability Engineer (SRE) to join our team.
• In this role, you will be responsible for the reliability, scalability, performance, and stability of our systems and applications.
• You will work closely with cross-functional teams to automate processes, improve infrastructure, and support continuous product delivery.

AWS, Cloud, Docker, EC2, ElasticSearch, Grafana, Jenkins, Kubernetes, Prometheus

Poland

Job Closed

Mid-Level DevOps Engineer

Traffic Label Limited

DevOps Engineer, 57 days ago

Full Time, Remote, Team 11-50, Since 2006, H1B No Sponsor

• Design, implement, and maintain scalable Kubernetes infrastructure on GKE/EKS
• Develop and manage Infrastructure as Code using Terraform, Helm, and Ansible
• Build and improve CI/CD pipelines for fast and reliable deployments
• Implement and maintain monitoring, logging, and alerting solutions
• Support PostgreSQL and Kafka environments
• Automate operational tasks using Python and Bash scripting
• Troubleshoot production issues across cloud and Kubernetes environments
• Collaborate with developers to improve deployment and operational processes
• Participate in on-call rotation and production support

Ansible, AWS, Cloud, Docker, Google Cloud Platform, Kafka, Kubernetes, PostgreSQL, Prometheus, Python, Terraform

Europe

Job Closed

Senior DevOps Engineer – Aviation, Mission-Critical Systems

Driving automotive innovation through talent

DevOps Engineer, 57 days ago

Full Time, Remote, Team 11-50, Since 2017, H1B Sponsor

• Design, implement, and operate Kubernetes-based infrastructure for production environments
• Build and maintain CI/CD pipelines using Git-based workflows and modern automation tools
• Develop automation and internal tooling using Python and Go
• Manage artifact repositories and dependency workflows (e.g., Artifactory or similar solutions)
• Support and optimize SQL and/or NoSQL databases in production environments
• Implement monitoring, logging, and full-stack observability solutions
• Ensure high availability, scalability, and resilience of distributed systems
• Collaborate with engineering teams to improve deployment processes and developer experience
• Participate in incident response, root cause analysis (RCA), and continuous improvement initiatives
• Contribute to platform architecture decisions and DevOps best practices
• Enforce security, access control, and compliance standards across environments

Cloud, Distributed Systems, Kubernetes, Linux, NoSQL, Python, SQL, Go

United States

Job Closed

Agile Technical Delivery Manager – DevOps

LegalMatch

Attorneys: Get the Legal Clients You Need. Call 866.953.4259 to View Cases.

DevOps Engineer, 57 days ago

Full Time, Remote, Team 51-200, Since 1999, H1B No Sponsor

• Develop and execute a scalable, secure, and efficient DevOps strategy that supports business continuity.
• Manage and prioritize the DevOps backlog to balance business value, operational needs, and technical feasibility.
• Ensure the reliability, security, performance, and high availability of systems and applications.
• Lead the design and continuous improvement of CI/CD pipelines, infrastructure automation, and monitoring solutions.
• Integrate Agile and DevOps practices to improve delivery speed, collaboration, and operational efficiency.
• Lead, mentor, and develop the DevOps team while fostering a proactive, accountable, and improvement-driven culture.
• Set team and individual goals, monitor KPIs, conduct performance reviews, and address skill gaps through training.
• Facilitate Scrum ceremonies and remove blockers to help the team achieve sprint goals effectively.
• Act as the main coordination point between technical teams, leadership, and stakeholders by communicating priorities, risks, and progress updates.
• Drive continuous improvement initiatives by identifying operational gaps, enforcing DevOps best practices, and leveraging AI tools to optimize workflows and risk management.

Philippines

Job Closed
