Job Closed
This listing is no longer active.
Your Single Backup and Data Management Platform for Cloud, Virtual and Physical
Manager, Site Reliability Engineering
Location
Czechia
Posted
143 days ago
Salary
Not disclosed
Seniority
Lead
Job Description
Manager, Site Reliability Engineering
Veeam Software
- Hire, onboard, and grow your SRE team; coach career development and performance
- Foster a psychologically safe, blameless culture that favors learning over blame and emphasizes engineering over firefighting
- Ensure sustainable operational coverage; monitor on-call health and workload
- Track and cap toil so engineers spend the majority of their time on project work that reduces future toil
- Establish and operationalize SLIs/SLOs and error budgets with service owners; run reliability reviews and hold teams accountable for outcomes
- Define reliability standards, runbooks, readiness checklists, and alerting patterns (including SLO-based alerting)
- Partner with product/EMs to align reliability work with service goals and customer experience, acting as an enabler rather than a gate
- Ensure incident response readiness; lead/coordinate major incidents; drive fast, high-quality postmortems and systemic fixes
- Measure MTTR, change failure rate, SLO posture, and repeat-incident reduction; publish learnings broadly
- Lead software-first reliability investments: observability, deployment safety (canary/blue-green), resilience testing/chaos, and self-service guardrails
- Drive platform improvements (IaC, CI/CD, Kubernetes) and internal tools that scale operations and improve developer experience
Job Requirements
- 7+ years in Software, Platform, and/or Reliability Engineering with 2+ years managing engineers
- Demonstrable experience leading engineering teams to predictably deliver outcomes
- Experience leading cross-functional initiatives collaboratively with peers through influence
- Experience with public cloud (Azure preferred), Kubernetes, IaC (Terraform, Pulumi), CI/CD (GitHub Actions, Argo CD, Azure DevOps), and observability (OpenTelemetry, Elastic, Datadog, Prometheus, Grafana)
- Coding background with experience improving service reliability
- Hands-on incident management and postmortem practice; excellent cross-geo communication
- Willingness to participate in an on-call rotation (typically during daytime hours, including weekends/holidays)
Benefits
- 25 vacation days, 4 sick days, 21 paid medical leave days, plus 4 extra global VeeaMe Days for self-care and 24 paid volunteer hours annually through Veeam Cares
- Premium private medical insurance for employees and dependents
- Daily meal vouchers for restaurants and groceries (180 CZK per working day)
- Flexible cafeteria platform with thousands of lifestyle benefit options
- Multisport Card for gym and wellness, with family add-on options
- Annual public transport reimbursement up to a set limit
- Corporate mobile plan with optional family tariff
- Opportunities to learn and grow through on-demand libraries (LinkedIn Learning, O’Reilly), mentoring, workshops and learning events like our annual Global Day of Learning



