Mastering Cloud. Accelerating Business.
Senior Site Reliability Engineer – Kubernetes Platform
Location: Germany
Posted: 61 days ago
Salary: Not specified
Seniority: Senior
Job Description
SysEleven GmbH
- Design and implement observability solutions using Prometheus, Loki and Mimir
- Analyze, troubleshoot and further develop proprietary Kubernetes controllers
- Develop and maintain production applications
- Operate, automate and continuously evolve the MKA platform
- Enhance internal tooling solutions
Job Requirements
- Experience operating highly available, business-critical applications in cloud and on-premises environments
- Strong Kubernetes knowledge
- Experience in cluster management
- Experience with GitOps principles for deployment and delivery workflows
- Experience with Infrastructure as Code, particularly Terraform
- Good skills in Bash and/or Python
- Understanding of CI/CD pipelines
- Very good German and good English skills (B2+)
Benefits
- Deep hands-on Kubernetes experience
- Freedom to solve challenges
- Opportunities to share knowledge and continuously learn
- Collaborative team environment
- Internal show-and-tell sessions
- Attendance at conferences such as KubeCon or Container Days