Empower every employee
Site Reliability Engineer
Location
Germany
Posted
40 days ago
Seniority
Junior
Job Description
Flip
- Further expand and optimize our cloud infrastructure on Azure and our Kubernetes clusters, designed for high throughput and maximum availability, to support Flip's rapid growth across the globe.
- Design and implement zero-downtime deployments, rollback mechanisms and disaster-recovery strategies that keep our platform available around the clock.
- Evolve our LGTM stack (Loki, Grafana, Tempo, Mimir) to give every team the visibility they need, and use it to define and optimize our SLOs.
- Design, develop and optimize infrastructure as code with Pulumi in Go, eliminating toil and making our platform self-service for engineering teams.
- Promote CI/CD best practices, incident management, post-mortems and developer experience across the entire engineering organization.
- Collaborate with your squad and engineering leadership to define the platform's direction, from scalable, high-throughput systems and cost optimization to security posture and compliance.
Job Requirements
- You have 1–3 years of hands-on experience as a Site Reliability Engineer (SRE), Platform Engineer, DevOps Engineer, Infrastructure Engineer, Cloud Engineer, or Backend Engineer with a strong infrastructure focus.
- Experience operating and scaling cloud infrastructure (Azure, GCP, AWS).
- Deep knowledge of Kubernetes and container orchestration in production environments.
- Hands-on experience with modern observability stacks (e.g. Prometheus, Mimir, Loki, ELK), and you are comfortable defining and operating SLOs and error budgets.
- Solid software development skills in Go (preferred, since our IaC runs on Pulumi in Go), Python or Kotlin.
- Hands-on experience with infrastructure as code (e.g. Pulumi, OpenTofu, Terraform) and configuration tooling (e.g. Ansible, Chef).
- A collaborative mindset, strong communication skills and business-fluent English.
- Willingness to participate in on-call rotations to ensure the reliability of our platform.
Benefits
- Work mode: We’re remote-first, giving you flexibility to work from home. At the same time, we deeply value the power of in-person collaboration. Depending on the role, you’ll join occasional team events, workshops, or meetings in our Berlin or Stuttgart offices - always with plenty of notice. The exact balance will be discussed during your interview.
- Work-life balance: We don't want you to put down roots at your desk. That's why we cover the cost of your E-Gym-Wellpass membership and offer job bike leasing.
- Celebrating success: Expect highly motivated and committed people in a relaxed working atmosphere.
- Be part of something bigger: You actively shape Flip in your role. Along the way, you help drive the rapid growth of a young tech company and grow towards your own goals. Fun is guaranteed.
- Happy to be a Flipster: Stay tuned for regular team events and culture days that bring us together as Flipsters.
- Working abroad: At Flip you can also work abroad in the European Union. Let's talk about remote work in the interview.