Empower every employee
Site Reliability Engineer
Location: Germany
Posted: 42 days ago
Salary: Not disclosed
Seniority: Junior
Job Description
Flip
As a Site Reliability Engineer on our Platform Squad, you will play a key role in keeping Flip's infrastructure fast, resilient, and ready to scale. You will shape the reliability culture, tools, and practices that let our engineering teams ship with confidence, at scale and without compromising availability. This role is ideal for an engineer who is passionate about building high-throughput, highly available systems and wants to help define how a fast-growing SaaS platform operates in production.
- Enable scaling: expand and optimize our cloud infrastructure on Azure and our Kubernetes clusters, designed for high throughput and maximum availability, to support Flip’s rapid global growth.
- Ensure resilience & security: design and implement zero-downtime deployments, rollback mechanisms, and disaster-recovery strategies that keep our platform available around the clock.
- Build observability: evolve our LGTM stack (Loki, Grafana, Tempo, Mimir) to give every team the visibility it needs, and use it to define and optimize our SLOs.
- Automate everything: design, develop, and optimize Infrastructure as Code with Pulumi in Go to eliminate manual toil and offer our platform to engineering teams as a self-service offering.
- Drive reliability practices: champion CI/CD best practices, incident management, post-mortems, and a strong developer experience across the engineering organization.
- Shape our roadmap: work with your squad and engineering leadership to define the platform's direction, from scalable high-throughput systems and cost optimization to security posture and compliance.
Job Requirements
- 1–3 years of hands-on experience as a Site Reliability Engineer (SRE), Platform Engineer, DevOps Engineer, Infrastructure Engineer, Cloud Engineer, or Backend Engineer with a strong infrastructure focus.
- Experience operating and scaling cloud infrastructure (e.g., Azure, GCP, or AWS).
- Deep knowledge of Kubernetes and container orchestration in production environments.
- Hands-on experience with modern observability stacks (e.g., Prometheus, Mimir, Loki, ELK) and familiarity with defining and operating SLOs and error budgets.
- Solid software development skills in Go (preferred, since our IaC runs on Pulumi in Go), Python, or Kotlin.
- Hands-on experience with Infrastructure as Code (e.g., Pulumi, OpenTofu, Terraform) and configuration tools (e.g., Ansible, Chef).
- A collaborative mindset, strong communication skills, and business-fluent English.
- Willingness to participate in on-call rotations to ensure the reliability of our platform.
Benefits
- Work mode: We are remote-first, giving you the flexibility to work from home. At the same time, we value the benefits of in-person collaboration. Depending on the role, you will occasionally attend team events, workshops, or meetings at our offices in Berlin or Stuttgart—always with sufficient notice. The exact balance will be discussed transparently during your application process.
- Work–life balance: We don’t want you to be glued to your desk, so we cover the cost of your E-Gym/Wellpass membership and offer company bike leasing (JobRad).
- Celebrate successes: You’ll work with highly motivated, committed people in a relaxed work atmosphere.
- Be part of the action: You will actively shape Flip. Along the way, you’ll help a young tech company grow rapidly and grow alongside it as you pursue your own goals. Positive atmosphere guaranteed.
- Happy to be a Flipster: Look forward to regular team events and Culture Days that bring us together as Flipsters.
- Work abroad: At Flip, you can also work from other European countries; let’s talk about workation options during the interview.