Flip

Empower every employee

Senior Site Reliability Engineer

Location

Germany

Posted

6 days ago

Salary

Not specified

Seniority

Senior

Job Description

  • Co-own the architecture: Help drive the architecture and evolution of our cloud infrastructure on Azure and our Kubernetes clusters, designed for high throughput and maximum availability, to support Flip's rapid growth across the globe.
  • Drive the resilience strategy: Define how we approach global scaling, zero-downtime deployments, rollback mechanisms and disaster recovery, and make sure the platform stays available around the clock.
  • Evolve our observability stack: Develop our LGTM stack (Loki, Grafana, Tempo, Mimir) into a foundation our engineers can trust.
  • Improve our IaC platform: Eliminate toil at the source and make our infrastructure truly self-service for engineering teams.
  • Lead in incidents: Take a leading role in major platform incidents, drive blameless post-mortems for the squad, and translate findings into systemic improvements.
  • Mentor within the squad: Coach teammates, run RFCs and design reviews inside the team, and help engineers grow into stronger SREs.
  • Shape our roadmap: Partner with your squad to define the platform's direction.

Job Requirements

  • 5+ years of hands-on experience as a Site Reliability Engineer (SRE), Platform Engineer, DevOps Engineer, Infrastructure Engineer, Cloud Engineer, or Backend Engineer with a strong infrastructure focus.
  • Proven track record building and operating high-throughput, highly available systems in production.
  • Deep, production-level experience with Kubernetes on any hyperscaler.
  • Strong experience with modern observability stacks (e.g. Prometheus, Mimir, VictoriaMetrics, Dash0, Loki, ELK) and a clear point of view on SLIs, SLOs and error budgets.
  • Solid software development skills in Go (strongly preferred, since our IaC runs on Pulumi in Go) or Python.
  • Hands-on experience with Infrastructure as Code (Pulumi, OpenTofu, Terraform), GitOps (e.g. ArgoCD) and CI/CD pipeline design.
  • Demonstrated ability to lead complex infrastructure initiatives from design to production, including writing RFCs and driving architecture decisions within your team.
  • Experience mentoring engineers and raising the technical bar within a team.
  • Comfortable owning major incidents end-to-end and turning learnings into systemic change.
  • Strong communication skills and business-fluent English.
  • Willingness to participate in on-call rotations to ensure the reliability of our platform.

Benefits

  • Work mode: We’re remote-first, giving you the flexibility to work from home. At the same time, we deeply value the power of in-person collaboration. Depending on the role, you’ll join occasional team events, workshops, or meetings in our Berlin or Stuttgart offices, always with plenty of notice. The exact balance will be discussed during your interview.
  • Work-Life-Balance: We don't want you to put down roots at your desk. That's why we cover the costs of your E-Gym-Wellpass membership and offer job bike leasing.
  • Celebrating success: Expect highly motivated and committed people in a relaxed working atmosphere.
  • Be part of something bigger: In your role, you actively shape Flip. Along the way, you help drive the rapid growth of a young tech company and grow towards your own goals. Fun is guaranteed.
  • Happy to be a Flipster: Stay tuned for regular team events and culture days that bring us together as Flipsters.
  • Working abroad: At Flip you can also work abroad in the European Union. Let's talk about remote work in the interview.

More DevOps Engineer Jobs

Contract · Remote · Team 11-50 · Since 2012 · H1B No Sponsor

  • Migrate containerized applications from Docker Compose to Kubernetes (k3s).
  • Design, build, and maintain secure CI/CD pipelines in Azure DevOps and GitLab CI/CD or Jenkins.
  • Integrate security controls into the delivery process, including static code analysis, dependency scanning, container image scanning, secrets detection, image signing, vulnerability management, and release gates.
  • Support deployment automation, environment promotion, rollback, traceability, and release reliability.
  • Work closely with development, infrastructure, and security teams to improve automation, resilience, and secure delivery practices.
  • Troubleshoot and resolve build, deployment, and runtime issues across CI/CD and Kubernetes environments.
  • Document implemented solutions and contribute to good DevSecOps practices across the team.

Poland

Senior Consultant, DevOps

Ensono

Ensono delivers complete Hybrid IT solutions, from mainframe to cloud, tailored to each client’s journey.

DevOps Engineer · 6 days ago
Full Time · Remote · Team 1,001-5,000 · H1B Sponsor

  • As a Senior DevOps Consultant, you’ll be a key technical contributor in delivering complex projects
  • You’ll take end-to-end ownership of your deliverables, ensuring they meet high standards of quality, security, and performance
  • Collaborating closely with colleagues and client teams to achieve project goals
  • Applying modern DevOps practices to design, build, and optimise solutions that help clients get the most from their cloud platform
  • Actively involved throughout the full project lifecycle, from refining requirements and influencing solutions to delivering working systems and communicating outcomes to technical and non-technical audiences
  • Providing guidance and informal mentoring to less experienced engineers, helping to raise capability within the team
  • Contributing to pre-sales activities such as analysing technical requirements, working with our bid team, and helping develop compelling proposals
  • Engaging with Ensono Digital’s internal competencies: sharing knowledge, refining our delivery approaches, and expanding your own expertise

Poland
Full Time · Remote · Team 51-200 · H1B Sponsor

  • Co-owner of the architecture: Help drive the architecture and evolution of our cloud infrastructure on Azure and our Kubernetes clusters, designed for high throughput and maximum availability, to support Flip’s rapid global growth.
  • Drive the resilience strategy: Define our approach to global scaling, zero-downtime deployments, rollback mechanisms, and disaster recovery, ensuring the platform remains available 24/7.
  • Evolve our observability stack: Develop our LGTM stack (Loki, Grafana, Tempo, Mimir) into a foundation that engineers can rely on.
  • Improve our IaC platform: Eliminate operational toil at the source and make our infrastructure truly self-service for engineering teams.
  • Incident leadership: Take a leading role in major platform incidents, conduct blameless post-mortems, and turn insights into lasting improvements.
  • Squad mentoring: Coach team members, lead RFCs and design reviews within the team, and help engineers develop into stronger SREs.
  • Shape our roadmap: Work collaboratively with your squad to define the direction of the platform.

Germany

Site Reliability Engineer

ESO

ESO is a fast-paced, growing data, technology, and research company passionate about improving community health and safety through the power of data. We pioneer innovative, user-friendly software to meet the changing needs of today’s EMS agencies, fire departments, and hospitals. We’re small enough to be nimble and fun, but big enough to be a great place to work. We serve thousands of customers out of our offices across the US, Canada and Northern Ireland.

DevOps Engineer · 6 days ago

Role Description

The Site Reliability Engineering (SRE) team at ESO is responsible for ensuring the reliability, scalability, and performance of our production systems. We operate at the intersection of engineering and operations, with a strong focus on automation, observability, and continuous improvement. As a Site Reliability Engineer, you will work hands-on with cloud-native systems, supporting production and pre-production environments to maintain system health, improve resiliency, and optimize performance. You’ll partner closely with engineering, infrastructure, and database teams to troubleshoot complex issues, enhance automation, and ensure our services meet reliability and availability expectations. This role is ideal for an engineer who enjoys solving challenging problems, digging into application and database behavior, and continuously improving how systems operate in a fast-paced, high-impact environment.

What You’ll Do

  • Support and maintain production and non-production cloud environments (Azure/AWS).
  • Troubleshoot complex, distributed, cloud-based applications to identify root causes and implement durable fixes.
  • Monitor system health, performance, and reliability using observability tools (e.g., New Relic, ELK and Zabbix).
  • Investigate application and database performance issues, including writing and optimizing SQL queries.
  • Participate in incident response, debugging, and post-incident reviews focused on continuous improvement.
  • Contribute to CI/CD pipelines (e.g., Azure DevOps) to improve automation, reliability, and deployment processes.
  • Write and maintain automation scripts (PowerShell, Bash, Python or similar) to streamline operational workflows.
  • Collaborate with developers to understand code behavior and support troubleshooting efforts in C#/.NET-based systems.
  • Help improve reliability standards, documentation, and operational best practices.

Qualifications

  • Hands-on experience working in a cloud environment (Microsoft Azure strongly preferred).
  • Experience supporting and troubleshooting complex, cloud-native applications in production environments.
  • Strong understanding of relational databases and solid experience writing and troubleshooting SQL queries.
  • Ability to read and understand application code (preferably C#/.NET) to support debugging and issue resolution.
  • Experience working with at least one CI/CD platform (e.g., Azure DevOps).
  • Familiarity with monitoring and observability tools (e.g., New Relic) and core concepts such as logs, metrics, and traces.
  • Experience with scripting/automation (PowerShell preferred).
  • Strong analytical and problem-solving skills with attention to detail.
  • Clear written and verbal communication skills.

Requirements

  • Passionate about reliability engineering and operational excellence.
  • Curious and eager to learn, actively seeking feedback and continuously growing your technical skill set.
  • Coachable and adaptable, able to thrive in a fast-paced and evolving environment.
  • Comfortable navigating ambiguity and taking ownership of problems through to resolution.
  • A collaborative team player who values accountability and continuous improvement.

Nice to Have

  • Experience working with Linux-based systems.
  • Experience working with Kubernetes and container systems.
  • Exposure to infrastructure-as-code tools (e.g., Terraform).
  • Familiarity with Git-based version control workflows.

Benefits

  • Competitive health plans (medical, dental, and vision insurance).
  • PTO (starting at 20 days) and 12 company holidays.
  • 401(k) with company match.
  • Telemedicine service provided by ESO.
  • Savings accounts (FSA, HSA, DCA).
  • Employee Assistance Program (EAP).
  • Annual health and wellness reimbursement.
  • Peace-of-mind benefits such as life insurance, disability insurance, and worksite benefits.
  • Paid parental leave, new child program, and flexible parental return-to-work options.
  • Casual office environments and unlimited office snacks and drinks.

United States