1KOMMA5°

Always the cheapest and cleanest electricity!

Senior Site Reliability Engineer – Platform & Agentic Operations

DevOps Engineer | Full Time | Remote | Senior | Team 1,001-5,000 | Since 2021 | H1B No Sponsor

Location

Germany

Posted

5 days ago

Seniority

Senior

Bachelor Degree | 6 yrs exp | English | Cloud | Google Cloud Platform | Python | TypeScript

Job Description

• Implement and improve monitoring, alerting, and incident response systems and processes to ensure high reliability for our customers and meet defined SLOs
• Design, build, and maintain resilient, scalable infrastructure utilizing SRE principles and best practices
• Attend post-incident reviews, detect patterns, and contribute to continuous improvement efforts
• Execute performance testing, analyze system bottlenecks, and formulate strategies for capacity planning to ensure our systems meet current and future demands effectively
• Build systems where CI/CD test failures serve as immediate, real-time context for agents, enabling them to analyze logs, trace dependencies, and suggest or apply instant code fixes

Job Requirements

6+ years in SRE, DevOps, or Platform Engineering
Strong understanding and practical application of Site Reliability Engineering (SRE) principles, methodologies, and best practices
Proficiency in programming/scripting languages such as Python, GoLang or TypeScript
Practical understanding of integrating LLMs into automated workflows. You know how to feed live system state (like a fresh CI test failure) into an agent as actionable context.
Prior experience in incident management, post-incident reviews, and implementing improvements to prevent future incidents
Ability to troubleshoot complex technical issues systematically and effectively
Good experience working with a public cloud provider, ideally Google Cloud Platform (GCP), and a solid understanding of its observability services
A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
Excellent communication skills to convey technical concepts and collaborate effectively with diverse teams
Very good command of spoken and written English; German is a plus
Residency in Germany

Benefits

You are part of an international, dynamic, and highly motivated team of people who have proven they can make things happen
With your work, you accelerate the "energy transition" and hence have a direct impact on our climate
Work with and learn from other super-smart colleagues
You will enjoy direct contact with core decision-makers
You have an excellent opportunity to join one of Europe’s most thriving scaleups full-time
You work remotely (Germany-wide), with offices in Hamburg, Berlin or Munich
Create a healthy balance alongside your work and enjoy all the benefits of the EGYM Wellpass
Benefits and discounts are yours with Futurebens
Whether city bike or e-bike - be flexible with our job bike leasing and do something good for the environment at the same time

More DevOps Engineer Jobs

DevOps Engineer

Optiplay

DevOps Engineer | 5 days ago

Part Time Remote

Role Description

We are launching our entire platform from scratch: fully cloud-native, automated, and built on DevOps and Infrastructure-as-Code principles. We are a small, fast-growing engineering team, and we are looking for a Senior DevOps Engineer who will own the whole cloud infrastructure and shape the technical direction of the platform.

What you will build:
- Build our entire AWS cloud infrastructure from scratch following automation-first, HA, and security-by-design principles
- Design and operate Kubernetes environments; create and maintain custom Helm charts
- Implement Infrastructure-as-Code using modular Terraform structures
- Design and maintain CI/CD pipelines and GitOps delivery workflows
- Build secure networking architecture: VPC, VPN, WAF, firewalls, ingress controllers
- Set up monitoring, logging, alerting, and tracing (Prometheus / Grafana / Loki / Tempo)
- Deploy and maintain infrastructure for PostgreSQL, ClickHouse, Redis, RabbitMQ
- Define solutions for container registry and artifact management
- Ensure system scalability, reliability, observability, and security
- Participate in architectural discussions and influence OptiPlay’s platform engineering strategy

Qualifications
- 4+ years of experience in DevOps, Cloud, or Platform Engineering
- AWS production experience: VPC, VPN, WAF, firewalls, multi-environment setups, networking fundamentals
- Kubernetes: strong understanding of internals, Helm, container runtime ecosystem
- Terraform: modular IaC design, reusable infrastructure patterns, infra from scratch
- CI/CD & GitOps: GitLab CI, ArgoCD, automated deployment workflows
- Monitoring & Observability: Prometheus, Grafana, Loki, Tempo, or similar
- Datastores & Messaging: PostgreSQL, ClickHouse, Redis, RabbitMQ
- Networking & Proxying: NGINX, HAProxy, Traefik, ingress controllers
- Automation & Scripting: Bash, Python (or similar)
- Artifact Management: modern registries like ECR, Harbor, Nexus, etc.
- Experience designing infrastructure for high-load, distributed, real-time systems

Nice to have
- Terragrunt
- SSL4SaaS
- Experience with GCP or Azure
- Multi-cluster or multi-region architecture
- Deep Kubernetes ecosystem experience (operators, CRDs, service mesh)

Benefits
- 21 vacation days + 5 extra days off annually
- 12 paid sick days
- Fully remote format: work from anywhere you feel productive
- Flexible schedule: start your day anytime between 08:00–11:00 CET
- Fixed budget for health insurance and gym/fitness
- All required work equipment provided
- Zero bureaucracy and direct communication with founders and C-level
- Minimal meetings, async-friendly workflow
- Startup energy: fast motion, creativity, and tight-knit communication
- Business trips and team meetups several times per year
- Multiple salary payout options (flexible formats)
- …And many more perks unlocked after we hit break-even.

CET (UTC+1)

Job Closed

DevOps Engineer

Torq

No-code Security Automation

DevOps Engineer | 5 days ago

Full Time | Remote | Team 51-200 | Since 2020 | H1B No Sponsor

• Champion a DevOps mindset: Identify friction points and implement scalable automation to improve reliability, delivery speed, and developer experience.
• Operate and optimize production environments: Use infrastructure as code and modern observability tools to ensure performance and reliability.
• Empower development teams: Build intuitive internal tools and foster a culture of self-service within R&D.
• Lead deployments end-to-end: From architecture to implementation, manage deployments across multiple regions and environments.
• Collaborate globally: Work closely with an Israel-based DevOps team, maintaining strong communication and ownership despite time zone differences.

AWS Cloud Docker Google Cloud Platform Grafana Jenkins Kubernetes Microservices Prometheus Python Terraform Go

Texas

DevOps Engineer

SOSi

Challenge Accepted

DevOps Engineer | 5 days ago

Full Time | Remote | Team 1,001-5,000 | Since 1989 | H1B No Sponsor

• Create and maintain environment-specific Kubernetes configuration packages (e.g., Helm values files, YAML manifests) that incorporate STIG-aligned hardening and RBAC policy enforcement per IL environment.
• Implement and manage secrets injection strategies using FedRAMP-compliant tools (e.g., AWS Secrets Manager, Azure Key Vault), ensuring compatibility with automated deployment pipelines.
• Integrate CI/CD pipelines to enable secure, automated deployment and rollback of ArcGIS components and related services, supporting ongoing agility and compliance across environments.
• Coordinate with Kubernetes and Data Layer Engineers to ensure version control, pipeline integrity, and integration testing for all containerized services deployed under WO-003.
• Coordinate with the WO-009 infrastructure team to provide estimated cloud resource usage profiles for Kubernetes-based deployments in IL2, IL4, and IL5 environments.
• Supply container and workload specifications to the WO-009 infrastructure team to support provisioning, tagging, and prepay or reservation planning.
• Implement resource tagging in alignment with program-wide cloud governance policy and reconcile actual usage against estimates provided by WO-009.
• Support cost optimization efforts led by WO-009 by adjusting workloads and scaling strategies in response to performance and utilization feedback.

Ansible AWS Azure Cloud Docker Google Cloud Platform Grafana Jenkins Kubernetes Microservices Prometheus Terraform Vault

United States

Senior DevOps Engineer/Site Reliability Engineer

Stellar Cyber

Empowering lean security operations teams of any skill to successfully secure their environments. WE ARE HIRING!

DevOps Engineer | 5 days ago

Full Time | Remote | Team 51-200 | H1B Sponsor

• Administer and maintain Kubernetes clusters and containerized workloads.
• Manage cloud infrastructure across OCI, AWS, GCP, or Azure environments.
• Develop and maintain CI/CD pipelines for reliable application deployments.
• Implement and manage Infrastructure as Code (IaC) using Terraform and Helm.
• Build automation tooling and operational workflows using Python, Go, or Bash.
• Drive observability initiatives including monitoring, logging, tracing, and alerting improvements.
• Monitor, troubleshoot, and resolve production incidents while participating in on-call rotations.
• Support and optimize distributed data platforms including Kafka, Elasticsearch, Spark, Redis, and MongoDB.
• Improve platform reliability, scalability, and operational efficiency using SRE best practices.
• Collaborate with cross-functional teams across multiple time zones.
• Perform Linux system administration and networking troubleshooting.
• Contribute to incident response processes, postmortems, and reliability improvements.
• Support GitOps and deployment workflows using tools such as ArgoCD and GitHub Actions.
• Evaluate and implement AI-assisted operational tooling for auto-remediation, alert correlation, and operational intelligence.

AWS Azure Cloud Distributed Systems Docker ElasticSearch Google Cloud Platform Grafana Kafka Kubernetes Linux MongoDB Prometheus Python Redis Spark Terraform Go

New York

$165K - $215K / year
