Job Closed
This listing is no longer active.
Let's grow together and unlock opportunities
System Reliability Engineer – DevOps
Location
Poland
Posted
89 days ago
Salary
Not specified
Seniority
Senior
Job Description
Growe
- Lead incident response, perform root cause analysis, and implement recovery and long-term fixes;
- Manage infrastructure using Terraform, Terragrunt, and automation tools for consistency and repeatability;
- Support vulnerability management;
- Monitor resources, forecast growth, and implement scaling strategies;
- Participate in 24/7 rotations for timely resolution of critical incidents.
Job Requirements
- 3+ years in a DevOps, SRE, or related role;
- Strong hands-on experience with AWS services including EC2, ECS, EKS, RDS, DocumentDB, ElastiCache, Keyspaces, S3, EBS, VPC, Route53, KMS, ACM, and CloudWatch;
- Proficiency with Terraform, Terragrunt, and Atlantis for reproducible and version-controlled infrastructure;
- Experience with GitLab CI, FluxCD, Argo Rollouts, and automation tools (Ansible, Python, Bash);
- Solid experience with Docker, Kubernetes (AWS EKS), and Helm (including custom templates, ChartMuseum);
- Familiarity with cluster add-ons such as KEDA, VPA, Karpenter, External-DNS, ingress-nginx, aws-alb-controller, and ebs-csi-driver;
- Hands-on experience with Grafana, VictoriaMetrics stack, Tempo, metrics exporters, Pingdom, AWS CloudWatch, and alerting systems like PagerDuty, VMAlert, and Alertmanager;
- Proficiency with Grafana Loki, OpenSearch, and Vector Agent for centralized logging;
- Strong understanding of networking concepts, AWS networking (VPC, Network Firewall, Transit Gateway, Site-to-Site VPN), identity and access management, certificate management (ACM, Vault, SOPS), and application security best practices;
- Familiarity with Cloudflare services, including caching, DNS, and Workers;
- Exposure to AWS Cost Explorer, KubeCost, and custom cost export tools;
- AWS, Terraform, Kubernetes, or Helm certifications are a plus.
Responsibilities
- Ensure availability, performance, and scalability of infrastructure and services through monitoring, automation, and operational best practices;
- Lead incident response, perform root cause analysis, and implement recovery and long-term fixes;
- Manage infrastructure using Terraform, Terragrunt, and automation tools for consistency and repeatability;
- Implement and maintain metrics, logs, and tracing solutions (Prometheus, Grafana, Loki, VictoriaMetrics, CloudWatch) to ensure system visibility;
- Identify bottlenecks, tune systems, and improve infrastructure performance;
- Monitor resources, forecast growth, and implement scaling strategies;
- Integrate security best practices into IaC, CI/CD pipelines, and deployments;
- Support vulnerability management;
- Participate in 24/7 rotations (once a week) for timely resolution of critical incidents;
- Work with DevOps, PRE, development, and security teams to improve reliability and design resilient systems;
- Maintain operational runbooks, incident reports, and system documentation.
More DevOps Engineer Jobs
- Design, implement, and automate components of large-scale distributed cloud systems.
- Implement and support PAM solutions primarily on OpenStack, ensuring secure and reliable access management.
- Build tools, automation, and workflows to improve availability, scalability, latency, and operational efficiency.
- Work closely with engineering and delivery teams to deploy high-quality software in a fast-paced environment.
- Monitor production and development environments and implement preventive and corrective measures to ensure platform reliability.
- Participate in incident response, debugging, and root cause analysis for production issues.
- Collaborate across teams to deliver consistent and reliable solutions aligned with.
- Document designs, operational procedures, and troubleshooting guides clearly and effectively.
- Contribute to improvements in reliability metrics such as availability, MTTD, and MTTR.
Senior DevOps Engineer
Truv
Truv empowers businesses to make confident decisions. Truv is a one-stop income and employment verification solution.
- Architect and scale our AWS infrastructure, including container orchestration, autoscaling, networking, and cost optimization
- Build our observability and alerting platform from the ground up. You'll own it from design through production deployment
- Lead infrastructure builds for compliance (SOC 2, HIPAA). We need someone who scopes, builds, and ships, not just advises
- Harden container workloads and secrets management across production, staging, and isolated compliance environments
- Own the shared infrastructure stack (Postgres, Redis, Celery). Find bottlenecks, fix them, and add capacity before they become incidents
- Build and maintain CI/CD pipelines, optimizing for deploy speed, reliability, and security
- Extend our Terraform codebase to keep environments reproducible and audit-ready. We ship IaC changes weekly, not quarterly
- Define and own our reliability practices: SLOs, incident response, post-mortems, and the production tooling to back them up
- Unblock engineering teams by reducing deploy friction, improving dev environments, and eliminating toil
- Share on-call with a small team. When things break, you lead the response, run the post-mortem, and make sure the fix ships
Senior NixOS, DevOps Engineer
virtual7 GmbH
We shape Germany's digital future. Find your calling and grow with virtual7.
- As a Senior NixOS / DevOps Engineer (m/f/d), you will support our clients in building modern, declarative, and highly automated software and infrastructure processes.
- Consulting and implementation around Nix and NixOS: from architecture reviews and the design of modern system landscapes to the production rollout of declarative environments.
- Analyze and optimize existing software architectures, source code, and development processes with a focus on quality, efficiency, and reproducibility.
- Design, implement, and evolve CI/CD pipelines (e.g., GitLab, Nix Hydra), including automated build, test, and deployment pipelines.
- Introduce and secure reproducible builds and modern build systems (e.g., CMake, Meson) in complex enterprise environments.
- Build and maintain automated testing environments (unit, integration, and HIL tests) to ensure stable, test-driven development workflows.
- Support developer and DevOps teams in adopting declarative, test- and quality-oriented working practices.
- Working towards improving the system's non-functional qualities, including availability, scalability, security, and durability.
- Engaging in Scrum-based work management by participating in team meetings and discussions.
- Creating automation and process improvements.
- Supporting and developing new tools used by our Engineering department.
- Monitoring systems and creating infrastructure documentation.
- Participating in planning and taking full ownership of our initiatives.
- Working with our tech stack, including GCP - GKE, Cloud SQL, Memorystore Redis, GCS, VMs, Cloud Run, PSC, Artifact Registry, Secret Manager, CDN, K8S and Helm, Argo CD and Argo Workflows, Pulumi, Istio, Grafana stack, GitLab.




