SRE
Location
Europe
Posted
1 day ago
Seniority
Mid Level
Job Description
SRE
Acclaim
Role Description

We currently have several large-scale projects and are expanding our infrastructure team. Our product is an advanced platform for creating and managing AI agents. It can be deployed directly inside a customer's infrastructure and delivered as an enterprise solution, and it is also available as a SaaS version.

Under the hood, there is real-time voice and telephony, GPU and LLM inference, and streaming analytics, and all of this runs both in the cloud and on-prem, including in banking environments. There is a lot of infrastructure; it is complex, interesting, and sometimes at the edge of what is possible. That is why we are looking for a strong SRE who, like us, cares about making systems transparent, reliable, and built the right way.

This is a role for a strong, independent engineer: a Senior SRE with real influence and a voice in how things are built and operated. You will also handle DevOps tasks for the team, but your main focus and area of expertise should be SRE: reliability, observability, incident management, and performance under load.

Qualifications

- 5+ years in SRE/DevOps.
- Deep, practical understanding of Docker and Kubernetes.
- Mature understanding of metrics and alerts.
- Practical experience with Prometheus, Alertmanager, and Grafana.
- Experience with SLIs/SLOs, reliability management, incident investigation, and postmortems.
- Experience with load testing and basic capacity planning.
- Python programming skills.
- Cloud experience with GCP and/or AWS.
- DevOps fundamentals: CI/CD and infrastructure as code.
- Ownership mindset.
- Strong communication with developers.
- Willingness and ability to mentor, teach, and share knowledge.
- Analytical mindset.
- Proactivity.
- Strong attention to detail and reliability.

Requirements

- Experience using AI agents for routine and recurring tasks (nice to have).
- Real-time telephony: SIP, FreeSWITCH, RTP, WebRTC (nice to have).
- GPU/ML serving: Triton, vLLM, RunPod, Nebius, Lambda, run:ai, DCGM (nice to have).
- Streaming data and analytics: Kafka, ClickHouse (nice to have).
- Deep experience with IaC and GitOps (nice to have).
- Experience working in isolated and highly secure environments (nice to have).
- Experience preparing systems for significant growth in load (nice to have).

Responsibilities

- Own the reliability of our services: SLIs/SLOs, availability, and identifying and eliminating bottlenecks.
- Set up monitoring for services, metrics, alerts, and dashboards.
- Build and maintain Grafana dashboards.
- Run load testing, analyze results, and provide recommendations.
- Investigate incidents, participate in on-call rotations, and write postmortems.
- Work closely with developers to communicate and defend your position.
- Develop and support Kubernetes-based infrastructure across clouds.
- Take part in delivering and supporting the platform for customers.
- Mentor colleagues and help raise the engineering bar across the team.

Benefits

- The team has built award-winning AI products for tech corporations.
- Cutting-edge tech stack: Speech Technologies, NLP, Generative AI.
- High engineering bar and real ownership.
- Fast career progression.
- Startup pace with enterprise stability.
- Fully remote across Europe.
- 21 vacation days + public holidays + 5 sick days.
- Private English lessons via Preply.



