Job Closed
This listing is no longer active.
Creating Digital Leaders. Digital Transformation Consultancy Services and Solutions
Senior Site Reliability Engineer – AWS, AI
Location
Bulgaria
Posted
109 days ago
Seniority
Senior
Job Description
Xebia
- Enhancing automation, reliability, and deployment efficiency through Infrastructure as Code and GitOps practices
- Utilizing AI-powered tooling to optimize operational workflows and system performance
- Building and supporting tools, processes, and infrastructure that enable faster software delivery
- Ensuring availability, reliability, and scalability of application infrastructure
- Building and supporting continuous integration, delivery, and release pipelines
- Collecting and monitoring metrics, and ensuring they are actionable
Job Requirements
- 5+ years of experience in DevOps practices and Continuous Delivery
- Practical knowledge of AWS services, infrastructure, and networking
- Solid experience with Kubernetes (ideally EKS on AWS) and container orchestration
- Python knowledge
- Experience working with AI Agents
- Familiarity with Claude Code
- Experience with FastMCP or other MCP libraries
- Hands-on experience with GitOps practices, preferably with ArgoCD
- Strong skills in Terraform and Helm
- Proficiency in Bash scripting (PowerShell is a plus)
- Experience with CI/CD pipelines and tooling (GitLab CI/CD, GitHub Actions, or similar)
- Experience with monitoring, observability, and logging tools (e.g., Prometheus, Grafana, AppDynamics, OpenSearch)
- Strong security awareness (OWASP, encryption, secrets management)
- Highly communicative and collaborative, with a strong sense of ownership
- Upper-intermediate / advanced English (B2/C1)
Benefits
- Health insurance
- Flexible work arrangements
- Professional development
- Personal development budgets



