Harvest the power of your data
Site Reliability Engineer II
Location
Italy
Posted
168 days ago
Salary
€38.5K - €48.5K / year
Seniority
Mid Level
Job Description
Agile Lab
- Ensure high reliability of microservices running in OpenShift environments
- Lead and coordinate a technical team of 3–4 engineers for operational excellence
- Manage incident resolution and ticketing workflows via ServiceNow
- Collaborate with development teams to drive performance optimization and tuning
- Design, configure and maintain monitoring dashboards (Grafana, Prometheus, etc.)
- Coordinate with Service Control Room to maintain effective alerting and response
- Oversee release processes of new features, hotfixes, and updates in production
Job Requirements
- Degree in Computer Engineering, Computer Science, or a related field
- Proven experience in Application Maintenance Services (AMS): minimum 2 years
- In-depth knowledge of OpenShift and microservices in cloud-native environments
- Ability to technically and operationally lead a team of 3–4 people
- Experience in release management, monitoring, and incident resolution
- Excellent communication and cross-functional coordination skills
- Strong initiative, operational autonomy, and results-oriented mindset
- Fluency in Italian (mandatory requirement)
- Monitoring & Observability: Grafana, Prometheus, Kibana, Jaeger, Datadog, OpenTelemetry
- Cloud/DevOps: OpenShift, GitLab, Jenkins
- Data & Messaging: Kafka, MongoDB, Ignite
- Ticketing & ITSM: ServiceNow
Benefits
- Full Remote or hybrid working in our offices: Milan, Turin, Padua, Bologna, Catania and Rende
- Real work-life balance
- Monthly training budget (time and money)
- Support from a buddy during your first week of work
- Benefits and corporate welfare programs: company prizes and welcome pack with all the equipment you need to work
- Agile Nomads Experience: opportunity to work for 2 weeks abroad
- Referral bonus if you bring in people as talented as you
- The opportunity to attend one conference per year
- A company rated 4.8 out of 5 for employee satisfaction on Glassdoor and certified as a Great Place to Work
- Inclusive environment where you can be who you really are
- Stimulating environment oriented to growth, both professional and personal



