Formerly known as iTalent Digital. We are a different kind of global software development and technology consultancy.
Senior Software Engineer/SRE
Location: United Kingdom
Posted: 4 days ago
Salary: Not specified
Seniority: Senior
Job Description
itD
- Lead the design, development and operation of large-scale, secure observability systems
- Collaborate with internal teams on industry thought leadership
- Attend regular internal practice community meetings
- Complete client case studies and learning material (blogs, media material)
- Build out material to contribute to the Digital Transformation practice
- Attend internal itD networking events (in person and virtual)
Job Requirements
- 5+ years of experience designing, deploying and operating mid- to large-size distributed systems on VMs or bare-metal machines running Linux
- 2+ years of experience developing in languages such as Ruby, Python, Go, Scala, or Bash
- Strong experience building solutions based on software engineering best practices
- Direct experience with technologies such as the Elasticsearch, Logstash, Kibana (ELK) stack, Kafka, Prometheus/Thanos/Cortex, Graphite, Ansible, Terraform, and Consul
- Willingness to be part of a production on-call rotation
Benefits
- Comprehensive medical benefits
- 401k plan
- Paid holidays
- Networking & career learning and development programs
More DevOps Engineer Jobs
- Troubleshoot system issues using logs, diagnostics, and monitoring tools
- Develop and maintain automation scripts using PowerShell and Bash
- Design and manage CI/CD pipelines for continuous integration and deployment
- Configure and maintain web servers (IIS) and support application hosting environments
- Deploy, manage, and scale applications in Azure Kubernetes Service (AKS)
- Build and maintain Docker images, Dockerfiles, and containerized environments
- Implement and maintain monitoring, logging, and observability solutions (Grafana, Prometheus, ELK, etc.)
- Collaborate with development teams to ensure smooth release cycles and reliable deployments
- Contribute to improving system architecture and adopting DevOps best practices
Senior Manager, Site Reliability Engineering
Clover Health
Clover is a healthcare technology company helping members live their healthiest lives with our Medicare Advantage plans.
- Lead and grow our SRE team of ~10 engineers, including hiring, retention, career development, and performance management across multiple time zones (US, HK, NZ).
- Build strategic partnerships with product engineering pillars, shifting SRE from reactive, ticket-based support to proactive co-ownership of reliability outcomes.
- Scale our multi-tenant infrastructure to support new customer onboarding and growing patient populations.
- Own cloud cost management and FinOps practices, building frameworks that balance cost control with reliability and performance.
- Champion developer self-service and platform engineering. Build self-service capabilities so product teams can manage routine operations without filing SRE tickets. Establish SLOs/SLIs for critical services and improve alert quality so every page is meaningful.
- Ensure the SRE team is fully leveraging AI tooling in their workflows (using tools like Claude Code for IaC generation, log analysis, root cause investigation, and automating repetitive work) at the same level as the rest of engineering.
Senior Manager, Site Reliability Engineering
Counterpart Health
In 2018, Clover Health set out to build a clinically intuitive, AI-enabled solution that fits within physicians' workflows to help support the earlier diagnosis and management of chronic conditions. Years later, that vision is a reality, with thousands of practitioners using Counterpart Assistant during patient visits. Counterpart Health is a subsidiary of Clover Health, committed to Diversity & Inclusion as key to our success. We are an Equal Opportunity Employer, valuing diverse strengths, experiences, perspectives, and backgrounds.
Role Description
We're looking for a Senior Manager of Site Reliability Engineering to join our team. You'll lead a team of ~10 SREs across North America, the UK, HK, and New Zealand, owning both the day-to-day operations and the long-term technical direction of the SRE organization. This role sits at the intersection of people leadership, technical depth, and strategic partnership: you're here to make Counterpart’s infrastructure reliable, scalable, and cost-efficient, and to transform the SRE team's engagement model from reactive support to proactive collaboration with our product engineering pillars.
- Lead and grow our SRE team of ~10 engineers, including hiring, retention, career development, and performance management across multiple time zones (US, HK, NZ).
- Build strategic partnerships with product engineering pillars, shifting SRE from reactive, ticket-based support to proactive co-ownership of reliability outcomes.
- Scale our multi-tenant infrastructure to support new customer onboarding and growing patient populations.
- Own cloud cost management and FinOps practices, building frameworks that balance cost control with reliability and performance.
- Champion developer self-service and platform engineering. Build self-service capabilities so product teams can manage routine operations without filing SRE tickets. Establish SLOs/SLIs for critical services and improve alert quality so every page is meaningful.
- Ensure the SRE team is fully leveraging AI tooling in their workflows (using tools like Claude Code for IaC generation, log analysis, root cause investigation, and automating repetitive work) at the same level as the rest of engineering.
Qualifications
- You have 6+ years of experience managing an SRE team and 10+ years of hands-on SRE or infrastructure engineering experience.
- You're deeply comfortable with our core stack: Kubernetes, GCP (GKE, Cloud SQL, Pub/Sub, GCS), Terraform, Helm, ArgoCD, PostgreSQL, and Prometheus/Grafana.
- You have strong programming skills in Python and/or Go, and you're comfortable writing and reviewing infrastructure tooling code, including using AI coding tools to do so.
- You have experience with CI/CD pipelines (GitHub Actions) and a track record of building or improving developer tooling and automation.
- You have sound build-vs.-buy judgment: you default to the right answer, not the easiest one, and you're comfortable building internal tooling when existing solutions don't fit.
- You have experience leading teams across multiple time zones and a track record of developing engineers into strong technical contributors.
Benefits
- Financial Well-Being: Competitive base salary and equity opportunities, performance-based bonus program, 401k matching, and regular compensation reviews.
- Physical Well-Being: Comprehensive medical, dental, and vision coverage.
- Mental Well-Being: Initiatives such as No-Meeting Fridays, monthly company holidays, access to mental health resources, and a generous flexible time-off policy.
- Professional Development: Learning programs, mentorship, professional development funding, and regular performance feedback and reviews.
- Additional Perks: Employee Stock Purchase Plan (ESPP), reimbursement for office setup expenses, monthly cell phone & internet stipend, remote-first culture, paid parental leave for all new parents, and much more!
Senior Platform Engineer – DevOps, Infrastructure and Platform
OZmap
Discover the best solution for documenting fiber optic networks.
- Design, operate and evolve AWS (EC2) and on-premises environments with containers (Docker), ensuring availability, security and scalability.
- Operate and administer Linux production environments (systemd, kernel/network tuning, I/O, process troubleshooting).
- Build and evolve CI/CD pipelines from scratch, including quality and security gates.
- Develop end-to-end observability (instrumentation, exporters, PromQL, SLI/SLO, alerts).
- Lead advanced troubleshooting, root cause analysis and blameless post-mortems, driving structural change afterwards rather than just producing a report.
- Implement automation using Infrastructure as Code.
- Analyze and optimize cloud costs: rightsizing, usage analysis and proposing data-driven alternatives.
- Act as a technical reference for developers and engineers, influencing architecture without relying on formal authority.