itD

Formerly known as iTalent Digital. We are a different kind of global software development and technology consultancy.

Senior Software Engineer/SRE

DevOps Engineer · Full Time · Remote · Senior · Team 501-1,000 · Since 2005 · H1B No Sponsor

Location

United Kingdom

Posted

4 days ago

Salary

Not specified

Seniority

Senior

Job Description

  • Lead the design, development, and operation of large-scale, secure observability systems
  • Collaborate with internal teams on industry thought leadership
  • Attend regular internal practice community meetings
  • Complete client case studies and learning material (blogs, media material)
  • Build out material to contribute to the Digital Transformation practice
  • Attend internal itD networking events (in person and virtual)

Job Requirements

  • 5+ years of experience designing, deploying, and operating mid- to large-size distributed systems on VMs or bare-metal machines running Linux
  • 2+ years of experience developing in languages such as Ruby, Python, Go, Scala, or Bash
  • Strong experience building solutions based on software engineering best practices
  • Direct experience with technologies such as the Elasticsearch, Logstash, Kibana (ELK) stack, Kafka, Prometheus/Thanos/Cortex, Graphite, Ansible, Terraform, and Consul
  • Willingness to be part of a production on-call rotation

Benefits

  • Comprehensive medical benefits
  • 401k plan
  • Paid holidays
  • Networking & career learning and development programs

More DevOps Engineer Jobs

DevOps Engineer

PSI CRO AG

The global CRO where clinical trials run on time.

DevOps Engineer · 4 days ago
Full Time · Remote · Team 1,001-5,000 · Since 1996 · H1B No Sponsor

  • Troubleshoot system issues using logs, diagnostics, and monitoring tools
  • Develop and maintain automation scripts using PowerShell and Bash
  • Design and manage CI/CD pipelines for continuous integration and deployment
  • Configure and maintain web servers (IIS) and support application hosting environments
  • Deploy, manage, and scale applications in Azure Kubernetes Service (AKS)
  • Build and maintain Docker images, Dockerfiles, and containerized environments
  • Implement and maintain monitoring, logging, and observability solutions (Grafana, Prometheus, ELK, etc.)
  • Collaborate with development teams to ensure smooth release cycles and reliable deployments
  • Contribute to improving system architecture and adopting DevOps best practices

Estonia

Senior Manager, Site Reliability Engineering

Clover Health

Clover is a healthcare technology company helping members live their healthiest lives with our Medicare Advantage plans.

DevOps Engineer · 4 days ago
Full Time · Remote · Team 501-1,000 · H1B Sponsor

  • Lead and grow our SRE team of ~10 engineers, including hiring, retention, career development, and performance management across multiple time zones (US, HK, NZ).
  • Build strategic partnerships with product engineering pillars, shifting SRE from reactive, ticket-based support to proactive co-ownership of reliability outcomes.
  • Scale our multi-tenant infrastructure to support new customer onboarding and growing patient populations.
  • Own cloud cost management and FinOps practices, building frameworks that balance cost control with reliability and performance.
  • Champion developer self-service and platform engineering. Build self-service capabilities so product teams can manage routine operations without filing SRE tickets. Establish SLOs/SLIs for critical services and improve alert quality so every page is meaningful.
  • Ensure the SRE team is fully leveraging AI tooling in their workflows (using tools like Claude Code for IaC generation, log analysis, root cause investigation, and automating repetitive work) at the same level as the rest of engineering.

United States
$187K - $243K / year

Senior Manager, Site Reliability Engineering

Counterpart Health

In 2018, Clover Health set out to build a clinically intuitive, AI-enabled solution that fits within physicians' workflows to help support the earlier diagnosis and management of chronic conditions. Years later, that vision is a reality, with thousands of practitioners using Counterpart Assistant during patient visits. Counterpart Health is a subsidiary of Clover Health, committed to Diversity & Inclusion as key to our success. We are an Equal Opportunity Employer, valuing diverse strengths, experiences, perspectives, and backgrounds.

DevOps Engineer · 4 days ago
Full Time · Remote · Team 51-200

Role Description

We're looking for a Senior Manager of Site Reliability Engineering to join our team. You'll lead a team of ~10 SREs across North America, the UK, HK, and New Zealand, owning both the day-to-day operations and the long-term technical direction of the SRE organization. This role sits at the intersection of people leadership, technical depth, and strategic partnership: you're here to make Counterpart's infrastructure reliable, scalable, and cost-efficient, and to transform the SRE team's engagement model from reactive support to proactive collaboration with our product engineering pillars.

  • Lead and grow our SRE team of ~10 engineers, including hiring, retention, career development, and performance management across multiple time zones (US, HK, NZ).
  • Build strategic partnerships with product engineering pillars, shifting SRE from reactive, ticket-based support to proactive co-ownership of reliability outcomes.
  • Scale our multi-tenant infrastructure to support new customer onboarding and growing patient populations.
  • Own cloud cost management and FinOps practices, building frameworks that balance cost control with reliability and performance.
  • Champion developer self-service and platform engineering. Build self-service capabilities so product teams can manage routine operations without filing SRE tickets. Establish SLOs/SLIs for critical services and improve alert quality so every page is meaningful.
  • Ensure the SRE team is fully leveraging AI tooling in their workflows (using tools like Claude Code for IaC generation, log analysis, root cause investigation, and automating repetitive work) at the same level as the rest of engineering.

Qualifications

  • You have 6+ years managing an SRE team and 10+ years of hands-on SRE or infrastructure engineering experience.
  • You're deeply comfortable with our core stack: Kubernetes, GCP (GKE, Cloud SQL, Pub/Sub, GCS), Terraform, Helm, ArgoCD, PostgreSQL, and Prometheus/Grafana.
  • You have strong programming skills in Python and/or Go, and you're comfortable writing and reviewing infrastructure tooling code, including using AI coding tools to do so.
  • You have experience with CI/CD pipelines (GitHub Actions) and a track record of building or improving developer tooling and automation.
  • You have sound build vs. buy judgment: you default to the right answer, not the easiest one, and you're comfortable building internal tooling when existing solutions don't fit.
  • You have experience leading teams across multiple time zones and a track record of developing engineers into strong technical contributors.

Benefits

  • Financial Well-Being: Competitive base salary and equity opportunities, performance-based bonus program, 401k matching, and regular compensation reviews.
  • Physical Well-Being: Comprehensive medical, dental, and vision coverage.
  • Mental Well-Being: Initiatives such as No-Meeting Fridays, monthly company holidays, access to mental health resources, and a generous flexible time-off policy.
  • Professional Development: Learning programs, mentorship, professional development funding, and regular performance feedback and reviews.
  • Additional Perks: Employee Stock Purchase Plan (ESPP), reimbursement for office setup expenses, monthly cell phone & internet stipend, remote-first culture, paid parental leave for all new parents, and much more!

Northern America
$187K - $243K / year

Senior Platform Engineer – DevOps, Infrastructure and Platform

OZmap

Discover the best solution for documenting fiber optic networks.

DevOps Engineer · 4 days ago
Full Time · Remote · Team 11-50 · H1B No Sponsor

  • Design, operate, and evolve AWS (EC2) and on-premises environments with containers (Docker), ensuring availability, security, and scalability
  • Operate and administer Linux production environments (systemd, kernel/network tuning, I/O, process troubleshooting)
  • Build and evolve CI/CD pipelines from scratch, including quality and security gates
  • Develop end-to-end observability (instrumentation, exporters, PromQL, SLI/SLO, alerts)
  • Lead advanced troubleshooting, root cause analysis, and blameless post-mortems, driving structural change afterwards rather than just producing a report
  • Implement automation using Infrastructure as Code
  • Analyze and optimize cloud costs: rightsizing, usage analysis, and proposing data-driven alternatives
  • Act as a technical reference for developers and engineers, influencing architecture without relying on formal authority

Brazil