We're hiring! | Empowering commercial insurers to write better risks, faster.
Site Reliability Engineer
Location
United Kingdom
Posted
2 days ago
Seniority
Senior
Job Description
Site Reliability Engineer
Artificial Labs
- Support system reliability and operability, contributing to monitoring, observability, and infrastructure improvements.
- Work with containerised and cloud-based systems, including Docker, Nix, and AWS (e.g. ECS and Fargate).
- Develop Infrastructure-as-Code in Terraform, and contribute to code across the whole software stack: Nix, scripting (Nu, bash, hell), Haskell, and even a bit of Python and TypeScript.
- Participate in incident response and help improve on-call, alerting, and incident management practices.
- Communicate effectively in a distributed team, taking ownership and supporting collaboration and continuous improvement.
Job Requirements
- Experience with incident management and monitoring, or a DevOps-oriented way of working.
- Experience in insurtech, insurance or related industries.
- Strong problem-solving skills.
- Experience in a distributed work environment.
- You are an engineer with a solid track record.
- Either hands-on experience with Haskell in production, with a focus on performance, maintainability, and best practices, or a background in infrastructure, operations, or SRE-adjacent roles, with a strong focus on system reliability.
- Comfortable with Git, Linux, the command line, and modern software development workflows.
- Interested in operating and scaling distributed systems.
Benefits
- Private medical insurance
- Income protection insurance
- Life insurance of 4 * base salary
- On-site gym and shower facilities
- Enhanced maternity and paternity pay
- Team social events and company parties
- Salary exchange on pension and nursery fees
- Access to Maji, the financial wellbeing platform
- Company stock options managed through Ledgy
- Milestone Birthday Bonus and a Life Events leave policy
- Generous holiday allowance of 28 days plus national holidays
- Home office and equipment allowance, and a company MacBook
- Learning allowance and leave to attend conferences or take exams
- YuLife employee benefits, including EAP and bereavement helplines
- For each new hire, we plant a tree through our partnership with Ecologi
- The best coffee machine in London, handmade in Italy and imported just for us!




