Job Closed
This listing is no longer active.
Senior Kubernetes Platform Engineer
Location
United Kingdom
Posted
135 days ago
Salary
Not specified
Seniority
Senior
Job Description
Intermedia Cloud Communications
- Design, build, operate, and support the company’s Kubernetes platform across on-premises and multiple cloud environments.
- Ensure a reliable, standardized Kubernetes runtime that integrates with the Internal Developer Platform (IDP), enabling application teams to deploy and operate services independently.
- Own cluster lifecycle management: provisioning, upgrades, patching, and decommissioning.
- Develop and maintain Infrastructure as Code: Terraform modules, cluster bootstrap, and configuration automation.
- Implement and operate GitOps workflows for platform components.
- Integrate Kubernetes capabilities into the IDP: standard cluster and namespace patterns, plus approved ingress, secrets, and observability integrations.
- Participate in a rotational on-call/support model for platform-level incidents.
- Troubleshoot Kubernetes platform issues and improve reliability.
- Create and maintain documentation, runbooks, and operational standards.
- Collaborate with the IDP, application support, infrastructure, and security teams.
Job Requirements
- Strong hands-on experience operating production Kubernetes environments
- Experience with on-premises, VM-based infrastructure
- Solid understanding of:
  - Kubernetes internals
  - Linux systems and networking
- Experience with Infrastructure as Code, preferably Terraform
- Experience working with multiple cloud environments (at least one deeply)
- Familiarity with Git-based workflows and GitOps tools
- Proven ability to troubleshoot distributed systems and production issues
Desired / Nice-to-Have Skills
- Experience with multiple managed Kubernetes services (EKS, GKE, AKS, OKE)
- Experience running Kubernetes on-prem (bare metal or VM-based)
- Exposure to:
  - Observability stacks (Prometheus, Grafana, logging systems, OpenTelemetry)
  - Ingress and traffic management
  - Secrets and certificate management
- Prior experience integrating infrastructure platforms into an Internal Developer Platform (IDP)
- Understanding of SRE concepts (SLOs, error budgets, incident response)
Soft Skills
- Ownership mindset: accountable for platform outcomes
- Generalist approach: comfortable across infrastructure, Kubernetes, and operations
- Strong problem solver: able to handle ambiguous and complex issues
- Clear communicator: explains technical topics effectively
- Collaborative: works well across teams and shares responsibility
- Calm under pressure: effective during incidents and outages
- Documentation-driven: values clarity and knowledge sharing
Benefits
- We hire, promote, and compensate employees based on their ability to perform their job responsibilities, without regard to race, color, creed, religion, sex, gender, marital status, national origin, ancestry, age, citizenship, physical or mental disability, sexual orientation, or any other basis protected by applicable law (collectively referred to in our Code of Conduct as “Protected Classes”). We do not tolerate employment discrimination in the workplace, and we are committed to making reasonable accommodations for identified disabilities or other limitations as required by all applicable laws. We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.



