The #1 Converged Identity Platform with Intelligent Access Governance for Employees, Third Parties & Machines.
Platform Support Engineer
Location
United Kingdom
Posted
106 days ago
Salary
0
Seniority
Senior
Job Description
Saviynt
- Strong pod-level troubleshooting skills in AKS/EKS (not just restarting pods).
- Analyze application and DB (RDS, MySQL) performance issues.
- Oversee the monitoring of our SaaS applications and underlying infrastructure (Kubernetes on AWS and Azure, VPN connections, customer applications, Elastic Search, MySQL) for alerts and performance issues.
- Ensure adherence to defined SLAs (Service Level Agreements) and KPIs (Key Performance Indicators) for operational performance.
- Plan and coordinate scheduled maintenance activities with minimal impact to service availability.
Job Requirements
- Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field.
- Minimum of 3 years of experience in IT/Cloud operations and application support (specifically Java apps), with knowledge of cloud infrastructure (AWS and Azure).
- Strong experience with application support (Java, Grails, Hibernate) and performance analysis in a production environment, including the ability to pinpoint the cause of a performance degradation through analysis.
- Strong understanding of cloud computing concepts, architectures, and services on both AWS and Azure platforms.
- Working knowledge of containerization and orchestration technologies, specifically Kubernetes.
- Experience managing and troubleshooting network connectivity, including VPNs and connections to external networks.
- Familiarity with monitoring tools and practices, with experience in setting up and responding to alerts.
- Hands-on experience with log management and analysis tools, preferably Elastic Search.
- Working knowledge of database systems, preferably MySQL, including L2 troubleshooting and performance monitoring.
- Experience with ITSM (IT Service Management) systems, preferably FreshService, including incident, problem, and service request management processes.
- Excellent problem-solving, analytical, and troubleshooting skills with a data-driven approach.
- Strong communication (written and verbal), interpersonal, and presentation skills.
- Ability to work effectively under pressure and manage multiple priorities in a fast-paced environment.
- Experience in developing and documenting operational procedures and runbooks.
- Experience with automation tools and scripting languages (e.g., Python, Bash) is a plus.
- Experience working in a SaaS environment is highly desirable.
Benefits
- Complete security & privacy literacy and awareness training during onboarding and annually thereafter
- Review and adhere to Information Security/Privacy Policies and Procedures
More Platform Engineer Jobs
Senior Data Platform Engineer – Financial Services
Truelogic Software
Previously named to the Inc. 5000 list of America's fastest-growing companies, Truelogic Software specializes in custom web and mobile software development. By leveraging its globa
- Manage High Availability solutions including Always On Availability Groups, Failover Clustering, and HA/DR strategies.
- Perform advanced T-SQL development (CTEs, Window Functions) and expert-level performance tuning (execution plans, indexing strategies).
- Implement enterprise-grade database security including TDE, Row-Level Security (RLS), and RBAC governance controls.
- Design and build scalable data pipelines using Python 3, Pandas, SQLAlchemy, and orchestration tools (e.g., Airflow).
- Manage AWS-based data infrastructure using Infrastructure as Code (Terraform/CDK) and serverless architectures.
- Develop and integrate RESTful APIs (FastAPI/Flask) to support microservices and enterprise data exchange.
- Leverage AI-assisted development tools (e.g., GitHub Copilot, Cursor) to accelerate engineering workflows.
- Architect data pipelines optimized for AI/ML workloads, including feature engineering and model deployment support.
About Clarity AI 🪴
Clarity AI is a global tech company founded in 2017 with a unique mission: bringing societal impact to markets. We leverage AI and machine learning technologies to provide top international investors, governments, companies, and consumers with the right data, methodologies, and tools to make more informed decisions. We are now a team of more than 300 highly passionate and curious individuals from all over the world, with offices in New York, Madrid, London, Paris, and Abu Dhabi. Together, we have established Clarity AI as a leading sustainability tech AI company backed by investors and strategic partners such as BlackRock, SoftBank, and Deutsche Börse, who believe in us and share our goals.
We are dedicated to cultivating an exceptional workplace environment, and we take pride in our culture, defined by our commitment to being fact-based, diverse, transparent, meritocratic, and flexible. We have plans to continue growing our teams globally, so if you would like to join us on this rocket ship, keep reading! Your work will shape and guide the sustainable decisions of investors, companies and consumers worldwide.
About The Role 💻
We are looking for a Senior GenAI Platform Staff Engineer who is an expert in the deployment and scaling of LLMs and Agentic systems. In this role, you will bridge the gap between machine learning experimentation and production at scale by building the robust, highly efficient platform that powers our AI initiatives.
While our Data Science teams focus on developing and tuning state-of-the-art models, you will be the owner of the platform that enables them. You will define best practices, build automated pipelines, and ensure that our infrastructure can handle complex agentic workflows with high reliability and performance. This is a role for a visionary who stays ahead of the daily shifts in the AI landscape and can rapidly adapt our stack to leverage emerging trends.
For more insight into the technologies used by the engineering team at Clarity AI, please explore our Tech Stack.
What You’ll Be Doing 🚀
As a Senior GenAI Platform Staff Engineer, you will be responsible for:
- GenAI Platform Engineering: Designing and developing the core platform that enables the efficient deployment, scaling, and management of LLMs and multi-agent systems.
- Infrastructure for Agents: Building specialized infrastructure to support long-running agentic workflows, including state management, tool-calling interfaces, and complex reasoning loops.
- High-Scale Productionization & Model Serving: Scaling inference for LLMs to handle global demand while optimizing for latency, throughput, and cost. Implement standard batch and online serving with controlled rollback.
- Build & Delivery: Establishing the "Golden Path" for model deployment through a self-service path to move code, data, and models to production safely and reproducibly, including automated evaluation frameworks, safety guardrails, and CI/CD/CT pipelines.
- Strategic Vision & Product Management: Continuously monitoring the AI ecosystem and proactively evolving our platform to maintain a competitive edge. This includes adopting best practices in Platform Product Management and driving the adoption of golden-path solutions.
- End-to-End Observability: Implementing deep observability for LLMs, tracking not just system health but providing unified visibility into health, impact, and root cause across data, ML, and GenAI (including model hallucinations, token usage, and RAG performance).
- Collaborative Foundation: Providing the tools and abstractions that allow Data Scientists and stakeholders to move from a "tuned model" to a "production service" with zero friction.
Location 🌍
The role is based in Madrid/Spain (Remote / Hybrid).
What You’ll Need 👀
- LLM & Agent Expertise: Deep, hands-on experience deploying Large Language Models and complex agentic architectures at scale.
- GenAI Platform Specifics: Proven experience in implementing Prompt Lifecycle Management (versioning, testing, and deploying prompts as code), an LLM Abstraction Layer (provider-agnostic access), and systems for Cost & Usage Control (visibility and limits on GenAI spend per use case).
- Evaluation & Benchmarking Mastery: Expert-level experience building automated evaluation pipelines and frameworks (e.g., Ragas, DeepEval, G-Eval) and implementing LLM-as-a-judge patterns to validate model quality, grounding, and safety in CI/CD.
- Platform & MLOps Mindset: A proven track record of building platforms or shared infrastructure. Deep understanding of MLOps concepts like Model Registry (versioning, state management, and lineage) and Model Monitoring & Drift Detection. 3+ years of experience in MLOps or high-scale Software Engineering with a focus on AI production environments.
- Technical Stack Mastery: Expert-level Python and deep experience with container orchestration (Kubernetes, Docker) and cloud infrastructure (AWS/GCP).
- AI Tooling & Frameworks: Proficiency with orchestration libraries (e.g., LangChain, LlamaIndex, CrewAI), vector databases (e.g., Pinecone, Weaviate), and inference engines (e.g., vLLM, TGI).
- Agility & Adaptability: The ability to learn and implement new technologies in a field that changes weekly. You should be a "fast mover" who enjoys constant evolution.
- Software Excellence & Governance: Strong fundamentals in API design, microservices, and "GitOps" methodologies, including the implementation of automated security and compliance by default.
- English Proficiency: Excellent communication skills (minimum C1 level), with the ability to articulate technical vision to both engineers and leadership.
What We Offer 🥁
- Competitive compensation, both in terms of base salary as well as equity plans that enable you to share in our success
- Flexibility in ways of working both in terms of your schedule as well as your location, whether you prefer to work from home, the office, or abroad with access to a global network of co-working spaces
- Generous paid time off schemes, including vacation, sabbatical, religious observance and compensation days
- Meaningful benefits including private healthcare coverage, fitness and wellness programs covered through Wellhub, working-from-home allowances to help you set up your home office and cover monthly expenses
- Professional development with annual training budget for conferences, courses, certifications and access to top market e-learning platforms
- Collaborative environment with multiple offices around the globe, regular team activities and events as well as employee-led resource groups
More About Clarity AI ⭐
Clarity AI’s Founder and CEO, Rebeca Minguela, is a successful entrepreneur who has been recognised by prestigious institutions like the World Economic Forum as one of the most distinguished leaders under 40. The leadership team has an international presence and is composed of professionals from leading tech, consulting, and banking firms, entrepreneurs, PhDs from top research institutions, and MBA graduates from top business schools.
Clarity AI has received several awards:
- The Forrester New Wave - ESG Ratings, Data, and Analytics - Leaders for 2022-2024
- Investment Week - Best Sustainable Investment Research & Ratings Provider 2023
- Fast Company - Most Innovative Companies 2023
- European Commission | EU Seal of Excellence 2020
- World Economic Forum - Technology Pioneer 2020
- World Economic Forum, Young Global Leader - Rebeca Minguela
Clarity AI believes diversity, inclusion, and belonging are essential for creating an innovative and successful workplace.
By actively promoting and engaging in sustainability efforts, we can help create a more equitable and resilient future for our planet and all its inhabitants.
Founding Platform Engineer
Hexa
Startup studio specializing in the future of work (eFounders), web3, fintech, AI, and health.
- Own the multi-tenant platform. Architect and maintain the infrastructure that hosts hundreds of MCP codebases. Each isolated, each configurable, all observable from a single pane of glass (or MCP).
- Build the data backbone. Design the pipelines that aggregate statistics, analytics, and performance metrics across every deployed instance. Make the data real-time, reliable, and actionable.
- Ship the core app. Work hands-on in the main Waniwani application: a Next.js/TypeScript codebase with Drizzle ORM, React Query, and a modern component architecture. You'll build features that customers and internal teams use daily.
- Design for scale. Make architectural decisions that hold at 10x. Multi-codebase deployment strategies, database partitioning, caching layers, job queues. Whatever the problem demands.
- Automate everything. CI/CD pipelines, codebase provisioning, health monitoring, rollback systems. If a human has to do it twice, you build a system to do it forever.
- Instrument relentlessly. Logging, tracing, alerting. When something breaks at 2am across 300 instances, the system should tell you exactly where and why, before anyone notices.
- You'll work directly with the CTO. The architecture is yours to shape. The stack is TypeScript end-to-end.
Senior Staff GenAI Platform Engineer
Clarity Innovations, Inc.
We are your trusted partner for edtech strategy, content, and engineering.
- GenAI Platform Engineering: Designing and developing the core platform that enables the efficient deployment, scaling, and management of LLMs and multi-agent systems.
- Infrastructure for Agents: Building specialized infrastructure to support long-running agentic workflows, including state management, tool-calling interfaces, and complex reasoning loops.
- High-Scale Productionization & Model Serving: Scaling inference for LLMs to handle global demand while optimizing for latency, throughput, and cost. Implement standard batch and online serving with controlled rollback.
- Build & Delivery: Establishing the "Golden Path" for model deployment through a self-service path to move code, data, and models to production safely and reproducibly, including automated evaluation frameworks, safety guardrails, and CI/CD/CT pipelines.
- Strategic Vision & Product Management: Continuously monitoring the AI ecosystem and proactively evolving our platform to maintain a competitive edge. This includes adopting best practices in Platform Product Management and driving the adoption of golden-path solutions.
- End-to-End Observability: Implementing deep observability for LLMs, tracking not just system health but providing unified visibility into health, impact, and root cause across data, ML, and GenAI (including model hallucinations, token usage, and RAG performance).
- Collaborative Foundation: Providing the tools and abstractions that allow Data Scientists and stakeholders to move from a "tuned model" to a "production service" with zero friction.




