k-ID is a first-of-its-kind global compliance engine that makes it easy for game developers and parents to ensure the safety and privacy of kids and teens online, providing age-appropriate and market-specific feature access in more than 200 markets around the world.
Senior Site Reliability Engineer
Location
Singapore
Posted
60 days ago
Salary
0
Seniority
Senior
Job Description
About k-ID
k-ID is the global leader in privacy-first compliance and age verification infrastructure. Recognized as one of TIME’s Best Inventions of 2025, named a Tech Pioneer by the World Economic Forum and a winner of Fast Company’s Next Big Things in Tech, we are building the Age Layer for the internet: the fundamental infrastructure that allows digital platforms to verify age and manage compliance globally without friction.
Our core platform, anchored by the Compliance Development Kit (CDK) and AgeKit, is the trusted engine for the world’s largest game publishers and digital ecosystems. We replace fragmented, manual compliance with a unified API that handles age verification, parental consent, and regulatory logic across 200+ markets. Backed by top-tier venture capital firms like a16z and Lightspeed, k-ID is entering a phase of growth to define the standard for global digital safety.
About the role
We are hiring a Senior Site Reliability Engineer to help make k-ID reliable at scale. This role sits in the middle of our production backbone. You will own and improve the systems that keep our platform available, observable, secure, and resilient as traffic grows and our client base expands globally. You will work across infrastructure, tooling, deployment workflows, incident response, and systems design to make sure we can scale without breaking.
This is not a ticket-closing operations role. We want someone who can look at a system, find the weak points, and harden it: someone who cares about failure modes, blast radius, deployment safety, recovery time, cost discipline, and the realities of running production systems under pressure. You should be comfortable writing code, automating away toil, and partnering closely with engineers to improve reliability through better architecture and better operating practices.
Responsibilities
- Own the reliability, availability, and performance of the systems behind k-ID’s platform and public APIs
- Design and improve scalable infrastructure on AWS and Kubernetes that can support high growth, uneven traffic, and global production workloads
- Build and maintain strong observability across logs, metrics, tracing, alerting, and service health so issues are caught early and investigated quickly
- Improve deployment safety through better CI and CD workflows, release controls, rollback paths, and environment consistency
- Drive incident response and production readiness practices, including runbooks, on-call hygiene, postmortems, capacity planning, and resilience testing
- Reduce operational toil by automating repetitive work and improving internal tooling for developers and operators
- Partner with engineering teams to embed reliability and operability into service design from the start, not after something fails in production
- Strengthen platform security and infrastructure hygiene across access controls, secrets handling, system hardening, and production safeguards
- Continuously improve system performance, resource efficiency, and cost awareness without compromising reliability
Qualifications
- 5+ years of experience in infrastructure, platform engineering, site reliability engineering, or software engineering with meaningful production ownership
- Strong experience running production systems in AWS
- Strong hands-on experience with Kubernetes and container-based workloads
- Experience with infrastructure as code, preferably Terraform
- Experience designing and operating observability stacks using tools such as Prometheus, Alertmanager, Grafana, OpenTelemetry, or equivalent systems
- Strong understanding of distributed systems, failure modes, service reliability, and production debugging
- Experience building or improving CI and CD systems and release workflows in modern engineering environments
- Ability to write code and automation in one or more languages such as Go, Python, or TypeScript
- Good judgment during incidents and a practical mindset around tradeoffs, risk, and recovery
- Clear written and verbal communication skills with the ability to work effectively in a remote team
- Startup experience is a plus, especially in environments where systems and processes are still being built
More DevOps Engineer Jobs
• You will own the data from its physical origin on the industrial plant floor.
• Capture data from OPC servers (Windows) and Edge devices (Linux) to feed the streaming pipelines to the cloud.
• Ensure that telemetry arrives intact, in real time, and reliably for business-critical analytics.
• Full technical ownership of the Edge infrastructure and data pipelines: you design, decide, build, operate, and are accountable for data flowing without interruption from the plant floor to AWS.
Associate Site Reliability Engineer
HomeVision
HomeVision is a technology company dedicated to streamlining collateral underwriting through one comprehensive platform that brings efficiency, transparency, an
• Build and maintain infrastructure for our SaaS products on AWS using Terraform
• Handle IT tasks such as onboarding and account provisioning
• Tackle software projects for our products, typically related to authentication, reliability, observability and other platform concerns
About Coderio
Coderio designs and delivers scalable digital solutions for global companies. With a solid technical foundation and a product-oriented mindset, our teams lead complex software projects from architecture to execution. We value autonomy, clear communication, and technical excellence. We collaborate closely with international teams and partners, building technology that makes an impact. 🌍 More information: http://coderio.com
What we are looking for
We are looking for a DevOps/SRE Engineer in Argentina whose main mission will be to ensure that the platform is stable, scalable, and efficient. This person will be responsible for the critical infrastructure on which all our applications run, ensuring that deployment and operations are flawless.
Responsibilities:
- Ensure the stability and efficient operation of the platform.
- Implement and maintain automation and continuous delivery workflows.
Technical requirements:
- Kubernetes / OpenShift / EKS: advanced level, including HPA, Autoscaling, Ingress, and RBAC.
- Infrastructure: experience in cloud and on-prem environments.
- Automation: experience with CI/CD tools such as Jenkins and GitHub Actions.
- Infrastructure as Code (IaC): experience with Terraform and Helm.
- GitOps: implementing practices with tools such as ArgoCD or Flux.
Benefits:
- 100% remote
- Long-term commitment, with autonomy and impact
- Strategic, high-visibility role in a modern engineering culture
- Collaborative international team and strong technical leadership
- Career plan and growth within Coderio
Why join Coderio?
At Coderio we value talent regardless of location. We are a remote-first company, passionate about technology, collaborative work, and fair compensation. We offer an inclusive, challenging environment with real opportunities for growth. If building solutions with impact motivates you, we are waiting for you. Apply now.
Senior Site Reliability Engineer
Empower
We are an equal opportunity employer with a commitment to diversity. All individuals, regardless of personal characteristics, are encouraged to apply. All qualified applicants will receive consideration for employment without regard to age, race, color, national origin, ancestry, sex, sexual orientation, gender, gender identity, gender expression, marital status, pregnancy, religion, physical or mental disability, military or veteran status, genetic information, or any other status protected by applicable state or local law.
• Design and implement highly available, fault-tolerant systems supporting critical financial transactions.
• Architect infrastructure solutions using AWS best practices, optimizing for cost, performance, and reliability.
• Lead complex incident response efforts, coordinating across teams to restore service rapidly.
• Drive postmortem processes for high-severity incidents, ensuring action items are identified and completed.
• Establish and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for key services.
• Design and implement disaster recovery strategies and business continuity plans.
• Build advanced Infrastructure as Code solutions using Terraform, including modules, workspaces, and state management.
• Architect and optimize multi-cluster EKS environments, including pod autoscaling, cluster autoscaling, and resource optimization.
• Design observability strategies using Datadog and Splunk, including metrics, dashboards, and alerting that support proactive detection.
• Implement progressive delivery mechanisms (canary and blue-green deployments) within GitOps workflows.
• Build automation frameworks that reduce operational toil and improve team efficiency.
• Partner with development teams to improve application reliability, including design reviews and architectural guidance.
• Mentor junior and intermediate SREs through coaching and code reviews.
• Contribute to architectural decisions that impact platform reliability and scalability.
• Evangelize SRE best practices across the engineering organization.
• Participate in on-call rotations and drive improvements to reduce on-call burden.
• Implement and maintain zero-trust security controls across infrastructure.
• Ensure systems meet financial services regulatory requirements and internal compliance standards.
• Conduct security reviews of infrastructure changes and deployment processes.
• Participate in audit preparations and respond to compliance-related inquiries.


