Job Closed
This listing is no longer active.
World's leading A.I. app in English speaking and communication
Senior Site Reliability Engineer, API Platform Engineer
Location
Indonesia
Posted
55 days ago
Seniority
Senior
Job Description
ELSA, Corp
• Join the AI Infrastructure & Platform team to build, operate, and scale the production systems that power ELSA’s APIs, platform services, and AI-enabled applications.
• This Senior Site Reliability Engineer / API Platform Engineer role bridges software engineering, cloud infrastructure, and operational excellence, requiring a pragmatic, highly productive individual who can use modern AI tools and automation to accelerate delivery and improve reliability.
• Collaborate closely with engineering, AI, and product teams to ensure our services are secure, scalable, observable, and resilient in real-world production environments.
• Design, build, and operate reliable, scalable infrastructure for APIs, platform services, and AI-enabled applications on AWS and Kubernetes.
• Own and enhance CI/CD pipelines, deployment workflows, and operational tooling to enable safe and fast software delivery.
• Build and maintain robust observability systems across metrics, logging, tracing, alerting, and service health.
• Lead incident response, root cause analysis, postmortems, and remediation efforts to continuously improve production reliability.
• Automate repetitive operational work through software, infrastructure-as-code, and AI-assisted workflows.
• Use AI-native engineering tools including copilots, intelligent automation, and agentic operational tooling to improve debugging, response time, analysis, and team productivity.
• Partner with backend, platform, and AI engineering teams to productionize new services and ensure they meet reliability, security, and scalability standards.
• Optimize infrastructure and runtime performance across latency, throughput, availability, and cost.
• Define and enforce engineering standards for reliability, security, observability, and operational excellence across services.
• Contribute production-grade software and internal tools that reduce toil and improve platform leverage across the organization.
Job Requirements
- Strong experience in Site Reliability Engineering, DevOps, Platform Engineering, or Infrastructure Software Engineering, with a track record of operating production systems at scale.
- Solid experience writing and maintaining production-grade software for live systems and internal platform tooling.
- Deep expertise in cloud infrastructure and distributed systems, particularly on AWS, including EKS, EC2, IAM, VPC, CloudWatch, and related services.
- Hands-on experience running Kubernetes-based services in production environments.
- Strong experience operating APIs and microservices in production, including release workflows, failure recovery, and service hardening.
- Hands-on experience with observability and monitoring tools such as Prometheus, Grafana, SigNoz, Sentry, OpenTelemetry, or similar systems.
- Strong understanding of CI/CD practices, incident management, production monitoring, and service reliability engineering.
- Experience with infrastructure-as-code and automation tooling.
- Experience using AI tools and automation as a core part of your engineering workflow to increase productivity, reduce toil, and improve execution quality.
- Strong judgment, ownership, and follow-through. You take on hard operational problems and drive them through resolution.
Benefits
- Flexible work setup: Remote-first for Indonesia, Malaysia, Thailand, Taiwan; hybrid model for Vietnam.
- Comprehensive employee well-being benefits.
- Free ELSA Premium courses to polish your language skills.
- Collaborative, international team culture.
- Opportunity to contribute to a fast-growing, well-funded Silicon Valley startup with global impact.
More DevOps Engineer Jobs
Senior DevOps Engineer – Kubernetes Platform
Unit4
The Next-Generation in Smart Enterprise Resource Planning.
• Collaborate with Cloud Operations Engineers to design, operate, and improve scalable services running on Azure Kubernetes Service (AKS) as well as standalone Kubernetes clusters on dedicated infrastructure.
• Ensure the reliability, performance, and observability of both cloud-native and on-premises container platforms.
• Evolve and maintain our Infrastructure-as-Code ecosystem using Bicep, ARM templates, and Terraform.
• Support and improve CI/CD capabilities through Azure DevOps (pipelines, boards, Git repos).
• Collaborate with cross-functional technical teams to expand automation and modernize operational processes.
• Act as the technical escalation point for global Operational Support teams.
• Introduce improvements to deployments, infrastructure, and monitoring with an automation-first mindset.
• Maintain clear and accurate documentation, runbooks, and knowledge base articles.
• Support and maintain PostgreSQL instances running in cloud and containerized environments.
Senior Lead – SAP Site Reliability Engineer, FI
Kyndryl
We design, build, manage and modernize the mission-critical technology systems that the world depends on every day.
• Ensure stability, availability, resilience, and reliability of SAP FI processes
• Lead management of critical SAP incidents and major problems
• Act as SAP functional lead for AMS service, guiding functional consultants and support teams
• Oversee monitoring strategies for critical SAP processes
• Maintain up-to-date functional and operational documentation
Software Engineer (DevOps & Deployment)
Location: Las Vegas, NV / Remote (frequent on-site deployment)
Company: EagleSight.ai
Type: Full-time
About Us:
EagleSight.ai is building vision agents for large venues such as hotels and casinos, powering real-time video analytics and intelligent surveillance across hundreds of camera streams. Our systems run on-prem in some of the largest resorts in Las Vegas, and many more are in the pipeline.
The Role:
We’re looking for a hands-on Software Engineer (DevOps & Deployment) who can help us keep our systems running reliably in the field, while also contributing to core product development. You’ll wear multiple hats: configuring and managing GPU servers, supporting deployment of AI models, supporting backend code, and troubleshooting live systems alongside casino IT teams.
You Will:
- Deploy, configure, and maintain EagleSight’s on-prem GPU servers in casino environments. Our stack includes Ubuntu, Python, NVIDIA acceleration (TensorRT, Triton), RabbitMQ, Postgres, and React.
- Own containerization and remote monitoring for multiple sites.
- Work closely with our ML and full-stack engineers to ship on-site updates, improve reliability, and debug issues across the stack.
- Collaborate with casino IT/security teams to manage network access, firewalls, and system security.
- Build scripts and automation to make deployment, upgrades, and monitoring seamless.
- Proactively improve observability: health checks, logs, metrics, and alerting.
- Be the first point of technical contact for our customers in Las Vegas.
What We’re Looking For:
- 3+ years of experience in DevOps engineering roles.
- Strong Linux skills, Docker, networking, and performance debugging.
- Comfortable working in live production environments with minimal supervision.
- A startup mindset: resourceful, adaptable, and excited to work across ML, backend, and DevOps boundaries.
Nice to Have:
- Experience with GStreamer, FFmpeg, RTSP (or similar protocol) video pipelines.
- Integration experience with enterprise VMS software.
- Experience with Prometheus/Grafana or similar monitoring stacks.
- Familiarity with GPU environments (NVIDIA drivers, CUDA, Triton, TensorRT).
Why Join Us:
You’ll be joining a small, fast-moving team where your work directly impacts live systems used in large venues every day. You’ll have ownership over real infrastructure, autonomy to ship fast, and the chance to grow along with a team that has gained strong traction in a short period of time.
• Act as the embedded DevOps partner for a software development team
• Design, implement, and maintain cloud infrastructure on AWS
• Develop and manage Infrastructure as Code using Terraform
• Provision and manage cloud resources like EC2 and databases
• Collaborate with the central DevOps / Infrastructure team


