Job Closed

This listing is no longer active.

ELSA, Corp

World's leading A.I. app in English speaking and communication

Senior Site Reliability Engineer, API Platform Engineer

DevOps Engineer | Full Time | Remote | Senior | Team 201-500 | Since 2015 | H1B No Sponsor

Location

Indonesia

Posted

55 days ago

Salary

0

Seniority

Senior

Job Description

  • Join the AI Infrastructure & Platform team to build, operate, and scale the production systems that power ELSA’s APIs, platform services, and AI-enabled applications.
  • This Senior Site Reliability Engineer / API Platform Engineer role bridges software engineering, cloud infrastructure, and operational excellence, requiring a pragmatic, highly productive individual who can use modern AI tools and automation to accelerate delivery and improve reliability.
  • Collaborate closely with engineering, AI, and product teams to ensure our services are secure, scalable, observable, and resilient in real-world production environments.
  • Design, build, and operate reliable, scalable infrastructure for APIs, platform services, and AI-enabled applications on AWS and Kubernetes.
  • Own and enhance CI/CD pipelines, deployment workflows, and operational tooling to enable safe and fast software delivery.
  • Build and maintain robust observability systems across metrics, logging, tracing, alerting, and service health.
  • Lead incident response, root cause analysis, postmortems, and remediation efforts to continuously improve production reliability.
  • Automate repetitive operational work through software, infrastructure-as-code, and AI-assisted workflows.
  • Use AI-native engineering tools including copilots, intelligent automation, and agentic operational tooling to improve debugging, response time, analysis, and team productivity.
  • Partner with backend, platform, and AI engineering teams to productionize new services and ensure they meet reliability, security, and scalability standards.
  • Optimize infrastructure and runtime performance across latency, throughput, availability, and cost.
  • Define and enforce engineering standards for reliability, security, observability, and operational excellence across services.
  • Contribute production-grade software and internal tools that reduce toil and improve platform leverage across the organization.

Job Requirements

  • Strong experience in Site Reliability Engineering, DevOps, Platform Engineering, or Infrastructure Software Engineering, with a track record of operating production systems at scale.
  • Solid experience writing and maintaining production-grade software for live systems and internal platform tooling.
  • Deep expertise in cloud infrastructure and distributed systems, particularly on AWS, including EKS, EC2, IAM, VPC, CloudWatch, and related services.
  • Hands-on experience running Kubernetes-based services in production environments.
  • Strong experience operating APIs and microservices in production, including release workflows, failure recovery, and service hardening.
  • Hands-on experience with observability and monitoring tools such as Prometheus, Grafana, SigNoz, Sentry, OpenTelemetry, or similar systems.
  • Strong understanding of CI/CD practices, incident management, production monitoring, and service reliability engineering.
  • Experience with infrastructure-as-code and automation tooling.
  • Experience using AI tools and automation as a core part of your engineering workflow to increase productivity, reduce toil, and improve execution quality.
  • Strong judgment, ownership, and follow-through. You take on hard operational problems and drive them through resolution.

Benefits

  • Flexible work setup: Remote-first for Indonesia, Malaysia, Thailand, Taiwan; hybrid model for Vietnam.
  • Comprehensive employee well-being benefits.
  • Free ELSA Premium courses to polish your language skills.
  • Collaborative, international team culture.
  • Opportunity to contribute to a fast-growing, well-funded Silicon Valley startup with global impact.

More DevOps Engineer Jobs

Senior DevOps Engineer – Kubernetes Platform

Unit4

The Next-Generation in Smart Enterprise Resource Planning.

DevOps Engineer | 55 days ago
Full Time | Remote | Team 1,001-5,000 | Since 1980 | H1B No Sponsor

  • Collaborate with Cloud Operations Engineers to design, operate, and improve scalable services running on Azure Kubernetes Service (AKS) as well as standalone Kubernetes clusters on dedicated infrastructure.
  • Ensure the reliability, performance, and observability of both cloud-native and on-premise container platforms.
  • Evolve and maintain our Infrastructure-as-Code ecosystem using Bicep, ARM templates, and Terraform.
  • Support and improve CI/CD capabilities through Azure DevOps (pipelines, boards, Git repos).
  • Collaborate with cross-functional technical teams to expand automation and modernize operational processes.
  • Act as a technical escalation point for global Operational Support teams.
  • Introduce improvements to deployments, infrastructure, and monitoring with an automation-first mindset.
  • Maintain clear and accurate documentation, runbooks, and knowledge base articles.
  • Support and maintain PostgreSQL instances running in cloud and containerized environments.

Poland
Job Closed

Senior Lead – SAP Site Reliability Engineer, FI

Kyndryl

We design, build, manage and modernize the mission-critical technology systems that the world depends on every day.

DevOps Engineer | 55 days ago
Full Time | Remote | Team 10,001+ | Since 2021 | H1B Sponsor

  • Ensure stability, availability, resilience, and reliability of SAP FI processes.
  • Lead management of critical SAP incidents and major problems.
  • Act as SAP functional lead for AMS service, guiding functional consultants and support teams.
  • Oversee monitoring strategies for critical SAP processes.
  • Maintain up-to-date functional and operational documentation.

Mexico

Software Engineer (DevOps & Deployment)

Location: Las Vegas, NV / Remote (frequent on-site deployment)
Company: EagleSight.ai
Type: Full-time

About Us

EagleSight.ai is building vision agents for large venues such as hotels and casinos, powering real-time video analytics and intelligent surveillance across hundreds of camera streams. Our systems run on-prem in some of the largest resorts in Las Vegas, and many more are in the pipeline.

The Role

We’re looking for a hands-on Software Engineer (DevOps & Deployment) who can help us keep our systems running reliably in the field, while also contributing to core product development. You’ll wear multiple hats: configuring and managing GPU servers, supporting deployment of AI models, supporting backend code, and troubleshooting live systems alongside casino IT teams.

You Will

  • Deploy, configure, and maintain EagleSight’s on-prem GPU servers in casino environments. Our stack includes Ubuntu, Python, NVIDIA acceleration (TensorRT, Triton), RabbitMQ, Postgres, and React.
  • Own containerization and remote monitoring for multiple sites.
  • Work closely with our ML and full-stack engineers to ship on-site updates, improve reliability, and debug issues across the stack.
  • Collaborate with casino IT/security teams to manage network access, firewalls, and system security.
  • Build scripts and automation to make deployment, upgrades, and monitoring seamless.
  • Proactively improve observability: health checks, logs, metrics, and alerting.
  • Be the first point of technical contact for our customers in Las Vegas.

What We’re Looking For

  • 3+ years of experience in DevOps engineering roles.
  • Strong Linux skills, Docker, networking, and performance debugging.
  • Comfortable working in live production environments with minimal supervision.
  • A startup mindset: resourceful, adaptable, and excited to work across ML, backend, and DevOps boundaries.

Nice to Have

  • Experience with GStreamer, FFmpeg, RTSP (or similar protocol) video pipelines.
  • Integration experience with enterprise VMS software.
  • Experience with Prometheus/Grafana or similar monitoring stacks.
  • Familiarity with GPU environments (NVIDIA drivers, CUDA, Triton, TensorRT).

Why Join Us

You’ll be joining a small, fast-moving team where your work directly impacts live systems used in large venues every day. You’ll have ownership over real infrastructure, autonomy to ship fast, and the chance to grow along with a team that has gained strong traction in a short period of time.

United States

DevOps Specialist

Experian

We're unlocking the power of data to help create a better tomorrow.

DevOps Engineer | 55 days ago
Full Time | Remote | Team 10,001+ | Since 1996 | H1B Sponsor

  • Act as the embedded DevOps partner for a software development team.
  • Design, implement, and maintain cloud infrastructure on AWS.
  • Develop and manage Infrastructure as Code using Terraform.
  • Provision and manage cloud resources like EC2 and databases.
  • Collaborate with the central DevOps / Infrastructure team.

Brazil