At Zensar, we’re “experience-led everything”. We are committed to conceptualizing, designing, engineering, marketing, and managing digital solutions and experiences for over 130 leading enterprises. We are a company driven by a bold purpose: Together, we shape experiences for better futures. Whether for our clients, our people, or the world around us, this belief powers everything we do. At the heart of our culture is ONE with Client - a set of four core values that reflect who we are and how we work: One Zensar, Nurturing, Empowering, and Client Focus. Part of the $4.8 billion RPG Group, we’re a community of 10,000+ innovators across 30+ global locations, including Milpitas, Seattle, Princeton, Cape Town, London, Zurich, Singapore, and Mexico City. We believe the best work happens when individuality is celebrated, growth is encouraged, and well-being is prioritized. We are an equal employment opportunity (EEO) and affirmative action employer, committed to creating an inclusive workplace. All qualified applicants will be considered without regard to race, creed, color, ancestry, religion, sex, national origin, citizenship, age, sexual orientation, gender identity, disability, marital status, family medical leave status, or protected veteran status.

Engineer Sr Lead, Site Reliability

DevOps Engineer | Full Time | Remote | Lead | Team 10,001

Location: Serbia

Posted: 111 days ago

Seniority: Lead

Job Description

JD - Engineer Sr Lead, Site Reliability

What you will be doing:

The Software Engineer/Site Reliability Engineer will play a critical role in driving innovation and growth for the Banking Solutions, Payments and Capital Markets business. In this role, the candidate will have the opportunity to make a lasting impact on the company's transformation journey, drive customer-centric innovation and automation, and position the organization as a leader in the competitive banking, payments, and investment landscape.

Specifically, the Site Reliability Engineer will be responsible for the following:
• Design and maintain monitoring solutions for infrastructure, application performance, and user experience.
• Implement automation tools to streamline tasks, scale infrastructure, and ensure seamless deployments.
• Ensure application reliability, availability, and performance, minimizing downtime and optimizing response times.
• Lead incident response, including identification, triage, resolution, and post-incident analysis.
• Conduct capacity planning, performance tuning, and resource optimization.
• Collaborate with security teams to implement best practices and ensure compliance.
• Manage deployment pipelines and configuration management for consistent and reliable app deployments.
• Develop and test disaster recovery plans and backup strategies.
• Collaborate with development, QA, DevOps, and product teams to align on reliability goals and incident response processes.
• Participate in on-call rotations and provide 24/7 support for critical incidents.

What you bring:
• Proficiency in development technologies, architectures, and platforms (web, API).
• Experience with cloud platforms (AWS, Azure, Google Cloud) and infrastructure-as-code (IaC) tools.
• Hands-on experience with Docker and Kubernetes.
• Knowledge of monitoring tools (Prometheus, Grafana, DataDog) and logging frameworks (Splunk, ELK Stack).
• Experience in incident management and post-mortem reviews.
• Strong troubleshooting skills for complex technical issues.
• Proficiency in scripting languages (Python, Bash) and automation tools (Terraform, Ansible).
• Experience with CI/CD pipelines (Jenkins, GitLab CI/CD, Azure DevOps).
• An ownership mindset toward engineering and product outcomes.
• Excellent interpersonal communication, negotiation, and influencing skills.

Explore Life at Zensar and join us to Grow. Own. Achieve. Learn. and become the best version of yourself.

Related Categories

DevOps Engineer

More DevOps Engineer Jobs

Senior Site Reliability Engineer, API Platform Engineer

ELSA, Corp

World's leading A.I. app in English speaking and communication

DevOps Engineer | 111 days ago

Full Time | Remote | Team 201-500 | Since 2015 | H1B No Sponsor

• Join the AI Infrastructure & Platform team to build, operate, and scale the production systems that power ELSA’s APIs, platform services, and AI-enabled applications.
• This Senior Site Reliability Engineer / API Platform Engineer role bridges software engineering, cloud infrastructure, and operational excellence, requiring a pragmatic, highly productive individual who can use modern AI tools and automation to accelerate delivery and improve reliability.
• Collaborate closely with engineering, AI, and product teams to ensure our services are secure, scalable, observable, and resilient in real-world production environments.
• Design, build, and operate reliable, scalable infrastructure for APIs, platform services, and AI-enabled applications on AWS and Kubernetes.
• Own and enhance CI/CD pipelines, deployment workflows, and operational tooling to enable safe and fast software delivery.
• Build and maintain robust observability systems across metrics, logging, tracing, alerting, and service health.
• Lead incident response, root cause analysis, postmortems, and remediation efforts to continuously improve production reliability.
• Automate repetitive operational work through software, infrastructure-as-code, and AI-assisted workflows.
• Use AI-native engineering tools, including copilots, intelligent automation, and agentic operational tooling, to improve debugging, response time, analysis, and team productivity.
• Partner with backend, platform, and AI engineering teams to productionize new services and ensure they meet reliability, security, and scalability standards.
• Optimize infrastructure and runtime performance across latency, throughput, availability, and cost.
• Define and enforce engineering standards for reliability, security, observability, and operational excellence across services.
• Contribute production-grade software and internal tools that reduce toil and improve platform leverage across the organization.

AWS Cloud Distributed Systems EC2 Grafana Kubernetes Microservices Prometheus

Indonesia

Job Closed

Senior DevOps Engineer – Kubernetes Platform

Unit4

The Next-Generation in Smart Enterprise Resource Planning.

DevOps Engineer | 111 days ago

Full Time | Remote | Team 1,001-5,000 | Since 1980 | H1B No Sponsor

• Collaborate with Cloud Operations Engineers to design, operate, and improve scalable services running on Azure Kubernetes Service (AKS) as well as standalone Kubernetes clusters on dedicated infrastructure.
• Ensure the reliability, performance, and observability of both cloud-native and on-premises container platforms.
• Evolve and maintain our Infrastructure-as-Code ecosystem using Bicep, ARM templates, and Terraform.
• Support and improve CI/CD capabilities through Azure DevOps (pipelines, boards, Git repos).
• Collaborate with cross-functional technical teams to expand automation and modernize operational processes.
• Act as the technical escalation point for the global Operations Support teams.
• Introduce improvements to deployments, infrastructure, and monitoring with an automation-first mindset.
• Maintain clear and accurate documentation, runbooks, and knowledge base articles.
• Support and maintain PostgreSQL instances running in cloud and containerized environments.

Azure Cloud Docker Kubernetes PostgreSQL Terraform

Poland

Job Closed

Senior Lead – SAP Site Reliability Engineer, FI

Kyndryl

We design, build, manage and modernize the mission-critical technology systems that the world depends on every day.

DevOps Engineer | 111 days ago

Full Time | Remote | Team 10,001+ | Since 2021 | H1B Sponsor

• Ensure stability, availability, resilience, and reliability of SAP FI processes.
• Lead management of critical SAP incidents and major problems.
• Act as SAP functional lead for AMS service, guiding functional consultants and support teams.
• Oversee monitoring strategies for critical SAP processes.
• Maintain up-to-date functional and operational documentation.

Mexico

Software Engineer (DevOps & Deployment)

EagleSight.ai

DevOps Engineer | 111 days ago

Full Time | Remote

Software Engineer (DevOps & Deployment)
Location: Las Vegas, NV / Remote (frequent on-site deployment)
Company: EagleSight.ai
Type: Full-time

About Us:
EagleSight.ai is building vision agents for large venues such as hotels and casinos, powering real-time video analytics and intelligent surveillance across hundreds of camera streams. Our systems run on-prem in some of the largest resorts in Las Vegas, and many more are in the pipeline.

The Role:
We’re looking for a hands-on Software Engineer (DevOps & Deployment) who can help us keep our systems running reliably in the field while also contributing to core product development. You’ll wear multiple hats: configuring and managing GPU servers, supporting the deployment of AI models, supporting backend code, and troubleshooting live systems alongside casino IT teams.

You Will:
- Deploy, configure, and maintain EagleSight’s on-prem GPU servers in casino environments. Our stack includes Ubuntu, Python, NVIDIA acceleration (TensorRT, Triton), RabbitMQ, Postgres, and React.
- Own containerization and remote monitoring for multiple sites.
- Work closely with our ML and full-stack engineers to ship on-site updates, improve reliability, and debug issues across the stack.
- Collaborate with casino IT/security teams to manage network access, firewalls, and system security.
- Build scripts and automation to make deployment, upgrades, and monitoring seamless.
- Proactively improve observability: health checks, logs, metrics, and alerting.
- Be the first point of technical contact for our customers in Las Vegas.

What We’re Looking For:
- 3+ years of experience in DevOps engineering roles.
- Strong skills in Linux, Docker, networking, and performance debugging.
- Comfortable working in live production environments with minimal supervision.
- A startup mindset: resourceful, adaptable, and excited to work across ML, backend, and DevOps boundaries.

Nice to Have:
- Experience with GStreamer, FFmpeg, or RTSP (or similar protocol) video pipelines.
- Integration experience with enterprise VMS software.
- Experience with Prometheus/Grafana or similar monitoring stacks.
- Familiarity with GPU environments (NVIDIA drivers, CUDA, Triton, TensorRT).

Why Join Us:
You’ll be joining a small, fast-moving team where your work directly impacts live systems used in large venues every day. You’ll have ownership over real infrastructure, autonomy to ship fast, and the chance to grow with a team that has gained strong traction in a short period of time.

United States
