At Zensar, we’re “experience-led everything”. We are committed to conceptualizing, designing, engineering, marketing, and managing digital solutions and experiences for over 130 leading enterprises. We are a company driven by a bold purpose: Together, we shape experiences for better futures. Whether for our clients, our people, or the world around us, this belief powers everything we do. At the heart of our culture is ONE with Client - a set of four core values that reflect who we are and how we work: One Zensar, Nurturing, Empowering, and Client Focus. Part of the $4.8 billion RPG Group, we’re a community of 10,000+ innovators across 30+ global locations, including Milpitas, Seattle, Princeton, Cape Town, London, Zurich, Singapore, and Mexico City. We believe the best work happens when individuality is celebrated, growth is encouraged, and well-being is prioritized. We are an equal employment opportunity (EEO) and affirmative action employer, committed to creating an inclusive workplace. All qualified applicants will be considered without regard to race, creed, color, ancestry, religion, sex, national origin, citizenship, age, sexual orientation, gender identity, disability, marital status, family medical leave status, or protected veteran status.

Engineer II, Site Reliability

DevOps Engineer | Full Time | Remote | Mid Level | Team 10,001

Location

India

Posted

106 days ago

Seniority

Mid Level

Job Description

Job Description: Site Reliability Engineer (SRE)

Role Overview

The Software Engineer / Site Reliability Engineer (SRE) will play a critical role in driving reliability, scalability, and performance for the Banking Solutions, Payments, and Capital Markets platforms. This role blends core SRE principles, performance engineering, and service health management to support large-scale, mission-critical systems. The ideal candidate will help modernize platforms through automation-first practices, data-driven reliability metrics, and proactive performance optimization, ensuring exceptional customer experience and business continuity in a highly regulated environment.

What You Will Be Doing

Core SRE & Reliability Engineering

- Design, implement, and operate highly available, resilient, and scalable systems aligned with SRE best practices.
- Define and manage Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets to balance reliability and delivery velocity.
- Build and maintain service health dashboards to provide real-time visibility into platform stability and customer experience.
- Reduce toil through extensive automation of operational workflows, alerts, and remediation activities.

Monitoring, Observability & Service Health

- Design and maintain end-to-end monitoring and observability solutions covering infrastructure, applications, APIs, and user journeys.
- Implement advanced alerting strategies to reduce noise and improve mean time to detect (MTTD) and mean time to resolution (MTTR).
- Leverage metrics, logs, and traces to drive root cause analysis and proactive incident prevention.
- Enable reliability reporting for stakeholders using SLO compliance and service health metrics.

Performance Engineering & Testing

- Lead performance engineering initiatives, including load testing, stress testing, endurance testing, and capacity validation.
- Identify performance bottlenecks across application, middleware, database, and infrastructure layers.
- Conduct capacity planning and performance tuning to support business growth and peak traffic scenarios.
- Partner with development and QA teams to embed performance testing into CI/CD pipelines.

Incident Management & Operations

- Lead and participate in incident response activities, including triage, mitigation, recovery, and post-incident reviews.
- Drive blameless post-mortems and ensure corrective actions are tracked to completion.
- Participate in on-call rotations, providing 24x7 support for critical production systems.
- Continuously improve operational readiness and resilience.

Automation, CI/CD & Cloud Operations

- Design and manage deployment pipelines, configuration management, and environment consistency across lower and production environments.
- Implement Infrastructure as Code (IaC) practices for repeatable and secure cloud provisioning.
- Collaborate with DevOps teams to improve deployment reliability, rollback mechanisms, and release safety.
- Develop and test disaster recovery plans, backup strategies, and failover mechanisms.

Collaboration & Governance

- Work closely with Development, QA, DevOps, Security, and Product teams to align on reliability and performance goals.
- Ensure platforms meet security, compliance, and regulatory requirements common in financial services.
- Act as a reliability and performance advocate throughout the SDLC.

What You Bring

Required Skills & Experience

- Strong experience in core SRE practices, including reliability engineering, incident management, and automation.
- Proven hands-on experience in Performance Engineering / Performance Testing for large-scale distributed systems.
- Deep understanding and implementation experience with SLI / SLO / Error Budget frameworks.
- Proficiency in cloud platforms (AWS, Azure, or Google Cloud).
- Hands-on experience with containerization and orchestration (Docker, Kubernetes).
- Strong background in monitoring, observability, and logging tools such as Prometheus, Grafana, Datadog, Splunk, ELK Stack.
- Experience with CI/CD pipelines (Jenkins, GitLab CI/CD, Azure DevOps).
- Proficiency in scripting and automation using Python, Bash, Terraform, Ansible.
- Strong troubleshooting skills across application, infrastructure, and network layers.
- Experience designing and running incident response and post-mortem reviews.
- Ownership mindset with accountability for service reliability and customer outcomes.
- Excellent communication, collaboration, and stakeholder management skills.

Nice to Have (SRE+ Skills)

- Experience with Keptn or similar tools for automated SLO-based quality gates and continuous delivery.
- Programming experience in Java, especially for debugging, performance profiling, or building automation tools.
- Familiarity with chaos engineering practices and tools.
- Experience working in banking, payments, or capital markets domains.
- Knowledge of security best practices and regulatory compliance in enterprise environments.

Explore Life at Zensar and join us to Grow. Own. Achieve. Learn. to be the best version of yourself.

More DevOps Engineer Jobs

Senior Java Engineer

Ciklum

Experiences of Tomorrow, Engineered Together!

DevOps Engineer | 106 days ago

Full Time | Remote | Team 1,001-5,000 | Since 2002 | H1B No Sponsor

Ciklum is looking for a Senior Java Engineer to join our team full-time in Bulgaria. We are a custom product engineering company that supports both multinational organizations and scaling startups to solve their most complex business challenges. With a global team of over 4,000 highly skilled developers, consultants, analysts and product owners, we engineer technology that redefines industries and shapes the way people live.

About the role:

As a Senior Java Engineer, become a part of a cross-functional development team engineering experiences of tomorrow.

Responsibilities:

- Analyze existing Java EE components and document business logic
- Design clean, testable Spring Boot services and data access interfaces
- Plan and execute MySQL to MongoDB data model transformation
- Write unit/integration tests prior to implementation
- Execute incremental migration with parallel validation
- Ensure performance parity with legacy application

Requirements:

- Java/Spring Boot: Strong experience building enterprise applications, REST APIs, and data access layers with Spring Data
- Jakarta EE/Java EE: Ability to analyze and deconstruct legacy Java EE components (EJBs, JPA, CDI)
- MySQL: Deep understanding of relational data models and query optimization
- MongoDB: Experience with NoSQL patterns, document modeling, and Spring Data MongoDB
- TDD/Unit Testing: Proven track record writing comprehensive test suites before implementation (JUnit, Mockito, Testcontainers)

Desirable:

- Python: For scripting, automation, and migration tooling
- LLM/AI Tools: Experience leveraging AI for code analysis, conversion assistance, and documentation generation
- Kafka: Understanding of event-driven architectures for data synchronization
- Docker/Kubernetes: Familiarity with containerized deployments

What's in it for you?

- Regular salary reviews based on performance
- Corporate events: webinars, offline parties, and meetups
- Internal Mobility Program
- Tailored education path (including full access to Udemy, certifications, etc.)
- 25 paid days off: 20 business days of vacation per calendar year + 5 undocumented sick leave days
- Additional health insurance
- 100% company-covered Multisport card, with discounts available for family members

About us:

At Ciklum, we are always exploring innovations, empowering each other to achieve more, and engineering solutions that matter. With us, you’ll work with cutting-edge technologies, contribute to impactful projects, and be part of a One Team culture that values collaboration and progress. Since expanding to Bulgaria in 2022, we’ve been building a fast-growing team that thrives on learning, collaboration, and innovation. Join us on this exciting journey and help shape the future of our delivery center.

Want to learn more about us? Follow us on Instagram, Facebook, LinkedIn. Explore, empower, engineer with Ciklum!

Interested already? We would love to get to know you! Submit your application. We can’t wait to see you at Ciklum.

Bulgaria

Staff DevOps Engineer

Dexcom

Empowering people to take control of health

DevOps Engineer | 106 days ago

Full Time | Remote | Team 10,001+ | Since 1999 | H1B Sponsor

The Company

Dexcom Corporation (NASDAQ: DXCM) is a pioneer and global leader in continuous glucose monitoring (CGM). Dexcom began as a small company with a big dream: To forever change how diabetes is managed. To unlock information and insights that drive better health outcomes. Here we are 25 years later, having pioneered an industry. And we're just getting started. We are broadening our vision beyond diabetes to empower people to take control of health. That means personalized, actionable insights aimed at solving important health challenges. To continue what we've started: Improving human health.

We are driven by thousands of ambitious, passionate people worldwide who are willing to fight like warriors to earn the trust of our customers by listening, serving with integrity, thinking big, and being dependable. We've already changed millions of lives and we're ready to change millions more. Our future ambition is to become a leading consumer health technology company while continuing to develop solutions for serious health conditions. We'll get there by constantly reinventing unique biosensing-technology experiences. Though we've come a long way from our small company days, our dreams are bigger than ever. The opportunity to improve health on a global scale stands before us.

Meet the team:

The IT Cloud Platform Services team is a dynamic mix of tech enthusiasts, problem solvers, and creative thinkers, united by our passion for leveraging cutting-edge technology to transform healthcare and empower our customers to take real-time control of their health.

Where you come in:

As a Staff DevOps Engineer, you will play a critical role in our dynamic team, contributing to the management, operation, and optimization of our GCP, AWS, and Azure cloud infrastructure. Your expertise will help us achieve seamless deployment, automation, and monitoring of our services.

You will:

- Implement and Manage Infrastructure: Utilize Terraform to define and provision our GCP infrastructure, ensuring it is scalable, reliable, and cost-effective.
- Automate Configuration Management: Use Ansible to automate the configuration and management of our systems, improving efficiency and reducing the risk of manual errors.
- Monitor and Optimize Performance: Leverage DataDog to monitor system performance and application metrics, identifying and resolving issues before they impact our users.
- Collaborate Across Teams: Work closely with software engineers, QA, and operations teams to ensure smooth integration and delivery of our applications.
- Innovate and Improve: Continuously explore and implement new tools and practices to enhance our DevOps processes and improve overall system reliability and performance.

Your role will be essential in supporting our mission to deliver high-quality, reliable services to our customers. If you are passionate about cloud technologies, automation, and continuous improvement, we want to hear from you!

Your duties typically include:

- Leading Implementation of Solutions: You will collaborate with architects and engineers to build robust and scalable architectures on GCP to support the organization's applications and services. This involves understanding business requirements; complex, multi-cloud and facility networking and interconnects; evaluating cloud services across GCP, Azure, and AWS; and designing solutions that meet performance, reliability, and cost requirements.
- Infrastructure as Code (IaC): Implementing Infrastructure as Code practices using tools like Terraform, Ansible, or Google Cloud Deployment Manager to automate the provisioning and management of resources across our cloud platforms.
- Serverless Technologies: You will leverage serverless technologies to build scalable and cost-effective solutions. This includes implementing serverless architectures using tools like Google Cloud Functions and AWS Lambda, optimizing serverless workflows, and ensuring seamless integration with other cloud services.
- Expertise in GKE and Kubernetes: You will demonstrate expertise in Google Kubernetes Engine (GKE) and Kubernetes, helping the team design and implement a containerization strategy, and manage containerized platforms to ensure high availability and scalability. This includes configuring and optimizing GKE clusters, implementing best practices for Kubernetes management, and troubleshooting issues within the Kubernetes environment.
- Cloud Networking and Security at the Edge: You will implement secure and efficient cloud networking solutions. This includes configuring virtual networks, managing network security groups, implementing cloud firewalls, and ensuring compliance with security standards. You will also be responsible for monitoring and optimizing network performance, troubleshooting network issues, and collaborating with security teams to ensure robust security measures are in place.
- Performance, FinOps, and Cost Optimization: Continuously monitoring and optimizing the performance of cloud systems to ensure optimal resource utilization, scalability, and reliability. Identify and report on bottlenecks, wasted resources, and expenditures; tune configurations; and implement best practices to improve system performance in a financially responsible manner.
- Security and Compliance: Implementing security best practices and ensuring compliance with relevant regulations and standards (e.g., HIPAA, GDPR) in cloud environments. This involves configuring identity and access management, network security, encryption, and auditing/logging solutions to protect data and resources.
- Collaboration and Documentation: Collaborating with cross-functional teams including developers, operations, and security teams, to ensure alignment on architecture decisions and implementation strategies. Documenting design decisions, configurations, and procedures to facilitate knowledge sharing and future troubleshooting.

Overall, as a Staff DevOps Engineer who specializes in GCP, your role is crucial in maintaining reliable, scalable, and secure cloud-based systems that enable the organization to leverage the full potential of our cloud platforms.

What makes you successful:

- You have a good understanding of cloud principles and services. Experience in Google Cloud Platform (GCP) and Azure is required.
- Your understanding of Terraform allows you to define and manage infrastructure as code (IaC) efficiently. Experience with Terraform Cloud or similar is preferred.
- You provide technical leadership and work direction to junior staff, including both project and operational activities.
- You are familiar with using Ansible or other CM tools for configuration management and automation.
- Your experience with DataDog or other observability platforms enables you to effectively monitor and optimize system performance.
- You possess a solid foundation in scripting languages such as Python, Bash, or similar.
- You bring excellent problem-solving skills and the ability to troubleshoot complex issues.
- Your collaborative mindset and effective communication skills allow you to work seamlessly with cross-functional teams.
- Your documentation hygiene is impeccable, reflecting your understanding that knowledge is better when shared.
- You are committed to continuous learning and staying updated with the latest Cloud Platform tools and practices.

Nice to have:

- Familiarity or experience with FOCUS, GenAI and associated technologies

What you’ll get:

- A front row seat to life changing CGM technology. Learn about our brave #dexcomwarriors community.
- A full and comprehensive benefits program.
- Growth opportunities on a global scale.
- Access to career development through in-house learning programs and/or qualified tuition reimbursement.
- An exciting and innovative, industry-leading organization committed to our employees, customers, and the communities we serve.

Travel Required:

- 0-5%

Experience and Education Requirements:

- Typically requires a Bachelor’s degree in a technical discipline, and a minimum of 8-12 years related experience or Master’s degree and 5-7 years equivalent industry experience or a PhD and 2-4 years of experience.

Remote Workplace:

Your location will be a home office; you are not required to live within commuting distance of your assigned Dexcom site (typically 75 miles/120km). If you reside within commuting distance of a Dexcom site (typically 75 miles/120km) a hybrid working environment may be available. Ask about our Flex workplace option.

Please note:

The information contained herein is not intended to be an all-inclusive list of the duties and responsibilities of the job, nor are they intended to be an all-inclusive list of the skills and abilities required to do the job. Management may, at its discretion, assign or reassign duties and responsibilities to this job at any time. The duties and responsibilities in this job description may be subject to change at any time due to reasonable accommodation or other reasons. Reasonable accommodations may be made to enable individuals with disabilities to perform essential functions.

An Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, age, national origin, or protected veteran status and will not be discriminated against on the basis of disability. Dexcom’s AAP may be viewed upon request by contacting Talent Acquisition at talentacquisition@dexcom.com.

If you are an individual with a disability and would like to request a reasonable accommodation as part of the employment selection process, please contact Dexcom Talent Acquisition at talentacquisition@dexcom.com.

Meritain, an Aetna Company, creates and publishes the Machine-Readable Files on behalf of Dexcom. To link to the Machine-Readable Files, please click on the URL provided: https://health1.meritain.com/app/public/#/one/insurerCode=MERITAIN_I&brandCode=MERITAINOVER/machine-readable-transparency-in-coverage?reportingEntityType=TPA_19874&lock=true

To all Staffing and Recruiting Agencies: Our Careers Site is only for individuals seeking a job at Dexcom. Only authorized staffing and recruiting agencies may use this site or submit profiles, applications or resumes on specific requisitions. Dexcom does not accept unsolicited resumes or applications from agencies. Please do not forward resumes to the Talent Acquisition team, Dexcom employees or any other company location. Dexcom is not responsible for any fees related to unsolicited resumes/applications.

Salary: $122,500.00 - $204,100.00

United States

$122K - $204K / year

Job Closed

Reliability Engineer IV

TalentWerx

Speed, Accuracy, and Cost savings... experience the TalentWerx difference.

DevOps Engineer | 106 days ago

Full Time | Remote | Team 11-50 | H1B No Sponsor

- Ensure the availability, performance, monitoring, and incident response of cloud platforms and services
- Ensure compliance with requirements for production
- Manage failures and resource issues
- Use metrics like MTTR and MTTF
- Develop technical solutions to complex problems
- Document findings and conduct root cause analysis
- Collaborate with engineering and development teams
- Monitor production equipment diagnostics
- Recommend design and process modifications to improve reliability

Cloud

United States

$120.2K - $127.0K / year

Job Closed
