Job Closed
This listing is no longer active.
At Zensar, we’re “experience-led everything”. We are committed to conceptualizing, designing, engineering, marketing, and managing digital solutions and experiences for over 130 leading enterprises. We are a company driven by a bold purpose: Together, we shape experiences for better futures. Whether for our clients, our people, or the world around us, this belief powers everything we do. At the heart of our culture is ONE with Client - a set of four core values that reflect who we are and how we work: One Zensar, Nurturing, Empowering, and Client Focus. Part of the $4.8 billion RPG Group, we’re a community of 10,000+ innovators across 30+ global locations, including Milpitas, Seattle, Princeton, Cape Town, London, Zurich, Singapore, and Mexico City. We believe the best work happens when individuality is celebrated, growth is encouraged, and well-being is prioritized. We are an equal employment opportunity (EEO) and affirmative action employer, committed to creating an inclusive workplace. All qualified applicants will be considered without regard to race, creed, color, ancestry, religion, sex, national origin, citizenship, age, sexual orientation, gender identity, disability, marital status, family medical leave status, or protected veteran status.
SRE
Location
India
Posted
52 days ago
Salary
Not specified
Seniority
Mid Level
Job Description
SRE
Zensar
• Cloud & Infrastructure Expertise: Strong knowledge of AWS services (EC2, RDS, S3, IAM), networking, and Infrastructure as Code (Terraform/CloudFormation).
• Reliability & Automation Skills: Proficiency in CI/CD pipelines, monitoring tools (CloudWatch, Prometheus, Grafana), and incident response automation.
• Security & Performance Focus: Ability to enforce IAM policies and compliance standards, and to optimize workloads for scalability, resilience, and cost efficiency.
More DevOps Engineer Jobs
Head of Security – DevOps
iTalent PLUS
A recruitment agency that aims to simplify the hiring needs of organisations.
• Develop, implement, and maintain the organisation’s information security strategy and cybersecurity framework.
• Establish security policies, standards, and governance structures to protect systems, infrastructure, and data assets.
• Ensure alignment of security practices with operational objectives and the broader technology roadmap.
• Own the reliability and scalability of the organisation’s cloud infrastructure, including container orchestration, CI/CD pipelines, observability, and disaster recovery.
• Design and maintain infrastructure-as-code (IaC) across AWS or equivalent cloud platforms, ensuring reproducibility, auditability, and least-privilege access.
• Build and optimise CI/CD pipelines to enable fast and secure deployments, including Docker build caching, multi-stage builds, and automated testing gates.
• Establish SLOs, SLIs, and error budgets, while leading incident management and on-call practices.
• Architect and maintain disaster recovery and business continuity plans, including cross-region failover and backup strategies.
• Drive cloud cost optimisation while maintaining high performance and security standards.
• Identify, assess, and manage cybersecurity risks across the organisation’s technology environment.
• Implement risk mitigation strategies and security controls to protect critical infrastructure and digital assets.
• Monitor emerging cyber threats and vulnerabilities that could impact operations or infrastructure.
• Oversee monitoring, detection, and response processes for cybersecurity incidents and vulnerabilities.
• Coordinate incident response activities, ensuring proper investigation, containment, and remediation.
• Support the development and maintenance of incident response plans and procedures.
• Oversee security frameworks related to digital assets, wallets, and transaction infrastructure where applicable.
• Support safeguards that protect wallet systems, transaction flows, and overall platform integrity.
• Collaborate with Risk, Fraud, and Product teams to strengthen controls against abuse, account compromise, and system manipulation.
• Ensure alignment with relevant regulatory obligations, compliance requirements, and industry standards.
• Support internal and external audits, risk assessments, and compliance reviews.
• Maintain oversight of data protection, security controls, and governance frameworks.
• Promote a strong culture of security awareness through training, guidance, and knowledge-sharing initiatives.
• Identify opportunities to enhance the organisation’s cybersecurity posture through improved tools, processes, and practices.
Principal Site Reliability Engineer
Parallel Domain
Synthetic data for computer vision and perception.
About the Role
Parallel Domain is looking for a Principal Site Reliability Engineer to own the reliability, scalability, and security of our cloud infrastructure - the backbone that runs simulation workloads for some of the most demanding customers in autonomous vehicle development. This is a hands-on, high-ownership role. You'll be the primary infrastructure owner across our multi-region AWS/EKS platform, working closely with a small platform engineering team, partnering with engineering leads across simulation and ML, and our customer-facing teams.

What You'll Do

Infrastructure Ownership & Cloud Operations
- Own and evolve our AWS-based infrastructure, improving platform performance and availability today, and building toward deployable configurations that support enterprise customer environments tomorrow.
- Own EKS cluster operations across production regions: node pool strategy, AMI lifecycle, autoscaling, and Kubernetes workload health.
- Support the GitOps deployment pipeline - define, deploy, and manage applications across clusters using infrastructure-as-code.
- Manage complex networking: VPC design, cross-region connectivity, DNS, and load balancing.
- Lead infrastructure deprecation and migration efforts with minimal disruption.

Reliability Engineering & Incident Response
- Own SLO measurement infrastructure; enable proactive triage of emerging issues before they impact customers.
- Lead incident investigation, root cause analysis and postmortems, driving systemic fixes rather than one-off patches.
- Design and improve automated remediation systems to reduce MTTR.

Security & Access Management
- Review and provide security-conscious feedback on platform architecture decisions.
- Own cloud IAM governance - roles, policies, and access boundaries across accounts and services.
- Lead compliance-adjacent work including audit-readiness, partner certification requirements, and supporting responses to customer security questionnaires.

Cross-Functional Collaboration
- Partner with application development teams to build an inherently secure platform and drive next-generation deployment architecture.
- Partner with customer teams to ensure availability for expected utilization.
- Partner with Finance on cloud cost optimization - lifecycle policies, right-sizing, and spend visibility.
- Support GPU and batch workloads in collaboration with simulation and ML engineering teams.

Platform Tooling & Developer Experience
- Improve CI/CD pipelines and automated infrastructure validation.
- Support engineering teams with infra-side debugging, log analysis, and environment configuration.

What We're Looking For

Technical Depth
- 5+ years in SRE, DevOps, or infrastructure engineering roles.
- Infrastructure-as-code proficiency - Terraform modules, state management, and multi-environment patterns.
- Deep AWS experience - EKS, EC2, IAM, S3, Storage Gateway, VPC networking, Transit Gateway, CloudFront, KMS, and IRSA.
- Kubernetes expertise - cluster operations, node pools, probes, cordoning, pod scheduling, RBAC, Helm, node autoscaling (Karpenter experience a plus); solid understanding of containerization and AMI lifecycle management.
- CI/CD - experience with GitOps workflows and pipeline tooling (ArgoCD, GitHub Actions, Jenkins).
- Solid networking fundamentals - CIDR design, security groups, DNS, load balancing, VPN, cross-region connectivity.
- Experience with monitoring and observability tooling - Prometheus, Grafana, Elasticsearch.
- Comfort with Python and Bash for tooling and automation.
- Familiarity working across Linux and Windows environments. Operational familiarity with Windows Server is a meaningful advantage.

Communication & Ownership
- You communicate clearly across engineering, product, and customer-facing teams, flagging issues with urgency proportional to customer impact.
- You advocate for SRE best practices and can effectively operationalize an informed and principled view on security.
- You take end-to-end ownership of complex, multi-team efforts - from planning through execution and post-change verification.
- You know when to push for a clean solution vs. when to accept a pragmatic one, and you communicate that tradeoff clearly.

Nice to Have
- Experience with Windows-based workloads on EKS.
- Experience supporting simulation, ML, or rendering workloads in cloud infrastructure; running GPU workloads on Kubernetes, including NVIDIA and DirectX device plugin configuration.
- Experience with AWS Storage Gateway or Transfer Family integrations.
- Familiarity with Envoy Gateway or similar.
- Experience with container-optimized OS images (e.g., Bottlerocket, Packer).
- Experience with cloud cost optimization at scale.

Core Tools
Terraform · AWS · Kubernetes · Helm · ArgoCD · Kustomize · Grafana · Prometheus · Elasticsearch · VictoriaLogs · Fluent Bit · GitHub Actions · Jenkins · Docker · Python · Bash

Why This Role
PD's simulation platform runs at the intersection of high-performance compute, distributed systems, and customer-critical reliability. The infrastructure problems here are genuinely interesting - multi-region GPU scheduling, Windows workloads on Kubernetes, startup latency optimization, and an enterprise product direction that will require rethinking how we deploy and manage the platform entirely. The Principal SRE at PD is not a ticket-taker - it's a high-trust, high-autonomy position where you'll have genuine influence over infrastructure architecture, cross-team process, and customer experience.
Senior Site Reliability Engineer
Centene Corporation
Centene Corporation is a Fortune 500, mission-driven healthcare leader committed to transforming the health of the communities we service, one person at a time. Through our local m
You could be the one who changes everything for our 28 million members by using technology to improve health outcomes around the world. As a diversified, national organization, Centene's technology professionals have access to competitive benefits including a fresh perspective on workplace flexibility.

Position Purpose: Helps lead projects that are focused on managing and maintaining optimum platform infrastructure performance, reliability, and security using SRE practices, observability tools, manual and automated procedures, documentation, people and processes, and continuous delivery (CI/CD) tools, processes, and designs. Develops complex services to automate monitoring activities and provide critical information to facilitate response and resolution of performance and availability issues and incidents. Understands and advocates for standardized and scalable software tools to ensure that systems operate without interruption at optimum performance and leads project teams throughout the deployment process. Troubleshoots and analyzes service disruptions to determine the root cause of issues and develop solutions for improved reliability.
- Supports multiple applications and schedules batch jobs for a large number of transactions weekly
- Troubleshoots and resolves more complex problems with systems and services and initiates regular deployment of new versions of the systems and their subcomponents
- Leads more complex projects focused on building and maintaining observability/monitoring for the application, monitoring key performance indicators, maintaining alerting, and continuously improving visibility
- Helps make decisions around periodic system validation and testing, service monitoring, and standing up new services/tools
- Uses knowledge and experience to identify strategies that increase system reliability and performance through on-call rotation and process optimization
- Identifies and implements necessary manual and automated procedures for improved collaborative response in real time
- Leads lower-level engineers in stress, security, and performance testing
- Resolves issues that come up through support escalation
- Keeps documentation and runbooks up to date to effectively deal with new incidents that might arise
- Leads post-incident reviews and documents findings for future informed decision making
- Reviews proposals to optimize the Software Development Life Cycle (SDLC) to boost service reliability and makes decisions around which proposals should move forward
- Communicates complex topics with development teams to investigate and document issues and leads the internal team to develop solutions to mitigate them
- Performs other duties as assigned
- Complies with all policies and standards

Education/Experience: A Bachelor's degree in a quantitative or business field (e.g., statistics, mathematics, engineering, computer science) and 4 – 6 years of related experience, or equivalent experience acquired through accomplishments of applicable knowledge, duties, scope, and skill reflective of the level of this position.

Technical Skills: One or more of the following skills are desired.
- Experience with SRE or DevOps
- Batch scheduling
- Monitoring experience
- SQL

Pay Range: $87,000.00 - $161,300.00 per year

Centene offers a comprehensive benefits package including: competitive pay, health insurance, 401K and stock purchase plans, tuition reimbursement, paid time off plus holidays, and a flexible approach to work with remote, hybrid, field or office work schedules. Actual pay will be adjusted based on an individual's skills, experience, education, and other job-related factors permitted by law, including full-time or part-time status. Total compensation may also include additional forms of incentives. Benefits may be subject to program eligibility.

Centene is an equal opportunity employer that is committed to diversity, and values the ways in which we are different. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, veteran status, or other characteristic protected by applicable law. Qualified applicants with arrest or conviction records will be considered in accordance with the LA County Ordinance and the California Fair Chance Act.
• Collaborating with our web development, data engineering, and business teams to design and deploy reliable, scalable, and secure infrastructure
• Ensuring high availability and uptime of our applications by implementing monitoring and alerting systems
• Automating tasks and workflows using scripts, CI/CD pipelines, and other tools to streamline development and deployment processes
• Maintaining optimal use of AWS infrastructure and databases to ensure scalability and cost-effectiveness
• Staying up-to-date with the latest DevOps tools and technologies to continuously improve our processes and infrastructure
• Participating in code reviews, architecture discussions, and other activities to help maintain best practices across our engineering teams
• Providing technical guidance and mentorship to other team members, helping to grow their skills and improve our overall engineering capabilities
• Having knowledge of networking concepts such as DNS, VPN, load balancing, and security groups



