Job Closed
This listing is no longer active.
Industry-leading AIOps platform for operational intelligence.
Senior Site Reliability Engineer
Location
India
Posted
111 days ago
Seniority
Senior
Job Description
Selector
- Serve as a senior technical expert in deploying and maintaining Selector’s operational analytics platform across on-premises and SaaS environments.
- Lead complex customer installations, including deployments in air-gapped and highly regulated environments.
- Partner directly with customers via Zoom/Teams to troubleshoot, triage services, and resolve installation or performance issues.
- Author, review, and maintain Infrastructure as Code (IaC) using Terraform/OpenTofu, ensuring scalable and maintainable infrastructure design.
- Deploy and manage containerized applications using Kubernetes (including RKE) and Kustomize in production environments.
- Triage and resolve issues across distributed systems, Kafka pipelines, CI/CD workflows (Jenkins), and Google Cloud infrastructure.
- Provide structured, actionable feedback to Platform Engineering and DevOps teams to improve reliability, scalability, and performance.
- Participate in and help mature on-call processes, ensuring high availability and operational excellence.
- Perform root cause analysis for production incidents and implement long-term corrective and preventative solutions.
- Research, evaluate, and implement new tools or architectural improvements to address infrastructure and operational challenges.
- Mentor junior engineers and promote SRE best practices across reliability, observability, and automation.
- Improve internal tooling, automation, and operational workflows to enhance developer productivity and system stability.
Job Requirements
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field (or equivalent practical experience).
- 7+ years of hands-on experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering roles.
- Strong experience with Git/GitHub for version control and collaborative development workflows.
- Deep hands-on experience managing Kubernetes clusters in production environments (RKE experience preferred).
- Strong experience with Infrastructure as Code tools such as Terraform or OpenTofu.
- Experience working with Google Cloud Platform (GCP) in production environments.
- Experience with CI/CD pipelines and tooling such as Jenkins.
- Experience working with Kafka or other distributed streaming platforms.
- Proficiency in Python for scripting, automation, and troubleshooting.
- Strong expertise in diagnosing and resolving issues in distributed systems.
- Experience working directly with enterprise customers in technical, customer-facing roles.
- Strong written and verbal communication skills with the ability to explain complex technical concepts clearly.
- Experience working in air-gapped or secure enterprise environments is highly preferred.
- Demonstrated ability to lead initiatives, mentor engineers, and drive reliability improvements across teams.
Benefits
- Health Insurance (GMC): Comprehensive medical coverage for employees and dependents, including hospitalization and maternity benefits.
- Personal Accident Insurance (GPA): Coverage for accidental injury, both on and off duty.
- Life Insurance (Term Plan): Life insurance coverage for eligible employees.
- Provident Fund (PF): Company contribution as per statutory requirements.
- Gratuity: As per the Payment of Gratuity Act.
- Paid Time Off: Sick Leave, Earned Leave, and Maternity Leave in line with company policy and applicable laws.
- Holidays: National and regional holidays as per the annual holiday calendar.