Plumbing, Heating & HVAC Supplies. Real People. Real Service.
Site Reliability Engineer
Location
India
Posted
10 days ago
Salary
$29K - $36K / year
Seniority
Senior
Job Description
SupplyHouse.com
- Design, build, and maintain scalable, reliable systems on GCP (Compute Engine, GKE, Cloud Storage, Cloud SQL)
- Develop automation for infrastructure provisioning using Terraform, Ansible, or Deployment Manager
- Build and maintain observability platforms (monitoring, logging, tracing) using tools such as Stackdriver (Cloud Monitoring), Prometheus, or Grafana
- Manage incident response, conduct postmortems, and implement improvements to reduce recurrence
- Partner with DevOps and engineering teams to enhance CI/CD pipelines for resilient deployments
- Define and monitor SLAs, SLOs, and SLIs to ensure application availability and performance
- Implement disaster recovery (DR) and backup strategies across cloud services
- Continuously optimize performance, capacity, and cost-efficiency of GCP resources
Job Requirements
- Bachelor's degree in Computer Science, Engineering, or a related field
- 3+ years of hands-on experience as a Site Reliability Engineer, DevOps Engineer, Systems Engineer, or Cloud Infrastructure Engineer
- Proven track record managing production-grade systems on Google Cloud Platform (GCP) or other cloud providers
- Strong understanding of Linux/Unix system administration, networking, and troubleshooting.
- Experience implementing Infrastructure as Code (IaC) using tools like Terraform, Ansible, or Deployment Manager
- Familiarity with containerization and orchestration technologies such as Docker and Kubernetes (GKE)
- Experience with monitoring and observability tools (Google Cloud Operations Suite, Prometheus, Grafana, Datadog, ELK).
- Experience defining and monitoring SLAs, SLOs, and SLIs to ensure application uptime and performance.
- Proven ability to handle incident response, conduct postmortems, and drive root cause analysis
- Proficiency in at least one scripting language (Python, Bash, or Go) for automation and tooling
- Hands-on experience building or managing CI/CD pipelines (Jenkins, GitLab CI, Cloud Build)
- Strong background in configuration management and release automation
- Knowledge of IAM (Identity and Access Management), network security, and cloud compliance controls
- Familiarity with disaster recovery (DR), backups, and high-availability design
- High proficiency in written and verbal English communication.
Benefits
- Comprehensive and affordable medical, dental, vision, and life insurance options
- Competitive Provident Fund contributions
- Paid time off and holidays
- Mental health support and wellbeing program
- Company-provided equipment and a one-time $250 USD work-from-home stipend
- $750 USD annual professional development budget
- Company rewards and recognition program
- And more!


