HostPapa

Let Papa take care of you!

Site Reliability Engineer

DevOps Engineer · Full Time · Remote · Senior · Team 51-200 · Since 2006 · H1B No Sponsor

Location

Canada

Posted

6 days ago

Salary

Not disclosed

Seniority

Senior

Job Description

  • Define and implement SLIs, SLOs, and error budgets for critical CloudBlue services to ensure reliability and performance
  • Influence system architecture with a strong focus on reliability, scalability, and operability, designing systems for fault tolerance, graceful degradation, and self-healing
  • Reduce operational toil by identifying opportunities for automation and process improvement
  • Design and operate CloudBlue’s observability stack across metrics, logs, and traces using tools such as Datadog, Grafana, and Elastic Stack
  • Develop actionable alerting strategies and dashboards that provide clear insight into platform and business health
  • Design and maintain high-availability architectures, implementing redundancy, failover, and disaster recovery strategies across regions and availability zones
  • Conduct capacity planning, load testing, and performance optimization to ensure platform stability and scalability
  • Act as a senior responder during production incidents, leading incident coordination, communication, and service restoration
  • Own blameless postmortems and drive improvements that reduce incident frequency, MTTR, and customer impact
  • Improve the reliability of Kubernetes-based platforms through health checks, autoscaling strategies, rollout safety, and resilience testing
  • Partner with engineering and DevOps teams to improve deployment safety, rollback strategies, and platform reliability
  • Maintain runbooks and operational documentation, and promote SRE best practices across engineering teams
  • Support other tasks or projects as assigned to meet team and business needs

Job Requirements

  • 3+ years of experience as an SRE, DevOps Engineer, or Production Engineer, with strong ownership of production systems
  • Proven experience operating highly available, enterprise-grade, multi-tenant SaaS platforms
  • Hands-on experience with observability and monitoring tools such as Datadog, Grafana, and Elasticsearch/Kibana
  • Solid understanding of Linux, networking, and distributed systems fundamentals
  • Experience working with containerized environments such as Docker and Kubernetes
  • Strong scripting and automation skills using Python and/or Bash
  • Experience participating in on-call rotations and incident response in production environments
  • Strong written and spoken English
  • Experience defining SLIs/SLOs and managing error budgets at scale will be considered a plus
  • Cloud experience, preferably with Azure; experience with AWS and/or GCP will also be valued
  • Experience working with hybrid or on-premises integrations is beneficial
  • Familiarity with chaos engineering and resilience testing will be considered an asset

Benefits

  • A competitive salary that values you and your unique skill sets
  • Career advancement & professional development opportunities to help you reach your full potential
  • Flexible work arrangements to support work/life balance

More DevOps Engineer Jobs

  • Partner with customers to shape their cloud adoption journey, providing both technical and strategic guidance
  • Design, plan, and implement secure cloud architectures aligned with business and compliance requirements
  • Serve as a trusted advisor and deep technical resource to customers
  • Design and implement automated security and compliance solutions in AWS
  • Develop and maintain Infrastructure-as-Code (IaC) solutions using Terraform
  • Build and operate CI/CD pipelines (GitHub Actions, Jenkins, CircleCI) for security automation
  • Develop Python-based automation for provisioning, compliance enforcement, and remediation
  • Implement AWS Control Tower guardrails and Service Control Policies (SCPs)
  • Configure AWS Config rules with automated remediation workflows
  • Develop and enforce policy-as-code frameworks (preventative, detective, responsive controls)
  • Align implementations with industry standards such as CIS AWS Foundations
  • Design and deploy centralized security monitoring and analytics frameworks
  • Implement AWS-native security services, including Security Hub (centralized findings aggregation), GuardDuty (threat detection), Macie (sensitive data discovery), and Inspector (vulnerability management)
  • Enable observability and auditing via CloudTrail, VPC Flow Logs, and CloudWatch
  • Build self-service account provisioning frameworks using CI/CD pipelines
  • Develop scalable landing zone and account baseline architectures
  • Create reusable Terraform modules and automation frameworks
  • Design reference architectures and implementation playbooks
  • Create high-quality technical content (playbooks, runbooks, white papers, reference architecture)

United States

Senior DevOps Engineer – Cloud Infrastructure

Veta Virtual

Grow your business and free up your time!

DevOps Engineer · 6 days ago
Full Time · Remote · Team 11-50 · H1B No Sponsor

  • Build and manage AWS infrastructure using Infrastructure as Code (Terraform), ensuring scalability and maintainability.
  • Manage and scale Kubernetes (EKS) clusters for high availability and fault tolerance.
  • Provision, maintain, and upgrade AWS services including RDS, networking, compute, and storage components.
  • Design, implement, and optimize CI/CD pipelines to improve deployment speed and reliability.
  • Oversee and maintain GitLab infrastructure and engineering workflows.
  • Collaborate with security and legal teams to support compliance initiatives (SOC 2, GDPR, etc.).
  • Monitor infrastructure performance using tools like Grafana, CloudWatch, and other observability platforms.
  • Implement strong alerting, monitoring, and incident response processes.
  • Lead incident resolution and root cause analysis, ensuring long-term fixes are implemented.
  • Participate in architecture design, capacity planning, and disaster recovery strategies.
  • Create and maintain documentation, runbooks, and infrastructure standards.
  • Mentor junior engineers and contribute to a high-performing DevOps culture.

Argentina

Senior DevOps Engineer – Production Support

In All Media

Imagine the future of business. Ideas for a Digital Renaissance.

DevOps Engineer · 6 days ago
Contract · Remote · Team 1,001-5,000 · H1B No Sponsor

  • Monitor critical production systems, including Azure Kubernetes Service (AKS), microservices, and CI/CD pipelines, using advanced dashboards and proactive alerting
  • Act as the primary technical responder for live production incidents and Slack escalations, ensuring rapid triage, root-cause identification, and swift resolution
  • Maintain, refine, and improve internal runbooks and standard operating procedures (SOPs) to ensure operational predictability
  • Oversee and support deployment activities across both production and non-production environments while strictly adhering to SLAs and corporate response times
  • Collaborate closely with core DevOps and software engineering teams to root out recurring systemic issues and improve overall platform reliability
  • Help design and implement automation scripts for recurring operational tasks to reduce manual toil

Brazil
Full Time · Remote · Team 10,001+ · Since 1954 · H1B Sponsor

  • Deliver simple solutions to complex problems as a DevSecOps Software Engineer SME at GDIT.
  • Tailor cutting-edge solutions to clients' unique requirements.
  • Help ensure today is safe and tomorrow is smarter.
  • Provide business and technical architectural guidance to development teams.
  • Lead capture, proposal, and service delivery efforts to secure new or re-compete contracts.
  • Develop technical solutions for capture strategy and proposal responses.
  • Educate teams on adoption of DevSecOps practices and tooling.
  • Define, design, and implement the full lifecycle of products and services.
  • Conduct analysis of alternatives on a variety of solutions.

United States
$170.1K - $207K / year