Job Closed

This listing is no longer active.

HostPapa

Let Papa take care of you!

Site Reliability Engineer

DevOps Engineer · Full Time · Remote · Senior · Team 51-200 · Since 2006 · H1B No Sponsor

Location

Canada

Posted

65 days ago

Seniority

Senior

Bachelor Degree · 3 yrs exp · English

AWS Azure Cloud Distributed Systems Docker Elasticsearch Google Cloud Platform Grafana Kubernetes Linux Python

Job Description

• Define and implement SLIs, SLOs, and error budgets for critical CloudBlue services to ensure reliability and performance
• Influence system architecture with a strong focus on reliability, scalability, and operability, designing systems for fault tolerance, graceful degradation, and self-healing
• Reduce operational toil by identifying opportunities for automation and process improvement
• Design and operate CloudBlue’s observability stack across metrics, logs, and traces using tools such as Datadog, Grafana, and Elastic Stack
• Develop actionable alerting strategies and dashboards that provide clear insight into platform and business health
• Design and maintain high-availability architectures, implementing redundancy, failover, and disaster recovery strategies across regions and availability zones
• Conduct capacity planning, load testing, and performance optimization to ensure platform stability and scalability
• Act as a senior responder during production incidents, leading incident coordination, communication, and service restoration
• Own blameless postmortems and drive improvements that reduce incident frequency, MTTR, and customer impact
• Improve reliability of Kubernetes-based platforms through health checks, autoscaling strategies, rollout safety, and resilience testing
• Partner with engineering and DevOps teams to improve deployment safety, rollback strategies, and platform reliability
• Maintain runbooks and operational documentation, and promote SRE best practices across engineering teams
• Support other tasks or projects as assigned to meet team and business needs

Job Requirements

3+ years of experience as an SRE, DevOps Engineer, or Production Engineer, with strong ownership of production systems
Proven experience operating highly available, enterprise-grade, multi-tenant SaaS platforms
Hands-on experience with observability and monitoring tools such as Datadog, Grafana, and Elasticsearch/Kibana
Solid understanding of Linux, networking, and distributed systems fundamentals
Experience working with containerized environments such as Docker and Kubernetes
Strong scripting and automation skills using Python and/or Bash
Experience participating in on-call rotations and incident response in production environments
Strong written and spoken English
Experience defining SLIs/SLOs and managing error budgets at scale will be considered a plus
Cloud experience, preferably with Azure; experience with AWS and/or GCP will also be valued
Experience working with hybrid or on-premises integrations is beneficial
Familiarity with chaos engineering and resilience testing will be considered an asset

Benefits

A competitive salary that values you and your unique skill sets
Career advancement & professional development opportunities to help you reach your full potential
Flexible work arrangements to support work/life balance

More DevOps Engineer Jobs

Cloud Security Consultant, DevSecOps – AWS

Vertical Relevance

Your trusted Financial Services industry partner in Business Transformation, Customer Experience, & AWS Cloud Services.

DevOps Engineer · 65 days ago

Full Time · Remote · Team 51-200 · H1B No Sponsor

• Partner with customers to shape their cloud adoption journey, providing both technical and strategic guidance
• Design, plan, and implement secure cloud architectures aligned with business and compliance requirements
• Serve as a trusted advisor and deep technical resource to customers
• Design and implement automated security and compliance solutions in AWS
• Develop and maintain Infrastructure-as-Code (IaC) solutions using Terraform
• Build and operate CI/CD pipelines (GitHub Actions, Jenkins, CircleCI) for security automation
• Develop Python-based automation for provisioning, compliance enforcement, and remediation
• Implement AWS Control Tower guardrails and Service Control Policies (SCPs)
• Configure AWS Config rules with automated remediation workflows
• Develop and enforce policy-as-code frameworks (preventative, detective, responsive controls)
• Align implementations with industry standards such as CIS AWS Foundations
• Design and deploy centralized security monitoring and analytics frameworks
• Implement AWS-native security services, including Security Hub (centralized findings aggregation), GuardDuty (threat detection), Macie (sensitive data discovery), and Inspector (vulnerability management)
• Enable observability and auditing via CloudTrail, VPC Flow Logs, and CloudWatch
• Build self-service account provisioning frameworks using CI/CD pipelines
• Develop scalable landing zone and account baseline architectures
• Create reusable Terraform modules and automation frameworks
• Design reference architectures and implementation playbooks
• Create high-quality technical content (playbooks, runbooks, white papers, reference architecture)

AWS Cloud Jenkins Python Terraform

United States

Job Closed

Senior DevOps Engineer – Cloud Infrastructure

Veta Virtual

Grow your business and free up your time!

DevOps Engineer · 65 days ago

Full Time · Remote · Team 11-50 · H1B No Sponsor

• Build and manage AWS infrastructure using Infrastructure as Code (Terraform), ensuring scalability and maintainability.
• Manage and scale Kubernetes (EKS) clusters for high availability and fault tolerance.
• Provision, maintain, and upgrade AWS services including RDS, networking, compute, and storage components.
• Design, implement, and optimize CI/CD pipelines to improve deployment speed and reliability.
• Oversee and maintain GitLab infrastructure and engineering workflows.
• Collaborate with security and legal teams to support compliance initiatives (SOC 2, GDPR, etc.).
• Monitor infrastructure performance using tools like Grafana, CloudWatch, and other observability platforms.
• Implement strong alerting, monitoring, and incident response processes.
• Lead incident resolution and root cause analysis, ensuring long-term fixes are implemented.
• Participate in architecture design, capacity planning, and disaster recovery strategies.
• Create and maintain documentation, runbooks, and infrastructure standards.
• Mentor junior engineers and contribute to a high-performing DevOps culture.

AWS Cloud Grafana Kubernetes Python Terraform

Argentina

Job Closed

Senior DevOps Engineer – Production Support

In All Media

Imagine the future of business. Ideas for a Digital Renaissance.

DevOps Engineer · 65 days ago

Contract · Remote · Team 1,001-5,000 · H1B No Sponsor

• Monitor critical production systems, including Azure Kubernetes Service (AKS), microservices, and CI/CD pipelines, using advanced dashboards and proactive alerting
• Act as the primary technical responder for live production incidents and Slack escalations, ensuring rapid triage, root-cause identification, and swift resolution
• Maintain, refine, and improve internal runbooks and standard operating procedures (SOPs) to ensure operational predictability
• Oversee and support deployment activities across both production and non-production environments while strictly adhering to SLAs and corporate response times
• Collaborate deeply with core DevOps and software engineering teams to root out recurring systemic issues and elevate overall platform reliability
• Help design and implement smart automation scripts for recurring operational tasks to reduce manual toil

Azure Cloud Grafana Kubernetes Microservices Prometheus Python

Brazil

DevSecOps Software Engineer

General Dynamics Information Technology

Art of the possible.

DevOps Engineer · 65 days ago

Full Time · Remote · Team 10,001+ · Since 1954 · H1B Sponsor

• Deliver simple solutions to complex problems as a DevSecOps Software Engineer SME at GDIT.
• Tailor cutting-edge solutions to the unique requirements of clients.
• Ensure today is safe and tomorrow is smarter by joining a dedicated DevSecOps team.
• Provide business and technical architectural guidance to development teams, business groups, and customers.
• Develop marketing strategies, business concepts, and technical capabilities that maximize customer value.
• Define, design, and implement the full lifecycle of products and services.
• Conduct analysis of alternatives on solutions to determine the best solutions supporting overall business goals.

Ansible AWS Azure Cloud Docker Gradle Java Jenkins Kubernetes Linux Node.js Packer Python Shell Scripting Terraform Unix Vagrant Vault

United States

$170.1K - $207K / year
