Job Closed

This listing is no longer active.

Fable

Fable is a leading accessibility platform powered by people with disabilities.

Senior Site Reliability Engineer, SRE

DevOps Engineer, Full Time, Remote, Senior, Team 11-50, H1B No Sponsor

Location

Canada

Posted

49 days ago

Salary

$130K - $150K / year

Seniority

Senior

Job Description

  • Design, build, and maintain reliable, scalable, and secure infrastructure for Fable’s product services
  • Improve system observability, monitoring, and alerting to ensure high availability and fast incident response
  • Contribute to and evolve SRE practices, including SLIs/SLOs, incident management, and postmortems
  • Support and improve CI/CD pipelines and deployment processes
  • Identify and reduce operational complexity across systems and tooling
  • Work across infrastructure and application layers to diagnose and resolve reliability and performance issues, including making targeted improvements to application code when needed
  • Support infrastructure and platform capabilities required for AI/ML-powered features, including scaling, performance, and reliability considerations
  • Monitor and optimize infrastructure costs across cloud environments
  • Contribute to capacity planning and cost forecasting for infrastructure and services
  • Identify opportunities to improve performance and efficiency at the system level
  • Evaluate and optimize the cost and performance of compute-intensive workloads (e.g., AI/ML services), ensuring efficient resource usage and scalability
  • Work with third-party vendors and tools that support Fable’s infrastructure and operations
  • Help evaluate, select, and manage tools and services to support platform reliability and scalability
  • Support vendor-related troubleshooting and ongoing service improvements
  • Partner with Engineering teams to improve reliability, performance, and operational readiness of new features
  • Partner with application engineering teams to improve service architecture, performance, and observability, and help define best practices for building reliable, scalable systems
  • Act as a point of support and escalation for production issues
  • Collaborate across teams to manage dependencies and ensure smooth system operations
  • Contribute to building strong SRE and operational practices across the organization
  • Share knowledge through documentation, pairing, and technical discussions
  • Help onboard and support more junior team members as the team grows
  • Contribute to improving ways of working within the team and across Engineering

Job Requirements

  • 5–8+ years of experience in Site Reliability Engineering, DevOps, Infrastructure Engineering, or Platform Engineering
  • Strong experience with cloud infrastructure (AWS, GCP, or Azure)
  • Experience building internal platforms, tooling, or shared services that improve developer productivity and system reliability
  • Experience designing systems that bridge infrastructure and application layers
  • Ability to work across the stack: comfortable reading, debugging, and making changes to application code (e.g., backend services, APIs) when needed to improve reliability, performance, or observability
  • Experience with at least one backend programming language (e.g., Node.js, Python, Go, Java)
  • Strong experience with monitoring, observability, and alerting tools (e.g., Datadog, Prometheus, Grafana)
  • Solid understanding of CI/CD systems and modern deployment practices
  • Experience managing infrastructure as code (e.g., Terraform, CloudFormation)
  • Experience optimizing system performance and infrastructure costs
  • Familiarity with security and compliance considerations in cloud environments
  • Experience working with third-party vendors and infrastructure tools
  • Familiarity with infrastructure considerations for AI/ML workloads (e.g., high-compute services, data pipelines, or third-party AI platforms) is a strong asset
  • Curiosity about emerging technologies and their impact on infrastructure, reliability, and cost at scale
  • Strong problem-solving skills and ability to navigate complex systems
  • Excellent collaboration and communication skills

Benefits

  • Stock options
  • Career growth opportunities
  • Professional development support
  • Health and dental coverage
