Adaptive SPM for AI-Accelerated Innovation | Modular Solutions, Compounding Value | 30,000+ Customers
Manager, Site Reliability Engineer
Location
Canada
Posted
69 days ago
Salary
Not specified
Seniority
Senior
Job Description
Manager, Site Reliability Engineer
Tempo Software
- Lead, mentor, and grow a team of Site Reliability Engineers, focusing on career development, performance management, and hiring.
- Define the team's roadmap and strategy for platform reliability, scaling, and operational efficiency.
- Provide technical oversight and direction for the design and implementation of key infrastructure projects, including CI/CD pipelines and automation for build, release, and deployment processes.
- Partner closely with engineering teams and product managers to ensure the reliability and performance requirements of new products and features are met.
- Oversee the maintenance and continuous improvement of the AWS-based platform to ensure it scales effectively.
- Drive the adoption of AI tooling to enhance SRE productivity and introduce intelligent automation of SRE processes.
- Champion SRE best practices, including error budget management, effective on-call rotations, incident response, and post-mortem processes.
Job Requirements
- 6+ years of progressive experience in a SaaS environment, with 2+ years of experience managing or leading high-performing SRE or Infrastructure teams.
- Proven experience in defining strategy and overseeing the deployment of complex software solutions in a fast-paced, cloud environment.
- Working knowledge of AWS or other cloud service providers.
- Solid understanding of SRE and DevOps principles, software design patterns, and infrastructure operations.
- Passionate about containerization and orchestration technologies like Kubernetes.
- Familiarity with monitoring, alerting, and observability tools, including RUM (Real User Monitoring), tracing, and other vital metrics.
- Demonstrated ability to lead cross-functional projects, manage ambiguity, and drive technical decision-making.
- Exceptional communication, collaboration, and analytical skills, with a passion for solving tough technical and organizational problems.
Benefits
- Remote-first work environment
- Unlimited vacation in most of our locations!
- Great benefits, including health, dental, vision, and a savings plan.
- Perks such as training reimbursement, WFH reimbursement, and more.
- Diverse and dynamic teams with challenging and exciting work.
- An opportunity to have a real impact on our business.
- A great range of social activities (both in person and virtual).
- Optional in-person meet-ups and the ability to travel to our international offices
- Employee referral program
- And so much more!
More DevOps Engineer Jobs
Site Reliability Engineer
Canonical
Ubuntu is a community-developed, Linux-based operating system that is published and commercially supported by software development firm Canonical. Like Canonica
- We deploy and run OpenStack, Kubernetes, storage solutions, and open source applications, applying DevOps practices.
- To become a member of our team, you need to be a software engineer fluent in Python, you need a genuine interest in the full open source infrastructure stack from bare metal to containers, and you need the ability to work in operations with mission-critical services for global brand-name customers.
- As a member of the team, you will gain experience in a broad range of cloud technologies. We evolve our offerings as the state of the art improves, so you get to stay current with the latest capabilities in open source infrastructure.
Senior Site Reliability Engineer
Canonical
- Bring Python software-engineering skills and rigour to the operations domain
- Practise DevSecOps from bare metal to application
- Architect and run OpenStack, Kubernetes and software-defined storage
- Enable DevSecOps for applications running on that infrastructure
- Gain experience in a broad range of cloud technologies
Site Reliability Engineer, GitOps
Canonical
- Apply your experience of IaC to develop infrastructure as code practice within IS by constantly increasing automation and improving IaC processes
- Automate software operations for re-usability and consistency across private and public clouds, taking into consideration the complexities of distributed systems
- Develop new features and improve the resilience and scalability of the existing cloud and container portfolio at Canonical
- Maintain operational responsibility for all of Canonical’s core services, networks, and infrastructure
- Develop skills in troubleshooting, capacity planning, and performance investigation
- Set up, maintain and use observability tools such as Prometheus, Grafana, and Elasticsearch; design, implement and maintain monitoring and alerting for various systems and services
- Collaborate with development teams to design service architecture, documentation, playbooks, policies and operational procedures
- Provide assistance and work with globally distributed engineering, operations, and support peers
- Be given uninterrupted development time to focus on larger projects and automation of manual tasks
- Share your experience, know-how and best practices with other team members in design sessions, mentorship and ‘doing work together’
- Carry final responsibility for time-critical escalations
Senior Site Reliability Engineer, GitOps
Canonical
- Drive the development of automation and GitOps in your team as an embedded tech lead
- Closely collaborate with the IS architect to align your solutions with the IS architecture vision
- Design and architect services that IS can offer to the organization as products
- Apply your experience of IaC to develop infrastructure as code practice within IS by constantly increasing automation and improving IaC processes
- Automate software operations for re-usability and consistency across private and public clouds, taking into consideration the complexities of distributed systems
- Maintain operational responsibility for all of Canonical’s core services, networks, and infrastructure
- Develop skills in troubleshooting, capacity planning, and performance investigation
- Set up, maintain and use observability tools such as Prometheus, Grafana, and Elasticsearch; design, implement and maintain monitoring and alerting for various systems and services
- Provide assistance and work with globally distributed engineering, operations, and support peers
- Be given uninterrupted development time to focus on larger projects and automation of manual tasks
- Share your experience, know-how and best practices with other team members in design sessions, mentorship and ‘doing work together’
- Carry final responsibility for time-critical escalations

