Tecsys Inc.

Equipping supply chain greatness.

Site Reliability Engineer

DevOps Engineer | Full Time | Remote | Senior | Team 501-1,000 | Since 1983 | H1B No Sponsor

Location

Canada

Posted

69 days ago

Salary

0

Seniority

Senior

Job Description

  • Collaborate with other Engineering teams to support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
  • Innovate relentlessly: Identify pain points, propose creative solutions, and drive initiatives that simplify, scale, and strengthen the platform.
  • Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
  • Own observability: Enhance and expand monitoring and alerting using Datadog; define SLOs/SLIs and create actionable dashboards that drive reliability outcomes.
  • Drive automation: Develop and improve internal tooling, IaC frameworks, and pipelines (Terraform, GitLab CI/CD) to reduce manual intervention and enable self-healing systems.
  • Scale systems sustainably through automation and evolve systems by pushing for changes that improve reliability and velocity.
  • Act as an agent orchestrator using Amazon Kiro: run multiple activities in parallel by leveraging AI agents to accelerate execution, while personally validating results and completing selected tasks manually when needed.
  • Be on-call.
  • Practice sustainable incident response and blameless postmortems. Lead post-incident reviews (RCAs) and identify long-term fixes that improve stability, reliability, and developer experience.
  • Implement monitoring, logging, alerting, and SLA reporting.
  • Create and maintain technical documentation.
  • Implement, maintain and mature SRE best practices.
  • Lead incidents: Act as Incident Commander for incidents; coordinate cross-team response, manage communications, and ensure rapid service restoration.
  • Provide support for our planning and deployment teams to enable stability, predictability, and scale in our continued growth.
  • Collaborate with members of the Platform Engineering team to implement and support far-reaching strategic efforts, provide constructive feedback, and foster a collaborative environment.
  • Work cross-functionally with internal teams and vendors to manage our growth around the globe, with a strong focus on maintaining the high level of performance, availability, and reliability for our users.

Job Requirements

  • 5+ years in Site Reliability, Cloud, or DevOps Engineering, ideally in SaaS or large-scale production environments.
  • Experience designing and deploying large scale systems, multi-vendor platforms and globally distributed infrastructure.
  • Proven experience managing cloud infrastructure in AWS (multi-account, VPC, EC2, EKS) and Kubernetes at scale.
  • Strong hands-on experience with IaC and automation (Terraform, Ansible, or similar).
  • Familiarity with CI/CD pipelines and release automation (GitLab preferred, Jenkins acceptable).
  • Deep understanding of monitoring and observability using Datadog (or equivalent), including metric design, log pipelines, alerting, and dashboards.
  • Experience with incident management, on-call participation, escalation, and structured postmortems.
  • Scripting skills in Python, Bash, Java or equivalent for automation and diagnostics.
  • Curiosity, ownership, and a bias for action; you see a problem, you solve it, and you share the lessons learned.
  • Experience with FedRAMP (the Federal Risk and Authorization Management Program) compliance is a strong asset.
  • Basic knowledge of Java- or .NET-based development required.
  • Strong English communication skills, both written and spoken, are essential for effective correspondence with customers, business partners and colleagues beyond the province of Quebec.
  • Escalation on-call rotation
  • Occasional travel (quarterly offsites, conferences – less than 10%)

More DevOps Engineer Jobs

Principal Site Reliability Engineer - Observability

Elastic

Self-described as the leading platform for search-powered solutions, Elastic helps organizations, their customers, and their employees find what they need faster while protecting a

DevOps Engineer | 69 days ago

Elastic, the Search AI Company, enables everyone to find the answers they need in real time, using all their data, at scale, unleashing the potential of businesses and people. The Elastic Search AI Platform, used by more than 50% of the Fortune 500, brings together the precision of search and the intelligence of AI to enable everyone to accelerate the results that matter. By taking advantage of all structured and unstructured data (securing and protecting private information more effectively), Elastic’s complete, cloud-based solutions for search, security, and observability help organizations deliver on the promise of AI.

What Is The Role:

We're looking for a Principal Site Reliability Engineer to join the Observability Solution as part of the team building the next generation of Infrastructure Observability experiences leveraging the new Search AI and agentic capabilities.

What You Will Be Doing:

  • Collaborate with product management, product design, customers and multiple teams across Elastic (especially our own SRE teams) in defining and evolving the end-to-end InfraObs experiences that enable both human and agentic users.
  • Deliver and continually evolve the experiences leveraging the Elastic Platform capabilities and coding agents.
  • Be a contact point for other teams within Elastic. Examples include helping Support with difficult cases or aligning with the teams providing the foundations for developing integrations or consulting the Elastic Stack engineers with designing new features.
  • Foster a culture of mutual respect, collaboration and consensus based decision-making.
  • Be an awesome person to work with, somebody who sincerely empathizes with others.

What You Bring:

This is a role for practitioners so we are looking for engineers with a SRE background and experience operating large-scale production services with the help of Observability tools.

  • Proficiency operating production infrastructure in K8s and at least one of the three major CSPs.
  • Proficiency using Observability tools.
  • Working with a high level of autonomy, able to tackle projects and guide them from beginning to end. This covers both technical design and working with other engineers to develop needed components.
  • Ability to use AI coding agents in the delivery workflow.
  • Excellent verbal and written communication skills. Collaborating on the internet is hard. We try to be supportive, empathetic, and trusting in all of our interactions. And we expect that from everyone too.

Bonus Points:

  • Experience as a user of the Elastic Stack.

Additional Information - We Take Care of Our People

As a distributed company, diversity drives our identity. Whether you’re looking to launch a new career or grow an existing one, Elastic is the type of company where you can balance great work with great life. Your age is only a number. It doesn’t matter if you’re just out of college or your children are; we need you for what you can do. We strive to have parity of benefits across regions and while regulations differ from place to place, we believe taking care of our people is the right thing to do.

  • Competitive pay based on the work you do here and not your previous salary
  • Health coverage for you and your family in many locations
  • Ability to craft your calendar with flexible locations and schedules for many roles
  • Generous number of vacation days each year
  • Increase your impact - We match up to $2000 (or local currency equivalent) for financial donations and service
  • Up to 40 hours each year to use toward volunteer projects you love
  • Embracing parenthood with minimum of 16 weeks of parental leave

Different people approach problems differently. We need that. Elastic is an equal opportunity employer and is committed to creating an inclusive culture that celebrates different perspectives, experiences, and backgrounds. Qualified applicants will receive consideration for employment without regard to race, ethnicity, color, religion, sex, pregnancy, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, disability status, or any other basis protected by federal, state or local law, ordinance or regulation.

We welcome individuals with disabilities and strive to create an accessible and inclusive experience for all individuals. To request an accommodation during the application or the recruiting process, please email candidate_accessibility@elastic.co. We will reply to your request within 24 business hours of submission.

Applicants have rights under Federal Employment Laws, view posters linked below: Family and Medical Leave Act (FMLA) Poster; Pay Transparency Nondiscrimination Provision Poster; Employee Polygraph Protection Act (EPPA) Poster and Know Your Rights (Poster).

Elasticsearch develops and distributes technology and information that is subject to U.S. and other countries’ export controls and licensing requirements for individuals who are located in or are nationals of the following sanctioned countries and regions: Belarus, Cuba, Iran, North Korea, Syria, or Russia, including the Ukrainian territories annexed by Russia (The Crimea region of Ukraine, The Donetsk People's Republic (DNR), The Luhansk People's Republic (LNR), Kherson or Zaporizhzhia). If you are located in or are a national of one of the listed countries or regions, an export license may be required as a condition of your employment in this role. Please note that national origin and/or nationality do not affect eligibility for employment with Elastic.

Please see here for our Privacy Statement.

Spain

Principal Site Reliability Engineer - Observability

Referral Board

Remote's Total Rewards philosophy is to ensure fair, unbiased compensation and fair equity pay along with competitive benefits in all locations in which we operate. We do not agree to or encourage cheap-labor practices and therefore we ensure to pay above in-location rates. At Remote, we foster internal mobility as a key element of our culture of employee growth and development, supported by a compensation philosophy that guarantees pay equity and fairness.

DevOps Engineer | 69 days ago
Full Time | Remote | Team 1,001-5,000

Elastic, the Search AI Company, enables everyone to find the answers they need in real time, using all their data, at scale, unleashing the potential of businesses and people. The Elastic Search AI Platform, used by more than 50% of the Fortune 500, brings together the precision of search and the intelligence of AI to enable everyone to accelerate the results that matter. By taking advantage of all structured and unstructured data (securing and protecting private information more effectively), Elastic’s complete, cloud-based solutions for search, security, and observability help organizations deliver on the promise of AI.

What Is The Role:

We're looking for a Principal Site Reliability Engineer to join the Observability Solution as part of the team building the next generation of Infrastructure Observability experiences leveraging the new Search AI and agentic capabilities.

What You Will Be Doing:

  • Collaborate with product management, product design, customers and multiple teams across Elastic (especially our own SRE teams) in defining and evolving the end-to-end InfraObs experiences that enable both human and agentic users.
  • Deliver and continually evolve the experiences leveraging the Elastic Platform capabilities and coding agents.
  • Be a contact point for other teams within Elastic. Examples include helping Support with difficult cases or aligning with the teams providing the foundations for developing integrations or consulting the Elastic Stack engineers with designing new features.
  • Foster a culture of mutual respect, collaboration and consensus based decision-making.
  • Be an awesome person to work with, somebody who sincerely empathizes with others.

What You Bring:

This is a role for practitioners so we are looking for engineers with a SRE background and experience operating large-scale production services with the help of Observability tools.

  • Proficiency operating production infrastructure in K8s and at least one of the three major CSPs.
  • Proficiency using Observability tools.
  • Working with a high level of autonomy, able to tackle projects and guide them from beginning to end. This covers both technical design and working with other engineers to develop needed components.
  • Ability to use AI coding agents in the delivery workflow.
  • Excellent verbal and written communication skills. Collaborating on the internet is hard. We try to be supportive, empathetic, and trusting in all of our interactions. And we expect that from everyone too.

Bonus Points:

  • Experience as a user of the Elastic Stack.

Additional Information - We Take Care of Our People

As a distributed company, diversity drives our identity. Whether you’re looking to launch a new career or grow an existing one, Elastic is the type of company where you can balance great work with great life. Your age is only a number. It doesn’t matter if you’re just out of college or your children are; we need you for what you can do. We strive to have parity of benefits across regions and while regulations differ from place to place, we believe taking care of our people is the right thing to do.

  • Competitive pay based on the work you do here and not your previous salary
  • Health coverage for you and your family in many locations
  • Ability to craft your calendar with flexible locations and schedules for many roles
  • Generous number of vacation days each year
  • Increase your impact - We match up to $2000 (or local currency equivalent) for financial donations and service
  • Up to 40 hours each year to use toward volunteer projects you love
  • Embracing parenthood with minimum of 16 weeks of parental leave

Different people approach problems differently. We need that. Elastic is an equal opportunity/affirmative action employer committed to diversity, equity, and inclusion. Qualified applicants will receive consideration for employment without regard to race, ethnicity, color, religion, sex, pregnancy, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, disability status, or any other basis protected by federal, state or local law, ordinance or regulation.

We welcome individuals with disabilities and strive to create an accessible and inclusive experience for all individuals. To request an accommodation during the application or the recruiting process, please email candidate_accessibility@elastic.co. We will reply to your request within 24 business hours of submission.

Applicants have rights under Federal Employment Laws, view posters linked below: Family and Medical Leave Act (FMLA) Poster; Pay Transparency Nondiscrimination Provision Poster; Employee Polygraph Protection Act (EPPA) Poster and Know Your Rights (Poster).

Elasticsearch develops and distributes encryption software and technology that is subject to U.S. export controls and licensing requirements for individuals who are located in or are nationals of the following sanctioned countries and regions: Belarus, Cuba, Iran, North Korea, Russia, Syria, the Crimea Region of Ukraine, the Donetsk People’s Republic (“DNR”), and the Luhansk People’s Republic (“LNR”). If you are located in or are a national of one of the listed countries or regions, an export license may be required as a condition of your employment in this role. Please note that national origin and/or nationality do not affect eligibility for employment with Elastic.

Please see here for our Privacy Statement.

All locations: Greece | Ireland | Norway | Oman | Poland | Portugal | Romania | Spain | Sweden | United Kingdom

Senior DevOps Engineer

CampMinder

Summer Camp Management Software

DevOps Engineer | 69 days ago
Full Time | Remote | Team 51-200 | Since 2001 | H1B No Sponsor

Ideal start timeline: ASAP

Role status: Exempt

Compensation: Our target hiring range is $175,000-$215,000 plus participation in our Annual Bonus Program with eligibility for $12,000 bonus. Actual compensation will be commensurate with experience and skills.

Campminder’s Flexible Working Location: Our employees have the option to work 100% remotely within the United States or their choice of days at home and at our office in Boulder, Colorado. We host a variety of all-company hybrid meetings and social events. We require anybody working remotely to have a very reliable, high-speed internet connection.

We know the best people can choose to work anywhere. Here are a few reasons why 85+ of them choose Campminder:

  • With 20+ years of experience serving the industry through its digital transformation, we’re stable, profitable, and have developed a loyal customer base (that continues to grow).
  • We build software for summer camps, an industry that enables meaningful experiences for kids.
  • We work on interesting, ambitious projects that create real value for our clients.
  • We know our team members feel their work has an impact on the organization’s purpose.
  • At the same time, we are genuinely committed to work/life balance. Our team members feel they have the flexibility to take time off when needed and feel supported in making use of flexible working arrangements.
  • We invest in emerging technology and cutting-edge leadership and are proud to take an "AI-Enabled" approach in our solutions.
  • We’ve been listed on Outside Magazine’s 50 Best Places to Work for 8 consecutive years for our values-led culture and employee experience.

This role’s mission & overview:

We are looking for a Senior DevOps Engineer to join our Platform team to work across the technology organization to transform our development, testing, and deployment processes. This person will partner with AI engineering experts to incorporate AI into our engineering processes at Campminder, in a thoughtful way that increases velocity, quality, and people empowerment across the organization. They will also enable a culture of fast experimentation and iteration through fast feedback loops with customers, while keeping long-term vision and architecture in mind. They will provide mentorship and strong collaboration on DevOps best practices across the entire technology organization. This is a high-impact, high-autonomy role for someone who thrives on solving hard technical problems that unlock a whole new way of working at Campminder, while being in close collaboration with people across the technology organization.

As a Senior DevOps Engineer on our Platform team, you will:

  • Transform our DevOps platform and bring it into alignment with modern tooling and practices
  • Transform the way we monitor and operate software in production to fully incorporate modern automation and AI tools
  • Incorporate AI tooling throughout our entire architecture in order to empower engineers to develop, test, and operate their software better
  • Enable engineering teams to rearchitect monolithic .NET Framework and legacy JavaScript Framework systems into a modular, platform-focused architecture
  • Mentor engineers across Campminder in DevOps skills and practices
  • Collaborate across the full engineering org to define and implement DevOps and AI best practices

We think a successful candidate will bring:

  • Experience supporting AI tooling in development, testing, and deployment workflows
  • Experience completely automating testing and deployment pipelines and implementing CI/CD for a range of systems and architectures, from monolithic to greenfield
  • Experience significantly improving the local development experience for engineers working in a monolithic system
  • Experience transforming tooling and approaches to monitoring, alerting, and operating software in production to bring them in alignment with best practices
  • Extensive experience with infrastructure as code (such as Terraform), Kubernetes, and container orchestration
  • Experience with networking and security best practices
  • A passion for mentorship, teaching, and supporting engineers in their DevOps growth journey
  • A passion for empowering delivery teams to move quickly and unblock themselves

Our Interview Process:

  • 45 min - interview with People & Culture
  • 60 min - interview with Hiring Manager
  • Phase 3 - 60 min - technical interview with Engineering team members
  • 60 min - cultural interview with Engineering team members
  • 30 min - interview with CTO

A few of the benefits we are proud to offer:

  • Robust medical, dental, and vision coverage options with generous employer contributions, plus a $500 employer HSA contribution for HSA-compatible plans
  • Ability to choose where you work - remotely, in the office, or a mix!
  • A variety of resources to support mental health and emotional well-being
  • 12 weeks of 100% paid parental leave for all new parents, including via adoption, surrogacy, and foster care
  • 401(k) with 4% company matching
  • Trust-Based (flexible) PTO (and yes, we use it!)
  • $900/year wellness allowance
  • Company-paid subscriptions, training, and support for using AI professionally and personally. We have a team dedicated to enabling our AI capabilities for our team members and our customers!

We encourage people of all backgrounds to apply:

We're actively taking steps to make sure our culture is inclusive and that our processes and practices promote equity for all, including people of color, people from working-class backgrounds, women, and members of the LGBTQ+ community. We welcome and encourage applications from people with these identities or members of other historically marginalized groups. Research shows that women and people of color tend not to apply to jobs unless they believe they are 100% qualified and apply to fewer senior-level positions. With that in mind, we encourage you to apply if you're not sure whether you meet our qualifications. We'd love to have the opportunity to consider you!

We encourage applications from parents, parents-to-be, and those responsible for the caretaking of others. We offer paid parental leave for birthing and non-birthing parents (including for adoption, surrogacy, and foster care placement) and paid loss leave to recover from miscarriage or stillbirth. The company's HSA and wellness allowance contributions may be used toward childcare, eldercare, adoption fees, and fertility treatments like IVF, among other expenses.

Colorado
$175K - $215K / year

Senior Engineer, Infrastructure – DevOps

Pryon

AI-powered enterprise knowledge management

DevOps Engineer | 69 days ago
Full Time | Remote | Team 51-200 | H1B No Sponsor

  • Design and implement cloud-native architectures for AI/ML applications using Kubernetes (GKE, EKS, AKS)
  • Architect and maintain CI/CD pipelines using modern GitOps practices with tools like FluxCD and BitBucket
  • Design and implement observability solutions using Prometheus, Grafana, and other monitoring tools
  • Create and maintain Infrastructure as Code (IaC) using Terraform
  • Implement container orchestration strategies using Docker, Kubernetes, and Helm
  • Design and implement multi-cloud deployment strategies
  • Establish SLOs/SLIs and implement SRE best practices
  • Automate operational tasks and create self-healing systems
  • Mentor team members on DevOps best practices
  • Collaborate with ML engineers and researchers to optimize model deployment and serving infrastructure
  • Stay current with emerging technologies and best practices in the DevOps/MLOps space

New York
$180K - $200K / year
Job Closed