Senior Reliability Engineer

Location

Canada

Posted

1 day ago

Salary

$100.3K - $150K / year

Seniority

Senior

Bachelor Degree · 5 yrs exp · Experience accepted · English · Kubernetes · Linux · Python · React · Go

Job Description

SPS Commerce

  • Engineer and maintain highly available, secure, and cost-effective container orchestration platforms such as Kubernetes and ECS
  • Engineer Continuous Integration & Continuous Delivery (CI/CD) solutions that simplify and improve software deployments to enable high velocity for our Product Engineering partners
  • Develop robust monitoring and observability services and patterns to consistently improve the team’s ability to identify, react to, respond to, and recover from complex failures
  • Collaborate with Technology Engineering, Development, and Product Management to help develop, scale, and improve production systems and services
  • Partner with service teams to provide appropriate documentation, cross-training, architecture planning, capacity management, and recommendations for future state
  • Engineer technical solutions to prevent or reduce the frequency of failures
  • Help drive code quality practices within the team and work hard to deliver maintainable software
  • Participate in screening, interview panels, and other hiring-related activities when required

Job Requirements

  • 5+ years IT experience with a Bachelor’s degree; or 3 years and a Master’s degree; or a PhD with 1 year; or equivalent work experience
  • Experience in Python and/or Golang with a software engineering mindset
  • Experience administering Linux
  • Experience participating in Agile development methodology and task execution
  • Experience with immutable and scalable infrastructure (infrastructure as code concepts)
  • Demonstrated understanding of networking systems and of various identity and authorization systems

Benefits

  • Comprehensive benefits package designed to support employees’ health, well-being, and financial security

More DevOps Engineer Jobs

Cloud Dev/Ops Engineer

General Dynamics

A business unit of General Dynamics, General Dynamics Information Technology (GDIT) supports some of the United States' most complex government, defense, and in

  • Deliver streamlined, effective solutions to complex technical challenges
  • Engineer, implement, and support cloud-based systems
  • Maintain cloud performance and reliability
  • Execute upgrades, monitor system performance, and resolve technical issues
  • Implement solutions across Azure & OCI environments
  • Lead operational efficiencies through automation and Infrastructure-as-Code methodologies
  • Manage firewalls, VPNs, load balancers, serverless apps, and microservices
  • Deploy and optimize Kubernetes workloads and Docker containers

United States
$102K - $138K / year

DevOps Automation Engineer

Actalent

Actalent provides scaled solutions and service capabilities that drive results and value to help customers achieve more. The company promotes consultant engagem

Title: DevOps Automation Engineer
Location: Boston, United States

Job Description

Job Title: DevOps Automation Engineer (Platform Operations)

This DevOps Automation Engineer role sits within a platform operations team and focuses on supporting and scaling AWS-based production environments in a global, regulated MedTech context. The position emphasizes site reliability engineering and platform operations over CI/CD-heavy development, with a strong focus on automation, incident response, and system reliability. The engineer will act as a plug-and-play resource, quickly ramping up on the existing stack to ensure high availability, robust security, and consistent performance across cloud infrastructure.

Responsibilities

  • Support and operate production AWS environments to ensure high availability, reliability, and performance across global operations.
  • Troubleshoot production issues across infrastructure, services, and cloud-native components, driving timely resolution and root cause analysis.
  • Build, test, and maintain automation solutions using Ansible to streamline application and infrastructure workflows.
  • Leverage, run, and troubleshoot existing Infrastructure-as-Code implementations using Terraform, including reviewing and debugging code as needed.
  • Work with existing CI/CD pipelines to support deployment workflows and ensure reliable delivery of changes to production environments.
  • Monitor system performance and health using tools such as Datadog and Amazon CloudWatch, responding proactively to alerts and performance anomalies.
  • Participate in incident response, take ownership of production issues end-to-end, and coordinate resolution efforts with relevant stakeholders.
  • Implement and support AWS security best practices, including IAM configuration, encryption, and key management, to maintain a secure production environment.
  • Collaborate with global teams to support ongoing cloud initiatives and migrations, ensuring alignment with platform standards and operational best practices.
  • Provide operational input and guidance on platform design, reliability, and scalability based on deep AWS and Linux expertise.
  • Act as an SRE-style operator, owning production environments and contributing to continuous improvement of reliability and automation.

Essential Skills

  • 5-10+ years of experience in DevOps, Site Reliability Engineering (SRE), or Platform Operations roles.
  • Strong, hands-on experience with AWS in production support environments, with deep expertise rather than surface-level exposure.
  • Proven experience supporting and troubleshooting core AWS services such as EC2, ECS, RDS, IAM, and networking components.
  • Demonstrated experience using Ansible for automation, including building, testing, and maintaining Ansible playbooks.
  • Experience working with Terraform, including running, debugging, and reviewing Infrastructure-as-Code configurations.
  • Background in production support and SRE, including incident response, system ownership, and maintaining uptime for critical systems.
  • Strong Linux systems experience, including administration, troubleshooting, and performance tuning in production environments.
  • Solid understanding of AWS security best practices, including IAM, encryption, and key management.
  • Strong troubleshooting skills across automation, AWS services, service reliability, and networking technologies.
  • Ability to operate as a plug-and-play resource, quickly ramping up on the existing technical stack and taking ownership of production responsibilities.
  • Comfort working in time zones aligned with Eastern or Central hours to support production coverage.

Additional Skills & Qualifications

  • Experience with Datadog or other monitoring tools such as Amazon CloudWatch, Splunk, or AppDynamics.
  • Exposure to containerization technologies, including Docker and AWS ECS or EKS/Kubernetes.
  • Hands-on AWS networking experience, including VPCs, routing, VPNs, firewalls, DNS, and transit gateways.
  • Experience working in regulated environments such as MedTech, healthcare, or similar industries.
  • AWS certifications at the Associate or Professional level.
  • Software development background, for example with C#, .NET, or scripting languages.
  • Experience supporting monitoring and observability migrations, such as moving from CloudWatch to Datadog.
  • Strong communication and collaboration skills to work effectively with global teams and cross-functional stakeholders.

Work Environment

This role operates in a remote work environment, aligned primarily to Eastern or Central time zones to support production coverage. You will work within a global, regulated MedTech setting, focusing on AWS-based production environments and modern cloud-native infrastructure. The technology stack centers on AWS services, Linux systems, Ansible for automation, Terraform for Infrastructure-as-Code, and monitoring tools such as Datadog and CloudWatch. The culture emphasizes stability, reliability, and long-term growth, with a focus on strong technology practices and a stable operational environment. You will collaborate with distributed teams, participate in incident response, and contribute to ongoing cloud initiatives and migrations while working from a remote setup.

Job Type & Location

This is a Contract position based out of Boston, MA.

Pay and Benefits

The pay range for this position is $60.00 - $70.00/hr. Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to specific elections, plan, or program terms. If eligible, the benefits available for this temporary role may include the following:

  • Medical, dental & vision
  • Critical Illness, Accident, and Hospital
  • 401(k) Retirement Plan - Pre-tax and Roth post-tax contributions available
  • Life Insurance (Voluntary Life & AD&D for the employee and dependents)
  • Short and long-term disability
  • Health Spending Account (HSA)
  • Transportation benefits
  • Employee Assistance Program
  • Time Off/Leave (PTO, Vacation or Sick Leave)

Workplace Type

This is a fully remote position.

Application Deadline

This position is anticipated to close on Jun 26, 2026.

About Actalent

Actalent is a global leader in engineering and sciences services and talent solutions. We help visionary companies advance their engineering and science initiatives through access to specialized experts who drive scale, innovation and speed to market. With a network of almost 20,000 consultants and 5,000 clients across the U.S., Canada, Asia and Europe, Actalent serves many of the Fortune 500. We are proud to be an Engineering News-Record (ENR) Top 500 Design Firm for our engineering design services and a ClearlyRated Best of Staffing winner for both client and talent service.

The company is an equal opportunity employer and will consider all applications without regard to race, sex, age, color, religion, national origin, veteran status, disability, sexual orientation, gender identity, genetic information or any characteristic protected by law.

If you would like to request a reasonable accommodation, such as the modification or adjustment of the job application process or interviewing process due to a disability, please email actalentaccommodation@actalentservices.com for other accommodation options.

San Francisco Fair Chance Ordinance: Pursuant to the San Francisco Fair Chance Ordinance, for all positions located in the city and county of San Francisco, we will consider for employment qualified applicants with arrest and conviction records.

Massachusetts Lie Detector: It is unlawful in Massachusetts to require or administer a lie detector test as a condition of employment or continued employment. An employer who violates this law shall be subject to criminal penalties and civil liability.

Use of Artificial Intelligence (AI): We may use Artificial Intelligence (AI) to support parts of our hiring process, including sourcing, screening, and evaluating candidates. AI helps assess applications and qualifications, but final decisions are made by our hiring team. By applying, you acknowledge and agree that your application may be reviewed using AI tools.

Massachusetts
$60 - $70 / hour

Senior DevOps Analyst

RHI Magnesita

Welcome to RHI Magnesita: The driving force of the refractory industry 🔥🌎

Full Time · Remote · Team 10,001+ · H1B No Sponsor

  • Receive and synthesize data about customer business and technical requirements and address them with technical architecture(s)
  • Understand the customer's technical environment and map architecture and digital transformation solutions to customer business outcomes
  • Provide guidance and support with technical expertise to teams in building cloud infrastructure and developing features for Cloud Solutions
  • Keep up with best practices, software development trends, and innovations

Brazil
Full Time · Remote · Team 11-50 · Since 2012 · H1B No Sponsor

  • Design the overall system architecture aligned with the prescribed technology stack
  • Develop backend services for message ingestion, processing, validation, and storage
  • Develop frontend components for user interaction, including dashboards and detailed views
  • Implement message parsing and validation logic for MTF messages
  • Ensure extensibility of configurable elements such as message categories
  • Integrate the application with Microsoft Exchange using EWS to monitor the registry mailbox
  • Implement authentication and authorization via Keycloak and Active Directory
  • Design and implement the database schema
  • Implement tracking of message delivery, read, and acknowledgement status
  • Implement validation of MTF messages against both generic and type-specific structures
  • Configure and maintain Azure DevOps pipelines for build and release
  • Produce deployable RPM packages
  • Ensure the application can be deployed on Oracle Linux environments
  • Collaborate with stakeholders to refine requirements and priorities

Romania