Job Closed

This listing is no longer active.

EverOps

The Embedded Service Provider

Lead Site Reliability Engineer

DevOps Engineer | Other | Remote | Team 51-200 | H1B No Sponsor

Location

United States

Posted

102 days ago

Salary

No structured requirement data.

Job Description

This description is a summary of our understanding of the job description.

Role Description

This role involves owning and executing a comprehensive IT support automation strategy designed to significantly reduce ticket volume and human intervention:

- Eliminating tickets before they are created
- Automating resolution paths when tickets do occur
- Building durable automation frameworks across SaaS and internal platforms
- Removing systemic friction across the IT lifecycle

You will operate heavily within the IT support domain, addressing areas such as:

- Account lockouts and access management
- Provisioning and deprovisioning workflows
- Device and asset lifecycle management
- Standard internal IT requests
- SaaS integrations and workflow orchestration

The expectation is leadership-level ownership. You will define the automation roadmap, architect solutions, and drive initiatives from intake through deployment with measurable outcomes.

Job Requirements

8+ years in SRE, Platform Engineering, DevOps, or Automation Engineering
Proven experience designing enterprise-scale automation systems
Strong exposure to IT support domains (access, provisioning, identity, device lifecycle, SaaS operations)
Deep experience designing and consuming REST APIs
Strong understanding of authentication and authorization patterns
Experience orchestrating workflows across multiple SaaS platforms
Strong proficiency in Python or Go
Experience building production-ready services
Advanced scripting for orchestration and automation logic
Strong familiarity with at least one major cloud provider (AWS, GCP, or Azure)
Containerization and Kubernetes exposure
Infrastructure as Code experience
Networking fundamentals
Identity and access concepts
Understanding of asset lifecycle management
Experience leading technical initiatives from idea through deployment
Ability to mentor junior engineers
Strong written and verbal communication skills
Comfortable influencing cross-functional stakeholders
Data-driven decision-making approach

Benefits

100% Remote Workplace
Unlimited Paid Time Off
Equity – Become a true owner of the company
401K with company contribution and sponsored healthcare
Professional Growth – Access to training and certification programs

More DevOps Engineer Jobs

DevOps Cloud Engineer, Open LMS

Learning Technologies Group plc

LTG is a leader in corporate digital learning and talent management.

DevOps Engineer | 102 days ago

Full Time | Remote | Team 5,001-10,000 | Since 2013 | H1B No Sponsor

• Using automation and Infrastructure as Code (IaC) to continuously improve reliability, scalability, and performance of services deployed on AWS.
• Performance tuning and configuration of both Linux system and application parameters supporting highly concurrent web stacks.
• Managing infrastructure through code using configuration management and IaC templating software such as Terraform and Puppet.
• Documenting procedures and knowledge base articles throughout problem resolution and architecture development processes.
• Monitoring the availability, performance, and health of production systems to meet service level objectives using monitoring systems such as Icinga, Prometheus, Grafana, CloudWatch, and Loki.
• Participating in emergency incident response on-call rosters.
• Practicing blameless postmortems that lead to improvements in resiliency and reductions in alert fatigue.

Ansible, Apache HTTP Server, AWS, Chef, DNS, Amazon EC2, Grafana, LAMP, Linux, MariaDB, MySQL, PHP, PostgreSQL, Prometheus, Puppet, Python, SMTP, SQL, Terraform

Colombia

Senior IAM Operations – Reliability Engineer

Genesys

Orchestrating billions of remarkable experiences in more than 100 countries – through cloud, digital and AI technology.

DevOps Engineer | 102 days ago

Full Time | Remote | Team 5,001-10,000 | Since 1990 | H1B Sponsor

• Resolve IAM-related incidents through hands-on troubleshooting and remediation, serving as an escalation point for other junior engineers on the team.
• Monitor observability, AIOps, and event management platforms to identify anomalies, authentication failures, provisioning delays, and emerging IAM-related incidents.
• Perform incident triage and correlation to determine probable cause and appropriate routing for deeper investigation.
• Validate automated remediation workflows and assist in identifying repeated manual IAM tasks that could be automated.
• Participate in early-stage automation and AI-readiness activities by documenting remediation steps, key patterns, and operational edge cases related to identity services.
• Reduce alert noise by suggesting adjustments to IAM-related thresholds, suppression logic, or detection rules.
• Support post-incident reviews by providing relevant data, timelines, and insights related to identity service behavior.
• Collaborate with Cloud, Network, Security, Endpoint, and ServiceNow teams to support incident resolution and improve IAM operational processes.
• Assist with access lifecycle, certification, or remediation workflows by troubleshooting failures, validating outcomes, and performing manual intervention when automation isn’t available.
• Ensure accuracy of identity event data, alerts, and service mappings to support effective correlation within monitoring and CMDB systems.
• Troubleshoot and resolve IAM-related incidents across authentication, authorization, provisioning, deprovisioning, and access lifecycle workflows.
• Analyze logs, events, and telemetry from IAM platforms (e.g., Okta system logs, Microsoft Entra ID (formerly Azure Active Directory) sign-in logs, directory events) to determine service impact and root causes.
• Support correlation of IAM events with dependencies across cloud applications, SaaS platforms, endpoints, and network access paths.
• Participate in validating IAM automation workflows such as Joiner/Mover/Leaver processes, access provisioning, and deprovisioning flows and apply fixes and minor enhancements to automation or AIOps capabilities.
• Assist in identifying IAM-related automation opportunities by documenting repeated failure modes and manual remediation steps.
• Support certificate, trust, and integration troubleshooting for IAM-connected applications and services.
• Maintain IAM-focused dashboards and alerts, ensuring clear signals and early detection of user-impacting identity issues.
• Provide knowledge-sharing to team members and peers regarding common IAM troubleshooting patterns and operational best practices.
• Participate in IAM readiness activities for new application onboarding, platform changes, or lifecycle process updates by reviewing operational and monitoring requirements.

Azure, ServiceNow

Brazil

Job Closed

Principal Site Reliability Engineer

RTX

DevOps Engineer | 102 days ago

Other | Remote | Team 10,001+ | Since 2020 | H1B No Sponsor

• Spend your days working to automate and improve reliability and continue to push FlightAware's infrastructure forward, ensuring it is resilient and reproducible.
• Be responsible for service availability, performance, monitoring, incident response, and capacity planning.
• Create, improve, and manage environments to ensure decisions on resource allocation, problem identification, and capacity planning are based on accurate data-driven insights.
• Maintain a physical infrastructure using Kubernetes, Linux, and Ceph, and a cloud infrastructure in AWS as part of the Site Reliability Engineering team.
• Influence technology decisions and direction to grow and support the FlightAware platform.
• Collaborate closely with fellow SREs on your team and extend your collaboration across other FlightAware teams and disciplines to design dependable and scalable solutions and services.
• Identify, implement, and champion process improvements to enhance productivity, collaboration, and delivery efficiency, while ensuring alignment with company goals and industry best practices.

Ansible, AWS, Docker, Kubernetes, Linux, SaltStack, Terraform, Unix

Washington

$107.5K - $204.5K / year

Job Closed

DevOps Engineering Manager

Roadpass Digital

Our brands help inspire, educate, and empower millions of RVers and roadtrippers to enjoy camping and the open road.

DevOps Engineer | 102 days ago

Other | Remote | Team 51-200 | H1B No Sponsor

• Drive the delivery of company-level features and initiatives while prioritizing work alignment with product and business goals.
• Continuously monitor deadlines and remove any blockers to ensure delivery at a team level.
• Be a hands-on member of the team by contributing to code as an individual contributor at a senior level.
• Design, develop, and deliver scalable and automated services and architecture.
• Architect, design, and implement solutions with native AWS services and other cloud/managed services as necessary.
• Ensure solutions are architected and delivered using best practices and technologies.
• Communicate the benefits and drawbacks of infrastructure choices to technical and non-technical stakeholders.
• Create and apply reusable automation libraries across the company.
• Manage centralized monitoring and alerting for infrastructure, and enable developers to extend it with application-level monitoring.
• Set up infrastructure for easy reporting and accountability across products.
• Troubleshoot production issues and perform on-call duties.
• Plan and coordinate infrastructure and operations for new projects and acquisitions.
• Monitor, advise on, and implement solutions to address security and risks for company infrastructure and operations.
• Evaluate DevOps priorities for the company.
• Manage and prioritize day-to-day tasks for the DevOps team.
• Hold regular 1:1 meetings with direct reports, allowing for two-way feedback.
• Lead, mentor, and manage a team of DevOps engineers, fostering a culture of collaboration, accountability, and continuous improvement.
• Own performance reviews and career growth for direct reports, and actively participate in hiring processes.

Ansible, AWS, Docker, Kubernetes, Linux, Python, Ruby, Terraform

Colorado
