Job Closed

This listing is no longer active.

Lirio

Senior System Reliability Engineer

DevOps Engineer, Full Time, Remote, Senior, Team 51-200

Location

United States

Posted

191 days ago

Salary

$130K - $150K / year

Seniority

Senior

AWS, Azure, Kubernetes, Apache Kafka, Java, TypeScript, Groovy, SQL, Terraform, Prometheus, Grafana, Datadog, Git, Python, Linux, DNS, TCP/IP, TLS

Job Description

Role Description

The Senior System Reliability Engineer (SRE) at Lirio is responsible for the reliability, scalability, and performance of our cloud-native applications and infrastructure. This role leads the design and implementation of automation, monitoring, and incident response processes, and mentors other engineers in SRE best practices. The Senior SRE partners with development teams to ensure robust, secure, and highly available systems, and drives continuous improvement in operational excellence.

This role operates as a senior, hands-on reliability engineer embedded with product and platform teams. The Senior SRE is accountable for:

- Defining and enforcing service-level objectives (SLOs)
- Reducing operational toil through automation
- Improving system reliability through proactive engineering rather than reactive support

This role is not ticket-driven operations and is expected to influence architecture, development practices, and incident readiness across the platform.

Essential Duties & Responsibilities

Reliability Engineering & Automation (40%)

- Architect, implement, and maintain automated solutions for deployment, monitoring, alerting, and incident response using Lirio’s technology stack (AWS, Azure, Kubernetes, Kafka, Java, TypeScript, Groovy, Databases/SQL).
- Develop and manage infrastructure as code (e.g., Terraform, AWS CloudFormation).
- Build and optimize CI/CD pipelines for seamless, reliable delivery.
- Define, implement, and continuously refine service-level indicators (SLIs), service-level objectives (SLOs), and error budgets for critical services.
- Identify and reduce operational toil through automation, platform improvements, and architectural changes.
- Performance analysis and optimization of Lirio systems and services.
- Ensure high availability and scalability of services through proactive engineering, load testing, and capacity planning across multi-tenant and client-specific environments.

Peer Reviews & Collaboration (10%)

- Review infrastructure changes, automation scripts, and reliability-impacting code changes to ensure production readiness.
- Collaborate with software engineers to embed reliability, security, and operational best practices into development workflows.
- Partner with software engineering teams during design and architecture discussions to identify reliability risks early.

Operational Support & Incident Management (20%)

- Monitor system health using modern observability tools (e.g., Prometheus, Grafana, Datadog).
- Participate in a defined on-call rotation supporting production systems, with clear escalation paths and expectations.
- Contribute to and maintain incident severity definitions, response procedures, and no-blame postmortem practices.
- Lead incident response, root cause analysis, and postmortems for production issues.
- Triage and resolve issues, ensuring minimal downtime and rapid recovery.
- Support client onboarding and production rollouts by ensuring reliability, observability, and operational readiness standards are met.

Mentorship & Knowledge Sharing (10%)

- Mentor and coach engineers on reliability engineering principles, operational ownership, and incident response best practices.
- Design processes to share operational knowledge and avoid single points of failure.
- Advise colleagues on architecture and reliability strategies.
- Help establish shared operational ownership across teams to reduce single points of failure and knowledge silos.

Continuous Learning & Innovation (10%)

- Stay current with industry trends in reliability engineering, cloud operations, and automation.
- Bring innovation to operational practices and system design, evaluating and introducing new tools and technologies as appropriate for Lirio.
- Evaluate new tooling with an emphasis on operational simplicity, security, and long-term maintainability.

Documentation & Process Improvement (5%)

- Define and document operational processes, incident response playbooks, and reliability standards.
- Contribute to operational planning, incident reviews, and reliability documentation.

Job Requirements

5-7 years related experience
Bachelor's Degree in related field
Linux systems and networking fundamentals (DNS, TCP/IP, TLS)
Distributed systems debugging and failure analysis
Load, stress, and fault-injection testing
CI/CD tools and processes
Version control (e.g., Git)
Cloud platforms (e.g., AWS, Azure)
Containers and orchestration (Kubernetes)
Kafka (messaging/streaming)
Scripting and programming languages (e.g., Java, TypeScript, Groovy, Python)
Agile methodologies (e.g., Scrum, XP, SAFe)
Databases/SQL
Observability/monitoring tools (Datadog)

Benefits

Medical (HSA available)
Dental
Vision
Short-term & long-term disability (company-paid)
Life & AD&D (company-paid)
401K with company match
10 paid holidays, quarterly company closure dates, + holiday week company closure
Flexible time off policy
Work from home
6 weeks paid parental leave
Salary Range
$130k-$150k

More DevOps Engineer Jobs

DevOps Engineer – Platform

Univention

be open.

DevOps Engineer, 192 days ago

Full Time, Remote, Team 51-200, Since 2003, H1B No Sponsor

• You create the technical foundations so that our product teams can release quickly, securely, and independently.
• You build our CI/CD pipelines with GitLab CI and continuously improve them.
• You are responsible for build, test, and release automation for multiple product teams.
• You containerize existing and new components (Docker) and operate applications on Kubernetes, including creating and maintaining Helm charts.
• You maintain and further develop our Debian-based distributions.
• You define reusable standards and workflows for recurring deployments and releases.
• You manage vulnerabilities and ensure a robust, secure software supply chain.
• You work closely with tech leads, development teams, and product management.
• Your work has impact across all product teams and sustainably accelerates releases.

Ansible, Docker, Kubernetes, Python, Terraform

Germany

Senior Deployment Engineer

Karat

Karat is the world leader in technical interviewing and pioneer of the Interviewing Cloud.

DevOps Engineer, 192 days ago

Other, Remote, Team 201-500, H1B Sponsor

• Serve as the principal technical advisor to enterprise clients, establishing yourself as the authoritative voice on Karat's solutions and building high-level trust relationships.
• Partner with Software Engineers globally to thoroughly analyze their hiring processes and performance requirements; ensure precise solution alignment and Karat product delivery as the lead technical expert in Customer Operations and GTM.
• Work strategically with the Company's GTM team throughout the entire customer lifecycle.
• Presenting Karat's technical solution to prospects as the subject matter expert.
• Architecting and implementing the initial Karat interview framework for each new enterprise client.
• Conducting regular strategic reviews with customers to ensure alignment with business objectives and optimal performance.
• Designing and delivering executive-level training sessions for client stakeholders.
• Analyze complex performance data and calibrate assessment metrics in collaboration with Karat’s Content and Data teams; translate findings into actionable strategic recommendations that strengthen client partnerships.
• Drive continuous improvement of the Customer Operations and GTM teams' internal processes by identifying innovative opportunities to deliver additional value to our enterprise clients.

SQL

Colorado + 13 more

$131.5K - $148.2K / year

Job Closed

DevOps Engineer

Crewfare

DevOps Engineer, 192 days ago

Other, Remote, Team 11-50

This description is a summary of our understanding of the job description.

Role Description

We’re hiring a hands-on senior DevOps engineer to take full ownership of our infrastructure, security posture, and deployment pipelines. This is not a “keep the lights on” role. We need someone who:

- fixes what’s broken
- simplifies what’s overcomplicated
- hardens security
- builds systems we can trust

You’ll work closely with engineering leadership and application developers, and you’ll be expected to operate independently with a high bar for reliability.

Qualifications

- 5+ years of hands-on DevOps / infrastructure experience
- Deep experience with AWS
- Strong experience with Terraform or similar IaC tools
- CI/CD experience (GitHub Actions, GitLab CI, or similar)
- Experience running production systems with real traffic
- Strong understanding of cloud security best practices
- Comfortable being the sole DevOps owner

Requirements

- Experience with ECS or EKS
- Multi-account AWS setups
- SOC 2 / ISO / PCI exposure
- Production migrations (shared hosting → cloud)
- Experience cleaning up inherited, messy infrastructure

Benefits

- You must overlap meaningfully with US working hours
- You communicate clearly and proactively
- You take ownership; problems don’t linger
- You prefer simple, robust solutions over clever ones
- You are comfortable saying “this is risky” and backing it up

This Role Is Not a Fit If:

- You are part of an agency or want to work part-time
- You need constant direction or ticket-by-ticket management
- You avoid security responsibility
- You’re unavailable during US hours
- You prefer theoretical architecture over practical reliability

United States

Job Closed

Senior Site Reliability Engineer

Juul Labs

DevOps Engineer, 193 days ago

Other, Remote, Team 1,001-5,000, H1B Sponsor

• A Senior Site Reliability Engineer (SRE) is expected to own the operational stability and performance of Juul’s hybrid cloud infrastructure (Nutanix, AWS/GCP).
• This involves leading automation efforts, architecting for reliability, and acting as the final escalation point for critical incidents to ensure the platform is scalable and efficient.
• Design, deploy, and maintain enterprise-scale Nutanix AHV clusters and Prism Central for multi-cluster management.
• Expert-level proficiency with Nutanix CLI (nCLI and acli) for advanced operations, troubleshooting, and automation.
• Develop automation scripts using Nutanix REST APIs, Python SDK, PowerShell, and Terraform for infrastructure-as-code.
• Design disaster recovery solutions using Leap, Protection Domains, cross-cluster replication, and metro clustering.
• Lead L3 troubleshooting using advanced diagnostics, log analysis (CVM, Genesis), NCC health checks, and cluster service resolution.

AWS, GCP, Kubernetes, Python, TCP/IP, Terraform

United States

$150K - $184K / year
