Market-leading solutions that empower governments to build thriving communities, grow businesses and protect citizens.
Lead Site Reliability Engineer
Location: United States
Posted: 1 day ago
Salary: $160K - $185K / year
Seniority: Senior
Job Description
Accela
- Serve as a technical leader for reliability engineering, operational excellence, and platform modernization across the Civic Platform.
- Drive platform modernization initiatives, including the continued evolution from VM-based architectures toward containerized and cloud-native services, in partnership with DevOps Engineering, Database Engineering, Security, and Development teams.
- Lead efforts that improve and sustain the availability, performance, scalability, security, and cost efficiency of Accela's SaaS offerings.
- Define, implement, and operate service level objectives (SLOs), service level agreements (SLAs), and error budgets for critical platform services, using data to drive prioritization and risk-based decision making.
- Lead observability initiatives across metrics, distributed tracing, logging, and monitoring platforms to improve system visibility and accelerate issue detection and resolution.
- Drive Root Cause Analysis (RCA) efforts for complex production incidents, facilitate blameless postmortems, and ensure corrective actions are implemented and tracked to completion.
- Design, develop, and maintain automation, tooling, and software solutions that improve reliability, operational efficiency, scalability, and developer productivity.
- Serve as a senior technical escalation point during production incidents and for platform changes that impact availability, performance, security, or compliance.
- Partner with Security and Compliance teams to ensure platform operations meet regulatory and compliance requirements, including SOC 2, HIPAA, FedRAMP, StateRAMP, and PCI-DSS.
- Translate operational metrics, reliability trends, and platform health data into actionable insights for engineering leadership and executive stakeholders.
- Mentor engineers across the Cloud Engineering organization and influence engineering best practices through technical leadership and collaboration.
Job Requirements
- 8+ years of experience in Site Reliability Engineering, Software Engineering, Cloud Infrastructure, or related disciplines within a SaaS environment, including experience leading complex technical initiatives.
- Demonstrated technical leadership driving platform modernization in containerized and orchestrated environments, including Kubernetes or equivalent technologies.
- Hands-on experience operating and supporting large-scale SaaS platforms on Microsoft Azure.
- Experience developing automation and operational tooling using Python, PowerShell, Bash, or similar scripting languages.
- Deep expertise designing, operating, analyzing, and troubleshooting complex distributed systems across the application, infrastructure, networking, and operating system layers.
- Strong experience with modern observability platforms, including monitoring, logging, metrics, and distributed tracing.
- Demonstrated success leading incident response, Root Cause Analysis, and continuous improvement initiatives.
- Experience establishing and maturing Incident, Problem, and Change Management practices.
- Strong written and verbal communication skills with the ability to effectively communicate technical concepts to engineering leadership and executive stakeholders.
- Experience using Git and GitHub-based development workflows.
Benefits
- Flexible time off
- Comprehensive medical, dental, and vision plans
- Family planning benefits
- 401(k) retirement savings plan with company match
- Health savings account with company contributions
- Flexible spending account
- Life, accident, and disability coverage
- Business travel insurance
- Employee assistance programs
- Other well-being benefits
More DevOps Engineer Jobs
Senior DevOps – Platform Engineer
IntervAI
Automating early-stage recruitment with intelligent, bias-free AI interviews.
- CI/CD & Deployments: Design, implement, and improve CI/CD pipelines and deployment processes.
- Infrastructure Management: Build and maintain scalable, secure, and reliable infrastructure across cloud, on-premise, or hybrid environments.
- Orchestration & IaC: Manage Kubernetes-based platforms, containerized workloads, and improve infrastructure automation using Infrastructure as Code (IaC) tools.
- Observability: Implement and enhance logging, monitoring, alerting, and general observability practices across all platforms.
- Reliability & Performance: Monitor system health, troubleshoot production issues, improve platform reliability, and contribute to availability and operational improvement initiatives.
- Collaboration: Partner with security, infrastructure, and engineering teams to support scalable application deployments, ensure compliant platform operations, and assist in incident response and root cause analysis.
DevOps Engineer
Louco Event Media GmbH
Louco is the game changer in live entertainment. The first platform that truly understands what users want and delivers hyper-personalized event experiences.
Role Description
We are looking for a hands-on DevOps Engineer to support our infrastructure, deployments, security, and scalability as we prepare for our next growth phase. This is a remote freelance position (approximately 20–40 hours per month initially), with the potential to grow over time.
What You Will Do
- Manage and optimize our cloud infrastructure
- Maintain development, staging, and production environments
- Build and improve CI/CD pipelines
- Support deployments and release processes
- Monitor uptime, performance, and security
- Implement backup and disaster recovery strategies
- Configure domains, SSL certificates, and networking
- Support developers with infrastructure-related topics
- Improve scalability, reliability, and system performance
Qualifications
- Strong Linux administration skills
- Experience with Hetzner Cloud or similar cloud environments
- Docker and container management
- CI/CD pipelines
- GitHub Actions
- PostgreSQL
- Monitoring and logging tools
- Security best practices
- DNS, SSL, and networking knowledge
- Backup and disaster recovery concepts
Requirements
- Supabase experience (Nice to Have)
- Nginx or Traefik (Nice to Have)
- Terraform (Nice to Have)
- Kubernetes (Nice to Have)
- Mobile app infrastructure experience (Nice to Have)
- Startup experience (Nice to Have)
Benefits
- Remote
- Freelance
- Part-Time (20–40 hours/month initially)
- Flexible working hours
- Long-term collaboration preferred
How to Apply
Please send:
- CV or LinkedIn profile
- Hourly rate expectation
- Availability (hours per month)
- Short description of relevant DevOps projects
We look forward to hearing from you.
Senior DevOps Engineer
RecruityTalent
Connecting top IT and Executive talents with great companies in EMEA/LATAM through tailored recruitment solutions.
- Support startups and enterprises in Bulgaria, EMEA, and LATAM by connecting them with top talent.
- Collaborate with clients to meet technical requirements and project needs.
- Build and maintain infrastructure solutions and CI/CD pipelines.
- Monitor and optimize system performance and reliability.



