Market-leading solutions that empower governments to build thriving communities, grow businesses and protect citizens.
Site Reliability Engineer
Location: United States
Posted: 3 days ago
Salary: $125K - $145K / year
Seniority: Senior
Job Description
Site Reliability Engineer, Accela
- Contribute to the operation, maintenance, and continuous improvement of Accela's production cloud environments.
- Support platform modernization initiatives, including containerization, cloud-native technologies, and automation efforts.
- Monitor platform health, availability, performance, and capacity using modern observability and monitoring tools.
- Participate in incident response, troubleshoot production issues, and contribute to Root Cause Analysis efforts.
- Develop and maintain automation, tooling, and scripts that improve reliability, scalability, deployment efficiency, and operational effectiveness.
- Support the implementation and monitoring of service level objectives (SLOs), service level agreements (SLAs), and operational metrics.
- Partner with Development, DevOps, Database Engineering, and Security teams to identify and resolve reliability, performance, and scalability challenges.
- Assist with platform deployments, operational readiness reviews, and change management activities.
- Contribute to observability initiatives through monitoring, logging, metrics collection, and distributed tracing.
- Support compliance-related operational activities in SOC 2, HIPAA, FedRAMP, StateRAMP, and PCI-DSS environments.
- Participate in post-incident reviews and contribute to corrective and preventive actions that improve platform stability.
Job Requirements
- 4+ years of experience in Site Reliability Engineering, Cloud Operations, Systems Engineering, DevOps, Software Engineering, or a related technical discipline.
- Experience supporting cloud-based SaaS environments, preferably within Microsoft Azure.
- Experience with Kubernetes and containerized application environments.
- Working knowledge of scripting and automation using Python, PowerShell, Bash, or similar languages.
- Experience troubleshooting distributed systems across application, infrastructure, networking, and operating system layers.
- Familiarity with monitoring, logging, metrics, and observability platforms.
- Strong analytical and problem-solving skills with a structured approach to troubleshooting and Root Cause Analysis.
- Experience working within Incident, Problem, and Change Management processes.
- Strong written and verbal communication skills and the ability to work effectively with cross-functional teams.
- Experience using Git and GitHub-based workflows.
Benefits
- Flexible time off
- Comprehensive medical, dental, and vision plans
- Family planning benefits
- 401(k) retirement savings plan with company match
- Health savings account with company contributions
- Flexible spending account
- Life, accident, and disability coverage
- Business travel insurance
- Employee assistance programs
- Other well-being benefits