Market-leading solutions that empower governments to build thriving communities, grow businesses and protect citizens.
Principal Customer Reliability Engineer
Location: United States
Posted: 3 days ago
Salary: $160K - $190K / year
Seniority: Lead
Job Description
Accela
• Serve as the customer-facing technical representative for Accela's SaaS Operations organization, partnering with Engineering, SRE, Database Engineering, Product, Professional Services, and Support teams to ensure customer success.
• Lead technical engagements related to SaaS implementations, migrations, and ongoing production operations, ensuring reliable and predictable outcomes for customers.
• Partner with Professional Services and Support teams to identify technical requirements, define migration strategies, and facilitate successful transitions into steady-state operations.
• Develop and improve operational processes, tooling, monitoring, metrics, and alerting capabilities that support customer onboarding, migrations, and ongoing platform reliability.
• Act as a senior escalation point for complex customer issues, leveraging observability tools, application performance monitoring, log analysis, distributed tracing, and metrics to diagnose and resolve production concerns.
• Lead cross-functional response efforts for critical customer-impacting incidents and implementation challenges, coordinating stakeholders to drive timely resolution.
• Partner with customers to establish reliability expectations, communicate service-level commitments, and provide guidance regarding operational risk and change management practices.
• Collaborate with Sales, Professional Services, and Support teams to communicate Accela's service management processes, cloud operations practices, and compliance posture, including SOC 2, HIPAA, FedRAMP, StateRAMP, and PCI-DSS requirements.
• Provide customer-driven insights and feedback to Product, Engineering, and SRE teams to improve platform reliability, usability, and operational effectiveness.
• Support pre-sales and customer expansion activities by providing technical expertise related to reliability, architecture, cloud operations, and compliance.
• Provide technical leadership, mentorship, and best-practice guidance to Customer Reliability Engineers, Site Reliability Engineers, and other technical teams.
Job Requirements
- 8+ years of experience in Production Engineering, Site Reliability Engineering, Cloud Operations, Technical Support Engineering, or related SaaS environments, including customer-facing or escalation leadership responsibilities.
- Strong customer focus and demonstrated ability to communicate effectively with both technical and business stakeholders.
- Hands-on experience operating and supporting SaaS platforms on Microsoft Azure.
- Experience with Kubernetes and modern containerized environments.
- Strong experience using observability and monitoring tools, including APM platforms, distributed tracing, logging, and metrics solutions.
- Deep troubleshooting and Root Cause Analysis expertise across application, infrastructure, networking, operating system, and database layers.
- Working knowledge of Infrastructure-as-Code concepts and tools, particularly Terraform.
- Experience developing automation and operational tooling using Python, PowerShell, Bash, or similar scripting languages.
- Demonstrated ability to lead Incident, Problem, and Change Management processes during high-severity customer escalations.
- Excellent written and verbal communication skills, including experience presenting technical information to customer leadership and executive stakeholders.
- Experience using Git and GitHub-based workflows.
Benefits
- flexible time off
- comprehensive medical, dental, and vision plans
- family planning benefits
- 401(k) retirement savings plan with company match
- health savings account with company contributions
- flexible spending account
- life, accident, and disability coverage
- business travel insurance
- employee assistance programs
- other well-being benefits




