Job Closed
This listing is no longer active.
Elevating Autism & IDD Care through Technology
Senior Site Reliability Engineer, Security
Location
United States
Posted
103 days ago
Salary
$160K - $180K / year
Seniority
Senior
Job Description
CentralReach
• Responsible for availability, latency, performance, efficiency, monitoring/observability, emergency response, and capacity planning, including setting and maintaining SLOs, SLIs, and error budgets and creating dashboards.
• Analyze, troubleshoot, and resolve operational issues that affect defined SLOs.
• Manage site stability, performance, and reliability, and maintain uptime for production environments.
• Develop a fully automated, multi-environment observability stack based on the existing system, and extend it to predict capacity needs from usage patterns.
• Automate wherever possible to reduce toil and increase development velocity.
• Perform application-specific production support, incident management, change management, problem management, root cause analyses (RCAs), and service restoration as needed.
• Use a data-driven approach to identify product architecture changes that improve reliability, performance, and availability.
• Document resolution runbooks and standard operating procedures.
• Actively look for opportunities to improve system availability and performance by applying lessons learned from monitoring and observation.
• Collaborate with software development teams on release management, help shape the future roadmap, and establish strong operational readiness across teams.
• Implement reliability and observability tools (e.g., New Relic, Prometheus, Grafana).
• Collaborate with the Security team and other platform engineering teams to build reliable, maintainable, and scalable solutions that improve our security posture.
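The responsibilities above include setting and maintaining SLOs, SLIs, and error budgets. As a rough illustration only: an error budget is the share of a time window (or of requests) that an SLO allows to fail, so a 99.9% availability target over 30 days permits about 43 minutes of downtime. The Python sketch below assumes a simple availability SLO. The function names `error_budget` and `budget_remaining` are hypothetical and are not taken from New Relic, Prometheus, Grafana, or any other tool.

```python
from datetime import timedelta


def error_budget(slo_target: float, window: timedelta) -> timedelta:
    """Hypothetical helper: downtime allowed in `window` by an availability
    SLO such as 0.999 (i.e. 99.9%)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    # The budget is the complement of the target, applied to the window length.
    return window * (1.0 - slo_target)


def budget_remaining(slo_target: float, good: int, total: int) -> float:
    """Hypothetical helper: fraction of a request-based error budget still
    unspent. 1.0 means untouched; a negative value means it is exhausted."""
    if total == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_bad = (1.0 - slo_target) * total  # failures the SLO tolerates
    actual_bad = total - good                 # failures actually observed
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    print(error_budget(0.999, timedelta(days=30)))        # 0:43:12
    print(budget_remaining(0.999, 999_500, 1_000_000))    # roughly 0.5
```

A negative result from `budget_remaining` means the budget for the window has been used up; many teams treat that as a signal to prioritize reliability work over new releases.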
Job Requirements
- Strong background as an SRE supporting a 24x7, highly available production environment for a SaaS or cloud service provider.
- Solid experience with monitoring, APM, and observability tools (e.g., Splunk, New Relic).
- Experience implementing observability plans around logs, metrics, and traces.
- Experience developing software on an agile development team.
- Experience with cloud infrastructure environments, preferably AWS, and infrastructure as code (e.g., Terraform, CloudFormation).
- Extensive experience with Docker, Kubernetes, Helm, CI/CD, and configuration management tools such as Ansible and Chef.
- Strong experience with containerization technology and/or Kubernetes.
- Experience with release automation, system administration, and configuration management.
- Experience with programming languages (Java, Python, Go, etc.).
- Strong understanding of Linux, Windows, software development, systems, networking, and cloud concepts.
- Strong interpersonal and teamwork skills, including the ability to set and enforce processes and to influence engineers who are not direct reports.
- Strong analytical and programming skills (e.g., Python, Go, Java).
- Deep understanding of best practices for modern cloud security.
- Proven experience building observability for security concerns, such as privilege escalations and bot detection.
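The last requirement mentions observability for security concerns such as bot detection. One very simple signal, shown here only as a hypothetical sketch, is a per-client sliding-window request counter that flags clients exceeding a threshold. The class name `RateFlagger`, the threshold, and the window length are illustrative assumptions; real bot detection usually combines many more signals, and this sketch does not address privilege escalation at all.

```python
from collections import defaultdict, deque


class RateFlagger:
    """Hypothetical helper: flags clients that send more than `max_requests`
    requests within a sliding window of `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Per-client deque of request timestamps, oldest first.
        self._events = defaultdict(deque)

    def record(self, client_id: str, ts: float) -> bool:
        """Record one request at time `ts` (seconds; assumed non-decreasing
        per client) and return True if the client is now over the limit."""
        events = self._events[client_id]
        events.append(ts)
        # Evict timestamps outside the window (ts - window_seconds, ts].
        while events and events[0] <= ts - self.window_seconds:
            events.popleft()
        return len(events) > self.max_requests


if __name__ == "__main__":
    flagger = RateFlagger(max_requests=3, window_seconds=10.0)
    # The fourth request inside 10 seconds crosses the threshold.
    print([flagger.record("client-a", t) for t in (0.0, 1.0, 2.0, 3.0)])
```

In practice, a flag like this would more likely feed a metric or alert in the observability stack than block traffic directly.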
Benefits
- Comprehensive health benefits
- Generous PTO
- 401(k) matching
- Paid parental leave
- Hybrid work schedules
- Career development support
- Wellness programs
- Opportunities to give back through CR Cares™
More DevOps Engineer Jobs
• Design, implement, and maintain DevSecOps programs to support Valer’s growing infrastructure and security needs.
• Develop and extend a Jenkins CI/CD pipeline for automated code testing and deployments.
• Champion and implement modern DevSecOps tools and technologies (e.g. Jenkins, Docker, Kubernetes, Ansible, etc.)
• Utilize monitoring and logging tools to create a platform monitoring dashboard.
• Create detailed technical documentation for system architecture, pipelines, security processes, and deployment instructions to ensure knowledge transfer and audit readiness.
• Understand stringent healthcare compliance rules and implement advanced security tools and practices.
• Coordinate with software engineering teams to plan major releases.
• Assist with configuring and maintaining secure networks.

• Design, operate, and maintain production infrastructure on GCP
• Manage and optimize GKE (Kubernetes) clusters, workloads, and networking
• Administer Cloud SQL (PostgreSQL) and Redis
• Own IAM roles, service accounts, and least-privilege access
• Manage DNS using OctoDNS across Google DNS and Cloudflare
• Ensure infrastructure security, scalability, reliability, and cost efficiency
• Lead migration from bash-based provisioning to Terraform / OpenTofu
• Introduce and maintain Helm charts for Kubernetes deployments
• Eliminate click-ops through declarative, version-controlled infrastructure
• Review and improve Kubernetes manifests and deployment practices
• Build and maintain CI/CD pipelines using Google Cloud Build and GitHub Actions
• Support automated builds, testing, and deployments for Java/Kotlin/Python services
• Improve deployment safety through rollout strategies, health checks, and rollback mechanisms
• Configure monitoring and alerting using Google Operations (Cloud Monitoring & Logging)
• Act as an escalation point for production incidents
• Diagnose complex system issues using logs, metrics, and traces
• Improve observability and reduce mean time to resolution (MTTR)
• Manage secrets using Google Secret Manager, GKE Secrets, and Keeper
• Enforce secure configuration and credential rotation
• Partner with engineering to ensure secure runtime and deployment patterns

• You’ll help design, build, and run reliable, scalable infrastructure with a strong platform engineering and SRE mindset.
• Manage critical platform components (including cloud networking).
• Automate manual tasks to reduce operational toil.
• Collaborate on technical decisions.
• Partner closely with application and security teams.
• Continuously work to improve reliability, performance, and security across the platform.
• Contribute to platform engineering initiatives using Kubernetes (EKS), Helm, and Infrastructure as Code (IaC).
• Maintain and improve CI/CD platforms and deployment pipelines.
• Build and support observability foundations, including metrics, logging, alerting, and dashboards tied to service health.
• Provision, configure, and operate scalable AWS infrastructure.
• Support cloud networking and connectivity architecture.
• Configure and manage Cloudflare for edge security and CDN performance.
• Actively participate in incident response and post-mortems.
• Implement security and compliance best practices across infrastructure.
• Monitor and track service reliability using established metrics.

• Lead the initial setup of our DevOps and platform engineering practices, defining standards, tooling, and infrastructure strategy
• Design and deliver an internal platform for personal or feature environments to empower developers and improve velocity
• Build and maintain AWS-based infrastructure to ensure optimal performance, scalability, and security
• Develop CI/CD pipelines to streamline deployments and automate release processes
• Implement observability tooling (logging, monitoring, alerting) to detect and resolve issues proactively
• Collaborate closely with developers to identify bottlenecks, improve delivery, and enhance system reliability
• Ensure high availability and uptime through effective monitoring, incident response, and recovery strategies
• Promote infrastructure automation and Infrastructure as Code best practices
• Document systems, configurations, and incident processes for team-wide clarity and consistency
• Participate in on-call rotations or coordinate time zone coverage to ensure 24-hour reliability across regions



