Job Closed
This listing is no longer active.
Site Reliability Engineer
Location
Florida | New Jersey | Massachusetts | Missouri | Texas
Posted
73 days ago
Salary
$86K - $109.3K / year
Seniority
Senior
Job Description
Site Reliability Engineer
Zelis
- Define a unified vision for observability across all platforms, with golden signals as the foundation for monitoring and alerting
- Develop and maintain a comprehensive roadmap to improve observability, reduce tool redundancy, and standardize practices across platforms
- Establish and track key performance indicators (KPIs) to measure progress and ensure accountability for roadmap milestones
- Partner with the ZEIT SRE team and engineering leads to break down silos and promote consistent observability practices
- Standardize the implementation of golden signals across applications to improve system reliability and incident detection
- Identify and address gaps in existing observability practices, prioritizing long-term scalability and reliability
- Measure and report on observability success metrics, including actionable alert volume and reduced issue escalations
Job Requirements
- Bachelor’s degree in Computer Science, Information Technology, or a related field (or equivalent experience)
- Minimum of 5 years of experience in Site Reliability Engineering, DevOps, or a related role with a strong focus on observability
- 5+ years of hands-on experience with .NET (C#), including advanced knowledge of ASP.NET Core, Web APIs, and performance optimization
- Deep understanding of SRE principles and golden signals for system monitoring
- Proficiency with observability tools such as Prometheus, Grafana, Splunk, New Relic, or Datadog
- Familiarity with cloud platforms (AWS, Azure, GCP) and containerization technologies (Docker, Kubernetes)
- Advanced proficiency in scripting languages such as PowerShell
- Experience in front-end development using React.js
- Advanced knowledge of .NET
Benefits
- 401(k) plan with employer match
- Flexible paid time off
- Holidays
- Parental leave
- Life and disability insurance
- Health benefits, including medical, dental, vision, and prescription drug coverage
More DevOps Engineer Jobs
- Writing Ansible and Terraform to expand and automate a large Elastic Stack implementation
- Scripting in Python or Ruby to automate tool integration and processes
- Automating the deployment of security controls, including firewall rules, firewall policy, and IPS policy
- Automating new integrations with Elastic Stack SOAR automation
- Developing custom enhancements to COTS tools to improve their functionality and enrich data
- Automating server configuration for security, including logging, key changes, and system hardening
- Automating and enhancing CI/CD pipelines and environments
- Automating the implementation of security controls in Amazon Web Services (AWS) via the AWS API
- Building out auto-provisioning and auto-scaling of security infrastructure
- Developing security enhancements to improve the security posture of our government clients
- Building blue team defenses to detect and block the adversary
- Lead the Reliability Engineering and Metro Engineering functions, overseeing both the physical expansion of metro networks and the observability systems that support them.
- Own the end-to-end Tier 3 escalation lifecycle, working with NOC and Incident Management teams to drive a blameless engineering culture focused on systemic improvement and data-driven root cause analysis.
- Define the roadmap for Infrastructure-as-Code and GitOps workflows, collaborating with software and network teams to ensure configurations are version-controlled, auditable, and deployed via CI/CD.
- Drive the strategy for closed-loop automation by partnering with software engineering teams to implement systems that leverage real-time streaming telemetry for autonomous fault detection and remediation.
- Champion the elimination of operational toil; work across the organization to automate change verification and routine maintenance, allowing the NRE team to focus on high-value reliability engineering.
Site Reliability Engineer (SRE)
Radiance Technologies
Radiance Technologies, Inc. is an employee-owned small business prime contractor. Radiance leads the way in developing government and commercial customer-focused solutions. Leveraging its record of technical innovation and operational expertise, Radiance Technologies offers:
- Cyber Solutions
- Systems Engineering
- Technology Development, Production, Testing, and Evaluation
- Technology Application
- Intelligence Community Support
- Government Program Support
The company’s 900+ employees in 15+ U.S. and international offices serve customers in the Department of Defense (DOD), National Aeronautics and Space Administration (NASA), the national intelligence community, the Department of Homeland Security (DHS), other government organizations, and selected commercial customers. Radiance Technologies continues to attract and retain talented, motivated employees by being an employee-owned company, founded with the idea of providing an environment, a benefits package, and a stock ownership plan that are second to none. For more information, visit www.radiancetech.com.
Radiance Technologies, Inc. – Concepts to Capabilities®
Salary Range: $75,000 - $100,000
At Radiance, our SREs own the reliability of systems they don't write: defining what "reliable enough" means from the user’s perspective, instrumenting and measuring against those targets, and building the tooling and runbooks that make failure recoverable. They partner with dev teams to push operational quality upstream before code ships, and they lead the resolution in production when things go wrong. SREs are comfortable debugging distributed systems, resolving incidents, and translating findings into lasting reliability improvements. Day-to-day responsibilities fall into four categories: Incident Response, Toil Reduction, Reliability Evaluations, and Platform Enablement.
Required Qualifications
- 1+ years of experience in Operations, Sys Admin, DevOps, or Software Engineering
- Bachelor’s Degree in CS, Computer Engineering, or a related technical field
- US Citizenship; must have or be able to obtain a Top Secret Clearance
- Systems thinking: understanding how systems fail together, blast radius, and more
- Observability fundamentals: not just the 3 signals, but knowing why and how to use telemetry to optimize services and engineering quality of life
- Basic software engineering: building automation and non-trivial APIs, git workflows, effectively engaging in code reviews
- Linux/networking fundamentals
- Strong communication, collaboration, and organizational skills
Specialty Skills (1 or more)
- Platform & Infrastructure: Kubernetes, ArgoCD/GitOps, disaster recovery, capacity planning
- Observability: OTel standards, Grafana/Perses, Tempo, ClickHouse, VictoriaMetrics
- Automation & Toil Reduction: scripting, CI/CD, runbook automation, “DevOps”
- Developer Enablement: instrumentation SDKs, SRE practice onboarding
- Data & Alerting: dashboard quality, alert design, anomaly detection
Desired Qualifications
- SRE certifications from The DevOps Institute, AWS Solutions Architect, or similar
- Hands-on experience with: Python, Go, Kubernetes, Argo CD, GitLab/GitHub, Jenkins, Docker, Locust/Gatling, Prometheus, Grafana/Perses
Radiance Technologies is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or protected veteran status.
Site Reliability Engineer
Precisely US Jobs
Precisely is the leader in data integrity. We empower businesses to make more confident decisions based on trusted data through a unique combination of software, data enrichment products and strategic services.
- Focused on delivering outstanding innovation and support that helps customers increase revenue, lower costs and reduce risk
- Powers better decisions for more than 12,000 global organizations, including 95 of the Fortune 100
- 2,500 employees unified by four core values: Openness, Determination, Individuality, and Collaboration
- Committed to career development for employees, with opportunities for growth, learning, and building community
- "Work from anywhere" culture celebrating diversity in a distributed environment, with a presence in 30 countries and 20 offices across 5 continents
Role Description
This position is 100% remote, located anywhere in the United States. You help keep our platforms reliable, available, and easy to operate for teams across the company. We rely on you to improve system stability through automation, monitoring, and thoughtful operational practices. You primarily support cloud-based services and reliability initiatives, while also helping maintain a stable network environment when needed. You work closely with others to reduce incidents, improve resilience, and support systems that people depend on every day. You are trusted to balance reliability engineering with network support as part of a broader infrastructure role.
What you will do:
- You improve system reliability through monitoring, automation, and operational improvements.
- You support cloud and platform environments to ensure services remain available and resilient.
- You respond to incidents, help restore service, and reduce the chance of repeat issues.
- You build and maintain monitoring, alerting, and operational tooling.
- You support production changes and infrastructure improvements using established processes.
- You provide secondary support for network systems, ensuring connectivity remains stable.
- You assist with routine network tasks such as maintenance, upgrades, and troubleshooting.
- You support secure connectivity between cloud services, offices, and remote users.
- We rely on you to document systems, changes, and operational practices.
- You are trusted to protect critical services and improve reliability over time.
Qualifications
- 5 years of experience supporting production systems, platforms, or infrastructure.
- Experience supporting reliable systems in a production environment.
- Experience responding to incidents and restoring service.
- Experience working with cloud or virtual environments.
- Ability to automate, monitor, and improve system operations.
- Comfort supporting infrastructure changes and upgrades.
- No travel required.
- Familiarity with network concepts such as Fortinet firewalls, Cisco routing, F5 load balancing, or virtual private connectivity.
- Familiarity with cloud networking or hybrid environments.
- Bonus points for experience with certificates or infrastructure automation.
Company Description
Precisely is the leader in data integrity. We empower businesses to make more confident decisions based on trusted data through a unique combination of software, data enrichment products and strategic services. What does this mean to you? For starters, it means joining a company focused on delivering outstanding innovation and support that helps customers increase revenue, lower costs and reduce risk. In fact, Precisely powers better decisions for more than 12,000 global organizations, including 95 of the Fortune 100. Precisely's 2,500 employees are unified by four company core values that are central to who we are and how we operate: Openness, Determination, Individuality, and Collaboration. We are committed to career development for our employees and offer opportunities for growth, learning and building community. With a "work from anywhere" culture, we celebrate diversity in a distributed environment, with a presence in 30 countries as well as 20 offices across more than 5 continents. Learn more about why it's an exciting time to join Precisely!