An early-stage cybersecurity software company, SimSpace was founded in 2015 to provide state-of-the-art network emulation and modeling tools that deliver realis
Staff Site Reliability Engineer
Location: United States
Posted: 28 days ago
Salary: $165K - $230K / year
Seniority: Lead
Job Description
SimSpace
- Design and architect the overarching infrastructure strategy that enables consistent, repeatable, and secure deployments across SimSpace-hosted data centers, customer-provided hardware, and highly restricted air-gapped environments.
- Lead the evolution of our CI/CD and Kubernetes platforms. Drive advanced application packaging, templating, and configuration management strategies using Jsonnet and Grafana Tanka (alongside Kustomize). Move beyond maintaining pipelines to architecting multi-cluster, multi-environment deployment frameworks that drastically improve developer velocity.
- Define, measure, and govern Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets across the engineering organization. Partner with product and engineering leadership to balance feature delivery with platform stability.
- Architect our enterprise observability strategy using the Grafana stack. Design frameworks for proactive monitoring, complex anomaly detection, and distributed tracing that give teams unparalleled visibility into system health, pod scaling, and latency bottlenecks.
- Drive the infrastructure security posture at an architectural level. Embed advanced container security, zero-trust network segmentation, and automated compliance policies directly into our deployment pipelines and runtime environments.
- Serve as a strategic partner and consultant to development teams. Advocate for an 'SRE culture' by designing self-service tooling, establishing 'paved roads' for developers, and reducing operational toil across the entire engineering org.
- Act as an Incident Commander during complex, high-severity outages. Drive blameless post-mortems and engineer long-term, systemic, and architectural fixes to ensure classes of failures never repeat.
- Act as a technical mentor to senior and mid-level engineers. Raise the baseline of engineering excellence across the company by coaching, documenting best practices, and leading by example.
Job Requirements
- 8+ years of experience in Site Reliability, Platform, or DevOps engineering, with a proven track record of operating at a Staff, Principal, or Lead level to drive organization-wide infrastructure initiatives.
- You possess deep software engineering skills (beyond scripting) and can architect complex, production-quality systems. You design clean interfaces, build maintainable tooling, and can dictate the technical direction of our internal toolchain. Language agnostic, but highly proficient in at least one modern language (e.g., Go, Python).
- Deep, architectural understanding of Kubernetes in multi-tenant and multi-cluster production environments. You possess expert-level knowledge of Jsonnet and Grafana Tanka for managing complex, scalable Kubernetes configurations and application packaging.
- Extensive experience architecting sophisticated CI/CD pipelines and GitOps workflows using GitHub Actions, ArgoCD, and infrastructure-as-code principles at an enterprise scale.
- Systems-level thinking with the ability to design architectures that span self-hosted, on-premises, VMware-based, and air-gapped deployment models.
- Deep expertise with observability platforms (Grafana stack preferred) and a proven ability to design alerting and monitoring strategies for complex distributed systems.
- Strong background in infrastructure security architecture, including container hardening, network security, vulnerability management, and delivering software to heavily regulated or customer-managed environments.
- Exceptional communication and stakeholder management skills. You have a service-oriented mindset, but you also have the ability to influence cross-functional leadership, negotiate reliability tradeoffs, and align engineering teams behind a unified technical vision.
Benefits
- Comprehensive medical, dental, and vision benefits, plus savings plans—coverage starts on day one!
- Access to company-paid counseling, coaching, and resources for you and your family through Spring Health.
- Plan for your future with a 401(k) retirement savings plan featuring a company match.
- Take the time you need with unlimited vacation and dedicated health & wellness days. SimSpace provides flexible solutions to meet the diverse work-life needs of team members.
- Paid leave plans to support you and your loved ones during life’s most important moments.
- Equity stock options at hire, with annual performance-based grants—become an invested stakeholder in our shared success.
- Earn $1,500–$3,500 for every qualified hire through our employee referral program.
- Fully and partially subsidized membership plans and equipment discounts to help you reach your personalized fitness goals.
- Access a LinkedIn Learning membership to prioritize your personal and professional development.
- Monthly reimbursements for meaningful connections with teammates through our SocialSpace Community.
- Legal plan coverage, pet insurance, wellness reimbursements, and more to simplify life’s details.
More DevOps Engineer Jobs
Senior Site Reliability Engineer
JMS Technical Solutions
We are an equal-opportunity employer. We do not discriminate in hiring or employment against any individual based on race, color, gender, national origin, ancestry, religion, physical or mental disability, age, veteran status, sexual orientation, gender identity or expression, marital status, pregnancy, citizenship, or any other factor protected by anti-discrimination laws.
Role Description
Our client, a leading network automation solutions company, is seeking a highly skilled Senior Site Reliability Engineer (Cloud Engineering) to join their growing team. This is a remote, full-time contract position with pay based on experience: up to $70-$80/hr. This is an exciting opportunity to help build, support, and scale a cutting-edge SaaS platform focused on cloud infrastructure, Kubernetes, and automation technologies. You will serve as a senior technical contributor responsible for supporting production environments, improving infrastructure automation, enhancing CI/CD processes, and driving operational excellence across customer deployments. This role requires a proactive engineer who thrives in fast-paced cloud-native environments and enjoys solving complex infrastructure and reliability challenges.
- Operate, maintain, and optimize cloud environments in AWS, including EKS, EC2, RDS, IAM, networking, and related services.
- Manage and support production Kubernetes environments using Helm, Kubernetes manifests, and infrastructure automation tools.
- Design, improve, and maintain CI/CD pipelines using tools such as GitHub Actions, Terraform, and Ansible.
- Monitor platform health and system reliability using observability tools, including Prometheus, Grafana, Loki, Datadog, and ELK.
- Troubleshoot complex application, infrastructure, networking, and Kubernetes-related issues across distributed systems.
- Support escalations involving AKS and legacy on-premises customer environments when needed.
- Collaborate cross-functionally with Cloud Operations, Engineering, and Product teams to deliver scalable and reliable platform solutions.
Qualifications
- 5+ years of experience in Site Reliability Engineering, DevOps, Cloud Engineering, or related infrastructure roles.
- Strong hands-on experience with AWS cloud services, particularly EKS, EC2, IAM, VPC networking, and RDS.
- Production experience managing Kubernetes environments and deploying workloads using Helm.
- Experience with Infrastructure-as-Code tools such as Terraform and automation/configuration tools like Ansible.
- Strong experience building and maintaining CI/CD pipelines using GitHub Actions, Jenkins, CircleCI, or similar tools.
- Applicants must be authorized to work in the U.S.
Company Description
We are an equal-opportunity employer. We do not discriminate in hiring or employment against any individual based on race, color, gender, national origin, ancestry, religion, physical or mental disability, age, veteran status, sexual orientation, gender identity or expression, marital status, pregnancy, citizenship, or any other factor protected by anti-discrimination laws.
DevSecOps Engineer
Blueprint Technologies
Blueprint Technologies, LLC is an equal employment opportunity employer. Qualified applicants are considered without regard to race, color, age, disability, sex, gender identity or expression, orientation, veteran/military status, religion, national origin, ancestry, marital, or familial status, genetic information, citizenship, or any other status protected by law. If you need assistance or a reasonable accommodation to complete the application process, please reach out to: recruiting@bpcs.com
This role is fully remote and part-time (25 hours per week).
Role Description
We are looking for a DevSecOps Engineer to join us as we build cutting-edge technology solutions! This is your opportunity to be part of a team that is committed to delivering best-in-class service to our customers. In this role, you will support secure cloud infrastructure, deployment automation, and operational reliability initiatives for enterprise analytics platforms and applications. You’ll help improve scalability, automation, monitoring, and security posture across development and production environments.
Responsibilities
- Build and maintain CI/CD pipelines and automation workflows
- Support cloud infrastructure and infrastructure-as-code initiatives
- Implement security monitoring and vulnerability remediation
- Manage containerized workloads and orchestration environments
- Support deployment, monitoring, and incident response activities
- Collaborate with development teams to streamline release processes
- Maintain operational and security documentation
Qualifications
- Bachelor’s degree in Computer Science, Engineering, or related field
- 5+ years of DevOps or DevSecOps experience
- Experience with AWS or comparable cloud platforms
- Experience with Docker, Kubernetes, or OpenShift
- Strong scripting and automation experience
Preferred Qualifications
- Experience with Terraform, Jenkins, ArgoCD, or GitHub Actions
- Familiarity with cloud security and compliance frameworks
- Experience supporting analytics or data platforms
Salary Range
At Blueprint, we strive to offer competitive pay that reflects the value of our team members. Compensation for this role is influenced by a variety of factors, including skills, education, responsibilities, experience, and geographic market. For candidates based in Washington State, the anticipated salary range is $86,000 to $90,000 annually. Please note that we typically do not hire new employees at the top of the posted range. Actual starting pay will be determined based on experience, skills, and internal equity. The final salary and job title may vary depending on the selected candidate’s qualifications and could fall outside the stated range.
Equal Opportunity Employer
Blueprint Technologies, LLC is an equal employment opportunity employer. Qualified applicants are considered without regard to race, color, age, disability, sex, gender identity or expression, orientation, veteran/military status, religion, national origin, ancestry, marital, or familial status, genetic information, citizenship, or any other status protected by law. If you need assistance or a reasonable accommodation to complete the application process, please reach out to: recruiting@bpcs.com
Benefits
- Medical, dental, and vision coverage
- Flexible Spending Account
- 401k program
- Competitive PTO offerings
- Parental Leave
- Opportunities for professional growth and development
Location
Remote
Senior DevOps Engineer – Infrastructure
Button, Inc.
A software development startup, Button provides “reliable, meaningful” digital services for clients in government and private industry. The venture-backed c
- Build, maintain, and evolve Button’s platform to ensure scalability, stability, and operability
- Partner with engineers on Core and Infrastructure teams for coherent design
- Provide and maintain a self-service platform for Product Engineering
- Expand system instrumentation and tooling with monitoring, alerting, logging, and tracing
- Build, improve, maintain, and support business-critical systems
- Manage and monitor production serving environment
Manager II, Engineering – Site Reliability Engineering
Datadog
Datadog provides cloud-scale monitoring and security for metrics, traces and logs in one unified platform.
- Lead and mentor engineering managers
- Contribute to and advance the vision for reliability
- Guide teams in defining and executing roadmaps
- Build cross-functional partnerships across engineering, security, and product teams
- Champion a solutions-oriented approach and drive risk mitigation efforts


