Stellar Cyber

Empowering lean security operations teams of any skill to successfully secure their environments. WE ARE HIRING!

Senior DevOps Engineer/Site Reliability Engineer

Location

United States

Posted

1 day ago

Salary

$165K - $215K / year

Seniority

Senior

Job Description

Role Description

We are seeking a highly skilled Senior DevOps / Site Reliability Engineer (SRE) to join our globally distributed engineering organization. This is a hands-on senior-level role focused on building, operating, and scaling reliable cloud-native infrastructure and distributed data platforms. The ideal candidate will have strong expertise in Kubernetes, cloud infrastructure, observability, automation, CI/CD, incident management, and infrastructure reliability.

This role combines DevOps engineering practices with SRE principles to improve scalability, resiliency, operational efficiency, and platform performance across production environments. The engineer will work closely with platform, development, and operations teams to drive automation, operational excellence, and reliability best practices for mission-critical systems.

Key Responsibilities

- Administer and maintain Kubernetes clusters and containerized workloads.
- Manage cloud infrastructure across OCI, AWS, GCP, or Azure environments.
- Develop and maintain CI/CD pipelines for reliable application deployments.
- Implement and manage Infrastructure as Code (IaC) using Terraform and Helm.
- Build automation tooling and operational workflows using Python, Go, or Bash.
- Drive observability initiatives including monitoring, logging, tracing, and alerting improvements.
- Monitor, troubleshoot, and resolve production incidents while participating in on-call rotations.
- Support and optimize distributed data platforms including Kafka, Elasticsearch, Spark, Redis, and MongoDB.
- Improve platform reliability, scalability, and operational efficiency using SRE best practices.
- Collaborate with cross-functional teams across multiple time zones.
- Perform Linux system administration and networking troubleshooting.
- Contribute to incident response processes, postmortems, and reliability improvements.
- Support GitOps and deployment workflows using tools such as ArgoCD and GitHub Actions.
- Evaluate and implement AI-assisted operational tooling for auto-remediation, alert correlation, and operational intelligence.

Qualifications

- 5+ years of experience in DevOps, SRE, or Platform Engineering roles.
- Strong expertise with Kubernetes, Docker, and container orchestration.
- Hands-on experience managing production cloud environments.
- Strong Infrastructure as Code experience with Terraform and Helm.
- Experience with CI/CD tools and deployment automation.
- Advanced troubleshooting skills in Linux systems, networking, and distributed systems.
- Experience with observability platforms including Prometheus, Grafana, Loki, Alertmanager, and Elastic Stack.
- Strong programming and scripting skills in Python, Bash, or Go.
- Experience supporting high-availability production systems and on-call operations.
- Knowledge of incident management and reliability engineering practices.
- Familiarity with data platform technologies such as Kafka, Spark, Elasticsearch, Redis, or MongoDB.
- Understanding of AI-driven operational tooling and automated remediation concepts.
- Excellent communication, collaboration, and problem-solving skills.
- Resides on the East Coast.

Benefits

- Pre-IPO Stock Options
- Medical, Dental & Vision care
- 401(k)
- Employee Assistance Program
- Employee Discount Program
- Life Insurance
- Paid time off
- Referral Program
- Rewards and Recognition Program

Compensation

The base compensation range for this role is USD 165,000-215,000 per year. Total compensation includes bonus opportunity and equity, and will vary based on candidate location.

More DevOps Engineer Jobs

Full Stack Developer – DevOps, Cloud Systems

BrightOrder Inc.

Fleet Maintenance, Transportation and Logistics Management Software on the Cloud!

Full Time | Remote | Team 51-200 | Since 1996 | H1B No Sponsor

• Design, develop, and maintain full stack applications, backend services, APIs, and internal tools.
• Write clean, maintainable code using Python, JavaScript/TypeScript, and Go (Golang).
• Build and enhance frontend or internal tooling using React.js or similar frameworks.
• Develop RESTful APIs, integrations, and service-to-service communication.
• Build automation scripts and internal tools to improve deployments, monitoring, and troubleshooting.
• Develop and maintain CI/CD pipelines for reliable releases.
• Configure and manage AWS services, including ECS, EKS, Lambda, EC2, RDS, S3, and CloudWatch.
• Optimize and maintain databases, including PostgreSQL and Amazon Aurora.
• Use RabbitMQ or similar tools for inter-service communication.
• Containerize and deploy applications using Docker and Kubernetes/EKS.
• Use infrastructure as code tools such as Terraform or CloudFormation.
• Build and maintain monitoring, logging, alerts, dashboards, and observability practices using tools such as New Relic, OpenTelemetry, CloudWatch, Prometheus, or Grafana to improve system reliability, performance visibility, and troubleshooting.
• Collaborate with Product, QA, DevOps, and development teams to deliver secure, scalable releases.
• Use tools such as Jira Service Management/Jira Service Desk to support issue tracking, escalation workflows, production support, and cross-functional communication with QA, Customer Support, and Development teams.
• Support production troubleshooting, root cause analysis, and system reliability improvements.
• Use AI tools such as Claude, ChatGPT, GitHub Copilot, or similar platforms to support development, testing, documentation, and troubleshooting.

Canada
Sr. Staff Site Reliability Engineer-Federal, Security Clearance

Zscaler

We make it easy to secure your cloud transformation. Get fast, secure, and direct access to apps without appliances.

Full Time | Remote | Team 5,001-10,000 | Since 2008 | H1B Sponsor

About Zscaler

Zscaler accelerates digital transformation to ensure our customers can be more agile, efficient, resilient, and secure. As an AI-forward enterprise, we are constantly pushing the envelope, leveraging the world’s largest security data lake to power our cloud-native Zero Trust Exchange platform. This innovation protects our customers from cyberattacks and data loss by securely connecting users, devices, and applications in any location.

Here, impact in your role matters more than title and trust is built on results. We say, impact over activity. We seek innovators who actively use AI to amplify their impact and who thrive in an environment where we leverage intelligent systems to stay ahead of evolving threats. We believe in transparency and value constructive, honest debate: we’re focused on getting to the best ideas, faster. We build high-performing teams that can make an impact quickly and with high quality. To do this, we are building a culture of execution centered on customer obsession, collaboration, ownership, and accountability. We value high-impact, high-accountability with a sense of urgency where you’re enabled to do your best work and embrace your potential. If you’re driven by purpose, thrive on solving complex challenges, and want to be part of the team that’s helping to secure the AI age, we invite you to bring your talents to Zscaler and help shape the future of cybersecurity.

Role

We are looking for a Sr. Staff Site Reliability Engineer (Federal) to join our Government Cloud team. This is a fully onsite role based in Crystal City, Virginia, reporting to the Manager, Site Reliability Engineering. You will join the team responsible for building the world’s largest cloud security platform, enabling organizations worldwide to harness speed and agility. In this critical role, you will maintain our commitment to security by managing operations within classified environments, ensuring our multitenant architecture remains the leader in cloud security.

What you’ll do (Role Expectations)

- Manage operational tasks for products in US Government classified environments, including deployments, on-call duties, incident management, and participation in regular deployment syncs
- Oversee all cloud infrastructure components such as AWS, private cloud environments, containers, and VMs to ensure stability and scalability
- Develop scripts, containerized services, and monitoring mechanisms to automate operations tasks and minimize service disruption
- Build new and enhance existing services within classified environments while driving DevOps best practices through documentation and escalation management
- Provide 24x7 coverage including night and holiday shifts within a SCIF environment to support critical government missions

Who You Are (Success Profile)

- You thrive in ambiguity. You’re comfortable building the path as you walk it. You thrive in a dynamic environment, seeing ambiguity not as a hindrance, but as the raw material to build something meaningful.
- You act like an owner. Your passion for the mission fuels your bias for action. You operate with integrity because you genuinely care about the outcome. True ownership involves leveraging dynamic range: the ability to navigate seamlessly between high-level strategy and hands-on execution.
- You are a problem-solver. You love running towards the challenges because you are laser-focused on finding the solution, knowing that solving the hard problems delivers the biggest impact.
- You are a high-trust collaborator. You are ambitious for the team, not just yourself. You embrace our challenge culture by giving and receiving ongoing feedback, knowing that candor delivered with clarity and respect is the truest form of teamwork and the fastest way to earn trust.
- You are a learner. You have a true growth mindset and are obsessed with your own development, actively seeking feedback to become a better partner and a stronger teammate. You love what you do and you do it with purpose.

What We’re Looking for (Minimum Qualifications)

- Active Secret Security Clearance with the ability to maintain it throughout employment
- Bachelor’s degree in Computer Science or a related field with 7+ years of Site Reliability Engineering experience in both Operations and Engineering environments
- Proficiency in Linux administration, network troubleshooting, and automation tools like Ansible and Terraform
- Strong technical skills in Python coding and container-based architectures including AWS ECS and Kubernetes
- Experience in monitoring activities such as vulnerability scanning, patch management, and reporting, with expertise in virtualization, web security, and networking protocols

What Will Make You Stand Out (Preferred Qualifications)

- Experience working within air-gapped and classified environments managing monthly monitoring programs
- Familiarity with High/Moderate FedRAMP authorization levels and compliance requirements
- Possession of Information Assurance Technician Level 2 certification or Top Secret security clearance

#LI-Onsite #LI-YC2

Zscaler’s salary ranges are benchmarked and are determined by role and level. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position across all US locations and could be higher or lower based on a multitude of factors, including job-related skills, experience, and relevant education or training. The base salary range listed for this full-time position excludes commission/bonus/equity (if applicable) + benefits.

Base Pay Range

$140,000-$200,000 USD

At Zscaler, we are committed to building a team that reflects the communities we serve and the customers we work with. We foster an inclusive environment that values all backgrounds and perspectives, emphasizing collaboration and belonging. Join us in our mission to make doing business seamless and secure.

Our Benefits program is one of the most important ways we support our employees. Zscaler proudly offers comprehensive and inclusive benefits to meet the diverse needs of our employees and their families throughout their life stages, including:

- Various health plans
- Time off plans for vacation and sick time
- Parental leave options
- Retirement options
- Education reimbursement
- In-office perks, and more!

Learn more about Zscaler’s Future of Work strategy, hybrid working model, and benefits here.

By applying for this role, you adhere to applicable laws, regulations, and Zscaler policies, including those related to security and privacy standards and guidelines.

Zscaler is committed to providing equal employment opportunities to all individuals. We strive to create a workplace where employees are treated with respect and have the chance to succeed. All qualified applicants will be considered for employment without regard to race, color, religion, sex (including pregnancy or related medical conditions), age, national origin, sexual orientation, gender identity or expression, genetic information, disability status, protected veteran status, or any other characteristic protected by federal, state, or local laws. See more information by clicking on the Know Your Rights: Workplace Discrimination is Illegal link.

Pay Transparency

Zscaler complies with all applicable federal, state, and local pay transparency rules.

Zscaler is committed to providing reasonable support (called accommodations or adjustments) in our recruiting processes for candidates who are differently abled, have long term conditions, mental health conditions or sincerely held religious beliefs, or who are neurodivergent or require pregnancy-related support.

Virginia
$140K - $200K / year
DevOps Engineer, GCP

Inspired Testing

Highly skilled professionals with attention to detail, using latest technologies, to ensure usability and reliability.

Full Time | Remote | Team 201-500 | H1B No Sponsor

• Ensure reliability, uptime, and performance across GCP environments.
• Implement SRE and DevOps best practices with strong focus on automation and scalability.
• Build and optimize CI/CD pipelines using GCP-native tools.
• Lead observability initiatives using Grafana, Prometheus, Stackdriver.
• Troubleshoot production incidents and deliver root-cause fixes.
• Apply Infrastructure as Code (Terraform, Deployment Manager).
• Partner with cross-functional teams to maintain platform stability.
• Champion a proactive, blameless incident management culture.
• Drive continuous improvement through emerging cloud and automation technologies.

United Kingdom

Role Description

In your role as DevOps/Infrastructure Admin, you will:

- Analyse, plan, and develop infrastructures for complex projects
- Be responsible for IT administration tasks
- Handle maintenance and continuous improvement of monitoring environments for software systems
- Design, implement, and maintain CI/CD pipelines to inform testers and developers about deployment status
- Analyse and resolve issues across the system landscape
- Design IT architectures, including evaluating software solutions and conceptualising new systems
- Solve critical problems affecting highly available development, test, and production systems

Qualifications

- A degree in a technical field (e.g. Computer Science or equivalent experience)
- Experience building and automating complex environments using Infrastructure as Code
- Strong knowledge of backend technologies (Docker, Kubernetes)
- Experience with build tools and continuous delivery systems (e.g. Bitbucket Pipelines, Azure DevOps Server)
- Excellent teamwork and communication skills in English and German
- An independent, structured working style and interest in dynamic team environments
- Valid work permit in the EU

Benefits

- A motivated and innovative team with flat hierarchies and open communication
- Full flexibility through 100% remote work
- Flexible working hours
- A wide variety of tasks in an innovative, future-oriented industry
- Strong development opportunities and the prospect of a permanent position
- An inclusive and supportive company culture

Germany