Job Closed

This listing is no longer active.

Optum, part of the UnitedHealth Group family of businesses, is a global organization that delivers care, aided by technology, to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best. Here, you will find a culture guided by inclusion, talented peers, comprehensive benefits and career development opportunities. Come make an impact on the communities we serve as you help us advance health optimization on a global scale. Join us to start Caring. Connecting. Growing together.

At Optum, we support your well-being with an understanding team, extensive benefits and rewarding opportunities. By joining us, you’ll have the resources to drive system transformation while we help you take care of your future. We recognize the power of connection to drive change, improve efficiency and make a difference in health care. Join a team where your skills and ideas can make an impact and where collaboration is key to creating technology that produces healthier outcomes.

Systems Management Analyst, Azure/DevOps - Remote

DevOps Engineer | Full Time | Remote | Senior | Team 160,000 | Since 2011

Location

Tennessee

Posted

99 days ago

Salary

$72.8K - $130K / year

Seniority

Senior

Bachelor Degree | 9 yrs exp | English

ARM, Azure DevOps, Bash, Bicep, CI/CD, Git, ITIL, Azure, PowerShell, Python, ServiceNow, SQL, Terraform

Job Description

Requisition Number: 2343918

Optum Tech is a global leader in health care innovation. Our teams develop cutting-edge solutions that help people live healthier lives and help make the health system work better for everyone. From advanced data analytics and AI to cybersecurity, we use innovative approaches to solve some of health care's most complex challenges. Your contributions here have the potential to change lives. Ready to build the next breakthrough? Join us to start Caring. Connecting. Growing together.

You'll enjoy the flexibility to work remotely* from anywhere within the U.S. as you take on some tough challenges. For all hires in the Minneapolis or Washington, D.C. area, you will be required to work in the office a minimum of four days per week.

Primary Responsibilities:

- System Monitoring & Maintenance:
  - Monitor servers, applications, and network systems for performance and availability
  - Perform routine maintenance, updates, and patches to ensure system stability and security
- Performance Analysis & Optimization:
  - Analyze system logs and metrics to identify bottlenecks and recommend improvements
  - Implement tuning and optimization strategies for hardware and software systems
- Incident & Problem Management:
  - Respond to system alerts and troubleshoot issues promptly
  - Document root cause analysis and implement preventive measures
- Automation & Process Improvement:
  - Develop scripts and automation tools to streamline system management tasks
  - Recommend and implement best practices for system administration and monitoring
- Compliance & Security:
  - Ensure systems comply with organizational security policies and regulatory standards
  - Apply patches and updates to mitigate vulnerabilities
- Collaboration & Support:
  - Work closely with application teams, network engineers, and security teams
  - Provide technical support and guidance to internal stakeholders

You'll be rewarded and recognized for your performance in an environment that will challenge you and give you clear direction on what it takes to succeed in your role as well as provide development for other roles you may be interested in.

Required Qualifications:

- Experience with Microsoft Azure cloud platform, including:
  - Hands-on experience with Infrastructure as Code (IaC) using ARM, Bicep, or Terraform
  - Solid knowledge of Azure architecture, governance, and core services
- Experience with database administration, including:
  - Writing and optimizing T-SQL queries
  - Understanding of relational database concepts and troubleshooting
- Experience with ITSM/Incident Management (e.g. ServiceNow), including:
  - Creating, managing, and resolving Incidents, Problems, and Changes
  - Understanding of ITIL processes and SLAs
  - Accurate documentation of technical issues and root cause analysis
  - Working within enterprise ticketing queues and prioritizing according to severity/impact
- Understanding of DevOps methodologies, including:
  - Source control using GitHub or Azure DevOps
  - CI/CD pipeline development and automation best practices
- Proficiency in scripting languages, such as:
  - Python
  - Bash
- AI Skills:
  - All resources are expected to demonstrate baseline proficiency in enterprise-approved AI tools as part of their day-to-day responsibilities. This includes, but is not limited to:
    - Consistent Use: Maintain a minimum of 90% weekly usage of AI tools such as GitHub Copilot, Microsoft 365 Copilot, and other GenAI platforms approved by the enterprise
    - Applied Productivity: Leverage AI tools to enhance coding, documentation, data analysis, and decision-making workflows
    - Continuous Learning: Stay current with evolving AI capabilities and features, and apply them to improve delivery quality and velocity

Preferred Qualifications:

- Bachelor's degree in Computer Science, Information Technology, or related field (or equivalent experience)
- Relevant certifications such as Microsoft Certified, Red Hat Certified Engineer (RHCE), VMware VCP, or ITIL
- 6+ years of relevant experience in system management or infrastructure engineering
- Proficiency in scripting languages, such as PowerShell

*All employees working remotely will be required to adhere to UnitedHealth Group's Telecommuter Policy

Pay is based on several factors including but not limited to local labor markets, education, work experience, certifications, etc. In addition to your salary, we offer benefits such as a comprehensive benefits package, incentive and recognition programs, equity stock purchase and 401k contribution (all benefits are subject to eligibility requirements). No matter where or when you begin a career with us, you'll find a far-reaching choice of benefits and incentives. The salary for this role will range from $72,800 to $130,000 annually based on full-time employment. We comply with all minimum wage laws as applicable.

Application Deadline: This will be posted for a minimum of 2 business days or until a sufficient candidate pool has been collected. Job posting may come down early due to volume of applicants.

At UnitedHealth Group, our mission is to help people live healthier lives and make the health system work better for everyone. We believe everyone, of every race, gender, sexuality, age, location and income, deserves the opportunity to live their healthiest life. Today, however, there are still far too many barriers to good health which are disproportionately experienced by people of color, historically marginalized groups and those with lower incomes. We are committed to mitigating our impact on the environment and enabling and delivering equitable care that addresses health disparities and improves health outcomes - an enterprise priority reflected in our mission.

UnitedHealth Group is an Equal Employment Opportunity employer under applicable law and qualified applicants will receive consideration for employment without regard to race, national origin, religion, age, color, sex, sexual orientation, gender identity, disability, or protected veteran status, or any other characteristic protected by local, state, or federal laws, rules, or regulations.

UnitedHealth Group is a drug-free workplace. Candidates are required to pass a drug test before beginning employment.

Benefits

401(K), Dental insurance, Disability insurance, Employee stock purchase plan, Family medical leave, Flexible Spending Account (FSA), Generous parental leave, Generous PTO, Health insurance, Job training & conferences, Life insurance, Charitable contribution matching, Paid holidays, Paid sick days, Performance bonus, Tuition reimbursement, Vision insurance, Mental health benefits, Personal development training, Bereavement leave benefits

More DevOps Engineer Jobs

DevOps Engineer

BlackLine

BlackLine is a leading global provider of cloud software that controls and automates accounting and finance processes for businesses and organizations of all si

DevOps Engineer | 99 days ago

Full Time | Remote

Get to Know Us:

It's fun to work in a company where people truly believe in what they're doing! At BlackLine, we're committed to bringing passion and customer focus to the business of enterprise applications. Since being founded in 2001, BlackLine has become a leading provider of cloud software that automates and controls the entire financial close process. Our vision is to modernize the finance and accounting function to enable greater operational effectiveness and agility, and we are committed to delivering innovative solutions and services to empower accounting and finance leaders around the world to achieve Modern Finance.

Being a best-in-class SaaS Company, we understand that bringing in new ideas and innovative technology is mission critical. At BlackLine we are always working with new, cutting-edge technology that encourages our teams to learn something new and expand their creativity and technical skillset that will accelerate their careers.

WiseLayer by BlackLine is transforming finance and accounting with AI-powered agents like Angela (accruals.ai), Dennis (discrepancies.ai), and other task-specific agents used by top companies. Our AI simplifies complex tasks, delivering real business value. WiseLayer was recently acquired by BlackLine, Inc. (Nasdaq: BL), the future-ready platform for the Office of the CFO. Work, Play and Grow at BlackLine!

Make Your Mark:

As a DevOps Engineer, you will be responsible for building, securing, and operating the systems that power WiseLayer's production environment. You will play a central role in executing our cloud migration roadmap, improving system reliability, and ensuring our platform meets enterprise-grade security and compliance expectations.

You'll Get To:

- Lead execution of GCP migration and modernization initiatives, including IAM, networking, and infrastructure-as-code
- Own CI/CD pipelines, deployment workflows, and environment standardization
- Implement and maintain observability, logging, and alerting across all services
- Drive security hardening initiatives including key rotation, access controls, and vulnerability remediation
- Support data privacy and compliance requirements by improving data handling, retention, and access controls
- Partner with engineering teams to improve reliability, scalability, and release velocity
- Support migration and optimization of cloud services (compute, storage, messaging, databases)
- Design and maintain disaster recovery and business continuity processes

What You'll Bring:

- Infrastructure as Code (IaC): Proven experience (3+ years) with IaC tools such as Terraform, Ansible, or AWS CloudFormation to automate the provisioning, configuration, and management of infrastructure, ensuring consistency and reliability in our deployments.
- CI/CD Pipeline Management: Demonstrated ability, with 3+ years of experience, in designing, implementing, and maintaining CI/CD pipelines using tools like Jenkins, GitLab CI, or CircleCI. You will automate the build, test, and deployment processes to enable faster and more reliable software delivery.
- Cloud Computing Platforms: Strong knowledge, with a minimum of 3 years of experience, of at least one major cloud platform such as AWS, Azure, or Google Cloud. You will be responsible for managing and optimizing these environments for high availability and scalability.
- Containerization and Orchestration: 3+ years of hands-on experience with containerization technologies like Docker and container orchestration platforms like Kubernetes. This is essential for packaging, deploying, and managing applications in a consistent and scalable manner.
- Monitoring and Logging: 3+ years of expertise in setting up and managing monitoring and logging solutions using tools like Prometheus, Grafana, or the ELK Stack. You will track system performance, availability, and security to proactively identify and resolve issues.
- Scripting and Automation: 3+ years of proficiency in scripting languages such as Python, Bash, or PowerShell to automate repetitive tasks and create custom tools. This skill is crucial for improving efficiency and reducing manual errors in our operations.

We're Even More Excited If You Have:

- Experience with DevSecOps Principles: Familiarity with integrating security practices within the CI/CD pipeline. Experience with security scanning tools (e.g., SonarQube, Trivy, or similar) and a mindset of "shifting security left" to identify and address vulnerabilities early in the development lifecycle is highly desirable.
- Knowledge of Database Management: Understanding of database administration, performance tuning, and scaling for both SQL (e.g., PostgreSQL, MySQL) and NoSQL (e.g., MongoDB, Redis) databases. The ability to automate database provisioning and backups is a significant plus.
- Advanced Networking Concepts: A solid grasp of networking fundamentals, including VPCs, subnets, routing, and firewalls in a cloud environment. Experience with software-defined networking (SDN) and service mesh technologies like Istio or Linkerd would be a strong advantage.

Thrive at BlackLine Because You Are Joining:

- A technology-based company with a sense of adventure and a vision for the future. Every door at BlackLine is open. Just bring your brains, your problem-solving skills, and be part of a winning team at the world's most trusted name in Finance Automation!
- A culture that is kind, open, and accepting. It's a place where people can embrace what makes them unique, and the mix of cultural backgrounds and varying interests cultivates diverse thought and perspectives.
- A culture where BlackLiners' continued growth and learning is empowered. BlackLine offers a wide variety of professional development seminars and inclusive affinity groups to celebrate and support our diversity.

BlackLine is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to sex, gender identity or expression, race, ethnicity, age, religious creed, national origin, physical or mental disability, ancestry, color, marital status, sexual orientation, military or veteran status, status as a victim of domestic violence, sexual assault or stalking, medical condition, genetic information, or any other protected class or category recognized by applicable equal employment opportunity or other similar laws.

BlackLine recognizes that the ways we work and the workplace itself have shifted. We innovate in a workplace that optimizes a combination of virtual and in-person interactions to maximize collaboration and nurture our culture. Candidates who live within a reasonable commute to one of our offices will work in the office at least 2 days a week.

Salary Range: USD $136,000.00/Yr. - USD $170,000.00/Yr.

Pay Transparency Statement: Placement within this range depends upon several factors, including the applicant's prior relevant job experience, skill set, and geographic location. In addition to base pay, BlackLine also offers short-term and long-term incentive programs, based on eligibility, along with a robust offering of benefit and wellness plans.

Accommodations: BlackLine is committed to creating an inclusive and accessible experience for all candidates. If you require a reasonable accommodation that would better enable your success during the application or interview process, please complete this form.

Ansible, AWS CloudFormation, Bash, CircleCI, Docker, ELK Stack, GCP, GitLab CI, Grafana, Istio, Jenkins, Kubernetes, Linkerd, MongoDB, MySQL, PostgreSQL, PowerShell, Prometheus, Python, Redis, SQL, Terraform

New York

$136K - $170K / year

Job Closed

Site Reliability Engineer

Everforth

Everforth Apex, a division of Everforth and formerly Apex Systems, an IT staffing and workforce solutions firm, provides recruiting and staffing services to lar

DevOps Engineer | 99 days ago

Other | Hybrid

Site Reliability Engineer - SRE

Job#: 3023447
Location: Austin, Texas (Hybrid)
Employment Type: Contract to Perm

Role Overview

Our organization is seeking a motivated Site Reliability Engineer (SRE) to join our dynamic Advisor Platform Engineering team. This role focuses on safeguarding the availability, performance, and scalability of our mission-critical, Azure-hosted platform. As an individual contributor, you will apply your expertise in cloud infrastructure, automation, and observability to maintain and enhance our systems, collaborating closely with Agile development teams to embed reliability principles throughout the application lifecycle.

Key Responsibilities

- Monitor, maintain, and optimize Azure infrastructure, ensuring the health, performance, and availability of IaaS, PaaS, and SaaS components.
- Enhance observability by defining, measuring, and refining Service Level Indicators (SLIs) and Service Level Objectives (SLOs) using Azure Monitor, Application Insights, and Log Analytics (KQL).
- Develop automation and tooling using scripting languages (PowerShell, Bash, Python) and potentially C#/.NET to eliminate manual tasks and improve efficiency.
- Participate in a 24/7 on-call rotation, contributing to incident triage, mitigation, root cause analysis (RCA), and the implementation of preventive actions.
- Collaborate with software development, QA, and other technology teams to ensure reliability, scalability, and performance requirements are met.
- Contribute to capacity planning, load testing, and performance tuning initiatives across a .NET and React microservice architecture.
- Create and maintain clear documentation for systems, processes, and runbooks.
- Troubleshoot and support integrations with third-party systems via APIs and SSO implementations.

Required Qualifications

Education: A Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field, or equivalent practical experience is required.

Experience: This position requires 2-5 years of experience in Site Reliability Engineering, DevOps, Cloud Operations/Engineering, Systems Administration, or Software Engineering with a strong operational focus. Hands-on experience with platform development, automation strategies, and monitoring strategies within Azure is necessary. Familiarity with SQL and the .NET stack is also required for effective application troubleshooting.

Technical Skills:

- Hands-on experience managing and troubleshooting production workloads on Microsoft Azure (IaaS & PaaS).
- Proficiency with Azure monitoring tools (Azure Monitor, Application Insights, Log Analytics) and the KQL query language.
- Solid scripting skills for automation (e.g., PowerShell, Bash, Python).
- Experience with CI/CD concepts and tools, particularly Azure DevOps pipelines.
- Proficiency with Git workflows and platforms like Azure Repos or GitHub.
- Solid understanding of networking concepts (TCP/IP, DNS, HTTP/HTTPS, TLS, Load Balancing, Firewalls).

Preferred Qualifications

- A basic understanding of, or development experience with, C#/.NET applications.
- Experience with Infrastructure as Code (IaC) tools like ARM templates, Bicep, or Terraform.
- Familiarity with containerization technologies (Docker) and orchestration (Kubernetes, Azure Container Apps).
- Experience supporting relational (e.g., Azure SQL) and NoSQL (e.g., Cosmos DB) databases.
- Basic familiarity with modern JavaScript front-end technologies like React/TypeScript.
- Experience working in Agile development environments or within the financial services industry.
- Microsoft Certified: Azure Administrator Associate (AZ-104) or DevOps Engineer Expert (AZ-400).

Work Environment

This is a hybrid position requiring on-site work in South Austin, TX, from Monday to Thursday, with an option for remote work on Fridays. The role includes participation in a 24/7 on-call rotation to support critical production releases and incidents outside of standard business hours.

This employer is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or protected veteran status and will not be discriminated against on the basis of disability.

Apex uses a virtual recruiter as part of the application process. Click here for more details.

If you have visited our website in search of information on employment opportunities or to apply for a position, and you require an accommodation in using our website for a search or application, please contact our Benefits Department at [email protected] or 804-523-8228.

Apex Systems is a world-class IT services company that serves thousands of clients across the globe. When you join Apex, you become part of a team that values innovation, collaboration, and continuous learning. We offer quality career resources, training, certifications, development opportunities, and a comprehensive benefits package. Our commitment to excellence is reflected in many awards, including ClearlyRated's Best of Staffing in Talent Satisfaction in the United States and Great Place to Work in the United Kingdom and Mexico.

Apex Benefits Overview: Apex offers a range of supplemental benefits, including medical, dental, vision, life, disability, and other insurance plans that offer an optional layer of financial protection. We offer an ESPP (employee stock purchase program) and a 401K program which allows you to contribute typically within 30 days of starting, with a company match after 12 months of tenure. Apex also offers an HSA (Health Savings Account on the HDHP plan), a SupportLinc Employee Assistance Program (EAP) with up to 8 free counseling sessions, a corporate discount savings program and other discounts. In terms of professional development, Apex hosts an on-demand training program, provides access to certification prep and a library of technical and leadership courses/books/seminars once you have 6+ months of tenure, and certification discounts and other perks to associations that include CompTIA and IIBA. Apex has a dedicated customer service team for our Consultants that can address questions around benefits and other resources, as well as a certified Career Coach. You can access a full list of our benefits, programs, support teams and resources within our 'Welcome Packet' as well, which an Apex team member can provide.

Employee Type: Contract
Location: Austin, TX, US
Job Type: Engineering and Technicians
Date Posted: March 30, 2026
Pay Range: $60 - $70 per hour

Texas

$60 - $70 / hour

Senior Site Reliability Engineer

Zeta Global

We deliver better experiences for consumers and better results for your brand.

DevOps Engineer | 99 days ago

Full Time | Remote | Team 1,001-5,000 | Since 2007 | H1B Sponsor

WHO WE ARE

Zeta Global (NYSE: ZETA) is the AI-Powered Marketing Cloud that leverages advanced artificial intelligence (AI) and trillions of consumer signals to make it easier for marketers to acquire, grow, and retain customers more efficiently. Through the Zeta Marketing Platform (ZMP), our vision is to make sophisticated marketing simple by unifying identity, intelligence, and omnichannel activation into a single platform – powered by one of the industry’s largest proprietary databases and AI. Our enterprise customers across multiple verticals are empowered to personalize experiences with consumers at an individual level across every channel, delivering better results for marketing programs. Zeta was founded in 2007 by David A. Steinberg and John Sculley and is headquartered in New York City with offices around the world. To learn more, go to www.zetaglobal.com.

The Role

We’re looking for an experienced Senior Site Reliability Engineer (SRE) who can write production-grade code, has mastery of SLIs, SLOs, and error budgets, and is passionate about building scalable observability systems. If you:

- Can code confidently in Python or Golang and solve real-world problems through automation (not only scripting).
- Have hands-on experience implementing SLIs, SLOs, and distributed tracing in production.
- Understand Kubernetes, Terraform, and Infrastructure as Code tools.
- Have hands-on experience with Chaos Engineering and anomaly detection.
- Are excited about working with high-throughput, distributed systems processing millions of transactions daily…

Then this role might be for you!

Key Responsibilities:

- Design, implement, and manage SLOs, SLIs, and error budgets, ensuring reliability aligns with user expectations and business objectives.
- Develop production-grade software to enhance system reliability and reduce manual toil through automation.
- Implement and optimize observability solutions using tools like OpenTelemetry, with a focus on high-cardinality metrics, distributed tracing, and actionable insights.
- Drive postmortem processes and lead in-depth root cause analyses for incidents, ensuring lessons learned are effectively applied to prevent recurrence.
- Define and monitor MTTx metrics (MTTA, MTTR, MTTF), using them to guide system improvements and measure reliability progress.
- Design and participate in Chaos Engineering exercises.
- Collaborate with engineering teams to design systems with reliability and scalability in mind, incorporating capacity planning, resiliency patterns, and modern deployment strategies (e.g., Canary, Blue-Green).
- Lead design reviews for alerting strategies, ensuring effective signal-to-noise ratios in monitoring and incident management.
- Advocate for and implement best practices in incident response and system design to achieve optimal uptime and performance.

Your experience:

Strong Coding Background:

- 4+ years of experience as an SRE or in a similar role with hands-on coding.
- 3+ years of software development experience in Python or Golang, with a focus on building maintainable, production-quality code.

SRE Expertise:

- Deep understanding of SRE principles, particularly SLIs, SLOs, error budgets, and their real-world application.
- Hands-on experience conducting postmortems and implementing observability at scale.
- Hands-on experience conducting chaos engineering exercises.

Observability Skills:

- Expertise in designing and implementing end-to-end observability solutions using tools like OpenTelemetry, Prometheus, Grafana, or Honeycomb.
- Experience with distributed tracing and handling high-cardinality metrics in production environments.

Infrastructure Knowledge:

- 3+ years of experience with AWS and proficiency in Kubernetes, Terraform, and Infrastructure as Code (IaC) tools.
- Strong understanding of distributed systems, microservices architectures, and containerization (Docker, Kubernetes).

Monitoring and Automation:

- Hands-on experience with CI/CD platforms (GitOps, Jenkins, ArgoCD) and building automated pipelines.
- Familiarity with tools and frameworks for incident management and operational automation.

Additional Skills:

- Knowledge of modern deployment strategies (e.g., Canary, Blue-Green) and resiliency patterns (e.g., circuit breakers, retries).
- Strong analytical skills for statistical analysis of metrics to identify and resolve performance bottlenecks.

BENEFITS & PERKS

- Unlimited PTO
- Excellent medical, dental, and vision coverage
- Employee Equity and Stock Purchase Plan
- Employee Discounts, Virtual Wellness Classes, and Pet Insurance

And more!!

COMPENSATION RANGE

The compensation range for this role is $140,000.00 - $170,000.00, depending on location and experience.

PEOPLE & CULTURE AT ZETA

Zeta considers applicants for employment without regard to, and does not discriminate on the basis of an individual’s sex, race, color, religion, age, disability, status as a veteran, or national or ethnic origin; nor does Zeta discriminate on the basis of sexual orientation, gender identity or expression. We’re committed to building a workplace culture of trust and belonging, so everyone feels invited to bring their whole selves to work. We provide a forum for employees to celebrate, support and advocate for one another. Learn more about our commitment to diversity, equity and inclusion here: https://zetaglobal.com/blog/a-look-into-zetas-ergs/

ZETA IN THE NEWS! https://zetaglobal.com/press/?cat=press-release

ArgoCD, CI/CD, Docker, GitOps, Go, Grafana, Honeycomb, Jenkins, Kubernetes, OpenTelemetry, Prometheus, Python, Terraform

United States

$140K - $170K / year

Job Closed

Staff Site Reliability Engineer

Zscaler

We make it easy to secure your cloud transformation. Get fast, secure, and direct access to apps without appliances.

DevOps Engineer | 99 days ago

Full Time | Remote | Team 5,001-10,000 | Since 2008 | H1B Sponsor

About Zscaler

Zscaler accelerates digital transformation to ensure our customers can be more agile, efficient, resilient, and secure. As an AI-forward enterprise, we are constantly pushing the envelope, leveraging the world’s largest security data lake to power our cloud-native Zero Trust Exchange platform. This innovation protects our customers from cyberattacks and data loss by securely connecting users, devices, and applications in any location.

Here, impact in your role matters more than title and trust is built on results. We say, impact over activity. We seek innovators who actively use AI to amplify their impact and who thrive in an environment where we leverage intelligent systems to stay ahead of evolving threats. We believe in transparency and value constructive, honest debate; we’re focused on getting to the best ideas, faster. We build high-performing teams that can make an impact quickly and with high quality. To do this, we are building a culture of execution centered on customer obsession, collaboration, ownership, and accountability. We value high-impact, high-accountability with a sense of urgency where you’re enabled to do your best work and embrace your potential.

If you’re driven by purpose, thrive on solving complex challenges, and want to be part of the team that’s helping to secure the AI age, we invite you to bring your talents to Zscaler and help shape the future of cybersecurity.

Role

We are looking for a Staff Site Reliability Engineer to join our team. This role will report to the Senior Manager, Site Reliability Engineering and offers the flexibility of hybrid (3 days a week) out of San Jose, CA, or can be performed fully remote. As a key member of the Zero Trust Exchange team, you will be responsible for all aspects of the Zscaler production data center services, including servers, operating systems, storage, and supporting systems. You will be an instrumental part of the Site Reliability Engineering team, ensuring the availability, latency, performance, efficiency, and scalability of a cloud that processes tens of billions of transactions daily.

What you’ll do (Role Expectations)

- Own the reliability of a large-scale cloud service (Linux/BSD, bare metal, Kubernetes, custom load balancing, SD-WAN) by partnering with Engineering and Network teams to define requirements early, conduct operability reviews, and contribute code/design docs for platform resilience
- Develop and operate end-to-end observability (metrics/logs/traces, dashboards, alerting) and incident tooling to manage SLOs/error budgets, reduce noise, and improve system detection and diagnosis
- Participate in an on-call rotation to lead full-cycle incident response; perform deep cross-stack troubleshooting (OS, networking, distributed systems, packet captures, core dumps) to drive permanent software fixes and codify learnings into runbooks and tests
- Build and maintain everything-as-code for fleet and service lifecycle, driving provisioning, configuration, release automation, canary deployments, and complex rollout/rollback workflows
- Continuously improve platform hygiene through consistent OS/app upgrades, dependency/vulnerability patching, capacity and performance tuning, and strict CI/CD validation prior to production rollouts

Who You Are (Success Profile)

- You act like an owner. Your passion for the mission fuels your bias for action. You operate with integrity because you genuinely care about the outcome. You adapt to what’s needed, navigating seamlessly between high-level strategy and hands-on execution.
- You are a problem-solver. You seek out challenges because you are energized by finding solutions, knowing that solving the hard problems delivers the biggest impact.
- You are a high-trust collaborator. You are ambitious for the team, not just yourself. You embrace our challenge culture by giving and receiving ongoing feedback, knowing that candor delivered with clarity and respect is the truest form of teamwork and the fastest way to earn trust.
- You operate with urgency. You understand that in a high-growth environment, speed and quality are not mutually exclusive. You have a relentless focus on execution and a bias for action, delivering high-impact results quickly to win for the customer and the team.
- You think at scale. You connect your day-to-day work to the larger company mission and think globally. You build solutions, processes, and teams that are not just effective today but are built to last and support a high-growth, global organization.

What We’re Looking for (Minimum Qualifications)

- US Citizenship is required (due to the nature of assigned customers) and 5+ years industry experience in software engineering, infrastructure software, and/or platform engineering
- Proficiency in at least one programming language (such as Python, Bash, or Go) with demonstrated ability to write production-quality code (testing, code reviews, CI, maintainable design, scripting for diagnostics)
- Strong Linux/Unix systems fundamentals (process/memory, filesystems, networking stack basics, debugging/perf troubleshooting) and solid understanding of networking protocols and components (e.g., HTTP, DNS, TCP/IP, ICMP, OSI model, subnetting, and load balancing/traffic concepts)
- Proven experience operating production services (including incident response, troubleshooting, reducing toil) and ability to participate in on-call rotations and support occasional after-hours or weekend deployments
- Managing BSD in production, with a focus on driving systemic fixes through platform engineering

What Will Make You Stand Out (Preferred Qualifications)

- Proven expertise in operating Kubernetes at scale
- Deep experience with the Prometheus/OpenTelemetry ecosystems, including instrumenting golden signals, defining SLOs, and performing alert tuning to ensure high-availability environments

Zscaler’s salary ranges are benchmarked and are determined by role and level. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position across all US locations and could be higher or lower based on a multitude of factors, including job-related skills, experience, and relevant education or training. The base salary range listed for this full-time position excludes commission/bonus/equity (if applicable) + benefits.

Base Pay Range: $119,000 - $170,000 USD

At Zscaler, we are committed to building a team that reflects the communities we serve and the customers we work with. We foster an inclusive environment that values all backgrounds and perspectives, emphasizing collaboration and belonging. Join us in our mission to make doing business seamless and secure.

Our Benefits program is one of the most important ways we support our employees. Zscaler proudly offers comprehensive and inclusive benefits to meet the diverse needs of our employees and their families throughout their life stages, including:

- Various health plans
- Time off plans for vacation and sick time
- Parental leave options
- Retirement options
- Education reimbursement
- In-office perks, and more!

Learn more about Zscaler’s Future of Work strategy, hybrid working model, and benefits here.

By applying for this role, you adhere to applicable laws, regulations, and Zscaler policies, including those related to security and privacy standards and guidelines.

Zscaler is committed to providing equal employment opportunities to all individuals. We strive to create a workplace where employees are treated with respect and have the chance to succeed.
All qualified applicants will be considered for employment without regard to race, color, religion, sex (including pregnancy or related medical conditions), age, national origin, sexual orientation, gender identity or expression, genetic information, disability status, protected veteran status, or any other characteristic protected by federal, state, or local laws. See more information by clicking on the Know Your Rights: Workplace Discrimination is Illegal link. Pay Transparency Zscaler complies with all applicable federal, state, and local pay transparency rules. Zscaler is committed to providing reasonable support (called accommodations or adjustments) in our recruiting processes for candidates who are differently abled, have long term conditions, mental health conditions or sincerely held religious beliefs, or who are neurodivergent or require pregnancy-related support.

Bash, DNS, Firewalls, Grafana, HTTP, ICMP, Load Balancing, Nagios, OSI Model, Prometheus, Python, TCP/IP

View details: Staff Site Reliability Engineer

California

$119K - $170K / year
