To enable broadband service providers of all sizes to simplify, innovate and grow.
Staff Site Reliability Operations Engineer
Location
California
Posted
3 days ago
Salary
$136K - $265.7K / year
Seniority
Lead
Job Description
Calix
- Architect, optimize, and troubleshoot complex networking infrastructure.
- Design, scale, and optimize our unified observability platform.
- Deploy machine learning models and automated anomaly detection.
- Drive the architecture, scaling, and security of production Google Kubernetes Engine (GKE) clusters.
- Tune and maintain high-throughput Apache Kafka clusters.
- Ensure performance, scalability, and disaster recovery readiness across PostgreSQL, AlloyDB, and BigQuery.
- Integrate AIOps insights with Grafana workflows to automate triage and analysis.
- Coach engineers on advanced debugging techniques and distributed systems.
Job Requirements
- 8+ years in SRE, Production Engineering, or Distributed Systems infrastructure roles.
- Deep technical knowledge and debugging mastery across all OSI layers.
- Expert-level mastery of Google Kubernetes Engine (GKE) internals.
- Proven track record managing high-throughput Apache Kafka pipelines.
- Deep, hands-on experience deploying and managing Grafana Enterprise/Cloud.
- Advanced, production-scale expertise utilizing HashiCorp Terraform.
- High proficiency in Go and Python.
Benefits
- As a part of the total compensation package, this role may be eligible for a bonus.
More DevOps Engineer Jobs
Site Reliability Lead Engineer
Mastercard
Founded in 1966, Mastercard is a worldwide transaction, payment-processing, and consulting company best known for its line of personal and business credit cards. As an employer, Ma
Our Purpose
Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we're helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships and networks combine to deliver a unique set of products and services that help people, businesses and governments realize their greatest potential.
Title and Summary
Site Reliability Lead Engineer
Lead Site Reliability Engineer
Who is Mastercard?
At Mastercard technology, we work to connect and power an inclusive, digital economy that benefits everyone, everywhere, by making transactions safe, simple, smart, and accessible. Using secure data and networks, partnerships, and passion, our innovations and solutions help individuals, financial institutions, governments, and businesses realize their greatest potential.
Our decency quotient, or DQ, drives our culture and everything we do inside and outside of our company. We cultivate a culture of inclusion for all employees that respects their individual strengths, views, and experiences. We believe that our differences enable us to be a better team - one that makes better decisions, drives innovation, and delivers better business results.
Technology at Mastercard
What we create today will define tomorrow. Revolutionary technologies that reshape the digital economy to be more connected and inclusive than ever before. Safer, faster, more sustainable. And we need the best people to do it. Technologists who are energized by the challenges of a truly global network. With the talent and vision to create the critical systems and products that power global commerce and connect people everywhere to the vital goods and services they need every day.
Working at Mastercard means being part of a unique culture. Inclusive and diverse, a rich collaboration of ideas and perspectives. A place that celebrates your strengths, values your experiences, and offers you the flexibility to shape a career across disciplines and continents. And the opportunity to work alongside experts and leaders at every level of the business, improving what exists, and inventing what's next.
About the Role
The Business Operations (Biz Ops) team is seeking a Business Operations Site Reliability Engineer (SRE). The role of the Business Operations organization is to be the production readiness steward for Mastercard products. As Business Operations SREs, we are responsible for ensuring that our platform is stable and healthy. We break down barriers to running our products by fostering developer-run ownership and empowering developers to build resilient products. We support our developers during the application build phase with software run principles, including operational design, automation, capacity planning, and monitoring, that lead to fault-tolerant, scalable products. We see the big picture and help create and enforce operations standards while facilitating an agile and learning culture.
All about you
We are seeking a highly motivated and experienced Sr./Lead/Principal Site Reliability Engineer (SRE) to join our growing team. You will play a critical role in ensuring the reliability, scalability, and performance of our applications, supporting essential services that power Mastercard's global operations. As a thought leader in your field, you will bring technical expertise, a passion for automation, and the ability to mentor.
We support daily operations with a hyper focus on triage and root cause analysis, understanding the business impact of our products and subsequently performing blameless post-mortems. The goal of every Business Operations team is to engage early in the development lifecycle to be more proactive and upfront in the development process, and to proactively manage production and change activities to maximize customer experience and increase the overall value of supported applications. Business Operations teams also focus on risk management by tying all our activities together with an overarching responsibility for compliance and risk mitigation across all our environments. Ultimately, the role of Business Operations is to align Product and Customer Focused priorities with Operational needs by providing continuous feedback throughout the lifecycle.
Corporate Security Responsibility
All activities involving access to Mastercard assets, information, and networks come with an inherent risk to the organization and, therefore, it is expected that every person working for, or on behalf of, Mastercard is responsible for information security and must:
- Abide by Mastercard's security policies and practices;
- Ensure the confidentiality and integrity of the information being accessed;
- Report any suspected information security violation or breach; and
- Complete all periodic mandatory security trainings in accordance with Mastercard's guidelines.
- Support the full release lifecycle (intake → validation → release → post-release tracking)
- Validate release requests, including dependencies, readiness, and required inputs prior to submission
- Coordinate timelines, milestones, and deliverables across stakeholders
- Track release status, risks, and blockers and drive resolution to ensure on-time delivery
DevOps Engineer
Remote Recruitment
Remote Recruitment operates as a full-service employment agency providing recruitment/staffing for UK-based companies.
- Design, build, and maintain CI/CD pipelines to support fast and reliable software delivery
- Manage cloud infrastructure on AWS or Azure using infrastructure-as-code
- Implement monitoring, alerting, and observability tools across all environments
- Collaborate with developers to improve build, test, and deployment processes
- Maintain security best practices across infrastructure and pipeline configurations
DevOps Engineer
Leap Tools
Leap Tools is an equal opportunity employer committed to fostering an inclusive, equitable, and accessible environment. Accommodations are available on request for candidates taking part in all aspects of the interview process. If you require any accommodation, please contact us at ta@leaptools.com.
Role Description
At Leap Tools, we are building the world's most advanced solutions for the interior décor industry. Our technology lets you preview products in your own room before you buy them. You'll be responsible for a variety of development-related automation tasks that involve:
- Smooth operation of state-of-the-art production systems used by millions of users
- Engineering tools (e.g. process and work tracking)
- CI/CD infrastructure and release automation
- Testing infrastructure (various environments and stages)
- Investigation and resolution of scalability bottlenecks and production incidents
- Communicating and sharing knowledge with peers, QA Engineers, and Developers/Engineers
Qualifications
- Strong computer science fundamentals based on a degree in computer science or distinctive work experience in software development
- Experience with Kubernetes, AWS, or GCP in a cloud environment
- Shell scripting prowess in a Linux environment
- Development track record in at least one of the following languages: Python, JavaScript, TypeScript, Java, C and/or C++
- Ability to develop foundational engineering infrastructure to be used across the entire company
- A demonstrated ability to provide guidance, mentorship, and support
- Exceptional attention to detail and focus on quality
- Strong communication skills for capturing requirements, as well as sharing designs and progress
Requirements
- Comfortable maintaining a personal Linux box and customizing it
- Ability to set up complex systems that work flawlessly
- Open-mindedness to listen and discover challenges
Benefits
- Remote-first work environment
- Work anywhere in the world for up to 3 months
- Parental leave program
- Work-from-home stipend
- Your birthday (and our company's birthday) is a day off!
Company Description
Leap Tools is an equal opportunity employer committed to fostering an inclusive, equitable, and accessible environment. Accommodations are available on request for candidates taking part in all aspects of the interview process.



