Founded in 1966, Mastercard is a worldwide transaction, payment-processing, and consulting company best known for its line of personal and business credit cards. As an employer, Ma
Senior Site Reliability Engineer
Location
Ireland
Posted
3 days ago
Salary
0
Seniority
Senior
Job Description
Mastercard
Our Purpose
Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we're helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships and networks combine to deliver a unique set of products and services that help people, businesses and governments realize their greatest potential.

Title and Summary
Senior Site Reliability Engineer

Who is Mastercard?
At Mastercard technology, we work to connect and power an inclusive, digital economy that benefits everyone, everywhere, by making transactions safe, simple, smart, and accessible. Using secure data and networks, partnerships, and passion, our innovations and solutions help individuals, financial institutions, governments, and businesses realize their greatest potential. Our decency quotient, or DQ, drives our culture and everything we do inside and outside of our company. We cultivate a culture of inclusion for all employees that respects their individual strengths, views, and experiences. We believe that our differences enable us to be a better team - one that makes better decisions, drives innovation, and delivers better business results.

Technology at Mastercard
What we create today will define tomorrow. Revolutionary technologies that reshape the digital economy to be more connected and inclusive than ever before. Safer, faster, more sustainable. And we need the best people to do it. Technologists who are energized by the challenges of a truly global network. With the talent and vision to create the critical systems and products that power global commerce and connect people everywhere to the vital goods and services they need every day.

Working at Mastercard means being part of a unique culture. Inclusive and diverse, a rich collaboration of ideas and perspectives. A place that celebrates your strengths, values your experiences, and offers you the flexibility to shape a career across disciplines and continents. And the opportunity to work alongside experts and leaders at every level of the business, improving what exists, and inventing what's next.

About the Role
The Business Operations team is seeking a highly motivated and experienced Senior Site Reliability Engineer (SRE) to join our team. You will play a critical role in ensuring the reliability, scalability, and performance of our applications, supporting essential services that power Mastercard's global operations. As a thought leader in your field, you will bring technical expertise, a passion for automation, and the ability to mentor.

The role of the Business Operations Site Reliability Engineer is to be the production readiness steward for Mastercard products. As Business Operations SREs, we are responsible for ensuring that our platform is stable and healthy. We break down barriers to running our products by fostering developer-run ownership and empowering developers to build resilient products. We support our developers during the application build phase in software run principles that include operational design, automation, capacity planning, and monitoring that lead to fault-tolerant, scalable products. We see the big picture and help create and enforce operations standards while facilitating an agile and learning culture. We support daily operations with a sharp focus on triage and root-cause analysis, grounded in understanding the business impact of our products, followed by blameless post-mortems.

The goal of every Business Operations team is to engage early in the development lifecycle to be more proactive and upfront in the development process, and to proactively manage production and change activities to maximize customer experience and increase the overall value of supported applications.
Business Operations teams also focus on risk management by tying all our activities together with an overarching responsibility for compliance and risk mitigation across all our environments. Ultimately, the role of Business Operations is to align Product and Customer Focused priorities with Operational needs by providing continuous feedback throughout the lifecycle.

As part of the Business Operations team, you will:
• Independently execute key elements of projects/processes within the Site Reliability Engineering area by applying in-depth knowledge of your discipline and area best practices to effectively resolve problems and roadblocks as they occur.
• Assist in evaluating operational requirements and developing technical solutions within existing frameworks.
• Support automation and scripting efforts to improve operational workflows and incident response processes.
• Troubleshoot and resolve routine and some complex system issues, escalating when necessary to maintain system health.
• Contribute to documentation, knowledge sharing, and best practices to enhance team operational procedures.
• Collaborate with development teams and stakeholders to ensure reliability solutions align with technical and business needs.
• Participate in reviews and quality assurance activities to uphold system stability standards.
• May contribute to solution development for new products/services and/or manage smaller projects/initiatives as an experienced individual contributor with specialized knowledge within the Site Reliability Engineering area.

Role qualifications:
The ideal candidate will apply the following skills independently in routine and moderately complex situations, requiring occasional guidance typically only in unfamiliar or highly complex scenarios. They will demonstrate growing consistency and reliability in applying the skills.
• Observability - Ability to use scripting and tooling to implement observability solutions, enabling the collection, analysis, and visualization of metrics, logs, and traces to support incident detection, diagnosis, and continuous service improvement.
• Programming and Scripting - Ability to write and maintain code and scripts to automate tasks, build operational tools, and support monitoring, deployment, and incident response using languages such as Python, Go, Bash, or similar.
• Systems and Network Administration - Ability to configure, operate, and troubleshoot Linux/Unix systems and network components, applying knowledge of networking concepts, protocols, security, and system reliability.
• Cloud Computing and Infrastructure - Ability to design, deploy, and manage applications and infrastructure on cloud platforms (e.g., AWS, Azure, GCP), ensuring scalability, security, availability, and operational efficiency.
• Reliability and Scalability - Ability to design and operate systems for high availability, fault tolerance, and disaster recovery, while ensuring systems can scale to meet current and future demand.
• DevOps Practices - Ability to apply DevOps principles and practices, including CI/CD pipelines, containerization, and orchestration, to enable faster, more reliable software delivery and operations.
• Troubleshooting - Ability to systematically identify, diagnose, and resolve technical issues across systems, applications, and networks, using analytical methods and tools to restore functionality, minimize disruption, and ensure stable operations.
• Capacity Planning and Performance Optimization - Ability to monitor resource utilization, forecast future capacity needs, and optimize system performance to support growth, scalability, and efficient infrastructure usage.
• IT Service Management - Ability to apply IT service management principles to incident, problem, and change management, ensuring reliable service delivery, effective incident response, and continuous service improvement aligned to business needs.
• Proactive Monitoring and Improvement (SRE Applications) - Ability to use application reliability signals to anticipate issues, identify risks, and drive preventative improvements that enhance application performance and availability.

Corporate Security Responsibility
All activities involving access to Mastercard assets, information, and networks come with an inherent risk to the organization. Every person working for, or on behalf of, Mastercard is therefore responsible for information security and must:
• Abide by Mastercard's security policies and practices;
• Ensure the confidentiality and integrity of the information being accessed;
• Report any suspected information security violation or breach; and
• Complete all periodic mandatory security trainings in accordance with Mastercard's guidelines.
More DevOps Engineer Jobs
• Build and integrate deployment pipelines for the company's products
• Proficiency with containers (Docker) and orchestrators (Kubernetes)
• Participate in the development of new service infrastructure.
• Contribute to the area's knowledge base.
• Participate in drafting rules, standards, and policies for the area.
• Monitor the operation of resources.
• Identify and stay up to date on new developments in technology and management.
• Safeguard the information and assets you manage in your role, within your scope of action and regardless of their level of criticality.
• Ensure compliance with procedures
• Ensure the continuous improvement of the area
Senior Site Reliability Engineer
Airalo
Airalo is an eSIM store where travelers can access more than 200 eSIMs at affordable, local rates from around the world while using an eSIM-compatible tablet, s
• Lead the design of scalable, fault-tolerant and self-healing systems in a multi-region AWS environment.
• Define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to drive architectural decisions and error budget policies.
• Conduct blameless post-incident reviews to uncover systemic root causes and implement long-term preventive measures.
• Identify patterns of manual work and lead the development of internal tools/automation to permanently eliminate them.
• Develop and maintain automated runbooks and playbooks for common operational tasks and complex incident response.
• Shift from simple monitoring to deep observability, ensuring high cardinality data leads to proactive actionable insights.
• Proactively identify and mitigate operational risks through chaos engineering and architecture reviews.
• Work with software engineers to design systems for reliability, scalability, and maintainability from the early stages of the SDLC.
• Continuously evaluate and optimize system performance, capacity, and cost efficiency.
• Beyond just participating, you will refine the on-call experience to reduce alert fatigue, improve MTTR, and ensure sustainable rotation health.
• Be responsible for maintaining and improving the infrastructure;
• Communicate effectively and remain in close contact with developers;
• Improve application performance and scalability in the cloud;
• Collaborate with the internal technology team and with external partner/integrator teams.
• Build and operate the infrastructure that keeps Themis secure, reliable, and fast
• Own the systems for cloud infrastructure, CI/CD pipelines, observability, and security controls
• Automate provisioning, configuration, scaling, and routine operational tasks
• Manage containerized workloads and orchestration
• Build monitoring, logging, alerting, and dashboards to ensure system health and performance
• Define and improve incident response processes
• Drive reliability improvements, capacity planning, and performance tuning
• Implement and maintain security controls and access management




