Site Reliability Engineer (SRE)
Location
United Kingdom
Posted
49 days ago
Salary
€55K - €68K / year
Seniority
Mid Level
Job Description
Site Reliability Engineer (SRE)
Air Apps
About Air Apps
At Air Apps, we believe in thinking bigger and moving faster. We’re a family-founded company on a mission to create the world’s first AI-powered Personal & Entrepreneurial Resource Planner (PRP), and we need your passion and ambition to help us change how people plan, work, and live. Born in Lisbon, Portugal, in 2018, and now with offices in both Lisbon and San Francisco, we’ve remained self-funded while reaching over 100 million downloads worldwide. Our long-term focus drives us to challenge the status quo every day, pushing the boundaries of AI-driven solutions that truly make a difference. Here, you’ll be a creative force, shaping products that empower people across the globe. Join us on this journey to redefine resource management and change lives along the way.

The Role
As a Site Reliability Engineer (SRE) at Air Apps, you will be responsible for ensuring the reliability, availability, and scalability of our systems. You will work at the intersection of software development and operations, implementing automation, monitoring, and performance optimization strategies to minimize downtime and improve system resilience.
This is a fully onsite position, based at our office in Lisbon, where you will collaborate closely with cross-functional teams in person and contribute to a dynamic and fast-paced environment. We are open to supporting relocation.

Responsibilities
- Design and implement scalable, reliable, and fault-tolerant systems across cloud environments.
- Develop and maintain observability tools, including monitoring, logging, and alerting (e.g., Prometheus, Grafana, Datadog, ELK).
- Automate infrastructure provisioning, deployment, and incident response using Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
- Optimize system performance, scalability, and incident response workflows to improve uptime.
- Work closely with development and DevOps teams to improve system design for reliability.
- Conduct root cause analysis (RCA) and implement preventative measures to minimize failures.
- Ensure high availability by designing and maintaining load balancing, failover, and disaster recovery strategies.
- Improve CI/CD pipelines to enhance deployment speed while maintaining stability.
- Optimize cloud cost and resource utilization for AWS, Azure, or Google Cloud Platform (GCP).
- Participate in on-call rotations to quickly address system failures and minimize downtime.

Requirements
- Around 4+ years of experience in Site Reliability Engineering (SRE), DevOps, or System Engineering.
- Strong knowledge of cloud platforms (AWS, Azure, or GCP) and cloud-native architectures.
- Experience with observability and monitoring tools (Prometheus, Grafana, ELK, Datadog, New Relic).
- Proficiency in Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, or Pulumi.
- Hands-on experience with containerization and orchestration (Docker, Kubernetes, Helm).
- Strong Linux system administration and networking fundamentals.
- Experience with incident management, debugging, and root cause analysis.
- Proficiency in scripting (Bash, Python, or Go) for automation and system monitoring.
- Knowledge of load balancing, failover strategies, and distributed systems.
- Understanding of security best practices, access control, and compliance requirements.
- Strong communication skills and the ability to collaborate with cross-functional teams.

What benefits are we offering?
- Apple hardware ecosystem for work.
- Annual Bonus.
- Top-tier Health and Life Insurance for peace of mind.
- Transportation Budget to support your commute needs.
- Coverflex benefits package for meal allowances, well-being, and more.
- Childcare support.
- Air Conference - an opportunity to meet the team, collaborate, and grow together.
- Pension Fund to support your long-term financial planning.
- Urban Sports Club membership to keep you active.
- Meals 100% free at the hub.
Diversity & Inclusion
At Air Apps, we are committed to fostering a diverse, inclusive, and equitable workplace. We enthusiastically welcome applicants from all backgrounds, experiences, and perspectives. We celebrate diversity in all its forms and believe that varied voices and experiences make us stronger.

Application Disclaimer
At Air Apps, we value transparency and integrity in our hiring process. Applicants must submit their own work without any AI-generated assistance. Any use of AI in application materials, assessments, or interviews will result in disqualification.
More DevOps Engineer Jobs
Senior Staff Engineer - DevSecOps
The TJX Companies, Inc.
At TJX Canada, every day brings new opportunities for growth, exploration, and achievement. You’ll be part of our vibrant team that embraces diversity, fosters collaboration, and prioritizes your development. Whether you’re working in our Distribution Centers, Corporate Offices, or Retail Stores (WINNERS, HomeSense, and Marshalls), you’ll find abundant opportunities to learn, thrive, and make an impact. Come join our TJX family, a Fortune 100 company and the world’s leading off-price retailer.
Here at TJX Canada, we are an equal opportunity employer committed to the inclusion and accommodation of all individuals.

Job Description: Senior Staff Engineer – DevSecOps (Security & Audit Compliance)
Merchandise Operations Management (MOM) – U.S., Canada, and potential EU support

What You’ll Discover
- An inclusive, growth-focused culture
- A global IT organization spanning India, the U.S., Canada, Europe, and Australia
- A collaborative, challenge-rich, team-driven environment
- The opportunity to shape and scale DevSecOps capabilities for TJX’s Merchandise Operations Management (MOM) ecosystem
- Direct involvement in new deployments of Oracle’s Retail MOM Suite across rapidly growing global divisions

What You’ll Do
As a Senior Staff Engineer, you’ll play a key role in evolving the DevSecOps foundation that powers and protects our SAFe-based Agile Release Train (ART).
In this role, you will:
- Lead and collaborate across engineering teams, development squads, and enterprise shared services
- Design, develop, automate, and support CI/CD and DevSecOps capabilities for a SOX-compliant MOM Continuous Delivery Pipeline
- Advance DevSecOps patterns, tooling, integrations, and infrastructure to elevate speed, quality, compliance, and security
- Drive continuous improvement through metrics, measurement, and hands-on engineering
- Mentor engineers and promote DevSecOps best practices across the organization

What You’ll Bring
- 5–8+ years of hands-on DevSecOps engineering experience
- Deep expertise in CI/CD, DevSecOps automation, and industry standards/best practices
- Ability to quickly learn new technologies and communicate complex concepts clearly
- Experience collaborating across large, global organizations
- Strong understanding of Sarbanes–Oxley (SOX) controls, security/audit best practices, and DevSecOps governance (e.g., AD groups, service accounts, role-based access)
- Bachelor’s degree in technology, information systems, or equivalent experience
- Experience designing security architectures, addressing segregation-of-duties requirements, and contributing to new technology evaluations
- Comfortable working in the grey and independently
- Comfortable with the advancement of AI and exploring new avenues of productivity with the use of GenAI

Preferred Technical Skills
- CloudBees Core, Ansible Tower, Ansible Automation Platform, Artifactory, Liquibase, ServiceNow Change Management, Groovy, Maven, Splunk, Bitbucket, JIRA Cloud, Azure, Veracode, SonarQube, Selenium, and related DevOps tooling.

Additional Information:
Candidates aged 18 and over will be required to undergo a criminal record check as part of the hiring process. This job posting is for an existing position vacancy within our organization. TJX Canada uses artificial intelligence (AI) to assist in screening and assessing applicants for this position.
Internal TJX Associates must submit their applications via the Jobs Hub in Workday. Direct applications to this job posting will not be accepted.
Address: 60 Standish Court
Location: CAN Home Office Mississauga ON
Salary Range: $102,459.00-$155,152.20 /year
*This represents the expected hiring range and may not represent the full pay range for the position. The salary offered may be higher than the posted range depending on several factors such as relevant skills, qualifications, and experience.
Senior DevOps Software Engineer
OpenSesame
We help companies develop the world's most productive and admired workforces.
• Design and manage scalable, secure systems primarily on AWS
• Enhance infrastructure automation and cloud configuration
• Mentor engineers on DevOps principles
• Lead infrastructure initiatives and improve developer experience
• Foster a culture of ownership across the software development lifecycle
Senior Site Reliability Engineer
Boomi
Boomi is the platform for intelligent connectivity and automation. Connect everyone to everything, anywhere.
About Boomi and What Makes Us Special
Are you ready to work at a fast-growing company where you can make a difference? Boomi aims to make the world a better place by connecting everyone to everything, anywhere. Our award-winning, intelligent integration and automation platform helps organizations power the future of business. At Boomi, you’ll work with world-class people and industry-leading technology. We hire trailblazers with an entrepreneurial spirit who can solve challenging problems, make a real impact, and want to be part of building something big. If this sounds like a good fit for you, check out boomi.com or visit our Boomi Careers page to learn more.

The Boomi Managed Cloud Services Team is looking for a Cloud Operations Engineer with a passion for delivering customer excellence. The Cloud Operations Engineer is responsible for providing a world class support experience, managing customer expectations, and resolving challenging issues for customers of the Boomi Managed Cloud Service, based in the United Kingdom. This role is key to delivering customer excellence and a world-class support experience for our Managed Cloud Service customers. The engineer will be responsible for managing customer expectations and resolving complex technical issues.

The Role (What you need):
- We're looking for a Cloud Operations Engineer for the Boomi Managed Cloud Services Team.
- The job is all about giving awesome support and fixing tough issues for customers using the standard Site Reliability tools on the Boomi Managed Cloud Service.
- You need to be based in the United Kingdom and have all the relevant documentation to legally live and work in the UK.

What makes a successful candidate:
- You're big on Site Reliability stuff.
- You genuinely love working with customers and internal teams.
- You're a detective when it comes to figuring out the root cause of issues (installation, config, performance, both infrastructure and app layers).
- You're super curious and can learn fast.
- You're a team player, ready to teach and learn from others.
- You're into using AI methods to solve problems and build tools.
- You're comfortable and confident operating in a technical IT environment while also managing direct customer-facing responsibilities, simultaneously.

What you'll be doing:
- Building Boomi Clouds using Ansible (predefined configurations).
- Giving remote tech support for the Boomi Managed Cloud Service.
- Dealing directly with customer issues related to Networking, Infrastructure, and Boomi application errors.
- Trying to recreate customer problems to figure them out.
- Using diagnostic skills to find issues and recommend fixes.
- Building cool tools using Claude Code and AWS infrastructure.
- Documenting problems and solutions in the support database.

Must-Haves (Technical Requirements):
- Experience with monitoring production systems, performance tuning, and advanced troubleshooting.
- Experience supporting production Java runtimes (JVMs) on Cloud platforms (AWS, Azure).
- Familiar with Ansible, Python, Harness, and Jenkins.
- Intermediate to advanced with Linux (RHEL preferred); you're comfortable at the command line!
- Solid understanding of computer architecture, cloud tech, virtual computing, and networking basics (TCP/IP, SSH, NFS or NetApp).
- Several years of experience in DevOps or Technical Cloud Support.

Nice-to-Haves (Desirable):
- Experience with Observability tools like New Relic, Datadog, or Splunk.
- Know-how in troubleshooting Kubernetes or similar containerized services (like AWS EKS, Azure AKS).
- A Bachelor’s degree in a relevant technical field.
- Certification and proficiency in Boomi Runtime Architecture and Systems Administration.

Be Bold. Be You. Be Boomi.
We take pride in our culture and core values and are committed to being a place where everyone can be their true, authentic self.
Our team members are our most valuable resources, and we look for and encourage diversity in backgrounds, thoughts, life experiences, knowledge, and capabilities. All employment decisions are based on business needs, job requirements, and individual qualifications. Boomi strives to create an inclusive and accessible environment for candidates and employees. If you need accommodation during the application or interview process, please submit a request to talent@boomi.com. This inbox is strictly for accommodations; please do not send resumes or general inquiries.
Senior DevOps Engineer, Python, Azure
Nearsure
Remove the barriers to growth by scaling your team fast with top-notch Latin American IT talent
• Develop and maintain automation and tooling using Python.
• Implement and manage integrations with Azure Service Bus.
• Develop and consume REST APIs for service integration.
• Support event-driven workflows and system interactions.
• Collaborate with engineering teams to improve reliability and operational processes.
• Contribute to automation and remediation solutions.
• Support cloud-based environments in Azure.



