The leading Customer Experience Management platform geared towards Arab.
Site Reliability Engineer
Location
Saudi Arabia
Posted
68 days ago
Salary
0
Seniority
Senior
Job Description
Lucidya | لوسيديا
• You’ll design and maintain infrastructure that is highly available, fault-tolerant, and scalable
• You’ll proactively identify and eliminate single points of failure before they become incidents
• You’ll ensure our production systems remain stable, even under increasing scale and load
• You’ll manage and continuously improve workloads across AWS, GCP, or Azure
• You’ll use Infrastructure as Code (Terraform) to standardize and scale infrastructure
• You’ll optimize resource usage to balance performance and cost
• You’ll operate and scale Kubernetes clusters (EKS, GKE, etc.) with confidence
• You’ll troubleshoot issues quickly and ensure smooth deployments and upgrades
• You’ll ensure our containerized workloads perform reliably at scale
• You’ll implement and refine monitoring systems using tools like Prometheus, Grafana, Datadog, or ELK
• You’ll define alerting that is meaningful, not noisy
• You’ll respond to incidents, lead root cause analysis, and ensure we learn from every failure
• You’ll write scripts and build tooling to eliminate repetitive operational work
• You’ll continuously improve infrastructure efficiency through automation
• You’ll promote a culture where manual work is a temporary state, not the norm
• You’ll work closely with DevOps and engineering teams to solve performance bottlenecks
• You’ll contribute to CI/CD improvements and deployment reliability
• You’ll help shape reliability best practices across the organization
Job Requirements
- You’ve spent ~3 years working in SRE, DevOps, or infrastructure engineering, and you’ve seen what breaks at scale
- You’re comfortable working in cloud environments like AWS, GCP, or Azure—and you understand how distributed systems behave
- You’ve worked hands-on with Kubernetes in production and know how to troubleshoot it when things go wrong
- You don’t just fix issues; you ask why they happened and make sure they don’t happen again
- You use Terraform (or similar IaC tools) to manage infrastructure
- You work confidently with Docker and Kubernetes
- You write scripts in Python, Bash, or similar to automate workflows
- You understand CI/CD pipelines (Jenkins, GitHub Actions, Bitbucket, etc.)
- You have a solid grasp of networking, load balancing, and high-availability design
- You’ve implemented tools like Prometheus, Grafana, Datadog, or ELK
- You know the difference between useful alerts and noise
- You focus on signals that actually drive action
- You take ownership; you don’t wait to be told something is broken
- You’re calm under pressure and methodical during incidents
- You simplify complexity instead of adding to it
- You communicate clearly, even when explaining deeply technical issues
- You care about building systems that make other engineers more effective
Nice to Have (but not required)
- Experience with RabbitMQ or Redis in production
- Familiarity with Ansible or AWX
- Exposure to multi-cloud or hybrid environments
- Cloud certifications (AWS, GCP) or Linux certifications
- Background from ITI (Information Technology Institute)



