Meet the future of sustainable, self-driving delivery.
Reliability Operations Engineer
Location
Malaysia
Posted
3 days ago
Salary
RM80K - RM100K / year
Seniority
Mid Level
Job Description
Serve Robotics
- Lead incident investigations during your region’s daytime hours, providing timely updates, escalating appropriately, and supporting senior engineers leading the response.
- Respond to escalations from Tier 1 support using established runbooks, metrics, logs, and diagnostics to remediate issues or escalate to Tier 3 when needed.
- Update runbooks and operational documentation based on new issues, discoveries, and feedback, ensuring clarity and consistency across all procedures.
- Run existing automations and collaborate with senior team members to enhance tooling and scripts that streamline troubleshooting and remediation tasks.
- Use observability tools such as Grafana/Prometheus, GCP Monitoring, and OpenTelemetry to interpret metrics, logs, and traces, helping identify anomalies and validate system performance.
- Provide concise, accurate updates during incidents, ensuring information reaches the correct engineering and SRE contacts and supporting structured incident coordination.
- Participate in discussions around root causes, share operational insights, and contribute to process improvements that enhance system stability and supportability.
- Participate in a shared weekend on-call rotation to help maintain operational coverage for production systems, responding to incidents and escalations as needed and coordinating with engineering teams when issues arise.
- Proactively strengthen workflows, adopt best practices, and build the foundation of the Reliability Operations function as it evolves.
Job Requirements
- Bachelor’s degree in Computer Science, Information Technology, Engineering, or equivalent hands-on experience.
- 2–4 years of experience in Reliability Operations, Site Reliability Engineering, DevOps, IT Operations, or a related technical support function.
- Experience participating in Tier 1 or Tier 2 investigations, including log review, basic triage, and structured escalation.
- Exposure to operational environments supporting distributed or cloud-based systems.
- Participation in incident response workflows and/or on-call rotations.
- Proficiency with Linux, including navigating systems, reviewing logs, and performing basic diagnostics.
- Experience using and contributing to runbooks and operational workflows.
- Ability to interpret metrics, logs, and traces using tools such as Grafana/Prometheus, Google Cloud Monitoring, and OpenTelemetry.
- Familiarity with cloud platforms, preferably Google Cloud Platform (GCP).
- Ability to follow documented remediation steps, with good judgment around when to escalate.
- Understanding of CI/CD pipelines and how application deployments affect runtime behavior.
- Experience using Jira or similar ticketing systems.
- Clear and effective communicator, especially when providing updates during time-sensitive operational issues.
- Calm, organized approach to troubleshooting and prioritization.
- Collaborative mindset, working effectively with senior operations engineers, product teams, and SREs.
- Strong sense of ownership and accountability for operational responsibilities.
Benefits
- Continuous operational coverage
- Weekend on-call rotation shared across the Reliability Operations team
More DevOps Engineer Jobs
- Own the DevOps roadmap across CI/CD, infrastructure automation, release workflows, and environment management with a clear focus on engineering velocity, reliability, and operational efficiency.
- Lead the DevOps team while partnering closely with Engineering, SRE, Platform, and Product teams to remove production bottlenecks and raise automation standards across the organization.
- Cultivate resilient multi-cloud practices across AWS and GCP, driving infrastructure as code (IaC), Kubernetes-based delivery, and modern operational tooling.
- Strengthen observability, uptime discipline, and incident response while leading cloud cost and capacity optimization efforts as our platform scales.
- Cultivate team capabilities, manage project plans, and build a high-accountability engineering culture that scales fluidly with company needs.
- Evaluate and implement AI-powered DevOps tools to improve deployment, monitoring, and incident response processes.
- Leverage AI and machine learning solutions for predictive analytics, anomaly detection, capacity planning, and root-cause analysis.
- Establish governance, security, and compliance standards for AI-enabled infrastructure and operations.
- Monitor emerging AI technologies and identify opportunities to improve operational efficiency and reduce manual effort.
Senior DevOps – Cloud & Bare Metal Infrastructure
Vivo (Telefônica Brasil)
Through connection, we want you to discover new points of view and make the most of everything that truly matters.
- Manage Bare Metal physical environments and provide recommendations for migration or adjustments to Cloud environments
- Deploy and support servers, storage and virtualization
- Plan and execute corporate connectivity projects
- Troubleshoot and optimize networks and links
- Implement redundancy and high-availability solutions
- Manage infrastructure
- Implement and scale Cloud and Multi-cloud infrastructures
- Perform advanced connectivity troubleshooting
- Ensure infrastructure security and network segmentation
- Support integration between physical and cloud environments
- Conduct capacity planning and optimize resource utilization
- Support operational efficiency initiatives and infrastructure cost reduction
- Monitor physical resource consumption and connectivity
Cloud DevOps Engineer
Charger Logistics Inc
At Charger Logistics, we care about giving equal opportunities to each candidate and employee, and we consider qualified applicants without regard to race, color, religion, sex, national origin, ancestry, age, genetic information, sexual orientation, gender identity, marital or family status, medical condition, or disability. We invest time and support in you to provide the room to learn, grow and work your way up. We are an entrepreneurially minded organization where you’ll be given support and room to develop your own strategies. If this sounds like what you’re looking for, then we might be the place for you. Please note that the information provided in this application process for our vacancies is confidential and is intended exclusively for the specialized Talent and Selection team at Charger Logistics. We also confirm that our contact is exclusively through official Charger Logistics channels and is free of charge.
Role Description
The Cloud DevOps Engineer will design and implement enterprise-scale cloud infrastructure, drive automation strategies, and mentor engineering teams while architecting mission-critical systems across Azure and GCP environments.
Responsibilities
- Cloud Architecture & Infrastructure
  - Design enterprise cloud architecture standards across Azure and GCP.
  - Lead cloud migration strategies and multi-region infrastructure design.
  - Establish cloud governance, cost optimization, and security frameworks.
- Container & Platform Engineering
  - Architect Kubernetes platforms (AKS, GKE) with enterprise security and networking.
  - Implement service mesh solutions and advanced deployment strategies.
  - Build developer self-service platforms and observability solutions.
- CI/CD & Automation
  - Design scalable pipelines using Jenkins, Azure DevOps, GitHub Actions.
  - Implement GitOps frameworks and infrastructure-as-code practices.
  - Integrate security scanning, testing automation, and compliance workflows.
- Security & Compliance
  - Implement Zero Trust architectures and identity management solutions.
  - Automate compliance for SOC 2, ISO 27001 frameworks.
  - Drive security-by-design practices across all platforms.
- Leadership & Mentorship
  - Lead technical teams and drive DevOps culture transformation.
  - Mentor senior engineers and establish centers of excellence.
  - Oversee architecture reviews and technology roadmaps.
Qualifications
- 5+ years enterprise experience with Azure and GCP.
- 4+ years production Kubernetes experience (AKS/GKE).
- Expert-level CI/CD tools (Jenkins, Azure DevOps) and IaC (Terraform).
- Strong Python programming with PowerShell/Bash experience.
- Deep security and compliance knowledge.
Requirements
- 6+ years leading technical teams and enterprise initiatives.
- Required: Azure Solutions Architect Expert or GCP Professional Cloud Architect.
- Preferred: Kubernetes certifications (CKA/CKAD).
Preferred Qualifications
- AI/ML platforms experience, advanced monitoring tools (Prometheus).
- Transportation/logistics industry background.
- Open-source contributions and thought leadership experience.
Benefits
- Competitive Salary
- Career Growth
Senior DevSecOps Engineer
Arbor Education
Arbor Education, founded in 2011 and based in London, England, United Kingdom, is the country's fastest-growing management information system provider, serving
Role Description
We are looking for an experienced and diligent Senior DevSecOps Engineer to join our DevSecOps team and help us secure the resilience, integrity, and performance of the Arbor platform as it scales, including the AI-enabled systems and developer tooling now central to how we build and operate. The remit and focus of the role is to combine deep security engineering with a secure-by-design mindset, using metrics, automation, and threat modelling to drive measurable improvements. Working closely with architecture, platform, and engineering teams, you will continuously harden our infrastructure, our software supply chain, and the AI systems and agents increasingly embedded across our products and workflows.
- Collaborate with stakeholders to pinpoint security enhancements across platform architecture and infrastructure, devising and executing strategic plans for implementation.
- Work closely with the Platform team to embed robust security processes, controls, and tooling across all system components.
- Threat model new and existing systems, including AI/LLM-enabled features and agentic workflows, and translate findings into prioritised, actionable work.
- Strengthen our software supply chain: dependency and base-image hygiene, SBOM generation, artefact signing and provenance, and the pinning of third-party actions and packages.
- Secure the use of AI across the SDLC, ensuring agentic coding tools, assistants, and MCP integrations operate within safe, well-scoped, and auditable boundaries.
- Contribute to the evolution of deployment frameworks, emphasising security, deployment speed, and system stability.
- Elevate platform security through strong secrets management and the safe handling of sensitive information.
- Play an active role in incident response, resolution, and blameless post-mortems, facilitating continuous improvement.
- Participate in knowledge-sharing initiatives, including tech-talks and team-based learning sessions.
- Maintain meticulous, current documentation (playbooks, runbooks, and comprehensive systems documentation) to facilitate knowledge dissemination.
Qualifications
- Extensive experience in cyber security and associated engineering practices.
- Vulnerability management and remediation at scale.
- Proven track record in DevOps / DevSecOps engineering within large-scale platforms.
- Proficiency in distributed cloud systems, particularly Amazon Web Services.
- Expertise in Infrastructure as Code (IaC) tooling such as Terraform and CloudFormation.
- Experience with languages such as PHP, Bash, or Python.
- Experience with Docker and containerisation, with a working understanding of container and runtime security.
- Software supply-chain security: SBOMs, dependency scanning, and artefact signing / provenance (e.g. SLSA, Sigstore).
- Secrets management and detection (e.g. Vault, cloud-native secret stores, secret-scanning in CI).
- Security tooling across the SDLC: SAST, DAST, SCA, IaC scanning, and container scanning (e.g. Snyk, Trivy).
- Policy-as-code and guardrails (e.g. OPA / Conftest), with an identity-centric / zero-trust approach to access.
- Familiarity with monitoring and detection tooling like DataDog, Prometheus, or similar platforms.
- A proactive problem-solving attitude coupled with strong teamwork and communication skills.
- Exceptional proficiency in written and spoken English to effectively articulate ideas and concepts.
- AI security and safe AI usage.
- Practical understanding of AI/LLM security risks and their mitigations, e.g. prompt injection, jailbreaks, insecure output handling, sensitive-data leakage, and excessive agency (aligned to the OWASP Top 10 for LLM Applications).
- Experience securing AI-assisted and agentic development tooling: scoping permissions, sandboxing, logging and audit, and preventing secret or data exfiltration through AI agents and MCP servers.
- Familiarity with AI threat modelling and adversarial techniques (e.g. MITRE ATLAS) and with conducting or supporting AI-aware red teaming.
- Awareness of AI governance and assurance frameworks (e.g. NIST AI RMF, ISO/IEC 42001) and how they intersect with data-protection obligations for a multi-tenant platform handling children's data.
- Confident, responsible use of AI tooling to accelerate security work (triage, detection engineering, code review, and documentation) while understanding and accounting for its limitations.
Bonus Skills
- Past experience with enterprise solutions running at scale.
- Familiarity with kanban and agile development processes.
- Familiarity with software best practices such as Refactoring, Clean Code, Domain-Driven Design, Test-Driven Development, etc.
- Experience with compliance frameworks relevant to EdTech (e.g. NIST CSF, ISO 27001, SOC 2, UK GDPR).
- Relevant certifications (e.g. AWS Security Specialty, OSCP, or AI security / governance credentials).
Benefits
- The chance to work alongside a team of hard-working, passionate people in a role where you’ll see the impact of your work every day.
- A dedicated wellbeing team who champion initiatives such as mindfulness, lunch n learns, manager training, mental health first aid training and much more!
- 32 days holiday (plus Bank Holidays). This is made up of 25 days annual leave plus 7 extra company-wide days given over Easter, Summer & Christmas.
- Life Assurance paid out at 3x annual salary.
- Comprehensive wellness benefit provided by AIG Smart Health, which provides a 24/7 virtual GP service, Mental health support, Counselling, and personalised Health Checks.
- Private Dental Insurance with Bupa.
- Salary sacrifice Pension provided by Scottish Widows.
- Enhanced maternity and adoption leave (20 weeks full pay) and paternity leave (6 weeks full pay).
- 5 free return to work maternity coaching sessions, helping you adapt to this new exciting time of life!
- Access to services such as Calm and Bippit (financial wellbeing coaching).
- All of our roles champion flexible working and we are happy to discuss what this means to you.
- Social committees that plan team, office and company-wide events to bring people together and celebrate success.
- Dedicated professional development training budget (CPD courses, upskilling resources, professional memberships etc).
- Volunteer with a charity of your choice for a day each year.
- Dog friendly offices!
Interview Process
- Phone screen.
- 1st stage.
- 2nd stage.


