Job Closed

This listing is no longer active.

HostPapa

Let Papa take care of you!

Site Reliability Engineer

DevOps Engineer | Full Time | Remote | Senior | Team 51-200 | Since 2006 | H1B No Sponsor

Location

Malaysia

Posted

48 days ago

Salary

0

Seniority

Senior

Job Description

  • Define and implement SLIs, SLOs, and error budgets for critical CloudBlue services to ensure reliability and performance
  • Influence system architecture with a strong focus on reliability, scalability, and operability, designing systems for fault tolerance, graceful degradation, and self-healing
  • Reduce operational toil by identifying opportunities for automation and process improvement
  • Design and operate CloudBlue’s observability stack across metrics, logs, and traces using tools such as Datadog, Grafana, and Elastic Stack
  • Develop actionable alerting strategies and dashboards that provide clear insight into platform and business health
  • Design and maintain high-availability architectures, implementing redundancy, failover, and disaster recovery strategies across regions and availability zones
  • Conduct capacity planning, load testing, and performance optimization to ensure platform stability and scalability
  • Act as a senior responder during production incidents, leading incident coordination, communication, and service restoration
  • Own blameless postmortems and drive improvements that reduce incident frequency, MTTR, and customer impact
  • Improve reliability of Kubernetes-based platforms through health checks, autoscaling strategies, rollout safety, and resilience testing
  • Partner with engineering and DevOps teams to improve deployment safety, rollback strategies, and platform reliability
  • Maintain runbooks and operational documentation, and promote SRE best practices across engineering teams
  • Support other tasks or projects as assigned to meet team and business needs

Job Requirements

  • 3+ years of experience as an SRE, DevOps Engineer, or Production Engineer, with strong ownership of production systems
  • Proven experience operating highly available, enterprise-grade, multi-tenant SaaS platforms
  • Hands-on experience with observability and monitoring tools such as Datadog, Grafana, and Elasticsearch/Kibana
  • Solid understanding of Linux, networking, and distributed systems fundamentals
  • Experience working with containerized environments such as Docker and Kubernetes
  • Strong scripting and automation skills using Python and/or Bash
  • Experience participating in on-call rotations and incident response in production environments
  • Strong written and spoken English
  • Experience defining SLIs/SLOs and managing error budgets at scale will be considered a plus
  • Exposure to hyperscale or service-provider-grade platforms is an advantage
  • Cloud experience, preferably with Azure; experience with AWS and/or GCP will also be valued
  • Experience working with hybrid or on-premises integrations is beneficial
  • Familiarity with chaos engineering and resilience testing will be considered an asset

Benefits

  • A competitive salary that values you and your unique skill sets
  • Career advancement & professional development opportunities to help you reach your full potential
  • Flexible work arrangements to support work/life balance
