LineTen is a cloud-based, SaaS technology platform that enables businesses to aggregate technical transactions.
Site Reliability Engineer
Location
Malaysia
Posted
42 days ago
Salary
0
Seniority
Senior
Job Description
Site Reliability Engineer
LineTen
- Ensure global coverage of our products.
- Responsible for ensuring all engineering teams have a first class development experience.
- Drive roll out of Docker/Kubernetes across all engineering workstations.
- Ensure top-notch observability setup is in place.
- Provide engineering support across all products.
- Work with the Architecture team for product development direction.
- Participate in post-incident reviews.
Job Requirements
- Ensure all engineering teams have a first-class development experience through tooling, scripts, and support
- Drive the rollout of Docker/Kubernetes across all engineering workstations, regardless of operating system
- Work with the SRE team and other architects to minimise the delta between workstation and cloud environments and, where this is not possible, ensure that workarounds and solutions exist
- Ensure that a top-notch observability setup is in place
- Act as the “go-to” person within the business on all things Docker and containers
- Ensure that the knowledge base and setup guides are in place, active, and maintained
- Provide engineering support across all products
- Work with the Architecture team to gain an understanding of likely direction of product development
- Provide training, support, and resources for engineering teams
- Provide the engineering team with details of any code changes required to support other cloud-based PaaS products
- Support the IT manager with device procurement for engineering teams
- Provide QA teams with additional tooling/support as may be required
- Support the engineering team with workstation setup issues
- Participate in product scrums as required
- Work with the engineering team on code reviews
- Ensure vendor dependencies are recorded and scoped
- Fix escalated support issues
- Build software and scripts to automate tasks and help engineering, operations, and support teams perform their duties
- Participate in post-incident reviews
- Improve the on-call process; reduce team burden while improving issue response times
- Participate in knowledge transfer sessions so that the wider team can self-serve
- Capture, analyse and update metrics (SLI, SLO, SLA)
- Create monitoring to improve availability and detect anomalies.
Benefits
- We Are a Home-First Team: LineTen is committed to our home-first policy, which means that we honour remote working first, and offer office space in London, England and Porto, Portugal.
- We Believe in Having Fun: Our WellUs team organises monthly events like Pet Zoom Calls, 45-minute Yoga classes, and after-hours cocktail lessons.
- We Want You to Take a Break: We believe it is the quality of work that matters, not the hours spent “on the clock”. We offer flexible working hours and unlimited vacation.
More DevOps Engineer Jobs
NEORIS, now part of EPAM, is a digital accelerator that helps companies step into the future, with more than 20 years of experience as a Digital Partner to some of the world's most important companies. We are more than 4,000 professionals in 11 countries, with a multicultural, startup culture in which we encourage innovation, continuous learning, and the creation of high-impact solutions for our clients. We are looking for a DevOps Semi Senior - Mid!
Main responsibilities:
- Implement, optimize, and maintain Cloud Native environments, ensuring scalability, security, and performance.
- Manage and automate CI/CD pipelines using GitHub, GitLab, or Jenkins.
- Design, deploy, and administer container-based solutions.
- Configure and maintain observability tooling, ensuring end-to-end monitoring with Dynatrace, Prometheus, and Grafana.
- Collaborate with development and architecture teams to define DevOps best practices.
- Take part in the continuous improvement of processes, automation, and technical standards.
Requirements:
Mandatory:
- A minimum of 2 to 4 years of experience in DevOps roles (semi-senior level).
- Solid knowledge of Cloud Native practices and container management.
- Experience applying observability tools such as Dynatrace, Prometheus, and Grafana.
- Proficiency with GitHub or GitLab and experience with Jenkins pipelines.
- Knowledge of quality and security tools such as Kiuwan.
Desirable:
- Experience with Kubernetes or other container orchestrators.
- DevOps or Cloud certifications.
- Experience in high-availability environments or digital transformation projects.
- Knowledge of advanced automation and Infrastructure as Code.
We offer:
- A permanent contract with a competitive salary.
- A flexible working arrangement with the possibility of remote work.
- A personalized career plan and continuous training (certifications, English, etc.).
- Participation in stable projects with a strong technical component.
- Flexible hours and a focus on work-life balance.
- Social benefits adapted to your needs.
We invite you to get to know us at http://www.neoris.com, or on Facebook, LinkedIn, Twitter, or Instagram: @NEORIS.
Senior DevOps Engineer
Tekhqs
TekHQS is a global technology and AI-driven solutions company delivering scalable SaaS, Cloud, AI/ML, Blockchain/Web3, DevOps, and enterprise software solutions to startups and enterprise clients worldwide. With a team of 300+ professionals across the USA, UK, UAE, Qatar, Pakistan, and India, we specialize in building high-performance digital products across Logistics, FinTech, Healthcare, and emerging technology sectors. At TekHQS, we foster a culture of innovation, ownership, and continuous growth, empowering our teams to build impactful technology that drives real business transformation.
About the Role
We are seeking a skilled DevOps Engineer to strengthen our infrastructure, automation, and CI/CD capabilities across multiple projects. The ideal candidate will drive automation, streamline CI/CD pipelines, and ensure reliable deployments across development and production environments. This role requires strong hands-on experience with AWS and/or Azure, containerization, orchestration tools, infrastructure as code, and continuous integration systems.
Key Responsibilities
Infrastructure & Cloud Management
- Design and manage cloud infrastructure on AWS and Azure.
- Deploy and maintain services such as EC2/VMs, S3/Blob Storage, RDS/Azure SQL, VPC/VNet, IAM/Azure AD, Load Balancers, and related services.
- Ensure high availability, scalability, and security of production systems.
Containerization & Orchestration
- Build and manage Docker containers.
- Deploy and maintain Kubernetes clusters (EKS/AKS preferred).
- Optimize container orchestration for performance and cost efficiency.
CI/CD & Automation
- Design and maintain CI/CD pipelines using Jenkins (experience with Azure DevOps is a plus).
- Automate build, test, and deployment processes.
- Implement Infrastructure as Code using Terraform.
- Automate configuration management using Ansible.
Monitoring & Reliability
- Implement monitoring, logging, and alerting mechanisms (CloudWatch, Azure Monitor, Prometheus, Grafana, etc.).
- Troubleshoot production issues and ensure minimal downtime.
- Improve system reliability and deployment velocity.
Security & Compliance
- Implement security best practices in infrastructure and pipelines.
- Manage IAM roles, access controls, and secrets securely across AWS/Azure environments.
- Conduct regular system audits and vulnerability checks.
Collaboration
- Work closely with development, QA, and product teams.
- Support release planning and environment readiness.
- Document processes, workflows, and infrastructure architecture.
Preferred Qualifications
- AWS and/or Azure Certifications (Associate or Professional level).
- Experience with microservices architecture.
- Experience with monitoring tools (Prometheus, Grafana, CloudWatch, Azure Monitor, etc.).
- Knowledge of security best practices and DevSecOps concepts.
Soft Skills
- Strong analytical and troubleshooting skills.
- Ability to work independently and within cross-functional teams.
- Strong documentation and communication skills.
- Proactive and ownership-driven mindset.
- Ability to manage multiple environments and deadlines efficiently.
Job Details
Experience: 5 years
Job Type: Fully Remote
Location: 9 pm to 5 am
Senior Engineer, Site Reliability
Zensar
At Zensar, we’re “experience-led everything”. We are committed to conceptualizing, designing, engineering, marketing, and managing digital solutions and experiences for over 130 leading enterprises. We are a company driven by a bold purpose: Together, we shape experiences for better futures. Whether for our clients, our people, or the world around us, this belief powers everything we do. At the heart of our culture is ONE with Client - a set of four core values that reflect who we are and how we work: One Zensar, Nurturing, Empowering, and Client Focus. Part of the $4.8 billion RPG Group, we’re a community of 10,000+ innovators across 30+ global locations, including Milpitas, Seattle, Princeton, Cape Town, London, Zurich, Singapore, and Mexico City. We believe the best work happens when individuality is celebrated, growth is encouraged, and well-being is prioritized. We are an equal employment opportunity (EEO) and affirmative action employer, committed to creating an inclusive workplace. All qualified applicants will be considered without regard to race, creed, color, ancestry, religion, sex, national origin, citizenship, age, sexual orientation, gender identity, disability, marital status, family medical leave status, or protected veteran status.
Role Description
The Software Engineer / Site Reliability Engineer (SRE) will play a critical role in driving reliability, scalability, and performance for the Banking Solutions, Payments, and Capital Markets platforms. This role blends core SRE principles, performance engineering, and service health management to support large-scale, mission-critical systems. The ideal candidate will help modernize platforms through automation-first practices, data-driven reliability metrics, and proactive performance optimization, ensuring exceptional customer experience and business continuity in a highly regulated environment.
What You Will Be Doing
Core SRE & Reliability Engineering
- Design, implement, and operate highly available, resilient, and scalable systems aligned with SRE best practices.
- Define and manage Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets to balance reliability and delivery velocity.
- Build and maintain service health dashboards to provide real-time visibility into platform stability and customer experience.
- Reduce toil through extensive automation of operational workflows, alerts, and remediation activities.
Monitoring, Observability & Service Health
- Design and maintain end-to-end monitoring and observability solutions covering infrastructure, applications, APIs, and user journeys.
- Implement advanced alerting strategies to reduce noise and improve mean time to detect (MTTD) and mean time to resolution (MTTR).
- Leverage metrics, logs, and traces to drive root cause analysis and proactive incident prevention.
- Enable reliability reporting for stakeholders using SLO compliance and service health metrics.
Performance Engineering & Testing
- Lead performance engineering initiatives, including load testing, stress testing, endurance testing, and capacity validation.
- Identify performance bottlenecks across application, middleware, database, and infrastructure layers.
- Conduct capacity planning and performance tuning to support business growth and peak traffic scenarios.
- Partner with development and QA teams to embed performance testing into CI/CD pipelines.
Incident Management & Operations
- Lead and participate in incident response activities, including triage, mitigation, recovery, and post-incident reviews.
- Drive blameless post-mortems and ensure corrective actions are tracked to completion.
- Participate in on-call rotations, providing 24x7 support for critical production systems.
- Continuously improve operational readiness and resilience.
Automation, CI/CD & Cloud Operations
- Design and manage deployment pipelines, configuration management, and environment consistency across lower and production environments.
- Implement Infrastructure as Code (IaC) practices for repeatable and secure cloud provisioning.
- Collaborate with DevOps teams to improve deployment reliability, rollback mechanisms, and release safety.
- Develop and test disaster recovery plans, backup strategies, and failover mechanisms.
Collaboration & Governance
- Work closely with Development, QA, DevOps, Security, and Product teams to align on reliability and performance goals.
- Ensure platforms meet security, compliance, and regulatory requirements common in financial services.
- Act as a reliability and performance advocate throughout the SDLC.
Qualifications
- Strong experience in Core SRE practices, including reliability engineering, incident management, and automation.
- Proven hands-on experience in Performance Engineering / Performance Testing for large-scale distributed systems.
- Deep understanding and implementation experience with SLI / SLO / Error Budget frameworks.
- Proficiency in cloud platforms (AWS, Azure, or Google Cloud).
- Hands-on experience with containerization and orchestration (Docker, Kubernetes).
- Strong background in monitoring, observability, and logging tools such as Prometheus, Grafana, Datadog, Splunk, ELK Stack.
- Experience with CI/CD pipelines (Jenkins, GitLab CI/CD, Azure DevOps).
- Proficiency in scripting and automation using Python, Bash, Terraform, Ansible.
- Strong troubleshooting skills across application, infrastructure, and network layers.
- Experience designing and running incident response and post-mortem reviews.
- Ownership mindset with accountability for service reliability and customer outcomes.
- Excellent communication, collaboration, and stakeholder management skills.
Nice to Have (SRE+ Skills)
- Experience with Keptn or similar tools for automated SLO-based quality gates and continuous delivery.
- Programming experience in Java, especially for debugging, performance profiling, or building automation tools.
- Familiarity with chaos engineering practices and tools.
- Experience working in banking, payments, or capital markets domains.
- Knowledge of security best practices and regulatory compliance in enterprise environment.
Site Reliability Engineer, SRE Team
Semrush
Your competitors' favorite marketing platform used by 10,000,000 marketers
- Collaborate with development teams to design and implement scalable, reliable, and efficient system architectures
- Establish and refine SLOs in partnership with stakeholders to guarantee service reliability and performance
- Read and write code in Python/Go
- Induce application failure and work to recover it from that state
- Debug applications using metrics and add traces/metrics as needed
- Participate in on-call duties to provide constant support
- Lead the changes in common engineering practices in the Company
- Possible night shifts (on-call)


