Job Closed

This listing is no longer active.

Thoughtworks is a dynamic and inclusive community of bright and supportive colleagues who are revolutionizing tech. As a leading technology consultancy, we're pushing boundaries through our purposeful and impactful work. For over 30 years, we have delivered extraordinary impact for our clients, helping them solve complex business problems with technology as the differentiator.

Senior Service Reliability Engineer

DevOps Engineer · Full Time · Remote · Senior · Team 10,001+

Location

Singapore

Posted

54 days ago

Seniority

Senior

Job Description

Role Description

As a Senior Service Reliability Engineer (SRE), you will take a multifaceted approach to ensuring technical excellence and operational efficiency within the infrastructure domain. Specializing in reliability, resilience and system performance, you will take a lead role in championing the principles of Site Reliability Engineering. By strategically integrating automation, monitoring and incident response, you will facilitate the evolution from traditional operations to a more customer-focused and agile approach. Emphasizing shared responsibility and a commitment to continuous improvement, you will cultivate a collaborative culture that enables organizations to meet and exceed their reliability and business objectives.

- You will improve site reliability by building mechanisms and architectures that enable fault tolerance and a faster median time to respond and median time to detect.
- You will drive the integration of observability automation into the CI/CD pipeline.
- You will handle production incidents, manage incident communication with clients and draft root cause analysis documents.
- You will monitor the performance of production systems and improve their scaling so that business goals are met within the expected SLAs and SLOs.
- You will work closely with application development teams, advising them on improving system reliability and assisting in the implementation of reliability improvements.
- You will improve system observability across multiple facets such as logging and metrics, reducing false alarms to eliminate unnecessary toil and improve process efficiency.
- You will implement chaos engineering practices as necessary to test system reliability, and set up processes so that such testing is done regularly.
- You will develop a clear understanding of client goals and business needs and set the direction for site reliability accordingly, e.g. achieving application availability with minimal or no disruption (99.999%) where the business requires it.
Qualifications

- You have hands-on experience with programming and scripting languages such as Python, Go or Bash.
- You have a good understanding of at least one public cloud, e.g. AWS, Azure or GCP.
- You have had exposure to observability tools such as Grafana, Datadog, New Relic, ELK Stack, Dynatrace or equivalent, and you are proficient in using data from these tools to dissect and identify the root causes of system and infrastructure issues.
- You are familiar with DevOps and GitOps practices.
- You have a good knowledge of container-based architecture and orchestration tools such as Kubernetes, AWS EKS, Docker Swarm, Nomad, etc.
- You understand technical architecture and modern design patterns, including microservices, serverless functions, NoSQL and RESTful APIs, and have experience fixing bugs, analyzing logs, and building metrics and operational dashboards.
- You are familiar with creating infrastructure resources that improve system reliability and follow the principles of the cloud providers' Well-Architected Frameworks: reliability, security, cost optimization, performance efficiency and operational excellence.

Requirements

- You have strong communication and articulation skills and are proficient in English.
- You have good people skills, with an emphasis on negotiation and close collaboration with multiple cross-functional teams on the client side and/or at Thoughtworks.
- You solve challenging problems and difficult-to-debug issues with a never-give-up attitude.
- You have the ability to work under pressure and keep your composure during production incidents.
- You can confidently recommend improvements, backed by strong technical arguments, to client stakeholders or application development teams.
- You are able to understand the technical and business requirements provided by the client and break them down for successful implementation.
- You have a strong drive and an ownership mentality, with a willingness to sign up for and deliver work when called upon, without being too concerned about role boundaries.
- You're willing to be part of a team that is available 24x7 on a rotation and as-needed basis.

Benefits

- There is no one-size-fits-all career path at Thoughtworks: however you want to develop your career is entirely up to you.
- Your career is supported by interactive tools, numerous development programs and teammates who want to help you grow.
- We see value in helping each other be our best, and that extends to empowering our employees in their career journeys.

More DevOps Engineer Jobs

Senior DevOps Engineer

iubenda

a team.blue brand

DevOps Engineer · 54 days ago

Full Time · Remote · Team 51-200 · Since 2011 · H1B No Sponsor

• In this role, reporting to the Head of DevOps, you will be mainly responsible for CI/CD, infrastructure settings and security, and maintenance of systems, virtual machines, Kubernetes clusters, and cloud applications.

Ansible Cloud DNS Docker Kubernetes Linux MongoDB MySQL NGINX NoSQL PostgreSQL Redis TCP/IP Terraform

Spain

Job Closed

Senior Site Reliability Engineer, Compute Platform Services

Akamai Technologies

DevOps Engineer · 55 days ago

Full Time · Remote · Team 5,001-10,000 · H1B Sponsor

• Collaborating with our support, operations and engineering teams to investigate and troubleshoot complex problems.
• Developing processes, plans, and infrastructure to deploy new software components and updates safely and efficiently at scale.
• Participating in on-call rotations, guiding the restoration and repair of service-impacting issues.
• Improving our system monitoring and analysis platform to speed up error detection and remediation, enhancing performance and reliability.

Ansible AWS Azure Cloud Distributed Systems Google Cloud Platform Grafana Prometheus Python SaltStack Splunk Terraform Go

India

Azure DevOps Engineer

Uvation

DevOps Engineer · 55 days ago

Part Time · Remote · Team 11-50 · H1B No Sponsor

Role Description

We are seeking a skilled and experienced Azure DevOps Engineer with a strong background in Linux administration to join our dynamic team. The ideal candidate will be responsible for managing and optimizing our Azure cloud infrastructure, ensuring seamless CI/CD pipelines, and maintaining the security and efficiency of our Linux-based systems.

Senior Level (L3). Azure cloud experience required.

Key Responsibilities

- Azure Cloud Management:
  - Design, implement, and manage Azure infrastructure using Infrastructure as Code (IaC) tools like Terraform, ARM templates, or Bicep.
  - Automate cloud deployments and manage resources using Azure DevOps pipelines, scripts, and other automation tools.
  - Monitor and optimize the performance, scalability, and cost of Azure services.
- CI/CD Pipeline Development:
  - Create, maintain, and enhance CI/CD pipelines for applications and services hosted on Azure.
  - Collaborate with development teams to integrate code repositories, build processes, and deployment automation.
  - Ensure proper version control and release management practices are followed.
- Linux Administration:
  - Manage and maintain Linux servers, ensuring high availability, security, and performance.
  - Perform regular system updates, patches, and configuration management using tools like Ansible, Puppet, or Chef.
  - Troubleshoot and resolve system-related issues, including networking, security, and application performance.
- Security and Compliance:
  - Implement and enforce security best practices for both Azure cloud services and Linux servers.
  - Conduct regular security assessments and audits, applying patches and updates as necessary.
  - Ensure compliance with relevant regulations and industry standards.
- Collaboration and Support:
  - Work closely with development, QA, and operations teams to streamline processes and improve efficiency.
  - Provide support for production systems, including on-call rotations as needed.
  - Document processes, configurations, and best practices for both Azure and Linux environments.

Qualifications

- Proven experience with Azure cloud services, including VM management, networking, storage, and security.
- Strong proficiency in Linux system administration, including experience with Red Hat, Ubuntu, or CentOS.
- Hands-on experience with CI/CD tools like Jenkins, Azure DevOps, GitLab, or similar.
- Proficiency in scripting languages such as Bash, PowerShell, or Python.
- Familiarity with containerization technologies like Docker and orchestration tools like Kubernetes.

Requirements

- Strong analytical and troubleshooting skills, with the ability to diagnose and resolve complex technical issues.
- Experience in performance tuning, monitoring, and capacity planning for cloud and Linux environments.

Soft Skills

- Excellent communication and collaboration skills, with the ability to work effectively in a team-oriented environment.
- Strong documentation skills and attention to detail.
- Ability to work independently and manage multiple priorities in a fast-paced environment.

Preferred Qualifications

- Certification in Azure (e.g., Azure Administrator, Azure DevOps Engineer).
- Experience with cloud-native tools and services like Azure Kubernetes Service (AKS), Azure Functions, or Logic Apps.
- Experience with network security and VPN configuration in cloud environments.
- Familiarity with database management in cloud environments (SQL, NoSQL).

India

Senior Site Reliability Engineer, Spend

Airwallex

Empowering businesses to grow beyond borders

DevOps Engineer · 55 days ago

Full Time · Remote · Team 1,001-5,000 · Since 2015 · H1B Sponsor

About Airwallex

Airwallex is the only unified payments and financial platform for global businesses. Powered by our unique combination of proprietary infrastructure and software, we empower over 200,000 businesses worldwide, including Brex, Rippling, Navan, Qantas, SHEIN and many more, with fully integrated solutions to manage everything from business accounts, payments, spend management and treasury to embedded finance at a global scale. Proudly founded in Melbourne, we have a team of over 2,000 of the brightest and most innovative people in tech across 26 offices around the globe. Valued at US$8 billion and backed by world-leading investors including T. Rowe Price, Visa, Mastercard, Robinhood Ventures, Sequoia, Salesforce Ventures, DST Global, and Lone Pine Capital, Airwallex is leading the charge in building the global payments and financial platform of the future. If you're ready to do the most ambitious work of your career, join us.

Attributes We Value

We hire successful builders with founder-like energy who want real impact, accelerated learning, and true ownership. You bring strong role-related expertise and sharp thinking, and you're motivated by our mission and operating principles. You move fast with good judgment, dig deep with curiosity, and make decisions from first principles, balancing speed and rigor. You're humble and collaborative, you turn zero-to-one ideas into real products, and you "get stuff done" end-to-end. You use AI to work smarter and solve problems faster. Here, you'll tackle complex, high-visibility problems with exceptional teammates and grow your career as we build the future of global banking. If that sounds like you, let's build what's next.

About the team

The Engineering team at Airwallex is a diverse group of innovators, builders, and problem solvers, driven by a mission to empower businesses to operate anywhere, anytime. We thrive in a collaborative and fast-paced environment, where we're constantly pushing the boundaries of what's possible in the financial technology space. As a team, we value technical craftsmanship, continuous learning, and a strong sense of ownership, working together to build scalable, reliable, and secure products that empower businesses of all sizes to grow without borders.

Our SRE team is breaking new engineering ground, and we have the opportunity to define innovative solutions for a number of challenges, paving the way for other teams to follow in our footsteps. This team is responsible for the availability, performance, monitoring and capacity planning of our global services.

What you'll do

As a Senior Site Reliability Engineer, you'll work closely with product teams in Spend to deliver and maintain scalable, reliable cloud infrastructure in support of key product initiatives. Aligned to the roadmap, you'll lead infrastructure design and delivery for complex, high-risk projects such as launching new services, executing global data centre migrations, and modernising data pipelines.

Responsibilities:

- Architect and implement cloud infrastructure for new services and roadmap initiatives.
- Embed with development teams to drive reliability, performance, and operational readiness.
- Lead incident response, observability, and automation across critical systems.
- Own team-level SLOs, runbooks, and DevOps performance metrics.
- Collaborate with central DevOps and security teams to ensure compliance and resilience.

Who you are

We're looking for people who meet the minimum qualifications for this role. The preferred qualifications are great to have but are not mandatory.

Minimum qualifications:

- 6+ years in an SRE, DevOps, or infrastructure-focused engineering role.
- Bachelor's degree in Computer Science, Software Engineering, or a related field.
- Expertise in cloud platforms (AWS/GCP), Kubernetes, observability, and incident response.
- Ability to lead SRE strategy for large-scale, cross-functional projects.
- Strong experience supporting production systems with high availability and compliance requirements.
- Proven ability to work closely with developers and guide reliability best practices.

Preferred qualifications:

- Experience in a fintech or similarly regulated industry.
- Familiarity with data streaming, analytics pipelines, or financial data systems.

Applicant Safety Policy: Fraud and Third-Party Recruiters

To protect you from recruitment scams, please be aware that Airwallex will not ask for bank details, sensitive ID numbers (e.g. passport), or any form of payment during the application or interview process. All official communication will come from an @airwallex.com email address. Please apply only through careers.airwallex.com or our official LinkedIn page.

Airwallex does not accept unsolicited resumes from search firms/recruiters. Airwallex will not pay any fees to search firms/recruiters if a candidate is submitted by a search firm/recruiter unless an agreement has been entered into with respect to specific open position(s). Search firms/recruiters submitting resumes to Airwallex on an unsolicited basis shall be deemed to accept this condition, regardless of any other provision to the contrary.

Equal opportunity

Airwallex is proud to be an equal opportunity employer. We value diversity, and anyone seeking employment at Airwallex is considered based on merit, qualifications, competence and talent. We don't regard color, religion, race, national origin, sexual orientation, ancestry, citizenship, sex, marital or family status, disability, gender, or any other legally protected status when making our hiring decisions. If you have a disability or special need that requires accommodation, please let us know.

California
