Good Problems. Unlocking value from business challenges
Senior Consultant – Site Reliability Engineering
Location: China
Posted: 38 days ago
Salary: Not specified
Seniority: Senior
Job Description
Fabric Group
- Consultative Ownership: Work autonomously to own problems and deliver solutions, acting as a bridge between development and operations.
- Observability Architecture: Design and implement robust monitoring solutions using the LGTM stack to ensure system health and performance.
- Reliability Strategy: Advise clients on defining meaningful SLOs/SLIs and managing error budgets to balance innovation with stability.
- AI Assistance: Drive the use of AI agents and AI tools for intelligent automation and improved operational efficiency.
- Incident Leadership: Lead blameless post-incident reviews (post-mortems) to identify systemic improvements and reduce future toil.
- Mentorship: Coach less experienced engineers within Fabric and our client teams on SRE principles and modern infrastructure patterns.
- Advise our clients on the right technical decisions and advocate for the right practices.
- Be an ambassador for Fabric, promoting our values and the practices we use to make sure we build software the right way.
- Participate in interviewing and recruitment as business needs require.
- Thought Leadership: Contribute to the SRE community through blog posts, meetups, or internal knowledge sharing.
Operational Support & Availability
- Rotational Support Coverage: Participate in a sustainable team rotation to provide extended service coverage (including weekends) for business-critical systems.
- Incident Response: Act as a primary responder for high-priority (P1/P2) incidents during your rostered shift, focusing on rapid restoration and clear stakeholder communication.
Job Requirements
- Observability: Strong expertise in Grafana, including the LGTM stack (Loki, Grafana, Tempo, Mimir) or Grafana Cloud, and OpenTelemetry.
- Container Orchestration: Solid experience with Kubernetes management, configuration, and troubleshooting in production.
- Good understanding of AI agent frameworks and tools such as Grafana AI Assistant.
- Cloud Proficiency: Hands-on experience with GCP or AWS, including networking, security, and cloud-native services.
- Modern Deployment: Proven experience implementing GitOps (ArgoCD) and CI/CD pipelines (GitLab CI, GitHub Actions, etc.).
- Infrastructure as Code (IaC): Experience with tools like Terraform.
- Automation & Scripting: Proficiency in at least one language (e.g., Python, Go, or Bash) for building tooling and automating operational tasks.
- Incident Management: Experience with on-call rotation tools (Grafana OnCall, Opsgenie) and a strong commitment to a blameless culture.
Benefits
- A variety of business domains to dive into, including retail, finance, construction and logistics
- Creating innovative custom products to solve complex problems that existing solutions just can’t
- Collaborating with a team of top-notch professionals who are obsessed with value, the latest tech and the right way to build a digital product
- Ability to switch projects every 6-12 months to keep you challenged, excited and growing
- Strong support network from the delivery community of practice, leadership and our tech teams to help you address any client challenges you may face
- A very diverse and inclusive environment where people value feedback, connection and collaboration
- Enjoy the freedom of a fully remote lifestyle, where you can ditch the commute and deliver high-impact work from the comfort of your own home.
More DevOps Engineer Jobs
DevOps (Cloud Engineering)
APAC
Xebia is a trusted advisor in the modern era of digital transformation, serving hundreds of leading brands worldwide with end-to-end IT solutions. The company has experts specializing in:
- Technology consulting
- Software engineering
- AI
- Digital products and platforms
- Data
- Cloud
- Intelligent automation
- Agile transformation
- Industry digitization
In addition to providing high-quality digital consulting and state-of-the-art software development, Xebia has a host of standardized solutions that substantially reduce the time-to-market for businesses. The company has a strong presence across 16 countries, with development centres across the US, Latin America, Western Europe, Poland, the Nordics, the Middle East, and Asia Pacific.
Role Description
We are looking to bring on a hands-on Cloud DB Platform Automation Engineer (CWR) to support our Database Platforms team at BCG. The preferred location is India, with a willingness to work during UK business hours.
This role is focused on Terraform-based automation and standardization of database platform offerings across AWS, Azure, and GCP, with an emphasis on:
- Self-service (Terraform Cloud, GitHub Actions)
- Consistency
- Reducing manual effort
We are specifically looking for someone who is:
- Strong in Terraform (module development, not just usage)
- Comfortable with CI/CD and automation workflows
- Able to work independently and deliver from defined requirements
- Easy to collaborate with and able to provide clear status updates
Backend Ops Engineer Role
Weekday (YC W21)
We are a Y Combinator-backed startup building your AI-powered Recruiter Agent.
Role Description
We are looking for a DevOps / Site Reliability Engineer to take full ownership of infrastructure and platform operations in a fast-scaling, AI-first environment. This role is central to building a secure, scalable, and cost-efficient cloud-native ecosystem while enabling fast and reliable deployments. You will focus on automating infrastructure, improving system reliability, and embedding intelligent, AI-driven operations into DevOps workflows. As a key contributor, you will work closely with backend and product teams to ensure seamless performance, reduce operational risk, and support rapid growth. This is a high-impact role with the opportunity to shape platform architecture, drive efficiency, and contribute to next-generation engineering practices.
Key Responsibilities
- Design, implement, and manage scalable cloud infrastructure using AWS services such as ECS/Fargate, RDS, S3, and IAM
- Build and maintain infrastructure as code using Terraform for consistent, automated deployments
- Develop and manage CI/CD pipelines using GitHub Actions to ensure fast and reliable releases
- Implement observability and monitoring systems using tools such as Prometheus, Grafana, OpenTelemetry, and Sentry
- Manage containerized environments using Docker and optimize performance under high load
- Drive cost optimization initiatives across cloud infrastructure
- Integrate AI-driven solutions into DevOps workflows, such as automated log analysis and predictive scaling
- Collaborate with engineering teams to improve system performance, scalability, and reliability
- Ensure infrastructure security, compliance readiness, and adherence to best practices
- Continuously improve deployment pipelines, reduce downtime, and enhance system resilience
Qualifications
- 2–3+ years of experience in DevOps, SRE, or backend infrastructure roles
- Strong hands-on experience with AWS infrastructure and cloud-native architectures
- Expertise in Terraform and infrastructure as code practices
- Proven experience with CI/CD pipelines and containerization tools such as Docker
- Strong understanding of observability, monitoring, and incident management
- Experience troubleshooting and optimizing systems under production load
- Exposure to, or strong interest in, integrating AI/LLMs into DevOps workflows
- Knowledge of security and compliance standards (SOC 2, GDPR) and infrastructure best practices
- Familiarity with multi-cloud environments (GCP, Azure) is a plus
- Strong ownership mindset, problem-solving ability, and clear communication skills
Requirements
- Min. experience: 3 years
- Location: Remote (India)
- Job type: Full-time
- Salary range: Rs 2,000,000 - Rs 3,500,000 (i.e., INR 20-35 LPA)
Senior DBA Engineer
Tabby
On a mission to create financial freedom. No interest. No fees. Shariah-compliant.
Role Description
We are seeking a skilled IT professional to join our team in Saudi Arabia. The role involves a variety of responsibilities, including:
- Design, build, install, and maintain large relational databases.
- Maintain the integrity and security of the database, including backup and recovery procedures.
- Implement and manage disaster recovery and failover systems.
- Monitor database performance, implement changes, and apply new patches and versions when required.
- Optimize queries for performance.
- Collaborate with development teams to optimize database usage.
- Set up and maintain database replication, clustering, mirroring, and other high-availability strategies.
- Use and understand tools such as PgBouncer and modern monitoring systems.
- Stay up to date with the latest database technologies and best practices.
Qualifications
- Experience with PostgreSQL is mandatory.
- Proficiency in PostgreSQL setup, replication, upgrades, monitoring, and performance tuning.
- Experience with ClickHouse is a plus.
- Ability to read and write highly complex queries.
- Experience with backup and recovery procedures, including point-in-time recovery (PITR).
- Strong knowledge of database design, documentation, and coding.
- Familiarity with database management tools and performance tuning techniques.
- Strong problem-solving and communication skills.
- Familiarity with programming/scripting languages such as Bash, Python, or Go.
- Experience with DBaaS on cloud platforms such as GCP or AWS is a plus.
Requirements
- Certification in database management or equivalent training is a plus.
- Experience migrating large databases between cloud platforms.
- Knowledge of the latest trends in database administration.
- Familiarity with modern DevOps tooling such as Kubernetes, Terraform, and Helm.
- Experience with real-time data streaming technologies such as Debezium and Flink.
Benefits
- Work alongside a high-performing, international engineering team at a global fintech unicorn.
- Stock options (ESOP) in a fast-scaling, pre-IPO company.
- Health insurance.
- Competitive salary and other bonuses.
DevOps/Platform Engineer
Deutsche Telekom IT Solutions
Deutsche Telekom IT Solutions, a subsidiary of the Deutsche Telekom Group, was named Hungary's most attractive employer in 2025 (according to Randstad's representative survey). The company provides a wide portfolio of IT and telecommunications services and has more than 5,300 employees. We serve hundreds of large customers, including corporations in Germany and other European countries. DT-ITS received the Best in Educational Cooperation award from HIPA in 2019 and was recognized as the Most Ethical Multinational Company the same year. The company continuously develops its four sites in Budapest, Debrecen, Pécs and Szeged and is looking for skilled IT professionals to join its team.
Role Description
Deutsche Telekom MMS GmbH has been a client and business partner of our company since 2019 and is a leading innovation and digitalization service provider within the DTAG group. We are looking for an ambitious, experienced professional who speaks English and German confidently to join the Hungarian team of this dynamically growing business unit. The goal of our project is to deliver an outstanding digital customer experience and to build modern infrastructure, supplying the professional expertise this requires. If you enjoy a challenging, varied role, work in an agile way day to day, and like using your language skills in everyday communication, you are the person we are looking for!
We are looking for an experienced DevOps / Platform Engineer to play a key role in building and sustainably operating a modern AI platform. In this position, you will work hands-on in Kubernetes and Azure environments and use a GitOps-based approach to ensure the platform's stability and scalability.
Qualifications
- Several years of experience in DevOps or Platform Engineering
- Advanced Kubernetes knowledge
- Practical experience with Azure environments
- Knowledge of GitOps-based operations and deployment models
- A hands-on mindset and the ability to work independently
- English at B2 level or higher, or German at B2 level or higher
Requirements
- Basic knowledge of AI / machine learning (an advantage)
Benefits
- Take part in building a modern AI platform
- Work with the latest technologies (Kubernetes, Azure, GitOps)
- An innovative environment focused on platform engineering
- Real professional challenges and responsibility
- Opportunities for long-term growth in cloud and AI


