Backblaze

Backblaze is the cloud storage innovator delivering a modern alternative to traditional cloud providers.

Site Reliability Engineer I

DevOps Engineer | Full Time | Remote | Mid Level | Team 201-500 | Since 2007 | H1B Sponsor

Location

India

Posted

33 days ago

Salary

$66K - $88K / year

Seniority

Mid Level

Bachelor Degree | 2 yrs exp | English | Ansible | Linux | Vault

Job Description

  • Act as the first point of contact for all customer-affecting issues.
  • Be a key driver in managing the resolution of technical problems.
  • Ensure that incident management processes are followed and that incident post-mortems are completed to capture process deviations and areas for improvement.
  • Deliver consistent communication to management.
  • Respond to Zabbix alerts and monitor Zabbix regularly, either taking direct action on alerts or escalating them. Acknowledge every alert, noting the direct action taken or the escalation point of contact.
  • Make sure escalations are handed off successfully.
  • Ensure the health of pods across all sites (define pod alerts in Zabbix).
  • Work through daily filesystem checks for pods.
  • Troubleshoot technical issues for DC Techs: advanced pod questions, deployment questions, migration troubleshooting, and Ansible playbook issues.
  • Identify and escalate any potential network issues.
  • Perform Vault pre-deployment configuration and testing.
  • Start Vault migrations, monitor migration pods, and handle applicable migration pod health checks.
  • Document and work on automating daily items.
  • Document and provide network IPs for upcoming deployments.
  • Monitor releases and updates to the server farm, and escalate issues as they arise.
  • Engage in on-call rotation shifts.
  • Assist fellow TechOps team members in handling tasks.
  • Make recommendations for improving organizational productivity.
  • Be able to work outside normal business hours (weekend shifts, holidays, and evenings) as needed.

Job Requirements

  • Must be located in Bangalore.
  • 2 - 4 years of relevant experience.
  • Knowledge of system administration and Linux.
  • Desire to learn and develop all necessary technical skills.
  • Strong analytical thinking.
  • Strong communication skills and the ability to work with different teams.
  • Knowledge of network cabling, network classification, and network topology.

Benefits

  • RSU grants for full-time employees
  • Annual Company bonus plan
  • Healthcare for family, including dental and vision
  • 401K
  • ESPP program
  • Flexible vacation policy
  • Maternity & paternity leave
  • MacBook Pro for work plus a generous stipend to personalize your workstation
  • Childcare bonus (human children only)
  • Fertility treatment and support
  • Learning & development program
  • Commuter benefits
  • A culture that supports a healthy work-life balance

More DevOps Engineer Jobs

Site Reliability Engineering Manager II

Flywire

Delivering the most important & complex payments.

DevOps Engineer | 33 days ago
Full Time | Remote | Team 1,001-5,000 | Since 2011 | H1B Sponsor

Company Description

Are you ready to trade your job for a journey? Become a FlyMate! Passion, excitement & global collaboration are all core to what it means to be a FlyMate. At Flywire, we’re on a mission to deliver the world’s most important and complex payments. We use our Flywire Advantage, the combination of our next-gen payments platform, proprietary payment network and vertical-specific software, to help our clients get paid, and help their customers pay with ease, no matter where they are in the world. What more do we need to truly be unstoppable? Perhaps, that is you!

Who we are: Flywire is a global payments enablement and software company, founded more than a decade ago to solve high-stakes, high-value payments in higher education. We’ve since scaled into new regions and industry verticals and expanded our product offerings to deliver meaningful value to our clients around the world. Today we support more than 4,800 clients across the global education, healthcare, travel & B2B industries, with diverse payment methods across 240 countries & territories and more than 140 currencies. With over 1,200 global FlyMates, representing more than 40 nationalities, and in 12 offices worldwide, we’re looking for FlyMates to join the next stage of our journey as we continue to grow.

Job Description

The Opportunity

We, at Flywire, are looking for an experienced Manager II, Site Reliability Engineering to join our team. In this role, you’ll help drive reliability, automation and performance within our cloud-based infrastructure. At Flywire, the SRE team is responsible for the lifecycle of production systems. Our team is embedded within Software Engineering teams, enabling and empowering them to achieve full speed on shipping reliable and operable systems. They also work at a global scale, driving initiatives to achieve production excellence.

  • Coordinate and support daily activities for SREs on the team and partner with their managers to determine the approach for managing daily tasks
  • Track success on the team based on established goals and objectives
  • Work on issues of limited scope with the ability to find and execute solutions to routine problems
  • Become embedded within an Engineering team, helping them navigate production excellence and advocating for best practices
  • Mentor team members and drive initiatives
  • Drive a design for a feature while understanding system-wide and architectural concerns
  • Understand the basic day-to-day tasks and traits of a production environment and participate in on-call support
  • Engage and collaborate with other disciplines within the design, deployment, operation and optimization of services
  • Debug production issues across services and levels of the stack, and practice incident response and blameless postmortems
  • Identify opportunities in both processes and tools to improve the overall productivity of the team
  • Identify great talent and excite them to join our team
  • Provide estimations, track progress and manage risk as well as team members' time
  • Participate in an on-call shift along with other disciplines to respond to incidents
  • Become involved in tech communities and add contributions to enhance them
  • Lean into our business domain and needs as well as our company vision, mission and strategy to deliver on our short and long term goals

Qualifications

Here's what we're looking for:

  • 5 years of experience within the SRE space
  • 2-5 years of leading or managing and developing SRE teams
  • Comfortable with the idea of being or becoming a generalizing specialist, as we are aiming to build a multidisciplinary and balanced team based on "t-shaped" individuals
  • Experience with at least one programming language is required, as software engineering is an important part of our work and we actively use and support many different platforms and languages
  • Proficiency with testing techniques such as TDD or BDD will be highly valued
  • Familiarity with the container ecosystem, cloud infrastructure, build systems and CI/CD tools is key to being successful in this role
  • Comfortable taking ownership of complex systems challenges and helping uncover opportunities for improvement
  • Strong communication and collaboration skills and, most importantly, empathy, as we enable, empower and encourage our fellow colleagues

Some Technologies We Use:

  • Ruby, Java, Kotlin, Go, Node, Python
  • AWS: EC2, ECS, Lambda, Cloudwatch, SQS, RDS, Kinesis, S3, ElasticSearch, DocumentDB
  • Linux, Docker, Terraform, Make, Chef
  • Gitlab, Jenkins
  • Sentry, Sumologic, Honeycomb

Our Culture:

  • We are a global company. Our engineering team is distributed across 3 continents and 4 different countries, so remote work is allowed!
  • Our engineering practice is shaped around concepts including Agile, Lean, and Extreme Programming. Each team has a high level of autonomy to organize themselves in the way they consider most appropriate to execute their mission.
  • We actively engage in knowledge sharing by hosting internal cross-discipline events.
  • We are active in contributing to open source whenever possible.
  • We contribute to our local communities by hosting different events, Meetups, etc.

Additional Information

What We Offer:

  • Competitive compensation
  • Employee Stock Purchase Plan (ESPP)
  • Flying Start: our immersive Global Induction Program (Meet our Execs & Global Teams)
  • Work with brilliant people that will keep you on your toes; learn more about their journeys by checking out #InsideFlywire on social media
  • Dynamic & Global Team (we have been collaborating virtually for years!)
  • Wellbeing Programs (Mental Health, Wellness, Yoga/Pilates/HIIT Classes) with Global FlyMates
  • Competitive time off, including FlyBetter Days to volunteer in your community and Digital Disconnect Days!
  • Great Talent & Development Programs (Managers Taking Flight, for new or aspiring managers!)

Submit today and get started! We are excited to get to know you! Throughout our process you can expect to meet different FlyMates, including the Hiring Manager and other FlyMates. Your Talent Acquisition Partner will walk you through the steps and be your “go-to” person for questions.

Flywire is an equal opportunity employer and follows a policy of administering all employment decisions and personnel actions without regard to race, color, religion, sex, pregnancy, gender identity, national origin, age, ancestry, physical or mental disability, sexual orientation, genetic disposition or carrier status, veteran status, or any other category protected under applicable national, federal, state or local law.

The US base salary range for this full-time position is $160,000 - $200,000 and benefits. Our salary ranges are determined by role, position level, and location. The range displayed on this job posting reflects the minimum and maximum target for new hire salaries for the position across all US locations. Within the range, individual pay is determined by work location and several other factors, including job-related skills, experience, relevant education and training.

Illinois
$160K - $200K / year
Job Closed

Senior DevOps Engineer, Kubernetes, Linux

Glückliche Gäste GmbH

Glückliche Gäste (happy guests): nothing more is needed to introduce us. A smiling guest is a happy guest.

DevOps Engineer | 33 days ago
Full Time | Remote | Team 11-50 | Since 2017 | H1B No Sponsor

  • Operation, further development and securing of multiple Kubernetes clusters running on virtual machines
  • Responsibility for Linux-based systems: physical servers, VMs and NAS systems
  • Building and maintaining a stable, maintainable and auditable platform landscape
  • Automation of operational, deployment and maintenance processes (Bash is required; additional tools welcome)
  • Establishing security-by-design: system hardening, access control concepts, secrets management, and logging, monitoring and traceability
  • Technical implementation of requirements from security & compliance: translating standards into concrete technical controls
  • Building audit-ready structures, documentation and evidence
  • Incident handling in the infrastructure context (2nd level), including analysis and sustainable remediation
  • Continuous improvement of the stability, security and transparency of our systems

Germany

DevOps Engineer

High Tech Genesis

Product Engineering Services for the High-Tech Sector

DevOps Engineer | 33 days ago
Full Time | Remote | Team 51-200 | H1B No Sponsor

  • Build and optimize CI/CD pipelines
  • Automate deployments and infrastructure (IaC: Terraform/CloudFormation)
  • Manage cloud environments (AWS/Azure/GCP)
  • Work with Docker & Kubernetes for containerized workloads
  • Implement monitoring, logging, and alerting
  • Improve system reliability and support production environments
  • Apply DevSecOps practices and manage access/security controls

Mexico

Cloud DevOps - Build & Engineering - AWS

Zensar

At Zensar, we’re “experience-led everything”. We are committed to conceptualizing, designing, engineering, marketing, and managing digital solutions and experiences for over 130 leading enterprises. We are a company driven by a bold purpose: Together, we shape experiences for better futures. Whether for our clients, our people, or the world around us, this belief powers everything we do. At the heart of our culture is ONE with Client - a set of four core values that reflect who we are and how we work: One Zensar, Nurturing, Empowering, and Client Focus. Part of the $4.8 billion RPG Group, we’re a community of 10,000+ innovators across 30+ global locations, including Milpitas, Seattle, Princeton, Cape Town, London, Zurich, Singapore, and Mexico City. We believe the best work happens when individuality is celebrated, growth is encouraged, and well-being is prioritized. We are an equal employment opportunity (EEO) and affirmative action employer, committed to creating an inclusive workplace. All qualified applicants will be considered without regard to race, creed, color, ancestry, religion, sex, national origin, citizenship, age, sexual orientation, gender identity, disability, marital status, family medical leave status, or protected veteran status.

DevOps Engineer | 34 days ago
Full Time | Remote | Team 10,001

Role Description

Zensar Technologies is looking for a Cloud Support Engineer to join our Citrix managed services team. The L1 engineer is the first line of defence for Citrix DaaS (Cloud) and CVAD environments, performing eyes-on-glass monitoring, runbook-driven incident response, and proactive health checks around the clock.

Qualifications

  • 5+ years in Cloud Operations (SRE); good to have 2+ years hands-on with Citrix environments
  • Strong Azure fundamentals: VDA hosting on Azure, Azure Monitor alerts, basic VM/networking operations
  • Good scripting in PowerShell or Bash for log parsing, alert triage, and simple automation
  • Strong runbook discipline: ability to execute SOPs accurately without deviation
  • Good understanding of latency and quality metrics
  • Good knowledge of ITSM tools and PagerDuty alerting
  • App configuration and common auth-failure troubleshooting
  • Good verbal and written English communication for shift handover and stakeholder updates

Requirements

  • Good to have: Citrix DaaS (Cloud) or CVAD (on-prem/hybrid) administration (VDA lifecycle, session management)
  • Good to have: Citrix Director and Monitor dashboards (proactive fault identification and triage)
  • Good to have: Understanding of HDX/ICA protocol
  • Exposure to any ITSM-adjacent tooling: PagerDuty, Ansible playbooks, or CI/CD pipelines
  • Experience in a NOC or managed services delivery environment

Key Responsibilities

  • Perform 24x7 eyes-on-glass monitoring of Citrix Director, Monitor, and Azure-based alerts
  • Monitor Cloud Connector availability; restart Connector services and validate tunnel health
  • Triage HDX/ICA latency, packet loss, and frame-rate alerts; escalate with diagnostic data
  • Execute runbook-based resolutions for session launch failures, logon storms, and black screens
  • Classify and manage incidents per P1-P4 SLA targets using Wolken ITSM and PagerDuty
  • Perform StoreFront authentication failure triage and store enumeration checks
  • Escalate unresolved or P0/MIM-level incidents to L2/L3 (CSG/Citrix) with a complete handover note
  • Monitor MCS provisioning status and machine power states in Citrix Cloud
  • Verify Delivery Group availability and report capacity anomalies to the on-call L2 team
  • Execute SOPs for VDA reboots, certificate renewals, and patch verification tasks
  • Produce daily health reports and contribute to weekly SLA dashboards
  • Ensure complete shift handover documentation with open incidents, actions taken, and pending items

Worldwide