Job Closed

This listing is no longer active.

Oowlish

We make innovation simple, convenient and right... we just make it HAPPEN

DevOps – Site Reliability Engineer

DevOps Engineer · Full Time · Remote · Senior · Team 51-200 · Since 2017 · H1B No Sponsor

Location

Brazil

Posted

83 days ago

Seniority

Senior

3 yrs exp · English · AWS, Azure, Docker, GCP, Grafana, Jenkins, Kubernetes, Prometheus

Job Description

• Join a growing AI-focused SaaS startup as a DevOps & Site Reliability Engineer
• Maintain, optimize, and scale the infrastructure that supports the platform
• Work closely with development and product teams to improve deployment processes
• Monitor systems and respond proactively to incidents

Job Requirements

3+ years of experience in a DevOps, Site Reliability Engineering (SRE), or related role
Strong hands-on experience with the deployment of web, mobile, and API applications
Expertise in monitoring and observability tools (e.g., New Relic, Datadog, Prometheus/Grafana)
Strong experience with CI/CD pipelines and associated tools (Azure Pipelines, Jenkins, CircleCI)
Proficiency with Docker, Kubernetes, and Helm
Experience working with cloud platforms like Azure, AWS, or GCP
Scripting proficiency in Bash
Familiarity with incident response and disaster recovery planning

Benefits

Home office;
Competitive compensation based on experience;
Career plans to allow for extensive growth in the company;
International projects;
Oowlish English Program (Technical and Conversational);
Oowlish Fitness with Total Pass;
Games and competitions.

More DevOps Engineer Jobs

SRE Analyst – Mid-level

Vivo (Telefônica Brasil)

Through connection, we want you to discover new points of view and make the most of everything that really matters.

DevOps Engineer · 83 days ago

Full Time · Remote · Team 10,001+ · Since 1998 · H1B No Sponsor

• Perform troubleshooting and functional analysis of incidents in non-production environments
• Provide support for applications in testing environments
• Implement and manage monitoring tools to ensure visibility into system performance and proactively detect issues
• Lead incident response, conducting post-incident (postmortem) analyses to identify root causes and prevent recurrence
• Develop scripts and tools to automate repetitive tasks, improving operational efficiency and reducing human error
• Analyze system capacity and plan scalability to meet demand, ensuring services remain available and responsive
• Collaborate with development teams to implement changes safely and efficiently, minimizing impact on the staging environment
• Work closely with security teams to ensure security practices are integrated into the testing lifecycle
• Create and maintain technical documentation and operational runbooks, and train teams on best practices and tools
• Work together with QA analysts to continuously improve system reliability and efficiency

Apache HTTP Server, Cassandra, Linux, MongoDB, OpenShift, Oracle Database, PostgreSQL, Python

Brazil

Job Closed

Sr. Site Reliability Engineer

Backblaze

At Backblaze, we value being fair and good to our customers, partners, and employees. That’s why diversity, equity, and inclusion are at the core of our values. We are committed to fostering a workforce where all employees feel a sense of belonging regardless of race, ethnicity, nationality, gender, sexual orientation, age, religion, socio-economic status, ability, veteran status, and education. We believe that our dedication to cultivating a diverse workspace not only allows us to better serve our customers in over 175 countries but further reinforces our commitment to doing the right thing. We are proud to be an Equal Opportunity Employer.

DevOps Engineer · 83 days ago

Full Time · Remote · Team 201-500

About Backblaze
Backblaze is the object storage leader in the open cloud movement, fueling customer success with cloud storage built purposefully to unlock budgets, unburden administrators, and unleash innovators. Together with our partners, we’re helping customers break free from the restrictive, overpriced legacy solutions that hold them back, and blaze forward with the full power of the open cloud in their hands. Founded in 2007, we scaled the business with less than $3 million in outside funding until 2021, when we did a traditional IPO on the Nasdaq stock exchange. Today, Backblaze generates over $100M in revenue and is the leading specialized storage cloud, managing over three billion gigabytes of data storage for 500K+ customers in 175+ countries, including businesses, developers, IT professionals, and individuals. But while there is a lot to celebrate in our past, there is almost as much opportunity ahead of us. We’re seeking a Sr. Site Reliability Engineer to join our team!

About the Role:
We are seeking a Senior Site Reliability Engineer (SRE) to help ensure the stability, scalability, and reliability of our services and infrastructure. This role focuses on building automation, maintaining observability, and supporting incident response to keep customer-facing systems performing at their best. The SRE will collaborate with engineering, product, and operations teams to embed reliability practices into day-to-day development and operations while contributing to tools and processes that improve efficiency and reduce manual effort.

What You'll Do:

Service Reliability & Operations
- Own and drive the availability, durability, and performance of critical services across all production environments.
- Lead and champion complex projects from problem discovery through complete, cross-functional resolution, demonstrating high-level technical ownership.
- Define, establish, and enforce service health standards, including working with engineering leadership to implement SLIs, SLOs, and error budget policies for multiple services.
- Lead critical incident response and post-incident reviews, translating findings into strategic, long-term service improvements and architectural changes.
- Mentor others and act as a subject matter expert in following and evolving established ITIL/OSS processes (incident, change, problem, and capacity management).

Automation & Tooling
- Design and architect scalable automation solutions to eliminate toil and improve the efficiency of operational tasks across the entire platform.
- Drive the strategic direction of monitoring, logging, and alerting frameworks (e.g., Prometheus, Grafana, Catchpoint, ELK), and integrate them for comprehensive observability.
- Build, maintain, and secure advanced CI/CD pipelines, configuration management, and complex infrastructure as code solutions (Terraform, Ansible, Jenkins).
- Write production-grade code (Bash, Python, Go, etc.) to develop new reliability tools and enhance existing systems.

Collaboration
- Act as a principal partner to engineering, product, and operations teams, consulting on resilient system design, architecture, and operation.
- Lead and formalize the Production Readiness Review (PRR) process, ensuring robust operational handoff for all new services and features.
- Lead capacity planning and disaster recovery strategy across critical infrastructure components.
- Manage relationships with vendors and service providers to troubleshoot systemic issues and ensure strict adherence to SLA performance.
- Drive the creation of high-quality documentation, proactively share advanced learnings, and cultivate a reliability-first engineering culture across teams.

Continuous Improvement
- Own the creation, maintenance, and dissemination of operational playbooks, runbooks, and detailed system documentation.
- Proactively identify systemic, recurring issues, then architect and drive long-term improvements and strategic design action plans.
- Be a leading voice in promoting and embedding reliability-focused practices within development and operations teams.

Qualifications:

Education & Experience
- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
- 8+ years of progressive experience in site reliability, systems engineering, or operations.
- Extensive experience designing, scaling, and operating large-scale, production-grade distributed systems.

Technical Skills
- Expert-level Linux systems administration and advanced troubleshooting skills.
- Experience leading security-minded operations, with a focus on system-wide patching, hardening, and proactive vulnerability identification.
- Deep mastery of service reliability concepts, including advanced monitoring, complex alerting strategy, incident response leadership, and in-depth root cause analysis.
- Advanced proficiency in at least one modern scripting/programming language (Python or Go strongly preferred).
- Expert knowledge of incident response methodologies and operational best practices.
- Proven experience designing and operating container orchestration (Kubernetes, Docker) and microservices architectures (required).
- Expert-level experience with HashiCorp products (Nomad, Vault, Terraform) in a production environment.

Preferred Attributes
- Significant experience in a SaaS, service provider, or hyper-scale distributed systems environment.
- Deep familiarity with ITIL/OSS practices and experience defining and enforcing SLOs/SLAs.
- Exceptional problem-solving skills and a strong drive to learn and apply new, complex technologies.
- Advanced experience with cloud platforms (AWS, GCP, or Azure) in a production setting.

Backblaze Perks:
- Healthcare for family, including dental and vision
- Competitive compensation and 401K
- RSU grants for full-time employees
- ESPP program
- Flexible vacation policy
- Maternity & paternity leave
- MacBook Pro to use for work, plus a generous stipend to personalize your workstation
- Childcare bonus (human children only)
- Fertility treatment and support
- Learning & development program
- Commuter benefits
- Culture that supports a healthy work-life balance

To provide greater transparency to candidates, we share base pay ranges for all US-based job postings regardless of state. We set standard base pay ranges for all roles based on function, level, and country location, benchmarked against similar-stage growth companies. Final offer amounts are determined by multiple factors, including candidate location, skills, depth of work experience, and relevant licenses/credentials, and may vary from the amounts listed below. The expected salary range for this role is $150,000 - $200,000.

To understand more about the data we collect and process as part of your application, please view our Backblaze Employee Privacy Notice.

Linux, Python, Shell, Kubernetes, Docker, Terraform, Ansible, Jenkins, Prometheus, Grafana, HashiCorp Vault, AWS, GCP, Azure, CI/CD, Infrastructure as Code, Microservices, Distributed Systems

United States

$150K - $200K / year

Senior Site Reliability Engineer

Centene Corporation

Transforming the health of the communities we serve, one person at a time.

DevOps Engineer · 83 days ago

Full Time · Remote · Team 10,001+ · Since 1984 · H1B No Sponsor

• Helps lead projects focused on managing and maintaining optimal platform infrastructure performance, reliability, and security using SRE practices, observability tools, manual and automated procedures, documentation, people and processes, and continuous integration/continuous delivery (CI/CD) tools, processes, and designs.
• Develops complex services to automate monitoring activities and provide critical information that facilitates response to and resolution of performance and availability issues and incidents.
• Troubleshoots and analyzes service disruptions to determine the root cause of issues and develops solutions for improved reliability.
• Supports multiple applications and schedules batch jobs that process a large number of transactions weekly.
• Leads more complex projects focused on building and maintaining observability and monitoring for the application, monitoring key performance indicators, maintaining alerting, and continuously improving visibility.

SQL

Nebraska + 3 more

$87K - $161.3K / year

Job Closed

Senior DevOps Engineer

OpenVPN Inc.

OpenVPN® helps businesses of all sizes create secure, virtualized, reliable networks that scale with your team.

DevOps Engineer · 83 days ago

Full Time · Remote · Team 51-200 · Since 2002 · H1B No Sponsor

• Design, implement, and maintain highly scalable, fault-tolerant systems that leverage cluster orchestration and containerization technologies
• Work alongside Software Engineering and QA teams to refine and implement deployment processes that support microservices-based architectures
• Build and oversee CI/CD pipelines that support container-based application deployment and rollback
• Ensure systems are consistently available by performing automated health checks and coordinating zero-downtime deployments
• Participate in an on-call rotation to rapidly diagnose and resolve critical system outages
• Collaborate with information security teams to ensure that industry best practices and compliance requirements are met

Grafana, Microservices, Prometheus

Albania