Job Closed

This listing is no longer active.

ASAAS

We simplify payment collection for individuals, MEIs (individual micro-entrepreneurs), and large companies.

Lead Site Reliability Engineer – Observability

DevOps Engineer, Full Time, Remote, Senior, Team 501-1,000, Since 2010, H1B No Sponsor

Location

Brazil

Posted

71 days ago

Salary

Seniority

Senior

Bachelor Degree, Portuguese, AWS, Cloud, Docker, Grafana, Java, Kubernetes, Linux, Prometheus, Python, Terraform, Go

Job Description

• Lead, develop, and retain the SRE team, fostering high performance, collaboration, and continuous learning
• Conduct hiring, onboarding, feedback cycles, individual development plans (IDPs) and performance evaluations
• Define the SRE team's strategy and roadmap aligned with Cloud and business objectives
• Promote SRE and observability culture, acting as a technical reference for Engineering
• Manage team priorities, capacity, and trade-offs, ensuring quality deliveries
• Align initiatives with Cloud Engineering, Platform Engineering, and Cloud Security leadership
• Report team metrics, risks, and progress to Cloud leadership
• Define and lead the observability strategy (metrics, logs, and traces)
• Evolve the observability platform (Prometheus, Grafana, OpenTelemetry, Loki, Tempo)
• Establish and govern SLIs, SLOs, and Error Budgets for critical services
• Define instrumentation standards for applications and infrastructure, driving adoption across teams
• Implement an actionable alerting strategy to reduce noise
• Plan and execute capacity management based on metrics
• Optimize costs and performance of observability solutions at scale
• Structure and lead the incident management process (escalation, war room and communication)
• Ensure blameless post-mortems and follow up on corrective actions
• Identify recurring issues and propose systemic, data-driven improvements
• Lead toil reduction through operational automation
• Keep operational documentation (runbooks, procedures, and architectures) up to date and accessible

Job Requirements

Experience leading technical teams (SRE, DevOps, Cloud Engineering)
Experience with SRE practices, including SLIs, SLOs, Error Budgets, and toil reduction
Experience with APM tools (Datadog, New Relic, Dynatrace)
Knowledge of observability and telemetry (metrics, logs, traces) with Prometheus, OpenTelemetry, and Grafana
Hands-on experience with Infrastructure as Code (AWS CDK, Terraform)
Proficiency in scripting languages (Python, Bash) and at least one programming language (Go, Java)
Experience with large-scale logging and tracing solutions (Loki, Tempo, Jaeger, ELK Stack)
Cloud experience, preferably AWS
Experience with containers (Docker) and orchestration (Kubernetes, ECS)
Experience in incident management and post-mortems
Understanding of Linux systems and diagnostic tools
Technical English (reading and writing)

Benefits

Medical and dental plans with no co-pay
Life insurance
Pharmacy/medication assistance
Support for physical activities (fitness subsidy)
Neon partnership for employee financial health
Zenklub for mental and physical health (4 free monthly sessions for therapy or nutrition)
Quick massages at headquarters
Flexible meal benefit via a Visa credit card
Free food on-site
Childcare allowance
Parental support program
Extended maternity and paternity leave
In-company training platform
Education assistance subsidizing 70% of tuition for degree programs and language courses
Home office allowance
Work equipment provided
Furniture allowance
Partnerships with coworking spaces across Brazil
Birthday day off
Happy hour allowance
Referral bonus for new hires
Bonus based on annual targets
Stock option plan
Relaxed, casual environment (no dress code)

More DevOps Engineer Jobs

BI DevOps Engineer – m/w/d

BI2run

📈 More than business intelligence support.

DevOps Engineer, 71 days ago

Full Time, Remote, Team 11-50, Since 2015, H1B No Sponsor

• You install, configure, and operate our BI and planning systems on Windows and Linux servers
• You help build and operate cloud environments (e.g., in Azure)
• You handle updates, patches, and system hardening (including authentication, certificates, and encryption)
• You analyze incidents and performance problems, find lasting solutions, and document your changes
• You work closely with our BI consultants to set up and operate new solutions

AWS, Azure, Cloud, Docker, ERP, Kubernetes, Linux, MS SQL Server, OpenShift, SQL

Germany

€50K - €70K / year

Job Closed

Senior DevOps Engineer, AWS

Xtremepush

DevOps Engineer, 71 days ago

Full Time, Remote

• Work closely with the operations team
• Develop CI/CD pipelines to improve on existing deployment processes
• Perform application updates
• Implement security projects on various server and networking platforms
• Install, monitor, and maintain hardware and software
• Write scripts to automate jobs/processes
• Ensure the security, stability and uptime of production, staging and development environments
• Monitor the above environments and react to alerts and issues
• Participate in an on-call rota for priority-1 level alarms
• Maintain network, server and storage assets in cloud environments
• Carry out ongoing upgrades and improvements to infrastructure and processes
• Contribute to the planning of application/infrastructure releases and configuration changes
• Interact with internal teams and external 3rd party vendors to troubleshoot and resolve complex problems

Cloud, Kubernetes, PHP, Python, Terraform

Lithuania

Job Closed

Senior Site Reliability Engineer – Data & Automation Focus

SS&C Technologies

Established in 1986, SS&C Technologies is a leading global provider of services and software for the global financial services industry. Committed to helping cl

DevOps Engineer, 71 days ago

Full Time, Remote

• Design, build, and maintain a scalable and reliable data platform
• Apply SRE principles to data pipelines and services
• Define data architecture, models, and standards
• Ensure high availability and performance of data systems
• Build and maintain ETL/ELT pipelines integrating multiple data sources
• Automate operational processes using scripting and APIs
• Implement monitoring and alerting for data pipelines
• Develop and maintain dashboards and reporting solutions
• Support and optimise cloud-based data infrastructure

AWS, Azure, Cloud, ETL, Google Cloud Platform, Python, SQL, Tableau, Terraform

United Kingdom

DevOps Engineer

AssureSoft - Careers

AssureSoft is a multinational software development and information technology company providing strategic consulting, technology services, and outsourcing business processes. We work to innovate and create quality software with motivated, passionate, and qualified teams that develop in an environment of professional, stable growth and continuous learning. Inclusive Opportunities for Every Talent. At AssureSoft, we believe that true innovation is born from diversity—of ideas, experiences, and perspectives. That’s why our hiring practices are inclusive and reflect a firm commitment to equity and equal opportunity. Here, every person—regardless of origin, gender, orientation, or beliefs—finds a space to grow, contribute, and be valued not only for their talent, but also for who they are.

DevOps Engineer, 71 days ago

Full Time, Remote, Team 201-500

Role Description

- Architect, manage, and scale cloud-native infrastructure on Google Cloud Platform and AWS.
- Design, implement, and maintain Terraform-based infrastructure across 40+ environments.
- Manage GCP services including GKE, Cloud Run, Cloud Functions, networking, security, and load balancing.
- Administer MongoDB Atlas clusters, Redis instances, BigQuery datasets, and Cloud Storage lifecycle policies.
- Manage Redis instances (Cloud Memorystore) for caching, session management and real-time features.
- Configure and maintain BigQuery datasets, scheduled queries and data pipelines.
- Build and maintain CI/CD pipelines, container build workflows, and automated Terraform processes.
- Develop infrastructure automation scripts in Python and Bash.
- Maintain monitoring, alerting, tracing, logging, uptime checks, and SLO/SLI monitoring.
- Respond to incidents, perform root cause analysis, and implement preventive measures.
- Configure security controls including Cloud Armor, IAP, SSL/TLS automation, Secret Manager, VPC networking, and least-privilege IAM.
- Create documentation, runbooks, and procedures while mentoring team members on DevOps best practices.

Qualifications

- 3–5 years of hands-on DevOps experience with production cloud environments.
- Strong hands-on production experience with Google Cloud Platform.
- Advanced Terraform proficiency, including module development, state management, and Terraform Cloud/Enterprise workflows.
- Proven MongoDB Atlas administration experience in production environments.
- Experience with Docker, Kubernetes, serverless container platforms, and container registries.
- CI/CD expertise with GitHub Actions, Cloud Build, GitOps practices, and automated deployments.
- Strong Python and Bash scripting skills.
- Experience with monitoring, observability, log aggregation, and APM platforms such as Sentry or Datadog.
- Bachelor’s degree in Computer Science, a related technical field, or equivalent practical experience.
- Ability to work in a full-time, 100% remote role.

Benefits

- Great Place To Work certification.
- A company with more than 15 years of experience.
- Work with world-class clients and long-term projects.
- English scholarships for an external institute.
- English classes with company teachers.
- State-of-the-art tools and resources.
- Certifications for your professional growth.
- Recreation and leisure activities.
- Compliance with the regulations and labor rights of your region.

Finland

Job Closed
