Job Closed
This listing is no longer active.
An innovator in tamper-resistant location identity solutions for cutting-edge user verification and account security.
Staff SRE – Site Reliability Engineer
Location
Brazil
Posted
124 days ago
Salary
R$42.0K / month
Seniority
Lead
Job Description
Incognia
• Lead the resolution of high-complexity issues related to the scalability of Kubernetes, databases, and observability systems.
• Identify bottlenecks in CI/CD and automation pipelines, and propose and implement solutions that increase predictability and delivery speed.
• Collaborate with product teams to define and implement architectural patterns that ensure high availability of applications.
• Respond to critical production incidents, ensuring environment stability and conducting effective post-mortems.
• Directly manage the team's engineers, conducting performance follow-ups and weekly 1:1 meetings.
• Support the technical growth of team members and promote the dissemination of engineering best practices.
• Align technical priorities with the business objectives of the Core Engineering area.
Job Requirements
- Bachelor's degree in Computer Science, Computer Engineering, or Software Engineering;
- Experience in leadership or team management;
- Knowledge and experience with Kubernetes and containers;
- Experience addressing high-scalability challenges;
- Knowledge and experience with AWS;
- Upper-intermediate English (reads and writes well, can hold everyday conversations without much difficulty, though may speak slowly and with pauses).
Benefits
- CLT salary (Brazilian employment contract) with a range up to R$41,984.12
More DevOps Engineer Jobs
• We are looking for a highly qualified and meticulous DevOps Engineer to join our team, working fully remotely.
• The ideal candidate is a detail-oriented professional with proven experience in designing, implementing, and managing infrastructure and CI/CD pipelines.
• This position requires a solid command of DevOps tools and practices, with a specific focus on Jenkins, Terraform, and Microsoft Azure.
• Collaborate with engineering, operations, and other stakeholders to understand enterprise architecture, monitoring requirements, and performance goals.
• Identify and define key performance indicator (KPI) metrics, diagnose issues, and proactively identify areas for optimization.
• Develop and implement observability frameworks, tools, and processes to enable comprehensive monitoring, logging, and tracing of systems and applications.
• Ensure the availability, scalability, and reliability of infrastructure and deployment environments.
• Implement and manage monitoring and observability tools (AppDynamics, Datadog, Splunk, ELK, Sentry, etc.) to gain insight into system performance and health.
• Provide timely and accurate reports on application performance, highlighting key insights and trends.
• Collaborate with digital squads to implement performance improvements, including code optimizations and infrastructure adjustments.
• Offer guidance and training to end users and internal teams on APM best practices and on optimizing application performance.
• Audit CI/CD pipelines, infrastructure as code (IaC), and deployment processes for compliance with internal policies and external standards (e.g., ISO 27001, SOC 2, PCI DSS, HIPAA).
• Assess cloud environments (AWS, Azure, GCP) for security, governance, and cost controls.
• Review access management, secrets handling, and identity policies.
• Validate change management, release management, and incident response processes.
• Identify operational, security, and compliance risks in DevOps workflows.
• Evaluate vulnerability management, patching, and dependency controls.
• Review logging, monitoring, alerting, and observability practices.
• Assess backup, disaster recovery, and business continuity readiness.
• Examine DevOps maturity, automation coverage, and adherence to best practices.
• Evaluate segregation of duties and approval workflows.
• Review version control practices, branching strategies, and audit trails.
• Assess third-party tools and vendor integrations for risk exposure.
• Produce clear audit reports with findings, risk ratings, and actionable recommendations.
• Present audit results to engineering leadership, security teams, and management.
• Track remediation efforts and verify corrective actions.
• Provide guidance on improving DevOps governance and control frameworks.
• Participate in other regular audits in the IT Audit Plan as assigned by the Head of IT Audit.
• Follow up with responsible teams to ensure they implement the recommendations of internal auditors, external auditors, consultants, and security analysts.
• Kubernetes platform engineering (EKS-first): design, build, and operate production-grade Kubernetes clusters (multiple node groups, autoscaling, upgrades, cluster add-ons).
• Implement intelligent autoscaling driven by real metrics (queue depth, consumer lag, service latency) using tools such as KEDA and Karpenter.
• Own AWS environments end to end (VPC, IAM, EKS/ECS/EC2, ALB/ELB, S3, Route 53, CloudWatch, RDS, SQS, Lambda).
• Build reproducible infrastructure with Terraform, backed by strong review and change management practices.
• Implement backup and DR patterns (e.g., snapshots, retention, automation) and safe rollouts.
• Design infrastructure for data-intensive workloads: high-throughput ingestion, batch processing, and real-time streaming.
• Understand and operate distributed systems at scale, including consensus, partitioning, replication, and failure modes.
• Build and maintain infrastructure for data pipelines and vector databases.
• Design for horizontal scalability so that systems handle growing data volumes and user traffic gracefully.
• Build and own monitoring and logging from scratch and make them actionable (Prometheus/Grafana, ELK/EFK, alerting).
• Define SLIs/SLOs and incident response practices, or partner with others to do so; improve reliability through data-driven changes.
• Establish performance testing and production-like load testing environments.
• Continuously reduce AWS spend through right-sizing, Spot strategies, reserved capacity planning, and architecture improvements.
• Partner with engineering teams to diagnose bottlenecks (database queries, caching, queueing) and propose scalable solutions.
• Optimize infrastructure costs for data-heavy workloads (storage tiering, compute scheduling, GPU utilization).
• Improve cloud and cluster security posture (IAM, network policies, secrets management, least privilege).
• Support SOC 2 readiness and execution (controls, evidence automation, operational hardening).
• Implement access management patterns.