LWSA

Integrating solutions & boosting businesses

SRE & DevOps Specialist

DevOps Engineer | Full Time | Remote | Senior | Team 1,001-5,000 | Since 1998 | H1B No Sponsor

Location

Brazil

Posted

11 days ago

Seniority

Senior

Job Description

  • Reliability Leadership: Lead the technical planning of operations initiatives, ensuring that resilience and scalability are core requirements from solution design.
  • Platform Engineering (Self-Service): Act as an enabler for technology teams by creating abstractions and automations that remove obstacles in deployment and infrastructure management, enabling smooth, low-friction operations.
  • Go beyond day-to-day operations by identifying repetitive tasks and turning them into robust automations so the Ops team can focus on high-value work.
  • Innovation and Continuous Improvement: Rethink legacy operational processes, introducing improvements (even small script or workflow changes) that bring predictability and safety to the production environment.
  • Incident Response Lead: Serve as the technical escalation point in complex crises, leading resolution efforts and, importantly, conducting root cause analysis to prevent recurrence.
  • Review proposed changes and architectures, ensuring they meet the company's security, cost (FinOps), and operational excellence standards.
  • Mentorship and SRE Culture: Be a catalyst for SRE culture within Operations, raising the team's technical level and promoting knowledge-sharing about distributed systems.

Job Requirements

  • Infrastructure as Code (IaC): Advanced experience with Terraform for large-scale environments, creating standards that enable engineering teams to operate autonomously.
  • AWS Expertise: Deep knowledge of AWS services.
  • Observability: Define and implement strategies based on SLIs, SLOs and Error Budgets, using New Relic to anticipate incidents.
  • CI/CD Experience: Strong command of automation pipelines (preferably GitLab CI/CD) with a focus on security.
  • Infrastructure Security: Experience remediating infrastructure vulnerabilities (CVEs) and image hardening, ensuring a hardened and compliant environment.
  • Linux Operating System: Advanced OS-level troubleshooting.
  • Orchestration Leadership: Lead the container strategy with a primary focus on Amazon ECS (EC2 and Fargate), ensuring resilience and cost efficiency.
  • Data Governance: Serve as a reference for MySQL and PostgreSQL, ensuring our instances (RDS/Aurora) are optimized for high performance and security.

Benefits

  • Health insurance;
  • Dental insurance;
  • Meal or food allowance;
  • Childcare assistance;
  • Transportation allowance;
  • Profit-sharing (PPR) program;
  • Paid day off during your birthday month;
  • Life insurance;
  • Wellhub (employee wellness program);
  • Férias&Co (vacation/travel benefit);
  • 6-month maternity leave and 20-day paternity leave;
  • Flexible working hours;
  • #Secuida - our Quality of Life program;
  • Partnerships with various establishments and institutions in education, health, leisure, entertainment, and more.

More DevOps Engineer Jobs

Junior DevOps Engineer

TrueML

TrueML is a fintech company building software to create positive experiences for consumers seeking financial health.

DevOps Engineer | 11 days ago | Full Time | Remote | Team 51-200 | H1B No Sponsor

  • Build small to medium-sized infrastructure components using Terraform and AWS.
  • Ensure reliable build-and-deploy cycles by maintaining and debugging CI/CD workflows, including GitHub Actions and ArgoCD.
  • Learn to troubleshoot and resolve issues in containerized environments, including Kubernetes pods and EKS networking bottlenecks.
  • Leverage GenAI and AI code assistants to accelerate your onboarding and complete well-defined automation tasks.
  • Validate AI-generated code for correctness and style according to team standards.
  • Contribute to system reliability by participating in the on-call rotation and swiftly responding to system alerts.
  • Utilize logging and observability tools (Datadog, Observe) to efficiently gather information during troubleshooting.
  • Own the quality of your work by testing and documenting your code, ensuring bug fixes are implemented reliably across all environments (dev, staging, production).
  • Engage actively in team ceremonies, including sprint planning and daily standups.
  • Clearly communicate project status and implementation details to the broader team.
  • Partner with senior engineers to understand and maximize the business and customer impact of your work.

Argentina
$39.6K - $47.6K / year

DevOps Engineer

Mind Computing

Innovate | Automate | Accelerate

DevOps Engineer | 12 days ago | Full Time | Remote | Team 11-50 | H1B No Sponsor

  • Design, develop, and implement AWS cloud architecture leveraging services such as EC2, S3, RDS, VPC, ELB/ALB, EKS, and other AWS-native services, with a focus on scalability, high availability, and disaster recovery.
  • Provision and manage AWS resources, including compute, storage, networking, and databases, using the AWS Management Console, CLI, and infrastructure-as-code tools.
  • Develop and maintain automated deployment and provisioning of solutions using Terraform and AWS CloudFormation.
  • Implement and enforce cloud security best practices, including IAM, encryption at rest and in transit, network segmentation, logging, and compliance with applicable regulatory standards.
  • Monitor cloud environments for performance, availability, and cost optimization using AWS monitoring and alerting tools; proactively troubleshoot and resolve issues.
  • Integrate cloud infrastructure with CI/CD pipelines to streamline application builds and deployments, leveraging GitHub Actions.
  • Collaborate closely with development teams to understand application requirements and translate them into efficient, secure AWS-based solutions.
  • Manage application deployments and releases with minimal downtime, using containerization technologies such as Docker and orchestration platforms like Kubernetes (EKS).

Washington
$115K - $130K / year
Job Closed

Intermediate DevSecOps

Dev.Pro

Software Development Partner. Result-driven. Quality-obsessed.

DevOps Engineer | 12 days ago | Full Time | Remote | Team 501-1,000 | Since 2011 | H1B No Sponsor

Role Description

At Dev.Pro, we work on projects that impact millions of people around the world, but we know it’s the people behind the tech who make it all happen. We truly value what makes each person unique and are building a workplace that’s inclusive, friendly, and supportive.

Qualifications

  • Submit a CV in English

Requirements

  • Intro call with a Recruiter
  • Internal interview
  • Client interview
  • Offer

Benefits

  • 30 paid days off each year, to use for vacation, holidays, or personal time
  • 5 paid sick days, up to 60 days of medical leave, and 6 paid days off for family events like weddings, funerals, or having a baby
  • Partially covered health insurance (after probation)
  • Wellness bonus for gym memberships, sports nutrition, and similar needs

Argentina

Lead Site Reliability Engineer – Observability

ASAAS

We simplify payment collection for individuals, MEIs (individual microentrepreneurs), and large companies.

DevOps Engineer | 12 days ago | Full Time | Remote | Team 501-1,000 | Since 2010 | H1B No Sponsor

  • Lead, develop, and retain the SRE team, fostering high performance, collaboration, and continuous learning
  • Conduct hiring, onboarding, feedback cycles, individual development plans (IDPs) and performance evaluations
  • Define the SRE team's strategy and roadmap aligned with Cloud and business objectives
  • Promote SRE and observability culture, acting as a technical reference for Engineering
  • Manage team priorities, capacity, and trade-offs, ensuring quality deliveries
  • Align initiatives with Cloud Engineering, Platform Engineering, and Cloud Security leadership
  • Report team metrics, risks, and progress to Cloud leadership
  • Define and lead the observability strategy (metrics, logs, and traces)
  • Evolve the observability platform (Prometheus, Grafana, OpenTelemetry, Loki, Tempo)
  • Establish and govern SLIs, SLOs, and Error Budgets for critical services
  • Define instrumentation standards for applications and infrastructure, driving adoption across teams
  • Implement an actionable alerting strategy to reduce noise
  • Plan and execute capacity management based on metrics
  • Optimize costs and performance of observability solutions at scale
  • Structure and lead the incident management process (escalation, war room and communication)
  • Ensure blameless post-mortems and follow up on corrective actions
  • Identify recurring issues and propose systemic, data-driven improvements
  • Lead toil reduction through operational automation
  • Keep operational documentation (runbooks, procedures, and architectures) up to date and accessible

Brazil