Job Closed
This listing is no longer active.
Remedy supports founders and established companies in creating the next generation of great digital products
DevOps Engineer
Location
Brazil
Posted
127 days ago
Salary
0
Seniority
Senior
Job Description
DevOps Engineer
Remedy Product Studio
- Be an expert in AWS Cloud infrastructure and services, and train others.
- Automate pipeline deployments, tooling implementations, and infrastructure deployments.
- Automate solutions and develop processes to audit and monitor conformance with security standards, and remediate any security violations.
- Review and optimize application deployments for non-functional requirements (NFRs) such as performance, scalability, and robustness.
- Innovate and prototype new tools and practices as part of Remedy’s architecture offerings.
- Promote DevOps best practices and culture among software engineers in the organization.
- Operate with ambiguity and put together reliable, complex solutions.
- Design for long-term robustness while allowing pragmatic near-term rollouts.
- Support various internal teams deploying solutions and developing applications.
Job Requirements
- 5+ years of experience as a DevOps Engineer
- 3+ years of working with core AWS tools (IAM, S3, VPC, EC2, ELB)
- Ability to design for long-term robustness while allowing pragmatic near-term rollouts
- Good understanding of containerization concepts (Docker)
- Previous software coding experience with shell scripting in Bash
- Good experience with managed Kubernetes clusters (AWS EKS)
- Strong knowledge of Terraform
- A minimum B2 English proficiency level
Desired (will be a plus):
- Working knowledge of configuration and management of DataDog and Prometheus
- Proficient with Git and Git Workflows
- Ability to configure CI/CD pipelines for application deployments in cloud environments (CircleCI, GitHub Actions)
- Understanding of AWS big data tools and platforms (EMR, Glue, Athena, MSK)
- General understanding of Argo Workflows
Benefits
- Competitive compensation
- Remote-first work environment
- Laptop reimbursement program
- Predetermined non-working days that align with your cultural and social contexts
- Monthly stipends to help cover work-related costs:
  - Connectivity
  - Technology
  - Tools
More DevOps Engineer Jobs
Site Reliability Engineer II
Restaurant365
Restaurant365 is a computer software company that specializes in providing high-quality Software-as-a-Service (SaaS) solutions to the restaurant industry. The platform is cloud-bas
- The Site Reliability Engineer II will be responsible for supporting, enhancing, and maintaining Restaurant365’s cloud infrastructure and applications.
- Collaborate with DevOps, development, and infrastructure teams to resolve moderately complex issues, propose improvements, and strengthen the reliability, scalability, and security of our SaaS platform.
- Respond to production incidents, perform triage and troubleshooting, and contribute to post-incident analysis.
- Identify and automate manual processes to improve efficiency and reduce risk.
- Enhance and evolve monitoring tools and platforms to improve observability.
- Promote and apply best practices for reliability, scalability, and performance across engineering.
- Implement and support cloud automation using Terraform, Ansible, or CloudFormation.
- Work within change management protocols to provide maximum uptime for production systems.
- Participate in on-call rotation, providing 24x7 support for incidents and contributing to root cause analysis.
- Partner with developers, architects, vendors, and IT teams to ensure reliable system operations.
- Research and remediate vulnerabilities in coordination with security teams.
- Maintain documentation of infrastructure, monitoring, runbooks, and incident response procedures.

- Responsible for the deployment and installation of technology and hardware across the country
- Handling customer support tickets and calls
- Ensuring that projects are completed on time
- Working directly with product engineers and customers
- Diagnosing and troubleshooting issues remotely and in the field
- Planning and preparing for deployments
Role Description
We're looking for a Staff SRE who can own the reliability, scalability, and operational excellence of our platform. You'll work at the intersection of infrastructure and software engineering - building the systems, tooling, and practices that let our team ship confidently and operate at scale.
- Set technical direction for infrastructure and reliability - evaluate approaches, make architectural decisions, and establish standards.
- Own and evolve our Kubernetes-based infrastructure on GCP.
- Build and maintain CI/CD pipelines, deployment tooling, and release processes.
- Maintain and simplify our build system (Bazel) for faster, more reliable builds across the org.
- Define and instrument SLIs/SLOs; build dashboards and alerting that surface real problems.
- Drive incident response, post-mortems, and reliability improvements.
- Partner with product engineers to design systems that are reliable and operable from day one.
- Contribute to our engineering culture around AI-augmented development - sharing patterns, workflows, and lessons learned.
Qualifications
- Significant experience in SRE, platform engineering, or infrastructure roles at scale.
- Demonstrated technical leadership: you've driven significant infrastructure or reliability initiatives, not just executed on them.
- Deep hands-on expertise with Kubernetes (GKE preferred) and GCP services.
- Strong programming skills - Go preferred.
- Experience with build systems (Bazel strongly preferred) and CI/CD tooling.
- Practical experience with AI coding assistants as part of your regular workflow - not just experimentation, but daily use.
- Ability to critically evaluate AI-generated code and infrastructure configs: you know when to trust it, when to revise it, and when to write it yourself.
- Track record of improving reliability through automation, observability, and good engineering practices.
- Comfort with ambiguity and ownership; we're a small team where engineers drive decisions.
Nice to Have
- Background in security, malware analysis, or threat detection.
- Experience with large-scale data systems (BigTable, Spanner, BigQuery).
- Deep proficiency in Go.
Benefits
- Hard technical problems with real security impact.
- Small team, huge impact, high autonomy, low process overhead.
- Opportunity to collaborate with world-class experts in cybersecurity.
- Work remotely in the USA or Canada, or use our co-working space in Santa Clara to collaborate with teammates in-person.
Senior Site Reliability Engineer, Azure Red Hat OpenShift
Red Hat
The leading provider of enterprise open source solutions.
- Contribute code to increase the scalability and reliability of the service
- Contribute software tests and participate in peer review to increase the quality of our codebase
- Help and develop peers’ capabilities through knowledge sharing, mentoring, and collaboration
- Participate in a regular on-call schedule, including occasional paid weekends and holidays
- Practice sustainable incident response and blameless postmortems
- Resolve customer issues escalated from the Red Hat Global Support team
- Work within a small agile team to develop and improve SRE software, support your peers, plan and self-improve
- Explore and experiment with emerging AI technologies relevant to software development, proactively identifying opportunities to incorporate new AI capabilities into existing workflows and tooling.



