Job Closed
This listing is no longer active.
Innovation Engineering_ part of AI/R ©AI Revolution Company
Senior SRE
Location
Brazil
Posted
62 days ago
Salary
Not specified
Seniority
Senior
Job Description
Invillia
- Define the best cloud infrastructure solutions and provide support throughout the entire service lifecycle (architecture, deployment, and operations).
- Improve environment resilience by ensuring higher performance, scalability, availability, quality, monitoring, and alerting.
- Design, plan, and implement technology processes and/or solutions based on data, critical thinking, and attention to detail to prioritize and make decisions.
- Develop applications, components, and APIs to support other teams or improve the SRE team's management.
- Actively participate in crisis and incident resolution.
- Build processes and best practices to develop, promote, and evolve a reliability-focused vision.
- Identify needs and understand security requirements for the continuous evolution of the product.
- Monitor, develop, and manage continuous delivery of the DevOps pipeline, promoting integration between tools and provisioning of machines.
Job Requirements
- Experience with IaC (Atlantis and Terraform).
- Experience with microservices architecture.
- Experience with CI/CD processes (Jenkins and Groovy).
- Experience with infrastructure and networking.
- Experience with container orchestration (ECS, Kubernetes, and Docker).
- Experience with Linux and Windows.
- Experience with AWS (Route 53, ECS, SQS, STS, API Gateway, Lambda, IAM, cross-account/region setups, VPC, CloudFront, SSM, WAF, and CloudWatch).
- Experience with highly available and highly scalable production systems.
- Experience with messaging architectures (Kafka, SQS, and RabbitMQ).
- Desirable: knowledge of the AWS Well-Architected Framework.
- Knowledge of configuration management (Salt and/or Ansible).
- Knowledge of Python, Java, and Kotlin to analyze code and develop scripts.
- Knowledge of programming languages.
- Knowledge of observability and log reading/interpretation.
- Experience with Splunk, New Relic, and Prometheus.
- Experience implementing security layers and data protection.
- Knowledge of agile methods such as XP, Scrum, and/or Kanban.
- Knowledge of DevOps (Collaboration, Affinity, Tools, and Scaling).
- Knowledge of Big Data on AWS EMR, Hadoop, and HDFS.
- Knowledge of Tomcat.
Benefits
- Unique benefits among techs #InfinitePowers
More DevOps Engineer Jobs
Senior Infrastructure – Site Reliability Engineer, SRE
Oscilar
AI Risk Decisioning™ platform that helps organizations manage onboarding, fraud, credit, and compliance risks
- Architect and operate resilient cloud infrastructure (AWS, Pulumi, Kubernetes).
- Lead initiatives to improve availability, latency, and performance at scale.
- Design and evolve our CI/CD pipelines to optimize for speed, safety, and repeatability.
- Define the metrics, alerts, and runbooks that form our observability backbone.
- Run chaos experiments and failure simulations to harden the platform.
- Mentor engineers and set best practices for SRE across the company.
Senior Site Reliability Engineer
CertifID
CertifID provides identity protection services to help prevent wire fraud. Focused on securing digital financial transactions, the company strives to reduce the financial and emoti
Cybercrime is rising, reaching record highs in 2024. According to the FBI's IC3 report, total losses exceeded $16 billion. With investment fraud and BEC scams at the forefront, the message is clear: the real estate sector remains a lucrative target for cybercriminals.
At CertifID, we take this threat seriously and provide a secure platform that verifies the identities of parties involved in transactions, authenticates wire transfer instructions, and detects potential fraud attempts. Our technology is designed to mitigate risks and ensure that every transaction is conducted with confidence and peace of mind.
We know we couldn’t take on this challenge without our incredible team. We have been recognized as one of the Best Startups to Work for in Austin, made the Inc. 5000 list, and won Best Culture by Purpose Jobs three years in a row. We are guided by our core values and our vision of a world without wire fraud. We offer a dynamic work environment where you can contribute to meaningful impact and be part of a team dedicated to enhancing security and fighting fraud.
We are seeking a Senior Site Reliability Engineer (Senior SRE) to drive reliability improvements across our production SaaS environment. You’ll play a critical role in building scalable infrastructure patterns, advancing observability, improving incident response, and partnering with engineering teams to embed reliability into system design and delivery. This role is ideal for an experienced Sr. SRE who enjoys solving complex operational problems, building automation, and mentoring others.
What You’ll Do
- Reliability & Platform Operations: Own and improve the reliability, availability, and performance of production systems while defining and operationalizing SLIs/SLOs and error budgets.
- AI Agent Enablement: Design and implement autonomous and semi-autonomous AI agents for monitoring distributed systems and applications. Build agents capable of consuming multi-source observability data (metrics, logs, traces, etc.).
- Incident Response: Participate in and help lead an on-call rotation, serving as an escalation point for major incidents and facilitating blameless postmortems.
- Automation & Infrastructure: Build automated workflows to eliminate manual work and design/maintain Infrastructure-as-Code with Terraform.
- Observability: Improve metrics, logs, traces, and alerting using tools like Datadog or Prometheus to reduce noise and increase signal.
- Collaboration & Mentorship: Partner with application teams to implement reliability best practices and mentor junior engineers to foster a culture of knowledge sharing.
Who You Are
- Strategic Architect: You look beyond the "what" to understand the "why," providing insights that influence our GTM and technical direction.
- Startup Veteran: You are comfortable moving fast and staying proactive in an environment where the playbook is still being written.
- Relatable & Adaptable: You can navigate different personalities across the organization, from high-energy sales teams to analytical engineering partners.
- Lifelong Learner: You have a thirst for learning, keeping up with emerging technologies and industry trends.
What We're Looking For
- Experience: 5+ years in SRE, DevOps, Platform Engineering, or Infrastructure Engineering.
- Cloud Expertise: Proven experience supporting production SaaS systems in Azure (preferred), AWS, or GCP.
- Technical Stack: Strong Linux, networking, and distributed systems troubleshooting skills.
- Containers: Strong experience with containers and orchestration (Kubernetes/EKS/AKS).
- IaC & Tooling: Expertise with Infrastructure-as-Code (Terraform strongly preferred).
- Programming: Strong scripting/programming skills in Python, Go, Bash, or C#/.NET.
- Observability: Hands-on experience with Datadog, Prometheus/Grafana, or OpenTelemetry.
What We Offer
- Flexible vacation
- 12 company-paid holidays
- 10 paid sick days
- No work on your birthday
- Health, dental, and vision insurance (including a $0 option)
- 401(k) with matching, and no waiting period
- Equity
- Life insurance
- Generous paid parental leave
- Wellness reimbursement of $300/year
- Remote worker reimbursement of $300/year
- Professional development reimbursement
- Competitive pay
- An award-winning culture
Not sure if you check all the boxes? Apply anyway! We know that great talent comes in many forms, and we value potential just as much as experience. If you're excited about this role and believe you can grow into it, we’d love to hear from you. We’re looking for people who are eager to learn, adapt, and solve challenges, so if that sounds like you, don’t let a checklist hold you back!
Change doesn't happen overnight, and the same goes for us here at CertifID. We evolve collectively and individually as we grow by leaning into the core values that define us. As we grow, we embody GRIT, collectively and individually, to raise the bar and influence outcomes in everything we do.
Guard the Customer - Raise the Bar - Influence Outcomes - Teamwork Wins
- Helping administer our existing small collection of Linux servers.
- Helping look after our production Oracle + MySQL databases.
- Working with our developers to improve and automate all our testing, deployment and monitoring processes.
- Writing and maintaining a variety of scripts as required.
- Diagnosing and helping fix production issues.
- Providing 3rd line technical support for tickets raised by customers and clients.
- Comfortable working in a fast-paced, growing company where priorities can vary and evolve.
- Proactive mindset, taking initiative rather than waiting for direction.
- Continuously seeking ways to improve processes and drive efficiency.

- Manage and maintain systems and applications hosted on cloud computing solutions, following project guidelines.
- Perform deployments and system updates using project-specific technologies.
- Develop medium- to high-complexity scripts to automate deployment processes and other strategic project tasks.
- Monitor systems and applications, analyze high-complexity data, and perform advanced configurations in monitoring tools.
- Define system architecture in terms of documentation and technology, ensuring clarity and efficiency in implementations.
- Use cloud computing platforms, provisioning resources and basic services both manually and through automation.
- Document processes and technical solutions, and produce additional documentation as required by the project.
- Analyze and isolate defects identified during testing, investigate root causes, and propose effective solutions to ensure software quality.
- Implement new software development procedures, describing methods and operationalizing their application within the project.



