Job Closed

This listing is no longer active.

Invillia

Innovation Engineering_ part of AI/R ©AI Revolution Company

Senior SRE

DevOps Engineer | Full Time | Remote | Senior | Team 5,001-10,000 | Since 2003 | H1B No Sponsor

Location

Brazil

Posted

62 days ago

Salary

Not disclosed

Seniority

Senior

Job Description

  • Define the best cloud infrastructure solutions and provide support throughout the entire service lifecycle (architecture, deployment, and operations).
  • Improve environment resilience by ensuring higher performance, scalability, availability, quality, monitoring, and alerting.
  • Design, plan, and implement technology processes and/or solutions, using data, critical thinking, and attention to detail to prioritize and make decisions.
  • Develop applications, components, and APIs to support other teams or improve the SRE team's management.
  • Actively participate in crisis and incident resolution.
  • Build processes and best practices to develop, promote, and evolve a reliability-focused vision.
  • Identify needs and understand security requirements for the continuous evolution of the product.
  • Monitor, develop, and manage continuous delivery of the DevOps pipeline, promoting integration between tools and machine provisioning.

Job Requirements

  • Experience with IaC (Atlantis and Terraform).
  • Experience with microservices architecture.
  • Experience with CI/CD processes (Jenkins and Groovy).
  • Experience with infrastructure and networking.
  • Experience with container orchestration (ECS, Kubernetes, and Docker).
  • Experience with Linux and Windows.
  • Experience with AWS (Route 53, ECS, SQS, STS, API Gateway, Lambda, IAM, cross-account/region setups, VPC, CloudFront, SSM, WAF, and CloudWatch).
  • Experience with highly available and highly scalable production systems.
  • Experience with messaging architectures (Kafka, SQS, and RabbitMQ).
  • Knowledge of the AWS Well-Architected Framework (desirable).
  • Knowledge of configuration management (Salt and/or Ansible).
  • Knowledge of Python, Java, and Kotlin to analyze code and develop scripts.
  • Knowledge of programming languages.
  • Knowledge of observability and log reading/interpretation.
  • Experience with Splunk, New Relic, and Prometheus.
  • Experience implementing security layers and data protection.
  • Knowledge of agile methods such as XP, Scrum, and/or Kanban.
  • Knowledge of DevOps (Collaboration, Affinity, Tools, and Scaling).
  • Knowledge of Big Data on AWS EMR, Hadoop, and HDFS.
  • Knowledge of Tomcat.

Benefits

  • Unique benefits among tech companies (#InfinitePowers)

More DevOps Engineer Jobs

Senior Infrastructure – Site Reliability Engineer, SRE

Oscilar

AI Risk Decisioning™ platform that helps organizations manage onboarding, fraud, credit, and compliance risks

DevOps Engineer | 62 days ago
Full Time | Remote | Team 51-200 | Since 2021 | H1B Sponsor

  • Architect and operate resilient cloud infrastructure (AWS, Pulumi, Kubernetes).
  • Lead initiatives to improve availability, latency, and performance at scale.
  • Design and evolve our CI/CD pipelines to optimize for speed, safety, and repeatability.
  • Define the metrics, alerts, and runbooks that form our observability backbone.
  • Run chaos experiments and failure simulations to harden the platform.
  • Mentor engineers and set best practices for SRE across the company.

Poland
Job Closed

Senior Site Reliability Engineer

CertifID

CertifID provides identity protection services to help prevent wire fraud. Focused on securing digital financial transactions, the company strives to reduce the financial and emoti

DevOps Engineer | 62 days ago
Full Time | Remote | Team 130 | Since 2018

Cybercrime is rising and reached record highs in 2024: according to the FBI's IC3 report, total losses exceeded $16 billion. With investment fraud and BEC scams at the forefront, the message is clear: the real estate sector remains a lucrative target for cybercriminals. At CertifID, we take this threat seriously and provide a secure platform that verifies the identities of parties involved in transactions, authenticates wire transfer instructions, and detects potential fraud attempts. Our technology is designed to mitigate risk and ensure that every transaction is conducted with confidence and peace of mind.

We know we couldn't take on this challenge without our incredible team. We have been recognized as one of the Best Startups to Work for in Austin, made the Inc. 5000 list, and won Best Culture by Purpose Jobs three years in a row. We are guided by our core values and our vision of a world without wire fraud. We offer a dynamic work environment where you can make a meaningful impact as part of a team dedicated to enhancing security and fighting fraud.

We are seeking a Senior Site Reliability Engineer (Senior SRE) to drive reliability improvements across our production SaaS environment. You'll play a critical role in building scalable infrastructure patterns, advancing observability, improving incident response, and partnering with engineering teams to embed reliability into system design and delivery. This role is ideal for an experienced senior SRE who enjoys solving complex operational problems, building automation, and mentoring others.

What You'll Do

  • Reliability & Platform Operations: Own and improve the reliability, availability, and performance of production systems while defining and operationalizing SLIs/SLOs and error budgets.
  • AI Agent Enablement: Design and implement autonomous and semi-autonomous AI agents for monitoring distributed systems and applications. Build agents capable of consuming multi-source observability data (metrics, logs, traces, etc.).
  • Incident Response: Participate in and help lead an on-call rotation, serving as an escalation point for major incidents and facilitating blameless postmortems.
  • Automation & Infrastructure: Build automated workflows to eliminate manual work, and design and maintain Infrastructure-as-Code with Terraform.
  • Observability: Improve metrics, logs, traces, and alerting using tools such as Datadog or Prometheus to reduce noise and increase signal.
  • Collaboration & Mentorship: Partner with application teams to implement reliability best practices, and mentor junior engineers to foster a culture of knowledge sharing.

Who You Are

  • Strategic Architect: You look beyond the "what" to understand the "why," providing insights that influence our GTM and technical direction.
  • Startup Veteran: You are comfortable moving fast and staying proactive in an environment where the playbook is still being written.
  • Relatable & Adaptable: You can navigate different personalities across the organization, from high-energy sales teams to analytical engineering partners.
  • Lifelong Learner: You are eager to learn and keep up with emerging technologies and industry trends.

What We're Looking For

  • Experience: 5+ years in SRE, DevOps, Platform Engineering, or Infrastructure Engineering.
  • Cloud Expertise: Proven experience supporting production SaaS systems in Azure (preferred), AWS, or GCP.
  • Technical Stack: Strong Linux, networking, and distributed-systems troubleshooting skills.
  • Containers: Strong experience with containers and orchestration (Kubernetes/EKS/AKS).
  • IaC & Tooling: Expertise with Infrastructure-as-Code (Terraform strongly preferred).
  • Programming: Strong scripting/programming skills in Python, Go, Bash, or C#/.NET.
  • Observability: Hands-on experience with Datadog, Prometheus/Grafana, or OpenTelemetry.

What We Offer

  • Flexible vacation
  • 12 company-paid holidays
  • 10 paid sick days
  • No work on your birthday
  • Health, dental, and vision insurance (including a $0 option)
  • 401(k) with matching and no waiting period
  • Equity
  • Life insurance
  • Generous paid parental leave
  • Wellness reimbursement of $300/year
  • Remote worker reimbursement of $300/year
  • Professional development reimbursement
  • Competitive pay
  • An award-winning culture

Not sure if you check all the boxes? Apply anyway! We know that great talent comes in many forms, and we value potential just as much as experience. If you're excited about this role and believe you can grow into it, we'd love to hear from you. We're looking for people who are eager to learn, adapt, and solve challenges, so if that sounds like you, don't let a checklist hold you back!

Change doesn't happen overnight, and the same goes for us here at CertifID. We evolve collectively and individually as we grow by leaning into the core values that define us, embodying GRIT to raise the bar and influence outcomes in everything we do: Guard the Customer, Raise the Bar, Influence Outcomes, Teamwork Wins.

All locations: Texas | Michigan

Site Reliability Engineer

White Hat Gaming

Market-leading full-service platform

DevOps Engineer | 62 days ago
Full Time | Remote | Team 501-1,000 | Since 2012 | H1B No Sponsor

  • Helping administer our existing small collection of Linux servers.
  • Helping look after our production Oracle and MySQL databases.
  • Working with our developers to improve and automate all our testing, deployment, and monitoring processes.
  • Writing and maintaining a variety of scripts as required.
  • Diagnosing and helping fix production issues.
  • Providing third-line technical support for tickets raised by customers and clients.
  • Comfortable working in a fast-paced, growing company where priorities can vary and evolve.
  • Proactive mindset, taking initiative rather than waiting for direction.
  • Continuously seeking ways to improve processes and drive efficiency.

Malta
Job Closed
Full Time | Remote | Team 501-1,000 | Since 1965 | H1B No Sponsor

  • Manage and maintain systems and applications hosted on cloud computing solutions, following project guidelines.
  • Perform deployments and system updates using project-specific technologies.
  • Develop medium- to high-complexity scripts to automate deployment processes and other strategic project tasks.
  • Monitor systems and applications, analyze high-complexity data, and perform advanced configurations in monitoring tools.
  • Define system architecture in terms of documentation and technology, ensuring clarity and efficiency in implementations.
  • Use cloud computing platforms, provisioning resources and basic services both manually and through automation.
  • Document processes and technical solutions, and produce additional documentation as required by the project.
  • Analyze and isolate defects identified during testing, investigate root causes, and propose effective solutions to ensure software quality.
  • Implement new software development procedures, describing methods and operationalizing their application within the project.

Brazil
Job Closed