Avaya

Avaya is an Equal Opportunity employer and a U.S. Federal Contractor. Our commitment to equality is a core value of Avaya. All qualified applicants and employees receive equal treatment without consideration for race, religion, sex, age, sexual orientation, gender identity, national origin, disability, status as a protected veteran or any other protected characteristic. In general, positions at Avaya require the ability to communicate and use office technology effectively. Physical requirements may vary by assigned work location. This job brief/description is subject to change. Nothing in this job description restricts Avaya's right to alter the duties and responsibilities of this position at any time for any reason.

Site Reliability Engineer (SRE) - Azure | DevSecOps | IaC | Governance | Observability

DevOps Engineer | Other | Remote | Mid Level | Team 5,001-10,000 | H1B Sponsor

Location

United States

Posted

81 days ago

Salary

$129K - $143K / year

Seniority

Mid Level

Job Description

About Avaya

Avaya is an enterprise software leader that helps the world’s largest organizations and government agencies forge unbreakable connections. The Avaya Infinity™ platform unifies fragmented customer experiences, connecting the channels, insights, technologies, and workflows that together create enduring customer and employee relationships. We believe success is built through strong connections: with each other, with our work, and with our mission. At Avaya, you'll find a community that values your contributions and supports your growth every step of the way. Learn more at https://www.avaya.com

Description

We are seeking a Site Reliability Engineer (SRE) who will drive stability, reliability, and performance across our Azure and GCP-based platforms. This role blends operational excellence, proactive incident management, and strong collaboration with DevOps, Cloud, and Security teams. The ideal candidate will have hands-on experience with multi-cloud environments (Azure and GCP), IaC (Terraform/Ansible), CI/CD (Jenkins/GitHub Actions), and modern observability and AI-Ops systems. The engineer will also contribute to governance, cost optimization, and automation strategies that reduce toil and prevent issues before they occur.

A key aspect of this role is the ability to perform deep-dive troubleshooting of application performance and errors by analyzing logs and traces in platforms like Grafana and Datadog. This position includes 24×7 support coverage (rotational) and requires strong ownership in managing major incidents, RCA processes, and continuous service improvements.

Key Responsibilities

Reliability & Incident Management
- Serve as a key member of the 24×7 on-call rotation, responding to and managing incidents across production and pre-production environments.
- Lead incident bridges, coordinate root cause analysis (RCA), and ensure post-incident reviews drive systemic improvements.
- Maintain clear communication with cross-functional teams and leadership during major incidents.

Monitoring, AI-Ops, Alerts & Prevention
- Build, tune, and maintain observability dashboards (Azure Monitor, GCP Operations Suite, Prometheus, Grafana, Datadog, Log Analytics).
- Perform deep-dive troubleshooting of application and service-level issues using distributed tracing and log analysis (Grafana, Datadog) to pinpoint root causes beyond infrastructure.
- Define SLOs, SLIs, and error budgets to proactively identify and mitigate reliability risks before customer impact.
- Integrate AI-Ops tools for anomaly detection, predictive alerting, and automated incident correlation.
- Continuously enhance alert quality, reduce false positives, and automate runbooks for faster recovery.
- Analyze trends to prevent recurring issues and support teams in resilience engineering.

Requirements

Required Skills & Experience
- 5+ years in Site Reliability, DevOps, Cloud Operations, or customer support roles.
- Demonstrated experience in application-level troubleshooting by analyzing logs and traces to identify bugs, performance bottlenecks, and error conditions.
- Expertise in Azure and GCP cloud operations and distributed system reliability.
- Understanding of Terraform, Ansible, and CI/CD pipelines (Jenkins, GitHub Actions).
- Experience with observability and AI-Ops tools (Azure Monitor, GCP Operations Suite, Grafana, Prometheus, Datadog, etc.).
- Solid grasp of incident management frameworks (P1–P3 handling, RCA, PIRs, on-call rotations).
- Excellent analytical, troubleshooting, and communication skills.

Desired Behaviours
- Proactive Prevention: Identifies and resolves risks before they escalate into incidents.
- AI-Driven Mindset: Applies AI and automation to improve reliability and reduce human intervention.
- Accountability: Owns service reliability and communicates with clarity.
- Collaboration: Works seamlessly with platform, DevOps, and product teams.
- Efficiency: Focuses on automation to reduce manual effort and improve MTTR.
- Continuous Improvement: Learns from failures, iterates processes, and enhances documentation.

The pay range for this opportunity is from $129,000 to $143,000 + performance-related bonus + benefits. This range represents the anticipated low and high end of the salary for this position. This role is also eligible to receive an annual bonus that aligns with individual and company performance. Actual salaries will vary and are based on factors such as a candidate’s qualifications, skills, and competencies.

Applicants must be currently authorized to work in the United States without the need for visa sponsorship now or in the future.

More DevOps Engineer Jobs

DevOps Engineer II

InComm Payments

When you think of InComm Payments, think of innovative payment technology. We were founded more than 30 years ago and continue to be pioneers in the payments (FinTech) industry. Since our founding we have grown continuously, and we are now a team of more than 3,000 employees in more than 34 countries around the world. We hold more than 400 global technical patents and a network that includes more than 525,000 retail distribution points, which reflect our experience in the sector. InComm Payments is highly focused on our people and their growth, and we work hard to make your career meaningful and rewarding. We value innovation, quality, passion, integrity, and responsibility in everything we do, and we are looking for excellent people to join our team as we move toward a very bright future. We anticipate developing future leaders for our teams in Brazil!

DevOps Engineer | 81 days ago
Other | Remote | Team 1,001-5,000

Overview

When you think of InComm Payments, think of Innovative Payments Technology. We were founded over 30 years ago and continue to be a pioneer in the payment (FinTech) industry. Since our inception, we have grown to be a team of over 3,000 employees in 35 countries around the world. We own over 400 global technical patents and a network that includes over 525,000 points of retail distribution that points to our industry expertise. We are significantly growing our Engineering and IT teams in Brazil and are focused on finding talent for various financial technology (Fintech) engineering, database, development, and testing teams.

InComm Payments is highly focused on our people and their growth, and we work hard to make a career at InComm Payments meaningful and rewarding. We value innovation, quality, passion, integrity and responsibility in all that we do, and we are looking for great people to join our team as we move forward towards a very bright future. We anticipate developing future leaders for our teams in Brazil!

Benefits include health and dental insurance, meal and restaurant vouchers, fixed monthly stipend for internet and mobile expenses, InComm hardware/software, and annual bonuses! All positions are CLT.

You can learn more about InComm Payments by visiting our Website or connecting with us on LinkedIn, YouTube, Twitter, Facebook, or Instagram.

About This Opportunity

As a DevOps Engineer II, you will be responsible for implementing and improving CI/CD tooling to drive total automation across environments. This team will be supporting the enterprise as a shared service for development teams to utilize in automating build and deployment pipelines. You will collaborate with development teams to enhance services through rigorous testing and release procedures and create sustainable systems and services through automation.

Responsibilities
- Implement and improve CI/CD tooling for automated releases.
- Collaborate with development teams to enhance services through testing and release procedures.
- Create sustainable systems and services through automation.
- Lead initiatives to monitor DevOps build and deploy pipelines.
- Triage user requests and manage the work queue with appropriate prioritization.
- With an emphasis on automation, recommend solutions and implement processes, procedures, and best-practice guidelines for code, build, test, and deployment.
- Participate in an after-hours on-call rotation.

Qualifications
- Bachelor’s degree in computer science or a related technical discipline.
- Proficient in Groovy, YAML, Bash, PowerShell, and Python.
- Proficient in CI/CD tools such as Jenkins, GitHub Actions, and Azure DevOps.
- Experience with configuration management and tools like SonarQube, Artifactory, GitHub, Selenium, and Postman.
- Extensive experience in Linux and Windows system administration.
- Proficient in container orchestration using Kubernetes, with deep expertise in Docker-based containerization for microservices architecture and cloud-native deployments.
- Experience with orchestration tools like Ansible and Terraform.
- Hands-on experience deploying and managing scalable applications in Microsoft Azure.
- Experience in the complete project lifecycle, including version control, build management, unit testing, and issue tracking software.
- Understanding of software development best practices.
- Proactive in identifying problems, areas for improvement, and performance bottlenecks.

InComm provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, sexual orientation, gender identity or national origin, citizenship, veteran’s status, age, disability status, genetics or any other category protected by federal, state, or local law.

*This position is eligible for the Employee Referral Bonus Program-Tier 3

United States | Canada

Senior DevOps Engineer

Kunai

20% of Fortune 500 fintech companies trust Kunai for engineering talent.

DevOps Engineer | 81 days ago
Other | Remote | Team 51-200 | Since 2001 | H1B Sponsor

Kunai builds full-stack technology solutions for banks, credit and payment networks, infrastructure providers, and their customers. Together, we are changing the world’s relationship with financial services. At Kunai, we help our clients modernize, capitalize on emerging trends, and evolve their business for the coming decades by remaining tech-agnostic and human-centered.

We are seeking a DevOps Engineer to support client engagements by building, operating, and improving cloud infrastructure and delivery pipelines for production-grade systems. The initial project will focus on building and scaling a billing management platform, with an emphasis on production readiness, reliability, observability, and operational excellence.

What you’ll do:
- Design and implement resilient, secure, and scalable cloud environments to support client platforms in production.
- Drive production readiness and operations: monitoring and alerting, incident support, runbooks, capacity planning, reliability improvements, and release readiness.
- Build and maintain CI/CD workflows and reconfigure/enhance an existing proprietary pipeline using Argo.
- Automate infrastructure provisioning and configuration using Infrastructure as Code (Terraform, CloudFormation, CDK).
- Support containerized deployments and orchestration using Docker and ECS.
- Develop automation scripts and utilities in Python and/or Bash for deployment, configuration, and operational tasks.
- Implement and maintain service configuration and deployment automation across environments (dev/test/stage/prod).
- Configure and manage cloud networking and access controls, including Security Groups.
- Implement and maintain monitoring/observability capabilities (metrics, logs, traces, dashboards) and establish actionable SLOs/SLIs.
- Plan and execute performance testing and scalability validation; partner with engineering to remediate bottlenecks and improve system performance.
- Collaborate with engineering, architecture, security, and client stakeholders to triage issues, estimate work, and continuously improve delivery and reliability.

Required skills:
- 5+ years of hands-on DevOps / Platform / SRE experience supporting production systems.
- Strong experience with at least one public cloud provider (AWS, GCP, or Azure).
- Demonstrated practical experience with DevOps tools and practices, with a clear focus on production readiness and operations.
- Experience designing and operating resilient systems (availability, scalability, fault tolerance).
- Strong Infrastructure as Code experience with Terraform, CloudFormation, and/or CDK.
- CI/CD experience, including adapting and improving existing pipelines; experience with Argo preferred.
- Containerization and orchestration experience with Docker and ECS.
- Scripting/automation skills with Python and/or Bash.
- Experience with service configuration and deployment automation.
- Experience configuring and managing Security Groups and related cloud networking controls.
- Hands-on experience with monitoring/observability and performance testing in production-like environments.

Nice to have:
- Experience supporting billing, payments, or financial platforms.
- Familiarity with SRE practices (error budgets, incident management, postmortems).
- Exposure to multi-account/multi-environment cloud setups and governance.

Our success over the past 20 years is rooted in our exceptional team, which thrives in a culture of collaboration, creativity, and continuous learning. We are proud to offer our employees a range of benefits, including competitive compensation, professional development opportunities, and flexible work arrangements, all designed to help them thrive.
As we continue to expand, we remain committed to cultivating an environment where people feel valued, have a voice, and are given the tools to grow, both personally and professionally, while pushing the boundaries of innovation in the fintech industry.

Minimum Degree Required:
- Bachelor’s degree. In lieu of a degree, candidates must demonstrate, in addition to the minimum years of experience required for the role, three years of specialized training and/or progressively responsible work experience in technology for each missing year of college.

United States
Job Closed

Associate DevSecOps Engineer

Keep IT Simple

Keeping IT Simple Since 1988.

DevOps Engineer | 81 days ago
Full Time | Remote | Team 11-50 | Since 1988 | H1B No Sponsor

• Support migration activities from Azure DevOps to GitHub Enterprise, including repository organization, workflow conversion, validation, and documentation;
• Contribute to GitHub repository administration, branching strategies, pull requests, and review processes;
• Build, update, and maintain GitHub Actions workflows for CI/CD tasks such as testing, build, packaging, and deployment;
• Support Docker image creation, image push to Amazon ECR, and deployments to Amazon EKS under senior guidance;
• Contribute to AWS Lambda development in Python or Node.js, including integration with API Gateway and event-driven services such as SQS and EventBridge;
• Contribute to Terraform modules for AWS infrastructure and automation, including services such as S3, IAM, Lambda, and container-related resources;
• Review and triage GitHub Advanced Security findings, including CodeQL, Dependabot, and secret scanning, with support from senior engineers and security teams;
• Participate in automated testing, quality gates, and CI/CD best practices;
• Create and maintain technical documentation related to pipelines, standards, and migration activities;
• Actively use GenAI tools such as GitHub Copilot, Claude, and ChatGPT to support engineering work;
• Participate in agile ceremonies and contribute to the broader cloud modernization effort.

Brazil
Job Closed

Senior DevSecOps Engineer

Keep IT Simple

Keeping IT Simple Since 1988.

DevOps Engineer | 81 days ago
Full Time | Remote | Team 11-50 | Since 1988 | H1B No Sponsor

• Lead the end-to-end migration from Azure DevOps to GitHub Enterprise, including repositories, pipelines, boards, and governance practices;
• Design and implement reusable GitHub Actions workflows for build, test, SAST, security scanning, packaging, ECR push, and deployment;
• Drive the implementation and adoption of GitHub Advanced Security, including CodeQL, secret scanning, dependency review, security gates, and vulnerability management;
• Define and enforce CI/CD standards, branching strategies, pull request practices, and GitHub-based governance;
• Act as the technical lead for AWS modernization initiatives, with focus on containerization on Amazon EKS and serverless solutions based on AWS Lambda;
• Design and maintain Terraform code for AWS infrastructure, including EKS, Lambda, IAM, networking, and related services;
• Support container engineering practices including Helm, image hardening, and ECR lifecycle policies;
• Contribute to serverless architectures using AWS Lambda together with API Gateway, Step Functions, EventBridge, and SQS;
• Embed security and compliance controls into delivery pipelines, supporting shift-left practices, security gates, and automated remediation workflows;
• Work closely with platform, security, cloud, and application teams to align technical solutions with business priorities;
• Mentor junior engineers and contribute to runbooks, ADRs, onboarding materials, standards, and operational best practices;
• Actively use GenAI tools such as GitHub Copilot, Claude, and ChatGPT to accelerate engineering and migration activities.

Brazil
Job Closed