Job Closed
This listing is no longer active.
ScalableOS is a premium offshoring solutions provider based in the Philippines.
L3 Cloud Operations Engineer
Location: United States
Posted: 75 days ago
Salary: Not specified
Seniority: Senior
Job Description
• Serve as the senior technical leader for the L1 and L2 Cloud Operations Engineers, providing day-to-day guidance, coaching, and knowledge transfer
• Lead by example on complex incidents, walking junior engineers through advanced troubleshooting methodologies and resolution strategies
• Define and maintain standard operating procedures, runbooks, and escalation workflows to ensure consistent, high-quality service delivery
• Act as the technical point of contact during operational hours, overseeing ticket queues, prioritization, and SLA adherence across the team
• Conduct internal training sessions and knowledge sharing on new technologies, processes, and client-specific environments
• Identify skill gaps within the team and recommend training paths and certification goals to the Head of Managed Services
• Serve as the highest-level technical escalation point within the service desk, resolving the most complex incidents spanning cloud, networking, security, and back-end infrastructure
• Own and lead major incident processes end to end, including triage, communication, escalation to third parties, root cause analysis, and post-incident reviews
• Perform advanced troubleshooting across multi-tenant Azure environments, hybrid infrastructure, and complex networking topologies
• Document and track all escalations, major incidents, and problem records in Jira Service Management, including thorough root cause analysis
• Architect and manage complex Azure environments, including Azure AD, Conditional Access, Azure Virtual Desktop (AVD), Azure Networking (VNets, NSGs, ExpressRoute), and Azure Automation
• Administer and optimize Office 365 tenants at an advanced level, including Exchange Online mail flow, hybrid configurations, security and compliance policies, and tenant-to-tenant migrations
• Manage and troubleshoot Windows Server infrastructure (2016/2019/2022) at an advanced level, including Active Directory design, Group Policy architecture, DNS/DHCP, DFS, and Certificate Services
• Oversee VMware ESXi and virtualization environments, including capacity planning, performance optimization, host management, and migration strategies
• Lead VDI environment management, including Azure Virtual Desktop, Citrix, and thin client deployments at scale
• Perform intermediate to advanced network troubleshooting and configuration across TCP/IP, DNS, DHCP, VLANs, routing protocols, and WAN connectivity
• Configure and manage Fortinet FortiGate firewalls, including advanced policy management, SD-WAN, IPS/IDS, web filtering, and high-availability configurations
• Manage Cisco Meraki environments at scale, including complex wireless deployments, SD-WAN, switch stacking, and security appliance policies
• Design, configure, and troubleshoot SSL VPN and IPsec VPN solutions across multiple client environments
• Serve as the subject matter expert for advanced desktop and endpoint issues that cannot be resolved at L1/L2, including complex OS corruption, driver conflicts, and application compatibility
• Design and optimize Intune/Endpoint Manager deployment strategies, including Autopilot, compliance policies, and application packaging
• Liaise directly with client stakeholders on escalated issues, service reviews, and change management activities
• Collaborate with the account management team on client escalations, service improvement plans, and quarterly business reviews
• Design and implement automation solutions using PowerShell, Azure Automation, and other scripting tools to eliminate manual overhead and improve operational efficiency
Job Requirements
- Minimum of 5 years of experience in a third-line (L3) or senior helpdesk, service desk, or cloud operations role within an MSP or MSSP environment
- At least 2 years of experience in a technical leadership, senior escalation, or mentorship capacity
- Extensive experience supporting enterprise Windows environments, Microsoft Azure, Office 365, and hybrid infrastructure
- Proven track record leading major incident resolution and driving root cause analysis in client-facing environments with stringent SLAs
- Hands-on experience with Jira Service Management and Jira for project and change management
- ITIL Foundation certification or demonstrated working knowledge of ITIL service management frameworks
Nice to Have
- Microsoft certifications: AZ-104, AZ-305, AZ-500, MS-102, or similar
- Fortinet NSE 4 or higher certification
- Cisco CCNP, CCNA, or Meraki certifications
- ITIL Intermediate or higher certifications
- Experience with SIEM platforms, EDR solutions, and security incident response
- DevOps exposure: CI/CD pipelines, Infrastructure as Code (Terraform, ARM/Bicep templates)
- SQL proficiency for operational reporting and data analysis
- Prior experience in financial services, hedge funds, or trading technology environments
- A degree in Computer Science, Information Technology, or related field
Benefits
- Professional development opportunities
- Work-from-home arrangement
- Permanent night shift schedule
More DevOps Engineer Jobs
• Providing technical leadership and architectural direction across all DevOps initiatives.
• Establishing engineering standards for CI/CD, infrastructure-as-code, container orchestration, observability, and DevSecOps practices.
• Allocating DevOps resources across concurrent product initiatives based on priorities set by the Director of Product Development.
• Conducting performance evaluations, career development planning, and technical mentorship.
• Ensuring consistent operational excellence, reliability, security compliance, and automation maturity across environments.
• Building scalable DevOps processes that enable autonomy within full-stack teams while maintaining governance and architectural alignment.
• Own and improve on-call processes, incident response playbooks, and post-mortem culture
• Define, track, and manage SLOs, SLIs, and error budgets for critical services
• Lead blameless post-mortems and drive systematic reliability improvements
• Respond to production incidents and coordinate cross-functional resolution
• Design, build, and maintain scalable AWS infrastructure using IaC (Terraform, Pulumi)
• Manage Kubernetes clusters and containerized workloads in production
• Build and maintain CI/CD pipelines to improve deployment speed and reliability
• Evaluate and implement tooling to enhance developer productivity and system stability
• Implement monitoring, alerting, and distributed tracing (Prometheus, Grafana, Datadog, Jaeger)
• Identify and resolve performance bottlenecks across services, networks, and databases
• Build dashboards and runbooks for self-service operational insights
• Partner with engineering teams to embed reliability practices (load testing, capacity planning, chaos engineering)
• Conduct architecture reviews with a focus on reliability and operability
Role Description
- Support the implementation, administration, and evolution of AWS cloud environments, ensuring stability and high availability
- Work on the operation and management of production environments, performing monitoring, troubleshooting, and continuous improvements
- Implement and maintain infrastructure as code (IaC) and automation for provisioning and configuring resources
- Support application modernization and cloud migration initiatives
- Work with architecture, engineering, security, and development teams to ensure technical alignment
- Ensure environments are secure, scalable, resilient, and cost-efficient
- Take part in the evolution of the cloud architecture and platform, proposing improvements and best practices
- Implement and maintain CI/CD pipelines, contributing to deployment and process automation
- Monitor environments and services, carrying out incident and performance analysis
- Support the implementation of cloud governance, security, and compliance practices
- Help spread DevOps best practices and cloud culture within the team
Qualifications
- Hands-on experience with Amazon Web Services (AWS)
- Experience administering production cloud environments
- Knowledge of Linux operating systems
- Experience with distributed environments and cloud architecture
- Experience with infrastructure automation
- Knowledge of DevOps practices (CI/CD, automation, version control)
- Ability to carry out troubleshooting and incident analysis
- Good communication and collaboration with multidisciplinary teams
- Analytical, organized, and focused on continuous improvement
• Design, implement, and maintain Continuous Integration/Continuous Delivery (CI/CD) pipelines to automate software builds, testing, and deployments
• Integrate security tools and practices directly into CI/CD pipelines to ensure secure code delivery
• Develop and manage Infrastructure as Code (IaC) scripts using tools such as Terraform, Ansible, or CloudFormation to automate infrastructure provisioning
• Implement security measures throughout the software development lifecycle, including static code analysis, dynamic application security testing (DAST), and vulnerability scanning
• Utilize and manage a modern security stack including GitLab Premium, Invicti, Trivy, AWS ECR managed signing, AWS GuardDuty, and DefectDojo
• Manage AWS GovCloud environments and containerized applications using Docker and Kubernetes
• Ensure secure configurations for all cloud resources and container orchestration platforms
• Implement monitoring tools to track system performance, security, and availability
• Respond to incidents promptly, conduct root cause analysis, and implement corrective actions
• Maintain detailed documentation of DevSecOps processes, configurations, and security controls
• Work closely with development, operations, and security teams to align practices with organizational goals
• Utilize ticketing and project management software, including ServiceNow and Jira



