Job Closed
This listing is no longer active.
Agile. Unstoppable.
Senior DevOps Engineer – Azure Cloud
Location
United States
Posted
79 days ago
Salary
Not specified
Seniority
Senior
Job Description
Gorilla Logic
• Design and maintain robust Azure cloud infrastructure and services, including hybrid environments.
• Architect network topology, security zones, and firewall rules for multi-tier application stacks.
• Define and enforce Infrastructure as Code (IaC) standards across the organization using tools such as ARM templates, Bicep, or Terraform.
• Evaluate, select, and integrate Microsoft ecosystem tools, including Azure DevOps, Azure Kubernetes Service (AKS), and Azure App Service, to optimize cloud platform capabilities.
• Design, build, and maintain multi-stage, secure, and resilient CI/CD pipelines using GitHub Actions and Azure Pipelines.
• Own application build processes: dependency restoration, compilation, unit testing, and static analysis for modern software stacks (e.g., .NET, Node.js).
• Own front-end pipelines: environments, dependencies, testing, and bundling for modern web frameworks (e.g., React).
• Implement and enforce branching strategies (e.g., GitFlow, trunk-based development) and configure branch protection policies.
• Implement and maintain release gates with automated rollback triggers and advanced deployment strategies (e.g., canary, blue-green).
• Manage reliable, automated, and scalable deployments to Azure services (e.g., AKS, App Service, Function Apps).
• Implement and manage serverless and containerized deployment strategies in Azure.
• Automate cloud resource provisioning and configuration using Azure-native tools.
• Coordinate closely with network and security teams on critical cloud operations, including certificate management, Azure DNS configuration, and Azure Load Balancer/Traffic Manager setup.
• Implement and manage comprehensive monitoring with the Azure stack: Azure Monitor, Application Insights, and Log Analytics workspaces.
• Define, track, and report on SLIs (Service Level Indicators) and SLOs (Service Level Objectives), managing error budgets in collaboration with development leads.
• Build actionable dashboards and detailed alert runbooks to support an efficient on-call rotation.
• Implement standards for distributed tracing and structured logging across all services.
• Integrate proactive DevSecOps practices directly into the CI/CD pipelines: SAST (static application security testing), DAST (dynamic application security testing), dependency scanning, and secrets detection.
• Manage Azure Active Directory (now Microsoft Entra ID), define and enforce RBAC (role-based access control) policies, and secure sensitive data through Azure Key Vault integrations.
• Ensure technical implementations fully comply with organizational security policies and relevant industry and regulatory requirements.
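To make the SLO and error-budget responsibility concrete, here is a minimal Python sketch of request-based error-budget arithmetic. It assumes a success-rate SLO expressed as a fraction (e.g., 0.999) and request counts over a single fixed window; the function name `error_budget` and its inputs are hypothetical and not part of the listing.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Hypothetical helper: request-based SLI and error-budget status for one window."""
    # slo_target is a fraction, e.g. 0.999 for a 99.9% success-rate objective.
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        # No traffic in the window, so no budget has been spent.
        return {"sli": 1.0, "allowed_failures": 0.0, "remaining_fraction": 1.0}
    # SLI: share of requests that succeeded.
    sli = (total_requests - failed_requests) / total_requests
    # Budget: the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    # 1.0 = untouched budget, 0.0 = fully spent, negative = SLO breached.
    remaining_fraction = 1.0 - failed_requests / allowed_failures
    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "remaining_fraction": remaining_fraction,
    }


if __name__ == "__main__":
    # Example window: 1,000,000 requests against a 99.9% SLO, 250 of them failed.
    print(error_budget(0.999, 1_000_000, 250))
```

A negative `remaining_fraction` means the window's budget is exhausted; teams that run error-budget policies typically treat that as the signal to prioritize reliability work over new releases.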
Job Requirements
- 6+ years of progressive experience in DevOps, Cloud Engineering, or a similar role.
- Expert-level knowledge of the Azure ecosystem and services (Azure DevOps, Azure Monitor, AKS, Key Vault, Azure Networking, etc.).
- Proven ability to design and manage complex cloud infrastructure with an Azure-first approach.
- Expert-level knowledge of CI/CD principles and strong practical experience with GitHub Actions and/or Azure Pipelines.
- Strong scripting skills in Bash or Python for cloud automation.
- Proficiency with IaC tools (e.g., Bicep, Terraform) for managing Azure resources.
- Experience with application monitoring, logging, and alerting systems, particularly the Azure monitoring stack.
- Solid understanding of networking fundamentals, security best practices (e.g., firewall rules, RBAC, least privilege), and certificate management within Azure.
- Familiarity with modern application development concepts is a plus.
Benefits
- Professional development opportunities
- Flexible work arrangements
More DevOps Engineer Jobs
DevOps Engineer
Inframark
Inframark's Operations and Maintenance team is an award-winning team that delivers cutting-edge water, wastewater, and public works services to municipalities, utility districts, and industries. We are dedicated to supporting our employees as well as protecting the environment and the communities we serve. You would be empowered to thrive in a dynamic, supportive, and innovative environment. Our dedication to sustainability and community impact drives us to ensure clean, safe water for future generations. Whether you're at the start of your career or looking for advancement, Inframark offers purpose-driven work and opportunities for growth.
Join Inframark: Pioneering Automation and Intelligence
Step into the future with Inframark's award-winning Automation and Intelligence team. We deliver cutting-edge solutions in instrumentation and controls, industrial cybersecurity, data analysis, and remote network operations center services for water and wastewater plants. Elevate your career and join us at Inframark. Apply today!
Why Work for Inframark?
We offer an attractive salary package, including a generous benefits package with health, dental, and life insurance, a 401(k) plan, paid time off, sick leave, holidays, and a wellness plan.
Job Title: DevOps Engineer
Location: Remote (Eastern Time zone preferred - AWS GovCloud requirement)
Reports To: Sr. Director of Technology and Architecture
Position Overview
We're looking for a DevOps Engineer who takes ownership of infrastructure. You'll stabilize and modernize the infrastructure supporting WaterMinds, our cloud-based platform for water and wastewater utilities: implementing proper monitoring and alerting, upgrading production environments, establishing operational discipline, and enabling our engineering teams to ship with confidence. You'll follow DevOps best practices, proactively identify and solve problems, and drive infrastructure improvements with minimal direction. The challenge: build and maintain infrastructure that can reliably serve hundreds of utility customers at scale. Your immediate focus is moving our infrastructure from reactive firefighting to proactive maintenance mode. As the platform matures and our data science team ramps up, you'll have the opportunity to transition into MLOps, building the infrastructure that enables machine learning at scale.
Key Responsibilities
• Take ownership of production monitoring and alerting using Prometheus, Grafana, and CloudWatch; proactively identify issues before they become incidents.
• Modernize the production EKS cluster with GitOps practices (ArgoCD), comprehensive monitoring, and proper deployment workflows that follow industry best practices.
• Streamline the staging deployment process; eliminate branch-based workarounds and establish clean GitOps patterns.
• Design infrastructure patterns that scale to hundreds of customers, and own AWS infrastructure operations, including patching, maintenance, cost optimization, and security compliance; stay ahead of requirements.
• Expand into MLOps once DevOps operations are automated, building the infrastructure that enables data scientists to deploy models at scale across multiple utility customers.
• Manage Kubernetes clusters (EKS), including pod migrations, resource optimization, troubleshooting, and security updates, proactively rather than reactively.
• Maintain infrastructure as code using Terraform and Ansible following best practices, with all changes tested in non-production before deployment.
• Support engineering teams with infrastructure needs, unblock them quickly, and establish self-service patterns where possible; anticipate needs rather than waiting for requests.
• Manage message queue infrastructure (Kafka/Redpanda), including retention policies, storage optimization, and performance tuning.
• Document infrastructure, create runbooks, and automate operational tasks to move systems into maintenance mode.
• Clean up technical debt: proactively identify infrastructure to decommission, resources to consolidate, and costs to optimize.
Qualifications
• 5+ years of experience in DevOps, infrastructure, or site reliability engineering.
• Demonstrated ability to take ownership and initiative: you see what needs to be done and do it without waiting for direction.
• Deep knowledge of DevOps and infrastructure best practices: you know what good looks like and implement it proactively.
• Strong Kubernetes experience (EKS preferred), including cluster management, deployments, services, and troubleshooting.
• Hands-on AWS experience (EC2, EKS, ECS, RDS, VPC, IAM, CloudWatch, S3).
• Infrastructure as code proficiency (Terraform and Ansible).
• GitOps experience (ArgoCD, Flux, or similar).
• CI/CD pipeline experience (Bitbucket Pipelines, Jenkins, GitHub Actions, or similar).
• Monitoring and observability experience (Prometheus and Grafana preferred).
• Python scripting ability for automation and tooling.
• US citizenship (required for AWS GovCloud access).
• Self-starter mentality: you identify problems and opportunities, then drive solutions to completion.
• Proven track record of delivering tested, high-quality infrastructure changes on schedule.
• Excellent communication skills: proactive about sharing status, raising blockers, and documenting decisions.
Bonus Points For
• Curiosity about machine learning and interest in transitioning to MLOps as the platform matures.
• Any MLOps or ML infrastructure experience (KServe, Kubeflow, SageMaker, model serving).
• Experience with data pipelines, feature engineering, or supporting data science teams.
• AWS GovCloud experience and understanding of compliance requirements (FedRAMP).
• Experience with message queue systems (Kafka, Redpanda).
• Container security and vulnerability scanning (Snyk).
• Background in SaaS platforms, IoT, or critical infrastructure.
Inframark is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, age, religion, sex, sexual orientation, gender identity, national origin, or protected veteran status and will not be discriminated against based on disability. Learn more about us at Automation and Intelligence - Inframark
• Own the end-to-end release and deployment lifecycle: build → package → deploy → verify → rollback
• Develop and support **Octopus Deploy** projects, lifecycles, channels, variables, and deployment processes
• Implement deployment automation with **Ansible** (playbooks/roles, inventories, idempotent changes)
• Maintain Git-based release workflows in **GitHub** (branching, tagging, versioning, release notes)
• Build/maintain CI pipelines in GitHub Actions (or existing tooling) to produce artifacts and trigger Octopus releases
• Standardize deployment patterns across applications (templates, shared steps, reusable Ansible roles)
• Manage environment configuration and secrets in a controlled way (variable sets, permissions, auditing)
• Improve deployment safety: approvals, health checks, smoke tests, automated validation, and rollback strategies
• Support production releases, troubleshoot deployment failures, and drive root-cause analysis
• Maintain release documentation, runbooks, and change management practices
• Collaborate with developers, QA, and operations to plan releases and reduce downtime
• Build and operate metrics/monitoring platforms: Prometheus and/or VictoriaMetrics (scrape configs, exporters, recording rules)
• Design and maintain alerting strategy: thresholds, anomaly detection where applicable, alert routing, deduplication, and noise reduction
• Integrate monitoring/alerting and events with BigPanda (correlation, enrichment, routing, incident workflows)
• Create and maintain dashboards and operational visibility (Grafana or equivalent)
• Develop and maintain runbooks, operational playbooks, and incident response procedures
• Participate in on-call shifts: triage alerts, manage incidents, coordinate response, and lead communication during outages
• Perform root-cause analysis and postmortems, and implement corrective/preventive actions
• Improve service reliability via SLOs/SLIs, capacity planning, and automation to reduce toil
• Support monitoring for core infrastructure and services on Windows and Linux, including HA components and clusters
• Collaborate with DevOps/Engineering to instrument applications and standardize telemetry (metrics, logs, and traces where applicable)

