Remote first tech projects
SRE, Network Engineer – MAAS
Location
Canada
Posted
2 days ago
Salary
0
Seniority
Senior
Job Description
Pragmatike
- Maintain and support core infrastructure systems with deep knowledge of Linux (Debian/Ubuntu preferred).
- Work close to the metal: BIOS, IPMI, RAID setups, and hardware-level diagnostics are part of your comfort zone.
- Design and maintain scalable networks using VLANs, L2/L3 routing, VPNs, and especially UniFi equipment.
- Automate infrastructure provisioning and operations with Ansible, Bash/Python, and Git-based workflows.
- Set up and manage observability stacks, including Prometheus/Grafana for metrics and Graylog, ELK, or Loki for log centralization.
- Build tooling for server discovery, config auto-generation, automated OS deployments, PXE/Preseed/Cloud-init, and strong MAAS-based provisioning.
- Integrate and/or develop internal APIs for tracking compute and GPU resource allocation, as well as external APIs (billing, monitoring, OpenStack, etc.).
- Deploy and maintain virtualization and orchestration systems such as OpenStack (preferably with Kolla-Ansible), Proxmox VE, or VMware ESXi.
- Support container-based workloads and isolate services efficiently.
Job Requirements
- Expert-level Linux administration (preferably Debian/Ubuntu).
- Strong MAAS / Ironic / bare-metal automation experience (mandatory).
- Excellent networking fundamentals: VLANs, routing, VPNs.
- Infrastructure as Code (Ansible), scripting (Bash/Python), GitOps.
- Experience with monitoring and logging tools like Prometheus, Grafana, ELK/Graylog.
- Comfort with custom deployment automation (PXE, Preseed, MAAS/Ironic).
- Familiarity with resource tracking, API integrations, and dashboard development.
- Proven experience with OpenStack, Proxmox, VMware, and container orchestration.
Benefits
- N/A
More DevOps Engineer Jobs
Site Reliability Engineer
Artificial Labs
We're hiring! | Empowering commercial insurers to write better risks, faster.
- Support system reliability and operability, contributing to monitoring, observability, and infrastructure improvements.
- Work with containerised and cloud-based systems, including Docker, Nix, and AWS (e.g. ECS and Fargate).
- Develop Infrastructure-as-Code in Terraform, and contribute code across the whole software stack: Nix, scripting (Nu, bash, hell), Haskell, and even a bit of Python and TypeScript.
- Participate in incident response and help improve on-call, alerting, and incident management practices.
- Communicate effectively in a distributed team, taking ownership and supporting collaboration and continuous improvement.
- Lead a team of DevOps engineers, including performance management, growth planning, and career development.
- Own the DevOps team roadmap in partnership with the Director of Platform Engineering, including quarterly priorities and capacity planning.
- Drive technical decisions and architecture reviews for CI/CD, infrastructure automation, and platform tooling.
- Collaborate with engineering, data platform, data governance, and ITOps leadership on cross-functional initiatives and shared standards.
- Coach engineers through code review, design feedback, and incident retrospectives.
- Represent the team in executive forums, including roadmap reviews, FinOps reporting, and architecture councils.
- Partner with the Director of Platform Engineering on AI tooling governance, including standardization on approved platforms, usage policy, and measuring engineering productivity impact.
- Design, build, and maintain CI/CD pipelines using GitHub Actions, including reusable workflows, self-hosted runners, and security controls.
- Implement and operate infrastructure as code using Terraform across multi-account AWS environments.
- Manage Kubernetes (EKS) clusters, including ArgoCD-based GitOps delivery, ingress, observability, and security policies.
- Operate secrets management with HashiCorp Vault, including dynamic credentials, JWT/OIDC auth, and External Secrets Operator integration.
- Build and maintain observability tooling with Grafana, OpenTelemetry, and Kubernetes-native monitoring stacks.
- Lead incident response and post-incident reviews, including authoring runbooks and reliability improvements.
- Implement security controls, governance processes, and compliance validation across the platform.
- Contribute to AWS network architecture, including PrivateLink, VPCs, and cross-account access patterns.
DevOps Engineer
Cognitive Medical Systems, Inc.
Our purpose is to empower people and organizations to optimize healthcare through innovative technology solutions.
- Monitor, support, and maintain production applications to ensure system availability, reliability, and performance.
- Review application, server, and system logs to proactively identify, troubleshoot, and resolve issues.
- Perform root cause analysis and implement corrective actions to prevent recurring incidents.
- Establish operational monitoring, alerting, and support procedures.
- Manage and maintain Microsoft SQL Server environments supporting enterprise applications.
- Lead and manage application deployments across Development, Test, Staging, and Production environments.
- Support an Agile, Lean, and SAFe-based environment utilizing DevSecOps, CI/CD, and related methodologies.
- Collaborate with development teams to improve application performance, maintainability, and deployment efficiency.
- Support JavaScript-based application development efforts as needed.
- Manage enterprise storage on Hitachi VSP Gx00 and 5x00, covering LUN and volume provisioning, troubleshooting, and replication with GAD, UR, and SI.
- On NetApp AFF and FAS storage, deliver SAN and NAS services, oversee provisioning and health, resolve issues, and configure SnapMirror replication.
- Handle Pure Storage X90 for day-to-day management, provisioning, and incident resolution.
- Operate Brocade GEN5 and later switches and directors, performing zoning, pathing, and diagnostics.
- Use Hitachi Ops Center to monitor Hitachi arrays and analyze performance, NetApp Active IQ Unified Manager to track NetApp health and capacity, and Brocade BNA to administer Brocade fabrics and report on events and performance.




