Vultr is on a mission to make high-performance cloud computing easy to use, affordable, and locally accessible.
Network DevOps Engineer, RDMA Fabric Automation
Location
United States
Posted
102 days ago
Salary
$90K - $130K / year
Seniority
Senior
Job Description
- Automate deployment and operations of large-scale RDMA (RoCEv2) Ethernet fabrics across Vultr data centers.
- Build Ansible and Python-based frameworks to provision, validate, and remediate underlay and overlay networks.
- Integrate network automation with Vultr’s source-of-truth systems (NetBox, OpsMill) for intent-driven configuration and validation.
- Develop telemetry ingestion and correlation pipelines (gNMI, Prometheus, Kafka, custom collectors) for real-time network health and performance metrics.
- Collaborate with platform, orchestration, and product engineering teams to optimize RDMA performance, PFC/ECN behavior, and path symmetry across fabrics.
- Implement CI/CD workflows for network configuration changes: validation, pre-checks, and rollbacks.
- Investigate complex network behaviors across layers: flow hashing, congestion domains, ECMP, and overlay interactions.
- Contribute to the design of next-generation GPU and AI interconnect fabrics, ensuring seamless integration into Vultr’s global network architecture.
Job Requirements
- Solid understanding of modern data center networking: EVPN-VXLAN, BGP, MLAG, QoS, and traffic engineering.
- Deep familiarity with RoCEv2, RDMA transport tuning, ECN/PFC, and lossless Ethernet design.
- Strong experience with automation frameworks such as Ansible and with languages such as Python, Golang, Rust, or PHP.
- Comfort working with telemetry and monitoring stacks — Prometheus, Grafana, Loki, ELK, or similar.
- Previous experience integrating with NetBox, Nautobot, OpsMill or similar for topology and configuration source-of-truth.
- Familiarity with CI/CD systems (GitHub Actions, Jenkins, ArgoCD) for continuous delivery of network automation.
- Strong Linux networking background, including namespaces, netlink, and system-level debugging.
Benefits
- 100% company-paid insurance premiums for employee medical, dental and vision plans.
- 401(k) plan that matches 100% up to 4%, with immediate vesting
- Professional Development Reimbursement of $2,500 each year
- 11 Holidays + Paid Time Off Accrual + Rollover Plan
- Increased PTO at 3-year and 10-year anniversaries + 1 month paid sabbatical every 5 years + Anniversary Bonus each year
- $500 stipend for remote office setup in first year + $400 each following year
- Internet reimbursement up to $75 per month
- Gym membership reimbursement up to $50 per month
- Company paid Wellable subscription



