Silicon Valley technology for the country's most critical national security problems
Engineering Manager, DevOps
Location
United States
Posted
3 days ago
Seniority
Lead
Job Description
Vannevar Labs
- Lead and develop a DevOps team: Hire, mentor, and grow engineers; set clear expectations and create an environment of ownership, high standards, and continuous improvement.
- Drive platform reliability: Own the health of CI/CD, deployment, and runtime infrastructure; improve availability, performance, and incident response through measurable SLOs and operational rigor.
- Build self-service and automation: Create developer-facing tooling to reduce toil (golden paths, paved roads, templates, and automation for common workflows).
- Evolve CI/CD and release engineering: Improve build and deploy pipelines, change management, release safety (progressive delivery, rollbacks), and supply-chain security.
- Observability and monitoring: Implement and mature logging, metrics, and alerting; build dashboards and guardrails that help teams understand and improve system behavior.
- Infrastructure as code: Standardize and scale infrastructure management using modern IaC patterns and review/approval workflows.
- Security and compliance partnership: Work closely with Security and compliance stakeholders to deliver secure-by-default systems and audit-ready practices.
- Cross-functional collaboration: Partner with application teams to improve deployability and operability, and translate business priorities into an executable roadmap.
Job Requirements
- Engineering management experience: 3+ years leading and developing software engineers, including coaching, performance management, hiring, and building healthy team practices.
- Deep hands-on background: 8+ years of experience as an individual contributor in DevOps/SRE/platform/infrastructure or software engineering roles.
- Cloud and systems expertise: Strong experience operating production systems in a major cloud environment (AWS preferred) and building for reliability and scale.
- CI/CD and automation: Proven experience designing and operating modern build and deploy pipelines and automating operational workflows.
- Observability: Experience with logging/metrics/alerting stacks (e.g., Datadog, Grafana, CloudWatch, ELK/OpenSearch) and using them to drive reliability improvements.
- Infrastructure as code: Experience with Terraform, Pulumi, or equivalent tools and associated engineering practices (code review, testing, drift detection).
- Containers and runtime: Familiarity with containerization and orchestration (Docker; Kubernetes/ECS preferred).
- Strong communication: Ability to align stakeholders, explain tradeoffs, and drive execution across teams and functions.
Benefits
- Health, dental, and vision insurance
- 100% remote-first culture. You can work from anywhere in the US, and all full-time employees have WeWork access
- Unlimited PTO including competitive vacation and holiday schedules
- Lifestyle stipends: monthly mental health, wellness, and fitness stipend; home office setup stipend; and family planning assistance
- Salary top-up during military reserve duty
- Fully paid parental leave
- Child and pet care reimbursement during travel



