Senior IT Infrastructure Engineer
Location
United States
Posted
6 days ago
Salary
$115K - $130K / year
Seniority
Senior
Job Description
Dotdash Meredith
- Develop and maintain scalable, reliable automation and integrations across AWS, GCP, and Azure, SaaS platforms, and custom services (including APIs and event-driven workflows).
- Design, implement, and operate CI/CD using Jenkins, Dagger, Terraform, and Docker, with repeatable pipelines for applications and platform components.
- Work in a Git-centric workflow (branching, reviews, collaboration) and contribute to infrastructure-as-code and GitOps delivery models.
- Build, operate, and troubleshoot workloads on Kubernetes, using Kustomize and Helm, and platform tooling such as Argo CD, Argo Workflows, Argo Events, Argo Rollouts, Crossplane, and related controllers (External Secrets, Sealed Secrets, cert-manager, ingress such as Traefik, AWS Load Balancer Controller, External DNS) as appropriate to the environment.
- Support configuration management and fleet-style operations with Ansible where applicable, alongside cloud APIs and automation.
- Develop and maintain REST (and SOAP where legacy systems require) APIs and integrations between services, data stores, and identity systems.
- Program in Python, Go, and TypeScript/JavaScript (Node.js) (and other languages as needed) for tooling, integrations, and platform automation.
- Apply Agile practices to prioritize work, collaborate with stakeholders, and deliver incremental value with clear operational outcomes.
- Design and operate event-driven patterns using managed services (e.g. Lambda, SQS, SNS, API Gateway) and/or Kubernetes-native and workflow platforms (Knative, workflow engines, integration with n8n or similar) to improve responsiveness and scale.
- Contribute to observability and reliability: metrics, logs, and tracing aligned with stacks such as Prometheus, Grafana, and Grafana Alloy (or equivalent), and operational dashboards for platform health.
- Where relevant, support developer portal and platform engineering initiatives (e.g. Backstage-style experiences), container registries (Harbor, artifact stores), and safe promotion patterns across environments.
Job Requirements
- Minimum of five years of experience with a strong focus on automation, integration, and platform or systems engineering.
- Proficiency in AWS and GCP cloud services; familiarity with Azure is a plus.
- Strong hands-on experience with Terraform and infrastructure-as-code patterns.
- Extensive experience with Docker, Jenkins, and CI/CD; experience with Dagger or similar pipeline-as-code approaches is a plus.
- Strong programming skills in Python and Go; proficiency in TypeScript/JavaScript for tooling or services is a plus.
- Experience with Git and collaborative development workflows.
- Experience building and consuming APIs (REST required; SOAP where legacy integrations exist).
- Familiarity with integration and low-code/automation tools (n8n, Zapier, or similar) for connecting systems and accelerating workflows.
- Familiarity with Single Sign-On, SAML, and OpenID Connect (OIDC); experience with identity platforms such as Okta is a plus.
- Hands-on experience with Kubernetes, Helm/Kustomize, and GitOps (Argo CD or equivalent); exposure to Crossplane or multi-cluster patterns is a plus.
- Broad experience integrating SaaS products and cloud services into secure, auditable enterprise patterns.
- Experience with Agile methodologies, iterative delivery, and adapting to changing requirements.
- Strong analytical and problem-solving skills; ability to handle complex, cross-team initiatives and prioritize effectively.
- Excellent communication and collaboration skills; able to work with diverse teams and deliver high-quality, supportable outcomes.
Benefits
- medical, dental, vision, prescription drug coverage
- unlimited paid time off (PTO)
- adoption or surrogate assistance
- donation matching
- tuition reimbursement
- basic life insurance
- basic accidental death & dismemberment
- supplemental life insurance
- supplemental accident insurance
- commuter benefits
- short-term and long-term disability
- health savings and flexible spending accounts
- family care benefits
- a generous 401K savings plan with a company match program
- 10-12 paid holidays annually
- generous paid parental leave (birthing and non-birthing parents)
- voluntary benefits such as pet insurance, accident, critical and hospital indemnity health insurance coverage, life and disability insurance
More Infrastructure Engineer Jobs
- Own and maintain our data pipeline architectures (e.g., critical data ingestion services, ETL pipelines, database mirroring and warehousing), ensuring they are reliable, monitored, and meet SLAs.
- Manage and evolve our data modeling environments and provide a smooth, well-documented workflow for analysts and engineers.
- Operate and improve our orchestration systems (Dagster), ensuring jobs run reliably and are observable.
- Evaluate and rationalize data tooling from Databricks and notebooks (Marimo, Jupyter) to BI/analytics platforms (Redash and alternatives) and guide Voltus toward a sustainable, coherent data platform.
- Implement observability for data systems (logging, alerting, metrics) so issues are detected early and data quality is continuously monitored.
- Champion data governance and documentation, making datasets well-defined, trustworthy, and easy to navigate.
- Collaborate with analysts, data scientists, and platform engineers to ensure the infrastructure you build is intuitive, scalable, and solves real-world problems.
- Lay the groundwork for advanced applications by making Voltus’ data reliably accessible via well-documented interfaces, positioning us to adapt to future ML and AI use cases.
Cloud Infrastructure Engineer
Hello Heart
Empowering people to understand and improve their heart health using technology and behavioral science.
- Build, maintain, and scale production-ready cloud infrastructure across AWS and Kubernetes.
- Support the development of machine learning pipelines and a full data lake architecture.
- Improve build automation processes and help move the team from continuous integration to continuous delivery.
- Secure, scale, and operate Kafka clusters on Kubernetes.
- Partner with Engineering and Security teams to improve infrastructure reliability, security, and compliance.
- Develop dashboards, alerts, internal tools, and response processes to identify and address security and reliability risks.
- Improve logging, monitoring, and observability across production systems.
- Support containerized application deployments using Docker and Kubernetes.
- Help evaluate and adopt new tools that improve developer productivity, system reliability, and infrastructure scalability.
Infrastructure Engineer – AI Platform
OpenVPN Inc.
OpenVPN® helps businesses of all sizes create secure, virtualized, reliable networks that scale with your team.
- Own the rollout and operational management of AI-assisted development tools across engineering (e.g., Cursor, Copilot, Claude Code)
- Define and implement access controls, license management, and usage policies that satisfy SOC2/ISO 27001 requirements
- Build cost tracking and reporting so leadership has visibility into AI tool spend and usage patterns across the org
- Reduce friction for engineers adopting these tools while maintaining security and auditability
- Partner with teams across the org to identify, build, and support internal AI applications such as RAG pipelines, agents, and automation workflows
- Evaluate and recommend tooling, frameworks, and patterns based on what teams actually need
- Define where IaaS’s responsibility ends and consuming teams’ begins – this boundary doesn’t exist yet; you’ll help draw it
- Advise on data governance policies for LLM usage, including what data can go into which models, where outputs are stored, and how audit trails are maintained
- Ensure AI infrastructure and tooling meets existing SOC2 and ISO 27001 controls and can be evidenced in audits
- Provide leadership with clear, regular reporting on AI adoption, cost, risk, and usage across the org
- Stand up and manage AI/ML infrastructure, primarily on GCP (Vertex AI) within OpenVPN’s existing environment
- Design the Terraform modules and IaC patterns for AI infrastructure that follow the team’s existing conventions (e.g., Atlantis-driven GitOps workflows)
- Build visibility into AI/ML infrastructure costs and implement controls (spot instances, auto-scaling policies, idle resource cleanup) consistent with how compute costs are managed elsewhere
- Evaluate build-vs-buy decisions for AI/ML infrastructure components and managed services with an eye toward operational fit within existing patterns



