Making visual AI a reality

Principal Infrastructure Engineer

Infrastructure Engineer, Full Time, Remote, Lead, Team 11-50, Since 2018, H1B No Sponsor

Location

United States

Posted

72 days ago

Salary

$250K - $280K / year

Seniority

Lead

English

Ansible, AWS, Azure, Distributed Systems, Docker, GCP, Kubernetes, MongoDB, NoSQL, Python, Terraform

Job Description

• Shape the architecture and evolution of Voxel51’s infrastructure to support deployments ranging from individual researchers to Fortune 500 enterprises
• Design, build, and scale deployment systems across cloud (GCP, AWS, Azure) and on-premises environments, ensuring reliability, security, and repeatability
• Partner with enterprise customers to deliver and support production-grade deployments in their environments, guiding them through installation, troubleshooting, and scaling
• Lead infrastructure initiatives across engineering teams, enabling peers to develop, test, and ship features faster with robust internal tooling and automation
• Drive best practices in CI/CD, evolving our pipelines (currently GitHub Actions + Google Cloud Build) and introducing new approaches where they add value
• Develop and maintain deployment solutions for Voxel51-hosted environments (GKE) as well as customer on-prem installations (K8s or Docker Compose)
• Champion developer productivity, improving workflows for development and automated cloud deployments
• Troubleshoot and resolve complex infrastructure issues, spanning build failures, runtime failures, and customer deployment challenges
• Anticipate and prevent failures by designing monitoring, alerting, and predictive solutions for both internal and customer environments
• Mentor engineers and set technical direction, ensuring Voxel51’s infrastructure remains ahead of customer needs and industry trends

Job Requirements

Deep experience with containerized environments
Infrastructure as Code expertise (Terraform, Ansible, or equivalent)
Scripting and automation skills (Bash or similar)
Python expertise, including build and environment management, packaging/distribution, release management, and dependency debugging
CI/CD systems experience, ideally GitHub Actions (we use this today)
Cloud infrastructure knowledge, especially GCP (IAM, VPC, load balancing, ingress/egress routing, proxies, firewall rules)
Database fundamentals, ideally MongoDB or similar NoSQL systems
Observability skills, including designing meaningful monitors, logging, tracing, and alerting
Security best practices, including certificates, service accounts, least privilege, and role assumptions
Troubleshooting ability across complex, distributed systems (including with customers in the loop)
Testing mindset: comfortable with designing and applying different types of tests to validate functionality
Strong communication skills, with the ability to work directly with enterprise customers as well as collaborate across teams in a remote-first, collaborative environment
Adaptability and curiosity, with the ability to ramp quickly on unfamiliar concepts and technologies

Benefits

equity in the form of options
a variety of benefits
opportunity to grow in an exciting and collaborative environment

More Infrastructure Engineer Jobs

Senior Cloud Infrastructure Engineer

Fingerprint

The device identity platform for high-scale applications. Powered by the world's most accurate visitor identifier.

Infrastructure Engineer, posted 72 days ago

Full Time, Remote, Team 51-200, Since 2019, H1B No Sponsor

• Design, write and deliver software that improves the lives of our engineers and the scalability of our platform
• Build secure, high-performing, resilient, and efficient infrastructure for a variety of applications and workloads
• Partner with engineering teams to architect their services for scale
• Get involved in the design and implementation of new developer-facing services and tooling

AWS Kubernetes Terraform

United States

$178K - $205K / year

Job Closed

Senior Infrastructure Engineer

OpenNebula

The Open Source Cloud & Edge Computing Platform 🚀

Infrastructure Engineer, posted 72 days ago

Full Time, Remote, Team 11-50, Since 2010, H1B No Sponsor

• Maintain all company infrastructure: production (company services such as web pages and WordPress), continuous integration, training, and demos
• Maintain and develop CI infrastructure based on Jenkins, which includes package building, testing, and release
• Maintain and develop the internal development toolchain and infrastructure
• Maintain and develop the release process pipelines
• Maintain and develop OpenNebula software deployment and evaluation tools
• Control resource usage and assist team members with their use of the cluster
• Run training sessions with the team on new infrastructure features
• Report statistics on infrastructure usage, web page visits, software package and repository downloads, and marketplace appliances

Ansible Jenkins Linux Prometheus Terraform Unix WordPress

Spain

Site Reliability Engineer

Sofka

Technical and personal challenges that will keep you growing constantly. A connected team focused on your physical and mental well-being. A fresh, collaborative culture of continuous improvement, with learning opportunities and people ready to support you. Programs such as Happy Kaizen and WeSofka that look after your physical and emotional well-being.

Infrastructure Engineer, posted 72 days ago

Full Time, Remote, Team 1,001-5,000

Role Description

We are looking for an SRE with more than 3 years of experience leading technology resilience and observability in highly complex environments. Your mission will be to act as the bridge between innovation and stability, with command of Observability & Reliability, Distributed Systems Architecture, and Automation (IaC). This is your opportunity to design the future of technology availability in a remote environment, where your work will directly affect the experience of thousands of users. If you are looking for a challenge where chaos engineering, self-remediation, and constant innovation are part of your day-to-day, we want to meet you!

Responsibilities

- Adapt observability requirements to each technical solution to ensure coverage, visibility, and operational efficiency.
- Configure and maintain dashboards, metrics, alerts, and business-critical controls.
- Validate the resilience of solutions through chaos testing and scalability evaluations under load.
- Implement resilient design patterns such as circuit breakers, fallbacks, and retries in distributed architectures.
- Identify and automate manual processes using infrastructure-as-code tools to reduce MTTR.
- Lead the implementation of self-remediation workflows and promote continuous-improvement practices in operations.
- Collaborate with development and architecture teams to ensure technical quality in critical user journeys.

Qualifications

- Degree in Systems Engineering, Computer Science, Electronics, or a related field.

Requirements

- Extensive track record implementing observability and resilience in microservices, cloud environments, and agile teams; proven experience automating operational tasks and managing incidents under SRE/DevOps methodologies.
- Technical knowledge:
- Observability: Dynatrace (primary hands-on tool), Grafana, Prometheus, OpenTelemetry, and ELK Stack.
- Automation and IaC: Ansible, Terraform, Terragrunt, and Monaco (Monitoring as Code).
- Containerization: Kubernetes (AKS, EKS), OpenShift (advanced level), and Docker.
- Programming languages: Python (advanced), Bash, YAML, and PowerShell.
- Cloud and infrastructure: Azure, AWS, or GCP (networking, security, and compute).
- Reliability management: defining SLIs, SLOs, and SLAs, and managing error budgets.
- CI/CD: Git, Jenkins, Azure DevOps, and GitHub Actions.
- Resilience engineering: chaos engineering, circuit breakers, and canary/blue-green deployments.

Benefits

- Permanent (indefinite-term) contract.
- We want long-term relationships, and we want you to be part of our family for a long time.
- At Sofka, we offer a learning ecosystem with multiple tools to close gaps and strengthen your skills. You decide how you want to grow!
- Work mode: remote.

Grafana, Prometheus, OpenTelemetry, ELK Stack, Ansible, Terraform, Kubernetes, Azure Kubernetes Service, Amazon EKS, OpenShift, Docker, Python, Shell, PowerShell, Azure, AWS, GCP, Git, Jenkins, Azure DevOps, GitHub Actions

Worldwide

Job Closed

Hosting Lead (Senior Task Lead/Systems Engineer - Hosting)

Node.Digital

A Digital Automation Company - Enabling Frictionless Transactions with Digital Engagement & Intelligent Automation

Infrastructure Engineer, posted 72 days ago

Other, Remote, Team 11-50, H1B No Sponsor

Hosting Lead (Senior Task Lead / Systems Engineer - Hosting)

Location: Remote Work
Clearance: Must meet DHS/USCIS background investigation/EOD; support 24x7 after-hours response.

Role Summary

Lead day-to-day Operations & Maintenance (O&M) across hybrid multi-cloud enterprise, DHS data centers, Equinix data centers, and cloud environments (AWS, Azure, GCP). Own infrastructure readiness, patching, capacity, performance, and Tier II-III incident/problem resolution. Drive automation-first operations and ensure compliance with EIOSS/eAUTO SLAs and AQLs.

Key Responsibilities

- End-to-End O&M: Own O&M for servers, storage, backup, databases, virtualization, and middleware; ensure operational acceptance with As-Built/VDD/Runbooks.
- Patching & Remediation: Lead patching, image baselines, remediation, and hardening for Windows, Red Hat/CentOS, and Solaris. Run integrated patch IPTs with DBAs, Security, and Field Ops.
- Infrastructure Management: Manage VMware vSphere, Microsoft Hyper-V, Citrix/VDI, FlexPod, NetApp, and enterprise backup (NetBackup, Backup Exec, Veeam).
- Modern Hosting: Run CI/CD-enabled hosting using Jenkins, Harness, Git, and Sonatype; operate OpenShift, Docker, Kubernetes, and UiPath.
- Performance Metrics: Deliver bi-weekly health checks and monthly metrics. Meet SLAs/AQLs (e.g., ≥99.95% availability) and produce capacity/incident reports.
- Automation & ITSM: Champion automation (Ansible/Chef) and ServiceNow ITSM/ITOM integration; maintain CMDB accuracy.
- Collaboration: Lead collaboration with customers and partners to break down silos and drive automation adoption across Engineering, Security, and the TOC.

AWS, Azure, GCP, Microsoft Windows, CentOS, VMware, Citrix, Jenkins, Git, OpenShift, Docker, Kubernetes, UiPath, Ansible, Chef, ServiceNow

Virginia

Job Closed
