Job Closed
This listing is no longer active.
AI-powered enterprise knowledge management
Senior Engineer, Infrastructure – DevOps
Location: New York
Posted: 65 days ago
Salary: $180K - $200K / year
Seniority: Senior
Job Description
Pryon
- Design and implement cloud-native architectures for AI/ML applications using Kubernetes (GKE, EKS, AKS)
- Architect and maintain CI/CD pipelines using modern GitOps practices with tools like FluxCD and BitBucket
- Design and implement observability solutions using Prometheus, Grafana, and other monitoring tools
- Create and maintain Infrastructure as Code (IaC) using Terraform
- Implement container orchestration strategies using Docker, Kubernetes, and Helm
- Design and implement multi-cloud deployment strategies
- Establish SLOs/SLIs and implement SRE best practices
- Automate operational tasks and create self-healing systems
- Mentor team members on DevOps best practices
- Collaborate with ML engineers and researchers to optimize model deployment and serving infrastructure
- Stay current with emerging technologies and best practices in the DevOps/MLOps space
Job Requirements
- 7+ years of experience in DevOps/Platform engineering
- Deep expertise in Kubernetes, Helm and container orchestration
- Strong experience with a major cloud provider (GCP, AWS, Azure)
- Experience with CI/CD tools and GitOps practices
- Proficiency in Go, Python, or similar programming languages
- Experience with observability tools (Prometheus, Grafana, etc.)
- Knowledge of security best practices and compliance requirements
- Experience with Infrastructure as Code and configuration management
- (Desirable) Experience with MLflow, Airflow, Kubeflow, or Ray
- BS degree in Computer Science or related field
- Excellent communication and collaboration skills
- Strong problem-solving abilities and systematic thinking
- Experience working in an Agile environment
Benefits
- Remote-first organization
- 100% company-paid health, dental, and vision benefits for you and your dependents
- Life Insurance, Short-term and Long-term Disability
- 401(k)
- Unlimited PTO
More DevOps Engineer Jobs
- Run advisory and roadmapping sessions on DevOps for our customers.
- Lead the design and implementation of our DevOps solutions, encompassing CI/CD pipelines for JavaScript, Java, .NET, and similar codebases.
- Architect and maintain CI/CD pipelines using tools such as Jenkins, Kubernetes (EKS), and potentially Spinnaker, JenkinsX, and FlowCD.
- Ensure rigorous approval processes for deploying into each environment, including dev, staging, and production.
- Manage cloud resources across AWS, Azure, and GCP, with a focus on Azure and AWS as primary providers.

- Ensure the reliability, availability, and performance of customer platforms and services.
- Bridge the gap between development and operations.

- Proficiency in scripting, programming, and database management.
- Optimize performance and resource management.
- Document and manage requirements collaboratively.

- Design, implement, and maintain AWS-based infrastructure, ensuring scalability, security, high availability, and compliance with security best practices.
- Implement security policies and practices from development through operations, with a focus on infrastructure hardening, vulnerability analysis, and compliance with OWASP guidelines.
- Administer and optimize Kubernetes (EKS) clusters, using Karpenter for efficient autoscaling and KEDA for event-driven scaling, with particular attention to cluster security.
- Create and maintain continuous integration and delivery (CI/CD) pipelines using Terraform and GitHub Actions for infrastructure provisioning and automated deployments, integrating security and compliance testing.
- Maintain and version infrastructure and services using Helm and ArgoCD, ensuring traceability and integrity of configurations.
- Implement and manage services such as DynamoDB, RDS, Lambda, SQS, and SNS, ensuring efficiency, interoperability, and security across systems.
- Create and optimize Grafana dashboards and alerts, ensuring metrics collection and analysis with Prometheus and OpenTelemetry, including monitoring of security aspects.
- Collaborate with development, product, and security teams to optimize performance, reduce costs, and improve the platform's reliability and security.



