DevOps Engineer

Location

United States

Posted

57 days ago

Salary

$90.4K - $135K / year

Seniority

Mid Level

Job Description

DevOps Engineer

Drexel University

Job Summary

The University Research Computing Facility (URCF) at Drexel University is building a new shared computing platform focused on GPU-accelerated workloads, particularly AI model training. The system includes GPU and CPU compute nodes with Nvidia H200, A100, and Grace Hopper hardware, orchestrated by Kubernetes on bare metal, as well as a 1 PB high-performance Weka storage cluster and a 3 PB S3-compatible archival storage system with iRODS as the metadata layer.

The DevOps Engineer will help build and operate this platform, working alongside the URCF’s Research Computing Specialist and collaborators in Drexel IT. The platform is under active development, and URCF is itself adopting container-native tools and workflows after coming from a more traditional HPC background. This means the role involves building new things, improving what exists, and navigating some institutional learning curves alongside us.

We currently use the following technologies:

- Ansible
- Warewulf
- Proxmox
- Kubernetes (RKE2)
- Cilium
- Kyverno
- Envoy
- Kubeflow
- Weka
- iRODS
- STORJ
- Globus
- Rocky Linux
- Python
- Bash

Please note: You don’t need experience with all of these. We include the list so you can get a sense of the environment.

This is a grant-funded position through September 1, 2027; employment is contingent upon the continued availability of those funds. It is fully remote. If you’re not sure whether you’re qualified, we’d encourage you to apply anyway.

Essential Functions

- Develop and maintain automation for provisioning, configuring, and managing the cluster (Ansible, Warewulf, Kubernetes manifests, shell scripts).
- Contribute to the Kubernetes platform layer, including networking, storage integration, security policies, and workload orchestration.
- Help build out storage infrastructure, including iRODS and Globus/Globus Connect Server for data transfer, as well as the integrations between these systems and the compute cluster.
- Troubleshoot issues across the stack, from bare-metal boot problems to container orchestration bugs.
- Write and maintain operational and user-facing documentation.
- Coordinate with Drexel’s IT teams on shared infrastructure concerns (networking, DNS, firewall rules, etc.).
- Potentially contribute to web application development for a user-facing portal for project management, permissions, and usage tracking. (This isn’t the core of the role, but if you have web development experience and are interested, there’s real work to be done here.)

Required Qualifications

- Minimum of a Bachelor's Degree in Computer Science, Engineering, or a related field, or the equivalent combination of education and work experience (please review the Equivalency Chart for additional information).
- Minimum of 1–3 years of experience.
- Experience with infrastructure tooling such as Linux systems administration, configuration management, containers, or container orchestration.
- Comfortable working in a terminal with tools like Git, SSH, and a text editor.
- Working proficiency with at least one scripting language (Python, Bash, etc.).
- Strong written communication skills.
- Ability to work independently and manage your own time in a fully remote setting.

Preferred Qualifications

- Experience with Kubernetes.
- Experience with bare-metal provisioning or HPC cluster management.
- Familiarity with any of: Ansible, Warewulf, RKE2, Cilium, Kubeflow, Weka, iRODS, Globus, or infrastructure-as-code tools generally.
- Web application development experience (any stack).
- Experience in an academic or research computing environment.

Physical Demands

- Typically sitting at a desk/table
- Lifting demands ≤ 25 lbs

Location

- Remote

Additional Information

This position is classified as Exempt, grade N. Compensation for this grade ranges from $90,430.00 - $135,640.00 per year. Please note that the offered rate for this position typically aligns with the minimum to midrange of this grade, but it can vary based on the successful candidate’s qualifications and experience, department budget, and an internal equity review.

Applicants are encouraged to explore the Professional Staff salary structure and Compensation Guidelines & Policies for more details on Drexel’s compensation framework. For information about benefits, please review Drexel’s Benefits Brochure.

Special Instructions to the Applicant

Please make sure you upload your CV/resume and cover letter when submitting your application. A review of applicants will begin once a suitable candidate pool is identified.

More DevOps Engineer Jobs

Senior DevOps

CorroHealth

Clinically Led Healthcare Analytics Intelligent Technology to Improve your Financial Health

DevOps Engineer | 57 days ago
Full Time | Remote | Team 5,001-10,000 | H1B Sponsor

• Apply methods, concepts, and theories to new situations.
• Mentor less experienced employees in fundamentals of development and technology stack.
• Adapt and apply changes to local team processes to support team goals.
• Cooperate with the business development, product, design, and development teams to participate in product feature and design discussions and to regularly demo your well-tested and peer-reviewed code.
• Remove roadblocks to development using excellent troubleshooting and problem-solving skills.
• Identify meaningful opportunities for improvement; mentor and be mentored by others.
• Provide clear written communication using collaboration tools.
• Demonstrate good presentation skills.
• Separate complex topics into understandable parts.
• Influence the accomplishment of tasks beyond personal scope of responsibility.
• Work with minimal supervision to provide solutions in a timely fashion.
• Bring a passion to learn and the best engineering practices to the team.

United States
Job Closed

Senior DevSecOps Engineer

Centene Corporation

Transforming the health of the communities we serve, one person at a time.

DevOps Engineer | 57 days ago
Full Time | Remote | Team 10,001+ | Since 1984 | H1B No Sponsor

You could be the one who changes everything for our 28 million members by using technology to improve health outcomes around the world. As a diversified, national organization, Centene's technology professionals have access to competitive benefits including a fresh perspective on workplace flexibility.

Position Purpose

Drives the automation of secure software builds, test and deployment systems, and infrastructure. Manages various development, test, staging, and demo environments (code deployment using CI/CD pipelines, backups, data refreshes), and deploys and manages software in the production environment while leveraging automation. Continually advances the technology in a collaborative and creative agile environment using the latest technologies and industry best practices to deliver solutions which meet business objectives more effectively. Incorporates security into development builds, and creates a "security as code" culture that prioritizes delivery of secure services as a core characteristic.

- Partners with software engineers to identify security vulnerabilities in code and develop mitigation recommendations
- Promotes a DevSecOps and Agile mindset across the technology functions
- Designs, creates, and supports complex security tests in CI/CD pipelines, such as container and API scanning
- Designs and implements effective and efficient tools to secure build/release pipelines for cloud native services
- Develops moderately complex code for collecting and injecting data from security vendors' APIs
- Proactively evolves the existing platform by guiding security standards and implementing improvements
- Partners with the platform engineering team to develop complex CI/CD security processes
- Designs solutions that are compliant with partner security compliance requirements
- Implements automated secrets management, credential rotation, and other secure API authentication techniques
- Evaluates latest information security threats and recommends suitable defense measures
- Reviews architectural changes for security implications and recommends enhancements
- Develops information security activity monitoring reports
- Stays updated on latest technologies to improve security practices to create advantage
- Identifies, evaluates, and conducts proof-of-concepts for new technologies, enabling secure development of core architectural components
- Performs other duties as assigned
- Complies with all policies and standards

Education/Experience

A Bachelor's degree in a quantitative or business field (e.g., statistics, mathematics, engineering, computer science) and 4-6 years of related experience, or equivalent experience acquired through accomplishments of applicable knowledge, duties, scope, and skill reflective of the level of this position.

Technical Skills

- Experience with one or more of the following: Linux or Windows, Docker, Kubernetes, SQL/NoSQL databases, and CI/CD tools
- Experience with scripting languages and automation tools such as Ansible
- Experience with infrastructure-as-code tooling such as Terraform or CloudFormation
- Knowledge of large-scale front-end architecture and data-driven development
- Knowledge of security concepts and secure coding practices
- Experience with vendor support for troubleshooting and installations

Soft Skills

- Intermediate - Seeks to acquire knowledge in area of specialty
- Intermediate - Ability to identify basic problems and procedural irregularities, collect data, establish facts, and draw valid conclusions
- Intermediate - Ability to work independently
- Intermediate - Demonstrated analytical skills
- Intermediate - Demonstrated project management skills
- Intermediate - Demonstrates a high level of accuracy, even under pressure
- Intermediate - Demonstrates excellent judgment and decision-making skills

Pay Range: $87,000.00 - $161,300.00 per year

Centene offers a comprehensive benefits package including: competitive pay, health insurance, 401K and stock purchase plans, tuition reimbursement, paid time off plus holidays, and a flexible approach to work with remote, hybrid, field, or office work schedules. Actual pay will be adjusted based on an individual's skills, experience, education, and other job-related factors permitted by law, including full-time or part-time status. Total compensation may also include additional forms of incentives. Benefits may be subject to program eligibility.

Centene is an equal opportunity employer that is committed to diversity, and values the ways in which we are different. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, veteran status, or other characteristic protected by applicable law. Qualified applicants with arrest or conviction records will be considered in accordance with the LA County Ordinance and the California Fair Chance Act.

United States
$87K - $161K / year
Job Closed

DevOps Engineer (Semi-Senior)

redbee

Connecting businesses and technology expertise.

DevOps Engineer | 57 days ago
Full Time | Remote | Team 51-200 | H1B No Sponsor

• Support and evolve the infrastructure on AWS (EKS, Cognito, Lambda) and in Tanzu environments, ensuring stability and scalability.
• Diagnose and resolve incidents in cloud and on-premise systems, ensuring availability, resilience, and performance.
• Configure and maintain CI/CD and GitOps pipelines (GitLab, ArgoCD), ensuring efficient and reliable deployments.
• Implement observability practices (logging, metrics, and tracing) and collaborate on disaster recovery and fault tolerance strategies.
• Optimize cloud costs, automate processes, and secure the infrastructure through security best practices (OAuth, JWT, OIDC, HTTPS).

Argentina

SRE/DevOps Professional (AWS)

Zup Innovation

We create digital assets to build, grow and accelerate your applications with efficiency, security and scalability.

DevOps Engineer | 57 days ago
Full Time | Remote | Team 1,001-5,000 | H1B No Sponsor

Our purpose is to create technologies that challenge the best in the world and change the game for our clients. That is why we look for professionals who want to be part of a culture guided by excellence and innovation. If you identify with a collaborative work environment driven by curiosity, come build the future of technology with us.

What you will do here

- Design and build complex, critical architectures in cloud environments, ensuring security and cost optimization.
- Develop and implement custom monitoring and observability solutions, including building dashboards in DataDog.
- Lead training sessions, mentoring, and the dissemination of SRE best practices to raise the maturity of the technical community.
- Take part in discussing and defining standards for infrastructure as code (IaC), continuous integration, and automation via GitOps.
- Contribute to the continuous improvement of the development platform by providing technical feedback and suggestions for improvement.
- Act as a technical reference, supporting squads in solving complex problems related to cloud infrastructure and observability.

What we expect you to know

- Solid command of AWS architectures (multi-account, isolated VPCs, security, and cost control).
- Advanced experience with ECS, EKS, and other cloud computing services.
- Infrastructure automation using GitOps practices and tools such as Terraform.
- Development of advanced monitoring and observability solutions (DataDog, structured logs, distributed tracing).
- Implementation of disaster recovery and high availability strategies in the cloud.
- Custom instrumentation for application and infrastructure observability.

What would be great if you knew

- Practical experience in technical mentoring and training on SRE and observability.
- Command of advanced architecture techniques for compliance (e.g., LGPD) in the cloud.
- Knowledge of integrating AI with DevOps pipelines or monitoring.
- Active participation in discussions on the evolution of internal or enterprise platforms.
- Ability to design scalable solutions for multi-squad platforms.

What we offer you

We operate on a remote-by-default work model, prioritizing your freedom and responsibility. In addition, we provide:

Career

- Freedom to work from wherever you want
- Flexible hours
- Education allowance
- In-house career development tool
- Internal guilds and study and interest groups

Health and well-being

- Health plan
- Dental plan
- Partnership for purchasing medication
- Telemedicine available 24x7
- Free online therapy
- Wellhub
- Extended maternity leave
- Extended paternity leave
- CAZ (Central de Atendimento a zuppers), the service center for zuppers

Financial comfort

- Meal and food vouchers
- Life insurance
- Transportation voucher
- Home office allowance
- Childcare allowance
- Phone plan allowance
- Profit sharing (Participação em Lucros e Resultados)

Brazil
Job Closed