CDW Corporation is a leading multi-brand provider of information technology solutions to business, government, education and healthcare customers in the United States, the United Kingdom and Canada. A Fortune 500 company and member of the S&P 500 Index, CDW helps its customers navigate an increasingly complex IT market and maximize return on their technology investments. Our broad array of products and services ranges from hardware and software to integrated IT solutions such as security, cloud, hybrid infrastructure and digital experience. For more information about CDW, please visit www.CDW.com.
Senior Platform DevOps Engineer
Location
United States
Posted
37 days ago
Salary
0
Seniority
Senior
Job Description
Senior Platform DevOps Engineer
CDW
• Automate and manage Databricks workspaces, clusters, Unity Catalog, identity
• Design, implement, and maintain CI/CD pipelines using GitHub Actions and Azure DevOps
• Establish and enforce platform guardrails including cluster policies, cost controls, logging, alerting
• Ensure platform health, scalability, reliability, and cost optimization across Databricks environments
• Provide tier‑3 operational support, troubleshoot complex incidents, and drive root‑cause resolution
• Collaborate with data engineering, analytics, and AI teams to enable efficient development and deployment of data solutions
Job Requirements
- 5 years of experience in designing, developing, and deploying solutions on the Databricks platform
- Proficiency in Python, including PySpark, and SQL
- Strong understanding of cloud platforms such as AWS, Azure, or GCP
- Hands‑on experience with data warehousing, data lake, and Lakehouse architectures
- Proven experience building and maintaining ETL and ELT pipelines
- Experience using Git‑based version control and CI/CD practices
Benefits
- Health insurance
- 401(k) matching
- Paid time off
- Professional development opportunities
More DevOps Engineer Jobs
Site Reliability Engineer
AceHack 4.0 Innovate - Elevate - Transform | 24 Hour in-person Hackathon in Jaipur
• Own reliability, availability, and performance of production systems running in cloud environments
• Define and monitor SLIs/SLOs and help manage error budgets across the platform
• Lead incident response efforts including detection, triage, mitigation, and postmortems
• Improve observability through logging, monitoring, alerting, and dashboards
• Automate operational workflows and reduce manual toil wherever possible
• Partner closely with engineering teams to improve system resiliency and scalability
• Assist with capacity planning, infrastructure optimization, and performance tuning
• Build internal tooling, runbooks, and operational best practices
• Support Kubernetes-based infrastructure and distributed systems at scale
• Act as an escalation point for complex production and platform issues
Role Description
Are you motivated by working with technology, solving challenges, and making a difference in dynamic environments? As part of the team, you will play an essential role in sustaining and evolving our infrastructure, contributing directly to the stability of our solutions and the continuity of our products. Your challenge will be to bring a technical eye and a sense of responsibility to incident resolution and day-to-day operations, ensuring efficient, well-handled responses. We are looking for someone who organizes, documents, and helps make sure problems do not recur. We want a person with initiative, technical range, and a sense of prioritization, able to assess scenarios, propose viable paths, and contribute continuous improvements. It is also important to take a critical look at what already exists, helping the environment evolve in a consistent, structured way. If you enjoy dynamic, collaborative environments with room for hands-on work, this may be the ideal opportunity for you. Let's evolve the foundation that supports our technology together! 🚀
Responsibilities
- Manage and evolve cloud environments (Azure, AWS, or GCP), ensuring availability, scalability, and efficiency.
- Maintain and evolve infrastructure as code (Terraform, Ansible) and container-based platforms (Kubernetes, Docker).
- Define, track, and champion SLIs and SLOs as real decision-making instruments.
- Implement and improve observability: monitoring, logs, and tracing (Prometheus, Grafana, Elastic/Sentry).
- Respond to incidents (on-call), reduce MTTR, and run post-mortems with concrete learnings.
- Automate operational processes and CI/CD pipelines, and systematically eliminate toil.
- Monitor and optimize infrastructure costs (FinOps), ensuring efficient use of compute resources.
- Support architectural decisions, balancing cost, performance, security, and reliability.
Qualifications
- Cloud: Azure, AWS, or GCP (managing environments, networking, IAM, and costs)
- Kubernetes (kubectl, Helm) and Docker: operation and troubleshooting in production
- IaC: Terraform/Ansible
- CI/CD: GitHub Actions/Jenkins
- Advanced Linux
- Observability: Prometheus, Grafana, plus at least one APM tool (Elastic, Sentry)
- Networking and security: DNS, TCP/IP, Load Balancer, VPN, Firewalls, IAM
- Scripting: Bash and Python
- Technical English (reading documentation)
Differentials
- Certifications: CKA, AWS, Azure, or Terraform Associate
- Messaging: Kafka or RabbitMQ
- GitOps: ArgoCD or FluxCD
- Experience with Platform Engineering
- Knowledge of FinOps and infrastructure cost optimization
Benefits
- Multi-benefit card (Swile)
- Health plan (Unimed)
- Conexa Plus + Psicologia Viva
- Dental plan (Metlife)
- Life insurance (Metlife)
- TotalPass
- Day off in your birthday month
- Partnership with English and Spanish courses
- 20 business days of rest
• Provide ongoing DevOps and security guidance to engineering and leadership
• Review current infrastructure (cloud, CI/CD, access controls) and recommend improvements
• Conduct periodic security audits and risk assessments
• Advise on and help implement best practices across cloud security, IAM, and data protection
• Support incident response for security-related events, as well as helping refine our incident response procedures
• Review and strengthen deployment pipelines and system architecture
• Assist with security tooling selection and implementation (monitoring, alerting, vulnerability scanning)
• Help ensure alignment with SOC 2 and general compliance standards
• Partner with engineering on secure system design and new builds when needed
• Document recommendations and maintain lightweight security playbooks

• Engage in 24/7 global shift rotations to provide remote support for network repairs and changes while collaborating across teams and updating customers on status and ticket information.
• Drive operational improvements in change management and daily operations by following procedures.
• Manage and operate large scale IP network technologies and infrastructures.
• Utilize your skills in Peering and Datacenter interconnect technologies: PNI, Transit, Exchange, Passive DWDM, Wave circuits.
• Monitor and support the network health of on-premises and cloud infrastructures.
• Collaborate and develop workflow enhancements while documenting best practices.



