Spend is the fuel to help your company deliver performance, profitability, and purpose!
Lead Data Platform Engineer
Location
New York
Posted
1 day ago
Salary
$125K - $174.3K / year
Seniority
Senior
Job Description
Coupa Software
- Manage the end-to-end **data pipeline** (ETL jobs) within agreed SLAs.
- Manage AWS core and **big data services** (S3, IAM, EMR, Redshift, etc.).
- Run applications in containers (ECS, Docker).
- Lead the Day 2 operational lifecycle for ML and GenAI infrastructure. This includes designing, deploying, and maintaining high-availability production LLM serving platforms and implementing automated scaling, self-healing, and infrastructure-as-code patterns. Focus on proactive reliability, model performance observability, and continuous cost optimization for high-compute AI workloads.
- Collaborate closely with our product development and engineering teams to create AI-driven features.
- Drive cloud operations consistency by automating platform maintenance, standardizing infrastructure configurations (IaC), and implementing robust release management processes to minimize drift across multi-cloud environments.
- Manage AWS infrastructure as code (Terraform, Chef, etc.).
- Administer applications running on Linux.
- Enable application and system monitoring for better observability.
- Provide application and infrastructure support for ETL jobs and data pipelines, including participating in an on-call rotation for after-hours emergencies.
- Collaborate with platform and Dev teams to plan and deploy product releases and patch Linux/ECS clusters.
- Participate in design reviews, code reviews, and incident troubleshooting.
- Operate in a high-pressure environment and troubleshoot complex issues quickly while handling multiple priorities.
- Record, write, and review RCAs.
Job Requirements
- Bachelor's degree and 8+ years of experience managing Big Data technologies and data pipelines.
- Sound knowledge and experience in Linux administration and troubleshooting.
- 5+ years of experience in managing cloud infrastructure and platforms, such as AWS and Azure.
- Familiarity with the current engineering landscape in the generative AI space and a strong interest in AI and related technologies.
- Strong expertise in MLOps and production-grade LLM operations. Proven track record in managing high-availability model inference clusters, automating model lifecycle management, and implementing advanced observability (latency, throughput, and error rate monitoring) specifically for AI workloads.
- Bash or Python scripting experience.
- Experience with containerization and Amazon ECS, Amazon EKS, or Azure AKS.
- Experience with tools like Chef, Ansible, Jenkins, Rundeck, or equivalent.
- Experience with source control systems such as Git and operating in complex branching strategies.
- Experience with Infrastructure as Code tools such as Terraform and Helm charts.
- Good understanding of DNS and load balancer setup and troubleshooting.
- Experience with Big Data platforms and data lakes, and with managing Business Intelligence tools (such as Looker).
- Knowledge of Apache Spark architecture and of troubleshooting Java applications.
- Basic understanding of MySQL Server and general database knowledge.
- Excellent written and verbal communication skills and a passion for problem solving.
- Confidence in your ability to own projects and issues and drive them to resolution independently, while thinking and acting globally.
- Deep experience in Day 2 cloud operations, including automated incident remediation, capacity planning, and managing large-scale production cloud environments with a focus on performance and reliability.
More Platform Engineer Jobs
Platform Engineer (GitHub Enterprise)
Grupo Data
Our client is a leading investment bank in Latin America, offering a broad range of financial services across investment banking, asset management, and wealth management. They are known for their innovation-driven environment and advanced technology platforms that support their global operations.
Role Description
- Administer and support the GitHub Enterprise environment.
- Identify, diagnose, and resolve GitHub-related issues, including repository configuration, permissions, and access management.
- Develop automations using GitHub Actions and the GitHub API.
- Build, maintain, and optimize CI/CD pipelines.
- Implement deployment automation and support Infrastructure as Code (IaC) initiatives.
- Define and enforce repository governance standards and development best practices.
- Improve Developer Experience (DevEx) by creating automations, documentation, and self-service solutions.
- Contribute to platform security, access control, and compliance initiatives.
- Integrate CI/CD pipelines with cloud environments such as AWS, Azure, and GCP.
- Participate in the team's on-call rotation when needed.
- Work with the Engineering, Security, and Infrastructure teams to deliver scalable platform solutions.
Qualifications
- Experience with GitHub Enterprise.
- Hands-on experience with GitHub Actions.
- Experience building and maintaining CI/CD pipelines.
- Solid knowledge of version control with Git.
- Knowledge of branching strategies such as Git Flow, Trunk-Based Development, or similar.
- Experience with deployment automation.
- Knowledge of Infrastructure as Code (IaC) using tools such as Terraform, CloudFormation, or equivalents.
- Knowledge of CI/CD pipeline security and DevSecOps best practices.
- Experience integrating pipelines with cloud platforms (AWS, Azure, and/or GCP).
- Knowledge of repository governance, access control, and permission management.
- Experience using the GitHub API.
- Strong analytical and problem-solving skills.
- Good communication and teamwork skills.
- Ability to work smoothly with the Engineering, Security, and Infrastructure teams.
- Strong sense of responsibility, autonomy, and commitment.
- Interest in automation, continuous improvement, and Developer Experience (DevEx).
Requirements
- Experience with GitHub Advanced Security.
- Experience with GitHub Copilot.
- Background in Platform Engineering or Developer Experience (DevEx).
- Knowledge of Identity and Access Management (IAM).
- Experience with audit, compliance, or SOX (Sarbanes-Oxley) conformance initiatives.
- Scripting experience with Python, Go, or shell script.
- Experience supporting large-scale development platforms.
Language
- Intermediate to upper-intermediate English.
Location
- Preferably São Paulo.
Work Model
- Remote, with an expected transition to a hybrid model starting in 2027.
Staff Platform Engineer
Cross Border Talents
🌎 Your international recruitment partner for hard-to-find professionals and jobs all over the globe.
- Design and own the architecture of the company's platform infrastructure end-to-end
- Build and maintain GraphQL gateways, API infrastructure, internal SDKs, and shared platform services
- Manage Kubernetes environments across AWS and GCP, including networking, scaling, security, and upgrades
- Develop production-grade backend services, automation tooling, shared libraries, and Infrastructure as Code using Pulumi
- Modernize legacy systems through pragmatic refactoring and architectural improvements
- Build observability, monitoring, incident response, and reliability systems across the platform
- Implement security best practices covering zero-trust networking, secrets management, compliance, and auditability
- Drive compliance initiatives across HIPAA, LGPD, ISO 27001, and other regulatory frameworks
- Champion AI-assisted engineering workflows across platform operations, incident response, code review, and automation
- Mentor engineers through technical leadership while remaining an active contributor

- Help build and operate the platform
- Operate and improve EKS clusters
- Maintain and evolve the Terraform and ArgoCD setup
- Improve deployment workflows and developer experience
- Maintain and evolve the authentication stack
- Build the access governance needed as we grow
- Build on and improve the existing observability stack
Director of Platform Engineering
Hawk-Eye Innovations Ltd
Pioneering technology making sports fairer, safer, more engaging, and better informed since 2001.
- Responsible for building foundations for platform and data engineering
- Lead and manage Hawk-Eye’s Platform Engineering function
- Build and lead a high-caliber engineering team
- Foster a culture of accountability and continuous improvement



