Job Closed

This listing is no longer active.

Five Acts

Inspiring people through data.

AWS Platform Engineer (Senior) – Support and Governance

Platform Engineer | Full Time | Remote | Senior | Team 51-200 | Since 2005 | H1B No Sponsor

Location

Worldwide

Posted

75 days ago

Seniority

Senior

Job Description

Role Description

We are looking for a senior professional with strong AWS experience to work on the support, monitoring, and evolution of a data and analytics platform in a cloud environment. This is a strategic position that requires end-to-end ownership, including incident analysis and resolution, infrastructure monitoring, proactive risk identification, and proposals for technical and governance improvements. The professional will work within a structured AMS model, focused on N2/N3 (level 2/level 3) support, platform reliability, and continuous evolution of the environment, directly supporting the operation and stability of the data and analytics services.

Responsibilities

- Operations and technical support
  - Analyze and resolve incidents in the AWS environment (N2/N3 level)
  - Investigate root causes of failures in data services and infrastructure
  - Troubleshoot performance, availability, and cost issues
  - Provide technical support to the first-line support levels
- Monitoring and observability
  - Monitor and analyze AWS service metrics, including:
    - Redshift (CPU, queues, queries, storage)
    - EC2 (CPU, memory, disk, status checks)
    - EMR (jobs, resource usage, HDFS)
    - Athena (queries, cost, performance)
    - SQS (backlog, throughput)
    - DynamoDB (throttling, latency)
    - Lambda (errors, duration, concurrency)
    - S3 (storage and errors)
  - Identify performance bottlenecks and operational risks
  - Create and evolve alerting and monitoring mechanisms
- Proactive work
  - Identify opportunities to improve performance, cost, and stability
  - Propose preventive actions to avoid incidents
  - Automate operational and monitoring routines
- Governance and platform evolution
  - Help define cloud architecture best practices
  - Contribute to the evolution of data platform governance
  - Support the analysis and triage of vulnerabilities
- Management and communication
  - Help prepare monthly technical reports (platform health, risks, and improvements)
  - Interact with customers and technical stakeholders
  - Document incidents, analyses, and solutions

Qualifications

- Solid experience with AWS in production environments
- Hands-on experience with:
  - EC2 (monitoring and troubleshooting)
  - CloudWatch (metrics, logs, and alarms)
  - S3
- Experience in analyzing and resolving infrastructure incidents
- Previous work in N2/N3 technical support or AMS
- Experience with performance analysis (CPU, memory, disk, I/O)
- Knowledge of cloud architecture, preferably data-oriented
- Ability to work autonomously in critical scenarios

Requirements

- Experience with AWS tools such as Amazon Redshift, DynamoDB, EMR, and Lambda
- Knowledge of Athena and messaging (SQS)
- Experience with serverless architectures
- Experience with FinOps practices (cloud cost optimization)
- Knowledge of security and vulnerabilities in AWS
- Experience with observability and advanced monitoring tools
- Experience with ITIL methodologies or structured AMS
- Experience in data and analytics environments
- Intermediate or advanced English

Profile Expected

- Analytical profile, with strong investigation and diagnostic skills
- Proactivity in identifying and resolving problems
- Organization and a sense of prioritization
- Good communication
- Ability to work hands-on without losing the strategic view

Benefits

- Food and meal vouchers (Swile)
- Flexibility to allocate credit to a home-office allowance (Swile)
- Up to 100% coverage of health and dental plans
- Group life insurance
- Remote work
- Mental health plan: online and in-person psychotherapy
- Support for certifications and courses
- Partnership for postgraduate and MBA courses (Esalq/USP)
- Partnerships with language schools
- Partnerships with gyms and wellness apps (Wellhub)
- Internal talks and discussion circles
- Referral bonus
- Happy hours
- Gifts on commemorative dates

More Platform Engineer Jobs

Lead AI Platform Engineer

Prolific

Building a better world with better data.

Platform Engineer | 75 days ago

Full Time | Remote | Team 51-200 | Since 2014 | H1B Sponsor

Role Description

As a Lead AI Platform Engineer, you will be the backbone of our AI production lifecycle. You will bridge the gap between research and real-world application, ensuring our Data Scientists, AI Researchers, Product teams and others in the company have the high-performance infrastructure, automated pipelines, and deployment strategies needed to ship state-of-the-art models and agents at scale.

Qualifications

- 5+ years of experience with cloud infrastructure and infrastructure as code.
- Previous experience with the ML and LLM lifecycle: training, hosting, optimisation, observability.
- Used to working closely with researchers and data scientists, taking experiments from worksheets into production.
- Strong grasp of ML fundamentals and the modern GenAI stack.

Requirements

- Infrastructure as Code (IaC): Design and maintain scalable cloud environments (GCP/AWS) using Terraform.
- Resource Provisioning: Manage GPU/TPU resource allocation for training, fine-tuning, and interactive notebooks.
- Custom Tooling: Build internal services and CLI tools to streamline the developer experience for the AI team.
- Automated Pipelines: Design CI/CD and training pipelines using tools such as GitHub Actions, MLflow, Vertex AI Pipelines.
- Deployment Methodology: Develop reusable patterns for model serving and manage service deployments to Kubernetes.
- Vector Infrastructure: Manage and optimize vector databases and embedding pipelines for RAG-based systems.
- Observability and Reliability: Model drift monitoring, resource utilisation, LLM and agent tracing.
- Inference Optimization: Implement techniques to reduce latency and increase throughput (quantisation, distillation, etc.).
- Cold Start Mitigation: Solve scaling bottlenecks for serverless or containerized model deployments.
- Cost Management: Optimize GPU utilization and cloud spend without compromising performance.
- Support AI Agent Deployment: Define and create tooling and service templates around agent deployment (tool libraries, tracing, default agent frameworks, skills, etc.).
- Enablement for non-technical agent users: Help create workflows and guidance on no-code/low-code agent platforms (n8n, LangSmith, or similar).
- Create tooling and policies to enable safe usage of local agents such as Claude Code.

Benefits

- Competitive salary.
- Benefits.
- Remote working within an impactful, mission-driven culture.

AI | Infrastructure as Code | AI/ML | LLM | Observability/Monitoring | GCP | AWS | Terraform | CI/CD | GitHub Actions | MLflow | Kubernetes | AI Agents

Worldwide

Staff AWS Platform Engineer

RWS Group

Take global further

Platform Engineer | 75 days ago

Full Time | Remote | Team 5,001-10,000 | H1B No Sponsor

• Design and implement the AWS platform foundations used by product and service teams across RWS
• Develop reusable infrastructure patterns aligned with the RWS platform reference architecture
• Implement core cloud capabilities including networking, identity integration, security controls, and platform services
• Support the creation of standardised infrastructure building blocks to accelerate application deployment
• Support engineering and IT teams with guidance as migration of application workloads from on-premises environments into AWS is completed
• Build and implement a prioritised plan for migrated applications
• Collaborate with application teams to modernise architectures
• Provide guidance and tooling to help teams successfully adopt AWS infrastructure and services
• Build and maintain infrastructure using Infrastructure as Code to ensure consistent, repeatable cloud deployments
• Enable product teams to provision infrastructure and deploy services through self-service platform capabilities

AWS | Cloud | Terraform

United Kingdom

Job Closed

Platform Engineer (Cloud-Native AI/ML Systems Integration)

Rackner

Rackner, Inc. builds cutting-edge solutions that apply the power of AI and DevSecOps in public and private clouds, leveraging the future of computing capability and technologies su

Platform Engineer | 75 days ago

Full Time | Remote

Platform Engineer (Cloud-Native AI/ML Systems Integration)

Location: Dayton, OH (preferred) | Remote Eligible (CAC Access Required)
Clearance: TS/SCI Preferred

Build the Infrastructure That Makes AI Mission-Ready

This is not a typical engineering role. As a Platform Engineer, you will design and operate the infrastructure that enables AI/ML systems to function in real-world mission environments. Your work will directly support Air Force / NASIC-aligned programs, where performance, security, and reliability are non-negotiable. You won’t just build platforms: you’ll bridge the gap between AI development and operational deployment, ensuring systems scale, integrate, and perform under real constraints.

What You’ll Do

- Architect and operate Kubernetes-based platforms supporting AI/ML workloads
- Build and manage containerized environments (Docker, Helm, OCI) for scalable deployment
- Design and optimize data pipelines for ingestion, transformation, and model lifecycle support
- Integrate AI/ML services into secure, mission-critical systems
- Develop Infrastructure as Code (Terraform, Ansible) for repeatable, compliant environments
- Build and enhance CI/CD pipelines (GitLab, Jenkins, GitHub Actions)
- Implement observability and monitoring (Prometheus, Grafana) to ensure system health and performance
- Collaborate cross-functionally to translate complex requirements into deployable systems
- Solve engineering challenges within classified and constrained environments

What You Bring

- Strong experience with Kubernetes and cloud-native platform engineering
- Hands-on experience with containerization (Docker, Helm)
- Experience supporting data pipelines or ML-enabled systems
- Familiarity with Infrastructure as Code (Terraform, Ansible)
- Experience with CI/CD and DevSecOps practices
- Understanding of distributed systems and system integration
- Ability to operate effectively in secure or regulated environments

Why This Role Matters

In many environments, AI stops at experimentation; in this role, you ensure it becomes operational capability. You will:

- Enable AI/ML systems to move from development to deployment
- Support mission-critical operations tied to national security
- Work at the intersection of cloud-native engineering, DevSecOps, and AI infrastructure
- Build systems where failure is not an option and performance is essential

Your work directly impacts how advanced technology is applied in real-world scenarios.

What You’ll Gain

- Hands-on ownership of platforms powering AI/ML in mission environments
- Exposure to complex, high-scale distributed systems
- Experience integrating modern cloud-native technologies into secure, real-world systems
- Growth across platform engineering, DevSecOps, and AI infrastructure
- The opportunity to solve problems most engineers never encounter

About Rackner

Rackner is a software consultancy that builds cloud-native solutions for startups, enterprises, and the public sector. We are an energetic, growing consultancy focused on solving complex problems through distributed systems, DevSecOps, AI/ML, and modern systems architecture. We enable digital transformation by applying cloud-first, cost-effective innovation across mission-critical environments. Our customers span a diverse and growing set of industries, and our teams are driven by a shared focus on end-to-end system delivery and real-world impact.

Benefits & Perks

Rackner invests in your growth, stability, and long-term success:

- Paid certifications & professional development
- 401(k) with 100% match up to 6%
- Highly competitive PTO
- Comprehensive Medical, Dental, Vision coverage
- Life Insurance + Short & Long-Term Disability
- Home office & equipment plan
- Industry-leading weekly pay schedule

#PlatformEngineering #Kubernetes #DevSecOps #AIInfrastructure #MachineLearningOps #CloudNative #DefenseTech #ClearedJobs #NationalSecurity #InfrastructureEngineering #DistributedSystems #Terraform #Docker #DataEngineering #MLOps

Kubernetes | Docker | Helm | Terraform | Ansible | GitLab | Jenkins | GitHub Actions | Prometheus | Grafana

United States

Director of Platform Engineering

UniUni

We Deliver the Goods.

Platform Engineer | 75 days ago

Full Time | Remote | Team 501-1,000 | Since 2019 | H1B No Sponsor

• Own the design, governance, and continuous evolution of UniUni's AWS-based cloud platform
• Lead and define cloud architecture standards across all AWS services
• Own end-to-end FinOps practice including budgeting and forecasting
• Mandate and mature IaC-first practices across the organization
• Own the strategy, performance, and reliability of UniUni's multi-model database platform
• Establish and enforce cloud security baselines
• Lead, coach, and grow a team of platform and infrastructure engineers

AWS | NoSQL | Terraform

United States
