Job Closed

This listing is no longer active.

Five Acts

Inspiring people through data.

AWS Platform Engineer (Senior) – Support and Governance

Platform Engineer | Full Time | Remote | Senior | Team 51-200 | Since 2005 | H1B No Sponsor

Location

Worldwide

Posted

75 days ago

Salary

Not specified

Seniority

Senior


Job Description


Role Description

We are looking for a senior professional with strong AWS experience to support, monitor, and evolve a data and analytics platform in a cloud environment. This is a strategic position that requires end-to-end ownership, including incident analysis and resolution, infrastructure monitoring, proactive risk identification, and proposing technical and governance improvements. The professional will work within a structured AMS model, focusing on L2/L3 support, platform reliability, and continuous evolution of the environment, directly supporting the operation and stability of the data and analytics services.

Responsibilities
- Platform sustainment and technical support
  - Analyze and resolve incidents in the AWS environment (L2/L3)
  - Investigate root causes of failures in data services and infrastructure
  - Troubleshoot performance, availability, and cost issues
  - Provide technical backing to the first-line support tiers
- Monitoring and observability
  - Monitor and analyze metrics for AWS services, including:
    - Redshift (CPU, queues, queries, storage)
    - EC2 (CPU, memory, disk, status checks)
    - EMR (jobs, resource usage, HDFS)
    - Athena (queries, cost, performance)
    - SQS (backlog, throughput)
    - DynamoDB (throttling, latency)
    - Lambda (errors, duration, concurrency)
    - S3 (storage and errors)
  - Identify performance bottlenecks and operational risks
  - Create and evolve alerting and monitoring mechanisms
- Proactive work
  - Identify opportunities to improve performance, cost, and stability
  - Propose preventive actions to avoid incidents
  - Automate operational and monitoring routines
- Platform governance and evolution
  - Help define cloud architecture best practices
  - Contribute to the evolution of the data platform's governance
  - Support vulnerability analysis and remediation planning
- Management and communication
  - Help prepare monthly technical reports (platform health, risks, and improvements)
  - Interact with clients and technical stakeholders
  - Document incidents, analyses, and solutions

Qualifications
- Solid experience with AWS in production environments
- Hands-on experience with:
  - EC2 (monitoring and troubleshooting)
  - CloudWatch (metrics, logs, and alarms)
  - S3
- Experience analyzing and resolving infrastructure incidents
- Previous work in L2/L3 technical support or AMS
- Experience with performance analysis (CPU, memory, disk, I/O)
- Knowledge of cloud architecture, preferably data-oriented
- Ability to work autonomously in critical scenarios

Requirements
- Experience with AWS services such as Amazon Redshift, DynamoDB, EMR, and Lambda
- Knowledge of Athena and messaging (SQS)
- Experience with serverless architectures
- Experience with FinOps practices (cloud cost optimization)
- Knowledge of AWS security and vulnerabilities
- Experience with advanced observability and monitoring tools
- Experience with ITIL methodologies or structured AMS
- Experience in data and analytics environments
- Intermediate or advanced English

Expected Profile
- Analytical, with strong investigation and diagnostic skills
- Proactive in identifying and resolving problems
- Organized, with a good sense of prioritization
- Good communication skills
- Able to work hands-on without losing sight of the strategic picture

Benefits
- Food and meal vouchers (Swile)
- Option to allocate credit to a home-office allowance (Swile)
- Up to 100% coverage for health and dental plans
- Group life insurance
- Remote work
- Mental health program: online and in-person psychotherapy
- Support for certifications and courses
- Partnership for postgraduate and MBA programs (Esalq/USP)
- Partnerships with language schools
- Partnerships with gyms and wellness apps (Wellhub)
- Internal talks and discussion circles
- Referral bonus
- Happy hours
- Gifts on special occasions

More Platform Engineer Jobs


Lead AI Platform Engineer

Prolific

Building a better world with better data.

Full Time | Remote | Team 51-200 | Since 2014 | H1B Sponsor

Role Description

As a Lead AI Platform Engineer, you will be the backbone of our AI production lifecycle. You will bridge the gap between research and real-world application, ensuring that our Data Scientists, AI Researchers, Product teams, and others in the company have the high-performance infrastructure, automated pipelines, and deployment strategies needed to ship state-of-the-art models and agents at scale.

Qualifications
- 5+ years of experience with cloud infrastructure and infrastructure as code.
- Previous experience with the ML and LLM lifecycle: training, hosting, optimisation, and observability.
- Used to working closely with researchers and data scientists, taking experiments from worksheets into production.
- Strong grasp of ML fundamentals and the modern GenAI stack.

Requirements
- Infrastructure as Code (IaC): Design and maintain scalable cloud environments (GCP/AWS) using Terraform.
- Resource Provisioning: Manage GPU/TPU resource allocation for training, fine-tuning, and interactive notebooks.
- Custom Tooling: Build internal services and CLI tools to streamline the developer experience for the AI team.
- Automated Pipelines: Design CI/CD and training pipelines using tools such as GitHub Actions, MLflow, and Vertex AI Pipelines.
- Deployment Methodology: Develop reusable patterns for model serving and manage service deployments to Kubernetes.
- Vector Infrastructure: Manage and optimize vector databases and embedding pipelines for RAG-based systems.
- Observability and Reliability: Model drift monitoring, resource utilisation monitoring, and LLM and agent tracing.
- Inference Optimization: Implement techniques to reduce latency and increase throughput (quantisation, distillation, etc.).
- Cold Start Mitigation: Solve scaling bottlenecks for serverless or containerized model deployments.
- Cost Management: Optimize GPU utilization and cloud spend without compromising performance.
- Support AI Agent Deployment: Define and create tooling and service templates around agent deployment (tool libraries, tracing, default agent frameworks, skills, etc.).
- Enablement for non-technical agent users: Help create workflows and guidance on no-code/low-code agent platforms (n8n, LangSmith, or similar).
- Create tooling and policies to enable safe usage of local agents such as Claude Code.

Benefits
- Competitive salary.
- Benefits.
- Remote working within an impactful, mission-driven culture.

Worldwide
Full Time | Remote | Team 5,001-10,000 | H1B No Sponsor

• Design and implement the AWS platform foundations used by product and service teams across RWS
• Develop reusable infrastructure patterns aligned with the RWS platform reference architecture
• Implement core cloud capabilities, including networking, identity integration, security controls, and platform services
• Support the creation of standardised infrastructure building blocks to accelerate application deployment
• Guide engineering and IT teams as they complete the migration of application workloads from on-premises environments into AWS
• Build and implement a prioritised plan for migrated applications
• Collaborate with application teams to modernise architectures
• Provide guidance and tooling to help teams successfully adopt AWS infrastructure and services
• Build and maintain infrastructure using Infrastructure as Code to ensure consistent, repeatable cloud deployments
• Enable product teams to provision infrastructure and deploy services through self-service platform capabilities

United Kingdom
Job Closed

Platform Engineer (Cloud-Native AI/ML Systems Integration)

Rackner

Rackner, Inc. builds cutting-edge solutions that apply the power of AI and DevSecOps in public and private clouds, leveraging the future of computing capability and technologies su

Location: Dayton, OH (preferred) | Remote Eligible (CAC Access Required)
Clearance: TS/SCI Preferred

Build the Infrastructure That Makes AI Mission-Ready

This is not a typical engineering role. As a Platform Engineer, you will design and operate the infrastructure that enables AI/ML systems to function in real-world mission environments. Your work will directly support Air Force / NASIC-aligned programs, where performance, security, and reliability are non-negotiable. You won’t just build platforms: you’ll bridge the gap between AI development and operational deployment, ensuring systems scale, integrate, and perform under real constraints.

What You’ll Do
- Architect and operate Kubernetes-based platforms supporting AI/ML workloads
- Build and manage containerized environments (Docker, Helm, OCI) for scalable deployment
- Design and optimize data pipelines for ingestion, transformation, and model lifecycle support
- Integrate AI/ML services into secure, mission-critical systems
- Develop Infrastructure as Code (Terraform, Ansible) for repeatable, compliant environments
- Build and enhance CI/CD pipelines (GitLab, Jenkins, GitHub Actions)
- Implement observability and monitoring (Prometheus, Grafana) to ensure system health and performance
- Collaborate cross-functionally to translate complex requirements into deployable systems
- Solve engineering challenges within classified and constrained environments

What You Bring
- Strong experience with Kubernetes and cloud-native platform engineering
- Hands-on experience with containerization (Docker, Helm)
- Experience supporting data pipelines or ML-enabled systems
- Familiarity with Infrastructure as Code (Terraform, Ansible)
- Experience with CI/CD and DevSecOps practices
- Understanding of distributed systems and system integration
- Ability to operate effectively in secure or regulated environments

Why This Role Matters

In many environments, AI stops at experimentation; in this role, you ensure it becomes operational capability. You will:
- Enable AI/ML systems to move from development to deployment
- Support mission-critical operations tied to national security
- Work at the intersection of cloud-native engineering, DevSecOps, and AI infrastructure
- Build systems where failure is not an option and performance is essential

Your work directly impacts how advanced technology is applied in real-world scenarios.

What You’ll Gain
- Hands-on ownership of platforms powering AI/ML in mission environments
- Exposure to complex, high-scale distributed systems
- Experience integrating modern cloud-native technologies into secure, real-world systems
- Growth across platform engineering, DevSecOps, and AI infrastructure
- The opportunity to solve problems most engineers never encounter

About Rackner

Rackner is a software consultancy that builds cloud-native solutions for startups, enterprises, and the public sector. We are an energetic, growing consultancy focused on solving complex problems through distributed systems, DevSecOps, AI/ML, and modern systems architecture. We enable digital transformation by applying cloud-first, cost-effective innovation across mission-critical environments. Our customers span a diverse and growing set of industries, and our teams are driven by a shared focus on end-to-end system delivery and real-world impact.

Benefits & Perks

Rackner invests in your growth, stability, and long-term success:
- Paid certifications & professional development
- 401(k) with 100% match up to 6%
- Highly competitive PTO
- Comprehensive medical, dental, and vision coverage
- Life insurance + short- and long-term disability
- Home office & equipment plan
- Industry-leading weekly pay schedule

United States
Full Time | Remote | Team 501-1,000 | Since 2019 | H1B No Sponsor

• Own the design, governance, and continuous evolution of UniUni's AWS-based cloud platform
• Lead and define cloud architecture standards across all AWS services
• Own the end-to-end FinOps practice, including budgeting and forecasting
• Mandate and mature IaC-first practices across the organization
• Own the strategy, performance, and reliability of UniUni's multi-model database platform
• Establish and enforce cloud security baselines
• Lead, coach, and grow a team of platform and infrastructure engineers

United States