Job Closed

This listing is no longer active.

AST SpaceMobile

Transforming how the world connects

AWS DevOps Engineer

DevOps Engineer | Other | Remote | Senior | Team 51-200 | Since 2017 | H1B No Sponsor

Location

United States

Posted

114 days ago

Salary

Not disclosed

Seniority

Senior

Job Description

  • Design, deploy, and operate AWS infrastructure supporting data lakes and containerized workloads.
  • Implement Infrastructure-as-Code using Terraform, CloudFormation, or similar tools.
  • Establish secure, scalable, and highly available AWS architectures following cloud best practices.
  • Collaborate with application and data engineering teams to translate requirements into reliable platform solutions.
  • Build and manage AWS-based data lakes using services such as S3, Glue, Athena, EMR, Redshift, and Lake Formation.
  • Support ingestion, transformation, storage, and access for structured and unstructured datasets.
  • Implement data lifecycle management, tiering, and cost optimization strategies.
  • Ensure data platforms meet required security, compliance, and governance standards.
  • Deploy, manage, and operate containerized applications on Amazon EKS.
  • Build and maintain container images, registries, and deployment pipelines.
  • Manage Kubernetes clusters including upgrades, scaling, networking, and security.
  • Partner with developers to improve application reliability, performance, and deployment consistency.
  • Design, implement, and maintain CI/CD pipelines for data and application workloads.
  • Automate infrastructure provisioning, application deployment, and operational tasks.
  • Implement monitoring, logging, and alerting for AWS services, data pipelines, and Kubernetes workloads.
  • Participate in incident response, root cause analysis, and continuous improvement initiatives.

Job Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related STEM field, or equivalent practical experience.
  • At least 5 years of experience in DevOps, Site Reliability Engineering (SRE), or cloud infrastructure roles focused on AWS.
  • Strong hands-on experience with AWS data services (S3, Glue, Athena, EMR, Redshift, or similar).
  • Experience running containerized applications, ideally on Amazon EKS.
  • Proficiency with Infrastructure-as-Code and automation tools such as Terraform or CloudFormation.
  • Strong Linux systems knowledge and scripting experience (Bash, Python, etc.).
  • Solid understanding of AWS networking, security, IAM, and operational best practices.
  • Experience with large-scale or enterprise data platforms.
  • Experience operating Kubernetes in production environments.
  • Familiarity with distributed systems and API-based integrations.
  • Experience with monitoring, observability, and logging tools (e.g., CloudWatch, Prometheus, Grafana, ELK).
  • Experience designing cost‑optimized cloud architectures.
  • Knowledge of data engineering workflows or analytics platforms.
  • Strong interpersonal skills and ability to collaborate across technical and non‑technical teams.
  • Proven ability to work effectively within cross-functional, fast‑paced environments.
  • Excellent written and verbal communication skills.
  • Meticulous attention to detail to ensure accuracy of documentation, automation scripts, and deployments.
  • Strong problem-solving abilities and willingness to take ownership of issues.
  • Ability to manage multiple priorities while maintaining high quality and reliability standards.
  • Ability to participate in on-call rotations or off-hours support as required by operational needs.

More DevOps Engineer Jobs

Senior Software Engineer – SRE

Veeva

Headquartered in Pleasanton, California, Veeva is a leading provider of cloud-based software and services for the life sciences industry. As an employer, Veeva

DevOps Engineer | 114 days ago

  • Build Cloud Infrastructure: Rapidly build new cloud infrastructure from scratch, adhering to software development best practices.
  • Drive Reliability & Scalability: Ensure our platform meets the scalability and reliability needs of our hundreds of global customers (across North America, Europe, and Asia).
  • Lead Incident Management: During an incident, effectively lead triage and mitigation efforts, potentially performing periodic on-call duty for escalations.
  • Automate & Optimize: Develop tools and automation to eliminate manual work and reduce issue resolution times.
  • Full-Stack Diagnostics: Proactively learn all necessary systems to provide full-stack diagnostics and determine root causes of production problems.
  • Strategic Engineering Partnership: Strategize with engineering teams on complex problems, offering insights on what will work at scale (supporting 2M+ users) and guiding development decisions before features ship.
  • Influence Design: Participate in engineering design reviews of new features and drive initiatives to improve operational efficiency and platform scalability.
  • Cross-functional Collaboration: Partner effectively with Product Management, Design, and QA to deliver cutting-edge solutions and direct customer value.
  • Backend Focus: Work across multiple layers of our technology stack, with a primary focus on backend development, and opportunities in frontend and infrastructure.
  • Effective Communication: Communicate clearly with engineering teams, succinctly describing problems for seamless hand-offs during outages with both technical and non-technical audiences.
  • Mentorship: Actively mentor team members, contributing to a positive and high-performing team environment.

Ireland
Germany

Director, Software Engineering – Site Reliability Engineering

Affirm

Affirm is a financial services company that is on a mission to provide its customers with “honest financial products that improve lives.” As an employer, Af

DevOps Engineer | 114 days ago
Other | Remote | Team 2,200 | Since 2012

  • Set the vision and drive execution for Reliability Engineering at Affirm.
  • Own and coordinate delivery of high availability for Affirm’s core services, to meet our service level standards and expectations with external partners.
  • Iterate on and maintain a best-in-industry global incident response and lifecycle program.
  • Build software and program management structure to perform continual risk management across the entire Affirm system and Engineering organization.
  • Run a robust development lifecycle, establishing a culture of operational excellence while experimenting and failing fast.
  • Work with a wide variety of cross-functional partners outside of engineering, ranging from product, enterprise risk, security, legal, and compliance.
  • Hire and build a global team of SREs, system engineers, and full-stack engineers.
  • Cultivate a respectful and supportive environment for all team members that effectively demonstrates the diversity of the team.

United States
$267K - $360K / year
Job Closed
Full Time | Remote | Team 51-200 | Since 2019 | H1B No Sponsor

  • Serve as the point of contact for the reliability and performance of data platforms on Google Cloud Platform (GCP).
  • Implement DataOps and SRE practices, ensuring observability, automation, scalability, and resilience in data pipelines.
  • Monitor and optimize data ingestion, transformation, and orchestration jobs (Airflow, Composer, Dataflow, Dataproc, BigQuery).
  • Support data engineers and architects in implementing best practices for CI/CD, infrastructure as code, and version control.
  • Create and maintain monitoring dashboards, alerts, and performance and cost metrics on GCP (Stackdriver, Cloud Monitoring, Prometheus, Grafana).
  • Respond to critical data and infrastructure incidents, with rapid resolution and root cause analysis (RCA).
  • Work together with the security and architecture teams to ensure data governance, compliance, and protection.
  • Promote a culture of automation and continuous improvement, reducing rework and increasing operational efficiency.

Brazil
Job Closed