Job Closed

This listing is no longer active.

Axiomatic_AI

https://www.axiomatic-ai.com/

Senior Platform Engineer

Platform Engineer | Other | Remote | Senior | Team 11-50 | Since 2024 | H1B No Sponsor

Location

United States + 1 more

Posted

129 days ago

Seniority

Senior

Terraform, AWS, Azure, GCP, Kubernetes, Python, CI/CD, GitHub Actions, GitLab CI, Jenkins, Prometheus, Grafana, Datadog, Linux, Shell, Docker, Infrastructure as Code, Observability / Monitoring, Git

Job Description

This description is a summary of our understanding of the job description.

Role Description

As a Senior Platform Engineer at Axiomatic, you will own the reliability, deployment, and operational excellence of our AI platform. This role focuses primarily on infrastructure, CI/CD, and operations, with additional responsibilities for automation and tooling development.

- Lead deployment strategies and CI/CD pipelines across multiple environments
- Architect and maintain multi-cloud infrastructure (Azure, AWS, GCP) and on-premise deployments
- Own infrastructure as code, using Terraform to automate provisioning and configuration
- Build comprehensive observability systems: monitoring, metrics, logging, and alerting
- Implement security controls, compliance frameworks, and data governance policies
- Develop automation tools, APIs, and scripts (Python) to improve operational efficiency
- Ensure system reliability, performance, and scalability
- Drive incident response, postmortems, and continuous improvement
- Troubleshoot infrastructure and application issues across multiple environments

Company Description

Axiomatic AI is building a new class of AI systems designed to reason with the rigor of the scientific method. Our mission, 30×30, is to deliver a 30× improvement in the speed, accessibility, and cost of semiconductor and photonic hardware development by 2030.

Job Requirements

7+ years of experience in Platform Engineering, Site Reliability Engineering, DevOps, or Infrastructure Engineering roles
Deployment expert: Deep experience with CI/CD pipelines, release strategies, and production deployments at scale
Multi-cloud expertise: Hands-on experience with Azure and AWS required (GCP is a plus)
On-premise deployment experience: Linux system administration, bare-metal provisioning, networking
Terraform expert: Deep experience writing and maintaining infrastructure as code
Observability systems: Proven track record building monitoring, alerting, and metrics platforms
Security mindset: Experience implementing security controls and best practices. Security certification preferred (CISSP, CEH, AWS/Azure Security Specialty, or similar)
Data governance: Understanding of data privacy, residency requirements, and governance frameworks
Backend/scripting skills: Python (preferred) or Go for automation, tooling, and operational scripts
Kubernetes and container orchestration in production
Strong Linux/Unix administration and scripting (Bash, Python)
CI/CD platforms: GitHub Actions, GitLab CI, Jenkins, or similar
Version control and GitOps practices
Strong problem-solving and debugging skills
Fluent in English (Spanish is a plus)
Design and implement deployment pipelines for multi-environment releases (dev, staging, production)
Own the full deployment lifecycle: build, test, release, and rollback strategies
Implement blue-green deployments, canary releases, and progressive rollouts
Build automated deployment tooling and workflows
Ensure zero-downtime deployments and rollback capabilities
Optimize build and deployment performance
Manage artifact repositories and container registries
Design and operate multi-cloud infrastructure across Azure, AWS, and GCP
Architect and deploy on-premise solutions for enterprise customers (Linux-based)
Manage Kubernetes clusters, container orchestration, and networking
Implement disaster recovery, backup strategies, and business continuity
Optimize cloud costs and resource utilization
Define and track SLIs, SLOs, and error budgets for critical services
Write and maintain Terraform modules for infrastructure provisioning
Implement GitOps workflows for infrastructure changes
Automate infrastructure scaling, updates, and operations
Ensure reproducible and version-controlled infrastructure
Design comprehensive monitoring, logging, and alerting (Prometheus, Grafana, Datadog, or similar)
Build dashboards for system health, performance, and business metrics
Implement distributed tracing for microservices
Conduct capacity planning and performance analysis
Drive reliability improvements through data-driven insights
Implement security best practices: identity management, secrets management, network policies
Work towards or maintain security certifications (SOC 2, ISO 27001, or similar)
Conduct security audits and vulnerability remediation
Implement data governance policies for AI pipelines and user data
Ensure compliance with data privacy regulations (GDPR, CCPA)
Write automation scripts and tools in Python for operational tasks
Build internal tooling for deployments, monitoring, and incident response
Develop runbooks, automation, and self-healing systems
Create APIs for infrastructure operations when needed
Maintain high code quality and testing standards for tooling
Participate in on-call rotation and lead incident response
Conduct blameless postmortems and drive action items
Build and maintain incident response playbooks
Improve system resilience and the handling of failure modes
Partner with engineering teams on deployment strategies and architecture
Work with security team on compliance and governance
Mentor engineers on operational best practices
Document systems, procedures, and runbooks

Benefits

Opportunity to work on technology that drives innovation in AI for scientific and engineering applications
Contribute to the development of new AI architectures that can reason coherently and produce interpretable and verifiable solutions
Collaborate with a global team of engineers and AI specialists
Flexible working arrangements, including hybrid or fully remote options


More Platform Engineer Jobs

Platform Engineer

Curri

Transforming the way construction and industrial supplies are delivered.

Platform Engineer | 129 days ago

Other | Remote | Team 51-200 | Since 2018 | H1B No Sponsor

• Build and maintain CI/CD pipelines and deployment automation using RWX, focusing on reliability, speed, and cost efficiency. • Manage and evolve AWS infrastructure (Aurora, ElastiCache, VPC, IAM, EC2, Secrets Manager) using Infrastructure as Code with Pulumi. • Operate, debug, and scale Kubernetes workloads in production environments. • Improve developer experience by reducing build times, enhancing tooling, and creating self-service capabilities for engineering teams. • Support and optimize the TypeScript monorepo build infrastructure and related tooling. • Collaborate closely with product engineers on debugging, system design, and performance optimization. • Participate in the on-call rotation (Tuesday-to-Tuesday) and support incident response without burnout-driven expectations.

AWS, Amazon EC2, Kubernetes, Terraform, TypeScript

California

Job Closed

Senior ML Platform Engineer – ML Platforms, MLOps

Software Mind

Software House focused on results since 1999

Platform Engineer | 130 days ago

Full Time | Remote | Team 1,001-5,000 | Since 1999 | H1B No Sponsor

• Support and contribute hands-on to multiple ML platform POCs • Work closely with Applied Scientists, ML Engineers, and internal platform teams • Evaluate platform capabilities across GPU training and experimentation; real-time and batch inference; orchestration, monitoring, and operability; and multi-tenancy, isolation, and scalability • Assess integration points with existing in-house tooling • Perform performance and operability analysis • Contribute technical input to build-vs-buy-vs-extend decisions, target platform stack recommendations, and OPEX and CAPEX justification for rollout

Docker, Apache Kafka, Kubernetes, Apache Spark

Mali

Job Closed

Senior Platform Engineer

Pondurance

Delivering personalized, 24/7 MDR services that grow with your organization.

Platform Engineer | 130 days ago

Other | Remote | Team 51-200 | H1B No Sponsor

• Support and enhance Pondurance’s MDR Data Pipeline • Focus on enhancing the durability and reliability of HashiCorp-based, containerized Vector endpoints and the systems that deploy them • Act as a subject-matter expert for the GitHub Actions, Terraform, and Nomad-based CI/CD pipeline • Own and maintain self-service deployment mechanisms for the platform frontend • Manage system and application configuration using Salt and HashiCorp tooling • Oversee and maintain Docker-based systems • Implement and maintain monitoring, alerting, and dataflow solutions using Datadog • Participate in on-call rotations and root cause analysis (RCA) • Collaborate with cross-functional teams on reliability, scalability, and continuous improvement initiatives • Document processes, procedures, and troubleshooting steps to support knowledge sharing and team efficiency

Docker, Linux, Python, SaltStack, Terraform

Virginia

$160K / year

Job Closed

Marketing Platform Engineer

2Brains

At 2Brains, we integrate strategy, design, and technology to empower companies and technology disruptors.

Platform Engineer | 130 days ago

Contract | Remote | Team 201-500 | H1B No Sponsor

• Design, develop, and evolve Odoo as the central platform for growth, marketing, analytics, and sales. • Eliminate dependence on external tools and enable a complete, end-to-end view of the funnel. • Combine technical architecture, advanced analytics, and a deep understanding of the business.

JavaScript, PostgreSQL, Python

Chile

Job Closed
