Job Closed
This listing is no longer active.
AI observability command center to detect and solve emerging quality issues.
Senior Infrastructure Engineer
Location: New York
Posted: 35 days ago
Salary: $190K - $210K / year
Seniority: Senior
Job Description
Axion
- Design, build, and maintain Axion's cloud infrastructure on GCP, ensuring high availability, scalability, and security
- Own and evolve our Kubernetes-based container orchestration platform and Terraform infrastructure-as-code practices
- Build and improve CI/CD pipelines, developer tooling, and deployment automation to help engineering teams ship with speed and confidence
- Establish and maintain observability across our stack (logging, metrics, alerting, and tracing) and lead incident response efforts
- Define and track SLOs and SLIs, driving a culture of reliability and continuous improvement
- Contribute to security hardening, compliance posture, and cost optimization across our cloud environment
Job Requirements
- 4+ years of experience in DevOps, SRE, or infrastructure engineering
- Deep hands-on experience with GCP (Cloud Run, GKE, Cloud SQL, Pub/Sub, or similar services)
- Strong proficiency with Kubernetes — cluster management, networking, scaling, and troubleshooting
- Solid Terraform experience — writing modular, reusable, production-grade infrastructure-as-code
- Experience building and maintaining CI/CD pipelines and developer tooling
- Strong background in observability tooling (Datadog, Prometheus, Grafana, or equivalent)
- Comfortable operating in fast-paced, ambiguous environments with a bias toward action
- Experience at a venture-backed startup preferred
- Must be based in the Eastern or Central time zones
Benefits
- Generous time off
- Competitive compensation, equity, and benefits
- Lunch stipend
- Work with cutting-edge AI technology making a tangible impact in manufacturing
- Collaborative, mission-driven team and supportive leadership
More Infrastructure Engineer Jobs
Job Details
Job Location: Work From Home - McLean, VA 22012
Position Type: Full Time
Salary Range: $128,000.00 - $145,000.00 per year

CORAS is a secure, cloud-native SaaS platform that enables government and defense organizations to manage risk, compliance, and security operations in highly regulated environments. We’re looking for a Cloud Infrastructure Engineer to help operate and evolve the infrastructure behind CORAS. In this role, you’ll work closely with senior engineers to support the reliability, security, and performance of a production system used by Federal customers.

This is a hands-on, mid-level opportunity for someone with strong Cloud and Linux fundamentals. You’ll contribute to day-to-day operations while also helping improve the platform’s scalability and resilience over time.

Key Responsibilities

Cloud & Infrastructure
- Support and maintain AWS infrastructure (EC2, VPC, IAM, storage, load balancing)
- Monitor system performance, troubleshoot issues, and improve reliability
- Assist with infrastructure changes and deployments in a production environment

Systems Administration
- Administer and maintain Linux-based systems (RHEL or similar)
- Support patching, updates, and system hardening best practices (STIGs)
- Assist with Windows Server and identity systems, as needed

Containers & Platform
- Work with containerized workloads (Docker, ECS/Fargate, or similar)
- Support application deployments and infrastructure operations

Security & Compliance
- Follow security best practices and support compliance requirements in a regulated environment
- Assist with vulnerability remediation and system monitoring
- Gain exposure to frameworks such as FedRAMP, NIST, and DoD security standards

Collaboration
- Partner with engineering and security teams to support a stable, secure platform
- Participate in on-call rotation for production support

Required Qualifications

Experience
- 3–6 years of experience in cloud infrastructure, systems engineering, and/or DevOps roles
- Experience working with AWS (Commercial or GovCloud)
- Experience supporting production systems or applications, preferably a SaaS offering

Technical Skills
- Strong Linux administration skills (RHEL or similar)
- Familiarity with core AWS services (compute, networking, storage)
- Basic experience with containers (Docker, ECS, Kubernetes, or similar)
- Understanding of networking fundamentals (VPCs, subnets, security groups)
- Exposure to monitoring, logging, or troubleshooting tools

Clearance & Compliance
- Must be eligible to obtain a DoD security clearance
- US Citizenship required

Nice to Have
- Experience in regulated environments (FedRAMP, DoD, healthcare, fintech, etc.)
- Familiarity with security practices (system hardening, vulnerability scanning)
- Experience with tools such as Splunk, Nessus, or similar
- Exposure to identity systems (Active Directory, SSO)
- Experience with databases (e.g., MongoDB or similar)
- AWS certifications or relevant technical certifications

Work Environment
This is a fully remote position. Candidates must be able to work standard US business hours and be available for on-call support as required by a production mission-critical environment. All work is performed within a DoD-compliant, FedRAMP High authorized AWS GovCloud environment. Candidates must be comfortable operating within the security constraints and change management processes of a government-facing platform.

Benefits
- Medical, Dental and Vision Coverage
- 401(k) Matching
- PTO
Infrastructure Engineer, Data & Automations
ElevenLabs
ElevenLabs is a young voice AI research and deployment company on a mission to make content universally accessible. Specifically, the company provides a text-to
- Owning the infrastructure underpinning our Data and Automations teams: setting up internal services, building and maintaining ETLs, and connecting systems with one another.
- Taking end-to-end ownership of platform reliability and security, with a particular focus on improving security across our internal systems.
- Collaborating closely with the Infrastructure team to bridge platform needs with infra capabilities.
- Partnering with Growth, Finance and other internal teams to ensure they have the data and tooling they need.
- Build and operate production-grade model serving infrastructure using frameworks such as vLLM, TGI, Triton, or equivalent
- Design and implement robust deployment pipelines with blue/green and canary rollout strategies for ML models
- Develop and maintain auto-scaling systems, multi-model serving architectures, and intelligent request routing layers
- Optimize GPU utilization, memory efficiency, network throughput, and model artifact storage performance
- Design observability systems for tracking inference latency, throughput, GPU usage, cost metrics, and system health
- Manage model registries and CI/CD pipelines enabling automated and reproducible model deployments
- Own the full lifecycle of ML systems from development through production, including operational support and on-call responsibilities
- Define engineering best practices and contribute to platform scalability in a fast-moving startup environment



