Job Closed

This listing is no longer active.

Intetics

Where software concepts come alive™

1058 | SRE / DevOps / Infrastructure Engineer

DevOps Engineer | Other | Remote | Mid Level | Team 501-1,000 | Since 1995 | H1B No Sponsor

Location: Poland

Posted: 73 days ago

Salary: not specified

Seniority: Mid Level

Language: English

Job Description

Intetics Inc., a global technology company providing custom software application development, distributed professional teams, software product quality assessment, and “all-things-digital” solutions, is seeking a highly skilled and experienced Senior DevOps Engineer to join our dynamic team on a full-time basis.

About the Project

A fast-growing tech company is building an infrastructure layer for modern AI workloads: a globally distributed platform that provides scalable, cost-efficient, and reliable access to GPU computing resources. The platform enables customers to run production-level inference workloads across a diverse network of providers, offering flexibility, performance, and resilience required for real-world AI applications.

Since its launch, the company has demonstrated strong traction, securing a significant Series A investment and achieving multi-million ARR within its first year of operation. As both customer demand and platform scale continue to expand, the team is actively growing its infrastructure capabilities to support the next stage of development.

About the Role

We are looking for a strong SRE / DevOps / Infrastructure Engineer to help scale and operate a distributed AI-focused infrastructure platform.

The system combines a cloud-based control layer (running on AWS, including EKS and managed MySQL) with a large fleet of GPU-powered nodes distributed across multiple external providers. These components are connected via a custom networking layer to ensure high availability and performance for production workloads. Workloads are orchestrated with Kubernetes, while observability is built around Prometheus, Grafana, Loki, Jaeger, and OpenTelemetry, covering metrics, logging, and tracing across the platform.

While the control layer is relatively lightweight and cloud-native, the GPU infrastructure introduces additional complexity. It spans different providers and environments, often resembling distributed on-premise setups rather than standard cloud infrastructure, requiring a deeper understanding of networking, reliability, and systems behavior at scale.

This is a hands-on role focused on solving real infrastructure challenges across Kubernetes, networking, observability, and production operations. You will join a small, high-impact infrastructure team (currently a couple of engineers) that is actively growing as the platform and customer base continue to expand. The goal is to strengthen the core infrastructure early and support further scaling.

What you’ll do

- Build, operate, and improve the infrastructure powering Parasail’s distributed inference platform
- Own reliability, scalability, and operational excellence across AWS-based control planes and our multi-provider GPU fleet
- Design and maintain the networking layer connecting control planes, Kubernetes clusters, and geographically distributed GPU hosts
- Operate and improve Kubernetes-based inference orchestration, primarily on EKS
- Manage deployments and infrastructure changes using Helm, FluxCD, and Terraform
- Improve observability across the platform using metrics, logs, traces, dashboards, and alerting built on Prometheus, Grafana, Loki, Jaeger, and OpenTelemetry
- Tune alerts, improve runbooks, and strengthen operational readiness as the system scales
- Respond to production issues, perform root cause analysis, and implement durable fixes
- Work closely with engineers across time zones using clear asynchronous communication and handoff practices, especially through Slack
- Help expand Europe-based infrastructure coverage to support sustainable operations outside US business hours

Job Requirements

5+ years of experience in SRE, DevOps, platform engineering, or infrastructure engineering
Strong production experience with networking and Kubernetes
Experience operating AWS infrastructure in production, especially EKS
Strong hands-on experience managing Linux hosts, clusters, and distributed systems in environments that are not fully abstracted by a major cloud provider
Experience with Prometheus, Grafana, Loki, Jaeger, and OpenTelemetry
Experience with deployment and GitOps workflows using tools such as Helm and FluxCD
Experience with infrastructure as code, ideally Terraform
Familiarity with alert tuning, runbook development, and practical incident management in production systems
Strong operational judgment: able to troubleshoot independently, respond calmly to incidents, and improve systems without constant direction
Comfortable working in a fast-moving startup where infrastructure, product, and customer demands are changing quickly
Clear communicator who can work effectively in an async environment and handle shift handoffs cleanly

Nice to have

Experience with AI inference, ML infrastructure, or adjacent high-performance distributed systems
Experience operating heterogeneous GPU fleets, bare-metal infrastructure, or multi-provider compute environments
Experience using AI tools productively in engineering workflows
