Job Closed

This listing is no longer active.

DevOps/SRE Team Lead

DevOps Engineer | Full Time | Remote | Lead | Team 501-1,000 | Since 1998 | H1B Sponsor

Location

United States

Posted

33 days ago

Salary

0

Seniority

Lead

Job Description


Telestream

Role Description

We are seeking a DevOps/SRE Team Lead with proven, hands-on Kubernetes expertise to drive the reliability and scalability of our video processing infrastructure and oversee a small team of SREs and DevOps Engineers. This is a deeply technical lead role, requiring real-world experience administering production Kubernetes clusters, not theoretical familiarity. You will own CI/CD pipelines, infrastructure automation, and cloud platform operations in a fully remote environment where independent execution is essential.

You will spend 70-80% of your day being hands-on in the following areas:

- Design, deploy, and administer production Kubernetes clusters, including workload scheduling, namespace management, RBAC, network policies, and cluster upgrades.
- Design and maintain continuous integration/deployment pipelines to automate testing and deployment, including Kubernetes-native delivery workflows using Helm and ArgoCD or equivalent.
- Track software performance, fix errors, troubleshoot systems, and implement preventative measures to ensure smooth workflows.
- Implement and manage infrastructure.
- Utilize Terraform or CloudFormation for IaC management.
- Optimize cloud resources by implementing cost-effective solutions.
- Collaborate with various teams to ensure smooth deployment.
- Monitor and create new processes based on performance analysis.
- Implement security best practices, including automated compliance checks and secure code deployment.

You will spend 20-30% of your time managing the following areas:

- Manage the technical roadmap and architecture while mentoring SRE and DevOps Engineers (Player/Coach).
- Hire, coach, and manage a team of DevOps engineers and Site Reliability Engineers.
- Strong communication, conflict resolution, and the ability to influence without authority.
- Define DevOps/Platform roadmap aligned with business goals (e.g., cloud cost optimization, automation maturity).
- Excellent communication and collaboration skills.

Qualifications

- Bachelor’s degree in Computer Science, Engineering, or equivalent.
- 5-8+ years of experience in DevOps/SRE, with 2-3+ years in a leadership role.
- Hands-on experience building and maintaining CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, or equivalent) with direct integration into Kubernetes deployment workflows.
- Production-level experience with infrastructure as code (Terraform required; CloudFormation or Pulumi a plus), including managing cloud-hosted Kubernetes clusters (EKS, GKE, or AKS).
- Experience with monitoring, logging, and observability tooling in Kubernetes environments (Prometheus, Grafana, Datadog, ELK/EFK stack, or equivalent); ability to build dashboards and alerts from scratch, not just consume existing ones.
- Demonstrated, hands-on Kubernetes experience in production environments: cluster administration, Helm chart authoring and management, RBAC configuration, persistent storage, horizontal/vertical pod autoscaling, and diagnosing and resolving real production failures (CrashLoopBackOff, OOMKilled, networking issues, etc.).
- Strong troubleshooting skills with the ability to diagnose infrastructure and application issues live, under pressure, without reference materials; this is evaluated directly in our interview process.
- Proficiency in scripting languages (Python, Go, Bash, or PowerShell); ability to write and own automation scripts, not just modify existing ones.

Benefits

- Day-one medical, dental & vision coverage.
- 100% company-paid life + disability insurance.
- 401(k) with a sweet company match (up to 8%).
- Quarterly HSA boosts & flexible spending accounts.
- Flexible time off (salaried) or PTO (hourly) + generous paid holidays.
- Pet insurance (yes, your dog gets benefits too).
- Legal plan + extras like accident & critical illness coverage.

More DevOps Engineer Jobs


Senior Infrastructure/SRE Analyst

LWSA

Integrating solutions & driving business

DevOps Engineer | 33 days ago
Full Time | Remote | Team 1,001-5,000 | Since 1998 | H1B No Sponsor

• Ensure the stability, availability, and performance of production environments, focusing on automation and reducing manual interventions (toil);
• Work across the service lifecycle, from design and deployment to monitoring and continuous improvement;
• Implement and evolve infrastructure as code (IaC) and CI/CD pipelines;
• Manage and optimize critical services, including Kubernetes/ECS clusters and database layers;
• Develop and maintain observability strategies (logs, metrics, and APM) to support root cause analysis and troubleshooting;
• Ensure security routines, backup policies, disaster recovery, and cloud cost management;
• Collaborate with development teams to improve software architecture and delivery processes;
• Document technical procedures and promote infrastructure and reliability best practices across engineering;
• Proactively participate in continuous improvement analyses and the architecture of highly available systems.

Brazil
Job Closed

DevSecOps Architect

A.C.Coy Company

Staffing and consulting firm specializing in IT, Accounting & Finance, Engineering and Sales placements.

DevOps Engineer | 33 days ago
Contract | Remote | Team 51-200 | Since 1986 | H1B No Sponsor

• Lead the evolution of the software delivery lifecycle by embedding security into every stage of the CI/CD pipeline
• Architect and maintain automated CI/CD pipelines that utilize AI/ML models for static and dynamic analysis (SAST/DAST) to identify complex vulnerabilities that traditional rule-based tools miss
• Design security frameworks for the end-to-end AI lifecycle, including securing data ingestion, protecting model weights, and implementing 'Guardrail' architectures for Large Language Models (LLMs)
• Develop AI-driven orchestration (SOAR) to automate the triage and remediation of security findings, reducing manual overhead for engineering teams
• Implement enterprise-wide governance using tools like Open Policy Agent (OPA) to enforce security compliance automatically across multi-cloud environments
• Conduct advanced threat modeling for cloud-native applications, specifically accounting for AI-specific attack vectors like model inversion or data poisoning
• Create self-service security tools and 'Golden Paths' that allow developers to deploy securely without friction, fostering a proactive security culture
• Achieve 90% automated security coverage across all production-bound code
• Utilize AI to reduce vulnerability remediation time by 40% within the first year

Virginia

Junior DevOps Engineer

Decisive Point Consulting

DPCG is an Equal Opportunity Employer committed to hiring and developing the most qualified individuals based on merit, experience, and business needs, without regard to any protected status under applicable law.

DevOps Engineer | 33 days ago

Role Description

As a Junior DevOps Engineer on our team, you’ll use your experience to streamline our software development life cycle from requirements to monitoring in production. You’ll incorporate open-source tools, automation, and cloud resources to cut down on tedious, boring tasks and free up the teams to do what they do best: innovate. You’ll implement continuous integration and delivery to limit manual testing and troubleshooting. This is an opportunity to broaden your skillset into areas like automation, cloud-based development, and open-source tools. This role may require team members to provide after-hours support for deployments. These are pre-scheduled, but may not occur at a standard cadence. Position is 100% remote, Monday - Friday, 8AM EST to 4:30PM EST.

Qualifications

- For the DevOps Engineer, Junior: three (3) years of experience is required.
- For the System Engineer, Junior: one (1) year of experience is required.
- Experience with Linux systems engineering efforts in system design and evaluation, solution engineering, software development, or system administration.
- Experience as an administrator for one of the following container platforms: Docker, OpenShift, Kubernetes, and/or EKS.
- Experience with advanced scripting languages, including how to write infrastructure as code and the development of custom scripts to automate capabilities including the use of Ansible, Python, Bash, and Terraform.
- Experience with a complex build system.
- Knowledge of Agile methodologies or the software development life cycle (SDLC).

Requirements

- Experience with developing back-end software applications.
- Experience with leveraging cloud service providers, including AWS and Azure, to create automated DevSecOps pipelines.
- Experience with configuring an OpenSearch cluster and managing index lifecycle.
- Knowledge of security scanning tools, configuring their use, and working with teams to fix vulnerabilities.
- Knowledge of Java.

United States
$60K - $80K / year

Site Reliability Engineer I

Backblaze

Backblaze is the cloud storage innovator delivering a modern alternative to traditional cloud providers.

DevOps Engineer | 33 days ago
Full Time | Remote | Team 201-500 | Since 2007 | H1B Sponsor

• Act as first point of contact for all customer-affecting issues.
• Be a key driver for managing the resolution of technical problems.
• Ensure that incident management processes are followed and that incident post-mortems are completed to capture process deviations and areas for improvement.
• Deliver consistent communication to management.
• Respond to Zabbix alerts and regularly monitor Zabbix, either by taking direct action on alerts or escalating. Acknowledge every alert, noting either the direct action taken or the escalation point of contact.
• Make sure escalations are handed off successfully.
• Ensure health of pods across all sites (define pod alerts on Zabbix).
• Work through daily filesystem checks for pods.
• Troubleshoot technical issues for DC Techs: advanced pod questions, deployment questions, migration troubleshooting, and Ansible playbook issues.
• Identify and escalate any potential issues regarding the network.
• Vault pre-deployment configuration and testing.
• Start Vault migrations, monitor migration pods, and handle applicable migration pod health checks.
• Document and work on automating daily items.
• Document and provide network IPs for upcoming deployments.
• Monitor releases and updates to the server farm, and escalate issues as they arise.
• Engage in on-call rotation shifts.
• Assist fellow TechOps team members in handling tasks.
• Make recommendations for improvements in organizational productivity.
• Be able to work outside of normal business hours (weekend shifts, holidays, and evenings) as needed.

India
$66K - $88K / year