Job Closed

This listing is no longer active.

DistroKid

We're the easiest way for music creators to get music into Spotify, Apple Music, and all major streaming services.

Senior Systems Operations Engineer

DevOps Engineer | Other | Remote | Senior | Team 51-200 | Since 2013 | H1B No Sponsor

Location

All locations: United States | United Kingdom

Posted

71 days ago

Salary

$155K - $170K / year

Seniority

Senior

Job Description

Location: Remote (USA, Canada, United Kingdom, Europe)
Sponsorship: Not available. We cannot support visas, work permits, or extensions in any country (including OPT/CPT, PGWP, Graduate Route, or similar programs).
Salary: Varies by region; see details below.

Summary

DistroKid is the world’s largest distributor of music to Spotify, Apple Music, YouTube, and beyond. Most new music today is released through DistroKid.

We are seeking a highly skilled Senior Systems Operations Engineer with deep expertise in cloud infrastructure, Infrastructure-as-Code (IaC), and AI-enhanced operations. This role is a critical technical leadership position on the Systems Operations (SysOps) team, responsible for architecting and managing our cloud environment, driving IaC maturity, and integrating AI-powered practices that improve reliability, reduce toil, and scale our operational capabilities.

You will serve as a subject matter expert in infrastructure domains, own complex workstreams end-to-end, and partner strategically with peers, engineering teams, and leadership to deliver impactful outcomes across the organization.

This is a fully remote position, and success in the role depends on clear, open, and proactive communication to keep distributed teammates informed, aligned, and unblocked.

What You’ll Do

Cloud & Infrastructure Architecture

- Design, deploy, and manage scalable and highly available cloud infrastructure on AWS, with deep expertise in core services (EC2, EKS, S3, RDS, IAM, VPC, and beyond).
- Develop and maintain disaster recovery plans leveraging AWS capabilities for backup and replication to ensure business continuity.
- Collaborate with engineering and security teams to improve infrastructure health, security, and long-term scalability.

Infrastructure as Code (IaC)

- Design reusable Terraform/OpenTofu modules following DRY principles and organizational standards; implement module versioning and lifecycle strategies.
- Direct the migration of manual infrastructure to code; establish patterns and best practices for IaC adoption across the team.
- Implement IaC testing strategies, including validation, linting, and integration testing, using tools such as Terraform-Compliance or Checkov.
- Architect and maintain complex Bitbucket pipeline configurations for multi-environment IaC deployments; implement pipeline security best practices.

AI-Enhanced Operations (AIOps)

- Implement AIOps practices, leveraging AI tools to enhance monitoring, incident response, and predictive alerting.
- Use AI-assisted development and operations tools (e.g., Cursor, Claude) to accelerate troubleshooting, code review, and documentation generation.
- Evaluate and implement AI-powered automation to reduce operational toil, improve repeatability, and scale platform capabilities.

Reliability & Observability

- Define and implement SLOs for services; guide and/or participate in incident response and conduct blameless postmortems.
- Implement chaos engineering practices to proactively identify system weaknesses before they impact production.
- Build and maintain comprehensive monitoring solutions using tools such as CloudWatch and Datadog to track performance and drive optimization.

Automation, Developer Experience & Internal Developer Portal

- Develop automation scripts and tools in Python, Bash, or similar languages to streamline operations and eliminate manual toil.
- Build self-service capabilities for development teams to reduce cognitive load and enable developer autonomy across the organization.
- Guide the solution architecture and end-to-end implementation of DistroKid’s first Internal Developer Portal (IDP).
- Define the IDP roadmap and success criteria in partnership with engineering leadership; establish golden paths, service catalogs, and self-service workflows that reduce deployment friction and accelerate developer productivity.
- Drive adoption of the IDP across engineering teams; gather feedback, iterate on the platform, and measure impact through developer experience metrics and reduced time-to-deploy.

Cost Optimization

- Guide cost optimization initiatives; implement rightsizing recommendations, reserved-capacity strategies, and tagging standards for cost allocation.
- Monitor and optimize AWS resource usage; select appropriate services and configurations to meet performance requirements cost-effectively.

Technical Leadership & Collaboration

- Direct planning, decision-making, and execution for infrastructure projects; own workstreams end-to-end.
- Partner cross-functionally with engineering, security, and product teams; communicate impact in terms of company strategy and OKRs.
- Provide technical mentorship to junior and mid-level engineers; invest in team growth and foster a culture of continuous learning.
- Maintain and contribute to infrastructure documentation, runbooks, and architectural decision records to ensure knowledge sharing and operational consistency.

Qualifications

Education

- Bachelor’s degree in Computer Science, Information Technology, a related field, or equivalent practical experience.

Experience

- 5+ years of experience in systems operations, platform engineering, or DevOps with a focus on cloud infrastructure and containerized environments.
- Proven production experience with AWS services (EC2, EKS, S3, RDS, IAM, VPC, API Gateway, EventBridge, etc.) and Kubernetes.
- 5+ years of hands-on experience with Infrastructure as Code tools, specifically Terraform and/or OpenTofu, including module design, state management, remote backends, and IaC testing.

Technical Skills

- Strong knowledge of Linux/Unix administration, systems, and shell scripting.
- Proficiency in Python, Go, or similar programming languages.
- Experience with CI/CD pipelines for infrastructure deployments (Bitbucket Pipelines, Jenkins, or similar).
- Experience with monitoring and observability tools (Prometheus, Grafana, CloudWatch, or Datadog).
- Demonstrated experience implementing or working with AIOps tools, practices, or AI-assisted operations in a professional context.
- Experience using AI-assisted development tools (e.g., Cursor, Warp, Claude, or similar) to accelerate engineering work.

Soft Skills

- Strong communication skills with the ability to engage effectively across technical and non-technical audiences.
- Practices open, transparent, and proactive communication in a fully remote environment; defaults to over-communication to keep distributed teammates informed and aligned across time zones and async workflows.
- Demonstrated ability to guide and influence without formal authority.
- Excellent problem-solving skills with the composure to guide through incidents under pressure.
- Ability to work in a fast-paced, dynamic environment with shifting priorities while maintaining a high-quality bar.

Preferred Qualifications

- AWS Certified Solutions Architect, DevOps Engineer, or equivalent certification.
- Prior experience designing or implementing an Internal Developer Portal (IDP) using platforms such as Backstage, Port, Cortex, or equivalent.
- Experience with policy-as-code tools such as OPA, Checkov, or Sentinel.
- Experience with service mesh technologies (Istio, Linkerd, or similar).
- Familiarity with Docker and container orchestration tools beyond Kubernetes.

Salary Ranges

Each range below applies ONLY to candidates living in the stated region for this job. Rates may differ in other regions.

- USA salary range: $155,000 - $170,000 USD
- UK salary range: £100,000 - £120,000 GBP
- EU salary range: €55,000 - €110,000 EUR
- Canada salary range: $160,000 - $180,000 CAD

What We Offer

- Retirement plans (401k, SIPP, etc.)
- Health insurance
- Generous paid time off
- Parental leave
- Home office allowance
- Flexible work schedules
- Paid and discounted subscriptions
- Regular engagement activities

About DistroKid

DistroKid helps millions of independent artists get their music into streaming services and keep 100% of their earnings. We move fast, stay curious, and build tools that empower creativity. If you want your work to directly impact how artists share their music with the world, we’d love to hear from you.

DistroKid is an Equal Opportunity Employer

We are committed to building a diverse and inclusive team and strongly encourage applications from individuals of all backgrounds, identities, and experiences. We value a wide range of perspectives and believe that our differences make us stronger.

Job Requirements

  • Bachelor’s degree in Computer Science, Information Technology, or equivalent practical experience.
  • 5+ years of experience in systems operations, platform engineering, or DevOps.
  • Proven production experience with AWS services and Kubernetes.
  • 5+ years of hands-on experience with Infrastructure as Code tools, specifically Terraform and/or OpenTofu.
  • Strong knowledge of Linux/Unix administration and shell scripting.
  • Proficiency in Python, Go, or similar programming languages.
  • Experience with CI/CD pipelines for infrastructure deployments.
  • Experience with monitoring and observability tools.
  • Demonstrated experience implementing or working with AIOps tools.
  • Experience using AI-assisted development tools.
  • Strong communication skills with the ability to engage effectively across technical and non-technical audiences.
  • Practices open, transparent, and proactive communication in a fully remote environment.
  • Demonstrated ability to guide and influence without formal authority.
  • Excellent problem-solving skills with the composure to guide through incidents under pressure.
  • Ability to work in a fast-paced, dynamic environment with shifting priorities.

Benefits

  • Retirement plans (401k, SIPP, etc.)
  • Health insurance
  • Generous paid time off
  • Parental leave
  • Home office allowance
  • Flexible work schedules
  • Paid and discounted subscriptions
  • Regular engagement activities

More DevOps Engineer Jobs

Senior DevOps Engineer

Self Financial, Inc.

Build credit. Build savings. Build dreams.

DevOps Engineer | 71 days ago
Full Time | Remote | Team 51-200 | Since 2015 | H1B No Sponsor

  • Work closely with Developers to provide feedback and drive operational improvements within our production infrastructure
  • Develop infrastructure and network best practices
  • Participate in our on-call operations and monitoring pool
  • Document processes and procedures with the appropriate level of detail
  • Create, maintain, extend, and operate the SDLC platform used to build, test, package, and deploy our web and mobile services and applications
  • Measure and optimize system performance to push our capabilities forward, get ahead of customer needs, and innovate to continually improve
  • Provide primary operational support for large, distributed software applications
  • Production performance and uptime analysis

Texas
$105K - $156K / year
Job Closed
DevOps Engineer

Aura

Aura is a mission driven digital security company dedicated to creating a safer internet.

DevOps Engineer | 71 days ago
Full Time | Remote | Team 501-1,000 | H1B Sponsor

  • Create tools for use by all of Aura in multiple languages and paradigms (e.g., serverless, containers)
  • Investigate better ways to enable teams to deploy and manage networks and product deployments
  • Assist with deployments using Terraform and GitHub Actions
  • Debug cloud issues when development teams need assistance
  • Offer design assistance for teams trying new approaches; help offer vision and strategic optimization to ensure we move forward quickly and securely

Poland
Job Closed
Senior Full Stack Engineer – Platform, DevOps

Smart Working

Empowering companies to work with the best engineers in the world

DevOps Engineer | 71 days ago
Full Time | Remote | Team 51-200 | H1B No Sponsor

  • Deliver priority roadmap features across frontend and backend systems
  • Develop and maintain applications using Node.js, React, and TypeScript
  • Support and enhance NHS and third-party integrations
  • Contribute to scalable, reusable platform architecture
  • Maintain and optimise cloud infrastructure (AWS preferred)
  • Build, maintain, and improve CI/CD pipelines and release processes
  • Improve monitoring, logging, and system observability
  • Strengthen deployment reliability and platform stability
  • Implement and support security improvements and secure coding practices
  • Ensure alignment with compliance standards (e.g., GDPR, ISO27001)
  • Improve engineering quality and release discipline by reducing defect leakage, strengthening testing practices, and supporting test-first / automated testing approaches
  • Collaborate with Product and stakeholders to translate complex requirements into pragmatic technical solutions
  • Work across multiple concurrent workstreams and manage priorities effectively

Pakistan
DevOps Engineer – Google Cloud Platform, Terraform

Smart Working

Empowering companies to work with the best engineers in the world

DevOps Engineer | 71 days ago
Full Time | Remote | Team 51-200 | H1B No Sponsor

  • Design, implement, and maintain cloud-native infrastructure on Google Cloud Platform (GCP) using Terraform across multiple environments (production, staging, sandbox, and customer deployments).
  • Architect and operate serverless container workloads using Cloud Run, ensuring efficient scaling, resource management, and cost optimisation.
  • Design and manage event-driven systems using Pub/Sub, including message retention, acknowledgement deadlines, dead-letter queues (DLQ), and monitoring.
  • Build and maintain CI/CD pipelines using GitHub Actions and Cloud Build, including automated Terraform deployments and GitOps-based workflows.
  • Develop reusable Terraform modules and manage infrastructure across multiple GCP projects using best practices for remote state and environment separation.
  • Manage containerized workloads and cloud networking using services such as GKE, VPC, Load Balancers, Cloud Armor, IAM, and Secret Manager.
  • Collaborate with software engineers on architecture design decisions, including scaling strategies, service separation (HTTP vs WebSockets), and performance optimisation.
  • Implement monitoring, alerting, and observability using Google Cloud Monitoring, Cloud Logging, Sentry, and OpenTelemetry.
  • Administer and optimise data infrastructure, including MongoDB Atlas, Redis, BigQuery, and Cloud Storage.
  • Perform incident response and root cause analysis, implementing long-term improvements to increase reliability and resilience.
  • Own infrastructure end-to-end, including architecture decisions, performance optimisation, cost management, and operational excellence.
  • Create and maintain documentation, operational runbooks, and best practices.
  • Mentor engineers and promote DevOps and cloud architecture best practices across the organisation.

Pakistan