Director, SRE

DevOps Engineer | Full Time | Remote | Lead | Team 51-200

Location

United States

Posted

58 days ago

Salary

$180K - $215K / year

Seniority

Lead

Job Description

Cast & Crew

About Us

At Cast & Crew, we’ve empowered creativity and supported the global entertainment industry for decades. Together with our family of brands (Backstage, CAPS, Checks & Balances, Final Draft, Media Services, Sargent-Disc, and The TEAM Companies), we operate as a combined entertainment technology and services provider offering industry-standard screenwriting and accounting software, digital payroll products, data & reporting, and a host of creative tools. The industry continues to move faster than ever, and the need for our expertise, our technology, and our people has never been greater. We are a production’s best ally every step of the way.

#OneCastOneCrew

The Director of SRE is a senior leadership role responsible for the reliability, scalability, and operational excellence of a large, multi-discipline engineering organization. You will own the platform and DevOps engineering functions, building and leading the teams, tools, and practices that allow software engineering, data engineering, and QA teams to ship confidently and run sustainably.

You will report directly to the VP of Engineering and partner closely with software and data engineering leadership to define and deliver on reliability and platform strategy. This is both a hands-on leadership role and a strategic one: you will be as comfortable driving organizational design conversations as you are reviewing incident post-mortems or evaluating a new observability toolchain.

Key Responsibilities

Platform & DevOps Engineering

- Own the platform engineering roadmap: CI/CD pipelines, container orchestration (EKS/Kubernetes), secrets management, and infrastructure-as-code standards across the org.
- Drive standardization and adoption of DevOps best practices, including YAML pipeline conventions, Dockerfile standards, and deployment patterns.
- Partner with software and data engineering teams to reduce toil, improve deployment frequency, and reduce time-to-restore.
- Oversee the engineering organization's observability strategy, including tooling (New Relic) and alerting integration (PagerDuty, Microsoft Teams).

People Leadership & Org Building

- Lead, mentor, and grow a team of SRE and DevOps engineers, establishing clear career paths, setting high standards, and fostering a culture of ownership and psychological safety.
- Define team topology for the SRE and platform functions, including team scope, interfaces with stream-aligned teams, and on-call responsibilities.
- Build hiring plans and execute on them, owning the full cycle from role definition through onboarding.
- Serve as an organizational model for blameless culture, continuous improvement, and cross-functional collaboration.

Incident Management & Reliability

- Own the incident management process end-to-end: severity classification, on-call rotation design, escalation paths, post-incident reviews, and tooling integration.
- Drive down MTTR and raise MTBF through systematic root cause analysis, reliability investments, and proactive capacity planning.
- Champion SLO/SLI/error budget practices across engineering teams.

Strategy & Governance

- Contribute to engineering-wide standards and documentation, including coding standards, CI/CD expectations, and operational runbooks in Confluence and Azure DevOps.
- Act as a key voice in architectural decisions with reliability, scalability, or operational implications.
- Stay current on industry trends and lead evaluation of emerging tools and practices.

What We're Looking For

Required

- 8+ years in SRE, DevOps, or platform engineering, with at least 3 years in a senior leadership role managing managers or senior ICs.
- Demonstrated experience leading platform or DevOps engineering teams in a large, multi-team engineering organization (100+ engineers).
- Deep hands-on background in CI/CD, container-based infrastructure, cloud platforms (Azure preferred), and observability tooling.
- Experience defining and scaling on-call programs, incident management processes, and reliability practices.
- Strong communication skills: able to translate technical complexity for senior stakeholders and drive alignment across engineering leadership.
- Track record of building high-performing teams through hiring, mentoring, and clear goal-setting.

Nice to Have

- Familiarity with Azure DevOps pipelines and EKS-based deployment patterns.
- Experience with Team Topologies principles or similar frameworks for team structure design.
- Background in highly regulated or enterprise environments.
- Exposure to feature flag management (e.g., Unleash) and progressive delivery strategies.

Why This Role

- You will own something that matters: building the reliability and platform foundation for a large, actively modernizing engineering organization.
- The org is in a meaningful growth and standardization phase, giving you real opportunity to shape culture, process, and tooling.
- You will have a direct reporting relationship to engineering leadership with organizational trust and authority to make decisions.
- Collaborative, transparent engineering culture with investment in documentation, standards, and continuous improvement.

Benefits

Cast & Crew provides a comprehensive package of employee benefits including: Medical, Dental, Vision, PTO, health and wellness programs, employee discounts, and more!

Note: Cast & Crew benefits are subject to eligibility requirements.

Cast & Crew is an equal opportunity employer committed to hiring a diverse workforce and sustaining an inclusive culture. It is our policy to provide equal employment opportunities to all individuals based on job-related qualifications and ability to perform a job, without regard to age, gender, gender identity, sexual orientation, race, color, religion, creed, national origin, disability, genetic information, veteran status, citizenship or marital status, and to maintain a non-discriminatory environment free from intimidation, harassment or bias based upon these grounds.

CA residents: Your personal information may be collected in connection with certain services provided by Cast & Crew or its affiliated companies. A summary of your California privacy rights can be found at: https://www.castandcrew.com/privacy-policy/

Compensation is commensurate with various factors including, but not limited to, relevant experience, qualifications, skills, training, licensure, certifications, geographic cost of labor, and other business and organizational needs. Compensation range for candidates in other locations may differ based on the cost of labor in that location. The compensation range for this position is: $180,000.00 - $215,000.00 per year.

More DevOps Engineer Jobs

Full Time | Remote | Team 5,001-10,000 | Since 1995 | H1B No Sponsor

We are tech transformation specialists, uniting human expertise with AI to create scalable tech solutions. With over 8,000 CI&Ters around the world, we’ve built partnerships with more than 1,000 clients during our 30 years of history. Artificial Intelligence is our reality.

How you’ll make an impact:

Are you an experienced Senior/Specialist DevOps Engineer with a passion for platform architecture and solution design? Do you thrive in environments where you can drive innovation and shape the future of infrastructure? Join our team and play a key role in designing, building, and optimizing cutting-edge solutions that enhance our platform’s scalability, reliability, and efficiency. As a Senior DevOps Engineer, you'll work closely with cross-functional teams to deliver robust infrastructure designs, perform PoCs, and ensure our DevOps practices are aligned with the latest industry trends.

What will you be doing?

- Collaborate with software architects and development teams to define infrastructure requirements and design comprehensive platform solutions.
- Lead the design, implementation, and optimization of CI/CD pipelines to streamline software development, testing, and deployment processes.
- Architect and manage Infrastructure as Code (IaC) using tools such as Terraform or CloudFormation, enabling scalable and reproducible infrastructure management.
- Conduct PoCs to evaluate new tools, technologies, and methodologies, assessing their potential impact on the platform and operations.
- Monitor and enhance the performance, reliability, and scalability of systems, ensuring high availability across production and development environments.
- Troubleshoot and resolve complex issues across infrastructure, deployments, and applications, implementing robust solutions to improve system stability.
- Integrate security best practices into the architecture and deployment processes, ensuring compliance with industry standards and regulations.
- Mentor team members on advanced DevOps practices and contribute to establishing a culture of continuous improvement and operational excellence.

Requirements:

- Fluency in English for daily communication with our client (please submit your resume in English).
- Extensive experience as a DevOps Engineer with a focus on platform architecture and designing scalable infrastructure solutions.
- Proficiency in building and optimizing CI/CD pipelines using tools like Jenkins, Azure DevOps, or CircleCI, with an emphasis on automation and efficiency.
- Strong scripting and automation skills (Python, Bash, or similar), with the ability to create scalable solutions and streamline operations.
- Hands-on experience with containerization and orchestration tools (e.g., Docker, Kubernetes), including production-grade deployments.
- Deep knowledge of cloud platforms such as AWS, Azure, or Google Cloud, with expertise in infrastructure provisioning and management.
- Strong understanding of Infrastructure as Code (IaC) principles and experience with relevant tools (Terraform, CloudFormation).
- Experience in performing PoCs and assessing new tools and technologies to enhance infrastructure and operations.
- Security-focused mindset with a track record of implementing best practices for securing cloud-based and on-premise environments.
- Excellent communication skills, with the ability to clearly articulate technical concepts and collaborate effectively across teams.

You will stand out if you have:

- Relevant certifications such as Azure Certified DevOps Engineer, Certified Kubernetes Administrator (CKA), or other cloud and DevOps-related certifications.
- Previous experience in roles involving software development or Site Reliability Engineering (SRE).
- Proven experience with Azure, including designing and managing infrastructure, implementing DevOps practices, and leveraging Azure-specific services.
- A history of designing or contributing to platform-level architecture and infrastructure strategies.

If you like it, just apply and good luck!

#LI-JM2

Our benefits:

- Health and dental insurance
- Meal and food allowance
- Childcare assistance
- Extended paternity leave
- Partnership with gyms and health and wellness professionals via Wellhub (Gympass) TotalPass
- Profit Sharing and Results Participation (PLR)
- Life insurance
- Continuous learning platform (CI&T University)
- Discount club
- Free online platform dedicated to physical, mental, and overall well-being
- Pregnancy and responsible parenting course
- Partnerships with online learning platforms
- Language learning platform

And many more! More details about our benefits here: https://ciandt.com/br/pt-br/carreiras

At CI&T, inclusion starts at the first contact. If you are a person with a disability, it is important to present your assessment during the selection process. See which data needs to be included in the report by clicking here. This way, we can ensure the support and accommodations that you deserve. If you do not yet have the assessment, don't worry: we can support you in obtaining it. We have a dedicated Health and Well-being team, inclusion specialists, and affinity groups who will be with you at every stage. Count on us to make this journey side by side.

Brazil

Senior DevOps Engineer

Upstart

Our mission is to enable effortless credit based on true risk.

DevOps Engineer | 58 days ago
Full Time | Remote | Team 1,001-5,000 | Since 2012 | H1B Sponsor

• Design and operate a fleet of Kubernetes (EKS) clusters across production, staging, and ephemeral environments, ensuring reliability and high availability
• Evolve AWS infrastructure and network architecture (VPCs, subnets, IAM, account structure) to support scalable, multi-team workloads
• Build and maintain infrastructure-as-code and GitOps workflows using tools such as Terraform, CDK, and ArgoCD
• Improve platform reliability and performance by defining and driving SLOs, analyzing incidents, and implementing systemic fixes
• Participate in and help improve the on-call rotation, leading incident response and post-incident reviews to drive systemic platform improvements
• Partner with SRE, Delivery, InfoSec, and product/ML teams to land high-impact infrastructure changes and platform standards
• Drive improvements in developer experience by simplifying platform usage, reducing toil, and enabling faster product and ML development
• Contribute to cost efficiency initiatives by optimizing resource utilization across Kubernetes and cloud infrastructure

United States
$166.9K - $230.9K / year

Senior DevOps Engineer

EverOps

The Embedded Service Provider

DevOps Engineer | 58 days ago
Full Time | Remote | Team 51-200 | H1B No Sponsor

Overview

Some of the world’s most innovative global enterprise software companies struggle to find engineering partners capable of matching their rigorous standards. These teams need a partner that can co-own complex problems from within their own development environment. Enter EverOps, the premier Embedded Service Provider. We partner directly with customer engineering teams to assess and address mission-critical delivery and infrastructure challenges.

The Challenge

EverOps is looking for a Senior DevOps Engineer with a deep mastery of enterprise cloud infrastructure and the ability to drive technical projects autonomously. You will act as a technical anchor, leveraging advanced engineering skills to migrate high-traffic workloads and restructure AWS environments to meet enterprise standards.

The Mission

As a Senior DevOps Engineer, you will join our U.S.-Based Virtual Operating Center, working within a dynamic team to manage and evolve production cloud environments. Your primary mission will involve the strategic containerization of legacy workloads and the architectural separation of accounts to improve security and scalability. You will be expected to lead by example: architecting solutions with Terraform + Atmos, implementing EKS best practices, and mentoring peers to ensure collective success.

What You’ll Do

- Workload Migration: Lead the transition of acquired ad exchange workloads from EC2 into a modern, containerized Amazon EKS architecture.
- Account Restructuring: Execute AWS account separation, moving from shared environments into distinct, isolated accounts (Dev, Staging, Prod) with robust enterprise guardrails.
- Infrastructure as Code: Design and maintain a DRY, component-driven infrastructure using Terraform and Atmos.
- EKS Operations: Architect and operate multi-tenant Kubernetes platforms, focusing on namespace isolation, RBAC models, and cluster security.
- GitOps & CI/CD: Implement and optimize deployments using ArgoCD and GitHub Actions for a seamless, automated SDLC.
- Technical Mentorship: Act as a subject matter expert (SME) within your pod, guiding engineers on complex troubleshooting and architectural best practices.
- Cloud-Native Security: Manage secrets and encryption using AWS Secrets Manager and KMS, ensuring secure cross-account access and Kubernetes integration.

You Have

- Experience: 5+ years of professional experience in DevOps, CloudOps, or SRE, specializing in high-scale Amazon EKS environments.
- Infrastructure Frameworks: Advanced proficiency with Terraform, specifically utilizing Atmos (or similar wrappers) to manage hierarchical configurations across multi-account structures.
- Migration Mastery: Proven track record of migrating complex workloads from EC2 to EKS with minimal downtime.
- Multi-Account Governance: Deep experience with AWS Organizations and Landing Zone patterns to enforce environment isolation.
- Enterprise Guardrails: Ability to implement security guardrails using IAM Permission Boundaries and SCPs.
- Containerization: Expert knowledge of Docker and Kubernetes-native networking.
- Coding: Proficiency in Golang, Python, or Bash for building custom automation and migration tooling.
- AWS Security: Production experience with AWS Secrets Manager, KMS, and integration patterns like External Secrets Operator for EKS.
- Observability: Experience implementing enterprise monitoring suites like Datadog, Prometheus, or Grafana.

Extra Awesome

- Progressive Delivery: Production experience with Argo Rollouts for Canary and Blue/Green deployments.
- FinOps & Cost Governance: Experience with cost allocation, tagging strategies, and tools like KubeCost to manage spend during account migrations.
- Policy as Code: Experience implementing OPA (Open Policy Agent) or Kyverno to enforce compliance within EKS.
- Scale & Performance: Background in high-transaction industries (AdTech, Gaming, or Fintech) where platform stability is critical.
- Platform Engineering: A mindset toward building internal developer platforms (IDPs) that allow for self-service within the new EKS accounts.
- Certifications: AWS Certified Solutions Architect (Pro) or Certified Kubernetes Administrator (CKA).

Benefits

- 100% Remote Workplace: We’ve been remote since Day 1!
- Unlimited Paid Time Off.
- Equity: Become a true owner of the company.
- 401K with company contribution and sponsored healthcare.
- Professional Growth: Access to training and certification programs to accelerate your career.

United States

DevOps Engineer

GT

GT provides clients with offshore product teams from CEE, a product development studio & data science services.

DevOps Engineer | 58 days ago
Full Time | Remote | Team 51-200 | Since 2019 | H1B Sponsor

• Take ownership of rebuilding and optimizing the DevOps setup in a complex, high-load environment
• Analyze existing Azure infrastructure, Terraform configurations, and Kubernetes environments
• Reverse-engineer or redesign CI/CD pipelines currently managed by an external vendor
• Define the best approach for rebuilding or replicating the existing setup
• Provide recommendations around ownership, architecture, and transition strategy
• Design and implement CI/CD pipelines (GitHub → Azure)
• Automate infrastructure and deployment processes using Terraform and scripting
• Ensure full ownership, documentation, and maintainability of pipelines
• Manage and optimize Azure services and Kubernetes clusters
• Ensure scalability, reliability, and performance of production systems
• Apply best practices around infrastructure, deployment, and system stability
• Work closely with backend engineers, system engineers, and integration teams
• Provide documentation and knowledge transfer to support long-term in-house ownership
• Support the transition towards a fully internal DevOps capability
• Collaborate with stakeholders to meet deadlines
• Participate in regular syncs and provide progress updates

Europe
Job Closed