2K develops interactive entertainment for handheld gaming systems, console systems, personal computers, and mobile devices. The gaming and entertainment company
Senior Site Reliability Engineer
Location
Texas
Posted
1 day ago
Salary
0
Seniority
Senior
Job Description
Senior Site Reliability Engineer
2K
Title: Senior Site Reliability Engineer
Location: Austin, Texas, United States

Job Description:

Who We Are

At 2K, we create some of the most iconic and culture-shaping video games in entertainment, including NBA® 2K, one of the top-selling franchises in the world, and legendary titles like BioShock®, Borderlands®, Mafia, Sid Meier’s Civilization®, and XCOM®, as well as fan favorites WWE® 2K, TopSpin®, and PGA TOUR® 2K. We build unforgettable experiences by pushing the boundaries of creativity, authenticity and innovation across every genre.

Our portfolio is brought to life by some of the most influential game development studios in the world. Visual Concepts, Firaxis Games, Hangar 13, Cat Daddy Games, 31st Union, Cloud Chamber, Gearbox, HB Studios, and 2K SportsLab create world-class experiences across platforms.

But what truly powers 2K is our people. We believe the best ideas come from teams that feel empowered, supported, and inspired. As an equal opportunity employer, we are committed to fostering a diverse, inclusive workplace where people are encouraged to come as they are and do their best work.

The Team

The 2K SRE team owns the infrastructure behind every player connection: all 2K game services, account platforms, CI/CD pipelines, and developer tooling, spanning AWS, GCP, and on-premises data centers across multiple global regions. Global launch windows and live-service events push systems to their limits, and this team is expected to hold the line. Post-mortems here focus on systems, not people. Automation is the default answer to repetitive work. The infrastructure keeps millions of players connected, and the team takes that seriously!

The Role

The Senior SRE at 2K is a hands-on technical leader, shaping production infrastructure across multiple clouds and regions while partnering with network engineers, systems architects, and game studio developers. This is an ownership role: driving technical direction, influencing reliability from architecture review through production operation, and closing the gap between what engineering ships and what players experience.

What You'll Do

Platform & Infrastructure
- Design, build, and operate scalable multi-cloud and hybrid infrastructure using Terraform, Pulumi, and GitOps workflows (ArgoCD, Flux).
- Own Kubernetes platforms (EKS, GKE) end-to-end: cluster lifecycle, multi-tenancy, networking (Istio, Cilium), and autoscaling.
- Push progressive delivery patterns (blue/green, canary) across game service deployments.

Observability & Reliability
- Build and run the full observability stack: Prometheus + Grafana + Datadog.
- Define SLI/SLO/error budget policies and build alerting that cuts through the noise.
- Lead chaos engineering exercises to surface failure modes before players encounter them.
- Drive incident response and post-mortems with a focus on systemic fixes and real follow-through.

Automation, Security & Developer Experience
- Eliminate toil through self-service provisioning, automated remediation, and intelligent scaling.
- Harden CI/CD pipelines (GitHub Actions, Jenkins, ArgoCD).
- Embed security at the platform layer through secrets management (PasswordState, 1Password, and AWS Secrets Manager) and policy-as-code (OPA/Gatekeeper).

Leadership
- Promote SRE practices across 2K studios through reliability reviews, runbooks, and embedded collaboration.
- Shape architectural decisions and author engineering RFCs that move the platform forward.

Required Qualifications
- Experience: 5+ years in SRE, Platform Engineering, or equivalent infrastructure work at production scale.
- Kubernetes: Deep experience in cloud environments (EKS or GKE preferred), including networking, storage, and multi-cluster patterns.
- Infrastructure as Code (IaC): Strong proficiency with Terraform and/or Pulumi; hands-on with Helm, Terragrunt, and GitOps tooling (ArgoCD or GitHub Actions).
- Environments: Experience with modern and legacy tech, including AWS, GCP, VMware, and bare-metal servers.
- Configuration Management: Server configuration using Ansible, Puppet, and AWS Systems Manager.
- Observability: Experience with Datadog, Prometheus + Grafana, and OpenTelemetry; fluency in operationalizing SLI/SLO/error budgets inside engineering teams.
- Software Engineering: Production-quality code in Go, Python, or TypeScript for tools, automation, and internal libraries.
- Systems & Networking: Solid understanding of Linux internals, TCP/IP networking, DNS, and TLS, sufficient to debug at the system level.
- Incident Management: Incident response and post-mortem leadership with a track record of systemic follow-through.

Preferred Qualifications
- Live-service game or large-scale consumer internet experience dealing with millions of concurrent users.
- Deep knowledge of service mesh (Istio, Cilium) and advanced Kubernetes networking.
- Experience with FinOps and managing resources efficiently at cloud scale.
- Experience with AI and agentic development.
- Cloud certifications (AWS Solutions Architect, GCP Professional Cloud Architect, CKA/CKS, or equivalent).
- Experience mentoring SREs or leading reliability working groups.

As an equal opportunity employer, we are committed to ensuring that qualified individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform their essential job functions, and to receive other benefits and privileges of employment. Please contact us if you need reasonable accommodation.

Please note that 2K Games and its studios never use instant messaging apps or personal email accounts to contact prospective employees or conduct interviews and, when emailing, only use 2K.com accounts.

#LI-Hybrid
More Engineer Jobs
• Capture client needs and requirements
• Design solution architectures
• Execute deployment and management strategies for customer projects
• Assist with presales efforts and scoping services engagements
• Present to prospective clients
Principal Engineer Team Lead, AV Behavior
Motional
We're making driverless vehicles a safe, reliable, and accessible reality.
Role Description

The Systems Readiness and Performance team is the crucial bridge between software development and real-world deployment. We are responsible for driving system design, verifying and validating the autonomy stack, and defining, measuring, and validating system performance targets. We work closely with stakeholders in autonomy, infrastructure, and operations to build the definitive safety case for the commercial launch of our fully driverless IONIQ 5 robotaxis in Las Vegas later this year.

We are looking for a passionate, self-starting engineering leader to lead the AV Behavior Understanding and Evaluation team within the SRP organization. This team defines the standards for safe, comfortable, and assertive autonomous driving. We bridge the gap between systems engineering and real-world performance by decomposing the Operational Design Domain (ODD) into actionable competencies. The team owns the behavior definition, actionable driving policies backed by tests, and acceptance criteria derived from standards and human benchmarks. The team works closely with the autonomy teams on both target setting and validation.

We are seeking a hands-on Principal Engineer Team Lead to drive the design, evaluation, and scaling of these systems through data-driven design backed by structured and large-scale assessment in simulation and on the road. You will lead a high-performing team to transform complex data into measurable progress toward a safer, smarter transportation future.

What You'll Do
- Design the System Behavior:
  - Decompose ODD requirements into system and subsystem behaviors and interfaces to ensure vehicle capabilities meet product requirements.
  - Characterize the performance target against parameters used to define AV capabilities to provide nuance in performance evaluation.
- Develop actionable policies and measurable targets:
  - Convert legal and industry policy narratives to mathematical and/or specific requirements that the autonomy teams can implement.
  - Leverage human benchmarks and industry standards to establish measurable acceptance criteria for system readiness.
- Drive Evaluation Strategy:
  - Advance our performance assessment framework by defining key metrics and reward functions.
  - Derive launch criteria and driving policy, and strategically plan and execute testing campaigns leveraging advanced structured and large-scale simulation, closed-course tracks, and on-road data.
- Lead and Scale:
  - Mentor and grow a high-performing engineering team.
  - Foster a culture of technical excellence, scientific rigor, and data-driven decision-making.
- Cross-Functional Collaboration:
  - Partner with Autonomy, Infrastructure, and the other Systems teams to drive design improvements.
  - Ensure features generate a net positive effect on system safety, comfort, and assertiveness and meet our launch criteria.
- Data exploration:
  - Leverage data science techniques and existing infrastructure to understand the problem space and engage with development teams to drive resolution in close collaboration with System Performance.
- Incorporate bleeding-edge technology:
  - Leverage a combination of learning methods and metrics along with classical metrics to stretch the horizon of performance evaluation.

Qualifications
- Bachelor's degree and 7+ years of experience in Systems Engineering, Autonomous Systems, or Robotics, with deep expertise in one and working knowledge of the others.
- Or Master’s degree/PhD and 5+ years of experience.
- 3+ years of experience leading and scaling data-driven evaluation programs for autonomous driving or safety-critical emerging technologies.
- A hands-on technical leader with a proven track record of growing small, high-performing teams.
- Demonstrated track record in decomposing ODD requirements into vehicle capabilities, driving behavior logic, and defining measurable acceptance criteria.
- Expert at leveraging large-scale simulation, track, and road data to extract actionable technical insights that drive resolution with development teams.
- Proven track record of building consensus and driving alignment across multidisciplinary teams (spanning Autonomy, Infrastructure, and Operations) by leveraging technical influence over organizational authority.
- Strong foundation in statistical methodology, data science, and scientific rigor.
- Proficiency in Python, SQL, and data visualization techniques.
- Excellent communication skills; capable of articulating complex concepts to technical peers and executive leadership alike.
- Self-starter with a proactive mindset for driving ambiguous projects to completion.

Ideally, You Also Have
- Master’s degree in Systems Engineering, Autonomous Systems, or Robotics.
- Familiarity with SOTIF, ISO 26262, UL4600, AVSC Behavior competencies and other industry standards.
- Experience shaping product roadmaps.
- Experience establishing and leveraging industry benchmarks as well as human naturalistic driving behaviors to improve autonomous driving.
- Familiarity with Reinforcement Learning (RL).
- Experience collaborating within agile environments and global, distributed teams.
- Experience with industry standards and establishing human benchmarks.
- Specialized expertise in creating expansive datasets.
- Experience collaborating in an agile development environment.

Benefits
- Medical, dental, vision insurance.
- 401k with a company match.
- Health savings accounts.
- Life insurance.
- Pet insurance.
- And more.

Salary Range
$200,000 - $275,000 USD
Trackside Tyre Engineer
RaceOn
From Pit to Podium: Elevate Your Motorsport Team with Premium Services and Headhunting Expertise! 🚀
• Support all race, test, simulator and rig activities
• Work in conjunction with other engineers and drivers to optimize performance on track
• Provide pre & post event simulation, analysis and key performance metrics of tyre performance in race and test events
• Assist in developing tyre physics understanding and in conjunction with tyre modelling engineers to support vehicle performance analysis and development
• Ensure ongoing validation and correlation of the tyre models to track and rig data to ensure robust validation of simulation results
• Research and use appropriate technologies to ensure the continuous development of our models to meet the engineering requirements of the Team
• Interact across the entire Team to provide methodologies, tools and solutions to support objective technical decision making
• Support mission requirements for a structured approach to further develop, integrate, and sustain a scalable, federated data ecosystem.
• Design, implement, and manage Identity and Access Management (IAM) solutions, ensuring secure authentication and access control across cloud and on-premises environments.
• Enforce Zero Trust Architecture (ZTA) principles and role-based access control (RBAC) policies to protect mission-critical systems.
• Integrate IAM solutions with Microsoft Entra ID (formerly Azure Active Directory), Public Key Infrastructure (PKI), and Common Access Card (CAC) authentication mechanisms.
• Oversee IAM automation, ensuring streamlined provisioning and de-provisioning of user roles and permissions.
• Submit the Identity & Access Management Compliance Report, detailing system configurations, access logs, and compliance status.




