Technical Director, Media & Entertainment AI Infrastructure
Location
California | New York
Posted
124 days ago
Salary
$295K - $365K / year
Seniority
Lead
Job Description
Nebius Group
• Own the Technical Blueprint: Personally architect the infrastructure solutions for our most strategic M&E partnerships: studio-scale content production pipelines, agency data consolidation plays, and generative AI model deployments.
• The Physics to P&L Narrative: Fluently demonstrate to executive stakeholders how infrastructure decisions (data lake locality, storage tiering, inference optimization) directly impact their business model and operability.
• Deconstruct the Bottleneck: Go beyond the stated problem to find the technical truth. Translate vague business goals (e.g., “We need lower rendering costs”) into precise engineering requirements (e.g., “We need to optimize the inference batch size on L40s to reduce cost-per-token by 30%”).
• Map the Transition: Identify exactly where a customer sits on the curve from legacy service bureau to AI-native tech platform and prescribe the specific infrastructure intervention needed to move them forward.
• Build and Validate the Integration Layer: Identify, engage, and technically validate relationships with the most critical ISVs in the media and entertainment landscape, from rendering and VFX toolchains to generative AI platforms.
• Define the Standard: It is not enough to support these tools. You will define the reference architectures for how they run best on Nebius infrastructure, and work directly with ISV engineering teams to build and publish those standards.
• Decide What’s Worth Doing: In partnership with the GM, evaluate ISV and partner opportunities on their technical merit and strategic leverage, and be equally rigorous about what not to pursue.
• Shape the M&E Roadmap: Use forensic evidence from the field to prioritize and justify the M&E vertical roadmap. You will work directly with Nebius’s global Head of Product and Head of Engineering to translate partner and customer needs into product direction.
• Lead the M&E Product Summit: Chair a quarterly summit with Core Engineering leadership, using field evidence to drive roadmap decisions and maintain vertical momentum.
Job Requirements
- 12+ years of experience in cloud infrastructure, platform engineering, distributed systems, or a closely related technical domain.
- Executive Presence: Capable of commanding a room of engineers and presenting a layered technical roadmap to a C-Suite. You have operated at the top-to-top level; your counterparts are CTOs and VPs of Engineering.
- Builder Mentality: This is an IC role. You build things. You are not here to manage a team or delegate to an implementation function; you are here to architect and ship solutions alongside partners, and to build the assets (reference architectures, integration playbooks, technical frameworks) that make Nebius’s M&E infrastructure strategy defensible and scalable.
- Product-Minded: Experience defining a platform strategy, not just executing tickets. You are comfortable telling a customer “No” when a request creates technical debt, and proposing a better alternative.
- Ambiguity Tolerance: You thrive in environments where requirements are evolving. You do not wait for a roadmap; you build it.
- Forensic Mindset: You are not satisfied with surface-level answers. You dig into the kernel, the logs, and the P&L to find the truth.
- Mastery of the Stack: Expert-level, production-grade knowledge of GPU architectures (H100, L40s), Kubernetes orchestration including Soperator, high-performance and parallel file systems (e.g., Lustre, WEKA), data lake architecture, and networking constraints (InfiniBand/Ethernet).
- Inference Optimization: You understand the nuances of model serving (batch sizes, quantization, KV caching, latency tradeoffs) and can architect solutions for both massive throughput and real-time (sub-50ms) demands.
Benefits
- Health Insurance: 100% company-paid medical, dental, and vision coverage for employees and families.
- 401(k) Plan: Up to 4% company match with immediate vesting.
- Parental Leave: 20 weeks paid for primary caregivers, 12 weeks for secondary caregivers.
- Remote Work Reimbursement: Up to $85/month for mobile and internet.
- Disability & Life Insurance: Company-paid short-term, long-term, and life insurance coverage.