The Walt Disney Company

Lead Software Engineer - AI Operations and Tooling

On site

San Francisco, United States

Senior

Full Time

29-12-2025

Skills

Communication, Leadership, Creativity, Python, Java, Go, Incident Response, DevOps, Monitoring, Test, Test Automation, Architecture, Regression, Organization, Azure, AWS, Cost Management, GCP, OpenAI, LangChain, Quality Testing, Prometheus, Grafana

Job Specifications

Lead Software Engineer - AI Operations and Tooling
Disney Entertainment and ESPN Product & Technology
Technology is at the heart of Disney's past, present, and future. Disney Entertainment and ESPN Product & Technology is a global organization of engineers, product developers, designers, technologists, data scientists, and more - all working to build and advance the technological backbone for Disney's media business globally.
The team marries technology with creativity to build world-class products, enhance storytelling, and drive velocity, innovation, and scalability for our businesses. We are storytellers and innovators, creators and builders, entertainers and engineers. We work with every part of The Walt Disney Company's media portfolio to advance the technological foundation and consumer media touch points serving millions of people around the world.
Here are a few reasons why we think you'd love working here:
Building the future of Disney's media: Our technologists are designing and building the products and platforms that will power our media, advertising, and distribution businesses for years to come.
Reach, Scale & Impact: More than ever, Disney's technology and products serve as a signature doorway for fans' connections with the company's brands and stories. Disney+, Hulu, ESPN, ABC, ABC News, and many more - these products and brands matter to millions of people globally.
Innovation: We develop and implement groundbreaking products and techniques that shape industry norms and solve complex and distinctive technical problems.
Ad Platforms is responsible for Disney's industry-leading ad technology and products - driving advertising performance, innovation, and value in Disney's sports, news, and entertainment content across all media platforms.
Job Summary:
We are hiring a Lead Engineer to establish and guide our AI Operations and Tooling practice, enabling the safe, reliable, and cost-efficient operation of AI applications across AWS, Azure, and GCP. This role focuses on AI-specific operations, such as hallucination testing, A/B evaluation, guardrail enforcement, and cost optimization, by leveraging, extending, and building around existing tools and platforms to accelerate operational stability and performance.
As a hands-on technical lead, you will mentor engineers, design operational enablement frameworks, and partner closely with AI engineering and product teams. The goal is not to own every tool, but to make AI systems more observable, testable, and resilient by enabling the right capabilities and automation around them. This role will deliver measurable business outcomes by preventing runaway spend, improving reliability, and driving efficiency in AI/cloud usage.
Responsibilities and Duties of the Role:
Operational Architecture & Enablement
Define frameworks for AI-specific operations: hallucination/quality testing, evaluation pipelines, and continuous validation.
Establish reference patterns for scaling LLM services, prompt orchestration, and multi-agent workloads.
Build automation for safe rollout, monitoring, and incident response.
Observability, Reliability & Cost Management
Implement end-to-end observability: latency, drift, failure modes, hallucination rates, and GPU/compute utilization.
Drive cost optimization and efficiency across AI cloud usage (AWS, Azure, GCP).
Define SLOs, dashboards, and runbooks for AI/LLM production systems.
Governance, Guardrails & Security
Embed compliance, safety checks, and prompt-injection defenses into operational frameworks.
Partner with security and governance teams to ensure enterprise-grade auditability and policy enforcement.
Leadership & Cross-Team Collaboration
Mentor engineers in DevOps, infrastructure, and AI operations.
Drive adoption of best practices for AI reliability, test automation, and incident management.
Collaborate across AI Core, Data Foundations, Security, and Product teams to ensure operational safety and scale.
Basic Qualifications
Bachelor's degree in Computer Science, Engineering, or related technical field (Master's preferred), or equivalent experience.
7+ years of experience in software engineering, DevOps, or infrastructure, with at least 2 years in a lead role.
Expert in at least one foundational language (Python, Java, or Go) with production-grade system experience.
Hands-on experience with cloud-native infrastructure (AWS preferred; Azure/GCP a plus) and modern orchestration platforms.
Proven experience with observability stacks (Datadog, Prometheus, Grafana) and incident response automation.
Familiarity with AI/LLM APIs (OpenAI, Anthropic, Bedrock, Azure AI Foundry) and orchestration frameworks (LangChain, LangGraph).
Strong knowledge of operational AI testing (A/B evaluation, regression, red-teaming) and guardrail enforcement.
Demonstrated ability to optimize cloud/GPU usage and manage costs at scale.
Excellent communication skills and proven ability to lead design reviews, mentor eng

About the Company

The Walt Disney Company, together with its subsidiaries and affiliates, is a leading diversified international family entertainment and media enterprise that includes three core business segments: Disney Entertainment, ESPN, and Disney Experiences. Our mission is to entertain, inform and inspire people around the globe through the power of unparalleled storytelling, reflecting the iconic brands, creative minds and innovative technologies that make us the world's premier entertainment company.