No Priors Ep. 69 | With HeyGen CEO and Co-Founder Joshua Xu
At a glance
WHAT IT’S REALLY ABOUT
HeyGen’s AI Avatars Aim To Replace Cameras And Personalize Video
- HeyGen CEO Joshua Xu explains how the company is building AI-generated avatars and video tools that aim to replace traditional cameras, making visual storytelling accessible to everyone.
- The product focuses on three main use cases—creating, localizing, and personalizing videos—serving thousands of mainstream businesses for training, marketing, and internal communication, as well as large campaigns for clients like McDonald’s and PepsiCo.
- Xu contrasts HeyGen’s modular, controllable approach to video generation with end‑to‑end text‑to‑video models like Sora, arguing brands need quality, control, and consistency rather than purely generative novelty.
- He also discusses upcoming full‑body and real‑time streaming avatars, the challenges of video model research, and HeyGen’s safety practices to mitigate deepfakes and political misuse.
IDEAS WORTH REMEMBERING
AI avatars can remove major bottlenecks in business video production.
Recording executives or spokespeople is time‑consuming and costly; avatar-based A‑roll lets companies generate high-quality, on‑brand video from text, dramatically speeding up training, explainers, and marketing content creation.
Decomposing video into components gives brands more control than pure text‑to‑video.
HeyGen separates A‑roll (avatars) from B‑roll (voiceover, music, transitions, brand assets) and orchestrates the pieces instead of generating everything in one shot, preserving accuracy for logos, fonts, and visual identity while still leveraging generative models.
Localization and personalization at scale unlock entirely new use cases.
Beyond saving cost, customers now do things that were previously impossible, like McDonald’s consumer campaigns in many languages or PepsiCo sending 100,000 individualized thank‑you videos, each localized and personalized with names and details.
Future video will be dynamic and user-specific, not a single static file.
Xu argues generative video is a new format: instead of one immutable MP4 for everyone, video players could render tailored content in real time based on each viewer’s attributes, especially in advertising and education.
Full‑body and real‑time avatars depend on tighter multimodal model integration.
Achieving natural gestures and body motion synchronized with speech requires jointly training voice and video models, moving beyond today’s pipeline of separate TTS feeding into video, and leveraging multimodal architectures like those behind GPT‑4o.
WORDS WORTH SAVING
We wanted to replace the camera because we think AI can create the content, and AI could become the new camera.
— Joshua Xu
Editing is not that expensive…but camera is super expensive.
— Joshua Xu
When we initially started HeyGen, we want to help the business solve the video creation problem. What is a business looking for? They're looking for quality, they're looking for control, they're looking for consistency.
— Joshua Xu
Generative image is still image, but generative video is not a video. It is a new format.
— Joshua Xu
Video generation is not only about solving a mathematical problem, it's actually about creating something the customer love and appreciate.
— Joshua Xu
High quality AI-generated summary created from speaker-labeled transcript.