
No Priors Ep. 69 | With HeyGen CEO and Co-Founder Joshua Xu
Joshua Xu (guest), Sarah Guo (host), Elad Gil (host)
HeyGen’s AI Avatars Aim To Replace Cameras And Personalize Video
HeyGen CEO Joshua Xu explains how the company is building AI-generated avatars and video tools that aim to replace traditional cameras, making visual storytelling accessible to everyone.
The product focuses on three main use cases—creating, localizing, and personalizing videos—serving thousands of mainstream businesses for training, marketing, internal communication, and large campaigns for brands like McDonald’s and PepsiCo.
Xu contrasts HeyGen’s modular, controllable approach to video generation with end‑to‑end text‑to‑video models like Sora, arguing brands need quality, control, and consistency rather than purely generative novelty.
He also discusses upcoming full‑body and real‑time streaming avatars, the challenges of video model research, and HeyGen’s safety practices to mitigate deepfakes and political misuse.
Key Takeaways
AI avatars can remove major bottlenecks in business video production.
Recording executives or spokespeople is time‑consuming and costly; avatar-based A‑roll lets companies generate high-quality, on‑brand video from text, dramatically speeding up training, explainers, and marketing content creation.
Decomposing video into components gives brands more control than pure text‑to‑video.
HeyGen separates A‑roll (avatars) and B‑roll (voiceover, music, transitions, brand assets), orchestrating them instead of generating everything in one shot, preserving accuracy for logos, fonts, and visual identity while still leveraging generative models.
Localization and personalization at scale unlock entirely new use cases.
Beyond saving cost, customers now do things that were previously impossible, like McDonald’s consumer campaigns in many languages or PepsiCo sending 100,000 individualized thank‑you videos, each localized and personalized with names and details.
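A campaign like the 100,000 individualized videos mentioned above reduces, at its simplest, to filling one script template per recipient and rendering each result. The sketch below shows only that templating step; the function names and the render step it implies are hypothetical, not HeyGen’s interface.

```python
def personalize_script(template: str, recipient: dict) -> str:
    """Fill a single script template with one viewer's attributes."""
    return template.format(**recipient)

def batch_personalize(template: str, recipients: list[dict]) -> list[str]:
    """Produce one script per recipient; each script would then be
    handed to an avatar render job to yield a distinct video."""
    return [personalize_script(template, r) for r in recipients]
```

For example, `batch_personalize("Thanks, {name}!", [{"name": "Ana"}, {"name": "Ben"}])` yields two distinct scripts from one template; localization would add a translation pass before rendering.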
Future video will be dynamic and user-specific, not a single static file.
Xu argues generative video is a new format: instead of one immutable MP4 for everyone, video players could render tailored content in real time based on each viewer’s attributes, especially in advertising and education.
Full‑body and real‑time avatars depend on tighter multimodal model integration.
Achieving natural gestures and body motion synchronized with speech requires jointly training voice and video models, moving beyond today’s pipeline of separate TTS feeding into video, and leveraging multimodal architectures like those behind GPT‑4o.
Trust and safety must be built into avatar creation workflows from the start.
HeyGen bans political/election content, uses strong user verification (live consent, dynamic passcodes, human review), and embeds safeguards at each creation step to reduce deepfake and misuse risks while still enabling powerful avatar capabilities.
Evaluating video quality is as much about aesthetics as metrics.
Lower loss functions or higher resolution don’t necessarily produce videos users prefer; HeyGen relies heavily on in‑product A/B testing and customer behavior signals to iterate models, similar to lessons Xu learned working on Snap’s camera.
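In-product A/B testing of model variants, as described above, typically starts with deterministic bucketing: each user is assigned to the same variant on every session so that behavior signals (watch-through, shares) can be attributed cleanly. This is a generic sketch of that bucketing idea, not HeyGen’s system.

```python
import random

def assign_variant(user_id: str, variants: list[str]) -> str:
    """Deterministically bucket a user into a model variant.

    Seeding with the user id means the same user always lands in the
    same bucket, so downstream engagement metrics stay attributable
    to one variant.
    """
    rng = random.Random(user_id)
    return rng.choice(variants)
```

Aggregating preference signals per bucket (rather than per render) is what lets a team detect that a lower-loss model is nonetheless the one users abandon.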
Notable Quotes
“We wanted to replace the camera because we think AI can create the content, and AI could become the new camera.”
— Joshua Xu
“Editing is not that expensive…but camera is super expensive.”
— Joshua Xu
“When we initially started HeyGen, we want to help the business solve the video creation problem. What is a business looking for? They're looking for quality, they're looking for control, they're looking for consistency.”
— Joshua Xu
“Generative image is still image, but generative video is not a video. It is a new format.”
— Joshua Xu
“Video generation is not only about solving a mathematical problem, it's actually about creating something the customer love and appreciate.”
— Joshua Xu
Questions Answered in This Episode
How will widespread access to realistic avatars change audience trust in video communication and marketing?
What governance or industry standards are needed to distinguish legitimate avatar content from malicious deepfakes?
At what point does hyper‑personalized, real‑time video advertising cross the line into being uncomfortably invasive for users?
How might education systems be redesigned if personalized, on‑demand instructional videos become cheap and ubiquitous?
Could the modular A‑roll/B‑roll approach limit creativity compared to fully generative video, or does it actually enable more experimentation for brands?
Transcript Preview
Welcome, Joshua. We're so excited to have you here today. How are you?
Hey, Sarah. I'm so excited to be here. Thanks for having me today.
It's our pleasure. Let's get started. Welcome to the Huberman Lab Podcast, where we discuss science and science-based tools for everyday life.
Uh...
I'm Sarah Guo, and I'm a professor of neurobiology and ophthalmology at the School of Medicine.
Um, Sarah, I think you're...
Wait, Sarah, I'm so confused. What's going on here?
Is this thing on?
Today, we're here to discuss how AI can benefit your health, and what medicinal properties the technology holds.
Sarah, I'm so lost. Isn't this the No Priors podcast where you interview technology superstars like Garry Tan and Alexander Wang?
No, that's only for humans. We're really excited to have you. Welcome, Joshua.
Yeah, excited to be here. Thank you for having me.
So, let's start with a little bit of backstory. You started this company, HeyGen. It's had this amazing growth trajectory and is being used by millions of people now. Um, what's the story of starting the company?
Yeah, sure. So, uh, yeah. Hello, everyone. Uh, my name is Joshua. I'm co-founder and CEO of HeyGen. We founded the company roughly three and a half years ago, and before that I was working at Snapchat for about six and a half years. I studied robotics at Carnegie Mellon and joined Snap back in 2014. I initially worked on machine learning in Snapchat ads, ads ranking and recommendation, then I spent my last two years at Snap working on AI cameras. So, you know, Snap leveraged a lot of AI technology to enhance the camera experience. If you look at, you know, 2018, Snapchat released a baby filter and a Disney-style filter. That was the first time I saw a computer could actually create and generate something that does not exist in the world. I was just so fascinated by the technology back then, and I had a feeling that it would potentially change the way people create content. So, you know, Snapchat is a camera company, and everybody creates content through the mobile camera. But we wanted to replace the camera because we think AI can create the content, and AI could become the new camera. And that's how we got started with HeyGen, and our mission is to make visual storytelling accessible to all.
I love it. The greatest minds of our generation, you know, inspired by, uh, you know, "Your face is a cute kitten," or, or whatever. Um, uh, what does replacing the camera mean to you? Like, why do we need to do that? I use my camera a lot.
I kind of grew my career in the whole, you know, mobile camera space, where we worked on a lot of the software and technology to make people feel comfortable and make it easier for them to create content through the mobile camera. But, you know, there are still lots of people who are not able to create good content using the camera today, and we felt that if we can replace the camera, that means we can remove the barrier to visual storytelling, to visual content creation, and that will help us step ahead in the whole content creation space.