
No Priors Ep. 69 | With HeyGen CEO and Co-Founder Joshua Xu
Joshua Xu (guest), Sarah Guo (host), Elad Gil (host)
HeyGen’s AI Avatars Aim To Replace Cameras And Personalize Video
HeyGen CEO Joshua Xu explains how the company is building AI-generated avatars and video tools that aim to replace traditional cameras, making visual storytelling accessible to everyone.
The product focuses on three main use cases—creating, localizing, and personalizing videos—serving thousands of mainstream businesses for training, marketing, internal communication, and large campaigns for brands like McDonald’s and PepsiCo.
Xu contrasts HeyGen’s modular, controllable approach to video generation with end‑to‑end text‑to‑video models like Sora, arguing brands need quality, control, and consistency rather than purely generative novelty.
He also discusses upcoming full‑body and real‑time streaming avatars, the challenges of video model research, and HeyGen’s safety practices to mitigate deepfakes and political misuse.
Key Takeaways
AI avatars can remove major bottlenecks in business video production.
Recording executives or spokespeople is time‑consuming and costly; avatar-based A‑roll lets companies generate high-quality, on‑brand video from text, dramatically speeding up training, explainers, and marketing content creation.
Decomposing video into components gives brands more control than pure text‑to‑video.
HeyGen separates A‑roll (avatars) and B‑roll (voiceover, music, transitions, brand assets), orchestrating them instead of generating everything in one shot, preserving accuracy for logos, fonts, and visual identity while still leveraging generative models.
Localization and personalization at scale unlock entirely new use cases.
Beyond saving cost, customers now do things that were previously impossible, like McDonald’s consumer campaigns in many languages or PepsiCo sending 100,000 individualized thank‑you videos, each localized and personalized with names and details.
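A campaign like the 100,000 individualized videos mentioned above reduces, at its simplest, to filling one script template per recipient and rendering each result. The sketch below shows only that templating step; the function names and the render step it implies are hypothetical, not HeyGen’s interface.

```python
def personalize_script(template: str, recipient: dict) -> str:
    """Fill a single script template with one viewer's attributes."""
    return template.format(**recipient)

def batch_personalize(template: str, recipients: list[dict]) -> list[str]:
    """Produce one script per recipient; each script would then be
    handed to an avatar render job to yield a distinct video."""
    return [personalize_script(template, r) for r in recipients]
```

For example, `batch_personalize("Thanks, {name}!", [{"name": "Ana"}, {"name": "Ben"}])` yields two distinct scripts from one template; localization would add a translation pass before rendering.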
Future video will be dynamic and user-specific, not a single static file.
Xu argues generative video is a new format: instead of one immutable MP4 for everyone, video players could render tailored content in real time based on each viewer’s attributes, especially in advertising and education.
Full‑body and real‑time avatars depend on tighter multimodal model integration.
Achieving natural gestures and body motion synchronized with speech requires jointly training voice and video models, moving beyond today’s pipeline of separate TTS feeding into video, and leveraging multimodal architectures like those behind GPT‑4o.
Trust and safety must be built into avatar creation workflows from the start.
HeyGen bans political/election content, uses strong user verification (live consent, dynamic passcodes, human review), and embeds safeguards at each creation step to reduce deepfake and misuse risks while still enabling powerful avatar capabilities.
Evaluating video quality is as much about aesthetics as metrics.
Lower loss functions or higher resolution don’t necessarily produce videos users prefer; HeyGen relies heavily on in‑product A/B testing and customer behavior signals to iterate models, similar to lessons Xu learned working on Snap’s camera.
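In-product A/B testing of model variants, as described above, typically starts with deterministic bucketing: each user is assigned to the same variant on every session so that behavior signals (watch-through, shares) can be attributed cleanly. This is a generic sketch of that bucketing idea, not HeyGen’s system.

```python
import random

def assign_variant(user_id: str, variants: list[str]) -> str:
    """Deterministically bucket a user into a model variant.

    Seeding with the user id means the same user always lands in the
    same bucket, so downstream engagement metrics stay attributable
    to one variant.
    """
    rng = random.Random(user_id)
    return rng.choice(variants)
```

Aggregating preference signals per bucket (rather than per render) is what lets a team detect that a lower-loss model is nonetheless the one users abandon.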
Notable Quotes
“We wanted to replace the camera because we think AI can create the content, and AI could become the new camera.”
— Joshua Xu
“Editing is not that expensive…but camera is super expensive.”
— Joshua Xu
“When we initially started HeyGen, we want to help the business solve the video creation problem. What is a business looking for? They're looking for quality, they're looking for control, they're looking for consistency.”
— Joshua Xu
“Generative image is still image, but generative video is not a video. It is a new format.”
— Joshua Xu
“Video generation is not only about solving a mathematical problem, it's actually about creating something the customer love and appreciate.”
— Joshua Xu
Questions Answered in This Episode
How will widespread access to realistic avatars change audience trust in video communication and marketing?
What governance or industry standards are needed to distinguish legitimate avatar content from malicious deepfakes?
At what point does hyper‑personalized, real‑time video advertising cross the line into being uncomfortably invasive for users?
How might education systems be redesigned if personalized, on‑demand instructional videos become cheap and ubiquitous?
Could the modular A‑roll/B‑roll approach limit creativity compared to fully generative video, or does it actually enable more experimentation for brands?
Transcript Preview
Welcome, Joshua. We're so excited to have you here today. How are you?
Hey, Sarah. I'm so excited to be here. Thanks for having me today.
It's our pleasure. Let's get started. Welcome to the Huberman Lab Podcast, where we discuss science and science-based tools for everyday life.
Uh...
I'm Sarah Guo, and I'm a professor of neurobiology and ophthalmology at the School of Medicine.
Um, Sarah, I think you're...
Wait, Sarah, I'm so confused. What's going on here?
Is this thing on?
Today, we're here to discuss how AI can benefit your health, and what medicinal properties the technology holds.
Sarah, I'm so lost. Isn't this the No Priors podcast where you interview technology superstars like Garry Tan and Alexander Wang?
No, that's only for humans. We're really excited to have you. Welcome, Joshua.
Yeah, excited to be here. Thank you for having me.
So, let's start with a little bit of backstory. You started this company, HeyGen. It's had this amazing growth trajectory and is being used by millions of people now. Um, what's the story of starting the company?
Yeah, sure. So, uh, yeah. Hello, everyone. Uh, my name is Joshua. I'm co-founder and CEO of HeyGen. We founded the company roughly three and a half years ago, and before that I was working at Snapchat for about six and a half years. I studied robotics at Carnegie Mellon and joined Snap back in 2014. I initially worked on machine learning in Snapchat ads, ads ranking and recommendation, then I spent my last two years at Snap working on AI cameras. So, you know, Snap leveraged a lot of AI technology to enhance the camera experience. If you look at, you know, 2018, Snapchat released a baby filter and a Disney-style filter. That was the first time I saw a computer could actually create and generate something that does not exist in the world. I was just so fascinated by the technology back then, and I had a feeling that it would potentially change the way people create content. So, you know, Snapchat is a camera company, and everybody creates content through the mobile camera. But we wanted to replace the camera because we think AI can create the content, and AI could become the new camera. And that's how we got started with HeyGen, and our mission is to make visual storytelling accessible to all.
I love it. The greatest minds of our generation, you know, inspired by, uh, you know, "Your face is a cute kitten," or, or whatever. Um, uh, what does replacing the camera mean to you? Like, why do we need to do that? I use my camera a lot.
I kind of grew my career in the whole, you know, mobile camera space, where we worked on a lot of the software and technology to make people feel comfortable and make it easier for them to create content through the mobile camera. But, you know, there are still lots of people who are not able to create good content using the camera today, and we felt that if we can replace the camera, that means we can remove the barrier to visual storytelling, to visual content creation, and that will help us step ahead in the whole content creation space.