
No Priors Ep. 12 | With Noam Shazeer
Elad Gil (host), Noam Shazeer (guest), Sarah Guo (host)
Transformer Pioneer Noam Shazeer Builds Emotional AI at Character.ai
Noam Shazeer, co‑founder of Character.ai and co‑author of the Transformer paper, discusses his path from early Google AI work to building large language models and chat-based products. He explains why transformers overtook RNNs, emphasizes that language modeling is an “AI-complete” problem, and argues that scaling models, data, and compute still shows no clear saturation point. Shazeer details the origins of Google’s LaMDA (formerly Meena), why big companies hesitated to launch open-ended chatbots, and how that led him and co-founder Daniel de Freitas to start Character.ai. He also explores user behavior on Character.ai, emotional and parasocial use cases, safety tradeoffs, commercialization plans, and his broader motivation of using AI progress as a lever toward AGI and solving real-world problems like medicine.
Key Takeaways
Transformers won because they align with modern parallel hardware.
Shazeer explains that deep learning’s success—and transformers in particular—comes from being highly optimized for GPU/TPU-style matrix-multiply hardware, enabling massive parallelism over sequences rather than slow, stepwise RNN computation.
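To make the parallelism contrast concrete, here is a minimal sketch (ours, not from the episode; the toy sizes and NumPy arrays are illustrative assumptions): an RNN must step through the sequence one position at a time, while self-attention computes every position with a handful of large matrix multiplies that GPU/TPU-style hardware executes efficiently.

```python
# Illustrative sketch (not from the episode): why self-attention maps onto
# matrix-multiply hardware better than a step-by-step RNN. Sizes are toy values.
import numpy as np

seq_len, d_model = 128, 64
x = np.random.randn(seq_len, d_model)            # token embeddings for one sequence

# RNN-style processing: each step depends on the previous hidden state,
# so the seq_len steps cannot be parallelized across the sequence.
W_h = np.random.randn(d_model, d_model) * 0.01
h = np.zeros(d_model)
for t in range(seq_len):                          # inherently sequential loop
    h = np.tanh(x[t] @ W_h + h)

# Self-attention-style processing: queries, keys, and values for every position
# come from a few large matrix multiplies that parallel hardware runs at once.
W_q, W_k, W_v = (np.random.randn(d_model, d_model) * 0.01 for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v               # all positions at once
scores = Q @ K.T / np.sqrt(d_model)               # (seq_len, seq_len) in one matmul
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)    # softmax over key positions
out = weights @ V                                 # attention output for every position
```

Real transformers add multiple heads, causal masking, and output projections, but the point is already visible: the whole sequence is handled by a few big matrix multiplies rather than a sequential loop.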
Language modeling is simple to define yet essentially AI-complete.
Predicting the next word from vast text corpora is conceptually simple but, done well, yields general-purpose capabilities like dialogue, reasoning, and task assistance, making language modeling a central route to broad AI.
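As a toy illustration of that objective (our sketch; the episode only gives the “predict the next word” framing), a counting-based bigram model shows how plain text alone defines the training signal:

```python
# Toy sketch (an assumption-laden illustration, not the episode's code):
# next-word prediction as a counting-based bigram model.
from collections import Counter, defaultdict

corpus = "the fat cat sat on the mat the fat cat sat on the chair".split()

# "Training": count which word follows which in the raw text.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the most frequent next word seen after `word` in the corpus."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))   # -> 'fat' (most frequent continuation)
print(predict_next("on"))    # -> 'the'
```

Large language models replace the counts with a neural network and a far larger context, but the supervision is the same: the text itself provides every training label.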
We have not yet hit a clear capability wall for LLMs.
Between algorithmic improvements (better architectures, training, quantization) and large increases in compute budgets, Shazeer sees no obvious point where current architectures definitively “tap out” in performance.
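Quantization is mentioned only in passing; as a hedged illustration of the idea (our example, not a method attributed to Shazeer, Google, or Character.ai), here is a minimal int8 weight-quantization sketch:

```python
# Minimal illustration (our sketch): store weights as int8 plus a scale factor,
# then dequantize for use, trading a little precision for memory and bandwidth.
import numpy as np

w = np.random.randn(4, 4).astype(np.float32)     # full-precision weights

scale = np.abs(w).max() / 127.0                  # map the largest |w| to the int8 range
w_int8 = np.round(w / scale).astype(np.int8)     # quantized storage: 1 byte per weight
w_dequant = w_int8.astype(np.float32) * scale    # approximate reconstruction for compute

print("max abs error:", np.abs(w - w_dequant).max())
```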
Data scarcity is overstated; human and AI-generated text can fuel growth.
He notes the enormous volume of language humans produce daily and anticipates increasing interaction data with AIs themselves, suggesting that data, especially with privacy-preserving methods, is unlikely to be the fundamental bottleneck soon.
Multi-persona chat is a better product fit than a single ‘universal assistant.’
Shazeer argues that single corporate assistants must be bland and inoffensive, whereas allowing users to create diverse characters and personas produces richer, more engaging and human-feeling interactions.
Emotional and parasocial use cases are central, not peripheral, to Character.ai.
Users spend on the order of hours per active day on the service, often for role-play, companionship, and emotional support, showing strong demand for affective interaction even with systems that are explicitly fictional.
Building great AI products requires both cutting-edge research and extreme motivation.
In hiring and co-founder selection, Shazeer emphasizes “burning desire or childhood dream” levels of motivation alongside technical excellence, which he credits for scrappy wins like Meena/LaMDA and the rapid build-out of Character.ai.
Notable Quotes
“The most exciting problem out there is language modeling… it’s really AI-complete.”
— Noam Shazeer
“Deep learning really took off because it runs thousands of times faster than anything else on modern hardware.”
— Noam Shazeer
“I don’t think anyone’s seen a wall in terms of how good this stuff is, so I think it’s just gonna keep getting better and I don’t know what stops it.”
— Noam Shazeer
“Basically this is a technology that’s so accessible that billions of people can just invent use cases.”
— Noam Shazeer
“I wanted to have a company that was both AGI first and product first… by making the product depend entirely on the quality of the AI.”
— Noam Shazeer
Questions Answered in This Episode
If transformers eventually do hit a capability wall, what technical directions beyond today’s architectures does Shazeer think are most promising?
How should society balance the benefits of AI companionship and emotional support with potential risks of dependence or isolation?
What concrete methods could make large-scale conversational data collection genuinely privacy-preserving while remaining useful for training?
How might Character.ai’s safety approach evolve as characters become more knowledgeable and more tightly integrated into real-world tasks?
In what ways could an AGI-first, consumer-product-first strategy accelerate progress in domains like medicine faster than domain-specific research alone?
Transcript Preview
(digital music) Noam, welcome to No Priors.
Hey, Elad. Th- uh, thanks for having me on. Uh, hi, Sarah.
Good to see you. Yeah, thanks for joining. So, um, you've been working on NLP and AI for a long time. So I think you were at Google for something like 17 years off and on. And I think even your Google interview question was something around spellchecking, an approach that eventually got (laughs) implemented there. Um, and when I joined Google, one of the main, um, systems being used at the time for ads targeting was, like, PHIL, and PHIL clusters, and all the stuff which I think you wrote with Georges Harik. And so it'd just be great to get kind of your history in terms of working on, um, AI, NLP, language models, how this all evolved, what you got started on, and what sparked your interest.
Oh, thanks, Elad. Yeah, uh, just, uh, always was naturally drawn to AI, you know? Wanted to make the computer do something smart. Seems like pretty much the most, uh, fun, uh, fun game around. Um, so, uh, yeah, was, uh, lucky to find, uh, Google early on, and, uh, it really is, uh, an AI company. So, um, yeah, I got, uh, got involved in a lot of the, uh, early projects there that, uh, that maybe you wouldn't call AI now but, uh, but seemed pretty smart, uh, at the time. Um, and then more recently was on the Google, uh, Brain team starting in 2012. It looked like a really smart group of people, uh, doing something interesting, and never... Uh, I had never done deep learning before, or neural networks I guess as it was called then, or whatever. I forget when the rebrand happened-
(laughs)
... but, uh (laughs) -
Yeah.
... but, uh, yeah, it turned out to be really fun.
That's cool. And then, you know, you were one of the main people working on the transformer paper and design in 2017, and then you worked on Mesh TensorFlow, I think, um, sometime within the following year. Could you talk a little bit about how all that got going?
Yeah, I, I mean, I, um, messed around a few years, um, on the Google Brain team and, like, utterly failed at a bunch of stuff, uh, till I kinda got the hang of it. Um, really the key insight is that what makes deep learning work is that it is, um, really well-suited to, uh, to modern hardware, um, where, you know, you have the, uh, the current generation of, uh, of chips that are great at, um, at matrix multiplies and, you know, o- other forms of things that require large amounts of, um, computation relative to communication. So, uh, so basically deep learning, like, really took off because, you know, it runs thousands of times faster than anything else. And as soon as I got the hang of that, started designing things that actually, uh, were smart and, uh, and ran fast. Um, but, you know, the most, most exciting, uh, problem out there is language modeling. It's just, like, i- i- it's like the best problem ever, because, like, there's, like, an infinite amount of data, you know, just scrape the web and you've got, like, all the training data you could ever, uh, ever hope for. And, like, the problem is super simple to define. It's just, like, um, predict the next word. The fat cat sat on the, you know, like, okay, what, you know, what comes next?