No Priors Ep. 12 | With Noam Shazeer

Noam Shazeer helped build the foundations of modern AI, including co-inventing Transformers at Google and pioneering AI chat before ChatGPT. These advances underpin today’s AI revolution. On this episode of No Priors, Noam discusses his work as an AI researcher, engineer, inventor, and now CEO. Noam Shazeer is currently the CEO and Co-founder of Character AI, a service that allows users to design and interact with their own personal bots that take on the personalities of well-known individuals or archetypes. You could have a Socratic conversation with Socrates. You could pretend you’re being interviewed by Oprah. Or you could work through a life decision with a therapist bot. Character recently raised $150M from A16Z, Elad Gil, and others. Noam talks about his early AI adventures at Google, why he started Character, and what he sees on the horizon of AI development.

00:00 - Introduction
01:50 - Noam’s early AI projects at Google
07:13 - Noam’s focus on language models and AI applications
11:13 - Character’s co-founder Daniel de Freitas Adiwardana’s work on Google’s LaMDA
13:53 - The origin story of Character.AI
18:47 - How AI can express emotions
26:51 - What Noam looks for in new hires

Elad Gil (host), Noam Shazeer (guest), Sarah Guo (host)

Apr 25, 2023 · 35m

CHAPTERS

0:00 – 1:51
Noam’s path into AI at Google and the early “AI-ish” systems
Elad frames Noam’s long arc in NLP/AI at Google, from early practical systems to joining Google Brain. Noam shares how he was drawn to AI for its intrinsic fun and describes the shift from pre-deep-learning approaches to neural methods.
- Noam’s long tenure at Google and early NLP/ads-related systems
- Early projects that felt “smart” even if they weren’t labeled AI then
- Joining Google Brain in 2012 and learning neural networks/deep learning
- Motivation: making computers do something genuinely smart
1:51 – 2:52
Learning deep learning the hard way—and why hardware matters
Noam explains that early experimentation involved lots of failure until the key insight clicked: deep learning maps extremely well onto modern hardware. He highlights compute patterns like matrix multiplies and why that changed what was feasible.
- Iteration and failure as part of learning deep learning
- Deep learning’s advantage: favorable compute-to-communication ratio
- Modern chips excel at matrix multiplies, enabling huge speedups
- Designing models that are both “smart” and fast
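As a rough illustration of the compute-to-communication point (not something shown in the episode), the sketch below estimates how much arithmetic a square matrix multiply does per number moved. It assumes an idealized case where each of the three n×n matrices crosses the memory boundary exactly once; the function name `matmul_intensity` is made up for this example.

```python
def matmul_intensity(n):
    """Arithmetic operations per matrix element moved for an n x n matmul.

    Idealized assumption: the two inputs are each read once and the
    output is written once, so 3 * n^2 elements are moved in total.
    """
    flops = 2 * n**3            # n^3 multiplies plus n^3 adds
    elements_moved = 3 * n**2   # two input matrices and one output matrix
    return flops / elements_moved


# The ratio grows linearly with n, so larger multiplies keep the
# arithmetic units busy relative to the data they have to fetch.
small = matmul_intensity(3)
large = matmul_intensity(1024)
```

Under these assumptions the ratio works out to 2n/3, which is one way to read the claim that matrix-multiply-heavy models suit modern chips.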
2:52 – 3:12
Why language modeling is the ‘best problem’: infinite data, simple objective
Noam argues language modeling is uniquely compelling because the objective is straightforward (next-token prediction) and the world supplies enormous training data. He frames strong language models as a path to broad, general capability.
- Language modeling: predict the next word/token
- Data availability: web-scale text is effectively endless
- Simplicity of the task definition vs. breadth of emergent capabilities
- Early work around 2015 with RNN-based language models
3:12 – 7:08
From RNNs to Transformers: the origin story and the core architectural shift
Noam recounts how the team moved away from RNNs toward attention-based approaches that became Transformers. He explains the practical benefits: parallelism, efficiency on GPUs/TPUs, and attention as differentiable associative memory.
- RNNs require sequential, step-by-step computation
- Transformers process sequences in parallel, leveraging hardware
- Attention as fuzzy key-value lookup / associative memory
- Self-attention: using attention to look back over the same sequence
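The “fuzzy key-value lookup” idea can be sketched in a few lines. This is a minimal, single-query illustration of scaled dot-product attention in plain Python, not code from the episode or from any production model; the function name `soft_lookup` and the toy keys and values are assumptions chosen for the example.

```python
import math


def soft_lookup(query, keys, values):
    """Return a softmax-weighted blend of values, plus the weights used."""
    d = len(query)
    # Scaled dot-product similarity between the query and every key.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # Numerically stable softmax: weights are positive and sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # A "fuzzy" read: blend all values instead of returning one exact hit.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values))
              for i in range(dim)]
    return output, weights


# Toy memory: three keys with an associated value each (made-up numbers).
keys = [[10.0, 0.0], [0.0, 10.0], [-10.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]

# A query that matches the first key retrieves (almost exactly) its value.
output, weights = soft_lookup([10.0, 0.0], keys, values)
```

Every step here is differentiable, which is what lets the lookup be trained by gradient descent. In self-attention, the queries, keys, and values are all computed from positions in the same sequence, and the scores for all positions can be computed at once rather than step by step as in an RNN.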
7:08 – 8:35
Transformers beyond text—and why Noam stayed focused on language
Elad asks about surprising applications like protein folding, while Noam emphasizes his single-minded focus on language as the densest modality for intelligence. He notes excitement about multimodal progress but expects core reasoning to come from text models.
- Unexpected wins: Transformers used in areas like AlphaFold (mentioned)
- Noam’s bias toward text: higher information density than pixels
- Multimodal systems are useful, but text models may drive core intelligence
- Aspirational framing: models that can help solve major problems (e.g., medicine)
8:35 – 9:55
What pushes capabilities forward: scale, efficiency, chips, and ‘no wall yet’
The conversation turns to what limits or accelerates progress: algorithmic improvements, training efficiency, hardware evolution, and brute-force investment. Noam argues there’s no evidence of imminent architectural ‘tapping out.’
- Many potential 2× gains: architectures, training algorithms, quantization
- Bigger gains from capital/scale as value becomes obvious
- No clear “wall” observed yet in capability scaling
- Open question: what ultimately stops progress
9:55 – 11:18
Data and training debates: undertraining, new data sources, and synthetic data
Sarah raises concerns about data scarcity and undertrained frontier models. Noam responds that human-generated text (including future AI interactions) remains vast and suggests synthetic generation may also fill gaps—ideally with privacy-preserving approaches.
- Concern: largest models may be undertrained relative to compute
- More data will come from ongoing human communication and AI chats
- Data needs rise with compute/model growth
- Possibility of generating additional data with AI
11:18 – 12:07
Hallucinations, memory, and personalization: product-driven research priorities
Elad asks what needs solving next; Noam jokes that hallucinations can be a feature, but stresses memory as a top priority for Character.AI. He highlights the user demand for long-term recall and personalized experiences, alongside ongoing work on truthfulness.
- Hallucinations: entertaining but also something to address
- Memory as a key feature for persistent, personal relationships
- Personalization depends on efficiently using user-specific data
- Parallel efforts: distinguishing factual vs. fabricated content
12:07 – 14:55
Project Meena → LaMDA: Daniel’s scrappy chatbot mission inside Google
Noam describes co-founder Daniel de Freitas’s long-running obsession with open-domain chatbots and how Meena emerged at Google Brain via 20% time and improvised resource gathering. Meena went viral internally, scaled to large models, and was later renamed LaMDA.
- Daniel’s lifelong goal: build truly open-domain chatbots
- Meena started as a 20% project with an “army” of part-time helpers
- Creative resourcing: recruiting contributors and TPU quota
- Meena’s internal success led to rebranding as LaMDA
14:55 – 15:30
Why LaMDA didn’t launch—and why a startup could move faster
Sarah asks why LaMDA wasn’t released; Noam points to large-company risk concerns around systems that can “say anything.” This becomes part of the rationale for founding a startup to iterate quickly and ship.
- Large-company caution: reputational and safety risk of open-ended chat
- Risk/reward calculus differs for big orgs vs. startups
- Motivation to move faster by building outside Google
- Context: Noam left before the sentience media episode
15:30 – 17:23
Character.AI origin and team-building: recruiting for extreme motivation
Noam shares the decision to leave and launch quickly, assemble a top-tier engineering/research team, and secure compute. He explains their recruiting emphasis—especially Daniel’s bar for “burning desire” and startup-level drive.
- Founding impulse: “build it and launch fast”
- Team formation via Google connections and a Meta LLM infrastructure leader
- Recruiting lens: talent plus unusually high motivation
- Willingness to pass on strong candidates who don’t match the mission intensity
17:23 – 20:11
Why ‘characters’ work: multiple personas beat a single bland assistant
Noam explains the product shape: letting users create many personas avoids the constraints of a single mass-appeal assistant that must be consistent and inoffensive. “Character” becomes a flexible container for imagination, expectations, and user control.
- User control: billions of people invent their own use cases
- Single-assistant problem: one persona becomes boring/over-constrained
- Multiple personas enable opinionated, human-feeling interactions
- “Character” as text unit + persona + moral character
20:11 – 24:57
What users do with Character.AI: roleplay, parasocial chat, loneliness, and emotion
The discussion covers real usage patterns and why engagement is so high: roleplaying, fandom and influencer interactions, and companionship for lonely or troubled users. Noam argues emotional usefulness doesn’t require maximal intelligence and expects scaling to improve knowledge and alignment to user intent.
- Use cases: RPG/text adventures, anime/game characters, public figures
- Parasocial relationships become interactive and personalized
- Companionship and emotional support as major drivers
- Scaling expectations: smarter, more knowledgeable, better attuned behavior
24:57 – 28:15
Business model, focus, and safety: monetization plans, priorities, and guardrails
Noam outlines plans to monetize via premium subscriptions to fund compute-intensive capabilities, while keeping broad free access. The team’s small size forces prioritization; they also discuss safety framing (characters are fiction) and content filters to prevent harm.
- Monetization: premium subscription for more expensive capabilities
- Compute costs make sustainable revenue important
- Focus constraints: small team prioritizes broad public availability
- Safety approach: clear fiction framing + self-harm/violence constraints + porn filtering
28:15 – 35:36
AGI as a product-driven feedback loop, founder lessons, and quickfire wrap-up
Elad asks about AGI; Noam frames his deeper motivation as accelerating technological progress (e.g., medicine) and designs the company so product quality depends on AI quality. They close with founder advice, tips for creating good characters, hiring needs, and a brief set of rapid-fire questions.
- AGI + product: make the product’s success depend on model intelligence
- Founder advice: build fast, learn from mistakes, hire motivated people
- How to make a good character: minimal setup for famous figures; example chats for others
- Hiring focus: engineers across DL + frontend/backend; growing business/product; quickfire Q&A to close