Noam Shazeer: How We Spent $2M to Train a Single AI Model and Grew Character.ai to 20M Users | E1055

Noam Shazeer is the co-founder and CEO of Character.AI, a full-stack AI computing platform that gives people access to their own flexible superintelligence. A renowned computer scientist and researcher, Shazeer is one of the foremost experts in artificial intelligence (AI) and natural language processing (NLP). He is a key author of the Transformer, a revolutionary deep learning model enabling language understanding, machine translation, and text generation that has become the foundation of many NLP models. A former member of the Google Brain team, Shazeer led the development of spelling corrector capabilities within Gmail and of the algorithm at the heart of AdSense.

Timestamps:
(0:00) Intro
(00:43) Noam's Google Experience and Introduction to Character
(06:18) Character.AI's Vision, Growth, and Ethical Considerations
(14:13) Technical and Business Aspects of AI and Machine Learning
(23:19) Business Strategies and AI Philosophy
(30:05) Quick-Fire Round

In Today’s Episode with Noam Shazeer We Discuss:

1. Entry into the World of AI and NLP:
How did Noam first make his way into the world of AI and come to work on spell corrector with Google? What are 1-2 of his biggest takeaways from spending 20 years at Google? What does Noam know now that he wishes he had known when he started Character?

2. Model Size or Data Size:
What is more important, the size of the data or the size of the model? Does Noam agree that “we will not use models in a year that we have today”? What is the lifespan of a model? Does Noam agree that the companies that win are those that are able to switch between models with the most ease? With the majority of data being able to be downloaded from the internet, is there real value in data anymore?

3. The Biggest Barriers:
What is the single biggest barrier to Character today? What are the most challenging elements of model training? Why did they need to spend $2M to train an early model? What are the most difficult elements of releasing a horizontal product with so many different use cases? Where does the value accrue in the race for AI dominance: startups or incumbents?

4. AI’s Role in Society:
Why does Noam believe that AI can create greater, not worse, human connections? Why is Noam not concerned by the speed of adoption of AI tools? What does Noam know about AI’s impact on society that the world does not see?

Subscribe on Spotify: https://open.spotify.com/show/3j2KMcZTtgTNBKwtZBMHvl?si=85bc9196860e4466
Subscribe on Apple Podcasts: https://podcasts.apple.com/us/podcast/the-twenty-minute-vc-20vc-venture-capital-startup/id958230465
Follow Harry Stebbings on Twitter: https://twitter.com/HarryStebbings
Follow Noam Shazeer on Twitter: https://twitter.com/NoamShazeer
Follow 20VC on Instagram: https://www.instagram.com/20vc_reels
Follow 20VC on TikTok: https://www.tiktok.com/@20vc_tok
Visit our Website: https://www.20vc.com
Subscribe to our Newsletter: https://www.thetwentyminutevc.com/contact

#NoamShazeer #CharacterAI #HarryStebbings

Noam Shazeer (guest), Harry Stebbings (host)

Aug 31, 2023 · 36m

CHAPTERS

0:00 – 0:55
What users actually do with Character.ai: unexpected therapy and emotional support
Noam opens with a surprising user behavior: people using fictional/game characters as a form of therapy or emotional relief. This sets up the central theme that users, not builders, often discover the most meaningful use cases.
- Users report emotional support benefits from talking to character bots
- Unintended use cases can dominate intended ones
- Early signal of companionship/mental-health-adjacent demand
0:55 – 3:24
From Google’s spelling corrector to product lessons about scale
Noam recounts his first Google project improving spelling correction for web search, and why the old dictionary-based approach failed at internet scale. He then generalizes the lesson: the biggest wins come from broadly useful, mass-market tools.
- Why dictionary-driven spellcheck was terrible for web search
- Search diversity forced new approaches beyond rigid rules
- Key Google lesson: general tech + billions of users beats narrow B2B assumptions
3:24 – 5:05
Character.ai’s full-stack, direct-to-consumer bet inspired by Google
Noam explains why Character.ai prioritizes launching a general-purpose AI product directly to consumers rather than starting with vertical B2B applications. He argues that controlling the full stack—from research to product—enables speed, iteration, and co-design across the system.
- LLMs are unusually versatile and easy to use (conversation as interface)
- D2C-first strategy: launch to everyone and let use cases emerge
- Full-stack approach enables end-to-end optimization and faster learning cycles
5:05 – 6:01
Motivation and leverage: pushing AI forward as the highest-impact path
The conversation shifts to what drives Noam personally: curiosity, enjoyment of hard technical problems, and a belief in AI as leverage on global challenges. Rather than tackling domains directly (e.g., medicine), he frames advancing AI capability as a force multiplier.
- AI work is intrinsically motivating: making computers do what they can’t
- Leverage argument: better AI can accelerate progress across many fields
- Acknowledges huge solvable problems (disease, aging, etc.)
6:01 – 8:20
Mission as humility: “a billion users inventing a billion use cases”
Noam lays out a mission philosophy grounded in humility and user agency. Character.ai’s guiding motto is to build something broadly capable and let the world decide what it’s for—illustrated by users repurposing game characters into therapists.
- Humility about predicting or controlling societal outcomes
- Core motto: many users discovering many uses
- Entertainment/companionship/emotional support emerged strongly
- Users’ agency is central to product direction
8:20 – 9:21
Explaining the growth engine: launching, generality, and real human need
Noam attributes Character.ai’s rapid traction (millions of users, massive message volume) to a simple combination: they shipped, the product stayed general, and it meets a widespread need for connection. He contrasts this with large-company hesitancy and brand risk.
- Growth driver #1: actually launching the product
- General tool unlocks organic discovery of use cases
- Demand-side pull: loneliness/need to talk is widespread
- Large companies can be slower due to perceived brand risk
9:21 – 10:29
Ethical tension: does AI companionship reduce or improve human connection?
Harry challenges whether machine conversation could pull people away from real relationships. Noam emphasizes valuing human connection, and suggests the product can help some users practice socially—while acknowledging the ultimate effect depends on user choices.
- Concern: AI as substitute for human relationships
- Noam’s stance: human connection has moral and practical value
- Potential benefit: practice tool for social anxiety
- Outcome is user-dependent and hard to generalize
10:29 – 11:41
Core product dilemma: keep it general without sacrificing usability/quality
Noam describes the central product challenge: building something both highly general and genuinely usable. He rejects the standard PM advice to narrow into verticals, arguing that neural language models uniquely enable broad capability without hand-crafted specialization.
- Perceived trade-off: versatility vs usability
- Character.ai explicitly resists narrowing to verticals
- General-purpose usability is a primary design goal
- Quality comes from model capability rather than rules/handcrafted flows
11:41 – 16:50
Why neural language models scale: next-word prediction, data abundance, and capability gains
Noam explains the conceptual simplicity behind modern LLMs: predicting the next word. He contrasts fragile rule-based systems with neural models, notes the abundance of web-scale training data, and outlines how capabilities expanded from translation toward open-ended conversation.
- LLMs reduce to a simple objective: next-token prediction
- Rule-based NLP was complex, brittle, and non-generalizing
- Massive training data is available (e.g., Common Crawl)
- Early killer app was machine translation; conversation required more scale
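
To make the "predict the next word" objective concrete, here is a minimal sketch in Python. It is not how Character.ai or any neural language model works internally: it swaps the neural network for a simple bigram count table, and the names `train_bigram` and `predict_next` and the tiny corpus are illustrative assumptions. What it does show is the shape of the task, where the only training signal is which word follows which in raw text.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for every word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus; real models train on web-scale text.
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" twice, more than any other word
```

A neural model replaces the count table with learned parameters, so it can generalize to word sequences it has never seen, but the objective stays the same.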
16:50 – 18:40
What really limits frontier models: compute, training time, and a $2M run
The discussion moves into the economics of training: model size and training duration both multiply compute needs. Noam states compute is the main constraint and shares that Character.ai spent about $2M in compute to train a model the prior summer, with plans to improve via better hardware and longer training.
- Compute is the dominant bottleneck (more than raw data access)
- Bigger models + longer training drives cost multiplicatively
- Character.ai trained a serving model with ~$2M of compute
- Better hardware and longer runs are the path to smarter models
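
The multiplicative relationship between model size, training length, and cost can be sketched with a back-of-the-envelope estimate. The Python below assumes the widely used rule of thumb that training a dense transformer takes roughly 6 floating-point operations per parameter per training token; that rule, the function name `training_cost_usd`, and every number in the example are assumptions for illustration, not figures from the episode, and nothing here is meant to reproduce the $2M run.

```python
def training_cost_usd(n_params, n_tokens, peak_flops_per_sec,
                      utilization, usd_per_chip_hour):
    """Rough training cost under the ~6 * params * tokens FLOPs rule of thumb."""
    total_flops = 6.0 * n_params * n_tokens            # assumed approximation
    effective_flops_per_sec = peak_flops_per_sec * utilization
    chip_hours = total_flops / effective_flops_per_sec / 3600.0
    return chip_hours * usd_per_chip_hour


# Hypothetical hardware and pricing, chosen only to show the scaling behavior.
base = training_cost_usd(1e9, 2e10, 1e14, 0.4, 2.0)
scaled = training_cost_usd(2e9, 4e10, 1e14, 0.4, 2.0)  # 2x params, 2x tokens
print(scaled / base)  # doubling both factors roughly quadruples the cost
```

Under these assumptions, faster chips or higher utilization lower the cost of a fixed run, which is consistent with the point that better hardware makes longer training affordable.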
18:40 – 21:04
Proprietary conversation data and privacy: learning from users without leaking them
Harry asks about defensibility via proprietary data versus broadly available internet corpora. Noam argues user interaction data is valuable for preference and product tuning, but stresses privacy risks if conversations are naively used for training, since models could regurgitate private information.
- User data helps identify what people like and how they use the system
- General knowledge + smaller task-specific signals mirrors human learning
- Privacy is a central constraint; naive training can cause memorization/leakage
- Emphasis on aggregate learning and careful handling of sensitive content
21:04 – 24:13
Startups vs incumbents, open vs closed: speed, scale economics, and ecosystem pluralism
Noam explains why Character.ai works better as a standalone startup: faster shipping and fewer constraints from existing products. He predicts multiple winners across startups, big companies, universities, and individuals, with both open and closed approaches coexisting—while noting scale advantages for training and serving.
- Standalone advantage: speed and willingness to launch
- ‘Users win’ framing: many viable players and approaches
- Open + closed ecosystems will both thrive; more small-scale tinkering boosts research
- Economies of scale matter for training and serving efficiency (batching)
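
One way to picture why batching gives larger services a cost advantage is a toy amortization model. The Python sketch below assumes each serving step has a fixed cost (for example, reading the model weights once) that is shared by every request in the batch, plus a small per-request cost. The function name `cost_per_request` and the numbers are hypothetical, and real serving systems are considerably more complicated.

```python
def cost_per_request(fixed_cost_per_step, marginal_cost_per_request, batch_size):
    """Per-request cost when a fixed per-step cost is shared across a batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return fixed_cost_per_step / batch_size + marginal_cost_per_request


# Hypothetical cost units: a fixed cost of 8 per step, 1 per request.
for batch_size in (1, 8, 64):
    print(batch_size, cost_per_request(8.0, 1.0, batch_size))
```

In this model a provider with enough traffic to keep batches full pays close to the marginal cost per request, while a provider serving one request at a time pays the full fixed cost every time.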
24:13 – 26:39
AI perspective reset: best applications aren’t invented yet; hallucinations as a feature
Noam challenges today’s narratives by comparing the moment to the invention of electricity/computers: we don’t yet know the killer apps. He also reframes hallucinations as a feature for early use cases (entertainment, companionship, creativity), letting product-market fit emerge where the tech is naturally strong.
- We’re early: the most valuable uses are still unknown
- Historical analogy: electricity/computers before widespread application discovery
- Hallucinations can be beneficial in creative/entertainment contexts
- Strategy: ship general tools and let natural-fit use cases surface
26:39 – 30:05
Leadership, usefulness over fun, and the parenthood-induced mindset shift
Noam reflects on becoming CEO and why he intends to remain in the role: to ensure the company makes the right decisions while he still contributes technically. He describes a shift toward prioritizing meaningful usefulness over immediate fun, influenced by parenthood and a sense of responsibility.
- CEO transition: continues technical work alongside leadership
- Staying CEO to guide key decisions
- Personal ethic: optimize for usefulness/meaning over fun
- Parenthood and faith themes: gratitude, responsibility, maturity
30:05 – 36:32
Quick-fire: rapid predictions on AI acceleration, research noise, and an unpredictable 10-year horizon
In the closing quick-fire, Noam predicts rapid advancement in the next 1–3 years, and emphasizes Character.ai is more than entertainment—it’s a full-stack AI and product company. He critiques the ‘alchemy’ state of ML publishing (signal amid noise), shares an early mistake about sparsity vs hardware realities, and declines to predict where the company will be in 2033 due to compounding technological change.
- AI will get far smarter; momentum in hardware and research
- Near-term adoption: significant progress in 1–3 years
- Character.ai positioning: AI quality and product excellence are aligned
- ML research is noisy; positive experimental results drive adoption
- Lesson learned: hardware favors dense ops; sparsity must respect compute realities
- 2033 outlook: impossible to forecast; agility matters
