The Twenty Minute VC

Noam Shazeer: How We Spent $2M to Train a Single AI Model and Grew Character.ai to 20M Users | E1055

Noam Shazeer is the co-founder and CEO of Character.AI, a full-stack AI computing platform that gives people access to their own flexible superintelligence. A renowned computer scientist and researcher, Shazeer is one of the foremost experts in artificial intelligence (AI) and natural language processing (NLP). He is a key author of the Transformer, a revolutionary deep learning model enabling language understanding, machine translation, and text generation that has become the foundation of many NLP models. A former member of the Google Brain team, Shazeer led the development of spelling-correction capabilities within Gmail, as well as the algorithm at the heart of AdSense.

---------------------------------------------------------

Timestamps:
(00:00) Intro
(00:43) Noam's Google Experience and Introduction to Character
(06:18) Character.AI's Vision, Growth, and Ethical Considerations
(14:13) Technical and Business Aspects of AI and Machine Learning
(23:19) Business Strategies and AI Philosophy
(30:05) Quick-Fire Round

---------------------------------------------------------

In Today’s Episode with Noam Shazeer We Discuss:

1. Entry into the World of AI and NLP: How did Noam first make his way into the world of AI and come to work on the spelling corrector at Google? What are one or two of his biggest takeaways from spending 20 years at Google? What does Noam know now that he wishes he had known when he started Character?

2. Model Size or Data Size: Which matters more, the size of the data or the size of the model? Does Noam agree that “we will not use models in a year that we have today”? What is the lifespan of a model? Does Noam agree that the companies that win will be those that can switch between models most easily? With most data now downloadable from the internet, is there still real value in data?

3. The Biggest Barriers: What is the single biggest barrier for Character today? What are the most challenging elements of model training? Why did they need to spend $2M to train an early model? What are the most difficult elements of releasing a horizontal product with so many different use cases? Where does value accrue in the race for AI dominance: startups or incumbents?

4. AI’s Role in Society: Why does Noam believe that AI can strengthen human connection rather than weaken it? Why is Noam not concerned by the speed of adoption of AI tools? What does Noam know about AI’s impact on society that the world does not see?

---------------------------------------------------------

Subscribe on Spotify: https://open.spotify.com/show/3j2KMcZTtgTNBKwtZBMHvl?si=85bc9196860e4466
Subscribe on Apple Podcasts: https://podcasts.apple.com/us/podcast/the-twenty-minute-vc-20vc-venture-capital-startup/id958230465
Follow Harry Stebbings on Twitter: https://twitter.com/HarryStebbings
Follow Noam Shazeer on Twitter: https://twitter.com/NoamShazeer
Follow 20VC on Instagram: https://www.instagram.com/20vc_reels
Follow 20VC on TikTok: https://www.tiktok.com/@20vc_tok
Visit our Website: https://www.20vc.com
Subscribe to our Newsletter: https://www.thetwentyminutevc.com/contact

---------------------------------------------------------

#NoamShazeer #CharacterAI #HarryStebbings

Guest: Noam Shazeer · Host: Harry Stebbings
Aug 31, 2023 · 36 min

CHAPTERS

  1. 0:00 – 0:55

    What users actually do with Character.ai: unexpected therapy and emotional support

    Noam opens with a surprising user behavior: people using fictional/game characters as a form of therapy or emotional relief. This sets up the central theme that users, not builders, often discover the most meaningful use cases.

    • Users report emotional support benefits from talking to character bots
    • Unintended use cases can dominate intended ones
    • Early signal of companionship/mental-health-adjacent demand
  2. 0:55 – 3:24

    From Google’s spelling corrector to product lessons about scale

    Noam recounts his first Google project improving spelling correction for web search, and why the old dictionary-based approach failed at internet scale. He then generalizes the lesson: the biggest wins come from broadly useful, mass-market tools.

    • Why dictionary-driven spellcheck was terrible for web search
    • Search diversity forced new approaches beyond rigid rules
    • Key Google lesson: general tech + billions of users beats narrow B2B assumptions
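The chapter does not describe Google's actual spelling algorithm, but the contrast between a fixed dictionary and learning from what people really type can be sketched. Below is a minimal Python sketch, assuming a hypothetical table of query-log counts (`QUERY_COUNTS`, made-up numbers) and a Norvig-style candidate generator that tries every string one edit away; it returns the most frequent candidate that users actually searched for.

```python
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"

# Hypothetical query-log counts; a real system would derive these from web-scale usage data.
QUERY_COUNTS = Counter({"britney": 500, "spears": 400, "brittany": 50, "python": 300})


def edits1(word: str) -> set[str]:
    """All strings one deletion, transposition, replacement, or insertion away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word: str) -> str:
    """Keep a word seen in the logs; otherwise pick the most frequent logged word one edit away."""
    if word in QUERY_COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in QUERY_COUNTS]
    return max(candidates, key=QUERY_COUNTS.__getitem__) if candidates else word


print(correct("britny"), correct("pyhton"))
```

Because candidates come from the (hypothetical) logs rather than a dictionary, a name like "britney", which a standard dictionary may lack, is still a valid correction target.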
  3. 3:24 – 5:05

    Character.ai’s full-stack, direct-to-consumer bet inspired by Google

    Noam explains why Character.ai prioritizes launching a general-purpose AI product directly to consumers rather than starting with vertical B2B applications. He argues that controlling the full stack—from research to product—enables speed, iteration, and co-design across the system.

    • LLMs are unusually versatile and easy to use (conversation as interface)
    • D2C-first strategy: launch to everyone and let use cases emerge
    • Full-stack approach enables end-to-end optimization and faster learning cycles
  4. 5:05 – 6:01

    Motivation and leverage: pushing AI forward as the highest-impact path

    The conversation shifts to what drives Noam personally: curiosity, enjoyment of hard technical problems, and a belief in AI as leverage on global challenges. Rather than tackling domains directly (e.g., medicine), he frames advancing AI capability as a force multiplier.

    • AI work is intrinsically motivating: making computers do what they can’t
    • Leverage argument: better AI can accelerate progress across many fields
    • Acknowledges huge solvable problems (disease, aging, etc.)
  5. 6:01 – 8:20

    Mission as humility: “a billion users inventing a billion use cases”

    Noam lays out a mission philosophy grounded in humility and user agency. Character.ai’s guiding motto is to build something broadly capable and let the world decide what it’s for—illustrated by users repurposing game characters into therapists.

    • Humility about predicting or controlling societal outcomes
    • Core motto: many users discovering many uses
    • Entertainment/companionship/emotional support emerged strongly
    • Users’ agency is central to product direction
  6. 8:20 – 9:21

    Explaining the growth engine: launching, generality, and real human need

    Noam attributes Character.ai’s rapid traction (millions of users, massive message volume) to a simple combination: they shipped, the product stayed general, and it meets a widespread need for connection. He contrasts this with large-company hesitancy and brand risk.

    • Growth driver #1: actually launching the product
    • General tool unlocks organic discovery of use cases
    • Demand-side pull: loneliness/need to talk is widespread
    • Large companies can be slower due to perceived brand risk
  7. 9:21 – 10:29

    Ethical tension: does AI companionship reduce or improve human connection?

    Harry challenges whether machine conversation could pull people away from real relationships. Noam emphasizes valuing human connection, and suggests the product can help some users practice socially—while acknowledging the ultimate effect depends on user choices.

    • Concern: AI as substitute for human relationships
    • Noam’s stance: human connection has moral and practical value
    • Potential benefit: practice tool for social anxiety
    • Outcome is user-dependent and hard to generalize
  8. 10:29 – 11:41

    Core product dilemma: keep it general without sacrificing usability/quality

    Noam describes the central product challenge: building something both highly general and genuinely usable. He rejects the standard PM advice to narrow into verticals, arguing that neural language models uniquely enable broad capability without hand-crafted specialization.

    • Perceived trade-off: versatility vs usability
    • Character.ai explicitly resists narrowing to verticals
    • General-purpose usability is a primary design goal
    • Quality comes from model capability rather than rules/handcrafted flows
  9. 11:41 – 16:50

    Why neural language models scale: next-word prediction, data abundance, and capability gains

    Noam explains the conceptual simplicity behind modern LLMs: predicting the next word. He contrasts fragile rule-based systems with neural models, notes the abundance of web-scale training data, and outlines how capabilities expanded from translation toward open-ended conversation.

    • LLMs reduce to a simple objective: next-token prediction
    • Rule-based NLP was complex, brittle, and non-generalizing
    • Massive training data is available (e.g., Common Crawl)
    • Early killer app was machine translation; conversation required more scale
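To make the next-word objective concrete, here is a toy stand-in. It is a minimal Python sketch, assuming a count-based bigram model over a tiny made-up corpus instead of a neural network. Real LLMs use a Transformer over subword tokens, but the training loss, the average negative log-probability of each actual next token, is the same quantity.

```python
import math
from collections import Counter, defaultdict


def train_bigram(tokens: list[str]) -> dict[str, Counter]:
    """Count how often each token follows each preceding token."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_probs(counts: dict[str, Counter], prev: str) -> dict[str, float]:
    """Turn follow-counts into a probability distribution over the next token."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return {tok: c / total for tok, c in followers.items()} if total else {}


def cross_entropy(counts: dict[str, Counter], tokens: list[str]) -> float:
    """Average negative log-probability of each actual next token (the language-model loss)."""
    losses = [-math.log(next_token_probs(counts, p)[n]) for p, n in zip(tokens, tokens[1:])]
    return sum(losses) / len(losses)


corpus = "the cat sat on the mat the cat ran".split()
model = train_bigram(corpus)
probs = next_token_probs(model, "the")
print(max(probs, key=probs.get))  # prints "cat"
print(cross_entropy(model, corpus))
```

Training a neural language model means adjusting its parameters to push this cross-entropy down on web-scale text; the bigram table simply reads its probabilities off raw counts.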
  10. 16:50 – 18:40

    What really limits frontier models: compute, training time, and a $2M run

    The discussion moves into the economics of training: compute requirements grow with both model size and training duration, so increasing both multiplies the cost. Noam states that compute is the main constraint and shares that Character.ai spent about $2M in compute to train a model the previous summer, with plans to improve through better hardware and longer training.

    • Compute is the dominant bottleneck (more than raw data access)
    • Bigger models and longer training drive cost up multiplicatively
    • Character.ai trained a serving model with ~$2M of compute
    • Better hardware and longer runs are the path to smarter models
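The multiplicative relationship can be sketched with a widely used rule of thumb that does not come from the episode: training a dense Transformer costs roughly 6 × N × D floating-point operations, where N is the parameter count and D is the number of training tokens. Below is a minimal Python sketch, assuming that approximation. The GPU throughput and hourly price are caller-supplied placeholders, not Character.ai's figures.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate dense-Transformer training compute (~6 FLOPs per parameter per token)."""
    return 6.0 * n_params * n_tokens


def training_cost_usd(flops: float, sustained_flops_per_gpu: float, usd_per_gpu_hour: float) -> float:
    """Convert FLOPs into dollars given an assumed sustained per-GPU throughput and hourly price."""
    gpu_hours = flops / sustained_flops_per_gpu / 3600.0
    return gpu_hours * usd_per_gpu_hour


# Doubling both model size and training tokens quadruples the compute bill.
base = training_flops(1e9, 2e10)
bigger = training_flops(2e9, 4e10)
print(bigger / base)  # 4.0
```

Better hardware raises `sustained_flops_per_gpu` and lowers the dollar cost of the same FLOPs. A longer run raises D, which is why the two levers Noam mentions trade off against each other.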
  11. 18:40 – 21:04

    Proprietary conversation data and privacy: learning from users without leaking them

    Harry asks about defensibility via proprietary data versus broadly available internet corpora. Noam argues user interaction data is valuable for preference and product tuning, but stresses privacy risks if conversations are naively used for training, since models could regurgitate private information.

    • User data helps identify what people like and how they use the system
    • Pairing broad general knowledge with smaller task-specific signals mirrors how humans learn
    • Privacy is a central constraint; naive training can cause memorization/leakage
    • Emphasis on aggregate learning and careful handling of sensitive content
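The episode does not detail Character.ai's privacy pipeline. One common way to learn from usage in aggregate is to keep a signal only when enough distinct users share it, so that one person's private text never stands out on its own. This is shown here as an assumed illustration, not the company's method. Below is a minimal Python sketch, assuming hypothetical `(user_id, signal)` events and a threshold `k`.

```python
from collections import defaultdict


def aggregate_signals(events: list[tuple[str, str]], k: int) -> dict[str, int]:
    """Map each signal to its distinct-user count, keeping only signals seen from at least k users."""
    users_by_signal: dict[str, set[str]] = defaultdict(set)
    for user_id, signal in events:
        users_by_signal[signal].add(user_id)  # repeated events from one user count once
    return {s: len(users) for s, users in users_by_signal.items() if len(users) >= k}


# Hypothetical events: a shared preference and one user's private text.
events = [
    ("u1", "likes:pirate-captain"),
    ("u2", "likes:pirate-captain"),
    ("u3", "likes:pirate-captain"),
    ("u1", "text:u1-private-note"),  # unique to one user, so it is dropped at k=3
    ("u1", "text:u1-private-note"),
]
print(aggregate_signals(events, k=3))  # {'likes:pirate-captain': 3}
```

A count threshold alone is not a formal privacy guarantee. Techniques such as differential privacy offer stronger protection.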
  12. 21:04 – 24:13

    Startups vs incumbents, open vs closed: speed, scale economics, and ecosystem pluralism

    Noam explains why Character.ai works better as a standalone startup: faster shipping and fewer constraints from existing products. He predicts multiple winners across startups, big companies, universities, and individuals, with both open and closed approaches coexisting—while noting scale advantages for training and serving.

    • Standalone advantage: speed and willingness to launch
    • ‘Users win’ framing: many viable players and approaches
    • Open + closed ecosystems will both thrive; more small-scale tinkering boosts research
    • Economies of scale matter for training and serving efficiency (batching)
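The batching point can be illustrated with a simplified cost model, which is an assumption rather than anything stated in the episode. If generating each token is limited by memory bandwidth, every decoding step must stream all model weights from memory once, however many requests share that step, so the weight-reading cost per request falls as 1/batch size. Below is a minimal Python sketch with hypothetical parameter names. It ignores per-request state such as attention key/value caches, which does not amortize.

```python
def weight_bytes_per_token(n_params: float, bytes_per_param: float, batch_size: int) -> float:
    """Bytes of weights read per generated token when one decoding step serves batch_size requests."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # All weights are read once per step, and that read is shared by every request in the batch.
    return n_params * bytes_per_param / batch_size


# A hypothetical 1B-parameter model stored in 2-byte weights, at several batch sizes.
for batch in (1, 8, 64):
    print(batch, weight_bytes_per_token(1e9, 2, batch))
```

Under this model, a service with many concurrent users can fill larger batches per step and serve each request more cheaply than a small deployment that runs mostly unbatched.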
  13. 24:13 – 26:39

    AI perspective reset: best applications aren’t invented yet; hallucinations as a feature

    Noam challenges today’s narratives by comparing the moment to the invention of electricity/computers: we don’t yet know the killer apps. He also reframes hallucinations as a feature for early use cases (entertainment, companionship, creativity), letting product-market fit emerge where the tech is naturally strong.

    • We’re early: the most valuable uses are still unknown
    • Historical analogy: electricity/computers before widespread application discovery
    • Hallucinations can be beneficial in creative/entertainment contexts
    • Strategy: ship general tools and let natural-fit use cases surface
  14. 26:39 – 30:05

    Leadership, usefulness over fun, and the parenthood-induced mindset shift

    Noam reflects on becoming CEO and why he intends to remain in the role: to ensure the company makes the right decisions while he still contributes technically. He describes a shift toward prioritizing meaningful usefulness over immediate fun, influenced by parenthood and a sense of responsibility.

    • CEO transition: continues technical work alongside leadership
    • Staying CEO to guide key decisions
    • Personal ethic: optimize for usefulness/meaning over fun
    • Parenthood and faith themes: gratitude, responsibility, maturity
  15. 30:05 – 36:32

    Quick-fire: rapid predictions on AI acceleration, research noise, and an unpredictable 10-year horizon

    In the closing quick-fire, Noam predicts rapid advancement in the next 1–3 years, and emphasizes Character.ai is more than entertainment—it’s a full-stack AI and product company. He critiques the ‘alchemy’ state of ML publishing (signal amid noise), shares an early mistake about sparsity vs hardware realities, and declines to predict where the company will be in 2033 due to compounding technological change.

    • AI will get far smarter; momentum in hardware and research
    • Near-term adoption: significant progress in 1–3 years
    • Character.ai positioning: AI quality and product excellence are aligned
    • ML research is noisy; positive experimental results drive adoption
    • Lesson learned: hardware favors dense ops; sparsity must respect compute realities
    • 2033 outlook: impossible to forecast; agility matters
