Lenny's Podcast
Dr. Fei-Fei Li: Why world models come next, not bigger LLMs
ImageNet, AlexNet, and GPUs set the modern AI recipe; today's LLMs still cannot reliably count chairs in a video, and world models are how that changes.
CHAPTERS
- 0:00 – 9:37
Fei-Fei Li’s humanist view of AI: optimism with accountability
Lenny frames Fei-Fei as an “AI optimist,” and she clarifies that she’s not utopian—AI’s impact on jobs and society is real, and outcomes depend on human choices. She emphasizes personal responsibility and the idea that AI is created by people and affects people, so governance and ethics can’t be an afterthought.
- •AI as a net positive historically, but always a double-edged sword
- •“Nothing artificial about AI”: people inspire, build, and are impacted by it
- •Responsibility across development, deployment, and application
- •Why everyone—not just technologists—should care about AI’s direction
- 9:37 – 14:15
From Turing to machine learning: how AI evolved before the recent boom
Fei-Fei gives a rapid history of AI from the 1950s origins through expert systems and early neural nets, to the rise of machine learning in the late ’80s through early 2000s. She explains how statistical learning shifted the field away from purely rule-based systems toward pattern learning and generalization.
- •Foundations: Turing’s question and the 1956 Dartmouth workshop
- •Early eras: logic systems, expert systems, and initial neural network exploration
- •Machine learning as a marriage of programming + statistical learning
- •Learning as generalization (recognizing the ‘fourth cat’ after seeing a few)
- 14:15 – 16:46
Why visual intelligence mattered—and why object recognition became the ‘north star’
Entering AI around 2000, Fei-Fei chose to pursue visual intelligence, arguing human cognition is deeply grounded in perception and spatial understanding. She and her students focused on object recognition as a core building block for interpreting and interacting with the world.
- •Humans are ‘deeply visual animals’; language is important but not sufficient
- •Object recognition as a practical abstraction layer for real-world interaction
- •The combinatorial complexity of objects in images (infinite variations)
- •Why solving perception is foundational for broader intelligence
- 16:46 – 17:46
The ImageNet insight: big data as the missing ingredient
Fei-Fei describes the key bottleneck: models couldn’t learn without large, well-labeled datasets. She connects human learning and evolution to ‘big data’ experience, motivating the ambition to curate internet-scale labeled images for training.
- •Early AI pain point: strong models, insufficient training data
- •Human learning as experience-rich ‘big data’ learning
- •The ambition to capture object concepts at internet scale (for its time)
- •Using WordNet-style taxonomy to organize visual concepts
- 17:46 – 21:09
Building ImageNet and the 2012 inflection point that sparked modern deep learning
Fei-Fei recounts curating 15 million images across 22,000 concepts, open-sourcing ImageNet, and creating the annual challenge. The 2012 breakthrough combined ImageNet data with neural nets and GPUs—AlexNet’s success crystallized the ‘golden recipe’ that still underpins today’s AI scaling.
- •ImageNet: 15M images, 22K concepts, open-source release
- •ImageNet Challenge as a community catalyst
- •2012: Hinton’s team + two GPUs + big data = major leap in recognition
- •The enduring trio: data + neural nets + GPUs (now at massive scale)
- 21:09 – 23:54
When ‘AI’ was a dirty word: branding whiplash and the myth of lone heroes
They discuss how recently companies avoided the term “AI” and how quickly it flipped into a default label. Fei-Fei also cautions against Silicon Valley’s tendency to credit single individuals, stressing AI’s progress as multi-generational and collective.
- •2015–2016: some companies avoided saying “AI”
- •2017-ish: the marketing shift to calling themselves ‘AI companies’
- •Cultural critique: over-attributing achievements to one person
- •AI as a 70-year, multi-generation scientific endeavor
- 23:54 – 29:51
AGI skepticism: definitions, marketing, and what current systems still can’t do
Fei-Fei questions whether AGI is meaningfully defined and argues it often functions more as marketing than science. She highlights gaps in today’s AI—from simple spatial tasks to profound scientific creativity and emotional intelligence—underscoring the need for continued innovation beyond scaling.
- •AGI lacks a single scientific definition; ‘AI vs AGI’ may be a false divide
- •Scaling laws matter, but innovation is far from finished
- •Examples of shortcomings: spatial counting in videos, robust world understanding
- •Creativity/abstraction (Newton-level theory) and emotional intelligence remain unsolved
- 29:51 – 40:57
World models and spatial intelligence: the missing layer beyond language
Fei-Fei explains why language models inspired her yet also revealed what’s missing: embodied, spatial understanding needed for real-world reasoning and action. She defines world models as systems that can generate, navigate, interact within, and reason about 3D worlds—useful for humans and robots alike.
- •Motivation: intelligence includes spatial, perceptual, and situational awareness
- •World model capabilities: create worlds, navigate them, interact and reason inside them
- •Connection to embodied AI/robotics and to human augmentation
- •World Labs formed to move fast on spatial intelligence and world modeling
- 40:57 – 48:03
Robotics and the ‘bitter lesson’: why data scaling is harder in the physical world
Prompted by Ben Horowitz’s question, Fei-Fei explains Sutton’s ‘bitter lesson’ (simple methods plus lots of data tend to win) and why robotics complicates it. Robots need action-centric 3D data, and they are physical systems with hardware, safety requirements, supply chains, and long product cycles, which makes them more like self-driving cars than chatbots.
- •Bitter lesson: simple models + massive data beat complex models + little data
- •Robotics data is harder: actions in 3D worlds don’t match abundant web data formats
- •Need for teleoperation and synthetic data to bridge the gap
- •Robots’ physical constraints and long timelines (self-driving as a 20-year example)
- 48:03 – 52:50
Marble launch: the first ‘prompt-to-3D-worlds’ product from World Labs
They introduce Marble as an application built on World Labs’ frontier world model that generates genuinely navigable 3D worlds from prompts (text and images). Fei-Fei describes the small, research-heavy team and the effort required to reach the point where users can explore worlds—optionally in VR.
- •Marble as a product layer on top of a frontier 3D world model
- •Prompting with text/images to create navigable, immersive 3D scenes
- •Team composition (~30) blending research, engineering, product, and design
- •Delight details: the ‘dot’ visualization as intentional onboarding/UX, not the model itself
- 52:50 – 1:02:23
Marble in practice: VFX, games, robot simulation, and unexpected research uses
Fei-Fei outlines early real-world use cases and why 3D structure matters more than ‘just video generation.’ Examples include virtual production (with Sony), game development via mesh export, synthetic environments for robotics training, and even psychology research needing controllable immersive scenes.
- •Virtual production/VFX: camera-aligned 3D worlds; reported 40× faster production
- •Game development: exporting meshes and building experiences on top
- •Robotics: generating diverse synthetic environments for training and simulation
- •Unexpected demand: psychology/psychiatry experiments needing varied immersive scenes
- •How Marble differs from video models: interactive 3D world structure and exportability
- 1:02:23 – 1:10:29
Founder lessons and career through-line: fearlessness, mission, and the right teams
Fei-Fei reflects on founding World Labs and the intensity of competition for both technology and talent in today’s AI landscape. She attributes her career choices—Stanford, SAIL leadership, Google, and now World Labs—to intellectual fearlessness, prioritizing mission and people over obsessing about failure cases, and encourages young talent to do the same.
- •Surprises of founding now: intense competition and high-cost talent markets
- •Career pattern: take risks for mission and exceptional collaborators
- •Leadership moves: restarting tenure clock, directing SAIL, joining Google to work with top AI leaders
- •Advice to candidates: don’t over-optimize; prioritize passion, mission alignment, and team faith
- 1:10:29 – 1:14:52
Human-Centered AI at Stanford: research + policy + governance for a civilizational tech
Fei-Fei explains why she co-founded Stanford HAI after seeing AI’s broad societal stakes during her Google tenure. HAI spans interdisciplinary research across Stanford’s schools and emphasizes policy bridges—helping connect Silicon Valley with governments and producing tools like the AI Index and policy programs.
- •HAI founded in 2018 to anchor AI in human-centered values
- •Interdisciplinary scope: medicine, law, humanities, business, sustainability, and more
- •Policy focus: congressional bootcamps, briefings, AI Index, regulatory engagement
- •Goal: ensure deployment and governance preserve human agency and benefit society
- 1:14:52 – 1:19:33
Closing message: every profession has a role in AI—protect dignity and agency
In the finale, Fei-Fei addresses the most common global question she hears: will AI take over my work and life? She argues everyone—artists, teachers, nurses, accountants, farmers—has a role in shaping how AI is used, and that no technology should erode human dignity; AI should be a tool for augmentation and societal benefit.
- •AI anxiety across professions is the central human question
- •Universal inclusion: everyone should have a voice in AI’s use and governance
- •AI as an augmenting tool (storytelling, healthcare support, easier work)
- •Non-negotiable principle: protect human dignity and agency
- •Where to find World Labs and Marble (worldlabs.ai)