a16z

How Fei-Fei Li Is Rebuilding AI for the Real World

What if the next leap in artificial intelligence isn’t about better language, but a better understanding of space? In this episode, a16z General Partner Erik Torenberg moderates a conversation with Fei-Fei Li, cofounder and CEO of World Labs, and Martin Casado, a16z General Partner and early investor in the company. Together, they explore the concept of world models: AI systems that understand and reason about the physical, 3D world, not just text. Fei-Fei, often called the “godmother of AI,” explains why spatial intelligence is a critical (and missing) component of today's AI systems, and why her new company is going all-in on solving this challenge. Martin shares the story of how he and Fei-Fei aligned on this vision long before it was trendy, and why it may define the future of robotics, creativity, and computation itself. From the limitations of LLMs to the promise of embodied AI, from personal anecdotes to deep technical insights, this is a discussion of what it truly means to build intelligence for the real (and virtual) world.
Timecodes:
00:00 Spatial Intelligence
00:39 Fei-Fei Li’s Background
01:17 Building a World Model
05:14 Reflecting on AI's Evolution
08:07 The Importance of 3D Understanding
10:20 Unrolling Evolution: Why 3D Intelligence Is Harder Than Language
12:19 From Single Reality to Infinite Virtual Universes
16:52 3D vs 2D: Why 2D Isn’t Enough for Machines
17:57 Fei-Fei’s Personal Story of Losing Stereo Vision
19:24 Research and Development at World Labs

Resources:
Find Fei-Fei on X: https://x.com/drfeifei
Find Martin on X: https://x.com/martin_casado
Learn more about World Labs: https://www.worldlabs.ai/

Stay Updated:
Let us know what you think: https://ratethispodcast.com/a16z
Find a16z on Twitter: https://twitter.com/a16z
Find a16z on LinkedIn: https://www.linkedin.com/company/a16z
Subscribe on your favorite podcast app: https://a16z.simplecast.com/
Follow our host: https://x.com/eriktorenberg

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.

Fei-Fei Li (guest), Erik Torenberg (host), Martin Casado (guest)
Jun 4, 2025 · 22m

CHAPTERS

  1. Spatial intelligence as the next frontier beyond language

    Fei-Fei Li frames “space”—the 3D world outside us and in our mind’s eye—as a core component of intelligence. She argues that progress in AI will increasingly require models that understand and operate in 3D, not just in words.

  2. Fei-Fei Li’s impact: making data central to modern AI

    Martin Casado summarizes Fei-Fei’s major contributions and why she’s often called the “godmother of AI.” He emphasizes her role in elevating data as a first-class driver of AI progress, alongside model architecture.

  3. Why World Labs needed an “intellectual partner” investor

    Fei-Fei explains why she chose Martin as the first investor: not just for capital, but for deep technical alignment and ongoing collaboration. She describes World Labs as “deep tech” requiring sustained, high-conviction partnership.

  4. The “world model” dinner moment: aligning on what’s missing

    Martin recounts a key conversation during an LLM-hype dinner where Fei-Fei crystallized the idea: AI is missing a true “world model.” Fei-Fei adds that many people nodded at the phrase without understanding it, so she tested alignment by asking Martin to define it precisely.

  5. Looking back at AI’s evolution: surprise at data-driven emergence

    Asked what would surprise her younger self, Fei-Fei highlights the emotional and scientific surprise of how far data-hungry models have gone. She notes emergent behaviors that feel like “thinking machines,” even given her long-standing belief in data-centric AI.

  6. Why LLMs aren’t enough: language is lossy for the physical world

    Fei-Fei argues language is powerful but an incomplete encoding of reality—especially the 3D physical world where perception, interaction, and embodiment matter. She contrasts language’s generative nature with the grounded, ever-present structure of the perceptual world.

  7. A thought experiment: blindfolded instructions vs seeing the room

    Martin illustrates why words fail as a substitute for spatial representation: describing a room to a blindfolded person is inadequate for precise tasks. Vision (and internal 3D reconstruction) enables accurate manipulation and navigation.

  8. “Unrolling evolution”: why 3D intelligence is harder than language

    The discussion argues that language capabilities arrived first in AI partly because spatial navigation is deeply rooted in evolution and far more demanding. They point to decades of expensive robotics and autonomous vehicle efforts as evidence that world interaction remains difficult.

  9. Why 3D matters: science, creativity, and human breakthroughs depend on space

    Fei-Fei connects spatial reasoning to major human achievements, from deciphering DNA’s double helix to understanding molecular structures like buckyballs. The point is that core reasoning and innovation often require 3D mental models, not just verbal ones.

  10. From one reality to infinite virtual universes (the multiverse vision)

    Fei-Fei describes how combining reconstruction and generation could let us create “infinite universes” for robotics training, creativity, travel, socialization, and storytelling. The promise is a horizontal platform, akin to LLMs, but for spatial worlds.

  11. 3D vs 2D: why 2D isn’t enough for machines

    They argue that physics and interaction happen in 3D, so machine agents need explicit depth and geometry—especially for tasks like measuring distances and grasping objects. Humans can infer 3D from 2D video, but robots/computers need that structure represented directly.

  12. Fei-Fei’s stereo vision injury: a personal proof of 3D’s importance

    Fei-Fei recounts temporarily losing stereo vision due to a cornea injury and how it made driving feel unsafe. The experience highlights how critical depth perception is for accurate distance estimation and real-world behavior.

  13. Inside World Labs R&D: the 3D toolkit and team composition

    Fei-Fei outlines the state of the field and the building blocks World Labs is combining: NeRFs, Gaussian splats, image generation, and broader advances from academia and industry. Both she and Martin emphasize that success requires a rare blend of AI/modeling, data, and computer graphics expertise to represent and render 3D worlds effectively.
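
A small illustration of the point made in chapters 11 and 12: the same 2D pixel separation can correspond to very different real-world distances, so a machine needs depth to measure anything. This is only a sketch under a standard pinhole camera model. The function names, focal length, and image center below are illustrative assumptions, not anything described in the episode or used by World Labs.

```python
import math

def backproject(u, v, depth, f=500.0, cx=320.0, cy=240.0):
    """Lift pixel (u, v) with a known depth (meters) to a 3D point in the camera frame.

    f, cx, cy are assumed intrinsics (focal length and image center, in pixels).
    """
    x = (u - cx) * depth / f
    y = (v - cy) * depth / f
    return (x, y, depth)

def distance_3d(p, q):
    """Euclidean distance between two 3D points."""
    return math.dist(p, q)

# Two pixels that are 100 px apart in the image.
left_px, right_px = (270, 240), (370, 240)

# Same pixels, but the objects sit at different depths.
near = distance_3d(backproject(*left_px, depth=1.0), backproject(*right_px, depth=1.0))
far = distance_3d(backproject(*left_px, depth=4.0), backproject(*right_px, depth=4.0))

print(f"100 px apart at 1 m depth: {near:.2f} m")
print(f"100 px apart at 4 m depth: {far:.2f} m")
```

With these assumed intrinsics, the identical 100-pixel gap works out to about 0.2 m at 1 m depth and about 0.8 m at 4 m depth. The 2D image alone cannot distinguish the two cases; the depth value is what makes the measurement possible.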
