Yann Lecun: Meta AI, Open Source, Limits of LLMs, AGI & the Future of AI | Lex Fridman Podcast #416


Lex Fridman Podcast · Mar 7, 2024 · 2h 47m

Yann LeCun (guest), Lex Fridman (host)

- Limits of current autoregressive LLMs for achieving human-level intelligence
- World models, self-supervised learning, and Joint Embedding Predictive Architectures (JEPA/V-JEPA)
- Reasoning, planning, hierarchical control, and energy-based models
- Reinforcement learning vs. model predictive control and curiosity-driven learning
- Open-source AI, bias, censorship, and the dangers of centralized proprietary models
- Safety, AI doom narratives, and comparisons to past technological revolutions
- Future of robotics, embodied AI, and advice for researchers entering the field

In this episode of the Lex Fridman Podcast, Yann LeCun joins Lex Fridman to discuss the limits of large language models, world models and JEPA, open-source AI, and why he rejects AGI doom scenarios.

Yann LeCun Rejects LLM Hype, Champions Open-Source, Embodied World-Models

Yann LeCun argues that current autoregressive large language models (LLMs) are powerful but fundamentally limited, lacking real understanding, persistent memory, reasoning, and planning, and therefore cannot on their own lead to human-level intelligence.

He proposes a different path via self-supervised learning on rich sensory data using Joint Embedding Predictive Architectures (JEPAs) and world models that predict abstract representations of reality rather than pixels or tokens.

LeCun strongly advocates open‑source AI as essential to prevent dangerous centralization of informational power in a few companies, and sees diverse, locally fine‑tuned models as the only way to preserve democratic, cultural, and linguistic plurality.

He is skeptical of AGI doom scenarios, arguing that intelligence will emerge gradually, will be controllable via objective-driven architectures and guardrails, and that AI will likely make humanity collectively smarter rather than extinguish it.

Key Takeaways

Autoregressive LLMs are powerful but structurally incapable of full intelligence.

LLMs generate tokens one-by-one without an internal, language-independent plan, lack persistent memory, can’t robustly reason or plan in the physical world, and are trained on low-bandwidth text rather than rich sensory data that actually grounds human and animal intelligence.
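To make the token-by-token limitation concrete, here is a minimal sketch of autoregressive decoding, with a toy bigram table standing in for a real LLM's next-token distribution (the table, vocabulary, and function names are illustrative, not anything from the episode):

```python
import random

# Toy bigram "language model": next-token probabilities conditioned only
# on the previous token (a stand-in for a real LLM's softmax output).
BIGRAM = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a":   {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 1.0},
    "dog": {"ran": 1.0},
    "sat": {"</s>": 1.0},
    "ran": {"</s>": 1.0},
}

def generate(max_len=10, seed=0):
    """Sample one token at a time; each choice is committed immediately,
    with no internal, language-independent plan for the whole utterance."""
    rng = random.Random(seed)
    tokens, cur = [], "<s>"
    for _ in range(max_len):
        dist = BIGRAM[cur]
        cur = rng.choices(list(dist), weights=list(dist.values()))[0]
        if cur == "</s>":
            break
        tokens.append(cur)
    return tokens
```

Every token is sampled from a distribution conditioned only on the prefix, which is exactly the structure LeCun argues cannot support deliberate planning.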

World models must be learned from high-bandwidth sensory data, not text alone.

LeCun emphasizes that a child’s visual stream far exceeds text in information content; most of our commonsense and intuitive physics are learned from interacting with the world, so AI must similarly learn from images and video, not just language corpora.

Joint Embedding Predictive Architectures (JEPAs) offer a better route than generative pixel or token prediction.

Instead of reconstructing every missing pixel or word, JEPAs predict abstract representations of unobserved parts from observed parts, stripping away unpredictable noise and forcing the model to learn higher-level, task-relevant structure that transfers well to recognition and action.
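A minimal sketch of the JEPA idea described above, with toy linear encoders and a prediction loss computed in representation space rather than input space (the dimensions, random linear maps, and names are all illustrative assumptions, not the published architecture):

```python
import random

random.seed(0)
DIM_X, DIM_Z = 8, 3  # toy input and representation sizes

def rand_matrix(rows, cols):
    return [[random.gauss(0, 0.3) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

# Context encoder, target encoder, and predictor (all toy linear maps).
enc_ctx = rand_matrix(DIM_Z, DIM_X)
enc_tgt = rand_matrix(DIM_Z, DIM_X)
predictor = rand_matrix(DIM_Z, DIM_Z)

def jepa_loss(observed, unobserved):
    """Predict the representation of the unobserved part from the observed
    part; the error lives in abstract space, not pixel/token space, so
    unpredictable low-level detail never has to be reconstructed."""
    z_ctx = matvec(enc_ctx, observed)     # embed what was seen
    z_tgt = matvec(enc_tgt, unobserved)   # embed what was masked
    z_pred = matvec(predictor, z_ctx)     # predict the masked embedding
    return sum((p - t) ** 2 for p, t in zip(z_pred, z_tgt))
```

The key design choice is where the loss is computed: in a generative model it would compare reconstructed pixels or tokens, whereas here it compares abstract representations.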

Future systems should think via optimization in abstract space, then decode to language.

LeCun proposes objective‑driven, energy-based models that internally optimize over continuous latent representations (world-modelled ‘thoughts’) to minimize a compatibility or ‘energy’ score between question and answer, and only then decode into text—unlike today’s direct token sampling.
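A toy sketch of inference-as-optimization in this style: descend an energy function over a continuous latent "thought", and only afterwards would decoding to text happen. The quadratic energy and the `latent_target` mapping are invented for illustration; a real energy-based model would learn both:

```python
def latent_target(question):
    """Stand-in for a learned mapping from a question to latent space."""
    return [float(ord(c) % 5) for c in question[:3]]

def energy(z, question):
    """Compatibility score between a latent 'thought' z and the question;
    lower energy means the thought better answers it."""
    return sum((zi - ti) ** 2 for zi, ti in zip(z, latent_target(question)))

def think(question, steps=200, lr=0.1):
    """Inference as optimization: descend the energy over the continuous
    latent before any text is emitted (decoding would come after)."""
    target = latent_target(question)
    z = [0.0] * len(target)
    for _ in range(steps):
        # analytic gradient of the quadratic energy: 2 * (z - target)
        z = [zi - lr * 2 * (zi - ti) for zi, ti in zip(z, target)]
    return z
```

The contrast with today's LLMs is that the expensive step is an inner optimization over a continuous representation, not direct token-by-token sampling.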

Reinforcement learning should be minimized and used mainly to correct models and objectives.

He sees RL as sample-inefficient; the main learning should come from self-supervised world-model building and model-predictive control, with RL reserved for updating world models and reward/objective functions when predictions fail or outcomes differ from intended goals.
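The model-predictive-control loop LeCun prefers over pure RL can be sketched with a toy one-dimensional world model and a random-shooting planner (the dynamics, cost, goal, and hyperparameters are all illustrative assumptions):

```python
import random

def world_model(state, action):
    """Toy learned dynamics: the state drifts by the action taken."""
    return state + action

def cost(state, goal=3.0):
    """Distance of the final state from an assumed goal."""
    return abs(state - goal)

def mpc_plan(state, horizon=5, n_samples=200, seed=0):
    """Model-predictive control by random shooting: imagine many action
    sequences with the world model and keep the cheapest one. No reward
    signal from the real environment is consumed during planning."""
    rng = random.Random(seed)
    best_seq, best_cost = None, float("inf")
    for _ in range(n_samples):
        seq = [rng.uniform(-2, 2) for _ in range(horizon)]
        s = state
        for a in seq:
            s = world_model(s, a)  # roll out in imagination
        if cost(s) < best_cost:
            best_seq, best_cost = seq, cost(s)
    return best_seq, best_cost
```

In LeCun's framing, RL would enter only when a rollout's prediction fails in the real world, to correct `world_model` or the cost function, rather than to drive all learning.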

Open-source foundation models are critical to avoid dangerous concentration of power.

Because future AI assistants will mediate nearly all digital interactions, LeCun argues it is democratically unacceptable for a few companies to control them; open-source base models like LLaMA enable governments, NGOs, companies, and communities to build culturally and politically diverse systems.

AGI ‘doomer’ scenarios are implausible; intelligence will emerge gradually with guardrails.

He rejects the idea of a sudden superintelligence ‘event’ or an intrinsically domination-seeking AI species; drives like dominance are evolved instincts in social animals, not inevitable features of intelligence, and objective-driven systems can be designed to obey humans with layered guardrails, refined iteratively like safety in jet engines.

Notable Quotes

We're fooled by their fluency. We just assume that if a system is fluent in manipulating language, then it has all the characteristics of human intelligence, but that impression is false.

Yann LeCun

If you're really interested in human-level AI, abandon the idea of generative AI.

Yann LeCun

We cannot afford those systems to come from a handful of companies on the West Coast of the US because those systems will constitute the repository of all human knowledge.

Yann LeCun

Intelligence is a collection of skills and an ability to acquire new skills efficiently.

Yann LeCun

AI is gonna make humanity smarter… It’s as if every one of us will have a staff of smart AI assistants. We shouldn’t feel threatened by this any more than we should feel threatened by being the manager of people smarter than us.

Yann LeCun

Questions Answered in This Episode

If JEPAs are so promising, what are the main technical roadblocks to scaling them to human-level world models and planning?

How could we practically implement the kind of energy-based, objective-driven ‘thinking before speaking’ architecture LeCun describes on top of existing LLMs?

What governance or standards would be needed to ensure that open-source AI remains safe while still preserving the diversity and decentralization LeCun advocates?

How might hierarchical planning and representation learning actually emerge in practice, rather than being hand-designed into robot or agent architectures?

In what concrete ways might widespread AI assistants change education, work, and political discourse, and how can societies prepare for both the benefits and the disruptions?

Transcript Preview

Yann LeCun

I see the danger of this concentration of power through, uh, through proprietary AI systems as a much bigger danger than everything else. What works against this is people who think that for reasons of security, we should keep AI systems under lock and key, because it's too dangerous to put it in the hands of- of everybody. That would lead to a very bad future (laughs) in which all of our information diet is controlled by a small number of, uh, uh, companies with proprietary systems.

Lex Fridman

I believe that people are fundamentally good, and so if AI, especially open source AI, can, um, make them smarter, it just empowers the goodness in humans.

Yann LeCun

So I sh- I share that feeling, okay? I think people are fundamentally good. (laughs) Uh, and in fact a lot of doomers are doomers because they don't think that people are fundamentally good.

Lex Fridman

The following is a conversation with Yann LeCun, his third time on this podcast. He is the chief AI scientist at Meta, professor at NYU, Turing Award winner, and one of the seminal figures in the history of artificial intelligence. He and Meta AI have been big proponents of open sourcing AI development, and have been walking the walk by open sourcing many of their biggest models, including LLaMA 2 and eventually LLaMA 3. Also, Yann has been an outspoken critic of those people in the AI community who warn about the looming danger and existential threat of AGI. He believes that AGI will be created one day, but it will be good, it will not escape human control, nor will it dominate and kill all humans. At this moment of rapid AI development, this happens to be somewhat a controversial position. And so it's been fun seeing Yann get into a lot of intense and fascinating discussions online, as we do in this very conversation. This is the Lex Fridman Podcast. To support it, please check out our sponsors in the description. And now, dear friends, here's Yann LeCun. You've had some strong statements, technical statements, about the future of artificial intelligence recently, throughout your career actually, but recently as well. Uh, you've said that autoregressive LLMs are, uh, not the way we're going to make progress towards superhuman intelligence. These are the large language models like GPT-4, like LLaMA 2 and 3 soon, and so on. How do they work and why are they not going to take us all the way?

Yann LeCun

For a number of reasons. The first is that there is a number of characteristics of intelligent behavior. For example, the capacity to understand the world, understand the physical world. The ability to remember and retrieve things. Um, persistent memory. The ability to reason and the ability to plan. Those are four essential characteristics of intelligent, um, systems or entities. Humans, animals. LLMs can do none of those, or they can only do them in a very primitive way, and, uh, they don't really understand the physical world, they don't really have persistent memory, they can't really reason, and they certainly can't plan. And so, eh, you know, if- if- if you expect a system to become intelligent just, eh, you know, without having the possibility of doing those things, uh, you're making a mistake. That is not to say that autoregressive LLMs are not useful, they're certainly useful. Um, that they are not interesting, that we can't build a whole ecosystem of, uh, applications around them. Of course we can. But as a path towards human level intelligence, they're missing essential competence. And then there is another tidbit or- or fact that I think is very interesting. Those LLMs are trained on enormous amounts of text. Basically the entirety of all publicly available texts on the internet, right? That's typically on the order of, uh, 10 to the 13th tokens. Each token is typically two bytes, so that's two times 10 to the 13th bytes as training data. It would take you or me 170,000 years to just read through this at eight hours a day. (laughs) Uh, so it seems like an enormous amount of knowledge, right, that those systems can accumulate. Um, but then you realize it's really not that much data. 
If you- you talk to developmental psychologists and they tell you a four-year-old has been awake for 16,000 hours in his or her life, um, and the amount of information that has, uh, reached the visual cortex of that child in four years, um, is about 10 to the 15th bytes, and you can compute this by estimating that the, uh, optical nerve carry about 20 megabit- megabytes per second roughly. And so 10 to the 15th bytes for a four-year-old versus two times 10 to the 13th bytes for 170,000 years worth of reading, what that tells you is that, uh, through sensory input we see a lot more information than we- than we do through language, and that despite our intuition, most of what we learn and most of our knowledge is through our observation and interaction with the real world, not through language. Everything that we learn in the first few years of life, and, uh, certainly everything that animals learn, has nothing to do with language.
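The figures quoted above can be sanity-checked with a few lines of arithmetic, using the episode's own estimates:

```python
# Sanity check on the data-bandwidth figures quoted in the episode.
SECONDS_PER_HOUR = 3600

# Text: ~10^13 tokens at ~2 bytes per token.
text_bytes = 1e13 * 2

# Vision: ~20 MB/s through the optic nerve, ~16,000 waking hours by age four.
visual_bytes = 20e6 * 16_000 * SECONDS_PER_HOUR  # ≈ 1.15e15, i.e. ~10^15

# Implied reading speed behind the 170,000-year figure (8 hours/day):
reading_seconds = 170_000 * 365 * 8 * SECONDS_PER_HOUR
bytes_per_second = text_bytes / reading_seconds  # ≈ 11 bytes/s, ~250 words/min
```

The visual estimate works out to roughly fifty times the text corpus, consistent with the "10 to the 15th versus 2 times 10 to the 13th" comparison, and the implied reading rate is a plausible human reading speed.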
