Yann LeCun: Deep Learning, ConvNets, and Self-Supervised Learning | Lex Fridman Podcast #36
In this episode of the Lex Fridman Podcast, Yann LeCun discusses deep learning, convolutional networks, and why self-supervised learning of rich predictive world models, rather than ever-bigger supervised or reinforcement learning systems, is the path to human-level AI.
WHAT IT’S REALLY ABOUT
Yann LeCun outlines path to human-level AI through self-supervision
- Yann LeCun discusses the limitations of current AI, arguing that real progress toward human-level intelligence requires self-supervised learning and rich predictive models of the world rather than just bigger supervised or reinforcement learning systems.
- He contrasts symbolic, logic-based AI with gradient-based neural approaches, emphasizing continuous representations, working memory, and planning as keys to enabling reasoning in neural networks.
- LeCun explores ethical and societal issues via HAL 9000, legal systems as "objective functions," and the non-generality of human intelligence, stressing that grounding in physical reality and common sense is essential for true language understanding.
- He also reflects on deep learning’s history, why neural nets briefly fell out of favor, the role of benchmarks and open-source tools, and why emotions, causality, and model-based reinforcement learning will be central to future autonomous systems.
IDEAS WORTH REMEMBERING
7 ideas
AI safety parallels human lawmaking: objective functions are like legal codes.
LeCun frames AI alignment as an extension of what societies already do with laws—designing objective functions (rules, penalties) that shape behavior toward the common good, suggesting AI ethics will fuse computer science and jurisprudence rather than invent something entirely new.
Deep learning works by violating classical theory, and that’s informative.
Modern neural nets with huge parameter counts and non-convex objectives train successfully on relatively modest data via stochastic gradient descent, contradicting pre–deep learning textbooks; this empirical success implies our theoretical understanding of generalization and optimization was too narrow.
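To make the contradiction concrete, here is a minimal toy sketch (my own construction, not from the episode): a two-layer network whose roughly 900 parameters far outnumber its 30 training points, trained by plain stochastic gradient descent on a non-convex loss, still fits the data and typically generalizes, where classical worst-case bounds would predict severe overfitting.

```python
import numpy as np

rng = np.random.default_rng(0)

# 30 training points; a two-layer net with ~900 parameters. The loss is
# non-convex in (W1, W2), and parameters far outnumber samples.
x = rng.uniform(-1, 1, (30, 1))
y = np.sin(3 * x)

W1 = rng.normal(0, 1.0, (1, 300))
b1 = np.zeros(300)
W2 = rng.normal(0, 0.1, (300, 1))

def forward(inp):
    h = np.tanh(inp @ W1 + b1)
    return h, h @ W2

lr = 0.05
for step in range(20000):                 # plain SGD, one sample at a time
    i = rng.integers(30)
    h, pred = forward(x[i:i+1])
    err = pred - y[i:i+1]                 # d(loss)/d(pred) for squared error
    gW2 = h.T @ err
    g = (err @ W2.T) * (1 - h ** 2)       # backprop through tanh
    W2 -= lr * gW2
    W1 -= lr * (x[i:i+1].T @ g)
    b1 -= lr * g[0]

xt = np.linspace(-1, 1, 200)[:, None]
mse = np.mean((forward(xt)[1] - np.sin(3 * xt)) ** 2)
print(f"test MSE: {mse:.4f}")             # typically small despite 900 params vs 30 points
```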
Reasoning in neural nets requires working memory, recurrence, and world models.
LeCun argues that human-like reasoning emerges from systems with hippocampus-like memory, recurrent access to that memory, and energy-minimization style planning (model predictive control), not from static feed-forward models alone.
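As a hedged illustration of the planning half of this idea, the sketch below implements random-shooting model predictive control on a toy 1-D point mass: the agent acts by minimizing predicted cost over imagined futures under an internal model. The dynamics, cost, and all constants are illustrative assumptions, not anything specified in the episode.

```python
import numpy as np

rng = np.random.default_rng(1)
DT, HORIZON, CANDIDATES = 0.1, 15, 200

def model(state, action):
    """Internal world model: a point mass whose position integrates velocity."""
    pos, vel = state
    return (pos + DT * vel, vel + DT * action)

def cost(state):
    """The 'energy' to minimize: squared distance from the goal at 1.0."""
    return (state[0] - 1.0) ** 2

def plan(state):
    """Random-shooting MPC: imagine futures under the model, keep the best first action."""
    best_action, best_cost = 0.0, float("inf")
    for _ in range(CANDIDATES):
        actions = rng.uniform(-1, 1, HORIZON)
        s, c = state, 0.0
        for a in actions:                  # roll out entirely in imagination
            s = model(s, a)
            c += cost(s)
        if c < best_cost:
            best_cost, best_action = c, actions[0]
    return best_action

state = (0.0, 0.0)
for _ in range(60):                        # execute one action, then re-plan
    state = model(state, plan(state))
print(f"final position: {state[0]:.3f}")   # approaches the goal at 1.0
```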
Symbolic logic is brittle and hard to learn; continuous representations scale better.
He critiques logic- and graph-based expert systems for their brittleness and manual knowledge acquisition bottleneck, advocating vector-based “symbols” and continuous functions (à la Hinton and Bottou) as a way to make reasoning compatible with gradient-based learning.
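A minimal sketch, under my own toy setup, of what vector-based "symbols" buy you: retrieval by softmax over dot-product similarity is soft and differentiable, so gradients can flow through the lookup, and a slightly perturbed query still succeeds where an exact-match symbol table would fail.

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 8

# "Symbols" become vectors (random stand-ins here for learned embeddings).
symbols = {w: rng.normal(size=dim) for w in ("cat", "dog", "car")}

def soft_match(query):
    """Differentiable retrieval: softmax over dot-product similarities
    instead of brittle exact-match symbol lookup."""
    keys = np.stack(list(symbols.values()))
    scores = keys @ query
    weights = np.exp(scores - scores.max())
    return weights / weights.sum()

# A perturbed "cat" vector still matches mostly "cat"; a symbolic table
# would miss entirely on any mismatch.
noisy_cat = symbols["cat"] + 0.1 * rng.normal(size=dim)
print(dict(zip(symbols, soft_match(noisy_cat).round(3))))
```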
Self-supervised learning is crucial for common sense and data efficiency.
LeCun sees self-supervised prediction (e.g., masked word prediction, video/frame prediction) as the primary route to learning rich world models that later make supervised and reinforcement learning vastly more sample-efficient, mirroring how babies learn physics and causality from observation.
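The mechanics of the objective are easy to show. Below is a minimal sketch, with a toy sentence and whitespace tokenizer standing in as assumptions, of how masked-word prediction mints labeled training pairs from raw text with no human annotation.

```python
# Every word becomes a (masked context -> missing word) training pair;
# the "labels" are carved out of the data itself.
text = "the baby watches objects fall and learns intuitive physics"
tokens = text.split()

pairs = []
for i, target in enumerate(tokens):
    context = tokens[:i] + ["[MASK]"] + tokens[i + 1:]
    pairs.append((" ".join(context), target))

for context, target in pairs[:3]:
    print(f"{context!r:70} -> {target!r}")
# A model trained to fill these blanks must absorb the structure of the world
# the text describes; the same trick extends to masked image patches and
# future video frames.
```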
Current RL is far from human learning; model-based approaches are needed.
He notes that deep RL systems need the equivalent of years or centuries of experience to reach human performance in games, whereas humans learn tasks like driving in tens of hours because they rely on internal predictive models of physics, not just trial-and-error reward signals.
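A minimal sketch of the sample-efficiency argument, under toy assumptions of my own (linear "physics", noise-free transitions): a predictive dynamics model fit from a couple dozen random transitions can then drive a planner like the MPC sketch above, in place of millions of real trial-and-error samples.

```python
import numpy as np

rng = np.random.default_rng(3)

def true_step(s, a):
    """Hidden environment: position integrates velocity, velocity integrates action."""
    return np.array([s[0] + 0.1 * s[1], s[1] + 0.1 * a])

# Only 25 random transitions, tiny by model-free deep RL standards.
X, Y = [], []
for _ in range(25):
    s, a = rng.uniform(-1, 1, 2), rng.uniform(-1, 1)
    X.append([s[0], s[1], a])
    Y.append(true_step(s, a))

# Least-squares fit of a linear dynamics model: s' ~= [s, a] @ M.
M, *_ = np.linalg.lstsq(np.array(X), np.array(Y), rcond=None)

# The learned model now predicts unseen transitions almost exactly,
# so imagined rollouts can replace real experience during planning.
s, a = np.array([0.3, -0.5]), 0.7
print("model:", np.append(s, a) @ M, " truth:", true_step(s, a))
```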
Human intelligence is highly specialized and not truly “general.”
Using arguments about the structure of the visual system and the vast space of possible Boolean functions, LeCun contends that humans operate over a tiny subset of possible tasks and stimuli—our sense of “generality” is confined to what we can even conceptualize.
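The combinatorics behind this argument are easy to check: there are 2^(2^n) distinct Boolean functions of n binary inputs (each of the 2^n input rows can independently map to 0 or 1), so the space of possible "tasks" explodes beyond anything a brain or machine could enumerate. A quick sketch of the growth:

```python
import math

# Count of distinct Boolean functions of n binary inputs: 2**(2**n).
for n in (2, 4, 8, 16):
    digits = math.floor(2 ** n * math.log10(2)) + 1   # digit count of 2**(2**n)
    print(f"n={n:2d}: 2^{2 ** n} possible functions (~{digits} digits)")
```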
WORDS WORTH SAVING
5 quotes
Machine learning is the science of sloppiness.
— Yann LeCun
Intelligence is inseparable from learning. The idea you can create an intelligent machine by basically programming was a non-starter for me from the start.
— Yann LeCun
We’re not going to have autonomous intelligence without emotions.
— Yann LeCun
Human intelligence is nothing like general. It’s very, very specialized.
— Yann LeCun
The main problem we need to solve is: how do we learn models of the world? That’s what self-supervised learning is all about.
— Yann LeCun
QUESTIONS ANSWERED IN THIS EPISODE
5 questions
If self-supervised world modeling is so central, what specific architectures or objective functions might finally crack uncertainty-aware video and image prediction?
How can we rigorously benchmark “common sense” and grounding in AI systems beyond language tasks like the Winograd schemas?
What would a practical, legally informed “objective function” for a powerful general-purpose AI actually look like in code or system design?
To what extent can large language models acquire genuine causal understanding from text alone, and where will they fundamentally need non-linguistic grounding?
How might model-based reinforcement learning and self-supervision be combined in real-world domains like autonomous driving to avoid the sample inefficiency of current RL methods?