WTF is Artificial Intelligence Really? | Yann LeCun x Nikhil Kamath | People by WTF Ep #4

Nikhil Kamath and Yann LeCun on AI’s roots, limits, and its next architecture frontier.

Nikhil Kamath (host) · Yann LeCun (guest)
Nov 27, 2024 · 1h 36m
Engineer vs scientist; building to understand
AI as elephant: multiple facets of intelligence
GOFAI: logic, rules, search, planning
Machine learning types: supervised, reinforcement, self-supervised
Backpropagation and multilayer neural networks
ConvNets vs Transformers (inductive biases/equivariance)
LLMs: next-token prediction, strengths and limitations
Memory in AI: parameters vs context vs external memory
World models, planning, Kahneman System 1 vs System 2
JEPA and learning from video
Open-source platforms, distributed training, sovereign AI
India-focused opportunities: data centers, inference cost, vertical apps

In this episode of People by WTF, Nikhil Kamath talks with Yann LeCun about AI’s roots, limits, and its next architecture frontier. LeCun frames AI as a long-running quest to understand and build intelligence, arguing the field historically split between top-down reasoning/search (GOFAI) and bottom-up learning from data (machine learning/deep learning).

At a glance

WHAT IT’S REALLY ABOUT

Yann LeCun explains AI’s roots, limits, and next architecture frontier

  1. Yann LeCun frames AI as a long-running quest to understand and build intelligence, arguing the field historically split between top-down reasoning/search (GOFAI) and bottom-up learning from data (machine learning/deep learning).
  2. He explains core modern tools—backpropagation, convolutional nets, transformers, and self-supervised learning—and why LLMs excel at language while still lacking robust reasoning, persistent memory, and physical-world understanding.
  3. LeCun argues the next leap requires systems that learn “world models” from video and support planning (System 2), not just token-by-token generation (System 1). He presents JEPA (Joint Embedding Predictive Architecture) as a path to predict in abstract representation space rather than pixel space.
  4. He closes with pragmatic advice: build on open-source foundation models (e.g., Llama), fine-tune for vertical use-cases (legal, accounting, enterprise knowledge, health, local-language assistants), invest in local compute/inference infrastructure, and expect open-source dominance within ~5 years.

IDEAS WORTH REMEMBERING

9 ideas

AI is a problem space, not a single technique.

LeCun emphasizes AI as the investigation of intelligence; different eras focused on different “parts of the elephant,” from reasoning/search to learning/perception. This framing helps avoid equating “AI” with today’s LLMs.

Two historical branches shaped AI: search/reasoning and learning/perception.

GOFAI treated intelligence as planning and rule-based inference (dominant until the 1990s), while neural-net learning pursued brain-inspired adaptation. Modern AI largely comes from the learning branch, but planning/search will return via world-models.
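
As a concrete illustration of the GOFAI style (not code from the episode), here is a minimal breadth-first-search planner over a hand-coded toy state graph; the states and actions are invented for the example:

```python
from collections import deque

def bfs_plan(start, goal, successors):
    """Breadth-first search: return a shortest action sequence from start to goal."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, plan = frontier.popleft()
        if state == goal:
            return plan
        for action, next_state in successors(state):
            if next_state not in visited:
                visited.add(next_state)
                frontier.append((next_state, plan + [action]))
    return None  # no plan exists

# Toy state graph (hypothetical states and actions, purely illustrative).
graph = {
    "A": [("move-right", "B")],
    "B": [("move-right", "C"), ("move-up", "D")],
    "C": [],
    "D": [("move-right", "goal")],
}
print(bfs_plan("A", "goal", lambda s: graph.get(s, [])))
# -> ['move-right', 'move-up', 'move-right']
```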

Deep learning’s breakthrough was multilayer networks trained by backpropagation.

Single-layer perceptrons were too limited; stacking layers with nonlinearities enabled learning complex functions (e.g., real-world vision). Backprop remains the foundation of most practical AI systems.
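
A minimal sketch of that idea, assuming nothing beyond NumPy: a two-layer network trained by backpropagation on XOR, the classic function a single-layer perceptron cannot represent:

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR inputs and targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # hidden layer
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # output layer
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for step in range(5000):
    # Forward pass through the stacked nonlinear layers.
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # Backward pass: the chain rule carries the error from the
    # squared loss back to every weight — this is backpropagation.
    dp = (p - y) * p * (1 - p)
    dW2, db2 = h.T @ dp, dp.sum(0)
    dh = (dp @ W2.T) * (1 - h**2)
    dW1, db1 = X.T @ dh, dh.sum(0)
    # Gradient step.
    for param, grad in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        param -= 0.5 * grad

print(np.round(p.ravel(), 2))  # approaches [0, 1, 1, 0]
```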

Architectures matter because they bake in “biases” that reduce data needs.

ConvNets exploit translation structure in images/audio (nearby pixels correlate), while transformers handle sets/sequences via permutation-equivariant blocks. Matching architecture to data structure improves sample efficiency and performance.
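
A tiny numeric sketch of the ConvNet bias (illustrative, not from the episode): a 1-D convolution is translation-equivariant, so shifting the input simply shifts the output:

```python
import numpy as np

def conv1d_valid(x, k):
    """Plain 1-D valid convolution (correlation) — the ConvNet building block."""
    n = len(x) - len(k) + 1
    return np.array([np.dot(x[i:i + len(k)], k) for i in range(n)])

x = np.array([0., 0., 1., 2., 1., 0., 0., 0.])
k = np.array([1., -1.])                    # a tiny edge-detector kernel

y = conv1d_valid(x, k)
y_shift = conv1d_valid(np.roll(x, 1), k)   # shift the input by one step

# Equivariance: the shifted input produces a shifted copy of the same output.
print(y)        # [ 0. -1. -1.  1.  1.  0.  0.]
print(y_shift)  # [ 0.  0. -1. -1.  1.  1.  0.]
```

Because the same kernel is reused at every position, the network never has to relearn the same feature at each location — exactly the sample-efficiency point made above.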

LLMs are powerful language manipulators but weak world-modelers.

Because autoregressive LLMs operate in discrete token spaces, they can learn linguistic/statistical regularities and retrieve knowledge, yet still make “stupid mistakes” about physics/causality. LeCun argues scaling alone won’t yield human-level intelligence.
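
The autoregressive loop LeCun is critiquing has a simple structure; in this sketch, `next_token_probs` is a random stand-in for a trained model (not a real LLM) just to show the shape — one token sampled at a time, conditioned on everything so far:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat", "."]

def next_token_probs(context):
    """Stand-in for a trained LLM: any function mapping a token
    sequence to a probability distribution over the vocabulary.
    (Here it ignores context and returns random probabilities.)"""
    logits = rng.normal(size=len(vocab))
    return np.exp(logits) / np.exp(logits).sum()

tokens = ["the", "cat"]
for _ in range(4):
    p = next_token_probs(tokens)
    tokens.append(vocab[int(rng.choice(len(vocab), p=p))])  # sample next token
print(" ".join(tokens))
```

Nothing in this loop simulates the world or checks consequences; it only extends the sequence, which is why discrete-token fluency can coexist with physical-reasoning failures.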

Next-gen AI needs persistent memory and planning (System 2), not just generation (System 1).

LLMs mainly store information in parameters and short context windows; they lack a hippocampus-like episodic memory and efficient deliberative search. Human-like intelligence requires models that can simulate outcomes of actions and plan hierarchically.
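
A toy sketch of the third kind of memory — external storage queried by similarity and fed back through the context window. The bag-of-words "embedding" here is a stand-in for a trained encoder and vector index:

```python
def embed(text):
    """Stand-in embedding: word overlap instead of a learned encoder."""
    return set(text.lower().replace("?", "").replace(".", "").split())

# External memory: facts stored outside the model's weights and context window.
memory = [
    "meeting with the auditor is on friday",
    "the q3 report is due next week",
]

def recall(query, k=1):
    """Retrieve the k most similar stored items for the query."""
    q = embed(query)
    return sorted(memory, key=lambda fact: -len(q & embed(fact)))[:k]

prompt = "when do i meet the auditor"
context = recall(prompt) + [prompt]   # retrieved memory re-enters via the context
print(context)
```

The structural point: nothing persists inside the model itself; anything "remembered" must be written to and read from a store outside the network, which is the gap LeCun contrasts with hippocampus-like episodic memory.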

JEPA aims to make video prediction tractable by predicting in representation space.

Pixel-level future prediction is intractable in high-dimensional continuous worlds; JEPA encodes video into abstract embeddings and predicts future embeddings, discarding unpredictable details. This is positioned as a path to learning physical-world structure from video.
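
A JEPA-shaped sketch in PyTorch (an illustration of the idea, not Meta's implementation): the loss is computed between predicted and actual embeddings, never between pixels:

```python
import torch
import torch.nn as nn

dim = 32
encoder = nn.Sequential(nn.Linear(64, dim), nn.ReLU(), nn.Linear(dim, dim))
predictor = nn.Linear(dim, dim)

context_frames = torch.randn(8, 64)   # stand-ins for encoded past video frames
target_frames = torch.randn(8, 64)    # stand-ins for future frames

s_context = encoder(context_frames)
with torch.no_grad():                 # target embeddings held fixed in this sketch;
    s_target = encoder(target_frames) # real systems use an EMA copy of the encoder

# Predict in representation space: unpredictable pixel detail that the
# encoder discards never appears in the objective.
loss = ((predictor(s_context) - s_target) ** 2).mean()
loss.backward()
print(float(loss))
```

A real JEPA also needs a mechanism to prevent representational collapse (the encoder mapping everything to a constant), which this sketch omits.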

Open-source foundation models will dominate; value shifts to fine-tuning and vertical expertise.

LeCun forecasts an ecosystem analogous to Linux: portable, flexible, cheaper, and not controlled by one entity. Startups differentiate by domain data, workflows, and specialized fine-tunes rather than generic model access via APIs.
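
A hedged sketch of that fine-tuning path using Hugging Face transformers and peft (LoRA); the checkpoint name, target modules, and hyperparameters below are illustrative choices, not recommendations from the episode:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-3.1-8B"      # any open checkpoint you have access to
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA: train small low-rank adapters instead of all base weights, so a
# domain-specific fine-tune (legal, accounting, local-language) stays cheap.
config = LoraConfig(r=8, lora_alpha=16,
                    target_modules=["q_proj", "v_proj"],
                    task_type="CAUSAL_LM")
model = get_peft_model(model, config)
model.print_trainable_parameters()    # typically well under 1% of the weights
# ...then train on the vertical's corpus with the usual causal-LM objective.
```

The differentiation LeCun describes lives in the domain data and workflow around this step, not in the base model itself.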

Inference economics (not just training) will determine mass adoption in India.

He argues India needs local compute for both sovereignty and scale, and notes inference costs have dropped ~100× in two years. For widespread use, token-level inference must become extremely cheap (e.g., “a million tokens for a few rupees”).
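
Back-of-envelope arithmetic shows why the per-token price dominates at India scale; every number below is an assumption for illustration, not a figure from the episode:

```python
# Assumed target price in line with "a million tokens for a few rupees".
price_per_million_tokens_rupees = 3
tokens_per_user_per_day = 20_000      # assumed heavy assistant usage
users = 500_000_000                   # assumed India-scale deployment

daily_cost = (users * tokens_per_user_per_day / 1_000_000
              * price_per_million_tokens_rupees)
print(f"daily inference bill: ₹{daily_cost:,.0f}")  # ₹30,000,000/day here
```

Even at a few rupees per million tokens, a population-scale assistant runs a bill in the crores per day — which is why inference cost, not training cost, gates mass adoption.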

WORDS WORTH SAVING

6 quotes

AI is more of a problem than a solution.

Yann LeCun

LLMs are not the path to human-level intelligence.

Yann LeCun

The smartest LLMs are not as smart as your house cat.

Yann LeCun

Reinforcement learning is a situation where you don't tell the system what the correct answer is, you just tell it whether the answer it produced was good or bad.

Yann LeCun

Instead of predicting pixels, we predict abstract representations of those pixels, where all the things that are basically unpredictable have been eliminated.

Yann LeCun

Five years from now, the world is going to be dominated by open source platforms.

Yann LeCun

QUESTIONS ANSWERED IN THIS EPISODE

5 questions

If AI is a “problem space,” what practical definition of intelligence is most useful for builders—skills, learning speed, or zero-shot problem solving?

What are the clearest examples where GOFAI-style search/planning should be combined with deep learning today (beyond toy demos)?

You say LLM reasoning via generating many candidates and searching is inefficient—what would an efficient System-2 architecture look like operationally (modules, memory, objectives)?

In JEPA, what exactly counts as “unpredictable details” that should be removed from representations, and how do you prevent removing important causal factors?

What benchmarks would convincingly show a model has learned a real “world model” from video rather than shortcut correlations?
