Ilya Sutskever: Deep Learning | Lex Fridman Podcast #94
At a glance
WHAT IT’S REALLY ABOUT
Ilya Sutskever on Deep Learning’s Power, Limits, and Future AGI
- Lex Fridman and Ilya Sutskever trace the modern deep learning revolution from AlexNet and ImageNet to today’s large-scale vision, language, and reinforcement learning systems.
- They discuss why neural networks work so well despite being over‑parameterized, including ideas like double descent, cost functions, and the advantages of transformers over recurrent architectures.
- The conversation explores the possibility of neural networks achieving reasoning, building long‑term knowledge, transferring from simulation to reality, and eventually reaching AGI, along with the safety, control, and societal implications of such systems.
- They close by reflecting on human values, meaning, happiness, and how AGI might be aligned to help humans flourish rather than dominate them.
IDEAS WORTH REMEMBERING
Scale in data, compute, and model size unlocked deep learning’s potential.
Most core ideas (backprop, convolution, recurrence) existed for decades; what changed around ImageNet was abundant labeled data, fast GPUs/CUDA, and a few people with the conviction to push very large end‑to‑end models.
Over‑parameterized neural networks can generalize extremely well, often better as they grow.
The double descent phenomenon shows that test error peaks when a model is just barely large enough to fit the training data (zero training loss, the interpolation threshold) and then improves again as the model keeps growing, suggesting that very large networks with appropriate training are less sensitive to noise than mid‑sized ones (a minimal numerical sketch follows this list).
Architectural unification across modalities is increasing, with transformers as a key step.
Vision and language now share many optimization techniques, and NLP has largely converged on transformers; Sutskever expects further unification across vision, language, and even reinforcement learning, possibly into a single flexible architecture.
Neural networks can likely reason, but only when the task truly requires reasoning.
Examples like AlphaZero’s Go play and human intelligence itself serve as existence proofs; networks tend to learn the easiest solution to the posed task, so explicit reasoning only appears when simpler pattern‑matching fails.
Large language models learn nontrivial semantics as they scale.
Experiments such as the ‘sentiment neuron’ and GPT‑2 show that small models capture syntax and surface patterns, while larger ones begin to encode sentiment, facts, and higher‑level meaning, even without explicit linguistic theory baked in.
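As an illustration of the double-descent idea above (not from the episode), here is a minimal sketch using minimum-norm linear regression on random ReLU features of a noisy 1‑D target. The setup, feature counts, and noise level are illustrative assumptions; how pronounced the second descent looks depends on the random seed and the noise.

```python
# Minimal double-descent sketch (illustrative, not from the episode):
# minimum-norm linear regression on random ReLU features of a noisy
# 1-D target. Test error typically spikes near the interpolation
# threshold (n_features ~ n_train) and falls again as features grow.
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    return np.sin(2 * np.pi * x)

n_train, n_test, noise = 40, 500, 0.3
x_train = rng.uniform(-1, 1, n_train)
y_train = target(x_train) + noise * rng.standard_normal(n_train)
x_test = rng.uniform(-1, 1, n_test)
y_test = target(x_test)

def relu_features(x, w, b):
    # Fixed random ReLU features: max(0, x*w_j + b_j) for each feature j.
    return np.maximum(0.0, np.outer(x, w) + b)

for n_feat in [5, 10, 20, 40, 80, 160, 640, 2560]:
    w = rng.standard_normal(n_feat)
    b = rng.standard_normal(n_feat)
    phi_train = relu_features(x_train, w, b)
    phi_test = relu_features(x_test, w, b)
    # lstsq returns the minimum-norm solution once the system is
    # underdetermined (more features than training points).
    coef, *_ = np.linalg.lstsq(phi_train, y_train, rcond=None)
    test_mse = np.mean((phi_test @ coef - y_test) ** 2)
    print(f"features={n_feat:5d}  test MSE={test_mse:8.3f}")
```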
WORDS WORTH SAVING
The most beautiful thing about deep learning is that it actually works.
— Ilya Sutskever
Every year we thought, ‘Okay, this is peak deep learning.’ Nope, it goes further.
— Ilya Sutskever
I think we are still massively underestimating deep learning.
— Ilya Sutskever
I think it’s unwise to bet against deep learning.
— Ilya Sutskever
It will be possible to program an AGI in such a way that it will have a deep drive to help humans flourish.
— Ilya Sutskever