Ishan Misra: Self-Supervised Deep Learning in Computer Vision | Lex Fridman Podcast #206

Lex Fridman (host), Ishan Misra (guest)

Jul 31, 2021 | 2h 30m

EPISODE INFO

Released: July 31, 2021
Duration: 2h 30m
Channel: Lex Fridman Podcast

EPISODE DESCRIPTION

Ishan Misra is a research scientist at FAIR working on self-supervised visual learning. Please support this podcast by checking out our sponsors:
Onnit: https://lexfridman.com/onnit to get up to 10% off
The Information: https://theinformation.com/lex to get 75% off first month
Grammarly: https://grammarly.com/lex to get 20% off premium
Athletic Greens: https://athleticgreens.com/lex and use code LEX to get 1 month of fish oil
EPISODE LINKS:
Ishan's twitter: https://twitter.com/imisra_
Ishan's website: https://imisra.github.io
Ishan's FAIR page: https://ai.facebook.com/people/ishan-misra/
PODCAST INFO:
Podcast website: https://lexfridman.com/podcast
Apple Podcasts: https://apple.co/2lwqZIr
Spotify: https://spoti.fi/2nEwCF8
RSS: https://lexfridman.com/feed/podcast/
Full episodes playlist: https://www.youtube.com/playlist?list=PLrAXtmErZgOdP_8GztsuKi9nrraNbKKp4
Clips playlist: https://www.youtube.com/playlist?list=PLrAXtmErZgOeciFP3CBCIEElOJeitOr41
OUTLINE:
0:00 - Introduction
2:27 - Self-supervised learning
11:02 - Self-supervised learning is the dark matter of intelligence
14:54 - Categorization
23:28 - Is computer vision still really hard?
27:12 - Understanding Language
36:51 - Harder to solve: vision or language
43:36 - Contrastive learning & energy-based models
47:37 - Data augmentation
51:57 - Fixed audio spike by lowering sound with pen tool
1:00:10 - Real data vs. augmented data
1:03:54 - Non-contrastive learning energy based self supervised learning methods
1:07:32 - Unsupervised learning (SwAV)
1:10:14 - Self-supervised Pretraining (SEER)
1:15:21 - Self-supervised learning (SSL) architectures
1:21:21 - VISSL pytorch-based SSL library
1:24:15 - Multi-modal
1:31:43 - Active learning
1:37:22 - Autonomous driving
1:48:49 - Limits of deep learning
1:52:57 - Difference between learning and reasoning
1:58:03 - Building super-human AI
2:05:51 - Most beautiful idea in self-supervised learning
2:09:40 - Simulation for training AI
2:13:04 - Video games replacing reality
2:14:18 - How to write a good research paper
2:18:45 - Best programming language for beginners
2:19:39 - PyTorch vs TensorFlow
2:23:03 - Advice for getting into machine learning
2:25:09 - Advice for young people
2:27:35 - Meaning of life
SOCIAL:
Twitter: https://twitter.com/lexfridman
LinkedIn: https://www.linkedin.com/in/lexfridman
Facebook: https://www.facebook.com/lexfridman
Instagram: https://www.instagram.com/lexfridman
Medium: https://medium.com/@lexfridman
Reddit: https://reddit.com/r/lexfridman
Support on Patreon: https://www.patreon.com/lexfridman

SPEAKERS

Lex Fridman
host
Ishan Misra
guest

EPISODE SUMMARY

Self-Supervised Vision: Teaching Machines to See Without Human Labels

In episode #206 of the Lex Fridman Podcast, Lex Fridman and Ishan Misra dive into self-supervised learning, focusing on how machines can learn visual representations from raw data without human annotations. They contrast the supervised, semi-supervised, and self-supervised paradigms, explain key techniques such as masking, cropping, contrastive learning, and transformers, and explore why language has progressed faster than vision. The conversation covers large-scale systems like SwAV and SEER, and treats multimodal audio-visual learning, data augmentation, and active learning as crucial ingredients for scalable intelligence. They close by zooming out to the limits of deep learning, the role of embodiment and interaction, and philosophical questions about categories, reasoning, and the nature of intelligence.

RELATED EPISODES

Garry Kasparov: Chess, Deep Blue, AI, and Putin | Lex Fridman Podcast #46

Leonard Susskind: Quantum Mechanics, String Theory and Black Holes | Lex Fridman Podcast #41

Kai-Fu Lee: AI Superpowers - China and Silicon Valley | Lex Fridman Podcast #27

David Ferrucci: IBM Watson, Jeopardy & Deep Conversations with AI | Lex Fridman Podcast #44

Bjarne Stroustrup: C++ | Lex Fridman Podcast #48

Yann LeCun: Deep Learning, ConvNets, and Self-Supervised Learning | Lex Fridman Podcast #36
