Yann LeCun: Dark Matter of Intelligence and Self-Supervised Learning | Lex Fridman Podcast #258
- 0:00 – 0:36
Introduction
- Lex Fridman
The following is a conversation with Yann LeCun, his second time on the podcast. He is the chief AI scientist at Meta, formerly Facebook, professor at NYU, Turing Award winner, one of the seminal figures in the history of machine learning and artificial intelligence, and someone who is brilliant and opinionated in the best kind of way, and so is always fun to talk to. This is a Lex Fridman podcast. To support it, please check out our sponsors in the description, and now here's my conversation with Yann LeCun.
- 0:36 – 10:55
Self-supervised learning
- Lex Fridman
You co-wrote the article "Self-Supervised Learning: The Dark Matter of Intelligence," great title by the way, with Ishan Misra. So let me ask, what is self-supervised learning and why is it the dark matter of intelligence?
- Yann LeCun
I'll start with the dark matter part.
- Lex Fridman
(laughs)
- Yann LeCun
Uh, there is obviously a kind of learning that humans and animals are, uh, are doing that we currently are not reproducing properly with machines or with AI, right? So the most popular approaches to machine learning today are, or paradigms I should say, are supervised learning and reinforcement learning and they're extremely inefficient. Supervised learning requires many samples for learning anything, and reinforcement learning requires a ridiculously large number of trial and errors to, for, you know, a system to learn anything. Um, and that's why we don't have self-driving cars.
- Lex Fridman
(laughs) That was a big leap from one to the other. Okay. So to solve difficult problems, you have to have a lot of, uh, human annotation for supervised learning to work, and to solve those difficult problems with reinforcement learning, you have to have some way to maybe simulate that problem such that you can do the large-scale kind of learning that reinforcement learning requires.
- Yann LeCun
Right. So how is it that, you know, most teenagers can learn to drive a car in about 20 hours of, uh, practice, whereas, uh, even with millions of hours of simulated practice-
- Lex Fridman
Mm-hmm.
- Yann LeCun
... a self-driving car can't actually learn to drive itself properly? Um, and so obviously we're missing something, right? And, and it's quite obvious for a lot of people that, you know, the immediate response you get from many people is, "Well, you know, humans use their background knowledge to learn faster." And they're right. Now, how was that background knowledge acquired? And that's the big question. So now you have to ask, you know, how do babies in their first few months of life learn how the world works? Mostly by observation because they can hardly act in the world. Um, and they learn an enormous amount of background knowledge about the world that may be the, the basis of what we call common sense. Uh, this type of learning is not learning a task, it's not being reinforced for anything, it's just observing the world and figuring out how it, how it works. Building world models, learning world models. Um, how do we do this and how do we reproduce this in, in machines? So self-supervised learning is, you know, one instance or one attempt at trying to reproduce this kind of learning.
- Lex Fridman
Okay, so you're looking at just observation, so not even the interacting part of a child, it's just sitting there watching mom and dad walk around, pick up stuff, all of that. That's the, that's what you mean by background knowledge.
- Yann LeCun
Perhaps not even watching mom and dad just, you know-
- Lex Fridman
Watching-
- Yann LeCun
... watching the world go by. (laughs)
- Lex Fridman
Just having eyes open or having eyes closed, or the very act of opening and-
- Yann LeCun
Yeah.
- Lex Fridman
... closing eyes, that the world appears and disappears, all that basic information. And you're saying, in order to learn to drive, like, the reason humans are able to learn to drive quickly, some faster than others, is because of the background knowledge. They were able to watch cars operate in the world in the many years leading up to it, the physics, the basic objects, and all that kind of stuff.
- Yann LeCun
That's right. I mean, the basic physics of objects, you don't even know... You, you don't even need to know, you know, how a car works, right? Because that you can learn fairly quickly. I mean, the example I use very often is, uh, you're driving next to a cliff and you know in advance because of your, you know, understanding of intuitive physics that, uh, if you turn the wheel to the right, the car will veer to the right, will run off the cliff, fall off the cliff, and nothing good will come out of this, right? Um, but if you are a sort of, you know, tabula rasa reinforcement learning system that doesn't have a model of the world, uh, you have to repeat falling off this cliff thousands of times before you figure out it's a bad idea and then a few more thousand times before you figure out how to not do it, and then a few more million times before you figure out how to not do it in every situation you ever encounter.
- Lex Fridman
So self-supervised learning still has to have some source of truth being told to it by somebody.
- Yann LeCun
That's right. And it's-
- Lex Fridman
So you have to figure out a way, without human assistance or without a significant amount of human assistance, to get that truth from the world. So the mystery there is, um, how much signal is there, how much truth is there that the world gives you, whether it's the human world, like you watch YouTube or something like that, or it's the more natural world. So how much signal is there?
- Yann LeCun
So here's the trick, there is way more signal in sort of a self-supervised setting than there is in either a supervised or reinforcement setting. And this is going to my, you know, analogy of the cake.
- Lex Fridman
Yes.
- Yann LeCun
Uh, the, you know, le cake as someone has called it, where when you try to figure out how much information you ask the machine to predict and how much feedback you give the machine at every trial, in reinforcement learning you give the machine a single scalar. You tell the machine, "You did good. You did bad." And you, and you, and you only tell this to the machine once in a while. When I say you, it could be the, the universe telling the machine, right?
- Lex Fridman
Mm-hmm.
- Yann LeCun
Um, but it's just one scalar. And so as a consequence, there is... You, you cannot possibly learn something very complicated without many, many, many trials where you get many, many feedbacks of this type. Supervised learning, you, you give a few bits to the machine, uh, at every, every sample. Let's say you're training a, uh, system on, you know, recognizing im- images on ImageNet, there are 1,000 categories, that's a little less than 10 bits of information per sample. But self-supervised learning, here is the setting. You... Ideally, we don't know how to do this yet, but ideally, you would show a machine a segment of a video and then stop the video and ask the machine to predict what's going to happen next. And so you let the machine predict, and then you let time go by and, uh, show the machine what actually happened and hope the machine will, you know, learn to do a better job at predicting next time around. There's a huge amount of information you give the machine, because it's an entire video clip, uh, of, uh, you know, of the- the future after the video clip you fed it, um, in the first place.
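To put rough numbers on the feedback bandwidth being compared here, a minimal back-of-the-envelope sketch in Python. Only the roughly 10 bits for a 1,000-way ImageNet label comes from the conversation; the video figures and the compression factor are illustrative assumptions.

```python
import math

# Reinforcement learning: a single scalar reward, and only once in a while.
rl_bits_per_trial = 1  # order-of-magnitude placeholder

# Supervised learning: one label out of ImageNet's 1,000 categories.
supervised_bits = math.log2(1000)  # ~9.97 bits per sample

# Self-supervised video prediction: the "target" is an entire future clip.
# Illustrative assumption: one second of 256x256 RGB video at 30 fps,
# discounted by a factor of 1,000 for compressibility.
raw_bits = 256 * 256 * 3 * 8 * 30
selfsup_bits = raw_bits / 1000

print(f"RL:              ~{rl_bits_per_trial} bit per trial")
print(f"supervised:      ~{supervised_bits:.1f} bits per sample")
print(f"self-supervised: ~{selfsup_bits:,.0f} bits per sample")
```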
- Lex Fridman
So both for language and for vision, there's a subtle seemingly trivial construction, but maybe that's representative of what is required to create intelligence, which is fill in the gap. So-
- Yann LeCun
Fill in the gaps.
- Lex Fridman
It sounds dumb, but (laughs) is it possible you could solve all of intelligence in this way? For language, just give a sentence and continue it, or give a sentence with a gap in it, uh, some words blanked out, and you fill in what words go there. For vision, you get a sequence of images and predict what's going to happen next, or you fill in what happened in between.
- Yann LeCun
Something like that.
- Lex Fridman
Do you think it's possible that formulation alone as a signal for self-supervised learning can solve intelligence for vision and language?
- Yann LeCun
I think that's our best shot at the moment. Um, so-
- 10:55 – 16:46
Vision vs language
- Lex Fridman
Well, let's kind of give a high-level overview. What's the difference in, uh, kind and in difficulty between vision and language? So you said people haven't been able to really kind of crack the problem of vision open in terms of self-supervised learning, but that may not necessarily be because it's fundamentally more difficult. Maybe, like, passing the Turing test in the full spirit of the Turing test in language might be harder than vision. That's- that's not obvious. So in your view, which is harder, or perhaps are they just the same problem? The farther we get to solving each, the more we realize it's all the same thing, it's all the same cake.
- Yann LeCun
I think what I'm looking for are methods that make- make them look essentially like the same cake, but currently, they're not. And the main issue with learning world models or learning predictive models is that, uh, the prediction is never a single, uh, thing, uh, because the world is not entirely predictable.
- Lex Fridman
Yeah.
- Yann LeCun
It may be deterministic or stochastic, we can get into the philosophical discussion about it, but, uh, but even if it's deterministic, it's not entirely predictable. And so... if I play a short video clip and then I ask you to predict what's going to happen next, there are many, many plausible continuations for that, uh, video clip. And the number of continuations grows with the interval of time that you are asking the system to make a prediction, uh, for. And so one big question with self-supervised learning is how you represent this uncertainty, how you represent multiple discrete outcomes, how you represent a sort of continuum of possible outcomes, um, et cetera. And, you know, if you are a, sort of a, a classical machine learning person, you say, "Oh, you just represent the distribution," right? And that we know how to do when we're predicting words, missing words in a text, because, um, you can have a neural net give a score for every word in a dictionary. It's a big, you know, it's a big list of numbers, you know, maybe 100,000 or so, and you can turn them into a probability distribution that tells you, when I say a sentence, you know, "The cat is chasing the blank in the kitchen," you know, there are only a few words that make sense there, you know? It could be a mouse, or it could be a laser spot, or, you know, something like that, right? Um, and if I say, "The blank is chasing the blank in the savanna," you also have a bunch of plausible options for those two words, right?
- Lex Fridman
Mm-hmm.
- Yann LeCun
Um, that, that, that, because you, you have kind of a r- you know, underlying reality that you can refer to to sort of fill in those, those blanks. Um, so y- you cannot say for sure in the savanna if it's a, you know, a lion or a cheetah or whatever. You cannot know if it's a zebra or a gnu or, you know, whatever. Wildebeest, the same thing. Um, but, uh, but you can represent the uncertainty by just a, a long list of numbers. Now, if I, uh, if I do the same thing with video and I ask you to predict a, a video clip, it's not a discrete set of potential frames. You have to have some way of representing a sort of infinite number of, uh, plausible continuations of multiple frames in a, you know, high-dimensional continuous space. And we just have no idea how to do this properly.
- Lex Fridman
A finite high-dimensional. So like you, 'cause-
- Yann LeCun
It's finite high-dimensional, yes.
- Lex Fridman
Just like the words, they try to get it to, uh, down to a s- a small finite set of, like, under a million, something like that.
- Yann LeCun
Something like that.
- Lex Fridman
I mean, it's kinda ridiculous that we're doing a distribution over every single possible word for language, and it works. It feels like that's a really dumb way to do it. Um, like, it seems like there should be some more compressed representation of the distribution of the words.
- Yann LeCun
You're right about that.
- Lex Fridman
And so-
- Yann LeCun
I agree.
- Lex Fridman
Do you have any interesting ideas about how to represent all of reality in a compressed way, such that you can form a distribution over it?
- Yann LeCun
That's one of the big questions, you know? How do, how do you do that, right?
- Lex Fridman
(laughs)
- Yann LeCun
I mean, what's, what's kind of, you know, another thing that, that really is stupid about, um, I, I shouldn't say stupid, but, like, simplistic about-
- Lex Fridman
Okay.
- Yann LeCun
... current approaches to self-supervised learning in an, in, uh, NLP, in text, is that not only do you represent a giant distribution over words, but for multiple words that are missing, those distributions are essentially independent of each other.
- Lex Fridman
Hm.
- Yann LeCun
Um, and, you know, you don't pay too much of a price for this. So, you know, in the sentence that I gave earlier, uh, if the system gives a, a certain probability for a lion and, and a cheetah-
- Lex Fridman
Mm-hmm.
- Yann LeCun
... and then a certain probability for, uh, you know, a gazelle, a wildebeest, and a zebra, uh, those two distributions are independent-
- Lex Fridman
Mm-hmm.
- Yann LeCun
... of each other. Uh, and it's not the case that those things are independent. Lions actually attack, like, bigger animals than, than cheetahs. So, you know, there is a huge independence hypothesis in, in this, uh, process, which is not actually true. The reason for this is that we don't know how to properly represent distributions over combinatorial sequences of symbols, essentially, because the number of combinations grows exponentially with the length of the sequence. And so we have to use tricks for this, but, um, those tricks, you know, kind of get around the problem rather than deal with it. So, so the big question is, like, would there be some sort of abstract latent representation of text that would say that, you know, when I switch lion for- uh, lion for cheetah, I also have to switch zebra for gazelle?
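To illustrate the independence problem concretely, here is a minimal NumPy sketch. The words and logits are invented for illustration; the point is that one softmax per blank forces the implied joint distribution to be an outer product, which cannot express "if the predator is a cheetah, the prey is probably a gazelle."

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# "The ___ is chasing the ___ in the savanna."
predators = ["lion", "cheetah"]
preys = ["zebra", "gazelle"]

# Hypothetical per-blank logits from a masked language model.
p_predator = softmax(np.array([0.7, 0.5]))
p_prey = softmax(np.array([0.6, 0.4]))

# One independent softmax per blank means the implied joint is an outer
# product: P(lion, zebra) = P(lion) * P(zebra). No correlation is possible
# between the two blanks, whatever the logits are.
joint = np.outer(p_predator, p_prey)
print(joint.round(3))
```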
- Lex Fridman
(laughs) Yeah, so
- 16:46 – 22:33
Statistics
- Lex Fridman
this independence assumption, let me throw some criticism at you that I often hear and see how you respond. So this kind of filling in the blanks is just statistics. You're not learning anything, like, the deep underlying concepts. You're just mimicking stuff from the past. You're not learning anything new such that you can use it to generalize about the world. Um, or, okay, let me just say the crude version, which is just statistics.
- Yann LeCun
(laughs)
- Lex Fridman
It's not intelligence. Uh, what do you have to say to that? What do you usually say to that if you kinda hear this kinda thing?
- Yann LeCun
I don't get into those discussions, because they are, they're kind of pointless. Um, so first of all, it's quite possible that intelligence is just statistics, is just statistics of a particular kind-
- Lex Fridman
Yes.
- Yann LeCun
... uh, where-
- Lex Fridman
But this is the philosophical question: is it possible that intelligence is just statistics?
- Yann LeCun
Yeah. But what kind of statistics? So, uh, if you're asking the question, do the models of the world that we learn, um, have some notion of causality? Yes.
- Lex Fridman
Hm. Hmm.
- Yann LeCun
So if the criticism comes from people who say, you know, uh, current machine learning systems don't care about causality, which by the way is wrong, uh, you know, I agree with them. Yeah, you should, you know, your model of the world should have your actions as one of the inputs, and that will drive you to learn causal models of the world where you know what... you know, what, uh, intervention in the world will cause what result, or you can do this by observation of other agents, uh, acting in the world and, and observing the effect. Uh, other humans, for example. So I think, you know, at some level of description, uh, intelligence is just statistics. Uh, but that doesn't mean you won't have models that have, you know, deep mechanistic explanations for what goes on. Uh, the question is, how do you learn them? That's- that's the question I'm interested in. Um, because, you know, a lot of people who actually voice their criticism say that those mechanistic models have to come from someplace else. They have to come from human designers, they have to come from... I don't know what. And obviously we learn them, uh, or if we don't learn them as an individual, nature learned them for us using evolution. So regardless of what you think, those processes have been learned somehow.
- Lex Fridman
So if you look at the- the human brain, just like when we humans introspect about how the brain works, it seems like when we think about what is intelligence, we think about the high-level stuff, like the models we've constructed, concepts from cognitive science, like concepts of memory and reasoning modules, almost like these high-level modules. Does this serve as a good analogy? Like, are we ignoring the, uh, (laughs) the dark matter, the- the basic low-level mechanisms, just like we ignore the way the operating system works by just using the, uh, the- the high-level software? We're ignoring that at the low level, the neural network might be doing something like statistics. (laughs) Like, meaning... Uh, sorry to use this word probably incorrectly and crudely, but doing this kind of fill-in-the-gap kind of learning and just kind of updating the model constantly-
- Yann LeCun
Mm-hmm.
- Lex Fridman
... in order to be able to process the raw sensory informa- information, to predict it and then adjust when the prediction is wrong. But, like, high lev- when we look at our brain at the high level, it feels like we're doing, like, we're playing chess. Like, we're- we're, like, playing with high-level concepts and we're stitching them together and we're putting them into long-term memory, but really what's going on underneath is something we're not able to introspect, which is this kind of, uh, s- simple, large neural network that's just filling in the gaps.
- Yann LeCun
Right. Well, o- okay. So there's a lot of questions-
- Lex Fridman
Sure.
- Yann LeCun
... and not a lot of answers there.
- Lex Fridman
(laughs)
- Yann LeCun
Okay, so first of all, there's a whole school of thought in neuroscience, computational neuroscience in particular, um, that likes the idea of predictive coding, which is really related to the idea I was talking about in self-supervised learning.
- Lex Fridman
Mm-hmm.
- Yann LeCun
So everything is about prediction. The essence of intelligence is the ability to predict.
- Lex Fridman
Yeah.
- Yann LeCun
And everything the brain does is trying to predict, uh, predict everything from everything else, okay? And tha- that's really sort of the underlying, uh, principle, if you want, that, uh, self-supervised learning is trying to kind of reproduce this idea of prediction as kind of an essential mechanism of, uh, task-independent learning, if you want.
- Lex Fridman
Mm-hmm.
- Yann LeCun
The next step is what kind of intelligence are you interested in reproducing? And of course, you know, we all think about, you know, trying to reproduce sort of, you know, high level cognitive processes in humans, but, like, with machines, we're not even at the level of even reproducing the learning processes in a- in a cat brain. Um, you know, the most intelligent of our intelligent systems don't- don't have as much common sense as- as a house cat. So, um, how is it that cats learn? And, you know, cats don't do a whole lot of, uh, reasoning. They certainly have causal models. They certainly have, uh... Because, you know, many cats can figure out, like, how they can act on the world to get what they want. Um, they certainly have, uh, a fantastic model of intuitive physics, uh, certainly of their- the- the dynamics of their own bodies, but- but also of preys and things like that, right? So, um, they- they're- they're pretty smart. They- they only do this with about 800 million neurons. Uh, we are not anywhere close to reproducing this kind of, uh, thing. So to some extent, I- I could- I could say let's not even worry about, like, the high level cognition, uh, and kind of, you know, long-term planning and reasoning that humans can do until we figure out, like, you know, can we even reproduce what cats are doing? Now, that said,
- 22:33 – 28:22
Three challenges of machine learning
- Yann LeCun
this ability to learn world models I think is the key to the possibility of learning machines that can also reason. So whenever I give a talk, I say there are- there are three challenges in the... three main challenges in machine learning. The first one is, uh, you know, getting machines to learn to represent the world, um, and I'm proposing self-supervised learning.
- Lex Fridman
Mm-hmm.
- Yann LeCun
Uh, the second is, uh, getting machines to reason in ways that are compatible with essentially gradient-based learning, because this is what deep learning is all about really.
- Lex Fridman
Mm-hmm.
- Yann LeCun
Um, and the third one is something we have no idea how to solve, at least I have no idea how to solve, is, uh, can we get machines to learn hierarchical representations of action plans, you know, like... You know, we know how to train them to learn hierarchical representations of perception, you know, with convolutional nets and things like that, and transformers, but what about action plans? Can we, uh, get them to spontaneously learn good hierarchical representations of actions?
- Lex Fridman
Also gradient-based.
- Yann LeCun
Yeah. Uh, all of that ha- you know, needs to be somewhat differentiable so that you can apply sort of gradient-based learning, uh, which is really what deep learning is about.
- Lex Fridman
(inhales deeply) So it's background knowledge, the ability to reason in a way, uh, that's differentiable, that is somehow connected, deeply integrated with that background knowledge or builds on top of that background knowledge. And then, given that background knowledge, be able to make hierarchical plans-
- Yann LeCun
Right.
- Lex Fridman
... in the world.
- Yann LeCun
So if- if you take classical optimal control, there's something in classical optimal control called, uh, model predictive control. And it's... You know, it's been around since the 19- the early '60s. NASA uses that to compute trajectories of rockets. And the basic idea is that you have a pre- a predictive model of the rocket, let's say, or whatever system you intend to control, which, given the state of the system at time T and given an action that you are taking on the system, so for a rocket it would be thrust and, you know, all the controls you can have, uh, it gives you the state of the system at time T plus delta T, right? So basically a differential equation, something like that. Um, and if you have this model, and you have this model in the form of some sort of neural net or some sort of, uh, set of formulas that you can backpropagate gradients through, you can do what's called model predictive control, or gradient-based model predictive control. So you can unroll that, uh, that model in time. You feed it a hypothesized sequence of actions, and then you have some objective function that measures how well, at the end of the trajectory, the system has succeeded or matched what you wanted it to do. Um, you know, if it's a robot arm, have you grasped the object you wanna grasp? Uh, if it's a rocket, you know, are you, uh, at the right place near the space station? Things like that. Um, and by backpropagation through time, and again, this was invented in the 1960s (laughs) by optimal control theorists, uh, you can figure out, uh, what is the optimal sequence of actions that will, uh, you know, get my system to the- the best final state. So that's a form of reasoning. It's basically planning. And a lot of, uh, planning systems in robotics are actually based on this. Uh, and, uh, and you can think of this as a form of reasoning. So, you know, to take the example of the teenager driving a car again, you have a pretty good dynamical model of the car, it doesn't need to be very accurate, but you know again that if you turn the wheel to the right and there is a cliff, you're gonna run off the cliff, right? You don't need to have a very accurate model to predict that. And you can run this in your mind and decide not to do it for that reason, because you can predict in advance that the result is gonna be bad. So you can sort of imagine different scenarios and then, you know, employ, uh, or take the first step in the scenario that is most favorable, and then repeat the process of planning. That's called receding horizon model predictive control.
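Here is a minimal sketch of gradient-based, receding-horizon model predictive control in the spirit of what's described above, written in PyTorch. The toy point-mass dynamics stand in for the predictive model; a real system would plug in a learned world model. All names and constants are illustrative.

```python
import torch

dt, horizon = 0.1, 20
target = torch.tensor([1.0, 0.0])  # desired final [position, velocity]

def model(state, action):
    # Toy stand-in for the predictive model: a point mass under acceleration.
    pos, vel = state
    return torch.stack([pos + dt * vel, vel + dt * action])

actions = torch.zeros(horizon, requires_grad=True)  # hypothesized action sequence
opt = torch.optim.Adam([actions], lr=0.1)

for _ in range(200):
    state = torch.tensor([0.0, 0.0])
    for t in range(horizon):              # unroll the model through time
        state = model(state, actions[t])
    # Objective: reach the target state by the end of the trajectory, cheaply.
    loss = ((state - target) ** 2).sum() + 1e-3 * (actions ** 2).sum()
    opt.zero_grad()
    loss.backward()                       # backpropagation through time
    opt.step()

# Receding horizon: execute actions[0], observe the real outcome, re-plan.
print(actions.detach()[:3])
```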
- Lex Fridman
Mm-hmm.
- Yann LeCun
So even, you know, all those things have names, you know, (laughs) going back, you know, decades. Um, and, you know, in classical optimal control, the model of the world is not generally learned. Uh, there's, you know, sometimes a few parameters you have to identify, that's called system identification. But, um, but generally the model is mostly deterministic and mostly built by hand. So the big question of AI, uh, I think the big challenge of AI for the next decade, is how do we get machines to learn predictive models of the world that deal with uncertainty and deal with the real world in all its complexity? So it's not just the trajectory of a rocket, which you can reduce to first principles. It's not even just the trajectory of a robot arm, which again, you can model by, you know, careful mathematics, but it's everything else, everything we observe in the world, you know, people's behavior, um, you know, physical systems that involve collective phenomena like water, or, you know, trees and branches in a- in a tree or something, or, uh, like complex things that, you know, humans have no trouble developing abstract representations and predictive models for, but we still don't know how to do with machines.
- Lex Fridman
Where do you put in- in these three maybe in the- in the planning stages the- the game theoretic nature of this world where your actions not only respond to the dynamic nature of the world, the environment, but also affect it? So if there's other humans involved, is this- is this point number four or is it somehow integrated into the hierarchical representation of action in your view?
- Yann LeCun
I think it's integrated, it's just, um, it's just that now your model of the world has to deal with, you know, it just makes it more complicated, right? The fact...
- Lex Fridman
Yeah.
- Yann LeCun
... that, uh, humans are complicated and not easily predictable, and that makes your model of the world that much more complicated.
- Lex Fridman
Well, there's a chess...
- 28:22 – 36:25
Chess
- Lex Fridman
I mean, I- I suppose chess is an analogy to Mo- uh, Monte Carlo tree search. I mean...
- Yann LeCun
Right.
- Lex Fridman
It... there is a, "I go, you go, I go, you go," like, um, Andrej Karpathy recently gave a talk at MIT about car doors. (laughs) I think there's some machine learning too, but mostly car doors. And there's a dynamic nature to the car, like the person opening the door checking... I mean, he wasn't talking about that. He was talking about the perception problem of the ontology of what defines a car door, this big philosophical question. But to me it was interesting 'cause, like, it's obvious that the person opening the car door, they're trying to get out, like here in New York, trying to get out of the car. You slowing down is going to signal something, you speeding up is going to signal something, and that's a dance. It's an asynchronous chess game, I don't know. (laughs) So it feels like, um, it's not just... I mean, I guess you can integrate all of them into one giant model, like the entirety of these little interactions, 'cause it's not as complicated as chess. It's just like a little dance.
- Yann LeCun
Yeah.
- Lex Fridman
We do like a little dance together and then we figure it out.
- Yann LeCun
Well, in some ways it's way more complicated than chess-
- Lex Fridman
Right.
- Yann LeCun
... because, uh, because it's continuous, it's uncertain in a continuous manner, uh, you know?
- Lex Fridman
It doesn't feel more complicated. It feels simple.
- Yann LeCun
But it doesn't feel more complicated because that's what we are... we've evolved to solve. This is the kind of...
- Lex Fridman
Yeah.
- Yann LeCun
... problem we've evolved to solve. And so we're good at it because, you know, nature has made us good at it. Nature has not made us good at chess. We completely suck at chess.
- Lex Fridman
Yeah.
- Yann LeCun
Um, in fact that's why we designed it as a- as a game, is to be challenging.
- Lex Fridman
Mm-hmm.
- Yann LeCun
Uh, and if there is something that, you know, recent progress in, uh, chess and Go...
- Lex Fridman
... has made us realize, is that humans are really terrible at those things. (laughs)
- Yann LeCun
Like, really bad, you know?
- Lex Fridman
Yeah. (laughs) Uh, there was a story, right, before AlphaGo that, uh, uh, you know, the best Go player thought they were maybe two or three stones behind, you know, an ideal player that he would call God.
- Yann LeCun
Uh, in fact, no, they were, like, nine or ten (laughs) stones behind. I mean, we're just bad.
- Lex Fridman
Yeah.
- Yann LeCun
So we- we're not good at... Uh, and it's because we have limited, uh, working memory. We, we, you know, we're not very good at, uh, like, doing this, uh, tree exploration that, you know, computers are much better, um, at doing than we are. But we are much better at learning differentiable models of the world. I mean, I, I say "differentiable" in a kind of... You know, I should say, not differentiable in the sense that, you know, we run back prop through it.
- Lex Fridman
Mm-hmm.
- Yann LeCun
But in the sense that, uh, our brain has some mechanism for estimating gradients, uh, of s- of some kind.
- Lex Fridman
Yeah.
- Yann LeCun
And that's what, you know, makes us, uh, efficient. So if you have an agent that consists of a, a model of the world, which, you know, in the human brain is basically the entire front half of your brain-
- Lex Fridman
Hm.
- Yann LeCun
Um, an objective function, which, uh, in humans is a combination of two things. There is your sort of intrinsic motivation module, which is in the basal ganglia, you know, at the base of your brain. That's the thing that measures pain and hunger and things like that, like immediate, um, uh, f- feelings and emotions. Um, and then there is, you know, the equivalent of what people in reinforcement learning call a critic, which is a sort of module that predicts ahead what the outcome of a, uh, of a situation will be. And so it's, it's not a cost function, but it's sort of... It's not an objective function, but it's sort of a, you know, trained predictor of the ultimate objective function, and that also is differentiable. And so if all of this is differentiable, your cost function, your, your critic, your, uh, you know, your, your world model, then you can use gradient-based type methods to do planning, to do reasoning, to do learning, uh, to, you know, to do all the things that we'd like an intelligent, uh, agent, uh, uh, to do.
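As a sketch of the architecture being described: a differentiable world model, a hardwired intrinsic cost, and a trained critic that estimates cost beyond the planning horizon, composed so that action sequences can be optimized by gradient descent. Everything here (dimensions, networks, the quadratic intrinsic cost) is a made-up stand-in, not a specification from the conversation.

```python
import torch
import torch.nn as nn

STATE, ACTION = 4, 2
world_model = nn.Sequential(nn.Linear(STATE + ACTION, 64), nn.Tanh(), nn.Linear(64, STATE))
critic = nn.Sequential(nn.Linear(STATE, 64), nn.Tanh(), nn.Linear(64, 1))

def intrinsic_cost(state):
    # Stand-in for the hardwired drives (pain, hunger, ...): prefer small states.
    return (state ** 2).sum()

def plan(state, horizon=5, steps=50):
    actions = torch.zeros(horizon, ACTION, requires_grad=True)
    opt = torch.optim.Adam([actions], lr=0.1)
    for _ in range(steps):
        s, cost = state, torch.tensor(0.0)
        for t in range(horizon):
            s = world_model(torch.cat([s, actions[t]]))  # imagined rollout
            cost = cost + intrinsic_cost(s)
        cost = cost + critic(s).squeeze()  # learned estimate of the long-term cost
        opt.zero_grad()
        cost.backward()                    # gradients flow through all three parts
        opt.step()
    return actions.detach()

print(plan(torch.randn(STATE))[0])
```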
- Lex Fridman
And, uh, gradient-based learning, like, what's your intuition? That's probably at the core of what can solve intelligence? So you don't need, like, um, logic-based reasoning, uh, in your view?
- Yann LeCun
I don't know how to make logic-based reasoning compatible with, uh, gradient-based learning.
- 36:25 – 46:09
Animals and intelligence
- Lex Fridman
So I'm gonna ask you a series of impossible questions, as I have been doing. So if that's the fundamental sort of dark matter of intelligence, this ability to form a background model, what's your intuition about how much knowledge is required? You know, with dark matter, you could put a percentage on (laughs) the composition of the universe, how much of it is dark matter and how much is dark energy. How much, uh, information do you think is required, uh, to be a house cat? So you have to be able to, uh, when you see a box, go in it. When you see a human, compute the most evil action. If there's a thing that's near an edge, you knock it off. All of that, plus the extra stuff you mentioned, which is a great self-awareness of the physics of your own body and of the world. How much knowledge is required, do you think, to solve it?
- Yann LeCun
Um...
- Lex Fridman
I don't even know how to measure an answer to that question.
- Yann LeCun
I'm not sure how to measure it, but whatever it is, it fits in about- about 800,000 neurons, uh, 800 million neurons. Sorry.
- Lex Fridman
The representation does?
- Yann LeCun
E- everything. All knowledge. Everything, right? Um, you know, it's less than a billion. A dog is two billion, but a cat is less than one billion.
- Lex Fridman
All right.
- Yann LeCun
And, uh, so multiply that by 1,000 and you get the number of synapses. Uh, and I think almost all of it is- is learned through this, you know, a sort of self-supervised learning. Although, you know, I- I think a- a tiny sliver is learned through reinforcement learning, and certainly very little through, you know, classical supervised learning, although it's not even clear how supervised learning actually works in, uh- in the biological world. Um, so I think almost all of it is, um- is, uh, self-supervised learning. But it's driven by, uh, the- the sort of ingrained objective functions that a cat or a human have at the base of their brain which kind of drives their, um- their behavior. So, you know, nature tells us, uh, "You're hungry." It doesn't tell us how to feed- feed ourselves. That's- that's something that the rest of our brain has to figure out, right?
- Lex Fridman
Well, it's interesting because there might be more like deeper objective functions that are underlying the whole thing. So hung- hunger may be some kind of... y- now you go to like neurobiology, it might be just the brain, um, trying to maintain homeostasis. So hunger is just one of the human perceivable symptoms of the brain being unhappy with the way things are currently.
- Yann LeCun
Right.
- Lex Fridman
So it could be just like one really dumb objective function at the core.
- Yann LeCun
But that's how, that's how behavior is- is- is driven. Uh, the- the fact that, you know, the, our basal ganglia, uh, drive us to do things that are- that are different from, say, an orangutan or certainly a cat, uh, is what makes, you know, human nature versus orangutan nature versus cat nature. Um, so for example, uh, you know, our basal ganglia drives us to seek the company of, uh, other humans. And that's because nature has figured out that we need to be social animals for our species to survive, and it's true of many, uh, uh, primates. It's not true of orangutans. Orangutans are solitary animals. Um, they don't seek the company of others. In fact, they avoid them. (laughs) Uh, in fact, they scream at them when they come too close because they're territorial.
- Lex Fridman
Mm-hmm.
- Yann LeCun
Uh, because for- for their s- survival, you know, uh, evolution has figured out that's the best thing. I mean, they are occasionally social, of course, for, you know, um...
- Lex Fridman
(laughs)
- Yann LeCun
... reproduction and stuff like that.
- Lex Fridman
Yeah.
- Yann LeCun
But, um, but- but they're mostly solitary. So, uh, so all of those behaviors are not part of intelligence. You know, people say, "Oh, you're never gonna have intelligent machines because, you know, human intelligence is social." But then you look at orangutans, you look at octopus. Octopus never know their parents.
- Lex Fridman
Mm-hmm.
- Yann LeCun
They barely interact with any other... and they- and they get to be really smart in less than an- less than a year, in like half a year. Um, you know, in a year, they're adults. In two years, they're dead. So, um, there are things that we think, as humans, are intimately linked with intelligence, like social interaction, like language. We think... I think we give way too much importance to language as a substrate of intelligence as humans, because we think our reasoning is so linked with language.
- Lex Fridman
So for- for... to solve the house cat intelligence problem, you think you could do it on a desert island? You could have...
- Yann LeCun
Pretty much.
- Lex Fridman
... you could just have a cat sitting there, um, looking at the waves, at the ocean waves, and figure a lot of it out?
- Yann LeCun
It needs to have sort of, you know, the right set of drives, uh, to kind of, you know, get it to do the thing and learn the appropriate things, right? But, uh, um, like for example, you know, uh, baby humans are- are driven to learn to stand up and walk. Okay, you know, it's not... that's kind of... th- this desire is hardwired. How to do it precisely is not. That's learned.
- Lex Fridman
Mm-hmm.
- Yann LeCun
But the desire to-
- Lex Fridman
To walk?
- Yann LeCun
... move around and stand up.
- Lex Fridman
Hmm.
- 46:09 – 1:07:29
Data augmentation
- Lex Fridman
Can you, uh, tell me about data augmentation? What the heck is data augmentation and how is it used, maybe contrastive learning, for, uh, for video? What are some cool ideas here?
- Yann LeCun
Right. So data augmentation, I mean, first, data augmentation, you know, is the idea of artificially increasing the size of your training set by distorting the, uh, images that you have in ways that don't change the nature of the image, right? So you take... You, you do, you do MNIST, you can do data augmentation on MNIST. And people have done this since the 1990s, right? You take a MNIST digit and you shift it a little bit, or you change the size or rotate it, skew it, you know, et cetera, right?
- Lex Fridman
Add noise.
- Yann LeCun
Add noise, et cetera. And it, it works better. If you train a supervised classifier with augmented data, you're going to get better results. Now, it's become really interesting over the last, uh, couple years, because, uh, a lot of self-supervised learning techniques to pre-train vision systems are based on data augmentation. And the, the basic technique is originally inspired by, um, techniques that I worked on in the early '90s and Geoff Hinton worked on also in the early '90s. It was sort of parallel work. I used to call this Siamese networks. So basically, you take, uh, two identical copies of the same network, they share the same weights, and you show two, uh, different views of the same object. Either those two different views may have been obtained by data augmentation, or maybe it's two different views of the same scene from a camera that you moved or at different times or something like that, right? Or two pictures of the same person, things like that. And then you train this neural net, those two identical copies of this neural net, to produce an output representation, a vector, uh, in such a way that the representations for those two images are as close to each other as possible, as identical to each other as possible, right?
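As a concrete sketch of the augmentation side, here is a small MNIST-style pipeline using torchvision; the particular parameter values are illustrative, not the ones used in the work being described.

```python
import torch
from torchvision import transforms

# Small distortions that preserve the identity of the digit:
# shifts, rotations, rescaling, skew, and a bit of noise.
augment = transforms.Compose([
    transforms.RandomAffine(degrees=10,             # rotate a little
                            translate=(0.1, 0.1),   # shift a little
                            scale=(0.9, 1.1),       # resize a little
                            shear=10),              # skew a little
    transforms.ToTensor(),
    transforms.Lambda(lambda x: (x + 0.05 * torch.randn_like(x)).clamp(0, 1)),
])

# Two random augmentations of the same image give the two "views"
# that a Siamese / joint-embedding method is trained to map together:
# view1, view2 = augment(img), augment(img)
```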
- Lex Fridman
Mm-hmm.
- Yann LeCun
Because you want the system to basi- basically learn a function that will be invariant, that will not change, whose output will not change when you transform those inputs, uh, in those particular ways, right? Um, so that's easy to do. What's complicated is how do you make sure that when you show two images that are different, the system will produce different things.
- Lex Fridman
Mm-hmm.
- Yann LeCun
Because if you don't have a specific provision for this, uh, the system will just ignore the input, uh, w- when you train it, it will end up ignoring the input and just produce a constant vector that is the same for every input, right?
- Lex Fridman
Yes.
- Yann LeCun
That's called a collapse.
- Lex Fridman
Uh-huh.
- Yann LeCun
Now, how do you avoid collapse? So there's two ideas. Uh, one idea that I proposed in the early '90s, uh, with my colleagues at Bell Labs, uh, Jim Bromley and a couple of other people, uh, which we now call contrastive learning, which is to have negative examples, right? So you have pairs of images that you know are different, and you show them to the network, and, uh, those two copies, and then you, you push the two output vectors away from each other.
- Lex Fridman
Mm-hmm.
- Yann LeCun
And it will eventually guarantee that things that are semantically similar produce similar representations, and things that are different produce different representations.
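A minimal sketch of that setup: one encoder used twice with shared weights, positive pairs pulled together, negative pairs pushed apart up to a margin. This is one standard form of the pairwise contrastive loss, in the spirit of the Siamese-network work being described, not necessarily the exact formulation of the original papers.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, same, margin=1.0):
    # z1, z2: embeddings from two weight-shared copies of the encoder.
    # same: 1.0 where the pair shows the same thing, 0.0 otherwise.
    d = F.pairwise_distance(z1, z2)
    attract = same * d.pow(2)                          # pull positives together
    repel = (1.0 - same) * F.relu(margin - d).pow(2)   # push negatives apart
    return (attract + repel).mean()

# Usage sketch: z1, z2 = encoder(x1), encoder(x2)  # same encoder both times
```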
- Lex Fridman
Mm-hmm.
- Yann LeCun
Uh, we actually came up with this idea for a project of doing, uh, signature verification. So we would collect signatures, like, multiple signatures from the same person, and then train a neural net to produce the same representation. And then, uh, you know, force the system to produce different representations for different signatures. Um, the problem was actually proposed by people from what was a subsidiary of AT&T at the time, called NCR, and they were interested in storing, uh, a representation of the signature on the 80 bytes of the magnetic strip of a credit card. So we came up with this idea of having a neural net with 80 outputs, you know, that we would quantize into bytes (laughs) so, so that we could encode the, the s-
- Lex Fridman
And that encoding was then used to compare whether the signature matches or not.
- Yann LeCun
That's right. So then you would, you know-
- Lex Fridman
Interesting.
- Yann LeCun
... sign, it would run through a neural net, and then you would compare the output vector to whatever is stored on your card.
- Lex Fridman
Did it actually work?
- Yann LeCun
It worked, but they ended up not using it. (laughs) Um, because nobody cares, actually. I mean, the American f- you know, financial payment system is incredibly lax in that respect compared to Europe, for example.
- Lex Fridman
Oh, with the signatures? What's the purpose of the signatures anyway? This is very confusing.
- Yann LeCun
Nobody looks at them. Nobody cares, you know?
- Lex Fridman
And it's, uh, yeah.
- Yann LeCun
Yeah. No, so, uh, so that, that's contrastive learning, right? So you need positive and negative pairs. And the problem with that is that, you know, even though I had the original paper on this, I'm actually not very positive about it because it doesn't work in high dimension. If your representation is high dimensional, there's just too many ways for two things to be different.
- Lex Fridman
Mm-hmm.
- Yann LeCun
And, and so you would need lots and lots and lots of negative pairs. Uh, so there is a, a particular implementation of this which is relatively recent from, actually, the Google Toronto group, uh, where, you know, Geoff Hinton is the, the senior member there, and it's called SimCLR, S-I-M-C-L-R.
- Lex Fridman
Mm-hmm.
- Yann LeCun
Uh, and it's, you know, basically a particular way of, uh, implementing this idea of, uh, contrastive learning with a particular objective function. Now, what I'm much more enthusiastic about these days is non-contrastive methods, so other ways to guarantee that, uh, the representations would be different for different, um, inputs. And it's actually based on an idea that, um, Geoff Hinton proposed in the early '90s with his student at the time, Sue Becker, and it's based on the idea of maximizing the mutual information between the outputs of the two systems. You only show positive pairs, you only show pairs of images that you know are somewhat similar, and you train the two networks to be informative, but also to be as informative of each other as possible, so basically, one representation has to be predictable from the other, essentially. Uh, and, you know, he proposed that idea, had, you know, a couple papers in the early '90s, and then nothing was done about it for decades. And I kind of revived this idea together with my post-docs at, uh, at FAIR, um, particularly a post-doc, Stéphane Deny, who is now a junior professor in Finland at Aalto University. Um, we came up with, uh, something that we call Barlow Twins, and it's a, a particular way of maximizing the information content of, uh, of a vector, you know, using some, uh, hypotheses. And we have, uh, kind of another version of it that's more recent now called VICReg, V-I-C-R-E-G, that means variance, invariance, covariance regularization. And, um, it's the thing I'm the most excited about in machine learning in the last 15 years. I mean, I'm really, really excited about this.
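For reference, a compact sketch of the VICReg objective (variance, invariance, covariance). The structure and default weights follow the published paper, but treat this as an illustration rather than a reference implementation.

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z1, z2, sim_w=25.0, var_w=25.0, cov_w=1.0):
    # z1, z2: (batch, dim) embeddings of two views of the same images.
    n, d = z1.shape
    invariance = F.mse_loss(z1, z2)  # the two views should map to the same point

    def variance(z):  # keep every dimension's std near 1 (prevents collapse)
        std = torch.sqrt(z.var(dim=0) + 1e-4)
        return F.relu(1.0 - std).mean()

    def covariance(z):  # decorrelate dimensions so they carry distinct information
        z = z - z.mean(dim=0)
        cov = (z.T @ z) / (n - 1)
        off_diag = cov - torch.diag(torch.diag(cov))
        return off_diag.pow(2).sum() / d

    return (sim_w * invariance
            + var_w * (variance(z1) + variance(z2))
            + cov_w * (covariance(z1) + covariance(z2)))
```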
- 1:07:29 – 1:19:18
Multimodal learning
- Lex Fridman
(laughs) Uh, what, what's your, um... Do you find multimodal learning interesting? We've been talking about vision and language, like, combining those together, maybe audio, all those kinds of things.
- Yann LeCun
There's a lot of things that I find interesting in the short term, but that are not addressing the important problems that I think are really kinda the big challenges. So I think, you know, things like multitask learning, continual learning, uh, you know, adversarial issues, I mean, those have, you know, great practical interest in the relatively short term, uh, possibly, but I don't think they're fundamental. You know, active learning, even to some extent, reinforcement learning. I think those things will become either obsolete, or, or useless, or easy-
- Lex Fridman
Mm-hmm.
- Yann LeCun
... once we figure out how to do self-supervised representation learning or, or learning predictive world models. And so I think that's what, you know, the entire community should be focusing on. Uh, at least people who are interested in sort of fundamental questions or, you know, really kind of pushing the envelope of, uh, of AI towards the next, the next stage. But of course, there's like a huge amount of, you know, very interesting work to do in sort of practical questions that, that have, you know, short-term impact.
- Lex Fridman
Well, you know, it's, it's difficult to talk about the, the temporal scale, because all of human civilization will eventually be destroyed because the-
- Yann LeCun
(laughs)
- Lex Fridman
... the, the sun will die out, and even if Elon Musk is successful with multi-planetary colonization across the galaxy, uh, eventually the entirety of it will just become, uh, giant black holes. Um, it's gonna take a while, though. But what I'm saying is that that logic can be used to say it's all meaningless. Uh, I'm saying all that to say that multitask learning, which you're calling practical or pragmatic or whatever, might be the thing that achieves something very akin to intelligence while we're trying to solve the more, uh, general problem of self-supervised learning and background knowledge. So the reason I bring that up, maybe one way to ask that question: I've been very impressed by what, uh, the Tesla autopilot team is doing. I don't know if you've gotten a chance to glance at this particular example of multitask learning, where they're literally taking the problem, like, I don't know, Charles Darwin studying animals. They're studying the problem of driving and asking, "Okay, what are all the things you have to perceive?" And the way they're solving it is, one, there's an ontology where you're bringing that to the table. So you're formulating a bunch of different tasks, it's like over 100 tasks or something like that, that are involved in driving, and then they're deploying it and then getting data back from people that run into trouble. And they're trying to figure out, do we add tasks? Do we focus on each individual task separately?
- Yann LeCun
Sure.
- Lex Fridman
So I would classify Andrej Karpathy's talk in two ways. One was about doors, and the other one about how much ImageNet sucks. He (laughs) kept going back and forth on those two topics. ImageNet sucks meaning you can't just use a single benchmark; you have to have, like, a giant suite of benchmarks to understand how well your system actually works.
- Yann LeCun
Oh, I agree with him. I mean, he's, uh, he's a very sensible guy. Um, now, okay, it's very clear that if you're faced with an engineering problem that you need to solve in a relatively short time, particularly if you have Elon Musk breathing down your neck, you're gonna have to take shortcuts, right? You might think about the fact that the right thing to do and the long-term solution involves, you know, some fancy self-supervised learning, but you have, you know, Elon Musk breathing down your neck, uh, and, you know, this involves, uh, you know, human lives, and so you have to basically just do the systematic, uh, engineering and, you know, uh, fine-tuning and refinements and trial and error and all that stuff. Um, there's nothing wrong with that. That's called engineering. That's called, you know, uh, putting technology out, uh, in the world. Um, and you have to kind of iron it out before, before you do this, you know? Um, uh, so much for, you know, grand ideas and principles. Um, but, you know, I'm placing myself sort of, you know, upstream of this, you know, quite a bit upstream of this.
- Lex Fridman
You're Plato, thinking about platonic forms, your, your s-
- Yann LeCun
Not platonic because eventually, uh, I want that stuff to get used-
- Lex Fridman
Yeah.
- Yann LeCun
... but, uh, it's okay if it takes five or 10 years for the community to realize this is the right thing to do. I've done this before. It's been the case before that, you know, I've made that, that case. I mean, if you look back in the mid-2000s, for example, and you ask yourself the question, "Okay, I want to recognize cars, or faces, or whatever," um, you know, I can use convolutional nets, or I can use more conventional, uh, kind of computer vision techniques, you know, using, uh, interest point detectors or SIFT, dense SIFT features and, you know, sticking an SVM on top. At that time, the data sets were so small that those methods that use more hand engineering worked better than conv nets. There was just not enough data for conv nets. And conv nets were a little- a little slow with the kind of hardware that was available at the time. And there was a sea change when, basically, you know, data sets became bigger and- and GPUs became available. Those are two of the main factors that basically made people change their- change their mind. Um, and you can- you can look at the history of, like, all sub-branches of AI or pattern recognition, and there is a similar trajectory followed by techniques where people start by, you know, engineering the hell out of it, um, you know, be it optical character recognition, speech recognition, computer vision, like, image recognition in general, uh, natural language understanding, like, you know, translation, things like that, right? You start to engineer the hell out of it. Um, you start to acquire all the knowledge, the prior knowledge you know about image formation, about, you know, the shape of characters, about, you know, morphological operations, about, like, feature extraction, Fourier transforms, you know, Zernike moments, you know, whatever, right? People have come up with thousands of ways of representing images so that they could be easily, uh, classified, uh, afterwards. Same for speech recognition, right? It took, you know, two decades for people to figure out a good front end, uh, to pre-process, uh, speech signals so that, you know, the information about what is being said is preserved, but most of the information about the identity of the speaker is gone. Um, you know, cepstral coefficients or whatever, right? Um, and same for- for text, right? Uh, you do named entity recognition and you parse and you do, uh, tagging of the parts of speech, and, you know, you do this sort of tree representation of, uh, clauses and all that stuff, right, before you can do anything. Um, so that's where it starts, right? Just engineer the hell out of it. And then, you start having data, and maybe you have more powerful computers, maybe you know something about statistical learning, so you start using machine learning, and it's usually a small sliver on top of your kind of handcrafted system where, you know, you extract features by hand. Okay. And now, nowadays, the standard way of doing this is that you train the entire thing end to end with a deep learning system and it learns its own features, and, you know, speech recognition systems nowadays, uh, OCR systems, are completely end to end. It's, uh, you know, some giant neural net that takes raw waveforms and produces a sequence of characters coming out.
- Lex Fridman
Mm-hmm.
- Yann LeCun
And it's just a huge neural net, right? There's no, you know, Markov model, there's no language model that is explicit other than, you know, something that's ingrained in the- in the sort of neural language model, if you want. Same for translation, same for all kinds of stuff. So you see this continuous evolution from, you know, less and less handcrafting and more and more learning, um, and, uh, I- I think, uh, uh, it's true in biology as well.
- Lex Fridman
(laughs) So, I mean, we might disagree about this, maybe not, uh, in this one little piece at the end: you mentioned active learning. It feels like active learning, which is the selection of data and also the interactivity, needs to be part of this giant neural network. You cannot just be an observer to do self-supervised learning. You have to... Well, self-supervised learning is just a word, but whatever this giant stack of a neural network that's automatically learning is, my intuition is that you have to have a system, whether it's a physical robot or a digital robot, that's interacting with the world and doing so in a flawed way and improving over time, in order to perform the self-supervised learning well. You can't just give it a giant sea of data.
- Yann LeCun
Okay. I agree and I disagree.
- Lex Fridman
Okay.
- Yann LeCun
I agree in the sense that I think, uh... I agree- I agree in two ways. The first- the first way I agree is that if you want, uh, and you certainly need a causal model of the world that allows you to predict the consequences of your actions, to train that model, you need to take actions, right? You need to be able to act in the world and see the effect for you to be... to learn causal models of the world, right?
- Lex Fridman
Well, so that's not- that's not obvious, because you can observe others-
- Yann LeCun
You can observe others.
- LFLex Fridman
... and you can infer that they're similar to you, and then you can learn from that.
- Yann LeCun
Yeah, but then you have to kind of hardwire that part, right?
- Lex Fridman
Yeah.
- Yann LeCun
You know, mirror neurons and all that stuff, right? So, um, and it's not clear to me how you would do this in a machine. So, um, um, so I think the- the action part would be necessary for having causal models of, uh, of the world. Uh, the second reason it may be necessary, or at least more efficient, is that, uh, active learning basically, you know, goes for the jugular of what you're- what you don't know, right?
- Lex Fridman
Mm-hmm.
- Yann LeCun
Is- is, you know, obvious areas of uncertainty, uh, about your- your world and about the- how the world behaves, and you can resolve this uncertainty by systematic exploration of that part that you don't- you don't know. And if you know that you don't know, then, you know, it makes you curious. You kind of look into situations that... And, uh, you know, across the animal, uh, world, different species have different levels of curiosity.
- Lex Fridman
Curiosity. (laughs)
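A minimal sketch of that "go for the jugular of what you don't know" idea in its simplest machine-learning form, uncertainty sampling: query the unlabeled examples the model is least sure about. This is a generic illustration (the model and pool here are assumed stand-ins), not a method discussed in the conversation.

```python
import torch

def most_uncertain(model, pool, k=10):
    # Rank unlabeled pool items by predictive entropy; explore/label the top k.
    with torch.no_grad():
        probs = torch.softmax(model(pool), dim=-1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    return entropy.topk(k).indices
```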
- 1:19:18 – 1:24:03
Consciousness
- Lex Fridman
What do you think about Yoshua Bengio, uh, talking about consciousness and all of these kinds of concepts?
- Yann LeCun
Okay. Um, e- I don't know what consciousness is. But, uh...
- Lex Fridman
(laughs) That's a good opener. (laughs) Yeah.
- Yann LeCun
And to some extent, a lot of the things that are said about consciousness remind me of the questions people were asking themselves in the 18th century, or 17th century, when they discovered, um, you know, how the eye works, and the fact that the image at the back of the eye was upside down-
- Lex Fridman
Mm-hmm.
- Yann LeCun
... right? Because you have a lens and, and so on your retina, the image that forms is an image of the world, but it's upside down. How is it that you see right side up?
- Lex Fridman
Hm.
- Yann LeCun
And, you know, with what we know today in science, you know, we realize this question doesn't make any sense, (laughs) or, or is kinda ridiculous in some way, right? So, I think a lot of what is said about consciousness is of that nature. Now that said, there is a lot of really smart people that, uh, for, for whom I have a lot of respect who are talking about this topic, people like, uh, David Chalmers, who is a colleague-
- Lex Fridman
Mm-hmm.
- Yann LeCun
... of mine at NYU. Um, I have kind of an unorthodox folk, uh, speculative hypothesis about consciousness. So, we were talking about this idea of world model.
- Lex Fridman
Mm-hmm.
- Yann LeCun
And, uh, I think, you know, our entire prefrontal cortex basically is, uh, the engine for our world model. Uh, but when we are attending to a particular situation, we're focused on that situation, we basically cannot attend to anything else. And that seems to suggest that we basically have only one world model engine in our prefrontal cortex. Uh, that engine is configurable to the situation at hand, so we are building a box out of wood, or we are, you know, driving, uh, down the highway, or playing chess. We, we basically have, uh, a single model of the world that we're configuring to the situation at hand-
- Lex Fridman
Mm-hmm.
- Yann LeCun
... which is why we can only attend to one task at a time. Now, if there is a task that we do repeatedly, um, it, it goes from the sort of deliberate reasoning using model of the world and prediction and perhaps something like model predictive control, which I was talking about earlier, to something that is more subconscious that becomes automatic. So, I don't know if you've ever played against a, a chess grandmaster. Uh, you know, I get wiped out in, you know, 10, 10 plies, right? Um, and, you know, I have to think about my move for, you know, like, 15 minutes, uh, and the person in front of me, the grandmaster, you know, would just, like, react within seconds, right?