Yann LeCun: Deep Learning, ConvNets, and Self-Supervised Learning | Lex Fridman Podcast #36

Yann LeCun outlines a path to human-level AI through self-supervision, in conversation with Lex Fridman.

Lex Fridman (host), Yann LeCun (guest)
Aug 31, 2019, 1h 15m

EVERY SPOKEN WORD

  1. 0:00-4:45

    HAL 9000, value misalignment, and laws as “objective functions”

    1. LF

      The following is a conversation with Yann LeCun. He's considered to be one of the fathers of deep learning, which, if you've been hiding under a rock, is the recent revolution in AI that's captivated the world with the possibility of what machines can learn from data. He's a professor at New York University, a vice president and chief AI scientist at Facebook, and co-recipient of the Turing Award for his work on deep learning. He's probably best known as the founding father of convolutional neural networks, in particular, their application to optical character recognition and the famed MNIST dataset. He is also an outspoken personality, unafraid to speak his mind in a distinctive French accent and explore provocative ideas, both in the rigorous medium of academic research and the somewhat less rigorous medium of Twitter and Facebook. This is the Artificial Intelligence podcast. If you enjoy it, subscribe on YouTube, give it five stars on iTunes, support it on Patreon, or simply connect with me on Twitter, @lexfridman, spelled F-R-I-D-M-A-N. And now, here's my conversation with Yann LeCun. You said that 2001: A Space Odyssey is one of your favorite movies. HAL 9000 decides to get rid of the astronauts, for people who haven't seen the movie, spoiler alert, because he, it, she believes that the astronauts, they will interfere with the mission. Do you see HAL as flawed in some fundamental way, or even evil, or did he do the right thing?

    2. YL

      Neither. There's no notion of evil in that- in that context, other than the fact that people die. But it was an example of what people call, uh, value misalignment, right? You give an objective to a machine and the machine str- strives to achieve this objective, and if you don't put any constraints on this objective, like, don't kill people and don't do things like this, the machine, given the power, will do stupid things just to achieve this- this objective, or damaging things to achieve this objective. It's a little bit like... I mean, we are used to this in the context of human society. We- we put in place laws to prevent people from doing bad things because

    3. LF

      (laughs)

    4. YL

      ... spontaneously they would do those bad things, right? So, we have to shape their cost function, their objective function, if you want, through laws to kind of correct, and education, obviously, to sort of correct for- for those.

    5. LF

      So, maybe just pushing a little further on- on that point, HAL, you know, there's a mission. There's a- there's fuzziness around the- the ambiguity around what the actual mission is, but, you know, d- do you think that there will be a time, from a utilitarian perspective, where an AI system, where it is not misalignment, where it is alignment for the greater good of society, that an AI system will make decisions that are difficult?

    6. YL

      Well, that's the trick. I mean, uh, eventually, we'll have to figure out how to do this. And again, th- we're not starting from scratch because we've been doing this with humans for- for millennia.

    7. LF

      Oh, yeah, yeah.

    8. YL

      So, designing objective functions for people is something that we know how to do, and we don't do it by, um, you know, programming things, although th- the legal code is called code, um, so (laughs) that tells you something, and it's actually the design of an objective function. That's really what legal code is, right? It tells you, "Here is what you can do. Here is what you can't do. If you do it, you pay that much." That's- that's an objective function. So, there is this idea somehow that it's a new thing for people to try to design objective functions that are aligned with the common good, but no. We've been writing laws for millennia, and that's exactly what it is. Um, so this- that's where, you know, uh, the science of, uh, lawmaking and, uh- and computer science will-

    9. LF

      Come together.

    10. YL

      ... will come together.
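
The "legal code as objective function" idea above can be sketched as a cost with an added penalty term. This is only an illustration: the action names, the cost numbers, and the `harm_penalty` parameter are hypothetical and do not come from the conversation.

```python
# Hypothetical actions: (name, cost to the mission, whether people are harmed).
ACTIONS = [
    ("remove the crew", 1.0, True),
    ("share the secret with the crew", 3.0, False),
    ("abort the mission", 10.0, False),
]


def objective(action, harm_penalty):
    """Mission cost plus the 'fine' the law attaches to harming people."""
    _, mission_cost, harms_people = action
    return mission_cost + (harm_penalty if harms_people else 0.0)


def best_action(harm_penalty):
    """Pick the action with the lowest total cost under the given penalty."""
    return min(ACTIONS, key=lambda a: objective(a, harm_penalty))[0]


if __name__ == "__main__":
    print(best_action(harm_penalty=0.0))    # unconstrained objective
    print(best_action(harm_penalty=100.0))  # objective shaped by a "law"
```

With no penalty the cheapest action for the mission wins, even though it harms people. A large enough penalty changes which action is optimal without touching the mission cost itself.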

    11. LF

      So, it's nothing- there's nothing special about HAL or AI systems. It's just a continuation of tools used to make some of these difficult ethical judgments that laws make, essentially.

    12. YL

      Yeah, and we- and we have systems like this already that, you know, uh, make many decisions for- for ourselves in society that, you know, need to be designed in a way that they, like, you know, rules about things that someti- sometimes have bad side effects, and we have to be flexible enough about those rules so that they can be broken when it's obvious that they shouldn't be applied. So, you don't see this on the camera here, but all the decoration in this room is all pictures from 2001: A Space Odyssey. (laughs)

    13. LF

      Wow.

    14. YL

      Um...

    15. LF

      Is that by accident, or is there a lot-

    16. YL

      It's not by accident. It's by design.

  2. 4:45-7:43

    Designing a better HAL: secrecy, lying, and hard limits on autonomy

    1. LF

      (laughs) Oh, wow. So, if you were, um... If you were to build HAL 10,000, so an improvement of HAL 9000, what would you improve?

    2. YL

      Well, first of all, I wouldn't ask it to hold secrets and tell lies, because that's really what breaks it in the end. That's the- the fact that it's asking itself questions about the purpose of the mission, and it, you know, pieces things together, that it's heard, you know, all the secrecy of the preparation of the mission and the fact that there was a discovery on- on the lunar surface that really was kept secret. And- and one part of HAL's memory knows this, and the other part is- does not know it and is supposed to not tell anyone, and that creates a internal conflict.

    3. LF

      So, you think there never should be a set of things that an AI system should not be allowed, like a- a set of facts that should not be shared with the human operators?

    4. YL

      Well, I think no. I think the- I think it should be a bit like, in the design of, um, autonomous AI systems, um, there should be the equivalent of, you know, the- the- the oath that, uh, Hippocratic Oath calls for-

    5. LF

      Hippocratic Oath, yeah.

    6. YL

      ... that, uh, doctors, uh, sign up to, right? So, there's certain things, certain rules that- that- that you have to abide by. And we can sort of hardwire this into- into our- into our machines to kind of make sure they don't go... So, I'm not, you know, an advocate of the- the three do- three laws of robotics, you know, the Asimov kind of thing, because I don't think it's practical, but... but, you know, some, some level of, uh, of limits. But, but to be clear, uh, this is not, these are not questions that are kind of really worth asking today because we just don't have the technology to do this. We don't, we don't have autonomous intelligent machines. We have intelligent machines, semi-intelligent machines that are very specialized, but they don't, they don't really sort of satisfy an objective. They're just, you know, kind of trained to do one thing.

    7. LF

      Yeah.

    8. YL

      So, until we have some idea for a design of a full-fledged autonomous intelligent system, asking the question of how, how we design this objective I think is a little, a little too abstract.

    9. LF

      It's a little too abstract. There's useful elements to it, in that it helps us understand our own ethical codes as humans. So, even just as a thought experiment, if you imagine that an AGI system is here today, how, how would we program it is a kind of nice thought experiment of constructing how should we have a law, have a system of laws for us humans. It's, it's just a nice practical tool. And I think there's echoes of that idea, too, in the A- AI systems we have today that don't have to be that intelligent.

    10. YL

      Yeah.

    11. LF

      Uh, like autonomous vehicles.

    12. YL

      Right.

    13. LF

      Is these things start creeping in that, uh, we're thinking about, but certainly they shouldn't be framed as, as HAL.

    14. YL

      Yep.

  3. 7:43-9:10

    The surprising empirical fact of deep learning: huge nets + SGD actually work

    1. LF

      Looking back, what is the most... I'm sorry if it's a silly question, but what is the most beautiful or surprising idea in deep learning or AI in general that you have ever come across? Sort of personally where you sat back and, and just had this kind of, "Oh, that's pretty cool" moment, "That's nice-"

    2. YL

      Well-

    3. LF

      ... and surprising."

    4. YL

      ... I don't know if it's an idea rather than a sort of empirical, uh, fact. The fact that you can build gigantic neural nets, train th- train them on, you know, relatively small amounts of data, relatively, with stochastic gradient descent, and that it actually works, uh, breaks everything you read in every textbook, right? Every pre-deep learning textbook that told you you need to have fewer parameters than you have data samples, um, you know, if you have a non-convex objective function, you have no guarantee of convergence. You know, all those things that you read in textbook, and they tell you, "Stay away from this," and they're all wrong. And, uh...

    5. LF

      A huge number of parameters, non-convex, and somehow with just very relative to the number of, uh, parameters data, it's able to learn anything.

    6. YL

      Right.

    7. LF

      Does that still surprise you today?

    8. YL

      Well, it, it was kind of obvious to me before I knew anything that th- that's, that this is a good idea, and then it became surprising that it worked because I started reading those textbooks.

    9. LF

      (laughs)

    10. YL

      Okay? So, like, you know-

  4. 9:10-12:32

    Learning as the core of intelligence; reasoning must fit gradient-based learning

    1. LF

      Wait, so, okay, so can you talk through the intuition of why it was obvious to you, if you remember?

    2. YL

      Well, okay, so the intuition was it's- it's sort of like, you know, those people in the late 19th century who proved that heavier than, than air f- uh, flight was impossible, right? And of course you have birds, right? They do fly. And so on the face of it, it- it's obviously wrong as an empirical question, right? And so we have the same kind of thing that, you know, the, we know that the brain works. We don't know how, but we know it works, and we know it's a large network of neurons and interaction, and that learning takes place by changing the connections. So, kind of getting this level of inspiration without copying the details, but sort of trying to derive basic principles, you know, that kind of gives you a, a clue as to which direction to go. There's also the idea somehow that I've, I've been convinced of since I was an undergrad, that... even before, that intelligence is inseparable from learning. So you... the idea somehow that you can create an intelligent machine by basically programming, for me was a non-starter for, you know, from the start. Ev- every intelligent entity that we know about arrives at this intelligence through learning. Um, so learning, you know, machine learning was, uh, a completely obvious path. Uh, also because I'm lazy, so, you know, kind of... (laughs)

    3. LF

      (laughs) You just automate basically everything in learning is the automation of-

    4. YL

      Exactly.

    5. LF

      ... intelligence.

    6. YL

      Right.

    7. LF

      S- so do you think... So, what is learning then? Wha- what falls under learning? Because do you think of reasoning as learning?

    8. YL

      Well, reasoning is certainly a, a, a consequence of, of learning as well, just like other functions of, uh, of the brain. The big question about reasoning is how do you make reasoning compatible with gradient-based learning?

    9. LF

      Do you think neural networks can be made to reason?

    10. YL

      Yes, that there is no question about that.

    11. LF

      (laughs)

    12. YL

      Again, we have a good example, right? The question is, is how. So the question is, how much prior structure do you have to put into the neural net so that something like human reasoning will emerge from it, you know, from learning? Uh, another question is all of our kind of model of what reasoning is that are based on logic are discrete and, and, and are therefore incompatible with gradient-based learning.

    13. LF

      Mm-hmm.

    14. YL

      And I'm a s- very strong believer in this idea of gradient-based learning. I don't believe that, uh, other types of learning that don't use kind of gradient information, if you want.

    15. LF

      So you don't like discrete mathematics! You don't like anything discrete?

    16. YL

      Well, that's... it's not that I don't like it, it's just that it's, it's incompatible with learning and I'm, I'm a big fan of learning, right? So in fact that's perhaps one reason why, uh, deep learning has been kind of looked at with suspicion by a lot of computer scientists, because the math is very different. The math that, uh, you use for deep learning i- you know, kind of has more to do with, you know, cybernetics, uh, the kind of st- math you do in electrical engineering than the, the kind of math you do in computer science. And, and you know, nothing in, in machine learning is exact, right? Computer science is all about sort of, you know, obsessive-compulsive attention to details of like, you know, every index has to be right and-

    17. LF

      Right.

    18. YL

      ... you can prove that an algorithm is correct, right? Uh, machine learning is th- the science of sloppiness, really. (laughs)

  5. 12:32-16:25

    What a reasoning system needs: working memory, recurrence, and memory access

    1. LF

      (laughs) That's beautiful. So- Okay. Maybe let's feel around in the dark of what is a neural network that reasons, or a system that is, works with continuous functions, that's able to do, uh, build knowledge however we think about reasoning. Build on previous knowledge, build on extra knowledge, create new knowledge, generalize outside of any training set ever built? What does that look like if, uh... Yeah, ma- maybe do you have inklings of thoughts of what that might look like?

    2. YL

      Uh, yeah. I mean, yes and no. If I had precise ideas about this, I think, you know, (laughs) we'd be building it right now. But... And there are people working on this, or whose main research interest is actually exactly that, right? So, what you need to have is a working memory. So, you need to have some device, if you want, uh, some subsystem that can store, uh, a relatively large number of factual episodic information for, you know, a reasonable amount of, amount of time. So, y- you, you know, in the, in the brain, for example, there are kind of three main types of memory. One is the, the sort of memory of the, the, the state of your cortex and that sort of disappears within 20 seconds. You can't remember things for more than about 20 seconds or a minute if, if, if you don't have any other form of memory. Uh, the second type of memory, which is longer term but still short term, is the hippocampus. So you can... You know, you came into this building, you remember where the, where the, the exit is, where the, uh, elevators are. Um, you have some map of that building, that's stored in your hippocampus. You might remember something about what I said, you know, a few minutes ago-

    3. LF

      I forgot it all already, but part-

    4. YL

      ... of course, it's been erased, but-

    5. LF

      Yeah.

    6. YL

      ... you know, but that, that would be in your-

    7. LF

      Some parts, yeah.

    8. YL

      ... in your hippocampus. Uh, and then the, the longer-term memory is in the synapse, the synapses, right? Um, so what you need if you want a system that's capable of reasoning is that you want a hippocampus-like thing, right? And that's what people have tried to do with memory networks and, you know, neural Turing machines and stuff like that, right? A- and now with transformers, which have sort of a memory in their kind of self-attention system. You can, you can think of it this way. So, so that's one element you need. Another thing you need is some sort of network that can access this memory, get an information back, and then kinda crunch on it, and then do this iteratively multiple times. Because a chain of reasoning is, is a process by which you, you, you can, uh, you update your knowledge about the state of the world, about, you know, what's gonna happen, et cetera. And that, that has to be this sort of recurrent operation basically.
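
The two ingredients described above, a memory and a network that reads from it repeatedly, can be sketched with a soft, attention-style lookup. This is a minimal illustration, not the architecture of any particular memory network or transformer: the function names, the dot-product scoring, and the "add the read vector to the state" update are assumptions made for the example.

```python
import math


def softmax(scores):
    """Turn raw match scores into weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(query, keys, values):
    """Soft read: weight each stored value by how well its key matches the query."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    read = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return read, weights


def reason(query, keys, values, hops=3):
    """Read from memory several times, folding each read back into the state."""
    state = list(query)
    for _ in range(hops):
        read, _ = read_memory(state, keys, values)
        state = [s + r for s, r in zip(state, read)]
    return state


if __name__ == "__main__":
    keys = [[5.0, 0.0], [0.0, 5.0]]
    values = [[1.0, 0.0], [0.0, 1.0]]
    read, weights = read_memory([1.0, 0.0], keys, values)
    print(weights)  # most of the weight falls on the first slot
    print(reason([1.0, 0.0], keys, values))
```

Every step is a continuous function of the query and the memory contents, so the whole lookup stays compatible with gradient-based learning. The `hops` loop is the iterative "access, crunch, repeat" part.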

    9. LF

      And you think that kind of... I- if we think about a transformer, so that seems to be too small to contain the knowledge that's, that's, uh... To represent the knowledge that's contained in Wikipedia, for example.

    10. YL

      Well, a transformer doesn't have this idea of, uh, recurrence. It's got a fixed number of layers and that's the number of steps that, you know, limits basically its representation.

    11. LF

      So, but recurrence would build on the knowledge somehow.

    12. YL

      Yeah.

    13. LF

      I mean, it would, it would evolve the knowledge and ex- expand the amount of information perhaps, or useful information within that knowledge.

    14. YL

      Yeah.

    15. LF

      But is, is this something that just can emerge with size? Because it seems like everything we have now is too small.

    16. YL

      Not just. No, it's not cle- it's not, it's not clear. I mean, how, how you access and write into an associative memory in an efficient way... I mean, sort of the original memory network maybe had something like the right architecture. But, uh, if you try to scale up a memory network so that the memory contains all of Wikipedia, I mean, it doesn't quite work.

    17. LF

      Right.

  6. 16:25-18:03

    Reasoning via planning: energy minimization and model predictive control

    1. YL

      So, so there, there's, there's a need for new ideas there. Okay. But it's not the only form of reasoning. So, there's another form of reasoning which is through... Which is very classical also in s- some types of AI, uh, and it's based on, uh... Let's call it energy minimization. Okay. So, you have, uh, some sort of objective, s- some energy function that represents the, the, the, um, the quality or the negative quality. Okay. Energy goes up when things get bad and they get low when things g- g- get good. So, let's say you, you want to figure out, you know, what gestures do I need to, to do to grab an object or walk out the door. If you have a good model of, uh, your own body, a good model of the environment, using this kind of energy minimization, you can make a, you can make. You can do planning. And it's, uh, a cl- in optimal control, it's called, it's called mode- model predictive control. You have a model of what's gonna happen in the world as a consequence of your actions, and that allows you to, by energy minimization, figure out the sequence of action that optimizes a particular objective function which measures, you know, minimizes the number of times you're gonna hit something and the energy you're gonna spend doing the gesture and et cetera. So, so that's one form of reasoning. Planning is a form of reasoning. And perhaps what led to the ability of humans to reason is the fact that... Or, you know, species, you know, that appeared before us had to do some sort of planning to be able to hunt and survive, and survive the winter in particular. And so, you know, it's the same capacity that, that you need to have.
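
Planning by energy minimization, as described above, can be sketched on a toy problem. Everything specific here is an assumption for illustration: the one-dimensional world model, the quadratic energy, the `effort_weight`, and the finite-difference gradient descent are stand-ins, not a method stated in the conversation.

```python
def model(x, a):
    """Assumed toy world model: an action is a displacement of a 1-D point."""
    return x + a


def energy(x0, actions, goal, effort_weight=0.1):
    """Low when the rollout ends near the goal and uses little effort."""
    x = x0
    for a in actions:
        x = model(x, a)
    return (x - goal) ** 2 + effort_weight * sum(a * a for a in actions)


def plan(x0, goal, horizon=5, steps=200, lr=0.05, eps=1e-5):
    """Gradient descent on the action sequence, using finite differences."""
    actions = [0.0] * horizon
    for _ in range(steps):
        grads = []
        for i in range(horizon):
            up = actions[:i] + [actions[i] + eps] + actions[i + 1:]
            down = actions[:i] + [actions[i] - eps] + actions[i + 1:]
            grads.append((energy(x0, up, goal) - energy(x0, down, goal)) / (2 * eps))
        actions = [a - lr * g for a, g in zip(actions, grads)]
    return actions


def control_loop(x0, goal, ticks=10):
    """Model predictive control: plan, execute only the first action, replan."""
    x = x0
    for _ in range(ticks):
        x = model(x, plan(x, goal)[0])
    return x


if __name__ == "__main__":
    print(plan(0.0, 1.0))          # planned displacements toward the goal
    print(control_loop(0.0, 1.0))  # position after ten replanning ticks
```

The planner never learns anything. It only searches over actions using the model, which is why this counts as reasoning at decision time and not as training.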

  7. 18:03-20:51

    Limits of symbolic graphs and logic; vectors and continuous “machine reasoning”

    1. LF

      So, in your intuition, is, um... If we look at expert systems and, and encoding knowledge as logic systems, and as graphs, in this kind of way, is not a useful way to think about knowledge?

    2. YL

      Graphs are a little brittle. Or, or logic, uh, representation, so basically, you know, variables that, that have values and then constraints between them that are represented by rules, uh, is a little too rigid and too brittle, right?

    3. LF

      Mm-hmm. Yes.

    4. YL

      So, one of the... You know, some of the early efforts in that respect, um... were, were to put probabilities on them. So a rule, you know, you know, if you have this and that symptom, you know, you have this, uh, disease with that probability, and you should prescribe that antibiotic with that probability, right? That's the, the MYCIN system from the, from the '70s. Um, and that, that's what that branch of AI led to, uh, you know, Bayesian networks and graphical models and causal inference and variational pro- you know-

    5. LF

      Yep.

    6. YL

      ... method. So, so there, there is, I mean, certainly, uh, uh, a lot of interesting work going on in this area. The main issue with this is, is knowledge acquisition. How do you, uh, reduce a bunch of data to a graph of this type?

    7. LF

      Yeah, it relies on the expert to ... on the human being to encode that or to add knowledge.

    8. YL

      And that's essentially impractical. (laughs)

    9. LF

      Yeah. So it's not scalable, right?

    10. YL

      So that, that's, uh, that's, that's a big question. The second question is, do you want to represent knowledge as symbols, and do you want to manipulate them with logic? And again, that's incompatible with learning. So, uh, one suggestion which, you know, Geoff Hinton has been advocating for many decades is replace symbols by, uh, vectors. Think of it as pattern of activities in a bunch of neurons or units or whatever you want to call them, and replace logic by continuous functions.

    11. LF

      Mm-hmm.

    12. YL

      Okay? And that becomes now compatible. There's a very good set of ideas by, uh, written in a, a paper about 10 years ago by, uh, Leon Bottou, on, uh, who is here at, at Facebook. Um, um, the title of the paper is From Machine Learning to Machine Reasoning, and his idea is that, uh, a learning s- a learning system should be able to manipulate objects that are in the same sp- in a space and then put the result back in the same space. So, it's this idea of working memory, basically, and it's, uh, it's very enlightening.
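
One way to picture "replace logic by continuous functions" is to let truth values range over [0, 1] and use smooth stand-ins for AND, OR, and NOT. The product-based relaxation below is one common choice, picked here as an assumption. It is not presented as the specific proposal of Hinton or Bottou.

```python
def soft_not(a):
    """Continuous NOT: 1 when a is 0, 0 when a is 1."""
    return 1.0 - a


def soft_and(a, b):
    """Continuous AND: agrees with Boolean AND at the corners 0 and 1."""
    return a * b


def soft_or(a, b):
    """Continuous OR: agrees with Boolean OR at the corners 0 and 1."""
    return a + b - a * b


def grad_soft_and(a, b):
    """Partial derivatives of soft_and with respect to a and b."""
    return b, a


if __name__ == "__main__":
    print(soft_and(0.9, 0.8))       # a graded degree of belief, not just 0 or 1
    print(grad_soft_and(0.9, 0.8))  # gradients exist, so the inputs can be learned
```

At the corners these functions reproduce ordinary logic. In between they are differentiable, which is the property that discrete rules lack and gradient-based learning needs.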

    13. LF

      And in a sense, that might learn something like the simple expert systems. I mean, it's w- you can learn basic logic operations there.

    14. YL

      Yeah, quite possibly.

    15. LF

      Yeah.

    16. YL

      Yeah. There's a big debate on sort of how much prior structure you have to put in for this kind of stuff to emerge. That's the debate I have with Gary Marcus and people like that.

  8. 20:51-24:43

    Causality and humans’ weak intuitions: Pearl, physics, and Papert’s wind example

    1. LF

      Yeah. Yeah, so ... And the other person, so I, I just talked to Judea Pearl-

    2. YL

      Mm-hmm.

    3. LF

      ... from you mentioned causal inference world. So, his worry is that the current neural networks are not able to learn what causes what, causal inference between things.

    4. YL

      So, I think, I think he's right and wrong about this. Uh, if he's talking about the sort of classic, uh, type of neural nets, people sort of didn't worry too much about this. But there's a lot of people now working on causal inference, and there's a paper that just came out last week by Leon Bottou, among others, David Lopez-Paz and a bunch of other people, exactly on that problem of, um, how do you kind of, you know, get a neural net to sort of pay attention to real causal relationships, which may also solve issues of bias in data and things like this, so ...

    5. LF

      I'd, I'd like to read that paper, because, uh, that ultimately the challenge there is also seems to fall back on the human expert to ultimately decide causality between things.

    6. YL

      Uh, people are not very good at establishing causality, first of all. So, first of all, you talk to physicists, and physicists actually don't believe in causality-

    7. LF

      Oh, I see.

    8. YL

      ... because look at the, all the basic laws of microphysics are time reversible.

    9. LF

      Yeah, there's no-

    10. YL

      So, there is no causality.

    11. LF

      The arrow of time is not real.

    12. YL

      Right.

    13. LF

      Yeah.

    14. YL

      It's, it's as soon as you start looking at macroscopic systems where there is unpredictable randomness, where there is clearly an arrow of time, but it's a big mystery in physics actually, uh, how that emerges. Um, so-

    15. LF

      Is it emergent, or is it, uh, part of the fundamental fabric of reality, yeah, of nature?

    16. YL

      Or is it a bias of intelligent systems that-

    17. LF

      Yeah.

    18. YL

      ... you know, because of the second law of thermodynamics, we perceive a particular arrow of time, but in fact, um, it's kind of arbitrary, right?

    19. LF

      Right. So yeah, physicists, mathematicians, they don't care about ... I mean, the math doesn't care about the flow of time.

    20. YL

      Well, certainly, certainly microphysics doesn't. People themselves are not very good at establishing causal, um, causal relationships. If you ask this, um, I think it was in one of, uh, Seymour Papert's book on, on, um, like children learning, you know, he studied with Jean Piaget, you know. He's the guy who co-authored the book Perceptrons with Marvin Minsky that kind of killed the first wave of neural nets.

    21. LF

      (laughs)

    22. YL

      But, but he was actually a, a, a learning person. He, uh, in the sense of studying learning in humans and, and machines. That's why he got interested in perceptron. And, uh, and he, he, he, he wrote that if you ask a little kid, uh, about what is the cause of, of the wind, a lot of kids will say ... they will think for a while and they'll say, "Oh, it's the, the branches in the trees. They move and that creates wind," right?

    23. LF

      Mm-hmm.

    24. YL

      So, they get the causal relationship backwards, and it's because their understanding of the world and intuitive physics is not that great, right? I mean, these are like, you know, four or five-year-old kids. You know, it gets better, and then you understand that this ... it can't be, right?

    25. LF

      But there are many things which we can ... because of our common sense understanding of things, what people call common sense-

    26. YL

      Yeah.

    27. LF

      ... and with our understanding of physics, we can ... There's a lot of stuff that we can figure out causality. Even with diseases, we can figure out what's not causing what, uh, often.

    28. YL

      Yeah.

    29. LF

      There's a lot of mystery, of course, but the idea is that you should be able to encode that into systems, 'cause it seems unlikely they'd be able to figure that out themselves.

    30. YL

      Well, whenever we can do intervention, but, you know, all of humanity has been completely deluded for millennia, probably since exi- existence, about a very, very wrong causal relationship where whatever you can't explain, you attribute it to, you know, some deity, some divinity, right?

  9. 24:43-27:14

    Why neural nets fell out of favor in the 1990s: tooling, datasets, and “bag of tricks”

    1. LF

      Yeah. So, you mentioned Marvin Minsky and, uh, the irony of, um, uh, you know, maybe causing the first AI winter. You were there in the '90s, you were there in the '80s, of course. Uh, in the '90s, why do you think people lost faith in deep learning-

    2. YL

      (laughs)

    3. LF

      ... in the '90s and found it again a decade later, over a decade later?

    4. YL

      Yeah, it wasn't called deep learning yet. It was just called neural nets.

    5. LF

      Neural networks.

    6. YL

      Um, yeah, they lost interest... I mean, I think I would put that around 1995, at least the machine learning community. There was always a neural net community, but it, but it became kinda disconnected from sort of mainstream machine learning, if you want. There were... it was basically electrical engineering that ke- kept at it.

    7. LF

      Right.

    8. YL

      Uh, and computer science.

    9. LF

      Just gave up.

    10. YL

      Gave up on, uh, on neural nets. I don't, I don't know. You know, I was too close to it to really sort of analyze it with sort of a, a unbiased eye, if you want. But I, I would, I would, I would, I would make a few guesses. So, the first one is, at the time, neural nets were v- it was very hard to make them work in the sense that you would, you know, implement back prop in your favorite language, um, and that favorite language was not Python, it was not MATLAB, it was not any of those things 'cause they didn't exist, right? You had to write it in Fortran, or C, or something like this, right? Um, so you would, uh, experiment with it. You would probably make some very basic mistakes, like, you know, badly initialize your weights, make the network too small because you read in a textbook, you know, you don't want too many parameters, right? And of course, you know, and you would train on XOR because you didn't have any other dataset to train on. And of course, you know, it works half the time, so you'd say, "Hm, I give up." Also, you were training with batch gradient, which, you know, isn't really sufficient. So, there was a lot of... there was a bag of tricks that you had to know to make those things work or you, you had to reinvent, and a lot of people just didn't, and they just couldn't make it work. Um, so that's one thing. The investment in software platform to be able to kind of, you know, display things, figure out why things don't work, kind of get a good intuition for how to get them to work, have enough flexibility so you can create, you know, uh, network architectures like convolutional nets and stuff like that. It was hard. I mean, you had to write everything from scratch. And again, you didn't have any Python or MATLAB or anything, right? So what-

  10. 27:14-33:03

    LeNet’s Lisp stack and early autodiff graphs; open-source constraints and patents

    1. LF

      I, I read that... Sorry to interrupt, but I read that you wrote in, in Lisp the, your first, uh, versions of, uh, LeNet with the convolutional networks, which-

    2. YL

      Definitely.

    3. LF

      ... by the way, one of my favorite languages.

    4. YL

      It's... yeah.

    5. LF

      Uh, that's how I knew you were legit.

    6. YL

      (laughs)

    7. LF

      Ah, Turing Award whatever. This, what, uh-

    8. YL

      (laughs)

    9. LF

      ... you programmed in Lisp. That's-

    10. YL

      It's still my favorite language. But it's not that we programmed in Lisp, it's that we had to write our Lisp interpreter, okay?

    11. LF

      Yeah.

    12. YL

      'Cause it's not like-

    13. LF

      That's right. (laughs)

    14. YL

      ... we could use one that existed.

    15. LF

      Yeah.

    16. YL

      So, we wrote our Lisp interpreter that we hooked up to, uh, you know, a backend library that we wrote also for sort of neural net computation. And then, uh, after a few years, around 1991, we invented this idea of, uh, basically having modules that know how to forward propagate and back propagate gradients and then interconnecting those modules in a graph. [inaudible] had made proposals on this, about this in the late '80s, and we were able to implement this using our Lisp system. Eventually, we wanted to use that system to make... build production code for character recognition at Bell Labs, so we actually wrote a compiler for that Lisp interpreter, so that... Patrice Simard, who is now at Microsoft, kinda did, did the bulk of it with Leon and, and me. And, uh, and so we could write our system in Lisp and then compile to C, and then we'll have a self-contained complete system that could kinda do the entire thing. Neither PyTorch nor TensorFlow can do this today, yet.
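
The idea of modules that each know how to forward propagate and back propagate, wired together in a graph, can be sketched in a few lines. This is a hedged illustration in Python, not the original Lisp system: the scalar `Linear` and `Tanh` modules and the `Sequential` chain (the simplest possible graph) are assumptions made for the example.

```python
import math


class Linear:
    """Scalar affine module: y = w * x + b."""

    def __init__(self, w, b):
        self.w, self.b = w, b

    def forward(self, x):
        self.x = x  # cache the input for the backward pass
        return self.w * x + self.b

    def backward(self, grad_out):
        self.grad_w = grad_out * self.x  # gradient for the weight
        self.grad_b = grad_out           # gradient for the bias
        return grad_out * self.w         # gradient passed to the previous module


class Tanh:
    """Elementwise nonlinearity: y = tanh(x)."""

    def forward(self, x):
        self.y = math.tanh(x)
        return self.y

    def backward(self, grad_out):
        return grad_out * (1.0 - self.y ** 2)


class Sequential:
    """A chain of modules: forward in order, backward in reverse order."""

    def __init__(self, *modules):
        self.modules = modules

    def forward(self, x):
        for m in self.modules:
            x = m.forward(x)
        return x

    def backward(self, grad_out):
        for m in reversed(self.modules):
            grad_out = m.backward(grad_out)
        return grad_out


if __name__ == "__main__":
    net = Sequential(Linear(0.5, 0.1), Tanh(), Linear(-1.5, 0.2))
    print(net.forward(0.3))   # output of the chain
    print(net.backward(1.0))  # derivative of the output with respect to the input
```

Each module only needs local knowledge, its own forward rule and its own derivative. The container applies the chain rule by calling `backward` in reverse order, so new module types can be added without changing the rest of the system.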

    17. LF

      (laughs)

    18. YL

      Okay. It's coming.

    19. LF

      Yeah. (laughs)

    20. YL

      Uh, (laughs) I mean, there's something like that in PyTorch called, you know, uh, TorchScript.

    21. LF

      Mm-hmm.

    22. YL

      And so, you know, we had to write our Lisp interpreter, we had to write our Lisp compiler, we had to invest a huge amount of effort to do this. And not everybody... if, if you don't completely believe in the concept, you're not going to invest the time to do this.

    23. LF

      Right.

    24. YL

      Now, at the time also, you know, or today, this would turn into Torch or PyTorch, or TensorFlow, or whatever. We'd put it in open source so everybody would use it and, you know, realize it's good. Back before 1995, working at AT&T, there's no way the lawyers would let you release anything in open source of this nature, and so we could not distribute our code really.

    25. LF

      And, uh, on that point, and sorry to go on a million tangents, but on that point, I also read that there was some almost pat- like, a patent on convolutional networks-

    26. YL

      Yeah, there was.

    27. LF

      ... at Bell Labs. So that... (laughs)

    28. YL

      There was.

    29. LF

      First of all, I mean, just-

    30. YL

      There's two, actually. (laughs)

  11. 33:0336:04

    Benchmarks as reality checks: avoiding AGI hype and building shared evaluation tasks

    1. YL

      It accelerates the entire progress of, of the industry. And, you know, the- the problems that Facebook and Google and others are facing today is not whether Facebook or Google or Microsoft or IBM or whoever is ahead of the other. It's that we don't have the technology to build those things we want to build. We want to build intelligent virtual assistants that have common sense. We don't have a monopoly on good ideas for this. We don't believe we do. Maybe others do- believe they do, but we don't, okay? If a startup tells you they have the secret to, you know, human level intelligence and common sense, don't believe them. They don't. And it's gonna take the entire, uh, work of the world research community for a while to get to the point where you can go off and- and each of those companies can start to build things on this. We're not there yet.

    2. LF

      Absolutely, and this- this calls to the- the gap between the space of ideas and the rigorous testing of those ideas, of practical application that you often speak to. You've, uh, written advice saying, "Don't get fooled by people who claim to have a solution to artificial general intelligence, who claim to have an AI system that works just like the human brain, or who claim to have figured out how the brain works. Ask them what error rate they get on MNIST or ImageNet." So...

    3. YL

      Yeah, this is a little dated, by the way. (laughs)

    4. LF

      Two thous- I mean, five years?

    5. YL

      Yes.

    6. LF

      Who's counting? Okay. But I think your opinion is still MNIST and ImageNet, uh, yes, may be dated. There may be new benchmarks, right? But I think that philosophy is one you still and- and- and somewhat hold, that benchmarks and the practical testing, the practical application is where you really get to test the ideas.

    7. YL

      Well, it may not be completely practical. Like for example, you know, it could be a toy dataset or it could... but it- but it has to be some sort of task that the community as a whole has accepted as some sort of standard, you know, kind of benchmark, if you want. It doesn't need to be real. So for example, uh, many years ago here at FAIR, um, people, you know, Jason Weston and Antoine Bordes and a few others proposed the- the bAbI tasks, which were kind of a, a toy problem to test the ability of machines to reason actually, to access working memory and things like this. And, uh, it was very useful even though it wasn't a real task. MNIST is kind of halfway a real task. Uh, so, you know, toy problems can be very useful. It's just that... I was really struck by the fact that a- a lot of people, particularly a lot of people with money to invest, would be fooled by people telling them, "Oh, we have, you know, the algorithm of the cortex and you should give us 50 million."

    8. LF

      Yes, absolutely. So there's a lot of people who... who try to take advantage of the hype for business reasons and so on. But let me sort of talk to this idea that new ideas, the- the ideas that push the field forward may not yet have a benchmark, or it may be very difficult to establish a benchmark.

    9. YL

      I agree. That's part of the process. Establishing benchmarks is part of the process.

  12. 36:0444:46

    Interactive environments and why “AGI” is a misleading term; human specialization argument

    1. LF

      So what are your thoughts about... So we have these benchmarks on- around stuff we can do with images, from classification, to captioning-

    2. YL

      Mm-hmm.

    3. LF

      ... to just every kind of information you can pull off from images in the surface level. There's audio datasets, there's some video. What can we start... Natural language. What kind of stuff... What kind of benchmarks do you see that start creeping onto more something like intelligence, like reasoning, like... maybe you don't like the term, but AGI, echoes of that kind of, uh-

    4. YL

      Yeah, so-

    5. LF

      ... simulation?

    6. YL

      ... a lot of people are working on interactive environments in which you can- you can train and test, uh, intelligent systems. So, so there are- for example, uh, you know, it's... th- the classical paradigm of, uh, supervised learning is that you- you have a dataset, you partition it into a training set, validation set, test set, and there's a clear protocol, right?... but what if the, that assumes that the samples are static- statistically independent. You can exchange them, the order in which you see them doesn't, shouldn't matter, you know, things like that. But what if the answer you give determines the next sample you see? Which is the case, for example, in robotics, right? Your robot does something and then it gets exposed to a new room and depending on where it goes, the room would be different. So, that's the, that creates the exploration problem.

    7. LF

      Mm-hmm.

    8. YL

      The, what if the samples, so that creates also a dependency between samples, right? You, you, if you move, if you can only move in, in space, the next sample you're gonna see is gonna be probably in the same building most likely, right? So, so there's, so the, all the assumptions about the validity of this training set, test set, uh, hypothesis break-

    9. LF

      Mm-hmm.

    10. YL

      ... whenever a machine can take an action that has an influence in the, in the world, and on what it's gonna see. So, people are setting up artificial, uh, environments where, where that takes place, right? The robot runs around a, a 3D model of a house and can interact with objects and things like this. So, you do robotics by simulation, you have those, you know, OpenAI Gym type thing, or MuJoCo kinda simulated, uh, robots, and you have games, you know, things like that. Um, so that, that's where the field is going really, um, this kind of environment. Now, back to the question of AGI. Like, I don't like the term AGI because it implies that human intelligence is general.
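[Editor's illustration] The point about actions determining the next sample can be made concrete with a toy environment. The sketch below is hypothetical (the `Corridor` class and `rollout` helper are made up for illustration and do not use the API of any real simulator). An agent walks along a corridor of rooms, and each observation is simply the room it is in. Consecutive observations can never differ by more than one room, so the samples are strongly dependent, unlike examples drawn independently from a fixed dataset.

```python
import random

class Corridor:
    """A row of rooms; the agent's action decides which room it sees next."""
    def __init__(self, n_rooms=10, start=0):
        self.n_rooms = n_rooms
        self.position = start

    def step(self, action):
        # action is -1 (move left) or +1 (move right); walls clamp the move
        self.position = min(max(self.position + action, 0), self.n_rooms - 1)
        return self.position


def rollout(env, policy, steps):
    """Collect observations while the policy's actions drive the environment."""
    observations = [env.position]
    for _ in range(steps):
        action = policy(observations[-1])
        observations.append(env.step(action))
    return observations


rng = random.Random(0)
trajectory = rollout(Corridor(), lambda obs: rng.choice((-1, 1)), steps=50)
max_jump = max(abs(b - a) for a, b in zip(trajectory, trajectory[1:]))
print("largest change between consecutive observations:", max_jump)

# A different policy sees a different data distribution from the same world.
always_right = rollout(Corridor(), lambda obs: 1, steps=20)
print("final room under the always-right policy:", always_right[-1])
```

Which rooms the agent ever gets to see depends on its policy, which is the exploration problem mentioned above.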

    11. LF

      Mm-hmm.

    12. YL

      And human intelligence is nothing like general. It's very, very specialized. We think it's general, we like to think of ourselves as having general intelligence, we don't. We're very specialized. We're only slightly more general than-

    13. LF

      Why does it feel general? So, you kind of, uh, the, the term general... I think what's impressive about humans is the ability to learn, as we were talking about learning, to learn in just so many different domains. It's perhaps not arbitrarily general, but just you can learn in many domains and integrate that knowledge somehow.

    14. YL

      Okay.

    15. LF

      And pers- and knowledge persists.

    16. YL

      So, let me take a very specific example.

    17. LF

      Yes.

    18. YL

      It's not an example, it's more like a, a quasi-mathematical demonstration. So, you have about 1 million fibers coming out of one of your eyes, okay? 2 million total, but let's, let's talk about just one of them.

    19. LF

      Mm-hmm.

    20. YL

      It's 1 million nerve fibers, your optical nerve. Let's imagine that they are binary, so they can be active or inactive, right? So, the input to your visual cortex is 1 million bits.

    21. LF

      Mm-hmm.

    22. YL

      Now, they're connected to your brain in a particular way, and your brain has connections that are kind of a little bit like a convolutional net, they're kind of local, uh, you know, in space and things like this.

    23. LF

      Mm-hmm.

    24. YL

      Now, imagine I play a trick on you. It's a pretty nasty trick, I admit. I, uh, I cut your optical nerve-

    25. LF

      Mm-hmm.

    26. YL

      ... and I put a device that makes a random perturbation of, a permutation-

    27. LF

      Mm-hmm.

    28. YL

      ... of all the nerve fibers. Well, now what comes to your, to your brain is a fixed but random permutation of all the pixels.

    29. LF

      Mm-hmm.

    30. YL

      There's no way in hell that your visual cortex, even if I do this to you in infancy, will actually learn vision to the same level of quality that you can.
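[Editor's illustration] The thought experiment rests on the fact that a fixed random permutation keeps all the information in the signal but destroys its spatial locality, which is what locally connected architectures exploit. The sketch below is a small, assumed stand-in: a smooth one-dimensional "row of pixels" instead of a million nerve fibers, and a hypothetical `neighbor_roughness` measure of how different adjacent values are.

```python
import random

def neighbor_roughness(signal):
    """Mean absolute difference between adjacent values (low = locally smooth)."""
    return sum(abs(b - a) for a, b in zip(signal, signal[1:])) / (len(signal) - 1)

n = 1000
image_row = [i / n for i in range(n)]  # a smooth ramp: neighbors are nearly equal

# A fixed but random rewiring of the "fibers"
rng = random.Random(0)
permutation = list(range(n))
rng.shuffle(permutation)
scrambled = [image_row[p] for p in permutation]

print("roughness before:", neighbor_roughness(image_row))
print("roughness after: ", neighbor_roughness(scrambled))
print("same values overall:", sorted(scrambled) == image_row)
```

Nothing is lost in the scrambled version, and a learner with no built-in assumptions about neighborhood structure would not care about the rewiring. A system wired to look at local neighborhoods, however, now finds no structure where it looks.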

  13. 44:4651:32

    Self-supervised learning: why it works in language, struggles in vision, and the uncertainty problem

    1. LF

      And so on that topic, most successes in deep learning have been in supervised learning.

    2. YL

      Mm-hmm.

    3. LF

      What is your view on unsupervised learning? Is there hope to reduce involvement of human input and still have, uh, successful systems that have practical use?

    4. YL

      Yeah, I mean, there's definitely a hope. It's, it's more than a hope actually. It's, uh, it's, you know, mounting evidence for it, and that's basically all I do. Like the only thing I'm interested in at the moment is, I call it self-supervised learning, not unsupervised, 'cause unsupervised learning is, is a loaded term. People who know something about machine learning, you know, tell you, "So you're doing clustering or PCA?"

    5. LF

      Right, right.

    6. YL

      Which is not the case. And the wide public, you know, when you say unsupervised learning, "Oh my God, you know, machines are gonna learn by themselves and without supervision?"

    7. LF

      (laughs)

    8. YL

      You know, they see this as-

    9. LF

      Where's, where's the parents? (laughs)

    10. YL

      Yeah. So, so I call it self-supervised learning because, in fact, the underlying algorithms that are used are the same algorithms as the supervised learning algorithms, um, except that what we train them to do is not predict a particular set of variables, like the category of a, of an image, and, and not to predict a set of variables that have been provided by human labelers. But what you train the machine to do is basically reconstruct a piece of its input that it's being, is being masked, masked out, essentially. You can think of it this way, right? So show a piece of video to a machine and ask it to predict what's gonna happen next. And of course, after a while, you can show what, what happens and the machine will kinda train itself to do better at that task. You can do, like all the latest, most successful models in natural language processing use self-supervised learning, you know, sort of BERT-style systems, for example, right? You show it a window of a dozen words on a text corpus, you take out 15% of the words, and then you train the machine to predict the words that are missing.
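[Editor's illustration] The masking recipe described here (hide roughly 15% of the words and train the model to predict them) produces training pairs without any human labels. The sketch below only shows how such pairs could be built; the function `mask_tokens` and the `[MASK]` placeholder string are assumptions for illustration, and the refinements that real BERT-style systems apply are omitted.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_fraction=0.15, seed=0):
    """Return (inputs, targets): inputs with some words hidden,
    targets mapping each hidden position to the original word."""
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_fraction))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    inputs = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]  # the "label" comes from the text itself
        inputs[pos] = MASK
    return inputs, targets


sentence = "the cat sat on the mat and looked out of the window at birds".split()
inputs, targets = mask_tokens(sentence)
print(" ".join(inputs))
print(targets)
```

The supervised learning machinery is unchanged: the model is trained to predict `targets` from `inputs`, exactly as it would be trained to predict human-provided labels.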

    11. LF

      Mm-hmm.

    12. YL

      That's self-supervised learning. It's not predicting the future, it's just, you know, predicting things in the middle. But you could have it predict the future. That's what language models do. Um-

    13. LF

      So you construct... So in an unsupervised way, you construct a model of language. Do you think-

    14. YL

      Or video, or the physical world, or whatever, right?

    15. LF

      Reality. How far do you think that can take us? Do you think-

    16. YL

      Very far, I think.

    17. LF

      ... BERT understands anything?

    18. YL

      To some level. It has, you know, a, a shallow understanding of, uh, of text, but it needs to... I mean, to have kind of true human-level intelligence, I think you need to ground language in reality, so some people are attempting to do this, right?

    19. LF

      Attempted to do this.

    20. YL

      Having systems that can have some visual representation of what, what is being talked about, which is one reason you need those interactive environments actually. But there's like a, a huge technical problem that is not solved, and that explains why self-supervised learning works in the context of natural language, but does not work in the context, or at least not well, in the context of image recognition and, uh, video, although it's making progress quickly. And the reason, uh, that reason is the fact that it's much easier to represent uncertainty in the prediction in the context of natural language than it is in the context of things like video and images. So for example, if I ask you to predict what words are missing, you know, 15% of the words that are-

    21. LF

      Right.

    22. YL

      ... have taken out.

    23. LF

      The possibility is just small. I mean that-

    24. YL

      It's small, right? There is 100,000 words in the, in the lexicon, and you, what the machine spits out is a, a big probability vector, right? It's a bunch of numbers between zero and one that sum to one. And we know how to do, how to do this with computers.
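[Editor's illustration] The "big probability vector" over the lexicon is typically produced by a softmax over one score per word. The sketch below assumes a three-word vocabulary and made-up scores; only the normalization itself is the point.

```python
import math

def softmax(scores):
    """Turn arbitrary real scores into probabilities that sum to one."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


vocab = ["cat", "dog", "car"]   # hypothetical tiny lexicon
scores = [2.0, 1.0, -1.0]       # hypothetical model outputs for a masked word
probs = softmax(scores)

for word, p in zip(vocab, probs):
    print(f"{word}: {p:.3f}")
print("sum:", sum(probs))
```

Because the set of possible answers is finite, the model can spread its belief over several plausible words instead of committing to one, which is the sense in which uncertainty is easy to represent here.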

    25. LF

      Right.

    26. YL

      So there, representing uncertainty in the prediction is relatively easy, and that's, in my opinion, why those techniques work for NLP. For images, if you ask... if you block a piece of an image and you ask the system to reconstruct that piece of the image, there are many possible answers. Uh, they are all perfectly legit, right? And how do you represent that th- this set of possible answers? You, you can't train a system to make one prediction. You can't train a neural net to say, "Here it is. That's the image," because it's, it, there's a whole set of things that are compatible with it. So how do you get the machine to represent not a single output, but a whole set of outputs? And, y- you know, similarly with, uh, video prediction. There's a lot of things that can happen in the future of video. You're looking at me right now, I'm not-

    27. LF

      Yeah.

    28. YL

      ... moving my head very much. But, you know, I might, you know, turn my, my head to the left or to the right.

    29. LF

      Right.

    30. YL

      If you don't have a system that can predict this, and you train it with least squares to kind of minimize the error with the prediction and what I'm doing, what you get is a blurry image of myself-
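[Editor's illustration] The blurriness follows directly from the loss: when several futures are possible, the single prediction that minimizes squared error is their average. The sketch below reduces the head-turning example to one number (an assumed encoding: -1 for a turn to the left, +1 for a turn to the right, equally likely) and searches a grid of candidate predictions.

```python
def mean_squared_error(prediction, outcomes):
    return sum((prediction - o) ** 2 for o in outcomes) / len(outcomes)


futures = [-1.0, 1.0]  # head turns left or right, equally likely (toy encoding)
candidates = [i / 10 for i in range(-10, 11)]  # predictions from -1.0 to 1.0

best = min(candidates, key=lambda p: mean_squared_error(p, futures))
print("best single prediction:", best)
print("its error:", mean_squared_error(best, futures))
print("error of committing to 'left':", mean_squared_error(-1.0, futures))
```

The minimizer sits halfway between the two outcomes, a position the head never actually takes. For images, averaging over all compatible futures pixel by pixel is what shows up as blur.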

  14. 51:321:16:04

    RL, active learning, and the road to autonomy: world models, self-driving, grounding, and emotions

    1. LF

      Yeah, it quickly explodes. What are your thoughts on the extension of this, which topic I'm super excited about, it's connected to something you were talking about in terms of robotics, is active learning. So, as opposed to sort of unsup- c- completely unsupervised or self-supervised learning, you ask the system for human help for, for selecting parts you want to annotate next.

    2. YL

      Uh-huh.

    3. LF

      So, if you think about a robot exploring a space, or a baby exploring a space, or a system exploring a dataset, every once in a while, asking for human input.

    4. YL

      Mm-hmm.

    5. LF

      Do, do you see value in that kind of work? Do, do-

    6. YL

      I don't see transformative value. Uh, it- it's gonna make things that we can already do more efficient, or they will learn slightly more efficiently.

    7. LF

      Yeah.

    8. YL

      But it's not gonna make machines sort of significantly more intelligent. I think... And I... And, and by the way, there is no opposition, there is no, um, uh, conflict between self-supervised learning, reinforcement learning, and supervised learning, or imitation learning, or active learning.

    9. LF

      Right.

    10. YL

      I see self-supervised learning as a, as a preliminary to all of the above.

    11. LF

      Yes.

    12. YL

      So, the example I use very often is, how is it that... So if you use classical reinforcement learning, d- deep reinforcement learning, if you want, the best methods today, s- so-called model-free reinforcement learning, to learn to play Atari games, uh, take about 80 hours of training to reach the level that any human can reach in about 15 minutes.

    13. LF

      Mm-hmm.

    14. YL

      Um, they get better than humans, but it takes them a long time. Uh, AlphaStar, okay, the, you know, Oriol Vinyals' and, and his team's, uh, system to play, um, to, um, to play StarCraft, uh, plays, you know, a single map, a single type of, uh, player, and can reach better than human level with about the equivalent of 200 years of training, playing against itself. It's 200 years, right? It's not some- something that no, no human can-

    15. LF

      (laughs)

    16. YL

      ... could ever do. Now-

    17. LF

      I mean, I'm not sure what else to take away from that.

    18. YL

      Okay. Now, take those algorithms, the best RL algorithms we have today, to train a, a car to drive itself.

    19. LF

      Hmm.

    20. YL

      It would probably have to drive millions of hours. It will have to kill thousands of pedestrians. It will have to run into thousands of trees. It will have to run off cliffs.

    21. LF

      Yeah.

    22. YL

      And it had to run off cliff multiple times before it figures out that it's a bad idea, first of all.

    23. LF

      Yeah.

    24. YL

      And second of all, before it figures out how not to do it. And so, I mean, this type of learning obviously does not reflect the kind of learning that animals and humans do. There is something missing that's really, really important there. And my hypothesis, which I've been advocating for like five years now, is that we have predictive models of the world that include the ability to predict under uncertainty. And what allows us to not run off a cliff when we learn to drive, most of us can learn to drive in about 20 or 30 hours of training without ever crashing, causing any accident.

    25. LF

      Mm-hmm.

    26. YL

      And if we drive next to a cliff, we know that if we turn the wheel to the right, the car is gonna run off the cliff and nothing good is gonna come out of this. Because we have a pretty good model of intuitive physics that tells us, you know, the car is gonna fall. We know, we know about gravity. Babies learn this around the age of eight or nine months, that objects don't float, they fall.

    27. LF

      Mm-hmm.

    28. YL

      Um, and, you know, we have a pretty good idea of the effect of turning the wheel on the car, and you know, we know we need to stay on the road. So there's a lot of things that we bring to the table, which is basically our predictive model of the world... And that model allows us to not do stupid things and to basically stay within the context of, uh, things we need to do. We still face, you know, unpredictable situations and that's how we learn, but that allows us to learn really, really, really quickly. So that's called model-based reinforcement learning. There's some imitation and supervised learning because we have a driving instructor that tells us occasionally what to do. Uh, but most of the learning-

    29. LF

      Is model-based.

    30. YL

      ... is learning the model-
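[Editor's illustration] The role of a predictive model in avoiding the cliff can be sketched with a toy planner. Everything here is hypothetical: a one-dimensional road, a cliff edge at position 3, and a hand-written `world_model`, whereas the argument in the conversation is that such a model would be learned. The planner asks the model what each steering action would lead to and discards the ones predicted to end badly, without ever trying them for real.

```python
CLIFF_EDGE = 3  # toy world: any position >= 3 is off the cliff

def world_model(position, steer):
    """Predict the next position for a steering action (-1, 0 or +1)."""
    return position + steer

def is_safe(position):
    return position < CLIFF_EDGE

def plan(position, goal, actions=(-1, 0, 1)):
    """Pick the action whose predicted outcome is safe and closest to the goal."""
    safe_actions = [a for a in actions if is_safe(world_model(position, a))]
    return min(safe_actions, key=lambda a: abs(world_model(position, a) - goal))


# Far from the cliff, the planner simply heads toward the goal.
print(plan(position=0, goal=2))   # 1

# Next to the cliff, with a goal beyond the edge, it refuses to steer off.
print(plan(position=2, goal=5))   # 0
```

A model-free learner would have to experience the fall, many times, before its value estimates reflected the danger; here a single imagined rollout is enough.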

Episode duration: 1:15:58

Transcript of episode SGSOCuByo24
