Lex Fridman Podcast

Yoshua Bengio: Deep Learning | Lex Fridman Podcast #4

Lex Fridman and Yoshua Bengio on credit assignment, world models, and AI’s future.

Lex Fridman (host) · Yoshua Bengio (guest)
Oct 20, 2018 · 42m · Watch on YouTube ↗

EVERY SPOKEN WORD

  1. 0:00–15:00

    What difference between biological…

    1. LF

      What difference between biological neural networks and artificial neural networks is most mysterious, captivating, and profound for you?

    2. YB

      First of all, there is so much we don't know about biological neural networks.

    3. LF

      Right.

    4. YB

      And that's very mysterious and captivating, because maybe it holds the key to improving artificial neural networks. One of the things I studied recently, uh, something that we don't know how biological neural networks do, but would be really useful for artificial ones, is the ability to do credit assignment through very long time spans. There are things that we can, in principle, do with artificial neural nets, but it's not very convenient, and it's not biologically plausible. And this mismatch, I think, this kind of mismatch may be an interesting thing to study to, A, understand better how brains might do these things, 'cause we don't have good corresponding theories with artificial neural nets, and B, maybe provide new ideas that we could explore about, um, things that brain do differently, and that we could incorporate in artificial neural nets.

    5. LF

      So, let's break credit assignment up a little bit.

    6. YB

      Yes.

    7. LF

      So, what ... It's a beautifully technical term, but it can incorporate so many things. So, is it more on the RNN memory side, that, thinking like that? Or is it something about knowledge, building up common sense knowledge over time? Or is it, uh, more in the reinforcement learning sense, that you're picking up rewards over time for a particular, uh, to achieve a certain kind of goal?

    8. YB

      So, I was thinking more about the first two meanings, whereby we store all kinds of memories, um, episodic memories in our brain, which we can access later in order to help us both infer causes of things that we are observing now, and assign credit to decisions or interpretations we came up with a while ago when, you know, those memories were stored. And then we can change the way we would have, uh, reacted or interpreted things in the past, and now that's credit assignment used for learning.

    9. LF

      So, in which way do you think artificial neural networks, the current LSTM, the current architectures are not able to capture the ... Presumably, you're, you're, you're thinking of very long term.

    10. YB

      Yes. So, current, recurrent nets are doing a fairly good job for sequences with dozens, or say, hundreds of time steps, and then it gets sort of harder and harder, and depending on what you have to remember and so on, as you consider longer durations. Whereas humans seem to be able to do credit assignment through essentially arbitrary times. Like, I, I could remember something I did last year, and then now, because I see some new evidence, I'm gonna change my mind about, uh, the way I was thinking last year, and hopefully not do the same mistake again.
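The difficulty with long spans has a simple numerical face: backpropagation through time multiplies the gradient by one Jacobian per step, so under a contractive recurrent map the learning signal shrinks geometrically. A minimal sketch, with a fixed linear recurrence standing in for a trained RNN cell (the matrix scale here is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained RNN cell: a fixed recurrent matrix W
# with spectral radius well below 1, the contractive regime that saturating
# units tend to produce.
n = 32
W = rng.standard_normal((n, n)) * 0.7 / np.sqrt(n)

# Backpropagating a loss gradient T steps through time multiplies it by T
# Jacobians; with a linear cell each Jacobian is just W, so the gradient
# norm decays geometrically -- the signal needed for credit assignment over
# long spans all but vanishes.
g = rng.standard_normal(n)
norms = []
for _ in range(200):
    g = W.T @ g
    norms.append(np.linalg.norm(g))

print(f"gradient norm after  10 steps: {norms[9]:.3e}")
print(f"gradient norm after 200 steps: {norms[199]:.3e}")
```

Gating architectures like LSTMs push this horizon out to the hundreds of steps mentioned above, but the geometric decay eventually reasserts itself.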

    11. LF

      I think a big part of that is probably forgetting. You're only remembering the really important things, so it's very efficient forgetting. Um-

    12. YB

      Uh, yes. So, there's a selection of what we remember, and I think, uh, there are really cool connection to higher level cognition here regarding consciousness, deciding and, and emotions. Like sort of the deciding what comes to consciousness and what gets stored in memory, wh- which, which are not trivial either.

    13. LF

      So, you've been at the forefront there all along, showing some of the amazing things that neural networks, deep neural networks can do in the field of artificial intelligence, and just broadly in all kinds of applications. But, uh, we can talk about that forever, but what in your view, because we're thinking towards the future, is the weakest aspect of the way deep neural networks represent the world? What is the, what is, in your view, is missing currently?

    14. YB

      So, current, current state-of-the-art neural nets trained on large quantities of images or texts have some level of understanding of, you know, what explains those data sets, but it's very basic. It's, it's very low level, and it's not nearly as robust and abstract and general as our understanding. Okay, so that doesn't tell us how to fix things, but I think it encourages us to think about how we can, uh, maybe train our neural nets differently, so that they would focus, for example, on causal explanation. Something that we don't do currently with neural net training. Also, one thing I'll talk about in my talk this afternoon-

    15. LF

      Mm-hmm.

    16. YB

      ... is instead of learning separately from images and videos on one hand, and from text on the other hand, we need to do a better job of, um, jointly learning about language, and about the world to which it refers, so that, you know, both sides can help each other. We need to have good world models in, in our neural nets.... for them to really understand sentences which talk about what's going on in the world, and I think we need language input to help provide clues about what high-level concepts, like semantic concepts, should be represented at the top levels of these neural nets. In fact, there is evidence that the purely unsupervised learning of representations doesn't give rise to high-level representations that are as powerful as the ones we are getting from supervised learning.

    17. LF

      Mm-hmm.

    18. YB

      And so the- the- the clues we're getting just with the labels, not even sentences, is already very powerful.

    19. LF

      Do you think that's an architecture challenge or is it a dataset challenge?

    20. YB

      Neither.

    21. LF

      (laughs)

    22. YB

      (laughs)

    23. LF

      Uh, I'm tempted to just end it there. (laughs)

    24. YB

      No, I guess-

    25. LF

      But, uh, can you elaborate slightly? (laughs)

    26. YB

      Yes, yes. (laughs) Ah, of course, datasets and architectures are- are something you wanna always play with, but- but I think the crucial thing is more the training objectives, the training frameworks. Um, for example, going from, um, passive observation of data to more active agents, which, um, uh, learn by intervening in the world, uh, the relationships between causes and effects, um, the sort of objective functions which could be important to allow the- the- the- the highest level, uh, explanations to- to- to rise from- from the learning, which I don't think we have now, the kinds of objective functions which could be used to, uh, reward exploration, the right kind of exploration. So these kinds of questions are neither in the dataset nor in the architecture, but more in how we learn, under, uh, what objectives and so on.

    27. LF

      Yeah, that's, uh, I- I've heard you mention in several contexts, the idea sort of the way children learn, they interact with objects of the world, and it seems f- uh, fascinating because, uh, in some sense, except with some cases in reinforcement learning, that idea is not part of the learning process, uh, in, uh, artificial neural networks. It's almost like do you envision something like, uh, an objective function saying, "You know what? If you poke this object in this kinda way, it would be really helpful for me to-"

    28. YB

      Yes. Yes.

    29. LF

      "... further- further learn?" (laughs)

    30. YB

      Right, right.

  2. 15:00–30:00

    And on that line…

    1. YB

      in, in expert systems and so on is nicely decomposed into, like, a bunch of rules, whereas if you think about a neural net, it's the opposite. You have this big blob of parameters which work intensely together to represent everything the network knows, and it's not sufficiently factorized. And so, I think this is one of the weaknesses of current neural nets, that we have to take lessons from classical AI in order to bring in another kind of compositionality which is common in language, for example, and in these rules, but that isn't so native to neural nets.

    2. LF

      And on that line of thinking, disentangled representations.

    3. YB

      Yes.

    4. LF

      So...

    5. YB

      So let me connect with disentangled representations, if you mind, if you don't mind.

    6. LF

      Yes. Yeah, that's exact- yep, yep.

    7. YB

      So for many years, I've thought, and I still believe, that it's really important that we come up with learning algorithms, either unsupervised or supervised, but... Or reinforcement, whatever, that build representations in which the, uh, important factors, uh, hopefully causal factors, are nicely separated and easy to pick up from the representation. So that's the idea of disentangled representations. Uh, uh, it says transform the data into a space where everything becomes easy. We can maybe just learn with linear models about the things we care about. And, and I still think this is important, but I think this is missing out on a very important ingredient which classical AI systems can remind us of. So let's say we have these disentangled representations. You still need to learn about the, the relationships between the variables, those high-level semantic variables. They're not gonna be independent. I mean, this is like-

    8. LF

      Mm-hmm.

    9. YB

      ... too much of a, an assumption. They're gonna have some interesting relationships that allow to predict things in the future, to explain what happened in the past. The kind of knowledge about those relationships in a classical AI system is encoded in the rules, like a rule is just like a little piece of knowledge that says, "Oh, I have these two, three, four variables that are linked in this interesting way, then I, I can say something about one or two of them given a couple others," right? In addition to disentangling the, the elements of the representation, which are like the variables in a rule-based system, you also need to disentangle the, the mechanisms that relate those variables to each other, so, like, the rules. So if the rules are neatly separated, like each rule is, you know, living on its own, and when I l- I change a rule because I'm learning, uh, it doesn't need to break other rules, whereas current neural nets, for example, are very sensitive to what's called catastrophic forgetting, where, uh, after I've learned some things and then I learn new things, I- they can destroy the old things that I had learned, right? If the knowledge was better factorized and, and, uh, separated, disentangled, then you would avoid a lot of that. Now, you can't do this in the sensory domain, but...
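The catastrophic forgetting described here shows up in even the smallest models. A minimal sketch, using a hand-rolled logistic regression on synthetic blobs with conflicting labels (a toy setup invented for illustration, not any standard benchmark): train on task A, keep training the same weights on task B, and accuracy on A collapses.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_task(flip=False):
    # Two well-separated Gaussian blobs; `flip` swaps the labels, so the two
    # tasks demand conflicting answers for similar inputs.
    X = np.vstack([rng.normal(+1.5, 0.5, size=(200, 2)),
                   rng.normal(-1.5, 0.5, size=(200, 2))])
    y = np.array([1.0] * 200 + [0.0] * 200)
    return X, (1.0 - y) if flip else y

def train(w, X, y, steps=500, lr=0.1):
    # Plain full-batch logistic-regression gradient descent.
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w = w - lr * X.T @ (p - y) / len(y)
    return w

def accuracy(w, X, y):
    return float(np.mean((X @ w > 0) == (y == 1.0)))

task_a, task_b = make_task(), make_task(flip=True)

w = train(np.zeros(2), *task_a)
before = accuracy(w, *task_a)
w = train(w, *task_b)          # keep training the *same* weights on task B
after = accuracy(w, *task_a)
print(f"task A accuracy: {before:.2f} before task B, {after:.2f} after")
```

Nothing in the shared parameter blob protects the old knowledge: every weight that encoded task A is overwritten in service of task B, which is exactly the missing factorization being argued for.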

    10. LF

      What do you mean by sensory domain?

    11. YB

      Yeah, like in pixel space.

    12. LF

      Mm-hmm.

    13. YB

      But, but my idea is that when you project the data in the right semantic space, it becomes possible to now represent this extra knowledge beyond the transformation from input to representations, which is how representations act on each other and predict the future, and so on, i- i- in a way that can be neatly, um, disentangled. So now it's the rules that are disentangled from each other, and not just the variables that are disentangled from each other.

    14. LF

      And you draw a distinction between semantic space and pixel. Like is-

    15. YB

      Yes, the, the-

    16. LF

      ... is, is there, does there need to be an architectural difference or is it could-

    17. YB

      Well, yeah. So, so there's the sensory space like pixels which, where everything is entangled, uh, and the, the, the, the, the information, like the variables are completely interdependent in very complicated ways. And also computation, like the, the, the, the... it's not just the variables, it's also how they are related to each other is, is all intertwined. But, but I- I'm hypothesizing that in the right high-level representation space, both the variables and how they relate to each other can be disentangled, and that will provide a lot of generalization power.
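The hypothesis that things become linearly learnable in the right representation space can be illustrated on synthetic data. The generative setup below (two latent factors, a sinusoidal "pixel" mixture) is invented for the sketch: the same target is trivially linear in the disentangled factors and nearly unrecoverable by a linear model in the entangled observations.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical "disentangled" latent factors; the target depends only on
# the first one.
z = rng.uniform(-1, 1, size=(1000, 2))
y = z[:, 0]

# "Pixel-like" observations: an entangled nonlinear mixture of both factors.
x = np.column_stack([np.sin(3 * z[:, 0]) * np.cos(3 * z[:, 1]),
                     np.cos(3 * z[:, 0]) * np.sin(3 * z[:, 1])])

def linear_r2(features, target):
    # Least-squares linear fit with intercept; training-set R^2.
    A = np.column_stack([features, np.ones(len(features))])
    w, *_ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ w
    return 1.0 - resid.var() / target.var()

r2_entangled = linear_r2(x, y)
r2_disentangled = linear_r2(z, y)
print(f"linear R^2 on entangled observations : {r2_entangled:.2f}")
print(f"linear R^2 on disentangled factors   : {r2_disentangled:.2f}")
```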

    18. LF

      Generalization power?

    19. YB

      Yes.

    20. LF

      Uh, distribution of the test set-

    21. YB

      Yes.

    22. LF

      ... is assumed to be the same as the distribution of the training set.

    23. YB

      Right. This is where current machine learning is too weak. It, it, it doesn't tell us anything... is not able to tell us anything about how our neural nets, say, are gonna generalize to a new distribution. And, and you know, people may think, "Well, but there's nothing we can say if we don't know what the new distribution will be." The truth is, uh, humans are able to generalize to new distributions.

    24. LF

      Yeah, how are we able to do that? So-

    25. YB

      Yeah, because there is something... these new distributions, even though they could look very different from the training distributions, they have things in common. So let me give you a concrete example. You read a science fiction novel. The science fiction novel maybe, you know, brings you in some other planet where things look very different on the surface, but it's still the same laws of physics, right? And so you can read the book and you understand what's going on. So the distribution is very different, but because you can transport a lot of the knowledge you had from Earth about the underlying cause and effect relationships and physical mechanisms and all that, and maybe even social interactions, uh, you can now make sense of what is going on on this planet where like visually, for example, things are totally different.
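The science-fiction-novel point can be made concrete with a toy construction: a "stable" feature generated by the same mechanism in every environment, and a "spurious" feature whose agreement with the label flips between environments. Both features and environments below are invented for illustration; a predictor that leans on the surface regularity fails under the shift, while one restricted to the stable mechanism transfers.

```python
import numpy as np

rng = np.random.default_rng(4)

def make_env(n, spurious_agreement):
    # Stable feature: tied to the label by the same mechanism everywhere.
    # Spurious feature: matches the label with a probability that depends
    # on the environment.
    y = rng.choice([-1.0, 1.0], size=n)
    stable = y + rng.normal(0.0, 1.0, n)
    agree = rng.random(n) < spurious_agreement
    spurious = np.where(agree, y, -y) + rng.normal(0.0, 0.1, n)
    return np.column_stack([stable, spurious]), y

def fit(X, y):
    # Least-squares linear "classifier": predict sign(X @ w).
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def accuracy(w, X, y):
    return float(np.mean(np.sign(X @ w) == y))

X_tr, y_tr = make_env(5000, spurious_agreement=0.95)   # training environment
X_te, y_te = make_env(5000, spurious_agreement=0.05)   # shifted environment

w_all = fit(X_tr, y_tr)            # free to lean on the spurious shortcut
w_stable = fit(X_tr[:, :1], y_tr)  # restricted to the stable mechanism

acc_all_train = accuracy(w_all, X_tr, y_tr)
acc_all_shift = accuracy(w_all, X_te, y_te)
acc_stable_shift = accuracy(w_stable, X_te[:, :1], y_te)
print(f"all features : train {acc_all_train:.2f}, shifted {acc_all_shift:.2f}")
print(f"stable only  : shifted {acc_stable_shift:.2f}")
```

The transported knowledge in the novel analogy plays the role of the stable feature: the surface statistics change, the underlying mechanism does not.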

    26. LF

      Taking that analogy further and distorting it, let's enter a sci- science fiction world of say, uh, Space Odyssey 2001 with HAL.

    27. YB

      Yeah.

    28. LF

      Or, or maybe, uh, which is probably one of my favorite AI movies and then, uh-

    29. YB

      Me too.

    30. LF

      (laughs) And then there's another one that a lot of people love that are maybe a little bit outside the AI community, uh, is Ex Machina.

  3. 30:00–42:24

    Mm-hmm. …

    1. YB

      a- something we'll achieve in the next five or ten years. How can we... You know, there's already work in detecting emotions, for example, in images, in sounds, in texts, and also studying how different agents interacting in different ways may correspond to...... patterns of, say, injustice, which could trigger anger. So, these are things we can do in, in the medium term, and eventually train computers to model-

    2. LF

      Mm-hmm.

    3. YB

      ... for example, how humans react emotionally, uh, I would say the simplest thing is unfair, uh, situations which trigger anger. This is one of the most basic emotions that we share with other animals.

    4. LF

      Mm-hmm.

    5. YB

      I think it's quite feasible within the next few years, so we can build systems that can detect these kind of things-

    6. LF

      Mm-hmm.

    7. YB

      ... to the extent, unfortunately, that they understand enough a- about the world around us, which is a long time away, but maybe we can initially do this in virtual environments. So, you can imagine, like, a video game-

    8. LF

      Mm-hmm.

    9. YB

      ... where agents interact in, in some ways, and then some situations trigger an emotion. Uh, I think we could train machines to detect those situations and predict that a particular emotion, you know, will likely be, uh, felt if a human was playing one of the characters.
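The unfairness-to-anger mapping sketched here could start from a labeling rule as crude as a fairness check on reward splits. The function below is a hypothetical toy stand-in, not any real system: the kind of labeling a learned detector would be trained against in such a virtual environment.

```python
# Toy labeling rule (invented for illustration): in a multi-player reward
# split, flag a situation as anger-inducing for a player who receives far
# less than a fair share of the total.
def predicted_emotion(rewards, player, unfairness_threshold=0.25):
    fair_share = sum(rewards) / len(rewards)
    shortfall = (fair_share - rewards[player]) / max(sum(rewards), 1)
    return "anger" if shortfall > unfairness_threshold else "neutral"

print(predicted_emotion([10, 90], player=0))  # far below the fair split
print(predicted_emotion([55, 45], player=1))  # close enough to fair
```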

    10. LF

      You, uh, have shown excitement and done a lot of excellent work with, uh, unsupervised learning, but on a supervi- you know, there's been a lot of, uh, success on the supervised learning side.

    11. YB

      Yes, yes.

    12. LF

      Uh, and one, one of the things, um, I'm really passionate about is how humans and robots work together, and, uh, uh, in the context of supervised learning, that means the process of annotation. Do you think about the problem of annotation, of, uh, put in a more, uh, interesting way as humans teaching machines?

    13. YB

      Yes.

    14. LF

      Is, is, is there a-

    15. YB

      Yes. I think it's an important subject. Reducing it to annotation may be useful for somebody building a system tomorrow, but longer term, the process of teaching I think is something that deserves a lot more attention from the machine learning community. And so there are people who have coined the term machine teaching.

    16. LF

      (laughs)

    17. YB

      So, what are good strategies for teaching a learning agent?

    18. LF

      Mm-hmm.

    19. YB

      And can we, uh, design and train a system that gonna be, is gonna be a good teacher? So, so in my group, we have a project called, uh, Baby AI or Baby AI Game where there is a, a game, uh, or a scenario where there's a, a, a, a learning agent and a teaching agent. Presumably, the teaching agent would eventually be a human, um, but, you know, we're not there yet. Um, and the, um, the role of the teacher is to use its knowledge of the environment, which it can acquire using whatever way, brute force, um, to help the learner learn as quickly as possible. So, the learner is gonna try to learn by itself, maybe be using some exploration and, and whatever, um, but the teacher can choose, can, can, can have a- an influence on the interaction with the learner so as to guide the learner, um, maybe teach it the things that the learner has most trouble with or just at the boundary between what it knows and doesn't know and so on. So, there's, there's a, there's a tradition of these kind of ideas from other fields, and, um, uh, like tutorial systems, for example, and AI, um, and, and of course, people in, in, in the humanities have been thinking about these questions. But I think it's time that machine learning people, um, look at this, because in the future, we'll have more and more, uh, human-machine interaction-
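The teaching strategy described, working "at the boundary between what it knows and doesn't know", can be sketched in a few lines. This is a toy model, not the actual Baby AI Game code; the skill dynamics and `success_prob` are invented for illustration. The teacher always assigns the task whose predicted success rate is closest to 0.5, and an easy-to-hard curriculum emerges on its own.

```python
import numpy as np

# Toy learner: one skill value per task, improved by practice; harder tasks
# (higher index) need more skill for the same success rate.
n_tasks = 5
skill = np.zeros(n_tasks)

def success_prob(task):
    return 1.0 / (1.0 + np.exp(task - skill[task]))

def teacher_pick():
    # Teach at the frontier: the task whose success rate is nearest 0.5,
    # i.e. the boundary between mastered and unmastered.
    probs = np.array([success_prob(t) for t in range(n_tasks)])
    return int(np.argmin(np.abs(probs - 0.5)))

for _ in range(200):
    skill[teacher_pick()] += 0.2   # one practice attempt on the chosen task

rates = [round(success_prob(t), 2) for t in range(n_tasks)]
print("final success rates:", rates)
```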

    20. LF

      Mm-hmm.

    21. YB

      ... with a human-in-the-loop, and I think understanding how to make this work better is gonna be-

    22. LF

      All the problems around that are very interesting-

    23. YB

      Yes.

    24. LF

      ... and not sufficiently addressed. You've done a lot of, of work with language too. What aspect of, um, the traditionally formulated Turing test, a test of natural language understanding and generation, in your eyes is the most difficult of, of conversation? What in your eyes is the hardest part of conversation to solve for machines?

    25. YB

      So, I would say it's everything having to do with the non-linguistic knowledge, which implicitly you need in order to make sense of sentences, things like the Winograd schemas. So, these sentences that are semantically ambiguous. In other words, you need to understand enough about the world in order to really interpret properly those sentences. I think these are interesting challenges for machine learning, because they point in the direction of building systems that both understand how the world works and its causal relationships in the world, and associate that knowledge with how to express it in language, either for reading or writing.
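For readers unfamiliar with Winograd schemas, the well-known trophy/suitcase pair illustrates the ambiguity meant here: changing a single word flips the referent of "it", and resolving the pronoun takes physical knowledge about objects fitting inside one another, not grammar.

```python
# The classic trophy/suitcase Winograd-schema pair: a minimal word change
# flips which noun the pronoun "it" refers to.
template = "The trophy didn't fit in the suitcase because it was too {}."
referents = {"big": "the trophy", "small": "the suitcase"}

for word, referent in referents.items():
    print(template.format(word), '-> "it" =', referent)
```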

    26. LF

      You speak French?

    27. YB

      Yes, it's my mother tongue.

    28. LF

      It's one of the romance languages. Uh, do you think passing the Turing test and all the underlying challenges we just mentioned depend on language? Do you think it might be easier in French than it is in English?

    29. YB

      No.

    30. LF

      Or is it independent of language?

Episode duration: 42:18


Transcript of episode azOmzumh0vQ
