Lex Fridman Podcast

Juergen Schmidhuber: Godel Machines, Meta-Learning, and LSTMs | Lex Fridman Podcast #11

Lex Fridman and Jürgen Schmidhuber on self-improving AI, curiosity, and universal intelligence.

Lex Fridman (host), Jürgen Schmidhuber (guest)
Dec 23, 2018 · 1h 19m

EVERY SPOKEN WORD

  1. 0:00–4:05

    Teenage origins of recursive self-improvement: building a machine better than a physicist

    1. LF

      The following is a conversation with Jurgen Schmidhuber. He's the co-director of the IDSIA Swiss AI lab and the co-creator of long short-term memory networks. LSTMs are used in billions of devices today for speech recognition, translation, and much more. Over 30 years, he has proposed a lot of interesting out-of-the-box ideas on meta-learning, adversarial networks, computer vision, and even a formal theory of, quote, "Creativity, curiosity, and fun." This conversation is part of the MIT course on Artificial General Intelligence and the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, iTunes, or simply connect with me on Twitter @lexfridman, spelled F-R-I-D. And now here's my conversation with Jurgen Schmidhuber. Early on, you dreamed of AI systems that self-improve recursively. When was that dream born?

    2. JS

      When I was a baby. No, that's not true.

    3. LF

      (laughs)

    4. JS

      When I was a teenager.

    5. LF

      And what was the catalyst for that birth? What was the thing that first inspired you?

    6. JS

      When I was a boy, I'm, I was thinking about what to do in my life and then I thought the most exciting thing is to solve the riddles of the universe and, and that means you have to become a physicist. However, then I realized that there's something even grander, you can try to build a machine that isn't really a machine any longer that learns to become a much better physicist than I could ever hope to be. And that's how I thought maybe I can multiply my tiny little bit of creativity into infinity.

    7. LF

      But ultimately, that creativity will be multiplied to understand the universe around us? That's, that's the, the curiosity for that mystery that, that drove you?

    8. JS

      Yes. Uh, so if you can build a machine that learns to solve more and more complex problems and more and more general problem solver, then you basically ha-have, um, solved all the problems. At least all the solvable problems.

    9. LF

      So how do you think, what is the mechanism for that kind of general solver look like? Because obviously we don't quite yet have one or know how to build one, but we have ideas and you have had throughout your career several ideas about it. So how do you think about that mechanism?

    10. JS

      So in the '80s, I thought about how to build this machine that learns to solve all these problems that I cannot solve myself and I thought it is clear it has to be a machine that not only learns to solve this problem here and this problem here, but it also has to learn to improve the learning algorithm itself.

    11. LF

      Right.

    12. JS

      So it has to have the learning algorithm in, um, representation that allows it to inspect it and modify it, such that it can come up with a better learning algorithm. So I call that meta-learning, learning to learn and recursive self-improvement, um, that is really the pinnacle of that where you then not only learn, um, how to improve on that problem and on that, but you also improve the way the machine improves, and you also improve the way it improves the way it improves itself. And that was my 1987 diploma thesis which was all about that, hierarchy of meta-learners that have no computational limits except for the well-known limits, uh, that Gödel identified in 1931, and, uh, for the limits of physics.
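A minimal sketch of the idea of a learner whose learning algorithm is itself open to inspection and modification. It is a toy under stated assumptions, not the construction from the 1987 thesis: the "learning algorithm" is reduced to a single dictionary entry (`learning_rate`), the task (minimizing `(w - 3)^2`) and the names `train` and `self_modify` are hypothetical, and the only modifications tried are halving or doubling that one value.

```python
def train(learner, steps=20):
    """Base learner: gradient descent on the loss (w - 3)^2, starting at w = 0."""
    w = 0.0
    for _ in range(steps):
        gradient = 2 * (w - 3.0)
        w -= learner["learning_rate"] * gradient
    return (w - 3.0) ** 2  # final loss reached with this learning algorithm


def self_modify(learner, rounds=10):
    """Meta level: propose a change to the learner, test it, keep it if it helps."""
    best_loss = train(learner)
    for _ in range(rounds):
        for factor in (0.5, 2.0):
            # The learner is plain data, so it can be copied and edited.
            candidate = dict(learner, learning_rate=learner["learning_rate"] * factor)
            loss = train(candidate)
            if loss < best_loss:  # evaluate the consequence of the modification
                learner, best_loss = candidate, loss
    return learner, best_loss


initial = {"learning_rate": 0.01}
improved, improved_loss = self_modify(initial)
print(train(initial), improved_loss, improved["learning_rate"])
```

The meta level here can only touch one number. The hierarchy described in the conversation would open every part of the learning algorithm, including the procedure that proposes modifications, to the same kind of change.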

  2. 4:05–6:35

    Meta-learning vs transfer learning: “learning to learn” by inspecting and rewriting the learner

    1. LF

      Mm-hmm. In the recent years, meta-learning has gained popularity in a v- in a specific kind of form. You've talked about how that's not really meta-learning w-with neural networks, that's more m- basic transfer learning. Can you talk about the difference between the big general meta-learning-

    2. JS

      Mm-hmm.

    3. LF

      ... and a more narrow sense of meta-learning the way it's used today, the way it's talked about today?

    4. JS

      Let's take the example of a deep neural network that has, uh, learned to classify images and maybe you have trained that, um, network on 100 different databases of images.

    5. LF

      Mm-hmm.

    6. JS

      And now a new database comes along and you want to quickly learn the new thing as well. So one simple way of doing that is you take the network which already knows 100 types of databases and then you just take the top layer of that and you retrain that, uh, using the new label data that you have in the new image database. And then it turns out that it really, really quickly can learn that too.

    7. LF

      Mm-hmm.

    8. JS

      One shot basically.

    9. LF

      Mm-hmm.

    10. JS

      Because from the first 100 datasets, it already has learned so much about, about computer vision that it can reuse that and that is then almost good enough to solve the new task except you need a little bit of, um, adjustment on the top. So that is transfer learning and it has been done in principle for many decades. People have done similar things for decades. Meta-learning, true meta-learning is about having the learning algorithm itself open to introspection by the system that is using it, and also open to modification, such that the learning system has an opportunity to modify any part of the learning algorithm and then evaluate the consequences of that modification and then learn from that, uh, to create a better learning algorithm and so on recursively. So that's a very different animal where you are opening the space of possible learning algorithms to the learning system itself.
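The transfer-learning recipe described above (keep the lower layers, retrain only the top layer on the new labeled data) can be sketched as follows. This is an illustration under assumptions: `frozen_features` is a hypothetical stand-in for the pretrained lower layers, the "new database" is 20 synthetic points, and the top layer is a small logistic unit trained by plain gradient descent.

```python
import math
import random

random.seed(0)


def frozen_features(x):
    """Stand-in for pretrained lower layers; these are never updated below."""
    return [math.tanh(x[0] + x[1]), math.tanh(x[0] - x[1]), 1.0]


def predict(head, x):
    z = sum(w * f for w, f in zip(head, frozen_features(x)))
    return 1.0 / (1.0 + math.exp(-z))


def retrain_top_layer(data, steps=200, lr=0.5):
    """Fit only the top-layer weights on the new task."""
    head = [0.0, 0.0, 0.0]
    for _ in range(steps):
        for x, y in data:
            feats = frozen_features(x)
            error = predict(head, x) - y  # gradient of the log loss w.r.t. z
            head = [w - lr * error * f for w, f in zip(head, feats)]
    return head


# A small new labeled set: label 1 when x0 + x1 > 0 (synthetic, for illustration).
new_task = []
for _ in range(20):
    x = (random.uniform(-1, 1), random.uniform(-1, 1))
    new_task.append((x, 1.0 if x[0] + x[1] > 0 else 0.0))

head = retrain_top_layer(new_task)
accuracy = sum((predict(head, x) > 0.5) == (y == 1.0) for x, y in new_task) / len(new_task)
print(f"accuracy on the new task: {accuracy:.2f}")
```

Only `head` changes during training. That is the contrast drawn in the conversation: the learning procedure itself stays fixed here, whereas meta-learning would let the system rewrite it.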

  3. 6:35–9:08

    Gödel machines and universal solvers: proof search, optimality, and the “constant overhead” problem

    1. LF

      Right. So you've, uh, like in the 2004 paper, you described, uh, Gödel machines and programs that rewrite themselves.

    2. JS

      Yeah.

    3. LF

      Right? Philosophically and even in your paper mathematically, these are really compelling ideas. But practically, do you see these self-referential programs being successful in the near term to having an impact where sort of it demonstrates to the world that th- this direction is a, is a good one to pursue in the near term?

    4. JS

      Yes. We had these two different types of, um, fundamental research, um, how to build a universal problem solver. One basically exploiting proof search and things like that that you need to come up with asymptotically optimal, theoretically optimal self-improvers and problem solvers. However, one has to admit that through this proof search comes in an additive constant, an overhead, an additive overhead that vanishes in comparison to, uh, what you have to do to solve large problems.

    5. LF

      Mm-hmm.

    6. JS

      However, for many of the small problems that we want to solve in our everyday life, we cannot ignore this constant overhead and that's why we also have been, um, doing other things, non-universal things such as recurrent neural networks which are trained by gradient descent-

    7. LF

      Mm-hmm.

    8. JS

      ... and local search techniques which aren't universal at all, which aren't provably optimal at all like the other stuff that we did, but which are much more practical as long as we only want to solve the small problems that we are typically trying to solve in this environment here. Yeah, so the universal problem solvers-

    9. LF

      Yeah.

    10. JS

      ... uh, like the Gödel machine but also Marcus Hutter's, um, fastest way of solving all possible problems which he developed around 2002, uh, too, in my lab. Uh, they are associated with these c- constant overheads, uh, for proof search which guarantees that the thing you're doing is optimal. For example, there is this fastest way of solving all problems with a computable solution which is due to Marcus, Marcus Hutter.

    11. LF

      Mm-hmm.

  4. 9:08–11:32

    Hutter’s fastest method and the TSP example: asymptotic optimality in plain terms

    1. JS

      And, uh, and to explain what's going on there, let's take traveling salesman problems.

    2. LF

      Mm-hmm.

    3. JS

      With traveling salesman problems you have a number of cities, N cities, and you try to find the shortest path-

    4. LF

      Mm-hmm.

    5. JS

      ... through all these cities without visiting any city twice. And nobody knows the fastest way of solving traveling salesman problems, TSPs, but let's assume there is a method of solving them within N to the 5 operations where N is the number of cities. Then, uh, the universal method of Marcus is going to solve the same traveling salesman problem also within N to the 5 steps.

    6. LF

      Mm-hmm.

    7. JS

      Plus-

    8. LF

      A constant.

    9. JS

      ... O of one plus a constant number of steps that you need for the proof searcher which, uh, you need to, um, show that this particular class of problems, the traveling salesman, salesman problems can be solved within a certain time bound, um, within order N to the 5 steps basically. And this, uh, additive constant doesn't care for N which means as N is getting larger and larger, as you have more and more cities, the constant overhead pales in comparison. And that means that almost all large problems are solved in the best possible way already today. We already have a universal problem solver like that. However, it's not practical because the overhead, the constant overhead is so large that for the small kinds of problems that we want to solve in this little biosphere...

    10. LF

      By the way, when you say small, you're talking about things that fall within the constraints of our computational systems. So they can, they can seem quite large to us mere humans, right?

    11. JS

      That's right, yeah. So they seem large and even unsolvable in a practical sense today, but they are still small compared to almost all problems because almost all problems are large problems which are much larger than any constant.
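The arithmetic behind the "constant overhead" point can be made concrete with a short calculation. The N-to-the-5 bound comes from the example above; the size of the constant (`PROOF_SEARCH_OVERHEAD`) is a made-up value chosen only to illustrate the effect, since no actual figure is given in the conversation.

```python
from fractions import Fraction

# Hypothetical additive constant; the real overhead is not stated in the source.
PROOF_SEARCH_OVERHEAD = 10 ** 30


def total_steps(n):
    """Steps of the universal method: the n^5 of the best solver plus a constant."""
    return n ** 5 + PROOF_SEARCH_OVERHEAD


def slowdown(n):
    """How many times slower than the plain n^5 method, as an exact fraction."""
    return Fraction(total_steps(n), n ** 5)


for n in (10, 10 ** 3, 10 ** 6, 10 ** 9):
    print(f"n = {n:>10}: slowdown factor ~ {float(slowdown(n)):.3g}")
```

For small N the constant dominates completely, while for large enough N the slowdown factor approaches 1, which is the sense in which the method is asymptotically optimal yet impractical for everyday problem sizes.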

  5. 11:32–15:00

    P vs NP and the role of theory: why today’s best AI works with little theoretical grounding

    1. LF

      Do you find it useful as a person who has dreamed of creating a general learning system, has worked on creating one, has done a lot of interesting ideas there, to think about P versus NP, this, uh, formalization of how hard problems are, how they scale, this kind of worst-case analysis type of thinking?

    2. JS

      Mm-hmm.

    3. LF

      Do you find that useful or is it only just a mathematical... it, it's a set of mathematical techniques to give you intuition about what's good and bad?

    4. JS

      Mm-hmm. So P versus NP, that's, uh, super interesting from a theoretical point of view and in fact as you are thinking about that problem, you can also get inspiration for better practical problem solvers. On the other hand, we have to admit that at the moment, the best practical problem-solvers for all kinds of problems that we are now solving through what is called AI at the moment-

    5. LF

      Mm-hmm.

    6. JS

      ... they are not of the kind that is inspired by these questions.

    7. LF

      Yes.

    8. JS

      No. There we are using, um, general purpose computers such as recurrent neural networks, but we have a search technique which is just local search, gradient descent-

    9. LF

      Mm-hmm.

    10. JS

      ... to try to find a program that is running on these recurrent networks such that it can solve some interesting problems such, such as speech recognition or machine translation and something like that. And there is very little theory behind the best solutions that we have at the moment that can do that.

    11. LF

      Do you think that needs to change? Do you think that will change or can we go, can we create a general intelligent systems without ever really proving that that system is intelligent in some kinda mathematical way? Solving machine translation perfectly or something like that, within some kinda syntactic definition or language or can we just be super impressed by the thing working extremely well and that's sufficient?

    12. JS

      There's an old saying, and I don't know who brought it up first, uh, which says there is nothing more practical than a good theory.

    13. LF

      (laughs)

    14. JS

      (laughs) And, um-

    15. LF

      Yeah.

    16. JS

      ... and a good theory of problem-solving under limited resources like here in this universe or on this little planet, has to take into account these limited resources. And so probably there is lurking a theory which is related to what we already have, these asymptotically optimal problem-solvers, which, which, uh, tells us what we need in addition to that to come up with a practically optimal problem-solver. So I believe we will have something like that, and maybe just a few little tiny twists are necessary to, to change what we already have to come up with that as well. As long as we don't have that, we, mm, admit that we are taking suboptimal ways and recurrent neural networks and long short-term memory for, uh, equipped with local search techniques, and we are happy that it works better than any competing method but, um, that doesn't mean that we, we think we are done.

  6. 15:00–17:29

    Why AGI may be “a few lines of code”: simplicity, abstractions, and standing on giants’ shoulders

    1. LF

      You said that an AGI system will ultimately be a simple one. Uh, a general intelligence system will ultimately be a simple one, maybe a pseudocode of a few lines will be able to describe it. Can you talk through your intuition behind this idea, why you feel that a s- at its core, intelligence is a simple algorithm?

    2. JS

      Experience tells us that the stuff that works best is really simple. So the asymptotically optimal ways of solving problems, if you look at them, they're just a few lines of code. It's really true. Although they have these amazing properties, just a few lines of code. Then the most, mm, promising and most useful practical things maybe don't have this proof of optimality associated with them. However, they are also just a few lines of code. The most successful, mm, recurrent neural networks, you can write them down in five lines of pseudocode.

    3. LF

      Th- that's a beautiful, almost poetic idea, but w- what you're describing there is the s- the lines of pseudocode are sitting on top of layers and layers of abstractions in a sense.

    4. JS

      Mm-hmm.

    5. LF

      So y- you're saying at the very top-

    6. JS

      Mm-hmm.

    7. LF

      ... it will be a beautifully written sort of, uh, algorithm.

    8. JS

      Mm-hmm.

    9. LF

      But do you think that there's many layers of abstractions we have to first learn to construct?

    10. JS

      Yeah. Of course, we are building on all these, um, great abstractions that people have invented over the millennia such as matrix multiplications and real numbers and basic arithmetics and calculus and derivations of, um, error functions and derivatives of error functions and stuff like that. So without that language, that greatly simplifies our way of thinking about these problems, we couldn't do anything. So in that sense as always, we are standing on the shoulders of the giants who, in the past, um, simplified the problem of problem-solving so much that now we have a chance to do the final step.

  7. 17:29–25:38

    Determinism, quantum randomness, and the beauty of compressible universes

    1. LF

      (laughs) So the final step will be a simple one. Uh, w- if we, if we take a step back through all of human civilization and just the universe in general (laughs) , uh, w- how do you think about evolution and what if creating a universe is required to achieve this final step?

    2. JS

      Mm-hmm.

    3. LF

      What if going through the very painful and inefficient process of evolution is needed-

    4. JS

      Mm-hmm.

    5. LF

      ... to come up with this set of abstractions that ultimately lead to intelligence? Do you think there's a shortcut or do you think we have to create something like our universe in order to create something like human level intelligence?

    6. JS

      Hmm. So far the only example we have, uh-

    7. LF

      (laughs)

    8. JS

      ... is this one.

    9. LF

      Yeah.

    10. JS

      This universe, um, in which we are living.

    11. LF

      Do you think it can do better?

    12. JS

      Maybe not.

    13. LF

      (laughs)

    14. JS

      But, um, we are part of this whole process.

    15. LF

      Right.

    16. JS

      So... apparently, so it might be the case that the code that runs the universe is really, really simple. Everything points to that possibility because gravity and other basic forces are really simple laws that can be easily described also in just a few lines of code basically. And, uh, and then there are these other, um, events that... the apparently random events in the history of the universe which, as far as we know at the moment, don't have a compact code, but who knows? Maybe somebody in the near future is going to figure out the pseudo-random generator which is, um, which, um, is computing whether the measurement of that, um, spin up or down thing here is, um, going to be positive or negative.

    17. LF

      Underlying quantum mechanics?

    18. JS

      Yes. So-

    19. LF

      Do you ultimately think quantum mechanics is a, a pseudo-random number gen- so it's all deterministic? There's no randomness in our universe? Does God play dice?

    20. JS

      So a couple of years ago, a famous physicist, quantum physicist, um, Anton Zeilinger, he wrote an essay in Nature and it started more or less like that. One of the fundamental insights of the, of the 20th century was that the universe is fundamentally random on the quantum level. And that whenever you measure spin up or down or something like that, a new bit of information enters the history of the universe. And while I was reading that, I was already typing, uh, the response and they had to publish it because I was right.

    21. LF

      (laughs)

    22. JS

      That there is no evidence, no physical evidence, uh, for that. So there's an alternative explanation where everything that we consider random is actually pseudo-random, such as the decimal expansion of pi, 3.141 and so on, which looks random but isn't. So pi is interesting because every three-digit sequence, ev- every sequence of, uh, three digits appears roughly one in a thousand times, and every five digit sequence appears roughly one in 10,000 times what you would, would expect if it was ran- random. But there's a very short algorithm, a short program that computes all of that. So it's extremely compressible. And who knows? Maybe tomorrow somebody, some grad student at CERN goes back over all these, um, data points, beta decay and whatever, and figures out, oh, it's, uh, the s- the second billion digits of pi or something like that.
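The point that the digits of pi look random yet come from a very short program can be checked directly. The sketch below uses Machin's formula with integer arithmetic, which is one standard way to compute the digits and not necessarily the algorithm the speaker has in mind; the function names and the choice of 10,000 decimals are assumptions made for illustration.

```python
from collections import Counter


def arctan_inverse(x, unity):
    """Return arctan(1/x) * unity using an integer-only Taylor series."""
    term = unity // x
    total = term
    x_squared = x * x
    divisor = 3
    sign = -1
    while term:
        term //= x_squared
        total += sign * (term // divisor)
        sign = -sign
        divisor += 2
    return total


def pi_digits(decimals):
    """Digits of pi as a string: a leading '3' followed by `decimals` digits."""
    guard = 10  # extra digits that absorb rounding error, dropped at the end
    unity = 10 ** (decimals + guard)
    # Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239)
    scaled = 4 * (4 * arctan_inverse(5, unity) - arctan_inverse(239, unity))
    return str(scaled // 10 ** guard)


digits = pi_digits(10_000)
digit_counts = Counter(digits)
blocks = Counter(digits[i:i + 3] for i in range(len(digits) - 2))

print(digits[:21])
print({d: digit_counts[d] for d in "0123456789"})
print("most and least frequent three-digit blocks:",
      blocks.most_common(1), blocks.most_common()[-1])
```

A few dozen lines generate as many digits as memory allows, and the single-digit counts come out close to one in ten each. That is what "extremely compressible" means here: the output is long and statistically random-looking, while its description is short.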

    23. LF

      (laughs)

    24. JS

      We don't have any fundamental reason at the moment to believe that this is truly random and not just a deterministic video game. If it was a deterministic video game, it would be much more beautiful because beauty is simplicity, and many of the basic laws of the universe, like gravity and, um, the other basic forces are very simple. So very short programs can explain what these are doing. And, um, and it would be awful and ugly. The universe would be ugly. The history of the universe would be ugly if for the extra things, the random, the seemingly random data points that we get all the time, that we really need a huge number of extra bits to describe all these, um, these extra bits of information. So as long as we don't have evidence that there is no short program that computes the entire history of the entire universe, we are, as scientists, compelled to look further for that, um, shortest program.

    25. LF

      Your intuition says there exists a shortest... a program that can backtrack to the, to the creation of the universe-

    26. JS

      Yeah. So-

    27. LF

      ... to compute the shortest path to the creation of the universe?

    28. JS

      Yes. Including all the, um, entanglement things and all the, um, spin up and down measurements, uh, that have been taken place, um, since 13.8 billion years ago and so on. Yeah. So we don't have a proof that it is, uh, random. We don't have a proof that it is compressible to a short program. But as long as we don't have that proof, we are obliged as scientists to keep looking for that simple explanation.

    29. LF

      Absolutely. So you said simplicity is beautiful or beauty is simple, either one works, but you also work on curiosity, discovery, you know, the romantic notion of randomness, of serendipity, of, of, um, being surprised by things that are about you. Kind of in our poetic notion of reality, we think as humans require randomness. So you don't find randomness beautiful? You, you s- you find simple determinism beautiful?

    30. JS

      Yeah.

  8. 25:38–29:36

    Science as compression progress: Kepler → Newton → Einstein and predictive coding

    1. JS

      All the history of science is a history of compression progress.

    2. LF

      Yeah, so you, you've (smacks lips) described sort of as we build up abstractions and you've t- talked about the idea of, uh, compression, how do you see this, the history of science, the history of humanity or civilization and life on Earth as some kind of, uh, path towards greater and greater compression? What do you mean by that? How do you think about that?

    3. JS

      Hmm. Indeed, the history of science is a history of compression progress. What does that mean? Hundreds of years ago, there was an astronomer whose name was Kepler, and he looked at the data points that he got by watching planets move, and then he had all these data points and suddenly it turned out that he can greatly compress the data by predicting it through an ellipse law. So it turns out that all these data points are more or less on ellipses around the sun. And another guy came along whose name was Newton, and before him Hooke, and they said the same thing that is making these planets move like that is what makes the apples fall down, and it also holds for stones and for all kinds of other objects. And suddenly, many, many of these compressions, o- of these observations became much more compressible because as long as you can predict the next thing given what you have seen so far, you can compress it, but y- you don't have to store that data extra. This is called predictive coding. And then there was still something wrong with that theory of the universe, and you had deviations from these predictions of the theory, and 300 years later another guy came along whose name was Einstein, and he, um, he was able to explain away all these deviations from the, um, predictions-

    4. LF

      Mm-hmm.

    5. JS

      ... of the old theory through a new theory, uh, which was called the General Theory of Relativity, which at first glance looks a little bit more complicated and you have to warp space and time, but you can phrase it within one single sentence which is, no matter how fast you accelerate and how fast or hard you, um, decelerate, and, um, no matter what is the, um, gravity in your local framework, light speed always looks the same. And from, from that you can calculate all the consequences, so it's a very simple thing, and it allows you to further compress all the observations because suddenly there are hardly any deviations any longer that you can measure from the predictions of this new theory. So all of science, um, is a history of compression progress. You never arrive immediately at the shortest, uh, explanation of the data, but you're making progress.
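Predictive coding, as described above, stores only what a predictor gets wrong. The sketch below is a toy under assumptions: the "observations" are synthetic positions of a falling object (`5 * t * t` plus a little integer noise), the predictor assumes constant acceleration, and `zlib` stands in for whatever coder turns numbers into bits.

```python
import random
import struct
import zlib

random.seed(0)
GRAVITY_STEP = 10  # second difference of 5*t*t; the "law" the predictor knows

# Synthetic observations: a falling object plus small measurement noise.
positions = [5 * t * t + random.choice((-1, 0, 1)) for t in range(500)]


def predict(prev2, prev1):
    """Constant-acceleration guess for the next position."""
    return 2 * prev1 - prev2 + GRAVITY_STEP


def encode(samples):
    """Keep the first two samples, then only the prediction errors."""
    residuals = list(samples[:2])
    for i in range(2, len(samples)):
        residuals.append(samples[i] - predict(samples[i - 2], samples[i - 1]))
    return residuals


def decode(residuals):
    """Rebuild the samples exactly from the residuals and the same predictor."""
    samples = list(residuals[:2])
    for r in residuals[2:]:
        samples.append(predict(samples[-2], samples[-1]) + r)
    return samples


def compressed_size(values):
    raw = struct.pack(f"<{len(values)}i", *values)
    return len(zlib.compress(raw, 9))


print("bytes without the law:", compressed_size(positions))
print("bytes with the law:   ", compressed_size(encode(positions)))
```

The decoder recovers every observation exactly, so nothing is lost; the better the law predicts, the smaller the residuals and the fewer bytes are needed. A better theory in this picture is simply a better `predict`.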

    6. LF

      Mm-hmm.

    7. JS

      Whenever you are making progress, you have an insight. You see, oh, first I needed so many bits of information to describe the data, to describe my falling apples, my video of falling apples, I need so many data-

    8. LF

      Mm-hmm.

    9. JS

      ... and so many pixels have to be stored, but then suddenly I realize, no, there is a very simple way of predicting the third frame in the video from the first two, and, um, and maybe not every little detail can be predicted, but more or less most of these orange blaks, uh, blobs that are coming down, they accelerate in the same way, which means that I can greatly compress the video. And the amount of compression, progress, that is the depth of the insight that you have

  9. 29:36–30:28

    Intrinsic motivation and curiosity: rewarding agents for “depth of insight”

    1. JS

      at that moment, that's the fun that you have, the scientific fun, the fun in that discovery, and we can build artificial systems that do the same thing, that measure the depth of their insights as they are looking at the data which is coming in through their own experiments, and we give them a reward, an intrinsic reward in proportion to this depth of insight.

    2. LF

      Mm-hmm.

    3. JS

      W- and since they are trying to maximize the, um, rewards they get, they are suddenly motivated to come up with new action sequences, with new experiments that have the property that the data that is coming in as a consequence of these experiments has the property that they can learn something about, see a pattern in there which they hadn't seen yet before.
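A rough sketch of an intrinsic reward proportional to compression progress, assuming a very simple setup: the agent's model is one weight `w`, the cost of describing the history is approximated by the summed squared prediction error (a proxy for bits, not an actual code length), and the reward for a learning step is how much that cost drops. The two data streams and all names are hypothetical.

```python
import random


def coding_cost(history, w):
    """Proxy for the bits needed to describe the history given model w."""
    return sum((y - w * x) ** 2 for x, y in history)


def run_experiment(sample, steps=50, lr=0.1, seed=0):
    """Observe a stream, update the model, and sum the compression progress."""
    rng = random.Random(seed)
    w, history, total_reward = 0.0, [], 0.0
    for _ in range(steps):
        x, y = sample(rng)
        history.append((x, y))
        new_w = w + lr * 2 * (y - w * x) * x  # one gradient step on the new sample
        # Intrinsic reward: how much better the new model compresses the history.
        total_reward += coding_cost(history, w) - coding_cost(history, new_w)
        w = new_w
    return total_reward


def learnable(rng):
    """A stream with a hidden regularity: y = 3x."""
    x = rng.uniform(-1, 1)
    return x, 3 * x


def noise(rng):
    """A stream with nothing to learn: y is independent of x."""
    return rng.uniform(-1, 1), rng.uniform(-1, 1)


reward_learnable = run_experiment(learnable)
reward_noise = run_experiment(noise)
print(reward_learnable, reward_noise)
```

An agent that chooses between these two experiments by expected reward would be drawn to the learnable stream while there is still a pattern to discover, and would lose interest once the pattern is fully learned and progress stops.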

  10. 30:28–35:35

    PowerPlay: systems that invent their own problems and expand capabilities without forgetting

    1. LF

      S- there's an idea of PowerPlay that you've described, uh, uh, training a general problem solver in this kind of way of looking for the unsolved problems.

    2. JS

      Yeah.

    3. LF

      Can you describe that idea a little further?

    4. JS

      Yeah. It's another very simple idea. So normally, what you do in computer science, you have, um, you have some guy who gives you a problem, and then there is a- a huge, uh, search space of potential solution candidates. And you somehow try them out and, um, you have more or less sophisticated ways of, mm, moving around in that search space until you finally found a solution, uh, which you consider satisfactory.

    5. LF

      Mm-hmm.

    6. JS

      That's what most of computer science is about. PowerPlay just goes one little step further and says, "Let's not only search for solutions to a given problem, but let's search for pairs of problems and their solutions where the system itself has the opportunity to phrase its own problem." So we are looking suddenly at pairs of problems and their solutions or, uh, modifications of the problem solver that is supposed to generate a solution to that, um, new problem. And- and this additional, um, degree of freedom allows us to build curious systems that are like scientists in the sense that they not only try to solve and try to find answers to existing questions, no, they are also free to, um, pose their own questions. So if you want to build an artificial scientist, we have to give it that freedom and PowerPlay is exactly doing that.

    7. LF

      So that's- that's a dimension of freedom that's important to have, but how do you th- how hard do you think that, how multi-dimensional and difficult the space of then coming up with your own questions is?

    8. JS

      Yeah.

    9. LF

      So it's w- as, it's one of the things that as human beings we, uh, consider to be the thing that makes us special-

    10. JS

      Mm.

    11. LF

      ... the intelligence that makes us special is that brilliant insight-

    12. JS

      Yeah.

    13. LF

      ... that can create something totally new.

    14. JS

      Yes. So now let's look at the extreme case. Let's look at the set of all possible problems that you can formally describe-

    15. LF

      Mm-hmm.

    16. JS

      ... which is infinite, which should be the next problem that a scientist or PowerPlay is going to solve. Well, it should be the easiest problem that goes beyond what you already know. So it should be the simplest problem that the current problem solver that you have, which can already solve 100 problems, that he cannot solve yet by just generalizing. So it has to be new, so it has to require a modification of the problem solver such that the new problem solver can solve this new thing, but the old problem solver cannot do it. And in addition to that, we have to make sure that the problem solver doesn't forget any of the previous solutions.

    17. LF

      Right.

    18. JS

      And so by definition, PowerPlay is now trying always to search in this pair of-

    19. LF

      Mm-hmm.

    20. JS

      ... in- in- in the set of pairs of problems and problem solver modifications for a combination that, uh, minimize the time-

    21. LF

      Mm-hmm.

    22. JS

      ... to achieve these criteria. So it's always trying to find the problem which is easiest to add to the repertoire.
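The loop described above (find the simplest problem the current solver cannot solve, modify the solver so it can, and check that nothing already solved is forgotten) can be sketched with a deliberately tiny toy. Everything in it is an assumption made for illustration and not the PowerPlay algorithm itself: a "problem" is a positive integer, the "solver" is a list of coin values, a problem counts as solved if it is a sum of distinct coins, and a "modification" adds one coin, with cheaper coins tried first.

```python
from itertools import count


def solvable(coins):
    """All totals reachable as a sum of distinct coins."""
    totals = {0}
    for c in coins:
        totals |= {t + c for t in totals}
    return totals


def simplest_unsolved(coins):
    """The smallest problem the current solver cannot handle by generalizing."""
    reachable = solvable(coins)
    return next(n for n in count(1) if n not in reachable)


def powerplay_toy(rounds):
    coins, repertoire = [], []
    for _ in range(rounds):
        task = simplest_unsolved(coins)  # the system poses its own problem
        for c in count(1):               # cheapest modification first
            if c in coins:
                continue
            candidate = coins + [c]
            reachable = solvable(candidate)
            # Accept only if the new task is solved and no old one is forgotten.
            if task in reachable and all(old in reachable for old in repertoire):
                coins = candidate
                repertoire.append(task)
                break
    return coins, repertoire


print(powerplay_toy(5))  # ([1, 2, 3, 4, 5], [1, 2, 4, 7, 11])
```

Each self-invented task lies just beyond what the current coins can already express, so problems such as 3, 5, or 6 are never posed because the existing solver already generalizes to them. In this toy, adding a coin can never break an old solution, so the forgetting check always passes; in a learned solver that check is the part that costs real work.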

    23. LF

      So just like grad students and, uh, academics and researchers can spend their whole career in a local minima-

    24. JS

      Mm-hmm.

    25. LF

      ... stuck trying to, uh, come up with interesting questions-

    26. JS

      Mm-hmm.

    27. LF

      ... but ultimately doing very little, do you think it's easy w- in this approach of looking for the simplest unsolvable problem to get stuck in a local minima and not never really discovering new, uh, you know, really jumping outside of the 100 problems that you've already solved-

    28. JS

      Mm-hmm.

    29. LF

      ... in- in a genuine creative way?

    30. JS

      No, because that's the nature of PowerPlay, that it's always trying to break its current generalization abilities by coming s- up with a new problem which is beyond the current horizon.

  11. 35:35–37:58

    Humans as curious agents: meaning, exploration trade-offs, and evolution’s built-in biases

    1. LF

      So in the, uh, paper with the amazing title Formal Theory of Creativity, Fun and Intrinsic Motivation you talk about discovery as intrinsic reward.

    2. JS

      Mm.

    3. LF

      So if you view humans as intelligent agents, what do you think is the purpose and meaning of life for us humans? Is, you've talked about this discovery, uh, do you see humans a- as an instance of PowerPlay agents?

    4. JS

      Yeah. Uh, so humans, uh, are curious and, um, that means they behave like scientists, not only the official scientists but even the babies-

    5. LF

      Mm-hmm.

    6. JS

      ... behave like scientists and they play around with their toys to figure out how the world works and how it is responding to their actions, and that's how they learn about gravity and everything. And, yeah, in 1990, we had the first systems like that, which would just try to, to play around with the environment and, uh, come up with situations that, um, go beyond what they knew at that time, and then get a reward for creating these situations and then becoming, um, more general problem solvers and being able to understand more of the world. So, yeah, I think in principle that, um, that, that curiosity, um, strategy or sophis- more sophisticated versions of what I just described, they are what we have built in as well because evolution discovered that's a good way of exploring-

    7. LF

      Mm-hmm.

    8. JS

      ... the unknown world.
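
The reward-for-novelty idea described in this exchange can be sketched in a few lines. This is a toy illustration only, assuming a lookup-table world model and an intrinsic reward of 1.0 whenever the model's prediction fails; the function name `intrinsic_rewards` and the toy observations are hypothetical, and the systems mentioned in the conversation used learned neural predictors rather than a table.

```python
def intrinsic_rewards(observations):
    """Return one curiosity reward per (situation, outcome) pair.

    Assumption: the "world model" is a plain dict that remembers the last
    outcome seen for each situation. A failed prediction is rewarded, so
    situations the model already understands become boring.
    """
    predictions = {}  # situation -> outcome the model currently expects
    rewards = []
    for situation, outcome in observations:
        surprised = predictions.get(situation) != outcome
        rewards.append(1.0 if surprised else 0.0)
        predictions[situation] = outcome  # the model learns from what it saw
    return rewards


# A baby dropping a toy twice, then pushing it once.
experience = [
    ("drop toy", "falls"),
    ("drop toy", "falls"),
    ("push toy", "rolls"),
]
print(intrinsic_rewards(experience))
```

The repeated "drop toy" experiment earns nothing the second time, which is the pressure that pushes such an agent toward situations beyond what it already knows.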

    9. LF

      Beautiful.

    10. JS

      And a guy who explores the unknown world has a, a higher chance of, um, solving problems that he needs to survive in this world. On the other hand, those guys who were too curious, they were weeded out as well. So, you have to find this trade-off. Evolution found a certain trade-off. Apparently in our society, there are, um, is a certain percentage of extremely explorative guys and it doesn't matter if they die because many of the others are more conservative.

    11. LF

      Mm-hmm.

    12. JS

      And, um, and, and so yeah, it, it would be surprising to me if, um, if that principle of artificial curiosity wouldn't be present in almost exactly the same form here.

  12. 37:58-46:10

    Creativity and consciousness as byproducts: self-models emerge from compression and planning

    1. LF

      In our brains. So, you're a bit of a musician and an artist. So, continuing in this topic of creativity, what do you think is the role of creativity in intelligence? So, you've kind of implied that it's essential for intelligence, if you think of intelligence as a problem-solving system, as ability to solve problems. But do you think it's essential-

    2. JS

      (laughs)

    3. LF

      ... this idea of creativity?

    4. JS

      We never have a program, a sub-program that is called creativity or something. It's just a side effect of what our problem solvers do. They are searching a space of problems, or a space of, uh, candidates, of solution candidates until they hopefully find a solution to a given problem.

    5. LF

      Mm-hmm.

    6. JS

      But then there are these two types of creativity and, uh, both of them are now present in our machines. Um, the first one has been around for a long time, which is human gives problem to machine, machine tries to, um, find a solution to that. And this has been happening for many decades. And for many decades, machines have found creative solutions-

    7. LF

      Mm-hmm.

    8. JS

      ... to interesting problems where humans, um, were not aware of these, um, particularly creative solutions, but then appreciated that the machine found that.

    9. LF

      Mm-hmm.

    10. JS

      The, the second is the pure creativity. That I would call, what I just mentioned, I would call the applied creativity.

    11. LF

      Mm-hmm.

    12. JS

      Like applied art where somebody tells you, "Now, make a nice picture of, of this pope and you will get money for that." Okay. So, here is the artist and he makes a convincing picture of the pope and the pope likes it and gives him the money. And then there is the pure creativ- creativity which is more like the PowerPlay and the artificial curiosity thing where you have the freedom to select your own problem, like a scientist who defines his own question to study. And so, that is the pure creativity if you will.

    13. LF

      And that-

    14. JS

      As opposed to the applied creativity which serves another.

    15. LF

      And in that distinction, there's almost echoes of narrow AI versus general AI. So this kind of constrained painting of a pope seems like the, the approaches of what people are calling narrow AI.

    16. JS

      Mm-hmm.

    17. LF

      And pure creativity seems to be... maybe I'm just biased as a human, but it seems to be an essential element of human-level intelligence. Is that what you're implying to a degree?

    18. JS

      If you zoom back a little bit and you just look at a general problem-solving machine which is trying to solve arbitrary problems, then this machine will figure out in the course of solving problems that it's good to be curious. So, all of what I said just now about this pre-wired curiosity-

    19. LF

      Mm-hmm.

    20. JS

      ... and this will to invent new problems, um, that the system doesn't know how to solve yet should be just a byproduct of the general search. However, apparently evolution has built it into us because it turned out to be so successful, uh, uh, uh, pre-wiring a bias, a very successful exploratory bias that, um, that we are born with.

    21. LF

      And you've also said that consciousness in the same kind of way may be a byproduct of, of problem-solving.

    22. JS

      Yeah.

    23. LF

      Do you think, do you find it's an interesting byproduct? Do you think it's a useful byproduct? What are, what are your thoughts on consciousness in general? Or is it simply a byproduct of greater and greater capabilities of problem-solving that's, um, that's similar to creativity in that sense?

    24. JS

      Yeah. We never have a procedure called consciousness in our machines. However, we get as side effects of what these machines are doing things that seem to be closely related to what people call consciousness.

    25. LF

      Mm-hmm.

    26. JS

      So for example, uh, in 1990, we had simple systems which were basically, um, recurrent networks and therefore universal computers trying to, mm, map incoming data into actions that lead to success. Uh, maximizing reward in a given environment, always finding the charging station in time whenever the battery is low and negative signals are coming from the battery-

    27. LF

      Mm-hmm.

    28. JS

      ... always find the charging station in time without bumping against painful obstacles on the way. So complicated things, but very easily motivated.

    29. LF

      Mm-hmm.

    30. JS

      And then, uh, we give these little guys a separate recurrent neural network, which is just predicting what's happening if I do that and that-

  13. 46:10-50:57

    LSTMs and the meaning of depth: long time lags, credit assignment, and practical limits

    1. LF

      to me. You've described RNNs, developed, uh, LSTMs, long short-term memory networks that are, they're a type of recurrent neural networks that have gotten a lot of success recently. So these are networks that model the temporal aspects in the data, t- temporal patterns in the data, and you've called them the deepest of the neural networks, right? So what do you think is the value of depth in the models that we use to learn?

    2. JS

      Yeah. Since you mentioned the long short-term memory and the LSTM, um, I have to mention the names of the brilliant students who made that possible.

    3. LF

      Yes, of course, of course.

    4. JS

      Um, first of all, my first student ever, Sepp Hochreiter, who had fundamental insights already in his diploma thesis. Then Felix Gers, who had additional important contributions. Alex Graves is a guy from Scotland who, um, uh, is mostly responsible for this, uh, CTC algorithm which is now often used to, to train, uh, the LSTM to do the speech recognition on all the Google Android phones and whatever, um, and Siri and so on. So, um, uh, these guys, without these guys, um-

    5. LF

      Yeah.

    6. JS

      ... I would be nothing.

    7. LF

      It's a lot of incredible work.

    8. JS

      What is now the depth? Uh, what is the importance of depth? Well, um, most problems in the real world are deep in the sense that, um, the current input doesn't tell you all you need to know about the environment.

    9. LF

      Mm-hmm.

    10. JS

      So instead, um, you have to have a memory of what happened in the past. And often important parts of that memory are dated. They are pretty old.

    11. LF

      Mm-hmm.

    12. JS

      So, um, when you're doing speech recognition, for example, and somebody says, "Eleven,"-

    13. LF

      Mm-hmm.

    14. JS

      ... then that's about half a second or something like that-

    15. LF

      Mm-hmm.

    16. JS

      ... which means it's already 50 timesteps.

    17. LF

      Mm-hmm.

    18. JS

      And another guy or the same guy sa- says, "Seven," so the ending is the same, -even.

    19. LF

      Mm-hmm.

    20. JS

      But now the system has to see the distinction between seven and eleven, and the only way it can see the difference is it has to store that, uh, 50 steps ago, there was an S or an L, eleven or seven. So there, you have already a problem of depth 50 because for each time step, you have something like a virtual, uh, layer in the expanded unrolled version of this recurrent network which is doing the speech recognition. So these long time lags, they translate into problem depth and most problems in this world are such that you really, um, have to look far back in time to understand what is the problem and to solve it.
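
The "eleven" versus "seven" example can be made concrete with a toy sketch. It is not a speech recognizer: the 50-frame length follows the figure quoted above, while the frame symbols and the function names `make_utterance`, `memoryless_guess`, and `recurrent_guess` are assumptions made up for illustration.

```python
T = 50  # frames per utterance, following the "50 timesteps" figure above


def make_utterance(first_sound):
    """Build T frames: one distinguishing first frame, then a shared ending."""
    return [first_sound] + ["even"] * (T - 1)


def memoryless_guess(frames):
    """A system without memory sees only the current (last) frame."""
    return frames[-1]


def recurrent_guess(frames):
    """Carry a state across all frames, one virtual layer per time step."""
    state = None
    depth = 0
    for frame in frames:
        if state is None:
            state = frame  # store the distinguishing first sound
        depth += 1         # each time step adds one layer when unrolled
    return state, depth


seven = make_utterance("s")
eleven = make_utterance("el")

# Without memory the two words look identical at the final frame.
print(memoryless_guess(seven) == memoryless_guess(eleven))
# With a carried state they differ, at an unrolled depth of T.
print(recurrent_guess(seven), recurrent_guess(eleven))
```

Here the rule for what to store is hard-coded. The point made in the conversation is that a trained network has to learn that rule, with the error signal travelling back through all 50 virtual layers.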

    21. LF

      But just like with LSTMs, you don't necessarily need to, when you look back in time, remember every aspect. You just need to r- remember the important aspects.

    22. JS

      That's right. The network has to learn to put the important stuff in- into memory and to ignore the unimportant noise.

    23. LF

      So, but in that sense, deeper and deeper is better or is there a limitation? Is- is there... I mean, LSTM is- is one of the great examples of architectures that, uh, do something beyond just deeper and deeper networks. Uh, there's clever mechanisms for filtering data, for remembering and forgetting. Uh, so do you think th- that kind of thinking is necessary? If you think about LSTMs as a leap, a big leap forward over traditional vanilla RNNs, what do you think is the next leap-

    24. JS

      Hmm.

    25. LF

      ... i- within this context? So LSTM is a very clever improvement, but LSTMs still don't have the same kind of ability to see far back in the futu- in the- in the past as us humans do, the credit assignment problem across way back-

    26. JS

      Hmm.

    27. LF

      ... not just 50 time steps or 100 or 1,000, but millions and billions?

    28. JS

      It's not clear what are the practical limits of the LSTM when it comes to looking back. Already in 2006, I think, we had examples where it not only looked back tens or thousands of steps, but really millions of steps and, um, Juan Perez, um, Ortiz in my lab, I think was the first author of a paper where, um, we really, was it 2006 or something?

    29. LF

      Mm-hmm.

    30. JS

      Had, uh, examples where it learned to look back, um, for more than 10 million steps.

  14. 50:57-1:06:32

    Controller-Model (CM) systems, the next RL wave, and robots that learn like children

    1. JS

      So for most problems of speech recognition, it's not necessary to look that far back, but there are examples where it does. Now, the looking back thing, that's rather easy because there's only one past, but there are many possible futures.

    2. LF

      Hm.

    3. JS

      And so a reinforcement learning system which is trying to maximize its future expected reward and doesn't know yet which of these many possible futures should I select given this one single past is facing problems that the LSTM by itself cannot solve. So the LSTM is good for coming up with a compact representation of the history so far, of the history and of observations and actions so far-

    4. LF

      Mm-hmm.

    5. JS

      ... but now how do you plan in an efficient and good way among all these... How do you select one of these many possible action sequences that a reinforcement learning system has to consider to maximize reward in this unknown future? So again, it... we have this basic, um, setup where you have one recurrent network which gets in the video and the speech and whatever, and is executing the actions and is trying to maximize reward. So there is no teacher who tells it what to do at which point in time. And then there's the other network which is just predicting what's going to happen if I do that and that, and that could be an LSTM network and it learns to look back all the way to make better predictions-

    6. LF

      Mm-hmm.

    7. JS

      ... of the next time step. So essentially, although it's m- predicting only the next time step, it is motivated to learn to put into memory something that happened maybe a million steps ago because it's important, uh, to memorize that if you want to predict that, uh, at the next time step, the next event, you know? Now, um, how can a model of the world like that, a predictive model of the world be used by the first guy, let's call it the controller-

    8. LF

      Mm-hmm.

    9. JS

      ... and the model, the controller and the model, how can the model be used by the controller to efficiently select amou- among these many possible futures? The naive way we had, um, about 30 years ago was let's just use the model of the world as a stand-in, as a simulation of the world and millisecond by millisecond, we plan the future-

    10. LF

      Mm-hmm.

    11. JS

      ... and, um, that means we have to roll it out really in detail and it will work only if the model is really good and it will still be inefficient because we have to look at all these possible futures and- and there are so many of them. So instead, what we do now since 2015 in our CM systems, Controller Model systems, we give the controller the opportunity to learn by itself how to use the potentially relevant parts of the M, of the model network-

    12. LF

      Mm-hmm.

    13. JS

      ... to solve new problems more quickly and if it wants to, it can learn to ignore the M and sometimes it's a good idea to ignore the- the M because it's really bad. It's a bad predictor in this particular, um, situation of life, uh, where the controller is currently trying to maximize reward. However, it can also learn to address and exploit some of the subprograms that, uh, came about in the model network through compressing the data by predicting it. So it now has an opportunity to reuse that code, the algorithmic information in the model network, to reduce its own search space, such that it can solve a new problem more quickly than without the model.
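
A very small sketch can show why a controller might consult M in some situations and ignore it in others. Everything here is an assumption for illustration: the situation names, the stand-in `model_guess`, and the reduction of "learning" to comparing average search costs. It is not the CM architecture discussed in the conversation, where both controller and model are recurrent networks.

```python
TARGETS = range(10)  # the controller must find a hidden target among these


def model_guess(situation, target):
    """Hypothetical stand-in for M: good when familiar, misleading when novel."""
    if situation == "familiar":
        return target
    return (target + 5) % 10


def tries_needed(target, use_model, situation):
    """Count how many candidates the controller tests before hitting target."""
    candidates = list(TARGETS)
    if use_model:
        guess = model_guess(situation, target)
        # Test M's suggestion first, then fall back to the plain scan.
        candidates = [guess] + [c for c in candidates if c != guess]
    return candidates.index(target) + 1


def learn_policy():
    """For each situation, keep whichever option has the lower average cost."""
    policy = {}
    for situation in ("familiar", "novel"):
        average_cost = {}
        for use_model in (True, False):
            costs = [tries_needed(t, use_model, situation) for t in TARGETS]
            average_cost[use_model] = sum(costs) / len(costs)
        policy[situation] = min(average_cost, key=average_cost.get)
    return policy


print(learn_policy())
```

In the familiar situation the model's suggestion cuts the search to a single try, so the controller keeps using it. In the novel situation the misleading suggestion costs an extra try on average, so ignoring M comes out ahead.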

    14. LF

      Compression. So, you're ultimately optimistic and excited about the power of RL, of reinforcement learning, in the context of real systems?

    15. JS

      Absolutely, yeah.

    16. LF

      So you see RL as a potential having a huge impact beyond just sort of the M part that is often developed on, you know, supervised learning methods? You see RL a- a- as a, uh, for problems of self-driving cars or any kind of applied cyber robotics? That's the correct, interesting direction for research in your view?

    17. JS

      I do think so. We have a company called NNAISENSE-

    18. LF

      NNAISENSE.

    19. JS

      ... which, um, has applied reinforcement learning to little Audis.

    20. LF

      Little Audis.

    21. JS

      Which learn to park without a teacher. The same principles were used, of course. So these little Audis, they are small, maybe like that, so much smaller than the real Audis, but they have all the sensors, uh, that you find in the real Audis, you find the cameras, the LiDAR sensors. They go up to 120-20 kilometers an hour if you, if, if they want to.

    22. LF

      (laughs)

    23. JS

      And, um, and they have, um, pain sensors basically and they don't want to bump against obstacles and other Audis, and so they, um, must learn like little babies to park. Take the raw vision input and translate that into actions that lead to successful parking behavior, which is a rewarding thing. And yes, they learn that.

    24. LF

      They learn, successfully.

    25. JS

      Um, so we have examples like that, and it's only in the beginning. Um, you know, this is just the tip of the iceberg, and I believe the next wave of AI is going to be all about that. Uh, so at the moment, the current wave of AI is about passive pattern observation and, uh, prediction and, um, and that's what you have on your smartphone, and what the major companies on the Pacific Rim are using to sell you ads, to do marketing.

    26. LF

      (laughs) Yeah.

    27. JS

      That's the current, uh-

    28. LF

      Yes.

    29. JS

      ... source of profit in AI, and that's only one or 2% of the world economy, um, which is big enough to make these companies pretty much the most valuable companies in the world. But there's a much, much bigger fraction of the economy going to be affected by the next wave, which is really about machines that shape the data through their own actions. And-

    30. LF

      Do you think simulation is ultimately the biggest, uh, way that, that th- those methods will be successful in the next 10, 20 years? We're not talking about 100 years from now.

  15. 1:06:32-1:20:04

    Jobs, existential risk, and cosmic expansion: AI ecologies, resources beyond Earth, and alien intelligence

    1. LF

      Are you optimistic about that future? Are you concerned? Uh, there's a lot of people concerned on, in the near term about the transformation of the nature of work. The kind of ideas that you just suggested would have a significant impact of what kind of things could be automated. Are you optimistic about that future, or are you nervous about that future? And looking a little bit farther into the future, there's people like Elon Musk, uh, Stuart Russell, concerned about the existential threats-

    2. JS

      Mm.

    3. LF

      ... of that future.

    4. JS

      Mm.

    5. LF

      So, in the near term, job loss, in the long term, existential threat. Are these concerns to you, or are you ultimately optimistic?

    6. JS

      So, let's first address the near future. We have had predictions of job losses for many decades. For example, when industrial robots came along-

    7. LF

      Mm-hmm.

    8. JS

      ... many pe- many people predicted lots of jobs are going to get lost. And, in a sense, they were right.

    9. LF

      Mm-hmm.

    10. JS

      Because back then, there were, um, car factories and hundreds of people, and these factories assembled cars. And today, the same car factories have hundreds of robots and maybe three guys watching the robots. On the other hand, those countries that have lots of, uh, robots per capita, Japan, Korea, Germany, Switzerland, couple of other countries, they have really low unemployment rates. Somehow, all kinds of new jobs were created. Back then, nobody anticipated those jobs. And, um, decades ago, I already said, it's really easy to, um, say which jobs are going to get lost, but it's really hard to predict the new ones. Thirty years ago, who would have predicted all these people making money as, uh, YouTube bloggers, for example?

    11. LF

      Mm.

    12. JS

      200 years ago, 60% of all people used to work in agriculture.

    13. LF

      Mm-hmm.

    14. JS

      Today, maybe 1%. But still, only, I don't know, 5% unemployment. Lots of new jobs were created, and Homo Ludens, the, the playing man, is inventing new jobs all the time. Most of these jobs are not existentially necessary-

    15. LF

      Mm.

    16. JS

      ... for the survival of our species. There are only very few existentially necessary jobs, such as farming and building houses, and, and warming up the houses, but less than 10% of the population is doing that. And most of these newly, um, invented jobs are about, um, interacting with other people in new ways, through new media and so on, getting new ki- types of kudos and forms of likes and whatever, and even making money through that. So, Homo Ludens, the playing man, doesn't want to be unemployed, and that's why he's inventing new jobs all the time, and he keeps considering these jobs as really important and is investing a lot of energy and hours of work into, into those new jobs.

    17. LF

      That's, uh, quite beautifully put. We're really nervous about the future because we can't predict what kind of new jobs will be created-

    18. JS

      Mm-hmm.

    19. LF

      ... but you're ul- uh, ultimately optimistic that we, uh, humans are so restless that we create and give meaning to newer and newer jobs, totally new, uh, cl- likes on Fa- things that get likes on Facebook or whatever the social platform is. So, what about long-term existential threat of AI, where our whole civilization may be swallowed up by this ultra super intelligent systems?

    20. JS

      Maybe it's not going to be swallowed up, but, um, I'd be surprised if, uh, we were, uh, we humans were the last step in the evolution of the universe. And, um-

    21. LF

      You- you've actually had this beautiful comment somewhere that, uh, I've seen saying that, uh, artificial... (laughs) Uh, quite insightful. So, artificial general intelligence systems, just like us humans, will likely not want to interact with humans. They'll just interact amongst themselves, just like ants interact amongst themselves, and only, uh, tangentially interact with humans.

    22. JS

      Mm-hmm.

    23. LF

      And it's- it's quite an interesting idea that once we create AGI, they will lose interest in humans and- and have- compete for their own Facebook likes on their own social platforms.

    24. JS

      Mm-hmm.

    25. LF

      So, within that, uh, quite elegant idea, how do we know, in a hypothetical sense, that there's not already intelligent systems out there? How do you think broadly of general intelligence greater than us? How would we know it's out there?

    26. JS

      Mm.

    27. LF

      How would we know it's around us, and could it already be?

    28. JS

      I'd be surprised if n- with- within the next few decades or something like that, we, um, we won't have AIs that are truly smart in every single way and better problem solvers in almost every single important way. And I'd be surprised if they wouldn't realize what we have realized a long time ago, which is that, um, almost all physical resources are not here in this biosphere, but further out. The rest of the solar system gets two billion times more solar energy than our little planet. There's lots of material out there that you can use to build robots and self-replicating robot factories and all this stuff. And they are going to do that, and they will be scientists, um, and curious, and they will explore what they can do. And in the beginning, they will be fascinated by life and by their own origins in our civilization. They will want to understand that completely, just like, uh, people today would like to understand how life works and, um, and also, um, the history of our own, um, existence and civilization, but then also the physical laws that created all of that. So they, um, in the beginning, they will be fascinated by life. Once they understand it, they lose interest, um, like anybody who loses interest in things he understands. And then, as you said, um, the most interesting sources of, um, information for them will be others of their own kind.
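
The "two billion times" figure can be checked with a rough calculation, under one reading of the claim: compare the Sun's total output with the share that Earth's disc intercepts. The radius and distance below are standard approximate values, not numbers from the conversation, and the variable names are made up for this sketch.

```python
import math

EARTH_RADIUS_KM = 6371.0        # approximate mean radius of Earth
EARTH_SUN_DISTANCE_KM = 1.496e8  # approximately one astronomical unit

# Earth blocks a disc of area pi * R^2 out of a sphere of area 4 * pi * d^2
# centred on the Sun, so that ratio is the fraction of sunlight it receives.
disc_area = math.pi * EARTH_RADIUS_KM ** 2
sphere_area = 4 * math.pi * EARTH_SUN_DISTANCE_KM ** 2
fraction_hitting_earth = disc_area / sphere_area

ratio = 1 / fraction_hitting_earth
print(f"Sun's output is about {ratio:.2e} times what Earth intercepts")
```

The result is on the order of two billion, consistent with the number quoted above.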

    29. LF

      Mm-hmm.

    30. JS

      So at least in the long run, there seems to be some sort of protection through lack of interest on the other side. And, um, and now it seems also clear as far as we understand physics, you need matter and energy to compute and to build more robots and infrastructure and more AI civilization and A-AI ecologies consisting of trillions of different types of AIs. And so it seems inconceivable to, to me that this thing is not going to expand. Some AI ecology not controlled by one AI, but, but by trillions of different types of AIs competing in all kinds of quickly evolving and disappearing ecological niches in ways that we cannot fathom at the moment. But it's going to expand, limited by light speed and physics, but it's going to expand and, and now we realize that the universe is still young. It's only 13.8 billion years old, and it's going to be a thousand times older than that. So there's plenty of time to conquer the entire universe and to fill it with intelligence and senders and receivers such that AIs can travel the way they are traveling in our labs today, which is by radio from sender to receiver. And let's call the current age of the universe one eon. One eon. Now, it will take just a few eons from now and the entire visible universe is going to be full of that stuff. And let's look ahead to a time when the universe is going to be 1,000 times older than it is now. They will look back and they will say, "Look, almost immediately after the Big Bang, only a few eons later, the entire universe started to become intelligent." Now to your question, how do we see whether anything like that has already happened or is already in a more advanced stage in some other part of the universe, of the visible universe? We are trying to look out there and nothing like that has happened so far, or is that true?

Episode duration: 1:19:58

Transcript of episode 3FIo6evmweo