Lex Fridman Podcast

Ian Goodfellow: Generative Adversarial Networks (GANs) | Lex Fridman Podcast #19

Lex Fridman and Ian Goodfellow on GANs, AI Limits, Security, and Future Intelligence.

Lex Fridman (host) · Ian Goodfellow (guest)
Apr 18, 2019 · 1h 8m

Transcript

  1. 0:00 – 15:00
    1. LF

The following is a conversation with Ian Goodfellow. He's the author of the popular textbook on Deep Learning, simply titled Deep Learning. He coined the term generative adversarial networks, otherwise known as GANs, and with his 2014 paper, is responsible for launching the incredible growth of research and innovation in this subfield of Deep Learning. He got his BS and MS at Stanford, his PhD at University of Montreal with Yoshua Bengio and Aaron Courville. He held several research positions, including at OpenAI, Google Brain, and now at Apple as the director of machine learning. This recording happened while Ian was still at Google Brain. But we don't talk about anything specific to Google or any other organization. This conversation is part of the Artificial Intelligence podcast. If you enjoy it, subscribe on YouTube, iTunes, or simply connect with me on Twitter @lexfridman, spelled F-R-I-D. And now, here's my conversation with Ian Goodfellow. You open your popular Deep Learning book with a Russian doll type diagram that shows Deep Learning is a subset of representation learning, which in turn is a subset of machine learning, and finally a subset of AI. So this kind of implies that there may be limits to Deep Learning in the context of AI. So what do you think are the current limits of Deep Learning, and, uh, are those limits something that we can overcome with time?

    2. IG

      Yeah. I think one of the biggest limitations of Deep Learning is that right now, it requires really a lot of data, especially labeled data. Um, there are some unsupervised and semi-supervised learning algorithms that can reduce the amount of labeled data you need, but they still require a lot of unlabeled data.

    3. LF

      Mm.

    4. IG

      Reinforcement learning algorithms, they don't need labels, but they need really a lot of experiences. Um, as human beings, we don't learn to play pong by failing at pong two million times. So just getting the generalization ability better is one of the most important bottlenecks in the capability of the technology today. And then I guess I'd also say Deep Learning is like a component of a bigger system. Um, so far, nobody is really proposing to have, uh, only what you'd call Deep Learning as the entire ingredient of intelligence. You use Deep Learning as sub-modules of other systems, like AlphaGo has a Deep Learning model that estimates the value function. Um, you know, most reinforcement learning algorithms have a Deep Learning module that estimates which action to take next, but you might have other components.

    5. LF

      So you're basically as, uh, building a function estimator. Do you think it's, uh, possible, you said nobody's kind of been thinking about this so far, but do you think neural networks could be made to reason in the way symbolic systems did in the ’80s and '90s to do more, create more, like, programs as opposed to functions?

    6. IG

Yeah. I think we already see that a little bit. I already kind of think of neural nets as a kind of, of program. I think of Deep Learning as basically learning programs that have more than one step. Um, so if you draw a flowchart or, or if you draw a TensorFlow graph describing your machine learning model, I think of the depth of that graph as describing the number of steps that run in sequence, and then the width of that graph is the number of steps that run in parallel. Now it's been long enough that we've had Deep Learning working that it's a little bit silly to even discuss shallow learning anymore.

    7. LF

      Mm.

    8. IG

      But back when I first got involved in AI, when we used machine learning, we were usually learning things like support vector machines, where you could have a lot of input features to the model, and you could multiply each feature by a different weight. But all those multiplications were done in parallel to each other, and there wasn't a lot done in series. I think what we got with Deep Learning was really the ability to have, uh, steps of a program that run in sequence. And I think that we've actually started to see that what's important with Deep Learning is more the fact that we have a multi-step program rather than the fact that we've learned a representation. If you look at things like, uh, ResNets, for example, they take one particular kind of representation, and they update it several times. Uh, back when Deep Learning first really took off in the academic world in 2006, when Geoff Hinton, uh, showed that you could train deep belief networks, everybody who was interested in the idea thought of it as each layer learns a different level of abstraction, that the first layer, uh, trained on images learns something like edges, and the second layer learns corners, and eventually, you get these kind of grandmother cell units that recognize specific objects. Today, I think most people think of it more as a computer program, where as you add more layers, you can do more updates before you output your final number. But I don't think anybody believes that layer 150 of the ResNet, uh, is a grandfa- grandmother cell, and, you know, layer 100 is contours or something like that.
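The refinement picture Ian describes, the same representation nudged a little at every layer rather than re-derived from scratch, can be sketched as a toy residual loop (illustrative only; the `refine` function, its constants, and the starting vector are invented, not drawn from any real ResNet):

```python
import math

def refine(h, w, b):
    """One residual step: keep the representation h and add a small learned update."""
    return [hi + 0.1 * math.tanh(w * hi + b) for hi in h]

# Start from a rough "representation" and refine it layer after layer,
# as in a ResNet: the shape stays the same, only the values improve.
h = [0.5, -1.0, 2.0]
for layer in range(5):
    h = refine(h, w=0.8, b=0.1)  # a real net would learn separate w, b per layer
```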

    9. LF

Okay. So you think... You're not thinking of it as a singular representation that keeps building. You think of it as a program, sort of almost like a state representation is a state of understanding in something?

    10. IG

      Yeah. I think of it as a program that makes several updates and arrives at better and better understandings, but it's not replacing the representation at each step. It's refining it. And in some sense, that's a little bit like reasoning. It's not reasoning in the form of deduction, but it's reasoning in the form of taking a thought and refining it and refining it carefully until it's good enough to use.

    11. LF

      So do you think... and I hope you don't mind, we'll jump philosophical every once in a while. Do, (laughs) do, do you think, uh, of, you know, cognition, human cognition or even consciousness as simply a result of this kind of, uh, sequent- sequential representation learning? Do you think that can emerge?

    12. IG

Cognition, yes, I think so. Consciousness, it's really hard to even define what we mean by that. I guess there's, consciousness is often defined as things like having self-awareness-

    13. LF

      Mm-hmm.

    14. IG

      ... and that's relatively easy to turn into something actionable for a computer scientist to reason about. People also define consciousness in terms of having qualitative states of experience, like qualia-

    15. LF

      Right.

    16. IG

      ... and there's all these philosophical problems, like, could you imagine a, a zombie who does all the same information processing as a human, but doesn't really have the qualitative experiences that we have? That sort of thing, I have no idea how to formalize or turn it into a scientific question. I don't know how you could run an experiment to tell whether a person is a zombie or not, and similarly, I don't know how you could run an experiment to tell whether an advanced AI system had become conscious in the sense of qualia or not.

    17. LF

      But in the more practical sense, like almost like self-attention, you think consciousness and cognition can, in an impressive way, emerge from current types of architectures that we think of as deep learning?

    18. IG

      Yeah, yeah. Or, or if, if you think of consciousness in terms of self-awareness and just, um, making plans based on the fact that the agent itself exists in the world, reinforcement learning algorithms are already more or less forced to model the agent's effect on the environment. So that, that more limited version of consciousness is already something that we get limited versions of with reinforcement learning algorithms, if they're trained well.

    19. LF

      But, uh, you say limited. So the, the, the big question really is how you jump from limited to human level.

    20. IG

      Yeah.

    21. LF

      Right? And, uh, whether it's possible, it, th- y- you know, the, even just building common sense reasoning seems to be exceptionally difficult. So can... if we scale things up, if we get much better on supervised learning, if we get better at labeling, if we get bigger datasets, uh, the more compute, do you think we'll start to see really impressive things that go from limited to, you know, uh, something, echoes of human-level cognition?

    22. IG

I think so, yeah. I'm, I'm optimistic about what can happen just with more computation and more data. Uh, I do think it'll be important to get the right kind of data. Uh, today, most of the machine learning systems we train are mostly trained on one type of data for each model. Uh, but th- the human brain, w- we get all of our different senses and we have many different experiences, like, you know, riding a bike, driving a car, talking to people, reading. Um, I think when you get that kind of integrated dataset working with a machine learning model that can actually close the loop and interact, w- we may find that algorithms not so different from what we have today learn really interesting things when you scale them up a lot and train them on a large amount of multimodal data.

    23. LF

      And so multimodal is really interesting, but within, like your work in, uh, adversarial examples, so selecting within modal, within a one mode of data, (laughs) uh, selecting better at what are the difficult cases from which are most useful to learn from.

    24. IG

      Oh, yeah. Like, like, could we, could we get a whole lot of mileage out of, uh, designing a model that's resistant to adversarial examples or something like that?

    25. LF

      Right.

    26. IG

      Yeah.

    27. LF

      That's the question.

    28. IG

      My, my thinking on that has evolved a lot over the last few years, when I-

    29. LF

      Oh, interesting.

    30. IG

      ... when I first started to really invest in studying adversarial examples, I was thinking of it mostly as adversarial examples reveal a big problem with machine learning-

  2. 15:00 – 30:00
    1. IG

      stabilized to the point where some core ideas from the 1980s are still used today. When I first started studying machine learning, almost everything from the 1980s had been rejected, and now some of it has come back. So, that stuff that's really stood the test of time is what I focused on putting into the book. Uh, there is also, I guess, two different philosophies about how you might write a book. Uh, one philosophy is you try to write a reference that covers everything.

    2. LF

      Mm-hmm.

    3. IG

      And the other philosophy is you try to provide a high level summary that gives people the language to understand a field and tells them what the most important concepts are. Uh, the first deep learning book that I wrote with Yoshua and Aaron was somewhere between the- the two philosophies, that it's trying to be both a reference and an introductory guide. Uh, writing this chapter for Russell and Norvig's book, I was able to focus more on just a concise introduction of the key concepts and the language you need to read about them more. In a lot of cases, I actually just wrote paragraphs that said, "Here is a rapidly evolving area that you should pay attention to." It's-

    4. LF

      (laughs)

    5. IG

      It's pointless to try to tell you what the latest and best version of a, you know, learn to learn model is.

    6. LF

      Right.

    7. IG

      Um, you know, I can- I can point you to a paper that's recent right now, but there isn't a whole lot of a reason to delve into exactly what's going on with the latest learning to learn approach or the latest module produced by a learning to learn algorithm. You should know that learning to learn is a thing, and that it may very well be the source of the latest and greatest convolutional net or recurrent net module that you would want to use in your latest project. But there isn't a lot of point in trying to summarize exactly, uh, which architecture and which learning approach got to which level of performance.

    8. LF

      So, you maybe focus more on the basics of the methodology, so f- from back propagation to feedforward to recurrent neural networks, convolutional, that kind of thing?

    9. IG

      Yeah, yeah.

    10. LF

      So, if I were to ask you, I remember I took, um, algorithms and data structures algorithms course, I remember the- the professor asked, uh, "What is an algorithm?" And, uh, yelled at everybody, in a good way, that nobody was answering it correctly. Everybody knew what the algor- it was a graduate course. Everybody knew what an algorithm was, but they weren't able to answer it well. So, let me ask you, uh, in that same spirit, what is deep learning?

    11. IG

      I would say deep learning is any kind of machine learning that involves learning parameters of more than one, uh, consecutive step. So that, I mean, shallow learning is things where you learn a lot of operations that happen in parallel. You might have a system that makes multiple steps, um, like you might have hand-designed feature extractors, uh, but really only one step is learned. Deep learning is anything where you have multiple operations in sequence, and that includes the things that are really popular today, like convolutional networks and recurrent networks. Uh, but it also includes some of the things that have died out, uh, like Boltzmann machines-

    12. LF

      Mm.

    13. IG

      ... where w- we weren't using back propagation. Today, I hear a lot of people define deep learning as gradient descent a- applied to these differentiable functions.

    14. LF

      Mm-hmm.

    15. IG

      And I think that's a legitimate usage of the term. It's just different from the way that I use the term myself.
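Ian's distinction, many learned multiplications running in parallel (shallow) versus learned steps that run in sequence (deep), can be made concrete with a toy comparison (the features and weights below are invented for illustration):

```python
import math

features = [0.2, -0.5, 1.3]

# Shallow learning: each feature gets a learned weight, but everything is
# combined in a single learned step, as in an SVM-style linear score.
weights = [0.7, -0.1, 0.4]
shallow_score = sum(w * x for w, x in zip(weights, features))

# Deep learning: more than one learned step applied in series. The depth of
# the graph is the number of sequential steps; its width is the parallel ones.
layer1 = [[0.5, 0.1, -0.2], [0.3, -0.4, 0.8]]  # learned step 1 (width 2)
layer2 = [1.0, -1.0]                            # learned step 2
hidden = [math.tanh(sum(w * x for w, x in zip(row, features))) for row in layer1]
deep_score = sum(w * h for w, h in zip(layer2, hidden))
```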

    16. LF

      So, what's an example...... of, uh, deep learning that is not gradient descent on differentiable functions, uh, in your, I mean, not specifically perhaps, but more even looking into the future. What- what- what's your thought about that space of approaches?

    17. IG

      Yeah. So I tend to think of machine learning algorithms as decomposed into really three different pieces. There's the model, which can be something like a neural net or a Boltzmann machine or a- a recurrent model. And- and that basically just describes how do you take data and how do you take parameters, and, you know, wha- what function do you use to make a prediction given the data and the parameters? Um, another piece of the learning algorithm is the optimization algorithm. Or not every algorithm can be really described in terms of optimization, but what's the algorithm for updating the parameters or updating whatever the state of the network is? Uh, and then the- the last part is the- the dataset. Like, how do you actually represent the world as it comes into your machine learning system? Um, so I think of deep learning as telling us something about, what does the model look like? And basically to qualify as deep, I say that it just has to have multiple layers. That can be multiple steps in a feedforward differentiable computation. That can be multiple layers in a graphical model. There's a lot of ways that you could satisfy me that something has, uh, multiple steps that are each parameterized separately. Um, I think of gradient descent as being all about that other piece, the how do you actually update the parameters piece. So you can imagine having a deep model, like a convolutional net, and training it with something like evolution or a genetic algorithm, and I would say that still qualifies as deep learning. Uh, and then in terms of models that aren't necessarily differentiable, uh, I guess, Boltzmann machines are probably the main example of something where you can't really take a derivative and use that for the learning process. Uh, but you- you can still argue that the model has many steps of processing that it applies when you run inference in the model.
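As Ian notes, a deep model trained by something other than gradients still qualifies as deep learning. A minimal sketch of that idea: a two-layer network fit to XOR with a simple (1+1) evolution strategy instead of backpropagation (the task, architecture, and all constants here are illustrative choices, not from the conversation):

```python
import random
random.seed(0)

# A tiny two-layer (hence "deep") network with a hand-rolled forward pass.
def forward(params, x):
    w1, w2 = params
    hidden = [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in w1]  # ReLU step 1
    return sum(wi * hi for wi, hi in zip(w2, hidden))                          # linear step 2

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]  # XOR

def loss(params):
    return sum((forward(params, x) - y) ** 2 for x, y in data)

def mutate(params, scale=0.3):
    w1, w2 = params
    return ([[w + random.gauss(0, scale) for w in row] for row in w1],
            [w + random.gauss(0, scale) for w in w2])

# (1+1) evolution: keep one parent, accept mutations that lower the loss.
best = ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
for _ in range(3000):
    child = mutate(best)
    if loss(child) < loss(best):
        best = child
```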

    18. LF

      So it's the steps of processing that's key. So Geoff Hinton suggests that we need to throw away back prop- back propagation and start all over. What do you think about that? What could an alternative direction of training neural networks look like?

    19. IG

      I- I don't know that back propagation is gonna go away entirely. Most of the time, when we decide that a machine learning algorithm isn't on the critical path to research for improving AI, the algorithm doesn't die. It just becomes used for some specialized set of things.

    20. LF

      Right.

    21. IG

      Uh, a lot of algorithms, like logistic regression, don't seem that exciting to AI researchers who are working on things like speech recognition or autonomous cars today. But there's still a lot of use for logistic regression in things like analyzing really noisy data in medicine and finance or, um, making really rapid predictions in really time-limited contexts. So I think- I think, uh, back propagation and gradient descent are around to stay, but they may not end up being, um, everything that we need to get to real human level or superhuman AI.

    22. LF

      Are you optimistic about us discovering... S- you know, back propagation has been around for a few decades. So, uh, are you optimistic about- about us as a community being able to discover something better?

    23. IG

      Yeah, I am. I think- I think we likely will find something that works better. You could imagine things like having stacks of models where, uh, some of the lower level models predict parameters of the higher level models, and so th- at the top level, you're not learning in terms of literally calculating gradients, but just predicting how different values will perform. You can kind of see that already in some areas like Bayesian optimization, where you have a Gaussian process that predicts how well different parameter values will perform. We already use those kinds of algorithms for things like hyper-parameter optimization, and in general, we know a lot of things other than back prop that work really well for specific problems. The main thing we haven't found is a way of taking one of these other non-back prop based algorithms and having it really advance the state of the art on an AI level problem.
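The Bayesian-optimization idea Ian mentions, a model that predicts how well different parameter values will perform, can be sketched with a deliberately crude surrogate (a nearest-neighbor average standing in for the Gaussian process a real system would use; the `validation_error` objective is invented for illustration):

```python
import random
random.seed(1)

# Hypothetical objective: validation error as a function of a learning rate.
def validation_error(lr):
    return (lr - 0.3) ** 2 + 0.05

observed = [(0.01, validation_error(0.01)), (0.9, validation_error(0.9))]

# Simplified surrogate: predict performance of an untried value from its
# nearest observed neighbors (a real system would use a Gaussian process).
def predict(lr):
    nearest = sorted(observed, key=lambda p: abs(p[0] - lr))[:2]
    return sum(err for _, err in nearest) / len(nearest)

for _ in range(20):
    # Propose candidates and pick the one the surrogate predicts will do best...
    candidates = [random.uniform(0.0, 1.0) for _ in range(10)]
    best_guess = min(candidates, key=predict)
    # ...then evaluate it for real and fold it into the surrogate's data.
    observed.append((best_guess, validation_error(best_guess)))

best_lr, best_err = min(observed, key=lambda p: p[1])
```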

    24. LF

      Right.

    25. IG

      But I- I- I wouldn't be surprised if eventually we find that some of these algorithms that... even the ones that already exist, not even necessarily a new one, we might find some way of, uh, customizing one of these algorithms to do something really interesting at the level of cognition or- or the- the level of, um... I think one system that we really don't have working quite right yet is, uh, like short-term memory.

    26. LF

      Mm-hmm.

    27. IG

      Uh, we have things like LSTMs. They're called long short-term memory. Uh, they still don't do quite what a human does with short-term memory. Um, like gradient descent, to learn a specific fact has to do multiple steps on that fact. Like, if I- I tell you, "The meeting today is at 3:00 PM," um, I don't need to say over and over again, "It's at 3:00 PM. It's at 3:00 PM. It's at 3:00 PM. It's at 3:00 PM-"

    28. LF

      Right.

    29. IG

      ... for you to do a gradient step on each one. You just hear it once and you remember it. Um, there's been some work on things like, uh, self-attention and attention-like mechanisms, like the Neural Turing Machine, that can write to memory cells and update themselves with facts like that right away, but I don't think we've really nailed it yet. And that's one area where I'd imagine that new optimization algorithms or different ways of applying existing optimization algorithms could give us a way of just lightning fast updating the state of a machine learning system to contain a- a specific fact like that without the need to have it presented over and over and over again.
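A toy key-value store in the spirit of the attention and Neural Turing Machine mechanisms Ian describes: one write retains the fact, with no repeated gradient steps (everything here is invented for illustration, and the hard arg-max lookup stands in for the soft attention a real model would use):

```python
def similarity(a, b):
    """Dot-product match between a query and a stored key."""
    return sum(x * y for x, y in zip(a, b))

class OneShotMemory:
    def __init__(self):
        self.keys, self.values = [], []

    def write(self, key, value):
        # A single update stores the fact; no repeated presentations needed.
        self.keys.append(key)
        self.values.append(value)

    def read(self, query):
        # Attention-style lookup: return the value under the best-matching key.
        scores = [similarity(query, k) for k in self.keys]
        return self.values[scores.index(max(scores))]

memory = OneShotMemory()
memory.write([1.0, 0.0], "meeting at 3:00 PM")  # heard once...
memory.write([0.0, 1.0], "deadline on Friday")

recalled = memory.read([0.9, 0.1])  # ...recalled immediately from a noisy cue
```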

    30. LF

      So some of the success, uh, of symbolic systems in the '80s is they were able to assemble these kinds of facts, uh, better, but th- there's a lot of expert input required, and it's very limited in that sense. Do you ever look back to that?... as, uh, something that we'll have to return to eventually, sort of dust off the book from the shelf and, uh, think about how we build knowledge representation, knowledge base-

  3. 30:00 – 45:00
    1. IG

      There are a lot of theory papers showing that under some theoretical settings the, the GAN algorithm does actually converge.

    2. LF

      Mm-hmm.

    3. IG

      But those settings are restricted enough, uh, that they don't necessarily explain the whole picture in terms of all the results that we see in practice.

    4. LF

      So taking a step back, can you, uh, in the same way as we talked about deep learning, can you tell me what generative adversarial networks are?

    5. IG

Uh, yeah. So generative adversarial networks are a particular kind of generative model. A, a generative model is a machine learning model that can train on some set of data, like say you have a collection of photos of cats and you want to generate more photos of cats, or you want to estimate a probability distribution over cats, so you can ask how likely it is that some new image is a photo of a cat. Um, GANs are one way of doing this. Uh, some generative models are good at creating new data. Uh, other generative models are good at estimating that density function and telling you how likely particular, uh, pieces of data are to come from the same distribution as the training data. Uh, GANs are more focused on generating samples rather than estimating the density function. There are some kinds of GANs, like FlowGAN, that can do both, but mostly GANs are about generating samples of generating new photos of cats that look realistic. Uh, and they do that completely from scratch. It's analogous to human imagination when a GAN creates a new image of a cat. It's using a neural network to produce a cat that has not existed before. It isn't doing something like compositing photos together. You're not, you're not literally taking the eye off of one cat and the ear off of another cat. It's, it's more of this digestive process where the, the neural net trains on a lot of data and comes up with some representation of the probability distribution and generates entirely new cats. Uh, there are a lot of different ways of building a generative model. What's specific to GANs is that we have a two-player game in the game theoretic sense, and as the players in this game compete, one of them becomes able to generate realistic data. The first player is called the generator. It produces output data such as just images, for example.

    6. LF

      Mm-hmm.

    7. IG

      And at the start of the learning process, it'll just produce completely random images. The other player is called the discriminator. The discriminator takes images as input and guesses whether they're real or fake. Uh, you train it both on real data, so photos that come from your training set, actual photos of cats, and you train it to say that those are real. You also train it on images that come from the generator network, and you train it to say that those are fake. Uh, as the two players compete in this game, the discriminator tries to become better at recognizing whether images are real or fake, and the generator becomes better at fooling the discriminator into thinking that its outputs are, are real. Uh, and you can analyze this through the language of game theory and find that there's a Nash equilibrium where the generator has captured the correct probability distribution. So in the cat example, it makes perfectly realistic cat photos, and the discriminator is unable to do better than random guessing because all the, all the samples coming from both the data and the generator look equally likely to have come from either source.
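The two-player game Ian describes can be illustrated with scalar stand-ins for the two networks (a sketch only: the Gaussian data, the linear "generator", and the logistic "discriminator" are invented). It estimates the GAN value V(D, G) = E[log D(x)] + E[log(1 − D(G(z)))] and shows that at the equilibrium Ian mentions, where the discriminator can only guess 0.5, the value sits at log ½ + log ½:

```python
import math, random
random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def discriminator(x, w, c):  # guesses the probability that x is real
    return sigmoid(w * x + c)

def generator(z, a, b):      # maps noise z to a "sample"
    return a + b * z

def gan_value(w, c, a, b, n=2000):
    """Monte Carlo estimate of V(D, G) for real data ~ N(3, 1), noise ~ N(0, 1)."""
    real = sum(math.log(discriminator(random.gauss(3, 1), w, c))
               for _ in range(n)) / n
    fake = sum(math.log(1 - discriminator(generator(random.gauss(0, 1), a, b), w, c))
               for _ in range(n)) / n
    return real + fake

# A generator far from the data is easy to beat: the discriminator scores well.
v_easy = gan_value(w=1.0, c=-1.5, a=-3.0, b=1.0)
# A generator matching the data forces D toward 0.5, the Nash equilibrium,
# where the value falls to log(1/2) + log(1/2).
v_equilibrium = gan_value(w=0.0, c=0.0, a=3.0, b=1.0)
```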

    8. LF

      So do you ever, do you ever sit back and does it just blow your mind that this thing works? So from very... So it's able to estimate that density function enough to generate, generate realistic images. I mean, uh, does it... Yeah. Do you ever sit back (laughs) -

    9. IG

      Yeah.

    10. LF

      ... and think "How does this even... Why..." This is quite incredible, especially where GANs have gone in terms of realism.

    11. IG

      Yeah. And, and not just to flatter my own work, but generative models, all of them have this property that if they really did what we asked them to do, they would do nothing but memorize the training data.

    12. LF

      Right. Exactly.

    13. IG

      Um, models that are based on maximizing the likelihood, the way that you obtain the maximum likelihood for a specific training set is you assign all of your probability mass to the training examples and nowhere else. Uh, for GANs, the game is played using a training set. So the way that you become unbeatable in the game is you literally memorize training examples. One of my former interns wrote a paper, uh, his name is Vaishnav Nagarajan, and he showed that it's actually hard for the generator to memorize the training data. Uh, hard in a-

    14. LF

      Right.

    15. IG

      ... a statistical learning theory sense that you can actually create reasons for w- why it, it would require, um, quite a lot of learning steps and, and a lot of observations of, of different latent variables before you could memorize the training data. That still doesn't really explain why when you produce samples that are new, why do you get compelling images rather than, you know, just garbage that's different from the training set, and I don't think we really have a good answer for that, especially if you think about how many possible images are out there-

    16. LF

      Right.

    17. IG

      ... and, uh, how few images the generative model sees during training. It seems just unreasonable that generative models create new images as well as they do. Uh, especially considering that we're basically training them to memorize rather than generalize.

    18. LF

      W-

    19. IG

      Uh, I, I think part of the answer is there's a paper called Deep Image Prior-

    20. LF

      Mm-hmm.

    21. IG

      ... where they show that you can take a convolutional net, and you don't even need to learn the parameters of it at all. You just use the model architecture, and it's already useful for things like inpainting images. I think that shows us that the convolutional network architecture captures something really important about the structure of images, and we don't need to actually use learning to capture all the information coming out of the convolutional net. Uh, that would, that would imply that it would be much harder to make generative models in other domains. Uh, so far, we're able to make reasonable speech models and things like that.

    22. LF

      Mm-hmm.

    23. IG

      But to be honest, we haven't actually explored a whole lot of different datasets all that much. We don't, for example, see a lot of, uh, deep learning models of, um, like biology datasets, where you have lots of microarrays measuring the amount of different enzymes and things like that. So we may find that some of the progress that we've seen for images and speech turns out to really rely heavily on the model architecture, and we were able to do what we did for vision by trying to reverse engineer the human visual system.

    24. LF

      Right. (laughs)

    25. IG

      Um, and, and maybe it'll turn out that we can't just, uh, use that same trick for arbitrary kinds of data.

    26. LF

      Right, so there's aspects of the human vision system, the hardware of it, that, uh, makes it, uh, without learning, without cognition, just makes it really effective at detecting the patterns we see in the visual world.

    27. IG

      Yeah.

    28. LF

      Uh, that's, yeah, that's really interesting. Uh, what, um, in a big quick overview, in your view, eh, eh, eh, eh, in your view, what types of GANs are there and what other generative models besides GANs are there?

    29. IG

      Yeah, um, so it's maybe a little bit easier to start with what kinds of generative models are there other than GANs. Um, so most generative models are likelihood based, where to train them, you have a model that tells you how, how much probability it assigns to a particular example, and you just maximize the probability assigned to all the training examples. It turns out that it's hard to design a model that can create really complicated images or really complicated audio waveforms and still have it be possible to, uh, estimate the, the likelihood function, uh, eh, from a computational point of view. Uh, most interesting models that you would just write down intuitively, it turns out that it's almost impossible to calculate the amount of probability they assign to a particular point. Um, so there's a few different schools of generative models in the likelihood family. Uh, one approach is to very carefully design the model so that it is computationally tractable to measure the density it assigns to a particular point, so there are things like, uh, autoregressive models-

    30. LF

      Mm-hmm.
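The likelihood-based family Ian is describing trains by maximizing the probability the model assigns to the training examples. A minimal worked example: fitting the mean of a 1-D Gaussian "generative model" by grid search over candidate means (the data points are invented; the winner is the classic closed-form answer, the sample mean):

```python
import math

# Toy training set for a 1-D Gaussian generative model.
data = [2.0, 2.5, 3.0, 3.5, 4.0]

def log_likelihood(mu, sigma=1.0):
    """Total log probability the Gaussian N(mu, sigma) assigns to the data."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (x - mu) ** 2 / (2 * sigma ** 2) for x in data)

# Scan candidate means; the best one lands on the sample mean,
# the classic maximum-likelihood answer for a Gaussian.
best_mu = max((mu / 100 for mu in range(0, 600)), key=log_likelihood)
print(best_mu)  # → 3.0
```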

  4. 45:00 – 1:00:00
    1. IG

      DeepMind's BigGAN project.

    2. LF

      Hm.

    3. IG

      And they showed that they can match the performance of BigGAN using only 10%, I believe, of the, of the labels. Uh, BigGAN was trained on the ImageNet dataset, which is about 1.2 million images, and had all of them labeled. Um, this latest project from Brain Zurich shows that they're able to get away with only having about 10% of the, uh, of the images labeled. And they do that essentially using a, a clustering algorithm, where the discriminator learns to assign the objects to groups, and then this understanding that objects can be grouped-

    4. LF

      Mm.

    5. IG

      ... into, you know, similar types helps it to form more realistic ideas of what should be appearing in the image, because it, it knows that every image it creates has to come from one of these archetypical groups rather than just being some arbitrary image. If you train a GAN with no class labels, you tend to get things that look sort of like grass or water or brick-

    6. LF

      Mm-hmm.

    7. IG

      ... or, or dirt, but, um, but without necessarily a lot going on in them. And I think that's partly because if you look at a large ImageNet image, the object doesn't necessarily occupy the whole image, and so you learn to create realistic sets of pixels, but you don't necessarily learn that the object is the star of the show and you want it to be in every image you make.

    8. LF

Yeah, you've... I've, I've heard you talk about the, uh, the horse, the zebra CycleGAN mapping, and how, uh, it turns out, uh, again thought-provoking, that horses are usually on grass and zebras are usually on drier terrain. So when you're doing that kind of generation, you're going to end up generating greener horses or whatever. Uh, so th- those are connected together. It's not just-

    9. IG

      Yeah. Yeah.

    10. LF

      ... you might be able to d- you're not able to segment, uh-

    11. IG

      Yeah.

    12. LF

      ... be able to generate in a segmented way. So are there other types of games you come across in your mind, uh, th- that, uh, neural networks can play with each other to, uh, uh, to, uh, to be able to, uh, solve problems?

    13. IG

      Yeah. The, the one that I spend most of my time on is, in security, you can model most interactions as a game where there's attackers trying to break your system and you're the d- the defender trying to build a resilient system. Um, there's also domain adversarial learning, which is, uh, an approach to domain adaptation that looks really a lot like GANs. Uh, the, the authors had the idea before the GAN paper came out. Their paper came out a little bit later, um, and, you know, they... (laughs) They were very nice and cited the GAN paper, but, a, you know, I, I know that they actually had the idea before it came out. Um, domain adaptation is when you want to train a machine learning model in one, one setting, called a domain, and then deploy it in another domain later. And you would like it to perform well in the new domain, even though the new domain is different from how it was trained. Um, so for example, you might want to train on a really clean image dataset, like ImageNet, but then deploy on users' phones, where the user is taking, you know, pictures in the dark or pictures while moving quickly and just pictures that aren't really centered or composed all that well. When you take a normal machine learning model, it often degrades really badly when you move to the new domain because it looks so different from what the model was trained on. Uh, domain adaptation algorithms try to smooth out that gap, and the domain adversarial approach is based on training a feature extractor where the features have the same statistics regardless of which domain you extracted them on. So in the domain adversarial game, you have one player that's a feature extractor and another player that's a domain recognizer.

    14. LF

      Hm.

    15. IG

      Uh, the domain recognizer wants to look at the output of the feature extractor and guess which of the two domains-

    16. LF

      Oh, interesting.

    17. IG

      ... uh, the features came from. So it's a lot like the real-versus-fake discriminator in GANs.

    18. LF

      Mm-hmm.

    19. IG

      Uh, and then the feature extractor, you can think of as loosely analogous to the generator in GANs, except what it's trying to do here is, uh, both fool the domain recognizer into not knowing which domain the data came from and also extract features that are good for classification. So at the end of the day, you can... I- in, in the cases where it works out, you can actually get, um...... features that work about the same in both domains. Uh, sometimes this has a drawback, where in order to make things work the same in both domains, it just gets worse at the first one.

    20. LF

      (laughs)

    21. IG

      Uh, but there are a lot of cases where it actually works out well in both.
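
The domain-adversarial game described above can be sketched in miniature. This is a toy NumPy construction, not the original domain-adversarial (DANN) implementation: the feature extractor is just a diagonal scaling, both heads are closed-form ridge-regression best responses rather than trained networks, and the data (a class-relevant feature x1 plus a domain-revealing nuisance feature x2) is invented for the sketch. The gradient step on the extractor descends the task loss while ascending the domain loss — the gradient-reversal idea:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_domain(n, shift):
    """x1 carries the class signal; x2 only reveals which domain we are in."""
    y = rng.integers(0, 2, n).astype(float)
    x1 = (2 * y - 1) + 0.3 * rng.standard_normal(n)
    x2 = shift + 0.3 * rng.standard_normal(n)
    return np.stack([x1, x2], axis=1), y

Xa, ya = make_domain(300, -2.0)                 # "clean" source domain
Xb, yb = make_domain(300, +2.0)                 # shifted deployment domain
X = np.vstack([Xa, Xb])
y = np.concatenate([ya, yb])
dom = np.concatenate([-np.ones(300), np.ones(300)])   # +/-1 domain targets

def ridge_head(Z, t, lam=1.0):
    """Closed-form best-response linear head (ridge regression)."""
    n = len(Z)
    return np.linalg.solve(Z.T @ Z / n + lam * np.eye(Z.shape[1]), Z.T @ t / n)

s = np.array([1.0, 1.0])   # feature extractor: z = s * x (diagonal, for clarity)
lr = 0.05
for _ in range(400):
    Z = X * s
    u = ridge_head(Z, 2 * y - 1)    # task head: class from features
    v = ridge_head(Z, dom)          # domain head: domain from features
    g_task = 2 * (X * u).T @ (Z @ u - (2 * y - 1)) / len(X)
    g_dom = 2 * (X * v).T @ (Z @ v - dom) / len(X)
    # Gradient reversal in miniature: descend the task loss, ASCEND the domain loss.
    s -= lr * (g_task - g_dom)

Z = X * s
task_acc = ((Z @ ridge_head(Z, 2 * y - 1) > 0) == (y > 0.5)).mean()
dom_acc = ((Z @ ridge_head(Z, dom) > 0) == (dom > 0)).mean()
print("nuisance scale |s2|:", round(abs(s[1]), 4),
      "task acc:", task_acc, "domain acc:", dom_acc)
```

The extractor learns to suppress the domain-revealing feature (s2 is driven toward zero), so the domain recognizer ends up near chance while classification accuracy stays high — the good outcome Goodfellow describes, where the features work in both domains.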

    22. LF

      S- do, do you think of GANs being useful in the context of data augmentation?

    23. IG

      Yeah. W- one thing you could hope for with GANs is, you could imagine I've got a limited training set and I'd like to make more training data to train something else, like a classifier.

    24. LF

      Right.

    25. IG

      You could train the GAN on the training set and then create more data, and then maybe the classifier would perform better on the test set after training on this bigger GAN-generated dataset. Uh, so that's the simplest version of, of something you might hope would work. I've never heard of that particular approach working, but I think there's some, there's some closely related things that, that I think could work in the future, and some that actually already have worked. Um, so if you think a little bit about what we'd be hoping for if we use the GAN to make more training data, we're hoping that the GAN will generalize to new examples better than the classifier would have generalized if it was trained on the same data.

    26. LF

      Got it.

    27. IG

      And I don't know of any reason to believe that the GAN would generalize better than the classifier would, um, but what we might hope for is that the GAN could generalize differently from a specific classifier. So, one thing I, I think is worth trying that I haven't personally tried but someone could try is, what if you trained a whole lot of different generative models on the same training set, create samples from all of them, and then train a classifier on that? Because each of the generative models might generalize in a slightly different way, they might capture many different axes of variation that one individual model wouldn't, and then the classifier can capture all of those ideas by training on all of their data. So, it would be a little bit like making an ensemble of classifiers, and I think that-

    28. LF

      Ensemble of GANs (laughs) -

    29. IG

      Yeah.

    30. LF

      ... in a way.
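
The untested proposal above — train several generative models on the same small set, pool their samples, and train a classifier on the pool — can be sketched with stand-ins. This is a toy NumPy version where "generative models" are class-conditional Gaussians fit on bootstrap resamples (so each member generalizes slightly differently, as in the proposal) and the classifier is nearest-centroid; all of these stand-ins are invented for the sketch and say nothing about whether the idea works with real GANs:

```python
import numpy as np

rng = np.random.default_rng(1)

def draw(n, mu):
    return mu + rng.standard_normal((n, 2))

# a deliberately small "real" training set, 30 points per class
X0, X1 = draw(30, np.array([-2.0, 0.0])), draw(30, np.array([2.0, 0.0]))

# "ensemble of generative models": each member is a class-conditional Gaussian
# fit on a different bootstrap resample of the training set
synth, labels = [], []
for _ in range(5):
    for cls, Z in enumerate((X0, X1)):
        boot = Z[rng.integers(0, len(Z), len(Z))]
        mu, cov = boot.mean(0), np.cov(boot.T) + 1e-3 * np.eye(2)
        synth.append(rng.multivariate_normal(mu, cov, 100))
        labels.append(np.full(100, cls))
Xg, yg = np.vstack(synth), np.concatenate(labels)

# train a classifier purely on the pooled generated data
c0, c1 = Xg[yg == 0].mean(0), Xg[yg == 1].mean(0)

# evaluate on fresh real data it has never seen
Xt = np.vstack([draw(200, np.array([-2.0, 0.0])), draw(200, np.array([2.0, 0.0]))])
yt = np.concatenate([np.zeros(200), np.ones(200)])
pred = (np.linalg.norm(Xt - c1, axis=1) < np.linalg.norm(Xt - c0, axis=1)).astype(float)
test_acc = (pred == yt).mean()
print("test accuracy of classifier trained only on generated data:", test_acc)
```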

  5. 1:00:001:08:21

    What do you think…

    1. IG

      mathematical quantitative sense. Right now, we all talk a lot about how interpretable different machine learning algorithms are, but it's really just people's opinion, and everybody probably has a different idea of what interpretability means in their head. If we could define some concept related to interpretability that's actually measurable, that would be a huge leap forward, even without a new algorithm that increases that quantity. And also, once, once we had the definition of differential privacy, it was fast to get the algorithms that guaranteed it. So, you could imagine, once we have definitions of good concepts and interpretability, we might be able to provide the algorithms that have the interpretability guarantees quickly too.
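
The contrast drawn here is between interpretability, which has no agreed measurable definition, and differential privacy, where a precise definition quickly produced algorithms with guarantees. The canonical such algorithm is the Laplace mechanism, sketched below in NumPy for a counting query (a standard textbook example, not something discussed in the conversation; the age data is invented):

```python
import numpy as np

rng = np.random.default_rng(0)

def private_count(data, predicate, epsilon):
    """Counting query released via the Laplace mechanism.

    A count has sensitivity 1 (adding or removing one person changes it
    by at most 1), so adding Laplace(1/epsilon) noise to the exact count
    satisfies epsilon-differential privacy.
    """
    exact = sum(1 for x in data if predicate(x))
    return exact + rng.laplace(loc=0.0, scale=1.0 / epsilon)

ages = rng.integers(18, 90, 10_000)
exact = int((ages >= 65).sum())
noisy = private_count(ages, lambda a: a >= 65, epsilon=0.5)
print("exact:", exact, "privately released:", round(noisy, 1))
```

The definition came first, and this mechanism provably satisfies it — the pattern Goodfellow hopes a measurable definition of interpretability would repeat.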

    2. LF

      What do you think it takes to build a system with human-level intelligence, as we quickly venture into the philosophical? So, artificial general intelligence, what do you think it takes?

    3. IG

      Yeah. Um, I, I think that it, it definitely takes, uh, better environments than we currently have for training agents, that we want them to have a really wide diversity of experiences. Uh, I also think it's gonna take really a lot of computation. It's hard to imagine exactly how much.

    4. LF

      So, you're optimistic about simulation, simulating a variety of environments is the path forward, as opposed to operating in the real world?

    5. IG

      I, I think it's a necessary ingredient. Yeah, I, I, I don't think that we're going to get to artificial general intelligence by training on fixed data sets or, or by thinking really hard about the problem. I, I think that the, the agent really needs to interact and have a variety of experiences within the same...... uh, lifespan. And today, we have many different models that can each do one thing, and we tend to train them on one data set or one RL environment. Um, sometimes there are actually papers about getting one set of parameters to perform well in many different RL environments.

    6. LF

      Mm-hmm.

    7. IG

      But we don't really have anything like an agent that goes seamlessly from one type of experience to another and, and really integrates all the different things that it does over the course of its life. Uh, when we do see multi-agent environments, they tend to be, um, or sorry, many, uh, multi-environment agents, they tend to be similar environments, like all, all of them are playing, like, an action-based video game.

    8. LF

      Right.

    9. IG

      Um, we don't really have an agent that goes from, you know, playing a video game to, like, reading the Wall Street Journal, uh, to predicting how effective a molecule will be as a drug or something like that.

    10. LF

      What do you think is a good test for intelligence in your view? There's been a lot of benchmarks, started with the, uh, with Alan Turing, natural conversation being good, being a good benchmark for intelligence. What, what are, what would, uh, uh, Ian Goodfellow sit back and be really damn impressed if a system was able to accomplish?

    11. IG

      Um, something that doesn't take a lot of glue from human engineers. So, imagine that instead of having to go to the, uh, CIFAR website and download CIFAR-10, and then write a Python script to parse it and all that, you could just point an agent at, uh, the CIFAR-10 problem, and it downloads and extracts the data and trains a model and starts giving you predictions. Um, I feel like something that doesn't need to have every step of the pipeline assembled for it-

    12. LF

      Mm-hmm.

    13. IG

      ... but definitely understands what it's doing.

    14. LF

      Is AutoML moving into that direction or are you thinking way even bigger? Just-

    15. IG

      AutoML has mostly been moving toward, um, once we've built all the glue, can the machine learning system, uh-

    16. LF

      Automate it.

    17. IG

      ... design the architecture really well?

    18. LF

      (laughs) Okay.

    19. IG

      And so I'm more of saying, like, if something knows how to pre-process the data so that it successfully accomplishes the task, then it would be very hard to argue that it doesn't truly understand the task in some fundamental sense. And I, I don't necessarily know that that's, like, the philosophical definition of intelligence, but that's something that would be really cool to build, that would be really useful and would impress me and would convince me that we've made a step forward in real AI.

    20. LF

      So, you give it, like, uh, the URL for Wikipedia, and then, uh, uh, next day expect it to be able to, uh, solve CIFAR-10. Okay.

    21. IG

      Or, like, you type in a paragraph explaining what you want it to do, and it figures out what web searches it should run and downloads all the, all the necessary ingredients.

    22. LF

      So, uh, you have a very clear, calm way of speaking, no ums-

    23. IG

      Mm-hmm.

    24. LF

      ... easy to edit. I've seen comments for both you and I, uh, have been, uh, identified as both potentially being robots. Uh, if you have to prove to the world that you are indeed human, how would you do it?

    25. IG

      (laughs) Uh, well, I d- I can understand thinking that I'm a robot. Uh, it's actually-

    26. LF

      It's the flip side-

    27. IG

      Yeah.

    28. LF

      ... of the Turing test, I think.

    29. IG

      Yeah, yeah, the prove, prove you're human test. Um... (laughs)

    30. LF

      (laughs) Intellectually, so you have to, uh... I- is there something that's truly unique, um, in your mind, as sup- does it, does it go back to just natural language again, just being able to, uh, talk your way out of-

Episode duration: 1:08:36


Transcript of episode Z6rxFNMGdn0
