Lex Fridman Podcast

Jeremy Howard: fast.ai Deep Learning Courses and Research | Lex Fridman Podcast #35

Lex Fridman and Jeremy Howard on democratizing deep learning, tools, and real impact.

Lex Fridman (host) · Jeremy Howard (guest)
Aug 27, 2019 · 1h 44m · Watch on YouTube ↗

EVERY SPOKEN WORD

  1. 0:00–15:00


    1. LF

      The following is a conversation with Jeremy Howard. He's the founder of fast.ai, a research institute dedicated to making deep learning more accessible. He's also a distinguished research scientist at the University of San Francisco, a former president of Kaggle, as well as a top-ranking competitor there. And in general, he's a successful entrepreneur, educator, researcher and an inspiring personality in the AI community. When someone asks me, "How do I get started with deep learning?" fast.ai is one of the top places I point them to. It's free. It's easy to get started. It's insightful and accessible. And if I may say so, it has very little BS that can sometimes dilute the value of educational content on popular topics like deep learning. Fast.AI has a focus on practical application of deep learning and hands-on exploration of the cutting edge that is incredibly both accessible to beginners and useful to experts. This is the Artificial Intelligence podcast. If you enjoy it, subscribe on YouTube, give it five stars on iTunes, support it on Patreon, or simply connect with me on Twitter, @lexfridman, spelled F-R-I-D-M-A-N. And now here's my conversation with Jeremy Howard. What's the first program you've ever written?

    2. JH

      First program I wrote, that I remember, would be at high school. Um, (sighs) I did an assignment where I decided to try to find out if there were some, like, better musical scales than the normal 12 tone, 12 s- interval scale. So I wrote a program on my Commodore 64 in Basic that searched through other scale sizes to see if it could find one where there were, uh, more accurate, you know, uh, harmonies.

    3. LF

      Like mid-tone, like finding-

    4. JH

      Like, like you want an actual exactly three to two ratio, whereas with a 12 interval scale it's not exactly three to two, for example. So that's-

    5. LF

      And, and the Common-

    6. JH

      ... well tempered as they say in the... (laughs)

    7. LF

      In Basic on a Commodore 64?

    8. JH

      Yeah.
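The search Howard describes — trying scale sizes other than 12 equal intervals to see which gives more accurate harmonies, such as an exact 3:2 fifth — can be sketched in a few lines of modern Python (the original was Commodore 64 BASIC; the function name and the particular scale sizes tried below are illustrative choices, not his):

```python
def fifth_error(n):
    """Smallest relative error between any step of an n-tone
    equal-tempered scale and an exact 3:2 ("perfect fifth") ratio."""
    target = 3 / 2
    return min(abs(2 ** (k / n) - target) / target for k in range(1, n + 1))

# The familiar 12-tone scale approximates the fifth with its 7th step
# (2 ** (7/12) ≈ 1.4983, not exactly 3:2); larger scales like 41 or 53
# tones land noticeably closer.
for n in (12, 19, 31, 41, 53):
    print(n, round(fifth_error(n), 6))
```

Running the loop shows exactly the kind of result a program like his could surface: some scale sizes approximate just-intonation ratios far better than the standard 12.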

    9. LF

      Where was the interest in music from? Or is it just technical?

    10. JH

      I did music all my life, so I played saxophone and clarinet and piano and guitar and drums and whatever, so...

    11. LF

      How does that thread go through your life? Where's music today? Is it-

    12. JH

      Uh, it's not where I wish it was. I, for various reasons, couldn't really keep it going, particularly because I had a lot of problems with RSI with my fingers, and so I had to kind of like cut back anything that used hands and fingers.

    13. LF

      Mm-hmm.

    14. JH

      Um, I hope one day I'll be able to get back to it healthwise.

    15. LF

      So there's a love for music underlying it all?

    16. JH

      For sure, yeah.

    17. LF

      What's your favorite instrument?

    18. JH

      Uh, saxophone.

    19. LF

      Sax.

    20. JH

      Baritone saxophone. Well, probably bass saxophone but they're awkward.

    21. LF

      Well, um, I always love it when, uh, music is coupled with programming.

    22. JH

      Mm-hmm.

    23. LF

      There's something about a brain that utilizes those, that, uh, emerges with creative ideas. So you've used and studied quite a few programming languages.

    24. JH

      Mm-hmm.

    25. LF

      Can you give an, an overview of what you've used? What are the pros and cons of each?

    26. JH

      Uh, my favorite programming environment almost certainly was Microsoft Access, back in like the earliest days. So that w- that was-

    27. LF

      That's like-

    28. JH

      ... Visual Basic for Applications which is not a good programming language, but the programming environment was fantastic. It's like, the ability to create, you know, user interfaces and tie data and actions to them and create reports and all that is... I've never seen anything as good. There's things nowadays like Airtable which are like s- small subsets of that which people love for good reason, but unfortunately nobody's ever, uh, achieved anything like that.

    29. LF

      What is that? If, if you could pause on that for a second.

    30. JH

      Oh, Access?

  2. 15:00–30:00


    1. JH

      good enough.

    2. LF

      But what do you think the future of programming looks like? What do you hope the future of programming looks like if we zoom in on the computational fields, on data science, on machine learning?

    3. JH

      I, I hope Swift is successful because the, the goal of Swift, the way Chris Lattner describes it, is to be infinitely hackable, and that's what I want. I want something where, um, me and the people I do research with, and my students can look at and change everything from top to bottom. There's nothing mysterious and magical and inaccessible.

    4. LF

      Mm-hmm.

    5. JH

      Unfortunately with Python, it's the opposite of that because Python's so slow. It's, um, extremely unhackable. You get to a point where it's like, okay, from here on down, it's C. So your debugger doesn't work in the same way, your profiler doesn't work in the same way, your build system doesn't work in the same way, it's really not very hackable at all.

    6. LF

      What's the, what's the part you would like to be hackable? Is it for the objective of optimizing training of neural networks, inference of neural networks? Is it performance of the system? Or is there some non-performance related just creative idea?

    7. JH

      It's, it's, it's everything. I mean, in the end, I want to be productive as a practitioner. So that means that, uh, so like at the moment, our understanding of deep learning is incredibly primitive. There's very little we understand, most things don't work very well even though it works better than anything else out there.

    8. LF

      Right.

    9. JH

      There's so many opportunities to make it better. So you look at any domain area, like, I don't know, speech recognition with deep learning or natural language processing classification with deep learning or whatever. Every time I look at an area with deep learning, I always see like, oh, it's, it's terrible. There's lots and lots of obviously stupid ways to do things that need to be fixed. So then I want to be able to jump in there and quickly-

    10. LF

      And, and-

    11. JH

      ... experiment and make them better.

    12. LF

      ... you think the programming language is, um, has a role in that?

    13. JH

      Huge role. Yeah. So currently, Python, um, has a big, uh, gap in terms of our ability to, um, innovate particularly around recurrent neural networks and, um, natural language processing because, uh, 'cause it's so slow. The, the, the actual loop where we actually loop through words, we have to do that whole thing in CUDA C.

    14. LF

      Mm-hmm.

    15. JH

      So we actually can't innovate with the, the kernel, the heart of that most important algorithm.

    16. LF

      Mm-hmm.

    17. JH

      Um, and it's just a huge problem. And this happens all over the place. So we hit, you know, research limitations. Another example, convolutional neural networks which are actually the most popular architecture for lots of things, maybe most things in deep learning, we almost certainly should be using sparse convolutional neural networks.

    18. LF

      Mm-hmm.

    19. JH

      Um, but only like two people are, because to do it, you have to rewrite all of that CUDA C level stuff. And yeah, just researchers and practitioners don't. So like there's just big gaps in like s- what people actually research on, what people actually implement, because of the programming language problem.

    20. LF

      So you think, uh...... do you think it's, it's just too difficult to write in CUDA C, uh, that a programming lang- a higher level programming language like Swift should enable the, the easier imp- th- fooling around creative stuff with RNNs or with sparse convolutional networks?

    21. JH

      Kind of.

    22. LF

      Who, who, who's, uh, who's at fault? Who's e- who's in charge of making it easy for a researcher to play around?

    23. JH

      I mean, no one's at fault.

    24. LF

      (laughs)

    25. JH

      It's just nobody's got around to it yet, or-

    26. LF

      Yeah.

    27. JH

      ... it's just, it's hard, right? And, I mean, part, part of the fault is that we ignored that whole APL kind of direction most... well, nearly everybody did for 60 years, 50 years. But, uh, recently, people have been starting to reinvent pieces of that and kind of create some interesting new directions in the compiler technology. So the place, um, where that's particularly happening right now is, uh, something called MLIR, which is something that, again, Chris Lattner, the Swift guy, is leading. And, uh, yeah, 'cause it's actually not going to be Swift on its own that solves this problem.

    28. LF

      Mm-hmm.

    29. JH

Because the problem is that currently writing an acceptably fast, you know, GPU program is too complicated regardless of what language you use.

    30. LF

      Right.

  3. 30:00–45:00



    2. LF

      Go ahead.

    3. JH

      But it's gonna be a long process. The regulators have to learn how to regulate this. They have to build, you know, um, guidelines. And then, um, the lawyers at hospitals have to develop a new way of understanding that sometimes it makes sense for data to be-... you know, looked at in raw form in large quantities in order to create world-changing results.

    4. LF

      Yeah, so the regulation around data, all that, um, it sounds, uh, probably the hardest problem, but sounds reminiscent of autonomous vehicles as well, many of the same regulatory challenges, many of the same data challenges.

    5. JH

      Yeah, I mean, funnily enough, the problem is less the regulation and more the interpretation of that regulation by l- by lawyers in hospitals. So HIPAA is actually, was designed to, it's, it... The P in HIPAA is not standing, does not stand for privacy. It stands for portability. It's actually meant to be a way that data can be used. Uh, and it was created with lots of gray areas because the idea is that would be more practical and it would help people to use this, this legislation to actually share data in a more thoughtful way. Unfortunately, it's done the opposite because when a lawyer sees a gray area, they see, "Oh, if we don't know we won't get sued, then we can't do it."

    6. LF

      Right.

    7. JH

      So HIPAA is not exactly the problem. The problem is more that there's... Hospital lawyers are not incented to make bold decisions (laughs) about data portability.

    8. LF

      Or even to, uh, embrace technology that saves lives.

    9. JH

      Right.

    10. LF

      They more want to not get in trouble for embracing that technology.

    11. JH

Right. Also, it, it saves lives in a very abstract way, which is like, "Oh, we've been able to release these 100,000 anonymized records."

    12. LF

      Right.

    13. JH

      I can't point at the specific person whose life that saved. I can say like, "Oh, we ended up with this paper which found this result, which, you know, diagnosed 1,000 more people than we would have otherwise." But it's like which ones were helped?

    14. LF

      Yeah.

    15. JH

      It's, it's very abstract.

    16. LF

      Yeah. And on the counter side of that, you may be able to point to, uh, a life that was taken because of something that was-

    17. JH

      Yeah. Or, or, or a person whose privacy was violated.

    18. LF

      Violated.

    19. JH

      It's like, "Oh, this specific person-"

    20. LF

      Right.

    21. JH

      ... "you know, was de-identified."

    22. LF

      So b- (laughs)

    23. JH

      Identified.

    24. LF

      S- just a fascinating topic. We're jumping around. We'll get back to fast.ai, but on the question of privacy, data is the fuel for so much innovation in deep learning. What's your sense in privacy, whether we're talking about Twitter, Facebook, YouTube, uh, just the technologies like in the medical field that rely on people's data in order to create impact? How do we get that right, uh, respecting people's privacy and yet creating technology that-

    25. JH

      Well-

    26. LF

      ... uh, is learned from data?

    27. JH

      ... one of my areas of focus is on doing more with less data. Um, which, so most vendors, unfortunately, are strongly incented to find ways to require more data and more computation. So Google and IBM being the most obvious, uh-

    28. LF

      IBM?

    29. JH

      Yeah.

    30. LF

      Sorry.

  4. 45:00–1:00:00


    1. LF

      option. Speaking-

    2. JH

      Although, I mean, the nice thing is, nowadays, everybody is now working on NLP transfer learning.

    3. LF

      (laughs)

    4. JH

      Since that time, we've had GPT and GPT-2 and BERT and, you know-

    5. LF

      Yeah. Yeah.

    6. JH

      ... it's like, it's... So yeah, once you show that something's possible-

    7. LF

      Right.

    8. JH

      ... everybody jumps in, I guess, so.

    9. LF

      I hope, hope to be a part of and I hope to see more innovation in active learning in the same way. I think-

    10. JH

      Yeah, me too.

    11. LF

      ... transfer learning and active learning are fascinating, public open work.

    12. JH

      I actually helped start a startup called Platform AI which is really all about active learning. And, uh, yeah, it's been interesting trying to kind of see what research is out there and make the most of it, and there's basically none, so we've had to do all our own research. (laughs)

    13. LF

      Once again, just as you described. Can you tell the story of, uh, the Stanford competition, DAWNBench, and fast.ai's achievement on it?

    14. JH

      Sure. So, something which I really enjoy is that I, I basically teach two courses a year. Um, the practical Deep Learning for Coders, which is kind of the introductory course, and then Cutting-Edge Deep Learning for Coders, which is the kind of research-level course. And when I teach those courses, um, I have a, a, I, I basically have a big office, uh, at, at University of San Francisco, big enough for like 30 people, and I invite anybody, any student who wants to come and hang out with me while I build the course.

    15. LF

      Mm-hmm.

    16. JH

      And so generally it's full. And so we have 20 or 30 people in a big office with nothing to do but study deep learning.

    17. LF

      Mm-hmm.

    18. JH

      Uh, so it was during one of these times that somebody in the group said, "Oh, there's a thing called DAWNBench that looks interesting." And I was like, "What the hell is that?" And they said, "Oh, it's some competition to see how quickly you can train a model. Seems kind of, not exactly relevant to what we're doing, but it sounds like the kind of thing which you might be interested in." And I checked it out and I was like, "Oh, crap, there's only 10 days till it's over."

    19. LF

      (laughs)

    20. JH

      "It's pretty much too late. And we're kind of busy trying to teach this course." (laughs)

    21. LF

      Yeah.

    22. JH

      But we're like, "Uh, it would make an interesting case study for the course. Like, it's all the stuff we're already doing. Why don't we just put together our current best practices and ideas?"

    23. LF

      Mm-hmm.

    24. JH

      So me and, I guess, about four students just decided to give it a go. And we focused on this small one called CIFAR-10, which is little 32 by 32 pixel images.

    25. LF

      Can you say what DAWNBench is that-

    26. JH

      Yeah, so it's a, it's competition to, to train a model as fast as possible. It was run by Stanford. Uh-

    27. LF

      And as cheap as possible too.

    28. JH

Uh, that's also another one, for as cheap as possible. And there was a couple of categories, uh, ImageNet and CIFAR-10. So ImageNet's this big 1.3 million, uh, image thing that took a couple of days to train. I remember a friend of mine, uh, Pete Warden, who's now at Google, um, I remember he told me how he trained ImageNet a few years ago when he basically, like, had this, uh, uh, little granny flat out the back that he turned into his ImageNet training center.

    29. LF

      (laughs)

    30. JH

And he figu- you know, after like a year of work, he figured out how to train it in like 10 days or something. It was like, that was a big job. Whereas CIFAR-10, at that time, you could train in a few hours. You know, it's much smaller and easier. So we thought we'd try CIFAR-10. And yeah, I'd really never done that before. Like I'd never really... Like things like using more than one GP- uh, GPU at a time was something I... tried to avoid, 'cause to me it's like very against the whole idea of accessibility. You should be able to do things with one GPU.

  5. 1:00:00–1:15:00


    1. JH

      So unlike in physics where you could say like, "I just saw a s- a sub-atomic particle do something which the theory doesn't explain."

    2. LF

      Mm-hmm.

    3. JH

      You could publish that without an explanation.

    4. LF

      Right.

    5. JH

And then in the next 60 years people can try to work out how to explain it. We don't allow this in the deep learning world, so it's, it's literally impossible for Leslie to publish a paper that says, "I've just seen something amazing happen, this thing trained ten times faster than it should have, I don't know why." And so the reviewers were like, "Well, you can't publish that 'cause you don't know why." So anyway.

    6. LF

      That's important to pause on because there's so many discoveries that would need to start like that.

    7. JH

Every, every other scientific field I know of works that way. I don't know why ours is uniquely disinterested in-... publishing unexplained experimental results. But there it is. So it wasn't published. Having said that, uh, I read a lot more unpublished papers than published papers, 'cause that's where you find the interesting insights.

    8. LF

      Mm-hmm.

    9. JH

      So I absolutely read this paper. And I was just like, "This is astonishingly mind-blowing and weird and awesome." And like, "Why isn't everybody only talking about this?" Because like, if you can train these things 10 times faster... They also generalize better because you're, you're doing less epochs, which means you look at the data less, so you get better accuracy. So I've been kind of studying that ever since. And, uh, eventually, Leslie kind of figured out a lot of how to get this done, and we added minor tweaks. And a big part of the trick is starting at a very low learning rate, very gradually increasing it. So as you're training your model, you take very small steps at the start, and you gradually make them bigger and bigger until eventually you're taking much bigger steps than anybody thought was possible.
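The schedule Howard describes — start with a very low learning rate, gradually grow it to a surprisingly large peak, then anneal back down — can be sketched as below. This is a simplified shape of the "one-cycle" policy associated with Leslie Smith's super-convergence work; the function names, the 30% warmup fraction, and the divisor defaults here are illustrative choices, not fast.ai's exact values:

```python
import math

def _cos_interp(a, b, t):
    # Cosine interpolation from a (at t=0) to b (at t=1).
    return a + (b - a) * (1 - math.cos(math.pi * t)) / 2

def one_cycle_lr(step, total_steps, max_lr=0.1, start_div=25, final_div=1e4):
    """Learning rate at `step`: warm up from max_lr/start_div to max_lr
    over the first 30% of training, then anneal down to max_lr/final_div."""
    warmup = int(0.3 * total_steps)
    if step < warmup:
        return _cos_interp(max_lr / start_div, max_lr, step / warmup)
    t = (step - warmup) / (total_steps - warmup)
    return _cos_interp(max_lr, max_lr / final_div, t)
```

The key property is exactly what he says: the peak rate reached mid-training is far larger than the starting rate, and training ends at a tiny rate for fine-grained convergence.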

    10. LF

      Right.

    11. JH

      There's a few other little tricks to make it work, but e- ev- basically, we can reliably get super convergence. And so for the DAWNBench thing, we were using just much higher learning rates than people expected to work.

    12. LF

      What do you think the future of... I mean, it makes so much sense for that to be a critical hyper-parameter, learning rate that you vary. What do you think the future of learning rate magic looks like?

    13. JH

      Well, there's been a lot of great work in the last 12 months in this area. It's... And people are increasingly realizing that optimize... Like, we just have no idea really how optimizers work. And, uh, the combination of weight decay, which is how we regularize optimizers, and the learning rate, and then other things like the epsilon we use in, in the Adam optimizer, they all work together in weird ways. And different parts of the model... This is another thing we've done a lot of work on, is research into how different parts of the model should be trained at different rates in different ways.

    14. LF

      Mm-hmm.

    15. JH

      Um, so we do something we call discriminative learning rates, which is really important, particularly for transfer learning. Um, so really, I think in the last 12 months, a lot of people have realized that this, all this stuff is important. There's been a lot of great work coming out. And we're starting to see algorithms appear which have very, very few dials, if any, that you have to touch. So like the... I think what's going to happen is the idea of a learning rate will... It almost already has disappeared-
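The "discriminative learning rates" idea can be sketched as a simple rule: the earliest layer groups of a pretrained model (generic features) get the smallest learning rate, and the final task-specific group gets the largest. The function below is an illustration, not fast.ai's implementation; the geometric spacing factor of 2.6 is the heuristic reported in the ULMFiT paper:

```python
def discriminative_lrs(n_groups, max_lr, factor=2.6):
    """One learning rate per layer group: smallest for the earliest
    (most generic) layers, largest for the final (task-specific) head,
    spaced geometrically by `factor`."""
    return [max_lr / factor ** (n_groups - 1 - i) for i in range(n_groups)]

# In PyTorch these would typically become optimizer parameter groups, e.g.:
# optim.SGD([{"params": g, "lr": lr}
#            for g, lr in zip(layer_groups, discriminative_lrs(3, 1e-2))])
```

The design choice is that fine-tuning should barely disturb early layers while letting the new head move quickly.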

    16. LF

      Mm-hmm.

    17. JH

      ... in the latest research. And instead, it's just like, you know, we, we know enough about how to interpret the gradients and the change of gradients we see to know how to set every parameter off to

    18. LF

      That you can automate it. So you see the future of, uh, of, uh, deep learning where really... Where is the input of a human expert needed in the future?

    19. JH

      Well, hopefully, the input of the human expert will be almost entirely unneeded from the deep learning point of view. So, um, again, like Google's approach to this is to try and use thousands of times more compute to run lots and lots of models at the same time and hope that one of them is good.

    20. LF

      AutoML kind of thing?

    21. JH

      Yeah, AutoML kind of stuff, which I think is insane.

    22. LF

      (laughs)

    23. JH

      Um, when you better understand the mechanics of how models learn, you don't have to try a thousand different models to find which one happens to work the best.

    24. LF

      Mm-hmm.

    25. JH

You can just jump straight to the best one, uh, which means that it's more accessible in terms of compute, cheaper, and also with less hyper-parameters to set. It means you don't need deep learning experts to train your deep learning model for you, which means that domain experts can do more of the work, which means that now you can focus the human time on the kind of interpretation, the data gathering, identifying the errors, and stuff like that.

    26. LF

      Yeah, the data side. How often do you work with data these days in terms of the cleaning? Looking at... Like Darwin looked at different species while traveling about. Do you look at data? Ha- have you in your roots in Kaggle-

    27. JH

      Always. Yeah.

    28. LF

      ... just looked at data and just-

    29. JH

      Yeah, I mean, it's, it's a key part of our course. It's like before we train a model in the course, we see how to look at the data, and then aft-... The first thing we do after we train our first model, which we fine-tune an ImageNet model for five minutes. And then the thing we immediately do after that is we learn how to analyze the results of the model by looking at examples of misclassified images and looking at a classification matrix and then doing, like, research on Google to learn about the kinds of things that it's misclassifying.
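The error-analysis step described here needs no framework at all: build the matrix of true class versus predicted class and list the class pairs the model mixes up most. fastai exposes a similar workflow through its `ClassificationInterpretation` class; the plain-Python functions below are a sketch with illustrative names:

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """m[i][j] counts examples whose true class is i but which were
    predicted as j; everything off the diagonal is a misclassification."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

def most_confused(m, top=3):
    """(count, true_class, predicted_class) pairs, worst confusions first."""
    pairs = [(m[i][j], i, j)
             for i in range(len(m)) for j in range(len(m))
             if i != j and m[i][j] > 0]
    return sorted(pairs, reverse=True)[:top]
```

Scanning the top entries of `most_confused` is what drives the "go research what it's misclassifying" step he describes.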

    30. LF

      Mm-hmm.

  6. 1:15:00–1:26:36


    1. JH

      slower. Like, you'd just go like, "Oh, this is taking too long."

    2. LF

      (laughs) Yeah.

    3. JH

      And also, there's a lot of things which are just less programmable, like tf.data, which is the way-

    4. LF

      Yeah.

    5. JH

      ... data processing works in Tensorflow. It's just this big mess, it's incredibly inefficient. And they kind of had to write it that way because of the TPU problems I described-

    6. LF

      Yeah.

    7. JH

      ... earlier. So I- I just, um, you know, I just feel like they've got this huge technical debt which they're not gonna solve without starting from scratch.

    8. LF

      So, here's an interesting question then. If, uh, there's a new student starting today, what would you recommend they use?

    9. JH

      Well, I mean, we obviously recommend fast.ai and PyTorch because we teach new students, and that's what we teach with. So, we would very strongly recommend that because it will let you get on top of the concepts much more quickly, uh, so then you'll become an actua- And you'll also learn the actual state-of-the-art techniques, you know? So you'll actually get world-class results. Honestly, it doesn't much matter what library you learn because switching from-

    10. LF

      Yeah.

    11. JH

... Chainer to MXNet to Tensorflow to PyTorch is gonna be a couple of days' work, as long as you understand the foundations well.

    12. LF

      But you think S- will Swift creep in there as a thing, uh, that people start using?

    13. JH

      Not for a few years. Particularly because, like Swift has no data science community, libraries-

    14. LF

      Yeah, so codebases are not there yet.

    15. JH

... tooling. And the Swift community has, um, a total lack of appreciation and understanding of numeric computing, so like, they keep on making stupid decisions. You know, for years they've just done dumb things around performance and prioritization. Um, that's clearly changing now, um, because the developer of Swift, Chris Lattner, is working at Google on Swift for Tensorflow, so like, that's- that's a priority. It'll be interesting to see what happens with Apple, because like Apple hasn't shown any sign of caring about numeric programming in Swift. Um, so I mean, hopefully they'll get off their ass and-

    16. LF

      (laughs)

    17. JH

      ... start appreciating this. 'Cause currently all of their lower-level libraries are not written in Swift, they're not particularly Swifty at all. Stuff like Core ML, they're really pretty rubbish, so. Yeah, so there's a long way to go, um, but at least one nice thing is that Swift for Tensorflow can actually directly use Python code and Python libraries-

    18. LF

      Mm-hmm.

    19. JH

      ... in, uh, literally the entire lesson one notebook of fast.ai runs in Swift right now in Python mode, so that's- that's a nice intermediate thing.

    20. LF

      How long does it take, um... So th- if you look at the two- two fast.ai courses, how long does it take to get from point zero to completing both courses?

    21. JH

      Um, it varies a lot. Somewhere between two months and two years, generally.

    22. LF

      So for two months, how many hours a day on average?

    23. JH

      So like, so like a- a- a- somebody who is a very competent coder can- can do 70 hours per course and pick up-

    24. LF

      70? Seven zero?

    25. JH

      70, yeah.

    26. LF

      That's it? Okay.

    27. JH

      Yeah. But a lot of people I know who take a year off to study fast.ai full time-

    28. LF

      Yeah.

    29. JH

      ... and say at the end of the year, they feel pretty competent 'cause generally there's a lot of other things you do. Like they're, generally they'll be entering Kaggle competitions, they-

    30. LF

      Yeah, exactly.

Episode duration: 1:44:10


Transcript of episode J6XcP4JOHmk
