Lex Fridman PodcastJeremy Howard: fast.ai Deep Learning Courses and Research | Lex Fridman Podcast #35
- 0:00 – 15:00
- LFLex Fridman
The following is a conversation with Jeremy Howard. He's the founder of fast.ai, a research institute dedicated to making deep learning more accessible. He's also a distinguished research scientist at the University of San Francisco, a former president of Kaggle, as well as a top-ranking competitor there. And in general, he's a successful entrepreneur, educator, researcher and an inspiring personality in the AI community. When someone asks me, "How do I get started with deep learning?" fast.ai is one of the top places I point them to. It's free. It's easy to get started. It's insightful and accessible. And if I may say so, it has very little BS that can sometimes dilute the value of educational content on popular topics like deep learning. Fast.AI has a focus on practical application of deep learning and hands-on exploration of the cutting edge that is incredibly both accessible to beginners and useful to experts. This is the Artificial Intelligence podcast. If you enjoy it, subscribe on YouTube, give it five stars on iTunes, support it on Patreon, or simply connect with me on Twitter, @lexfridman, spelled F-R-I-D-M-A-N. And now here's my conversation with Jeremy Howard. What's the first program you've ever written?
- JHJeremy Howard
First program I wrote, that I remember, would be at high school. Um, (sighs) I did an assignment where I decided to try to find out if there were some, like, better musical scales than the normal 12 tone, 12 s- interval scale. So I wrote a program on my Commodore 64 in BASIC that searched through other scale sizes to see if it could find one where there were, uh, more accurate, you know, uh, harmonies.
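The search he describes, trying other equal-tempered scale sizes to see which one better approximates pure harmonic ratios, can be sketched in a few lines of Python. This is a modern reconstruction of the idea, not his original Commodore 64 BASIC; measuring the error in cents is my choice of metric.

```python
# For each n-tone equal-tempered scale, find the step count whose pitch comes
# closest to a pure 3:2 fifth, and measure the error in cents (1200 per octave).
import math

def fifth_error_cents(n):
    """Error, in cents, of the best approximation to a 3:2 ratio in an n-tone scale."""
    target = 1200 * math.log2(3 / 2)   # a just fifth is ~701.955 cents
    step = 1200 / n                    # size of one scale step in cents
    k = round(target / step)           # nearest whole number of steps
    return abs(k * step - target)

# 12-tone equal temperament is off by about 1.955 cents; some larger scales
# (e.g. 41 or 53 tones per octave) approximate the fifth much more closely.
for n in (12, 19, 31, 41, 53):
    print(n, round(fifth_error_cents(n), 3))
```

The well-tempered compromise he mentions is exactly this ~2-cent error in the 12-tone fifth.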
- LFLex Fridman
Like mid-tone, like finding-
- JHJeremy Howard
Like, like you want an actual exactly three to two ratio, whereas with a 12 interval scale it's not exactly three to two, for example. So that's-
- LFLex Fridman
And, and the Common-
- JHJeremy Howard
... well tempered as they say in the... (laughs)
- LFLex Fridman
In BASIC on a Commodore 64?
- JHJeremy Howard
Yeah.
- LFLex Fridman
Where was the interest in music from? Or is it just technical?
- JHJeremy Howard
I did music all my life, so I played saxophone and clarinet and piano and guitar and drums and whatever, so...
- LFLex Fridman
How does that thread go through your life? Where's music today? Is it-
- JHJeremy Howard
Uh, it's not where I wish it was. I, for various reasons, couldn't really keep it going, particularly because I had a lot of problems with RSI with my fingers, and so I had to kind of like cut back anything that used hands and fingers.
- LFLex Fridman
Mm-hmm.
- JHJeremy Howard
Um, I hope one day I'll be able to get back to it healthwise.
- LFLex Fridman
So there's a love for music underlying it all?
- JHJeremy Howard
For sure, yeah.
- LFLex Fridman
What's your favorite instrument?
- JHJeremy Howard
Uh, saxophone.
- LFLex Fridman
Sax.
- JHJeremy Howard
Baritone saxophone. Well, probably bass saxophone but they're awkward.
- LFLex Fridman
Well, um, I always love it when, uh, music is coupled with programming.
- JHJeremy Howard
Mm-hmm.
- LFLex Fridman
There's something about a brain that utilizes those, that, uh, emerges with creative ideas. So you've used and studied quite a few programming languages.
- JHJeremy Howard
Mm-hmm.
- LFLex Fridman
Can you give an, an overview of what you've used? What are the pros and cons of each?
- JHJeremy Howard
Uh, my favorite programming environment almost certainly was Microsoft Access, back in like the earliest days. So that w- that was-
- LFLex Fridman
That's like-
- JHJeremy Howard
... Visual Basic for Applications which is not a good programming language, but the programming environment was fantastic. It's like, the ability to create, you know, user interfaces and tie data and actions to them and create reports and all that is... I've never seen anything as good. There's things nowadays like Airtable which are like s- small subsets of that which people love for good reason, but unfortunately nobody's ever, uh, achieved anything like that.
- LFLex Fridman
What is that? If, if you could pause on that for a second.
- JHJeremy Howard
Oh, Access?
- 15:00 – 30:00
- JHJeremy Howard
good enough.
- LFLex Fridman
But what do you think the future of programming looks like? What do you hope the future of programming looks like if we zoom in on the computational fields, on data science, on machine learning?
- JHJeremy Howard
I, I hope Swift is successful because the, the goal of Swift, the way Chris Lattner describes it, is to be infinitely hackable, and that's what I want. I want something where, um, me and the people I do research with, and my students can look at and change everything from top to bottom. There's nothing mysterious and magical and inaccessible.
- LFLex Fridman
Mm-hmm.
- JHJeremy Howard
Unfortunately with Python, it's the opposite of that because Python's so slow. It's, um, extremely unhackable. You get to a point where it's like, okay, from here on down, it's C. So your debugger doesn't work in the same way, your profiler doesn't work in the same way, your build system doesn't work in the same way, it's really not very hackable at all.
- LFLex Fridman
What's the, what's the part you would like to be hackable? Is it for the objective of optimizing training of neural networks, inference of neural networks? Is it performance of the system? Or is there some non-performance related just creative idea?
- JHJeremy Howard
It's, it's, it's everything. I mean, in the end, I want to be productive as a practitioner. So that means that, uh, so like at the moment, our understanding of deep learning is incredibly primitive. There's very little we understand, most things don't work very well even though it works better than anything else out there.
- LFLex Fridman
Right.
- JHJeremy Howard
There's so many opportunities to make it better. So you look at any domain area, like, I don't know, speech recognition with deep learning or natural language processing classification with deep learning or whatever. Every time I look at an area with deep learning, I always see like, oh, it's, it's terrible. There's lots and lots of obviously stupid ways to do things that need to be fixed. So then I want to be able to jump in there and quickly-
- LFLex Fridman
And, and-
- JHJeremy Howard
... experiment and make them better.
- LFLex Fridman
... you think the programming language is, um, has a role in that?
- JHJeremy Howard
Huge role. Yeah. So currently, Python, um, has a big, uh, gap in terms of our ability to, um, innovate particularly around recurrent neural networks and, um, natural language processing because, uh, 'cause it's so slow. The, the, the actual loop where we actually loop through words, we have to do that whole thing in CUDA C.
- LFLex Fridman
Mm-hmm.
- JHJeremy Howard
So we actually can't innovate with the, the kernel, the heart of that most important algorithm.
- LFLex Fridman
Mm-hmm.
- JHJeremy Howard
Um, and it's just a huge problem. And this happens all over the place. So we hit, you know, research limitations. Another example, convolutional neural networks which are actually the most popular architecture for lots of things, maybe most things in deep learning, we almost certainly should be using sparse convolutional neural networks.
- LFLex Fridman
Mm-hmm.
- JHJeremy Howard
Um, but only like two people are, because to do it, you have to rewrite all of that CUDA C level stuff. And yeah, just researchers and practitioners don't. So like there's just big gaps in like s- what people actually research on, what people actually implement, because of the programming language problem.
- LFLex Fridman
So you think, uh...... do you think it's, it's just too difficult to write in CUDA C, uh, that a programming lang- a higher level programming language like Swift should enable the, the easier imp- th- fooling around creative stuff with RNNs or with sparse convolutional networks?
- JHJeremy Howard
Kind of.
- LFLex Fridman
Who, who, who's, uh, who's at fault? Who's e- who's in charge of making it easy for a researcher to play around?
- JHJeremy Howard
I mean, no one's at fault.
- LFLex Fridman
(laughs)
- JHJeremy Howard
It's just nobody's got around to it yet, or-
- LFLex Fridman
Yeah.
- JHJeremy Howard
... it's just, it's hard, right? And, I mean, part, part of the fault is that we ignored that whole APL kind of direction most... well, nearly everybody did for 60 years, 50 years. But, uh, recently, people have been starting to reinvent pieces of that and kind of create some interesting new directions in the compiler technology. So the place, um, where that's particularly happening right now is, uh, something called MLIR, which is something that, again, Chris Lattner, the Swift guy, is leading. And, uh, yeah, 'cause it's actually not going to be Swift on its own that solves this problem.
- LFLex Fridman
Mm-hmm.
- JHJeremy Howard
Because the problem is that currently writing an acceptably fast, you know, GPU program is too complicated regardless of what language you use.
- LFLex Fridman
Right.
- 30:00 – 45:00
- LFLex Fridman
Go ahead.
- JHJeremy Howard
But it's gonna be a long process. The regulators have to learn how to regulate this. They have to build, you know, um, guidelines. And then, um, the lawyers at hospitals have to develop a new way of understanding that sometimes it makes sense for data to be-... you know, looked at in raw form in large quantities in order to create world-changing results.
- LFLex Fridman
Yeah, so the regulation around data, all that, um, it sounds, uh, probably the hardest problem, but sounds reminiscent of autonomous vehicles as well, many of the same regulatory challenges, many of the same data challenges.
- JHJeremy Howard
Yeah, I mean, funnily enough, the problem is less the regulation and more the interpretation of that regulation by l- by lawyers in hospitals. So HIPAA is actually, was designed to, it's, it... The P in HIPAA is not standing, does not stand for privacy. It stands for portability. It's actually meant to be a way that data can be used. Uh, and it was created with lots of gray areas because the idea is that would be more practical and it would help people to use this, this legislation to actually share data in a more thoughtful way. Unfortunately, it's done the opposite because when a lawyer sees a gray area, they see, "Oh, if we don't know we won't get sued, then we can't do it."
- LFLex Fridman
Right.
- JHJeremy Howard
So HIPAA is not exactly the problem. The problem is more that there's... Hospital lawyers are not incented to make bold decisions (laughs) about data portability.
- LFLex Fridman
Or even to, uh, embrace technology that saves lives.
- JHJeremy Howard
Right.
- LFLex Fridman
They more want to not get in trouble for embracing that technology.
- JHJeremy Howard
Right. Also, it, it also saves lives in a very abstract way, which is like, "Oh, we've been able to release these 100,000 anonymized records."
- LFLex Fridman
Right.
- JHJeremy Howard
I can't point at the specific person whose life that saved. I can say like, "Oh, we ended up with this paper which found this result, which, you know, diagnosed 1,000 more people than we would have otherwise." But it's like which ones were helped?
- LFLex Fridman
Yeah.
- JHJeremy Howard
It's, it's very abstract.
- LFLex Fridman
Yeah. And on the counter side of that, you may be able to point to, uh, a life that was taken because of something that was-
- JHJeremy Howard
Yeah. Or, or, or a person whose privacy was violated.
- LFLex Fridman
Violated.
- JHJeremy Howard
It's like, "Oh, this specific person-"
- LFLex Fridman
Right.
- JHJeremy Howard
... "you know, was de-identified."
- LFLex Fridman
So b- (laughs)
- JHJeremy Howard
Identified.
- LFLex Fridman
S- just a fascinating topic. We're jumping around. We'll get back to fast.ai, but on the question of privacy, data is the fuel for so much innovation in deep learning. What's your sense in privacy, whether we're talking about Twitter, Facebook, YouTube, uh, just the technologies like in the medical field that rely on people's data in order to create impact? How do we get that right, uh, respecting people's privacy and yet creating technology that-
- JHJeremy Howard
Well-
- LFLex Fridman
... uh, is learned from data?
- JHJeremy Howard
... one of my areas of focus is on doing more with less data. Um, which, so most vendors, unfortunately, are strongly incented to find ways to require more data and more computation. So Google and IBM being the most obvious, uh-
- LFLex Fridman
IBM?
- JHJeremy Howard
Yeah.
- LFLex Fridman
Sorry.
- 45:00 – 1:00:00
- LFLex Fridman
option. Speaking-
- JHJeremy Howard
Although, I mean, the nice thing is, nowadays, everybody is now working on NLP transfer learning.
- LFLex Fridman
(laughs)
- JHJeremy Howard
Since that time, we've had GPT and GPT-2 and BERT and, you know-
- LFLex Fridman
Yeah. Yeah.
- JHJeremy Howard
... it's like, it's... So yeah, once you show that something's possible-
- LFLex Fridman
Right.
- JHJeremy Howard
... everybody jumps in, I guess, so.
- LFLex Fridman
I hope, hope to be a part of and I hope to see more innovation in active learning in the same way. I think-
- JHJeremy Howard
Yeah, me too.
- LFLex Fridman
... transfer learning and active learning are fascinating, public open work.
- JHJeremy Howard
I actually helped start a startup called Platform AI which is really all about active learning. And, uh, yeah, it's been interesting trying to kind of see what research is out there and make the most of it, and there's basically none, so we've had to do all our own research. (laughs)
- LFLex Fridman
Once again, just as you described. Can you tell the story of, uh, the Stanford competition, DAWNBench, and fast.ai's achievement on it?
- JHJeremy Howard
Sure. So, something which I really enjoy is that I, I basically teach two courses a year. Um, the practical Deep Learning for Coders, which is kind of the introductory course, and then Cutting-Edge Deep Learning for Coders, which is the kind of research-level course. And when I teach those courses, um, I have a, a, I, I basically have a big office, uh, at, at University of San Francisco, big enough for like 30 people, and I invite anybody, any student who wants to come and hang out with me while I build the course.
- LFLex Fridman
Mm-hmm.
- JHJeremy Howard
And so generally it's full. And so we have 20 or 30 people in a big office with nothing to do but study deep learning.
- LFLex Fridman
Mm-hmm.
- JHJeremy Howard
Uh, so it was during one of these times that somebody in the group said, "Oh, there's a thing called DAWNBench that looks interesting." And I was like, "What the hell is that?" And they said, "Oh, it's some competition to see how quickly you can train a model. Seems kind of, not exactly relevant to what we're doing, but it sounds like the kind of thing which you might be interested in." And I checked it out and I was like, "Oh, crap, there's only 10 days till it's over."
- LFLex Fridman
(laughs)
- JHJeremy Howard
"It's pretty much too late. And we're kind of busy trying to teach this course." (laughs)
- LFLex Fridman
Yeah.
- JHJeremy Howard
But we're like, "Uh, it would make an interesting case study for the course. Like, it's all the stuff we're already doing. Why don't we just put together our current best practices and ideas?"
- LFLex Fridman
Mm-hmm.
- JHJeremy Howard
So me and, I guess, about four students just decided to give it a go. And we focused on this small one called CIFAR-10, which is little 32 by 32 pixel images.
- LFLex Fridman
Can you say what DAWNBench is that-
- JHJeremy Howard
Yeah, so it's a, it's competition to, to train a model as fast as possible. It was run by Stanford. Uh-
- LFLex Fridman
And as cheap as possible too.
- JHJeremy Howard
Uh, that's also another one, for as cheap as possible. And there were a couple of categories, uh, ImageNet and CIFAR-10. So ImageNet's this big 1.3 million, uh, image thing that took a couple of days to train. I remember a friend of mine, uh, Pete Warden, who's now at Google, um, I remember he told me how he trained ImageNet a few years ago where he basically, like, had this, uh, uh, little granny flat out the back that he turned into his ImageNet training center.
- LFLex Fridman
(laughs)
- JHJeremy Howard
And he figu- you know, after like a year of work, he figured out how to train it in like 10 days or something. It was like, that was a big job. Whereas CIFAR-10, at that time, you could train in a few hours. You know, it's much smaller and easier. So we thought we'd try CIFAR-10. And yeah, I'd really never done that before. Like I'd never really... Like things like using more than one GP- uh, GPU at a time was something I...... tried to avoid, 'cause to me it's very against the whole idea of accessibility: you should be able to do things with one GPU.
- 1:00:00 – 1:15:00
- JHJeremy Howard
So unlike in physics where you could say like, "I just saw a s- a sub-atomic particle do something which the theory doesn't explain."
- LFLex Fridman
Mm-hmm.
- JHJeremy Howard
You could publish that without an explanation.
- LFLex Fridman
Right.
- JHJeremy Howard
And then in the next 60 years people can try to work out how to explain it. We don't allow this in the deep learning world, so it's, it's literally impossible for Leslie to publish a paper that says, "I've just seen something amazing happen, this thing trained ten times faster than it should have, I don't know why." And so the reviewers were like, "Well, you can't publish that 'cause you don't know why." So anyway.
- LFLex Fridman
That's important to pause on because there's so many discoveries that would need to start like that.
- JHJeremy Howard
Every, every other scientific field I know of works that way. I don't know why ours is uniquely disinterested in-... publishing unexplained experimental results. But there it is. So it wasn't published. Having said that, uh, I read a lot more unpublished papers than published papers, 'cause that's where you find the interesting insights.
- LFLex Fridman
Mm-hmm.
- JHJeremy Howard
So I absolutely read this paper. And I was just like, "This is astonishingly mind-blowing and weird and awesome." And like, "Why isn't everybody only talking about this?" Because like, if you can train these things 10 times faster... They also generalize better because you're, you're doing less epochs, which means you look at the data less, so you get better accuracy. So I've been kind of studying that ever since. And, uh, eventually, Leslie kind of figured out a lot of how to get this done, and we added minor tweaks. And a big part of the trick is starting at a very low learning rate, very gradually increasing it. So as you're training your model, you take very small steps at the start, and you gradually make them bigger and bigger until eventually you're taking much bigger steps than anybody thought was possible.
- LFLex Fridman
Right.
- JHJeremy Howard
There's a few other little tricks to make it work, but e- ev- basically, we can reliably get super convergence. And so for the DAWNBench thing, we were using just much higher learning rates than people expected to work.
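The trick he describes, starting at a very low learning rate, gradually ramping it up to a value much higher than usual, then annealing back down, is the core of the one-cycle schedule associated with super convergence. A minimal sketch in plain Python; the peak rate, warmup fraction, and starting divisor below are illustrative placeholders, not fast.ai's exact defaults.

```python
import math

def one_cycle_lr(t, total_steps, max_lr=1.0, start_div=25, warmup_frac=0.3):
    """Learning rate at step t of a one-cycle-style schedule:
    linear warmup from max_lr/start_div up to max_lr, then cosine
    annealing back down toward zero."""
    warmup_steps = int(total_steps * warmup_frac)
    low = max_lr / start_div
    if t < warmup_steps:
        # very small steps at the start, gradually getting bigger
        return low + (max_lr - low) * t / warmup_steps
    # anneal: cosine decay over the remaining steps
    progress = (t - warmup_steps) / (total_steps - warmup_steps)
    return max_lr * 0.5 * (1 + math.cos(math.pi * progress))

# The peak sits at the end of warmup, far above the starting rate.
lrs = [one_cycle_lr(t, 1000) for t in range(1000)]
```

The point of the ramp is that by the time the steps get big, the model is in a region of the loss surface where big steps are stable.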
- LFLex Fridman
What do you think the future of... I mean, it makes so much sense for that to be a critical hyper-parameter, learning rate that you vary. What do you think the future of learning rate magic looks like?
- JHJeremy Howard
Well, there's been a lot of great work in the last 12 months in this area. It's... And people are increasingly realizing that optimize... Like, we just have no idea really how optimizers work. And, uh, the combination of weight decay, which is how we regularize optimizers, and the learning rate, and then other things like the epsilon we use in, in the Adam optimizer, they all work together in weird ways. And different parts of the model... This is another thing we've done a lot of work on, is research into how different parts of the model should be trained at different rates in different ways.
- LFLex Fridman
Mm-hmm.
- JHJeremy Howard
Um, so we do something we call discriminative learning rates, which is really important, particularly for transfer learning. Um, so really, I think in the last 12 months, a lot of people have realized that this, all this stuff is important. There's been a lot of great work coming out. And we're starting to see algorithms appear which have very, very few dials, if any, that you have to touch. So like the... I think what's going to happen is the idea of a learning rate will... It almost already has disappeared-
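Discriminative learning rates as described here amount to giving each group of layers its own step size; in PyTorch this is typically expressed with optimizer parameter groups. A minimal framework-free sketch; the group split, rates, and numbers are illustrative, not fast.ai's actual values.

```python
# Discriminative learning rates: each group of parameters gets its own step
# size. Earlier (pretrained) layers move slowly; the freshly added head moves fast.
def sgd_step(param_groups):
    """param_groups: list of dicts {"params": [...], "grads": [...], "lr": float}.
    Updates each parameter in place with its own group's learning rate."""
    for group in param_groups:
        for i, (p, g) in enumerate(zip(group["params"], group["grads"])):
            group["params"][i] = p - group["lr"] * g

# Illustrative: body of a transfer-learned model vs. its new head.
groups = [
    {"params": [1.0, 2.0], "grads": [0.5, 0.5], "lr": 1e-4},  # pretrained body
    {"params": [0.0],      "grads": [0.5],      "lr": 1e-2},  # new head
]
sgd_step(groups)
# With the same gradient, the head parameter moves 100x further than the body's.
```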
- LFLex Fridman
Mm-hmm.
- JHJeremy Howard
... in the latest research. And instead, it's just like, you know, we, we know enough about how to interpret the gradients and the change of gradients we see to know how to set every parameter off to
- LFLex Fridman
That you can automate it. So you see the future of, uh, of, uh, deep learning where really... Where is the input of a human expert needed in the future?
- JHJeremy Howard
Well, hopefully, the input of the human expert will be almost entirely unneeded from the deep learning point of view. So, um, again, like Google's approach to this is to try and use thousands of times more compute to run lots and lots of models at the same time and hope that one of them is good.
- LFLex Fridman
AutoML kind of thing?
- JHJeremy Howard
Yeah, AutoML kind of stuff, which I think is insane.
- LFLex Fridman
(laughs)
- JHJeremy Howard
Um, when you better understand the mechanics of how models learn, you don't have to try a thousand different models to find which one happens to work the best.
- LFLex Fridman
Mm-hmm.
- JHJeremy Howard
You can just jump straight to the best one, uh, which means that it's more accessible in terms of compute, cheaper, and also with less hyper-parameters to set. It means you don't need deep learning experts to train your deep learning model for you, which means that domain experts can do more of the work, which means that now you can focus the human time on the kind of interpretation, the data gathering, identifying the errors, and stuff like that.
- LFLex Fridman
Yeah, the data side. How often do you work with data these days in terms of the cleaning? Looking at... Like Darwin looked at different species while traveling about. Do you look at data? Ha- have you in your roots in Kaggle-
- JHJeremy Howard
Always. Yeah.
- LFLex Fridman
... just looked at data and just-
- JHJeremy Howard
Yeah, I mean, it's, it's a key part of our course. It's like before we train a model in the course, we see how to look at the data, and then aft-... The first thing we do after we train our first model, which is fine-tuning an ImageNet model for five minutes... And then the thing we immediately do after that is we learn how to analyze the results of the model by looking at examples of misclassified images and looking at a classification matrix and then doing, like, research on Google to learn about the kinds of things that it's misclassifying.
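The error-analysis loop he describes, tabulating which classes get confused with which (the matrix he mentions is usually called a confusion matrix) and then pulling out the misclassified examples to inspect, needs nothing framework-specific. A minimal sketch with made-up toy labels:

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

def misclassified(y_true, y_pred):
    """Indices of the examples worth inspecting by hand."""
    return [i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p]

y_true = ["cat", "cat", "dog", "dog", "dog"]
y_pred = ["cat", "dog", "dog", "dog", "cat"]
print(confusion_matrix(y_true, y_pred, ["cat", "dog"]))  # [[1, 1], [1, 2]]
print(misclassified(y_true, y_pred))                     # [1, 4]
```

The off-diagonal cells tell you which pairs of classes to go research, exactly the workflow described above.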
- LFLex Fridman
Mm-hmm.
- 1:15:00 – 1:26:36
- JHJeremy Howard
slower. Like, you'd just go like, "Oh, this is taking too long."
- LFLex Fridman
(laughs) Yeah.
- JHJeremy Howard
And also, there's a lot of things which are just less programmable, like tf.data, which is the way-
- LFLex Fridman
Yeah.
- JHJeremy Howard
... data processing works in TensorFlow. It's just this big mess, it's incredibly inefficient. And they kind of had to write it that way because of the TPU problems I described-
- LFLex Fridman
Yeah.
- JHJeremy Howard
... earlier. So I- I just, um, you know, I just feel like they've got this huge technical debt which they're not gonna solve without starting from scratch.
- LFLex Fridman
So, here's an interesting question then. If, uh, there's a new student starting today, what would you recommend they use?
- JHJeremy Howard
Well, I mean, we obviously recommend fast.ai and PyTorch because we teach new students, and that's what we teach with. So, we would very strongly recommend that because it will let you get on top of the concepts much more quickly, uh, so then you'll become an actua- And you'll also learn the actual state-of-the-art techniques, you know? So you'll actually get world-class results. Honestly, it doesn't much matter what library you learn because switching from-
- LFLex Fridman
Yeah.
- JHJeremy Howard
... Chainer to MXNet to TensorFlow to PyTorch is gonna be a couple of days' work, as long as you understand the foundations well.
- LFLex Fridman
But you think S- will Swift creep in there as a thing, uh, that people start using?
- JHJeremy Howard
Not for a few years. Particularly because, like Swift has no data science community, libraries-
- LFLex Fridman
Yeah, so codebases are not there yet.
- JHJeremy Howard
... tooling. And the Swift community has, um, a total lack of appreciation and understanding of numeric computing, so like, they keep on making stupid decisions. You know, for years they've just done dumb things around performance and prioritization. Um, that's clearly changing now, um, because the developer of Swift, Chris Lattner, is working at Google on Swift for TensorFlow, so like, that's- that's a priority. It'll be interesting to see what happens with Apple, because like Apple hasn't shown any sign of caring about numeric programming in Swift. Um, so I mean, hopefully they'll get off their ass and-
- LFLex Fridman
(laughs)
- JHJeremy Howard
... start appreciating this. 'Cause currently all of their lower-level libraries are not written in Swift, they're not particularly Swifty at all. Stuff like Core ML, they're really pretty rubbish, so. Yeah, so there's a long way to go, um, but at least one nice thing is that Swift for TensorFlow can actually directly use Python code and Python libraries-
- LFLex Fridman
Mm-hmm.
- JHJeremy Howard
... in, uh, literally the entire lesson one notebook of fast.ai runs in Swift right now in Python mode, so that's- that's a nice intermediate thing.
- LFLex Fridman
How long does it take, um... So th- if you look at the two- two fast.ai courses, how long does it take to get from point zero to completing both courses?
- JHJeremy Howard
Um, it varies a lot. Somewhere between two months and two years, generally.
- LFLex Fridman
So for two months, how many hours a day on average?
- JHJeremy Howard
So like, so like a- a- a- somebody who is a very competent coder can- can do 70 hours per course and pick up-
- LFLex Fridman
70? Seven zero?
- JHJeremy Howard
70, yeah.
- LFLex Fridman
That's it? Okay.
- JHJeremy Howard
Yeah. But a lot of people I know who take a year off to study fast.ai full time-
- LFLex Fridman
Yeah.
- JHJeremy Howard
... and say at the end of the year, they feel pretty competent 'cause generally there's a lot of other things you do. Like they're, generally they'll be entering Kaggle competitions, they-
- LFLex Fridman
Yeah, exactly.
Episode duration: 1:44:10
Transcript of episode J6XcP4JOHmk