The Twenty Minute VC | Douwe Kiela: Why Data Size Matters More Than Model Size; Why Open Source Isn't Going to Win | E1032
EVERY SPOKEN WORD
100 min read · 20,439 words
- 0:00 – 15:00
- DKDouwe Kiela
So there are a couple of just really big issues. Hallucination, these models make things up with very high confidence. Attribution, we don't know why they're saying what they're saying. We can't really trace it back to anything. There's compliance issues, so we can't really remove information from them, uh, which is kind of tricky from a GDPR perspective for example. We can't revise information. We can't keep it up to date. There's massive data privacy issues where you have to send your very valuable company data. If you're an enterprise, you have to send that through somebody else's servers.
- HSHarry Stebbings
(digital music) Douwe, I am excited for this. Listen, we've chatted before. We've known each other for a while, but thank you so much for joining me today.
- DKDouwe Kiela
Yeah. Thanks very much for having me on the show. I'm a big fan.
- HSHarry Stebbings
Oh, it's very, very kind of you. I literally paid you $25,000 to say that but, um... (laughs)
- DKDouwe Kiela
(laughs)
- HSHarry Stebbings
Um, but my question to you is, it's such a hot space and there's very few people who've actually been in it for a while. You are one of them. How did you first make your way into the world of ML and NLP first?
- DKDouwe Kiela
Yeah. My, my journey has been a little bit unusual actually. So, um, uh, when I was in high school in the Netherlands, um, I wanted to be a cool kid during the day but at night I was secretly fascinated by computers. Um, so I started off as a script kiddie wanting to hack other people's computers and I figured out that if I really wanted to do that, I had to learn to code. So I taught myself to code. Uh, figured I needed to understand operating systems, so I made my own operating system, uh, with bootloader and everything, um, when I was 16. Uh, so then by the time, uh, I had to go to college and, and go study something, I, I thought I already knew everything about computer science. Uh, so I decided to study philosophy instead, um, and, and so that was really a, a very, uh, radical departure from what I had been interested in at the time, um, but it was fascinating, uh, learning a lot about the mind and language and things like that. It, it's... I, I use it still every day, I think. Um, but then, uh, at some point in my career they... it became clear that I had to start making money so I needed a real job, and philosophy is not really a real job. And I did some logic in between foundations of math which is also not really a real job. So I decided to study computer science after all. Uh, so I went to Cambridge, uh, in the UK. I had a fantastic time there, um, and, and so that's really where I started doing NLP, natural language processing, and, um, one of my internships was done at Microsoft Research in New York with a very famous researcher called Leon Bottou, who is Yann LeCun's kind of, um, I wouldn't say sidekick because that doesn't really do justice to, to what he's done. He's invented like stochastic gradient descent and things like that, so one of the godfathers of deep learning. And I had the opportunity to work with him, um, and, uh, that was really an amazing time. 
So afterwards when Yann and Leon started FAIR, Facebook AI Research, uh, I joined that out of my PhD and that really kicked off my career.
- HSHarry Stebbings
I actually spoke to Leon as part of the prep for my interview with Yann. (laughs) Amazing, amazing person. But I do want to talk about that five years that you spent at kind of Facebook's AI Research team. It's such a transformational team, as you said, with incredible individuals like Yann and Leon. What are your biggest takeaways from that experience and how did it impact how you think today?
- DKDouwe Kiela
It's really a one-of-a-kind place, uh, especially when I, when I just joined. Uh, they had these amazing people who I really respected, uh, a ton like Jason Weston and Tomas Mikolov, really kind of the godfathers of modern deep learning for NLP. I, I learned so much there mostly around how to focus your research direction. So I think initially I was doing all kinds of weird stuff, um, and it took me a while to figure out that having a, a very clear real world application for the research that you're doing makes it much more valuable than, than going off on a tangent and maybe being a bit too far ahead of the rest of the field. Um, so, um, yeah, it's really a, a special place and I think actually they don't get enough credit for, uh, the impact that they've had on the world, so... And I, I mean like Facebook or Meta in general actually. So um, almost every web app in the world runs on React which is a, an open source, uh, uh, project coming out of Meta. Almost every deep learning project-
- HSHarry Stebbings
Can I, can I, can I, can I, can I interrupt you and ask bluntly, I agree with you. Do you think they've captured that value though? Because so much of the value that they've created seems to not be inherent within the company. I would argue from a pure economist standpoint it isn't enshrined in the company and actually it's been too open source.
- DKDouwe Kiela
If... From the per- perspective of making money, maybe. But from the perspective of, of, uh, doing something for the world. So I think... So Facebook and Meta's mission is to connect the world and so through things like React and PyTorch maybe they're getting closer to that mission without making more money because they're already making enough money anyway. I don't know. But, uh, so, so PyTorch really... without PyTorch none of this stuff would be happening right now, right? So i- it's really like fundamental for all of the, all of the AI breakthroughs.
- HSHarry Stebbings
I s- wonder why I'm such a natural venture capitalist when I have a proclivity for making money. (laughs)
- DKDouwe Kiela
(laughs)
- HSHarry Stebbings
Um, but you then spent time at Hugging Face, okay? And so with that transition, again, one of the most prominent kind of rising stars in the space. How did that impact your mindset today?
- DKDouwe Kiela
Yeah. Hugging Face is really a fascinating company and, uh, um... So I was at Meta and I was looking for something new. I was thinking about maybe doing a startup already, um, and then, uh, figured I, I needed to get some more experience first at a successful AI startup and, and Hugging Face, uh, very clearly is a very successful AI startup and it also really aligns with some of my values around open source and, and open science and things like that. And they have amazing people like Meg Mitchell working there. Um, so I went there and, uh, I think-... what really impressed me, actually, um, uh, is th- how good they are just at marketing and branding and community building. It's like everybody just loves the company. Um, and, and I still, I don't understand how they do that. Um, so it's, it's really, uh, uh, yeah, some- something to see even when you're on the inside. It's like, "Wow, like, uh, look how, look how popular we are." (laughs)
- HSHarry Stebbings
I thought it was amazing, I actually c- uh, spoke to Clem, and, uh, in my research they said about doing swag where you could essentially pay to have whatever swag you wanted, it just had to have a Hugging Face logo on it.
- DKDouwe Kiela
(laughs)
- HSHarry Stebbings
And so everyone gets a budget, and you could buy, I don't know, a, I'm making it up, maybe a Balenciaga hoodie, probably not given the price.
- DKDouwe Kiela
(laughs)
- HSHarry Stebbings
But put a Hugging Face, and so it makes it much more personalized, real, cool. I thought it was really interesting actually, just kind of, I've never heard that before. Okay, so we have Hugging Face and Experience. Where does Contextual come from? What was that aha, "I've got to do this, this is now the idea and the time"?
- DKDouwe Kiela
Yeah, so this really started at, at the beginning of the year, uh, so me and my co-founder, Amanpreet Singh, um, who I'd worked with first at Facebook and then at Hugging Face, um, and he really is one of the, the smartest people I know. He's really an incredible guy, and we were talking about what to do next, and, uh, we, we just saw this great need after ChatGPT had gone viral. We saw this, this kind of great excitement in the world, but at the same time, a lot of disappointment, uh, about it not being quite ready yet for real-world adoption, uh, in, in enterprises, uh, where you actually want to use this technology. So we decided that now really is the right time to, to build a company to try to tackle that, um, and we think it's still very early innings in the game. So I, I think a lot of people sometimes think that it's, uh, you know, the game has been played, but it's just getting started.
- HSHarry Stebbings
It's a, it's a Friday afternoon, we're gonna have a fun chat. Fuck the normal stuff. Okay, so you said there about not being ready for, like, traditional adoption.
- DKDouwe Kiela
Mm-hmm.
- HSHarry Stebbings
I think a lot of the general public would say, "Absolutely, it's, it's, it's cool." W- what makes it not ready for general adoption, do you think?
- DKDouwe Kiela
Yeah, uh, so there are a couple of just really big issues. Um, hallucination is a very big one. These models make things up with very high confidence. Um, attribution, we don't know why they're saying what they're saying. We can't really trace it back to anything. Um, there's compliance issues, so we can't really remove information from them, uh, which is kind of tricky from a GDPR perspective, for example. Uh, we can't revise information. We can't keep it up to date. Uh, there's massive data privacy issues where you have to send your very valuable company data, if you're an enterprise, you have to send that through somebody else's servers. These models are also quite inefficient still. So you can make them much faster. And so what we are building at Contextual is a different kind of language model, um, and, uh, so we're really thinking about this as the, the next generation of language models where we think about it from first principles for enterprise use cases. And what that means is that we want to solve all of these problems by being a bit smarter about the architecture, and the architecture we're specifically basing it on is Retrieval Augmented Generation, which is something that, uh, me and my colleagues at FAIR came up with in 2020. And what you do there is you decouple the memory from the generative capacity of the large language model, and this allows you to ground the generations from the language model in the things you've retrieved, i- in your memory essentially. So you get much less hallucination, you get attribution for free, you can always update the memory, so you can remove information on the fly, you can add things, you can revise them, you can have a stream of memory. Uh, it's much more efficient because you're compressing a lot of the compute inside the memory, and you have a very clean separation between the data plane and the model plane, as they call it, which means that you can, uh, have better data privacy guarantees.
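[Editor's note: the retrieve-then-generate loop described above can be sketched in a few lines. Everything here, including the toy `score`, `retrieve`, and `generate` functions, is a hypothetical illustration of the idea only, not Contextual's architecture or any real library's API.]

```python
# Minimal sketch of retrieval-augmented generation (RAG): the "memory" is
# an external document store, decoupled from the generator. All function
# names here are hypothetical stand-ins.

def score(query: str, doc: str) -> int:
    """Toy relevance score: count of shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, memory: list[str], k: int = 2) -> list[str]:
    """Pull the top-k most relevant documents from the external memory."""
    return sorted(memory, key=lambda d: score(query, d), reverse=True)[:k]

def generate(query: str, evidence: list[str]) -> str:
    """Stand-in for the generator: a real LLM would condition its output
    on the retrieved evidence; here we just echo it, which is also why
    attribution comes for free -- the sources are explicit."""
    cited = "; ".join(f"[{i}] {doc}" for i, doc in enumerate(evidence))
    return f"Answer to {query!r} grounded in: {cited}"

memory = [
    "GDPR grants a right to erasure of personal data.",
    "RAG decouples memory from generation.",
    "Paris is the capital of France.",
]
evidence = retrieve("How does RAG handle memory?", memory)
print(generate("How does RAG handle memory?", evidence))

# Updating the memory (add/remove/revise) changes future answers with no
# retraining -- the property that makes deletion and freshness tractable.
memory.remove("Paris is the capital of France.")
```

The design point is the separation itself: the generator never has to "unlearn" anything, because facts live in the editable memory rather than in the weights.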
- HSHarry Stebbings
I mean, I have so many questions for you. I, I, I literally have, like, this, uh, wonderful, like, scrap of paper where I'm writing notes. (laughs) Uh, fi- first off, uh, you said there about kind of an attribution of knowing where it comes from and being a bit of a black box. I had someone on the show the other day, and they said, "Open or closed, you still don't really know what's going on in the core foundational model layer. The open or closed is not really the point. You still don't know." Is that true? And how do we think about actual true transparency of knowing what's going on and why it's producing what it is?
- DKDouwe Kiela
Yeah, so, so we're not gonna be able to, um, really know why a neural net wha- does what it does at the scale that neural networks operate, uh, at. So this is kind of like your own brain, right? Like, wh- so I, I, I think your behavior is, is relatively predictable, um, uh, for, so that goes for every human, right? We all, like, can predict each other's actions. But I have no idea what's going on in your brain, and I will never know. There's no way I can know. The only, only way I can kind of find out is by asking you. Um, but if you train the architecture the way we are training it right now, then at least you make sure that the model has learned to rely on the information that it finds, uh, and that gives you much stronger attribution than if it's just predicting the next word based on what it has seen before, uh, because it doesn't have this ability from birth basically to find relevant information and ground its generation, uh, on that, uh, uh, thing it found.
- HSHarry Stebbings
Uh, another thing that I have to ask, you mentioned the word hallucinations there, and I had Emad at Stability on the show, and he said, "Hallucinations are a feature, not a bug," which I thought was a very tweetable statement.
- DKDouwe Kiela
Yeah, yeah, that's a, that's a great quote. (laughs)
- HSHarry Stebbings
But I didn't quite under- but I didn't quite understand it. Do you agree hallucinations are a feature, not a bug? He is the only person to have said this on the show, to be clear.
- 15:00 – 30:00
- DKDouwe Kiela
Are you asking if we'll, we'll achieve AGI at the same time as ASI?
- HSHarry Stebbings
I, I'm saying given the closed nature of, like, tasks available, if you think about AGI, it's an infinite pool of options, and if you look at ASI, if you were to take certain roles or functions, there are finite outcome scenario plans that could happen, in which case it would be quicker to get to higher quality ASI than AGI. Is that correct?
- DKDouwe Kiela
Yeah, that's right. Uh, so AGI, uh, I, I think ... So it's, it's slightly better defined now, but one of the big issues has always been that we don't really know what that term even means. (laughs)
- HSHarry Stebbings
(laughs)
- DKDouwe Kiela
Um, and, and so ASI, I mean, so that basically just means the kind of old-school applications of NLP but then done better with better models. Um, and, and I think you're totally right that that's a much easier problem to solve much quicker and then, then slowly, um, uh, grow with the capabilities of these models.
- HSHarry Stebbings
Yeah, in terms of, like, growing with the capabilities of the models, uh, we've seen size of model matter less and less it would seem. How do you think about the importance of size of model today, and does it matter as much as it used to, and will it matter even less with every year, month, day?
- DKDouwe Kiela
Yeah, great question. I, I think S- Sam Altman had this interesting quote where he was saying that he, he thought models would stop growing in size. Um, and, and, uh, so that, that GPT-4 kind of hit this ceiling. And I thi- I think that's probably right but n- not really because size doesn't matter. It, it's just that data size matters even more than model size. And I think the LLaMA paper, uh, out of, out of Meta really brill- brilliantly showed this, um, where if you train a smaller model on more data for longer, then you get a better model. So you get more bang for your buck if you train it on more data rather than having more parameters. But in an ideal world, if you had infinite compute budget and infinite data, then you would train the biggest possible model because that's the most likely to give you sort of emergent capabilities as we call them, uh, in the field. Uh, the th-
- HSHarry Stebbings
Sorry, I'm, I'm, uh, I'm really stupid, so forgive me for this. Does that take more time then?
- DKDouwe Kiela
So-
- HSHarry Stebbings
If you have smaller models with more data, given you need to feed more data through the model, does it not take more time than if you needed less data going through the model? (laughs)
- DKDouwe Kiela
Uh, uh, it depends. So it's a trade-off here, right? Um, but, but these big models also need a lot of data. So, uh, it really is, um, um, a, a function of the number of GPUs that you have available, right? And so if you, let's say you have a thousand GPUs, you can choose-
- HSHarry Stebbings
Wow.
- DKDouwe Kiela
... to train a huge model on, on relatively, uh, little data, and it will be okay, but it will be under-trained. So you have some sort of optimal point where you can train the model to perfection. Um, and, and I think in the field, we were underestimating where that optimal point is, and it seems that data is much more important than model size when it comes to what's optimal.
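[Editor's note: the "optimal point" mentioned above was later quantified in the Chinchilla scaling work. Training compute is roughly C ≈ 6·N·D FLOPs for N parameters and D training tokens, and the loss-optimal split works out to roughly 20 tokens per parameter. A back-of-envelope sketch; the constant 6 and the ratio 20 are approximations from that literature, not exact values.]

```python
# Back-of-envelope for the compute-optimal tradeoff: fix a FLOP budget C,
# then choose model size N and token count D. Chinchilla-style heuristic:
# D ~= 20 * N at the optimum, with C ~= 6 * N * D.

def compute_optimal(C: float, tokens_per_param: float = 20.0):
    """Given a FLOP budget C, return the (N, D) that spends it optimally.

    Solve C = 6 * N * (tokens_per_param * N)  =>  N = sqrt(C / (6 * r)).
    """
    N = (C / (6.0 * tokens_per_param)) ** 0.5
    D = tokens_per_param * N
    return N, D

# Example: the same ~1e23 FLOP budget, spent two ways.
N_opt, D_opt = compute_optimal(1e23)
print(f"optimal: {N_opt / 1e9:.0f}B params on {D_opt / 1e12:.1f}T tokens")

# An oversized 175B-parameter model on the same budget only sees
# C / (6 * N) tokens -- i.e. it ends up under-trained, as discussed above.
D_oversized = 1e23 / (6 * 175e9)
print(f"175B model: only {D_oversized / 1e9:.0f}B tokens")
```

This is the sense in which "data size matters more than model size": at a fixed GPU budget, shrinking the model to feed it more tokens usually wins until you reach that optimal ratio.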
- HSHarry Stebbings
Okay, if we take this to a next, like, like a layer deeper, data more important than model size...What does that mean then in terms of who's advantaged? Does that mean that startups are more advantaged actually than incumbents because they, like, th- they don't need to ... like, I, I, I don't understand. Who's more advantaged in that case? Is it incumbents 'cause they already have existing massive data moats?
- DKDouwe Kiela
It depends on where the data comes from. I, I think incumbents definitely have an advantage there, uh, but only some of them. Uh, a lot of the data is just freely available on the internet, right? So the LLaMA model was not trained on any proprietary data. It was just trained on open data on the web, and there's a lot more data to be had there. And as, as a society, we're generating a ton of data every day to add to that big pile of data, uh, so you can really train very high-quality language models just on public, uh, data on, on the web. But I think if you look at the secret sauce to a lot of these other models, like, why is GPT-4 so awesome, a, a part of that, uh, is that they, they went through, uh, enormous, uh, lengths to, to get, like, special data that nobody else has. So allegedly, they did this Whisper project where they're very good at transcribing audio because that would allow them to transcribe, like, all of the podcasts in the world, which gives you very high-quality, uh, language. Uh, uh, so if you can train on that language but nobody else has it, that puts you, uh, in, in a position of advantage.
- HSHarry Stebbings
I really hope no one's training on my podcasts. I feel sorry for those models.
- DKDouwe Kiela
(laughs)
- HSHarry Stebbings
But, uh, I would love to ... y- y- you mentioned kind of, uh, the proprietary data element there. How important is proprietary data? I'm a VC, uh, for my sins. Uh, the main reason I would say why VCs are turning down startup AI companies is 'cause they do not have a proprietary data set to operate against and they are defined as, like, a thin layer of generative AI on top of a foundational model. How important is proprietary data, do you think, for startups innovating in this space?
- DKDouwe Kiela
It really depends o- on the s- specific startup. Uh, so i- if you want to build a deep tech AI startup, then you really want to get a big data flywheel going. So, uh, you want to start with a lot of data and then have a way to generate lots more data, and that data is gonna be your moat. But I think one of the interesting things about these large language models is that they're incredibly sample-efficient or data-efficient. So you can do cool things with them with relatively little data, uh, that just previously just wasn't possible. Uh, so, uh, that unlocks all kinds of possibilities that just didn't exist even a couple of years ago. So on the one hand, yes, you need lots of data if you wanna build, like, big AI-first things. But at the same time, if you want to do a startup that builds on top of this technology, you need very little data to get started. Um, and actually, one, one interesting ... maybe a little bit of a tangent, but one of the use cases I've been seeing now for GPT-4 is actually that people are using it to generate data and then they're training on that data with cheaper models. Um, uh, and so, so GPT-4 might end up disrupting not, um, like, knowledge workers necessarily, but it might just disrupt, like, Mechanical Turk and is just a- an annotator on steroids. And you can use all of that data to get much more custom models that you can then deploy very cheaply on specialized use cases. So then that's a quite interesting development.
- HSHarry Stebbings
Uh, uh, can I ask, uh, in terms of, like, the proprietary data element, pretrained data changes a lot. Can you just help anyone who doesn't, now, understand, what is pretrained data and how does it change the game for a lot of companies that don't have existing data moats?
- DKDouwe Kiela
Yeah. So I, I think m- maybe it's useful to kind of go through the, the steps. If you want to build your own ChatGPT, like, what do you need? And so the first thing you need is a core pretrained model, and this tends to be just trained on the web. The, the task you're training it on is just next word prediction. Then once you have that core model, then you want to do supervised fine-tuning. So essentially, you want to fix the user interface to that model because the model doesn't really, uh, know how to follow instructions, for example. So you want the model to listen to you, but it has only been trained on predicting the next word, so it doesn't really know how to do that. So that supervised fine-tuning, that's also proprietary data, uh, if you want. Um, you can, you can get a much better model out of that. And then the final step is RLHF, reinforcement learning from human feedback, where you get this feedback loop to make the model even better for your specific use case even if you don't have signal at the word level, you just have signal at the sequence level. So you can tell it, like, "Okay, that was a good response," or, "That wasn't a good response," but you can't tell it, like, "What did you do wrong?" necessarily. So if you h- if you go through those three steps, then you get a ChatGPT. It's as easy as that. Um, but obviously there's a ... the devil's in the details.
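[Editor's note: the three stages described above can be laid out schematically. Every function below is a hypothetical placeholder standing in for what is, in reality, a large training run; the point is only the shape of the pipeline.]

```python
# Schematic of the three stages: pretraining (next-word prediction over
# web text), supervised fine-tuning (instruction following), and RLHF
# (sequence-level human feedback). Purely illustrative placeholders.

def pretrain(web_corpus):
    """Stage 1: learn next-word prediction over raw web text."""
    return {"stage": "pretrained", "docs": len(web_corpus)}

def supervised_finetune(model, instruction_pairs):
    """Stage 2: (instruction, response) pairs fix the 'user interface' --
    the model learns to follow instructions, not just continue text."""
    return dict(model, stage="sft", instructions=len(instruction_pairs))

def rlhf(model, preference_labels):
    """Stage 3: only sequence-level signal ('good response' vs 'bad
    response'), no word-level corrections -- hence reinforcement learning
    rather than a supervised loss."""
    return dict(model, stage="rlhf", preferences=len(preference_labels))

web = ["the cat sat", "on the mat"]          # stands in for web-scale text
pairs = [("summarize X", "X is about ...")]  # stands in for SFT data
prefs = [("response A", "response B", "A")]  # stands in for human rankings

model = rlhf(supervised_finetune(pretrain(web), pairs), prefs)
print(model["stage"])  # the pipeline ends at the RLHF stage
```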
- HSHarry Stebbings
Can I ask you, which company do you think has the b- best data acquisition flywheel? When you look at them today, who do you admire and respect most?
- DKDouwe Kiela
OpenAI. Um, so maybe that's not the, the answer you expected but, uh ... so it's just an incredible company, uh, and, uh, they've really shown the world what's possible. Um, and, uh, they haven't even really trained, as far as I know, on the data that comes out of ChatGPT going viral. Right? So they had ChatGPT, it went viral. This led to this giant, giant data moat that they haven't even really used, used yet. Um, so I, I think in terms of data moats and, uh ... and maybe you, you've seen this come by actually, there was this Google memo, uh, from an internal Google employee who, who had written-
- HSHarry Stebbings
Sure.
- DKDouwe Kiela
... that OpenAI and Google have no moat. I, I think, uh, for me, for me as an AI researcher, when I read that memo, I was like, "This person has no idea what they're talking about." Uh-
- HSHarry Stebbings
Uh, well, un- un- unpack that for me. Why are they wrong and don't understand?
- DKDouwe Kiela
So, um, these places have a giant moat because, as I said, it's really all about data.And OpenAI has this very deep understanding of how people want to use language models that basically nobody else has. And they have this giant economy of scale where they can serve up language models, um, very, very cheaply because they get so many requests coming in at the same time. So they have a giant moat. Um, and- and as much as... uh, so I'm a big fan of open source, right? I would like it to be true that with open source we could just keep up with- with all of that, but, uh, I think that's just incredibly naive.
- HSHarry Stebbings
Okay. Uh, so they do actually have the moats. Um, what do you think are the biggest challenges that they face? 'Cause I think we all dismissed Google quite significantly, if I'm honest. And then Bard came out and was pretty impressive. Like, how do you evaluate Bard and Google's display actually?
- DKDouwe Kiela
Yeah. Uh, language model evaluation is- is a whole separate topic. It's- it's a super interesting question, actually. I've been fascinated by AI evaluation for a really long time and- and the answer is, we don't really know how to evaluate the quality of these models anymore. Uh, so what we've seen people do in the field now is they're using GPT-4 to evaluate the quality of other language models, uh, and that- that just feels wrong. So I think there's a giant opportunity in the market actually for a startup or several startups becoming like Moody's or the S&P, uh, sort of, uh- uh, you know, uh, the folks who- who evaluate the quality of AI for specific use cases, um, because nobody really knows. It's really the Wild West out there. Um, and one of the big problems, for example, is data contamination, where, uh, a bunch of these language models are trained on the things that they are being evaluated on. So GPT-4 looks like it's an amazing coder, but it might also just be, uh, trained on the data that it's evaluated on, which means that it's not actually that great of a coder.
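[Editor's note: a crude version of the contamination check alluded to above is n-gram overlap between evaluation items and the training corpus, along the lines of the overlap analyses reported alongside the GPT-3 paper. This sketch is illustrative only; real pipelines normalize text and use longer n-grams.]

```python
# Crude data-contamination check: flag an eval example if a long-enough
# n-gram from it also appears verbatim in the training corpus.

def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """All word n-grams of a text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def is_contaminated(eval_item: str, train_corpus: list[str], n: int = 5) -> bool:
    """True if any n-gram of the eval item appears verbatim in training data."""
    eval_grams = ngrams(eval_item, n)
    return any(eval_grams & ngrams(doc, n) for doc in train_corpus)

train = ["def add(a, b): return a + b  # classic toy function"]
leaked = "write def add(a, b): return a + b in python"
fresh = "implement matrix multiplication from scratch"

print(is_contaminated(leaked, train))  # True: the snippet leaked verbatim
print(is_contaminated(fresh, train))   # False
```

A benchmark score on contaminated items measures memorization, not capability, which is why "GPT-4 is an amazing coder" is hard to verify from public benchmarks alone.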
- HSHarry Stebbings
Okay. Sorry. I- I- I think this is, like, fascinating when you think about kind of the Moody's and the S&P opportunity here. The question is though, that I- I still don't quite understand, how do we know a yardstick for progress or measurement? What is the right way to approach AI measurement and effectiveness?
- 30:00 – 45:00
- DKDouwe Kiela
now, human, uh, annotators who are- are, uh, checking their models, so they probably have a pretty good sense of how good their model actually is, uh, but obviously they're not gonna share that with the world.
- HSHarry Stebbings
I- I- I have to ask, you know, I had Yann LeCun on the show, who we discussed earlier, obviously a very big proponent of kind of open models. Um, it's kind of a very big debate, obviously, between open versus closed. Where do you sit in terms of the model that rules for the next five to 10 years, and is it different for the model that rules for the next five years versus that that rules for the next 10?
- DKDouwe Kiela
So the way I think about the language model space is kind of as a pyramid. So at the top of the pyramid we have these frontier models. Uh, so these are GPT-4 and, and Anthropic models, and things like that, that are just much better than everything else, but also much more expensive and, and much bigger than everything else. Um, and then at the bottom of the pyramid you have open source models. Anybody can train on them, anybody can fine tune them on their data. Uh, there's... You can run them on your phone now and on your laptop, and things like that. So that's a, that's a very, uh, fruitful area for research. But I think the most interesting part is kind of the middle piece of that pyramid where you have the, the most bang for your buck. Uh, so that, that's from a business perspective the most interesting part, where you have mid-size models that have capabilities that you don't really see at this bottom of the pyramid, um, that you can monetize in various ways. So I, I think it's not gonna be the case that there's just one model that wins everything. It's going to be lots of models at different parts, uh, uh, different layers of, of this pyramid being used for different kinds of applications. So if you have very strong AGI requirements you probably want to have a frontier model. If you care about it a bit less, maybe you want to have artificial specialized intelligence. If you care about it even less, then you can just take an off the shelf, uh, open source model. So there is always going to be a place for open source, but why I don't think that open source models will move up that pyramid to the frontier is because they're just too expensive. And, and this whole flourishing that you see right now of open source models, that basically comes from Meta's generosity in giving LLaMA away for free. And if they hadn't done that, then you wouldn't see that.
- HSHarry Stebbings
Can you just help me understand, why are they so much more expensive?
- DKDouwe Kiela
So they're just bigger models. An- and if you have to, uh, train a bigger model then you need more GPUs, and GPUs are expensive.
- HSHarry Stebbings
(laughs) You're like, "God, Harry, this is like teaching a two-year-old." (laughs)
- DKDouwe Kiela
(laughs)
- HSHarry Stebbings
Because I told you, I'm like the only VC that's like not afraid to ask the most dumb questions.
- DKDouwe Kiela
(laughs) That's a very good question.
- HSHarry Stebbings
Um, honestly, it's a really good skill to have, by the way. Um, but I- I... You know, so we mentioned there kind of the five years and the 10 years out. We saw Elon's, um, petition. Uh, Emad at Stability very much, um, was in favor of it on the show. Um, where do you sit in terms of Elon's petition, and how did you read it?
- DKDouwe Kiela
So this is the petition where he asks everybody to stop working on AI so that he can catch up, right?
- HSHarry Stebbings
Yeah, for six months.
- DKDouwe Kiela
Yeah. So, so I, I, I think, uh, a lot of the n- narrative in the media right now is really driven by self-interest from, from, uh, a- a bunch of folks in the field. So the whole kind of existential risk, uh, debate, I, I think it actually comes from a very good place, and a lot of people are worried about this. And, and, uh, I think there is a non-zero probability of AI extinction risk. Um, and, and, uh, so it's something we need to think about. And a- a lot of smart people are thinking about this, like Bengio, and Hinton, and all of these folks.
- HSHarry Stebbings
What ma- what makes you say there's a non-zero chance, Douwe?
- DKDouwe Kiela
Well, so, so there is just a non-zero but very, very, very small chance that there will be some sort of paper- paperclip maximizer scenario. Have you heard of this paperclip maximizer?
- HSHarry Stebbings
No.
- DKDouwe Kiela
Um-
- HSHarry Stebbings
Tell me more.
- DKDouwe Kiela
So, so if you give, uh, an- a very intelligent system an instruction like, "You need to make as many paperclips as possible," then it's going to turn everything into paperclips, um, and it's going to basically destroy the planet and turn everything into paperclips because that's its objective function. So in the process of maximizing paperclips, it will destroy everything else, uh, and turn the whole universe into paperclips.
- HSHarry Stebbings
(laughs)
- DKDouwe Kiela
Uh, so, so, eh, I mean, you, you can, you can see why I'm saying that these are very, very small probabilities, right? So probably the chance of, of me getting hit by lightning, like, right now is much higher than that happening. Um, so I, I think one of the issues I have with the whole debate around existential risk is that, uh, it's, it's really a tiny probability, but we're pretending like it, it's a massive issue. Um, and I think there are much bigger risks, right? Like nuclear war, and pandemics, and climate change. And those are things we should be focusing on much more. So the people who are pushing this narrative are really the people who are benefiting from this being the narrative. So these are the incumbent AI companies who are, who, who want to have either the market regulated, right? Because then they, they benefit because they can deal with the regulation, but small companies like mine can't. Um, or, uh, they are, uh, the folks who, um, uh, benefit from kind of fear-mongering in the broader public, where, where people start having a lot of respect for AI's capabilities and want to use it everywhere because AI is so smart that it might even kill us all. Um, so, so I, I think there's a lot of kind of dubious motives behind the scene there.
- HSHarry Stebbings
Can I ask you bluntly, are there one to two fear-mongerers who you think are the most successful?
- DKDouwe Kiela
Uh, are you asking me to name names?
- HSHarry Stebbings
Yeah.
- DKDouwe Kiela
Um, I, I... So I, I don't think I, I, I can single out, like, o- one individual. I think, I think... So there, there's a whole movement around this, and a lot of it is really motivated by, by just, like, smart people wanting to, to do something that matters. Um, but, uh, so my, my, my good friend, Kyunghyun Cho, uh, uh, did this nice interview the other day where he was worried about hero scientist narratives, uh, where, where a lot of, uh, people in AI just like to feel special and like we're sort of, like, saving the world by doing good research. Um, and, and so I kind of like (laughs) that narrative. Um, but it feels a l- a little bit, uh, naive.
- HSHarry Stebbings
I, I, I absolutely love that. Um, no, I totally get... You mentioned regulation there. I think a big question for me is, like, I don't think the chasm has ever been greater between private, like, company, uh-... knowledge, especially around kind of AI, well specifically around AI, and then also the regulator's knowledge which is significantly, uh, behind. How can effective regulation be set with such a large chasm between private sector knowledge and regulator knowledge?
- DKDouwe Kiela
Yeah, we have to, uh, invest a lot in educating regulators. Uh, and I, I think the AI community has been terrible at this. A- and, um, I think e- the broader populace just needs to understand much better what AI is and what it can do and what it can't do. Um, and- and I think, uh, it- it's been slightly self-interest driven, I think, uh, in that a lot of folks i- in AI have just wanted to keep the technology for themselves, and that's why they haven't really invested in- in educating, uh, the rest of society. Uh, so I- I think that that's really a- a huge issue. Uh, and so there's a bit of a side point there, but I think the people who tend to write the regulation, they generally don't really understand technology all that much anyway. Um, so if you look at like the Senate hearings with Zuck where these Senators-
- HSHarry Stebbings
Brilliant, brilliant.
- DKDouwe Kiela
... were asking him questions about the social network algorithm or whatever, and they didn't know what an algorithm was, right? So the- that- that's a big con-
- HSHarry Stebbings
(sighs) Didn't you love it? Like, "How do you make money?" "Adverts." Huh.
- 45:00 – 53:59
Can I ask you,…
- DKDouwe Kiela
where I was like, "Wow," like, "Why are they getting this much money?" There, there's a few of them where I thought, "Okay, they, they really have to live up to massive expectations now and they need to actually start making real revenue now." Um, because, uh, at s- at some point there's going to be a disillusionment with the technology and then funding might dry up, and then these places are really in trouble.
- HSHarry Stebbings
Can I ask you, when you look at the incumbent set, who do you think has the strongest strategy execution to date? Is it Facebook? Is it Google? Is it Microsoft? Is it Apple?
- DKDouwe Kiela
So I, I've been very impressed a- actually by, by how Microsoft has managed to turn everything around, um, by, by strategically collaborating with a, a better AI lab in the shape of OpenAI. Um, and, and they've just really turned that into this narrative where Microsoft is an AI leader again, uh, and they really weren't an AI leader, uh, even a few years ago.
- HSHarry Stebbings
On the flip side, who do you think's not done well and not adjusted to the new landscape?
- DKDouwe Kiela
So I, I'm still very curious to see what Apple will do. Um, I, I think they had this great vision of, of having, like, Siri on your phone and things like that. So that's like one of the first personal assistants. So if that could be a super powerful language model, then that could, uh, do very interesting things. But, uh, so far I haven't really seen a lot of interesting things coming out of Apple.
- HSHarry Stebbings
I wanna do a quick-fire round. I've peppered you with questions anyway, but this is like a more structured peppering with, uh, 60 seconds per one.
- DKDouwe Kiela
(laughs)
- HSHarry Stebbings
Uh, and so it's much more, you know, informed and the questions will actually be on schedule, unlike the rest of the, you know, last 45 minutes. Uh, does that sound okay?
- DKDouwe Kiela
Yeah, sounds great.
- HSHarry Stebbings
So what do others not know that you know to be true?
- DKDouwe Kiela
So I, I think others are underestimating how early it still is. So, uh, AI, uh, it feels like we've made so much progress that it's very hard to enter the market right now, and I, I think that that is just not true. It's still very early innings. We haven't settled on a, a lot of, uh, things that need to be solved before we can really have this technology be ready. Um, so it's still very early. And, uh, so me realizing that gives me a competitive advantage hopefully.
- HSHarry Stebbings
What do you advise founders who are building AI companies not in the Valley? Do they need to be in the Valley?
- DKDouwe Kiela
No, absolutely not. Um, I, I think, um, the, the Valley is kind of a, a, a dangerous bubble in a way, where, uh, there's this giant echo chamber, uh, happening. And, uh, I think if you look at a, a lot of great AI companies, they're not in the Valley and they don't have to be here. Maybe they should have an office here because there are great universities to recruit from a- and things like that. But, uh, I, I really don't see a reason why you would have to be here.
- HSHarry Stebbings
What would you most like to change about the AI community?
- DKDouwe Kiela
I think there's way too much hype, um, and, and I think it would be good if the community at least acknowledges that and, and tries to really pay attention to the things that matter, like having technology that actually works and not just jumping on the next hype train, and there's an AutoGPT thing that is going to change the world but doesn't actually work. And so there's, there's a lot of, um, debate right now that is just driven by Twitter, uh, and just kind of like sound bites and quotes li- like Emad, um, where I think it would be better to be a bit deeper and think a bit, uh, more carefully about what we're doing.
- HSHarry Stebbings
Some fantastic quotes though, weren't they? I mean, credit.
- DKDouwe Kiela
They were. (laughs)
- HSHarry Stebbings
(laughs) No, I'm, I'm with you totally. Uh, okay. Do you agree that some of the biggest businesses to be built in AI over the next years will be services businesses for large enterprises helping with AI implementation?
- DKDouwe Kiela
Yeah, totally. I, and I... So I don't know if those are gonna be new businesses or, or, uh, existing incumbents. So the hyperscalers are also trying to, to play that role. Um, but, uh, there's, there's just so much demand right now for AI in, in any kind of enterprise, um, and it's, it's still very hard to get it right. So a lot of these companies are just looking for help, and, and so yeah, there, there are just opportunities there.
- HSHarry Stebbings
You mentioned the philosophy degree earlier. How do AI and philosophy help each other in your day-to-day role?
- DKDouwe Kiela
Yeah, I, I, I'm very happy that I studied philosophy. So, philosophy is really about conceptualizing anything at kind of any arbitrary level of abstraction, um, and, and that ability, uh, you can use anywhere, right? So, um, for AI in particular, I, I think that, uh, philosophy and AI are, are an interesting combination because philosophy is about the stuff that you can't really do science about yet. So, at some point, uh, the things that people are philosophizing about now, they become scientific questions that just have answers or hypotheses, and then they're no longer philosophy, right? So, natural philosophy, that used to be a thing. We now call that physics and mathematics. And, and, uh, I think with AI, there are lots of questions that we still don't even really know how to ask yet, and philosophy is great for thinking about those kinds of questions. So, uh, yeah, philosophy of AI is a fascinating topic.
- HSHarry Stebbings
What's the strongest belief that you had which turned out to be wrong?
- DKDouwe Kiela
The strongest belief I had that turned out to be wrong, uh, is that I really underestimated how important scale is in artificial intelligence. Um, so, and, and I think this is really one of the things that OpenAI has excelled at, um, is that if you throw an order of magnitude more compute and data at AI, uh, systems, then they just become much, much better. And if you keep scaling that up, you have the scaling laws that we know about now. I, I really underestimated this, and, and for a long time, I was just saying like, "Oh, yeah, look at these, uh, silly, uh, OpenAI researchers. They're, they're just scaling things. They're not inventing new algorithms. That's not cool." Uh, and I was very, very wrong.
- HSHarry Stebbings
I, I'm gonna apologize in advance for this one, okay? I'm sorry. I'm asking it anyway. What do you think the timeline is for superintelligence?
- DKDouwe Kiela
(laughs) Um, so, so despite, like, Nick Bostrom's book, I, I still think that superintelligence is, is actually very ill-defined. Um, and, and, um, in, in many ways, we have already achieved superintelligence, right? So, in the '50s, we achieved mathematical superintelligence. So, computers in the '50s were already better at calculating stuff than humans. Um, so, so, uh, I, I don't think that that really is a, a, a well-formed question. Uh, if you're asking about AGI, right? So, uh, and, and I think AGI itself, a lot of people... And this, this is a mistake I made where I thought AGI kind of meant artificial consciousness or something like that, which also doesn't really have a meaning. But if you look at how OpenAI and Anthropic and these places define AGI, and it's as, um, um, systems achieving capabilities that allow them to, uh, effectively do the work of humans, uh, for the majority of economically valuable human tasks, then we're not that far away from it. Um, and so, uh, I think in the next, like, five to 10 years, that sort of economic displacement i- is, uh, likely to happen.
- HSHarry Stebbings
What's the most painful lesson that you've learned that you're also pleased to have gone through?
- DKDouwe Kiela
I think when I was young, I, I, uh, m- maybe put ambition before people sometimes, uh, and, and, uh, so I, I learned this the, the hard way, uh, I think, uh, where I, I just, uh, didn't have enough empathy for the people I worked with. Uh, and, and, uh, as I grew older and more mature, I, I realized more and more just that it's really all about people, uh, and, and working with fantastic people and, and doing cool things together and, and, uh, uh, changing the world together. So, uh, I'm very happy to have learned that lesson because that makes me much better at my job right now.
- HSHarry Stebbings
Final one. Ten years' time, if all the stars align, where's Contextual then?
- DKDouwe Kiela
So, if all the stars align, then, um, OpenAI and Anthropic and all of these places, they had this great first-generation technology, so they're kind of like the Lycos and AltaVista of search engines. And the technology we have is more like PageRank, um, and that would make us the, the Google, uh, of language models with the, the right technology at the right time and with the right execution. So, uh, that's what I would hope for.
- HSHarry Stebbings
Douwe, I've absolutely loved doing this. I can't thank you enough for putting up with my wayward, blunt, sometimes very naive questions. I can't thank you enough for letting me invest in you, and I really appreciate the time today, my friend.
Episode duration: 53:59
Transcript of episode 7uFZxHKLnto