No Priors Ep. 5 | With Hugging Face's Clem Delangue
EVERY SPOKEN WORD
65 min read · 13,334 words
- 0:00 – 1:53
Introduction
- Elad Gil
(instrumental music) Clem, welcome to the podcast.
- Clem Delangue
Thanks for having me.
- Elad Gil
Oh, thanks so much for joining us. So we were hoping to start with your background, which I think is really interesting. You grew up in France, where you ran an electronics shop on eBay, and were so prolific that you ended up earning an internship opportunity with eBay. How'd you go from that to image recognition and eventually to Hugging Face?
- Clem Delangue
Yeah, it's actually quite a funny story, because I was one of the biggest French sellers on eBay when I was working with them. I was kind of like the user-facing team member, and so they were sending me to all these trade shows in France, which were like the worst experiences ever, because at the time PayPal belonged to eBay, and so we had a shared booth. And so all the PayPal users would come to the booth and basically shout at me, because PayPal was keeping their money or blocking their accounts or things like that. It was basically like the worst days ever. But during one of these days, I bumped into a guy with big round glasses, looking very, very nerdy, and I remember pretty vividly, he told me, "Oh, you guys are eBay. You acquired, not so long ago, a barcode scanning company called RedLaser, to recognize objects and be able to show listings to people. But it sucks. You need to know that pretty soon with machine learn-" I mean, he wasn't calling it machine learning at the time, but "With these new algorithms, you won't even need the barcodes anymore. You'll just recognize the object itself." And at the time, I was like, "Who is this crazy guy?" I wasn't really paying too much attention. But at night, I did my research and realized that he was a pretty
- 1:53 – 3:34
how Clem first became interested in ML, being shouted at by eBay sellers, and the foretelling of the end of barcode scanning
- Clem Delangue
legit guy coming out of a legit engineering school in France, with his small startup, which had raised a little bit of money. And one thing after the other, I ended up leaving eBay to join this startup doing machine learning for computer vision, and that's how I made the move to machine learning. It was almost 15 years ago now, but I don't regret it at all. It's funny how a single small encounter like that can completely change your trajectory.
- Elad Gil
That's really cool, yeah. I think some other time I'll tell you a bit about a meeting I had at a trade show that completely took my life in a different direction too, so I think it's kind of odd how sometimes those things happen early in people's careers. Could you tell us a bit more about the early iteration of Hugging Face, how you decided to start it, the early days as a talking emoji, and sort of where it went from there?
- Clem Delangue
Yeah, absolutely. With my co-founders, Julien and Thomas, we kind of like always shared this passion, this excitement for AI and for machine learning, and when we started Hugging Face, we were like, "Okay, what can we work on that is both scientifically challenging, but also fun?" We didn't want to do something boring, so we were like, "Okay, we're gonna build some sort of an AI Tamagotchi." We were heavy users of, you know, Alexa and Siri, and we were like, "Why is it so boring? You know-"
- Elad Gil
(laughs)
- Clem Delangue
"... why is it only talking about productivity stuff? Why is it, you know, just telling you the weather?" And so we started to build that, kind of like some sort of an AI friend, a Tamagotchi AI, basically what you see in a lot of sci-fi movies. Probably
- 3:34 – 5:36
early iterations of Hugging Face, trying to make a less boring AI tamagotchi, and switching directions towards open source tools
- Clem Delangue
a lot of what people are using ChatGPT for today, actually. And we did that for almost three years, got some level of traction, billions of messages exchanged between users and the chatbots. So that's how Hugging Face started.
- Elad Gil
And at what point did you decide to shift it towards an open source community and model repository? How did that come about?
- Clem Delangue
It was three years in, after our seed round. We've always been kind of like big open source people, so we've always open sourced part of what we were doing. And then at some point, especially when transformer models started to work, when we started to see BERT getting some traction, we just saw the number of people using our open source blow up and start to skyrocket, right? We went from a couple of people looking at it to hundreds of companies using it, and so it was a very progressive move, right? It went from mostly Thomas on the team working on it to a couple of team members, and very soon we realized everyone was more excited about it, and so everyone was working on it. We raised our Series A based on this early traction, and that was really the signal that we needed to put most of the efforts of the company on this new direction.
- Elad Gil
That's cool. So this was open source that you'd already developed and put out into the wild, and then you saw people starting to use it, and then you said, "Wow, there's so much attention here. We should go and do that instead."
- Clem Delangue
Exactly, yeah.
- Elad Gil
Yeah, it's always interesting to see these shifts in direction. I feel like that's every Stewart Butterfield company, right? That was Slack, and that was Flickr before that. They just built something and then it kind of took off, sort of separately. Do you have any advice for founders who are considering changing direction or thinking of new directions for their company? How do you keep your eye out for the things that are really interesting or working that may or may not be the core thing of
- 5:36 – 7:39
advice for founders considering a change in direction, 30%+ experimentation
- Elad Gil
what you're doing?
- Clem Delangue
Well, I think the best way to do it is to find the right ratio in your company between exploitation and exploration, and I think that's something a lot of startups don't get right, not only before product-market fit, but also after product-market fit. I feel like sometimes companies, before product-market fit, are experimenting too much, changing directions every week, and I don't think you learn a lot from that. And then after product-market fit, they kind of stop experimenting, stop trying new things, stop trying to stay away from the local optimum, in a way, and looking more for the global optimum. So what we've always done, and I think we'll always do, with Hugging Face, is to make sure to spend at least 30 or 40% of the company's efforts on exploring new things, and finding the long-term bets that are going to make you successful. And then give these experiments and initiatives a chance, right? For us, we were lucky that Thomas, one of our co-founders, was leading this kind of experiment, so it made it easier. But we have examples of other initiatives that started as experiments from team members, which made it and graduated to a very big bet for the company. One example of that is Spaces, which are our machine learning demos. They've been insanely successful. We just crossed 50,000 machine learning demos in the past year, year and a half. And it started just from one team member experimenting with it and being like, "Oh, I think I can build something cool there." And one step after the other, it led to where it is today.
- Elad Gil
That's really cool. Yeah. It seems like in general,
- 7:39 – 10:47
1st users, ML Twitter, approach to community
- Elad Gil
companies that iterate or launch new things early keep launching things later in the life of the company, and companies that never innovate early don't ever innovate again. It's kind of like the difference between eBay and Stripe, or, you know, you can name different companies. So it's awesome that you folks are investing really early in that innovation. When you first started getting traction with what Hugging Face does now, did that happen organically, and it just started growing and taking off? Or did you reach out to specific communities? How did you first get those first users onto your open source platform?
- Clem Delangue
The distribution really started on Twitter at the beginning. We just started to tweet about some of the things that we were doing in open source, and people retweeted them. The machine learning community on Twitter was already pretty strong. And then it kind of snowballed, I think. Classic network effects, where researchers started to share their models, and obviously they were getting visibility for their models, so people in the industry, in companies using these machine learning models, were hearing about Hugging Face through that, and then asking more researchers to add their models to Hugging Face. So a typical marketplace network effect. And then something that we did that I think worked really well for us is that we never hired any kind of community manager or any communication or PR team members, because we wanted it to be part of every single team member's work. Even for the most technical, specialized scientists, we've always told them, "Okay, it's part of your job to interact with the community, to share with the community, to get visibility for what you're working on." And so we ended up with this organization where talking to the community and getting visibility is part of everyone's job, instead of outsourced to a team. And I think that's one of those things that people in the community also appreciated about us, because they could really talk directly to the builders, the people doing the things, and it created more meaningful interactions, I would say.
- Sarah Guo
Yeah, it's clearly really authentic to Hugging Face's culture, the sort of commitment to community and open source. And I remember hearing that... I don't know if it's everyone in the company, but many people in the company run the public Twitter.
- Clem Delangue
Yeah, everyone in the team has access to the Twitter account and tweets from it.
- Sarah Guo
Yes. I think most organizations are not capable of that sort of risk-taking, so that's really cool. What else do you think you guys have done right on the community growth aspect? Because I think now everyone knows that's such a powerful driver of business for an increasing number of technology companies, but it's pretty hard to actually
- 10:47 – 12:54
enterprise ML maturity, days to production
- Sarah Guo
execute against.
- Clem Delangue
That's a good question. I think timing. Obviously we've been really, really lucky with timing. And, you know, trying to listen, which sounds a bit cliché, right? But actually listening to the community and implementing what the community is asking for. And then just building your culture around it, to have people who are excited about contributing to the community, even independently of everything else. You know, I think sometimes you have companies that are doing community or open source work, but almost as a means to other things; it sometimes feels like they have to do it to get other things that they're more excited about. For us, it's been useful to try to hire people who are genuinely excited about this work, people who could almost work for free for the community on open source and be happy about it. And so that creates the right culture for this kind of work, I feel like.
- Sarah Guo
I feel like one of the roles I see Hugging Face play is as this conduit for this amazing pace of research, in terms of ingest into industry. And it's interesting to hear you say that you released Transformers as an open source project and had a bunch of companies using it. It feels... and tell me if this is not your understanding, but it feels to me like there's a huge distance between where your average enterprise is in their machine learning journey and all the amazing cutting-edge research being shared on Hugging Face. How do you reconcile that, and how does that gap close?
- Clem Delangue
I mean, I think first, compared to traditional science, this gap in machine learning is extremely tiny. My co-founder
- 12:54 – 15:56
open source vs. proprietary models
- Clem Delangue
Thomas, who did his PhD and some research in quantum physics before, could tell you about it way better than me. But in traditional science, it really used to be the norm that you would have some research and some research paper, and it wouldn't make its way into production for 10 or 20 years. And what we're seeing in machine learning is that it's actually making its way into production after a year, a few months, a few weeks, sometimes a few days now. In my opinion, this is amazing, and that's what's driving most of the speed of progress in machine learning. That's actually why I'm excited to keep investing so much in open source, and sometimes a bit worried about more proprietary models coming up: I think if you remove open source from that equation, if there hadn't been as much open source as there's been in the past five years, we would be decades away from where we are now. And I really hope in the future that we'll keep this very fast virtuous cycle, this iteration loop from science to production and production to science, because to me it's the main driver of the speed of progress in machine learning.
- Sarah Guo
So I think a lot of people in the ML community share this general vein of concern: of course we have this wealth of open source models, but large transformer-based models tend to get better when they get bigger, and they can be prohibitively expensive to train. So there is a concern that the state of the art, which unlocks a bunch of use cases, will be in proprietary labs like DeepMind or OpenAI or Anthropic, what have you. How do you think about the performance of what's in the open source versus the state of the art?
- Clem Delangue
So I think first, sometimes we tend to say, okay, open source wins or proprietary wins. The truth is that there are always going to be both, right? If you look at most technologies: if you look at search, you have the Elasticsearches and the Algolias; if you look at databases, you have the MongoDBs and the proprietary approaches. So I'm not too worried about one winning against the other. I think there are always going to be both, and I think the way it works is very similar to how science has always worked, in the sense that in some specific area, sometimes you're gonna have proprietary approaches that have taken some advances and have gone faster for X or Y reason. For example, that's the case right now maybe in text generation, right? With ChatGPT giving
- 15:56 – 19:12
main model tasks, architectures and sizes
- Clem Delangue
better results than open source approaches. And in other domains, like text classification, information extraction, or image generation with Stable Diffusion and the like, open source is ahead of proprietary, and probably it's gonna flip the other way around in a few weeks. And that's the case for all the tasks. So it's kind of like a race with dozens of racers, and sometimes one is ahead of another, but in the end there are always going to be both approaches. I don't really believe in this scenario of one model, one company to rule them all. And one kind of data point that I see is that on Hugging Face, we just crossed 250,000 models, right? A quarter of a million models, uploaded by almost 15,000 companies now. And I don't believe they're building models just to build models, right? If there was one model that was better than the others for everything, they wouldn't. So I think we're always gonna be in a world where there are multiple models, multiple companies. Especially because, when you look at why companies are using so many models on Hugging Face, you usually realize that a more specialized model is more efficient. It's cheaper to run, it's usually faster to run, and most of the time it's actually more accurate for a specific use case. So that's what we're seeing now, and that's also, to be honest, what we are hoping to see in the future, because we build or we fund startups to see the future that we wanna see, right?
And personally, I'm more excited about a future where machine learning is available to everyone and everyone can build machine learning, versus a world where it's very concentrated and monopolistic.
- Sarah Guo
So I have to ask, because you have this amazing viewpoint into what's happening in the community. Of those 250,000 models, can you characterize the distribution? What percentage is image versus language versus other modalities? And then, from an architectural perspective, diffusion, transformers, other interesting approaches?
- Clem Delangue
Yeah. The three main tasks right now are, first, NLP, so text, right? Information extraction, text generation, text classification. The second one is computer vision and text-to-image, right? So object detection, text-to-image, image generation. The third one is audio: speech-to-text, text-to-speech, and information extraction, but from audio rather than text. And then we're starting to see more and more models on time
- 19:12 – 24:16
decentralized infrastructure, data opt out
- Clem Delangue
series, right? So for example, the ETA you get when you order an Uber comes from a transformer-based time series model. Or financial models for fraud, this kind of use case. And then in biology and chemistry, we're also starting to see more and more models, more and more datasets, and more and more demos. So those would be the main buckets. And then what's interesting is that you have all sizes of models, from a few million parameters up to 180 billion parameters, right? The biggest open source models out there.
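As a rough illustration of what that size range means in practice, a model's raw weight memory can be estimated as parameter count times bytes per parameter. This is a standard back-of-the-envelope heuristic, not a statement about any specific model on the Hub, and it ignores activation and optimizer memory:

```python
def model_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Estimate raw weight memory: parameters x bytes per parameter.

    bytes_per_param is 4 for fp32, 2 for fp16/bf16, 1 for int8.
    """
    return n_params * bytes_per_param / 1e9

# The two ends of the range mentioned above, assuming fp16 weights:
small = model_memory_gb(100e6)   # 100 million parameters -> 0.2 GB
large = model_memory_gb(180e9)   # 180 billion parameters -> 360 GB

print(f"100M params (fp16): {small:.1f} GB")
print(f"180B params (fp16): {large:.1f} GB")
```

The three-orders-of-magnitude spread in memory footprint is one reason so many differently sized models coexist: the small end runs on a phone, while the large end needs multiple accelerators just to hold the weights.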
- Sarah Guo
And do all the sizes get used?
- Clem Delangue
Yeah, it's pretty distributed. That's something we always look at to inform our thinking. It's very distributed because it depends on your use case, what you wanna use. For example, we have Bloomberg as users, and as customers, and in the Bloomberg terminal, the more real-time, the better for them, right? And so because they want to be real-time and have as little latency as possible, they wanna use a smaller model, which is automatically going to be faster than a bigger model. Whereas some companies that wanna build something very general, able to apply to a lot of different use cases from customer support to the meaning of life, and that don't care so much about latency or cost, can go for a bigger model that makes more sense for them.
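The latency intuition here can be sketched with a common rule of thumb: a transformer's forward pass costs roughly 2·N floating-point operations per generated token, where N is the parameter count, so per-token latency scales roughly linearly with model size. The throughput figure below is a hypothetical assumption for illustration (real inference is often memory-bound, so treat this only as a lower bound):

```python
def est_ms_per_token(n_params: float, flops_per_sec: float = 100e12) -> float:
    """Rough per-token inference latency estimate in milliseconds.

    Uses the ~2*N FLOPs-per-token heuristic for a transformer forward
    pass; flops_per_sec is an assumed sustained accelerator throughput
    (here a hypothetical 100 TFLOP/s), not a measured number.
    """
    return 2 * n_params / flops_per_sec * 1000

# A small specialized model vs. a very large general one,
# on the same hypothetical accelerator:
for n in (100e6, 6e9, 180e9):
    print(f"{n / 1e9:>6.1f}B params -> ~{est_ms_per_token(n):.3f} ms/token")
```

Under this sketch, the 180B model costs about 1,800 times more compute per token than the 100M one, which is exactly the trade-off a latency-sensitive user like a terminal application is weighing against accuracy.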
- Elad Gil
Are there specific areas or trends you're most excited about, from either a research perspective or a model implementation perspective?
- Clem Delangue
I mean, I'm really excited these days, and it's a bit of an unsexy thing to say, by the infrastructure side of things, because I think so far, as a whole, the machine learning domain and ecosystem hasn't thought too much about what it costs to run some of these models, how fast they can go, how slow they can be. And I hope that this year there's gonna be more clarity around that, to make sure that, as a community, we build something healthy and sustainable. You know, I feel like sometimes in the field there's something that I call cloud money laundering, where you almost kind-
- Elad Gil
Yeah.
- Clem Delangue
... of like disconnect the infrastructure cost from the actual use cases. I think as the field matures you're gonna see much better alignment between the two, and I'm actually excited about that, because I think it's gonna be a big enabler for the field in the long run.
- Elad Gil
Yeah, that makes a lot of sense. I guess from an infrastructure or tooling perspective, is there anything that Hugging Face isn't directly working on, so it's not gonna be competitive with you all, that you really wish existed or that people were working on more actively?
- Clem Delangue
Something we've worked a little bit on, but haven't really managed to make work, and that I'd be excited to see more teams working on, is creating more decentralization on the infrastructure side. Because right now it's very centralized, both in terms of players, but also in terms of timing, for example. Most of the time, the way you build models is that you train them once, and then maybe you're gonna train again six months or a year later, which sounds kind of archaic in a way, and that creates a lot of challenges, like not being able to be current, right? A lot of these models don't know who the current president of the United States is, stuff like that. So more decentralization, more online-learning ways of going from one big training run to smaller, more regular training runs, I'm really excited about that. And the second thing I'm really excited about is creating more consent for the people in the dataset. For example, we've been working with a project called BigCode, which has released open source code generation models, on the ability for developers to opt out of the dataset that the model is trained on. In a similar way, we're starting to see more and more opt-in datasets on Hugging Face, meaning datasets that contain only data that the creators
- 24:16 – 28:09
Hugging Face’s business model, GitHub
- Clem Delangue
of the data have consented to having a model trained on. It's very interesting, for example, for text-to-image models, where there's a lot of debate right now about the underlying work of artists being used in training. So I'm excited to see more and more work around that too this year.
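The opt-out mechanism described here can be sketched in a few lines: filter the training corpus against a set of source identifiers whose owners have asked to be excluded. The record layout and the "repo" field below are hypothetical illustrations, not BigCode's actual data format:

```python
# Minimal sketch of an opt-out filter over a training corpus.
# The record shape ("repo"/"text" fields) and the opt-out set are
# hypothetical; a real pipeline like BigCode's matches on repository
# ownership, using requests collected from developers.

def apply_opt_out(corpus, opted_out):
    """Keep only records whose source is NOT in the opt-out set."""
    opted_out = set(opted_out)
    return [rec for rec in corpus if rec["repo"] not in opted_out]

corpus = [
    {"repo": "alice/utils", "text": "def add(a, b): return a + b"},
    {"repo": "bob/app",     "text": "print('hello')"},
    {"repo": "carol/lib",   "text": "x = 42"},
]

# bob has asked for his code to be excluded from training.
kept = apply_opt_out(corpus, opted_out={"bob/app"})
print([rec["repo"] for rec in kept])
```

An opt-in dataset is the same idea inverted: instead of removing flagged sources, you keep only sources whose creators explicitly consented.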
- Elad Gil
Will you explain what BLOOM is, and more broadly, how you decide where Hugging Face should be a first-party participant in model training or research?
- Clem Delangue
Yeah. BLOOM is the result of an initiative called BigScience, which also led to BigCode, which I just mentioned. BigScience was the largest collaboration in machine learning to date, with a thousand researchers from 200 organizations coming together to build and train a large language model completely in the open, right? So everything was publicly available. You can see all the runs they did, all the brainstorming they did to get to the decisions they made. It's been really exciting to see that build almost organically, with our support and the support of other organizations, like GENCI, for example, the French supercomputing agency that provided the compute for this. And it informed a lot of our thinking around ethics and openness, because one of the reasons we're so focused on open source and open science at Hugging Face is that we believe the two main challenges with AI today are, one, the concentration of power, and second, the biases encoded in these models. And for both, we've learned that building in the open, with open source, is actually more part of the solution than part of the problem, because obviously you democratize power much more, and for biases you actually include in the process the people who are impacted by them, especially underrepresented populations, which is otherwise really hard to do if you only do the work behind closed doors in a lab in Silicon Valley with mostly old white dudes. And so that informed a lot of our thinking around open source and open science.
- Elad Gil
There's a lot of excitement in the research community and amongst entrepreneurs about reinforcement learning with human feedback. Does that impact your strategy at Hugging Face at all, if that's the next step beyond this pre-training?
- Clem Delangue
Yeah, it's a very interesting additional step on the classic machine learning pipelines that we've been invested in for quite a while. I think the first reinforcement learning with human feedback models were added to the Hub around eight months ago, so way before it was as popular as it is now. We're leading the development of an open source library that is helping companies integrate that into their models and their workflows. It's a really exciting new development. We've been around the block a little bit, so every few months there's always kind of a new thing. That's one of the challenges of building a machine learning startup these days: you have to have the flexibility to constantly evolve. It was the same when
- 28:09 – 37:25
What Clem is excited about in AI
- Clem Delangue
diffusers started to pick up, right? And you started to see this new generation of models. So each time we keep adapting, trying to empower the community to be able to take advantage of these new advances in machine learning.
- Elad Gil
That's cool. Can you tell us a bit more about Hugging Face's business model today and how that's gonna evolve over the coming years?
- Clem Delangue
Yeah. So I'm really excited about business models for machine learning startups this year. Speaking of how the field can mature, I think it's gonna be the big thing this year. And for us, as a platform, our vision is that the model is probably going to be, at a high level, fairly similar to a freemium model, as you would expect, right? So right now we have 15,000 companies using the platform, and we have 3,000 companies paying us. So you see the majority being free usage, and some of them paying. Now the big question is, where is the delimitation between the two? Obviously we're starting to see a lot of interest in security features and compliance features, especially from bigger companies, like Bloomberg or Meta, both being customers. So that's one way of delimiting it. We're also seeing a lot of interest in our features around infrastructure. So for example, you can upgrade your Spaces to GPUs, or you can use our Inference Endpoints. There's an interesting thing there too, around compute and infrastructure. It's a little bit more early, but we are seeing a lot of interest from companies there, around helping them optimize. We talked a little bit about the infrastructure cost of machine learning, so we're seeing a lot of interest from companies in helping them optimize that for their use cases.
- Elad Gil
Yeah, that's really cool. It sounds like, to your point, there's an ever-evolving field here, and, you know, it's kind of interesting because GitHub is obviously an imperfect analogy, but an interesting one, because there are so many things that they could have done, some of which they're doing now, some of which they've sort of foregone. It's everything from the GitLab opportunity, in terms of providing an on-prem enterprise approach; supply chain monitoring in open source software, so things like what Snyk or Socket or others are doing; profiles, developer transitions, and tooling. There's so much around this sort of product, in terms of all the things that you can add, both as something that's very valuable to your users, but also as potential lines of business. How do you think about some of those missed opportunities for GitHub, and how they may or may not apply to you?
- CDClem Delangue
So first, I think GitHub tends to be a little bit, uh-
- EGElad Gil
It's an imperfect analogy, but yeah.
- CDClem Delangue
Yeah, yeah.
- EGElad Gil
Yeah, yeah.
- CDClem Delangue
Yeah, yeah, but, uh-
- EGElad Gil
Yeah, yeah.
- CDClem Delangue
... I think what I want to say is that GitHub is an amazing product and an amazing company. I think they just announced that they passed 100 million software engineers using them, and they crossed a billion dollars in annual revenue. So when you talk about money and GitHub, for me, if there was a mistake, it's probably to have sold it too early to Microsoft.
- EGElad Gil
Yeah. No, they did an amazing job, so my question wasn't at all meant to say anything bad about GitHub.
- CDClem Delangue
Yeah.
- EGElad Gil
It just felt like there was... it's one of those things where there's so much you can do-
- CDClem Delangue
Yeah, yeah.
- EGElad Gil
... and it's very hard to choose from it. So I agree with you: amazing company, amazing what they've accomplished. I was more just... there are all these other things that could be done, and I'm just sort of curious how you think about that.
- CDClem Delangue
Yeah. I mean, one big thing that we're thinking about, and that's related to what I said just before, is that maybe GitHub could have gone a little bit earlier into the compute game and the infrastructure game. Because what you see with these platforms, and I think it's the same for us, is that when you get so much usage, so many network effects, you're the starting point for a lot of these projects. The same way companies are probably starting on GitHub to look at the open source projects before starting a project, for machine learning projects we see companies starting with Hugging Face, trying to find a model, find a dataset, find a demo on Hugging Face, and then making their infrastructure decisions based on that. So when you're so early in projects, you can become some sort of a gate for compute, in my opinion, or a gate for infrastructure. And I think that's something that GitHub started to work on pretty late in their journey. So for us, we're trying and testing that a bit earlier. We already have some infrastructure products, and we have amazing collaborations with three big cloud providers. So if I had to point to one, it could be this one: testing the ability to become a gate for compute and monetize with infrastructure earlier than they did.
- SGSarah Guo
Maybe just to zoom out before we run out of time here. What are you most excited about in the next year of AI, or, expanding out, in the next five years?
- CDClem Delangue
I think we talked a little bit about it in the past: I'm really excited about biology and chemistry for machine learning. Because the way I see machine learning is really as this new paradigm to build all tech, right? It's kind of like this analogy from Andrej Karpathy, where software 1.0 was the first paradigm, and now we're in software 2.0, which is machine-learning-powered technology building. And so if you look at the big sectors and the big impactful topics that it could change, obviously biology and chemistry are up there. And we're seeing the numbers of models and datasets and demos on Hugging Face increasing. A few days ago there was a release of BioGPT by Microsoft. Meta has been doing a lot of work on protein generation and prediction. So I think there's going to be really, really cool stuff coming up on these two topics.
- SGSarah Guo
Uh, are there particular application areas, like within biology and chemistry, that you think are gonna emerge?
- CDClem Delangue
No, I've stopped, uh, trying to predict, uh (laughs) -
- SGSarah Guo
Smart. Yeah.
- CDClem Delangue
... it's proven to be too difficult in machine learning. I've made too many mistakes in the past, so now I'm not taking the risk anymore. But I'm particularly excited about what we call full-stack machine learning companies. Really, companies like what we've seen more in, you know, these domains: Runway, Grammarly, Wombo, Photoroom, Stability. These companies that are not just using machine learning, but really building machine learning. Because I think, the same way as for software 1.0 there were companies more like using a Squarespace or something to build a website, and then there were companies really building technology, I think we're going to see the same thing for machine learning. And when you look at the capabilities of some of these companies, when they're translated into product building... I mean, you've all seen the videos from Runway. I think that's amazing, and I think they're going to be able to really challenge the incumbents, thanks to this ability to build machine learning as machine-learning-native companies.
- SGSarah Guo
Yeah. And I'm sure Elad sees more of these than I do, but we're going to have Daphne from insitro come onto the podcast later in the season.
- CDClem Delangue
Yes.
- SGSarah Guo
And I definitely think there's a generation of companies trying to leverage these capabilities that are trying to build drug IP, right? Or new platforms for diagnostics or vaccines or treatments, that are really interesting. Clem, that's all we have time for today. Thank you so much for joining us on the podcast.
- CDClem Delangue
Thanks for having me.
Episode duration: 37:25