YC Root Access

Lessons from Building Open Source Libraries

During last month’s NeurIPS 2025 conference, YC’s Diana Hu sat down with Thomas Wolf, co-founder and CSO of Hugging Face, to discuss his unconventional journey from physics and law to building one of the most influential open-source AI platforms. They discussed why open research accelerates innovation, the real challenges of turning AI demos into products, and how great open models and the application layer unlock the biggest opportunities for founders.

Apply to Y Combinator: https://www.ycombinator.com/apply
Work at a startup: https://www.ycombinator.com/jobs

Chapters:
00:00 — From Physicist to Hugging Face Founder
01:50 — Switching Careers
02:45 — How Hugging Face Was Born (Almost by Accident)
04:50 — The Limits of Closed Models
05:45 — Why Demos Often Don’t Become Real Products
07:05 — Fine-Tuning vs. Scaffolding: Startup Tradeoffs
08:40 — Turning Research into Widely Used Products
09:50 — Designing Great Developer Experiences
11:55 — The Future: Open Models and the App Layer

Diana Hu (host) · Thomas Wolf (guest)
Jan 16, 2026 · 14m · Watch on YouTube ↗

EVERY SPOKEN WORD

  1. 0:00–1:50

    From Physicist to Hugging Face Founder

    1. DH

[upbeat music] I'm excited to welcome today Thomas Wolf, co-founder and chief science officer of Hugging Face. We're here in beautiful San Diego for the NeurIPS 2025 conference. So let's get started. Thomas, you had a very unusual career path before you founded one of the best open source AI companies. You originally studied physics, then even did law, [laughs] and then started Hugging Face. Tell us how that journey eventually shaped Hugging Face.

    2. TW

Yeah, even within physics there were several chapters. I was at Berkeley, working on laser-fusion interaction; I started out in the team that did the fusion experiments at Livermore, and then I worked on superconducting materials. So even the physics part had multiple lives. One thing that always mattered to me was working with people I wanted to work with, even more than the specific topic, so that was one of the defining ideas. And I just think life is too short to do only one thing, right?

    3. DH

      Mm-hmm.

    4. TW

So once you've done one thing for six years (I did a PhD, then a post-doc), I was like, "I want to do something else." I really love writing, and I was very interested in law, so why not switch? So I switched to become a lawyer. It's a very different kind of work: your time has a price. Your time costs three hundred dollars an hour, and you start counting the hours. That's the total opposite of a PhD, where you can spend hours on nothing, or fall into rabbit holes for two days. So, very different. I think each of

  2. 1:50–2:45

    Switching Careers

    1. TW

these taught me something in retrospect. Science is the idea that you can go really deep into one topic and explore everything you need to build. Law, for me, was the idea that my time is valuable and I should allocate it well. And then Hugging Face was kind of an accident. I fell into entrepreneurship because I basically needed a job in the US at that time. This company was being created (it was a game company), I joined, and I started a deep-learning exploration that became an open source library that went viral. Then we decided to pivot the company around this open source library, and we kind of found our mission along the way: the idea that with community, open source, and open science, a mutual win-win is possible in the AI world instead of just racing. And the idea that distributing power is actually extremely powerful, catalyzing ecosystems

  3. 2:45–4:50

    How Hugging Face Was Born (Almost by Accident)

    1. TW

instead of just cannibalizing them, which is what we try to do. We try to be a platform on top of which people can build billion-dollar companies. I think it's very, very exciting.

    2. DH

That's very cool. Following on from this: you've been one of the first proponents of truly open research, as opposed to the big labs, which keep it all private.

    3. TW

      Mm-hmm.

    4. DH

What are some of the things the open source community does a lot better to push AI forward? What are the strengths, and what are some of the limitations?

    5. TW

Yeah. I think open source is probably the best thing computer science has brought to humanity. It was really created by a couple of deep, hardcore computer science people, against a lot of other possibilities. I'd even like to see it applied in many more fields of research. There are many advantages. There's obviously the collaboration aspect: if something is open source, in particular code, you can tweak it. The way AI research progresses is still often "take this model and try to change the positional embeddings," so you need something to start from. If every time you had to reinvent the whole model because everything is closed, progress would just be much slower. So open source catalyzes and accelerates progress. But more than that, it lets you explore many areas. Just before the interview, we were talking about interactive world models and how gaming can be reinvented. The only way to really do that, or one of the ways at least, is to take a very good pre-trained model, for instance an image model, and just add an interactivity component. That's a very quick way to explore alternative uses that maybe the person who trained the model never even thought about, right? So there's this opening of possibilities. Like, the model is

  4. 4:50–5:45

    The Limits of Closed Models

    1. TW

created, and it's a very strong base. People have distilled into it something like 100 million GPU-hours and a lot of data, and you can use that as a very powerful starting point to explore new use cases. If you don't have access to the model, you're much more limited, right? With ChatGPT, you can basically only use it for what OpenAI intended. If I want to use it outside the training domain, say on a very specific DSL they didn't train on, it basically doesn't work. So the only option is to have access to the model and tweak it. I think that's really vital for exploration and creativity, including entrepreneurial creativity. Yeah.

    2. DH

So, following on from this for the audience watching here, a bunch of founders and builders: you've seen many people build on top of all these

  5. 5:45–7:05

    Why Demos Often Don’t Become Real Products

    1. DH

      open source models on Hugging Face. It can be easy to get a cool demo quickly.

    2. TW

      Mm-hmm.

    3. DH

But it's actually hard to build something that passes the bar for a good product.

    4. TW

      Yeah.

    5. DH

So what should the audience watch out for: things that look good in a demo but don't necessarily translate into real products for users?

    6. TW

To be honest, I think it's the same with closed models. Most companies, when they build their product, won't fall squarely into the happy path where ChatGPT just works. So they start building scaffolding all the time: "We need to pre-process. We need to be careful about this edge case, because it doesn't work yet." Most of the companies I've seen building, whether on closed source or on top of open source, have to know their domain really well anyway and do the productionization of the model themselves, and that involves scaffolding. Hopefully, as models improve, they become reliable over a wider range, and in some cases you can of course collect your data and hope the next generation of models will be better, because you then have more data to fine-tune with. But I think we're still at a stage where it's

  6. 7:05–8:40

    Fine-Tuning vs. Scaffolding: Startup Tradeoffs

    1. TW

quite rare that you can take a model out of the box and expect it to just work in a production environment. The way you do it differs between closed source and open source, though. With closed source, you mostly work with this kind of scaffolding. With open source, you also have the option to fine-tune, or maybe to train. The associated drawback is that fine-tuning is still a non-trivial operation. As a young startup where you're just three people, you have to decide whether you want to allocate some of your time to fine-tuning the model or not. I'd say that if it's a core thing, like this interactive world model where the feature doesn't even exist otherwise, then fine-tuning is core. But in many cases you probably also want to scaffold. And more and more, by the way, you can find frameworks that help with fine-tuning, like Tinker; a couple of new companies are emerging that try to help you do that. So it's probably something you won't have to do fully in-house. But in any case, there is no super shortcut from demo to production. It's still a painful process. But that's part of the moat: the knowledge, what you're building, is also the pain. It also means that once you make it work, you've actually solved something non-trivial, and that's where your company's value lies.
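The "scaffolding" pattern Wolf describes can be sketched in a few lines. This is a hedged illustration, not anything from Hugging Face: `fake_model` and `scaffolded_call` are hypothetical names standing in for any model endpoint and the pre-processing, edge-case guards, and fallback a team wraps around it.

```python
def fake_model(prompt: str) -> str:
    """Stand-in for a real model call (e.g. a hosted LLM endpoint)."""
    return prompt.upper()  # placeholder behavior only

def scaffolded_call(raw_input: str, max_len: int = 512) -> str:
    # 1. Pre-process: normalize whitespace before hitting the model.
    prompt = " ".join(raw_input.split())
    # 2. Edge-case guards the base model won't handle for you.
    if not prompt:
        return ""                  # empty input: short-circuit
    if len(prompt) > max_len:
        prompt = prompt[:max_len]  # truncate instead of erroring
    # 3. Call the model, degrading gracefully on failure.
    try:
        return fake_model(prompt)
    except Exception:
        return "[unavailable]"     # fallback for production reliability
```

The point of the sketch is that the wrapper, not the model call, encodes the domain knowledge: which inputs are malformed, what the length budget is, and what a safe failure looks like.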

    2. DH

      So talking about pain-

    3. TW

      Yeah. [laughs]

    4. DH

... you've been very good at taking these cutting-edge research ideas at Hugging Face and turning them into actually widely used tools and APIs, especially in your role as the

  7. 8:40–9:50

    Turning Research into Widely Used Products

    1. DH

chief science officer. What is the hardest part about turning research into real products? I mean, we talked about pain.

    2. TW

      Mm.

    3. DH

What are some of the nuts and bolts of how you get the best out of your teams and ship these products?

    4. TW

In practice my title is chief science officer, but in terms of open source, when I created Transformers and Datasets, I really saw myself more as a chief product officer, because these are kind of products.

    5. DH

      Mm.

    6. TW

And your clients are basically the open source community. They're very difficult to please as clients, because everything is free, so they actually want the best, and it's very easy for them to move from one framework to another if they have to. So you really have to think a lot. For me, there are probably two key things I've always worked on, and they're both equally important. The first is the onboarding experience: your first experience with a library, the number of abstractions you have to learn to master it, the clarity, the distance from "I download this library" to "I run the

  8. 9:50–11:55

    Designing Great Developer Experiences

    1. TW

first non-trivial example of what I want to do," which is the moment where you go, "Wow, this is something I was not able to do before I had this library." That should be an extremely pleasant experience. It's a bit the equivalent of unboxing a product. So you want people, when they come, when they start playing-

    2. DH

Like the Apple experience.

    3. TW

... it should be extremely obvious. The abstractions you have should be almost tangible; they should look obvious. You should have very few abstractions, because each new abstraction you force your users to learn is a friction point. So this is very important. And the only way to get it right, each time you add a feature or prepare a release, is to put yourself in the shoes of someone trying your library for the first time, someone who has no idea what your abstractions are and doesn't want to read the docs, because no user of any software wants to read the documentation. Ideally you shouldn't even have to write the documentation; everything should look obvious. And it gets harder and harder as you know your library better, because you lose that fresh mind. So keeping that fresh mind matters. Every time I prepared a release, I would keep reiterating on this first step: I install, I try the first thing. Does it look obvious? I install, I try the first thing. Does it look obvious? So that's the first one. The second is where you set your level of abstraction. There's always a cursor between how easy it is to use, how much control you give, and how complex it is, and that cursor is extremely hard to pinpoint well. I feel the only way is to try it a lot yourself, on many use cases, and see whether you get the balance between flexibility and intuitiveness, right? There's a lot of taste here, I think. There's a lot of design. A good open source library, overall, is something that's extremely design-opinionated.
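The "very few abstractions" principle can be sketched as a toy facade. This is a hypothetical illustration, not the real Hugging Face API: `pipeline`, `_Tokenizer`, and `_Model` are invented names showing how internals stay hidden behind a single obvious entry point, so the first non-trivial example works without reading any docs.

```python
class _Tokenizer:
    """Internal abstraction; the user never sees or learns it."""
    def __call__(self, text: str) -> list[str]:
        return text.lower().split()

class _Model:
    """Internal abstraction; toy rule-based sentiment for illustration."""
    def predict(self, tokens: list[str]) -> str:
        return "positive" if "love" in tokens else "negative"

def pipeline(task: str):
    """The one public abstraction: a task name in, a callable out.

    (`task` is illustrative here; a real library would dispatch on it.)
    """
    tok, model = _Tokenizer(), _Model()
    def run(text: str) -> str:
        return model.predict(tok(text))
    return run

# The "unboxing" experience: two lines, no documentation needed.
classify = pipeline("sentiment-analysis")
```

The design choice is that the tokenizer and model still exist as separate pieces internally, but the user pays the learning cost for exactly one abstraction.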

    4. DH

      Hm. Now to close it off,

  9. 11:55–14:29

    The Future: Open Models and the App Layer

    1. DH

imagine we get to the point, which probably will happen at some point, where the capabilities of open source AI models match those of the closed ones.

    2. TW

      Mm-hmm.

    3. DH

How should the world update its priors about how AI innovation is done? Because so much funding goes into the private labs, right? How would the world change?

    4. TW

[laughs] Yeah. Maybe my take would be that I don't think this is so far off, and I feel like we regularly saw the closed-source models challenged this year. 2025 started with DeepSeek at the very beginning of the year saying "boom," and that was the first moment the world realized: oh, there's this open source AI. What is that? And it's actually competitive. More recently, Kimi was also really close to the state of the art, or not so far off. I'd say my prior is that there's still value in training models, and that's something that will keep pushing the frontier. But more and more, and that's also the direction of many frontier companies, the value will be in the interaction, the app layer on top of these models. The model is one thing, but ChatGPT is very interesting because the interaction friction is very low; it's a product that really works. Anthropic has this aspect too. So this user interface, and how you use the model in existing applications or entirely new ones, is, I think, where the value will really lie. Right now the interesting thing is that the foundation model labs are also attacking this, but it's something you can attack without training models. So there's a lot of room for startups to do amazing things there.

    5. DH

Which is awesome news for the audience. So what I'm hearing is that there's just so much more value to unlock at the application layer for models. So if we're really neck and neck in terms of foundation model performance-

    6. TW

      Mm-hmm

    7. DH

      ... a lot of the cool things and value that exists is building the application stack-

    8. TW

      Yeah

    9. DH

      ... which is great for all the founders.

    10. TW

      Yeah.

    11. DH

      Great.

    12. TW

      Hundred percent.

    13. DH

      Thank you so much for coming and chatting with us, Thomas.

    14. TW

      Thanks. It was a pleasure. [outro music]

Episode duration: 14:29


Transcript of episode 6g8qRhCI8AU
