No Priors: Baseten CEO Tuhin Srivastava on Custom Models and Building the Inference Cloud
EVERY SPOKEN WORD
45 min read · 8,716 words
- 0:00 – 0:31
Intro
- SGSarah Guo
[upbeat music] Hi, listeners. Today, Elad and I are here with Tuhin Srivastava, the founder and CEO of Baseten, the AI Inference Cloud. We're here to talk about capacity constraints for AI compute, why inference is the last market, how the workload is changing, the open-source and perhaps multi-chip future, and what 30X scale in a year looks like. Tuhin, welcome back.
- TSTuhin Srivastava
Hi.
- EGElad Gil
Great to see you.
- TSTuhin Srivastava
Thanks for having me.
- 0:31 – 1:55
Baseten growth
- SGSarah Guo
All right, you are in one of the craziest markets: AI inference. It's very important and there's a lot going on. You guys have grown 30X over the last year, and I think I can say you're expecting to do more than a billion dollars in revenue this year.
- TSTuhin Srivastava
Mm-hmm.
- SGSarah Guo
What's going on? Tell us about scale.
- TSTuhin Srivastava
Yeah. No, it's been nuts. I think what's happened over the last, I want to say 24 months, but it just keeps getting bigger and bigger, is that everyone is realizing you can put AI everywhere. You have all these great options available, from closed-source to open-source models. The open-source models have crossed some sort of chasm in terms of their baseline capability, and RL techniques and post-training for specialized models have become mainstream enough, with enough examples of it working, that customers are realizing they can own their inference more and more. What that's meant for us is more of the long tail of models coming through, customers in-housing a lot of that intelligence themselves. And as the application layer gets bigger and bigger and bigger, we are just an index on that, and we've been around to be able to collect the demand.
- 1:55 – 5:57
Why the app layer wins
- SGSarah Guo
There's an existential question in here that I think everybody is continually asking: does the independent application layer get to exist at all versus the labs? Like, how do you... You have to believe this. Why do you believe it?
- TSTuhin Srivastava
Yeah. Look, I think it'd be a sad thing if it didn't exist in general, and that's, like, my... But, you know, sadness is fine. [laughing]
- EGElad Gil
Sad all the time.
- TSTuhin Srivastava
Oh, yeah. Sadness is fine. But, like-
- SGSarah Guo
It happens, yeah.
- TSTuhin Srivastava
... that's not the reason why I think the application layer will exist. I think the application layer will exist for a number of reasons. One is this idea that what is valuable to a company is the user signal that they can gather, that only they can gather. To the extent that that is encoded in a model, a lot of their business will be at risk. But to the extent that it is encoded in workflows, that is where they will be able to develop a moat. A good example of that is a company like Abridge, where the clinician's edits of the notes, what they do with those notes after the fact, and the thing that happens inside the EMR three steps down, you know, that becomes a workflow that only-
- SGSarah Guo
Can you explain what Abridge does? This is one of your customers, yeah.
- TSTuhin Srivastava
Sorry. Abridge is an ambient scribe used by physicians in, well, almost all hospitals in the US. I think Elad's an investor. Great company, great team, great product. And they've basically got this very, very deep integration into hospitals, into clinician workflows. And my argument here is that it's actually very, very hard for a frontier model company to eat away at that, because they just don't have access to that user signal. What will happen over time is the folks who have access to that user signal can start to post-train models on that reward signal and start to get long-horizon agentic models running that. And to the extent that that is possible, and that signal is differentiated and unique and somewhat rare to get access to, there will be an application layer. A support company is another example of that, where a support task isn't one-shotted. Usually at a company like Baseten, when a ticket comes in, there are, what, one to ten, twenty actions that get taken, and that is where someone can develop a specialized model.
- EGElad Gil
So there, there's almost two versions of this then. There's the new companies like Abridge or Decagon or some of these other things-
- TSTuhin Srivastava
Yeah
- EGElad Gil
... that you mentioned that are doing these new types of applications that are using AI, and they sell it to customers.
- TSTuhin Srivastava
Yep.
- EGElad Gil
The other is enterprises building things in-house-
- TSTuhin Srivastava
Yep
- EGElad Gil
... or building their own models. What proportion of the market today do you think is, um, these new application companies-
- TSTuhin Srivastava
Yeah
- EGElad Gil
... versus enterprises just adopting AI?
- TSTuhin Srivastava
Yeah.
- EGElad Gil
And how do you think that looks in a couple years?
- TSTuhin Srivastava
Yeah, I think that's a... You know, I think you asked me the same question two years ago-
- EGElad Gil
I, uh-
- TSTuhin Srivastava
... on, on the podcast. [laughing]
- EGElad Gil
I hate to be repetitive.
- TSTuhin Srivastava
Um, it is crazy-
- EGElad Gil
At least I'm consistent.
- TSTuhin Srivastava
The answer is... it's crazy that the answer is still the same. I think if you look by inference count-
- EGElad Gil
Yeah
- TSTuhin Srivastava
... it'd be 99% the former.
- EGElad Gil
Yeah.
- TSTuhin Srivastava
Um, and that kind of represents the scope of the opportunity here: the majority of the market hasn't come online-
- 5:57 – 7:55
Serving frontier customers
- TSTuhin Srivastava
... today.
- SGSarah Guo
So if the majority of your customer base today is, as you described, the former, like application companies, AI natives-
- TSTuhin Srivastava
Yeah
- SGSarah Guo
... the, um, fast-growing... I mean, some of them are at considerable scale now-
- TSTuhin Srivastava
Yeah
- SGSarah Guo
... like the Abridge, Cursor-
- TSTuhin Srivastava
Yeah. OpenEvidence
- SGSarah Guo
... OpenEvidences of the world.
- TSTuhin Srivastava
Yeah.
- SGSarah Guo
Uh, what-
- EGElad Gil
You know, what do they teach you? What does that push the company to do? How do you think about serving them versus evolving for the enterprise?
- TSTuhin Srivastava
Yeah. I think, firstly, you just learn a lot by building with the companies at the greatest scale, doing the most interesting things. We think of it two ways. There's the most obvious way, which is just build for the highest scale. The customers that will push you the most technologically, and everything kind of falls into place. I think Stripe's evolution as a company showed that: Stripe now serves so many enterprises, but twelve years ago that wasn't the case. They just built for the frontier and went with them. The second way we think about this is to build for companies that are serving enterprises. So yes, we don't serve the enterprise, but our customers serve enterprises. Abridge serves enterprises. OpenEvidence, Decagon, Writer, Gamma, Clay, all these companies serve enterprises en masse, and what we actually get is a translation of the requirements from them: "Hey, we need this sort of data retention. These models need to be deployed this way. These are the types of GPUs or the latencies they're okay with. These are the model requirements, from a transparency perspective, that they care about." And so I think that is actually the more nuanced answer: if you listen to what their needs are, we get a full translation of what the enterprise would require. I would say that by serving companies like Abridge and OpenEvidence and Latent Health, we're probably pretty well suited to go serve the healthcare system, given that they are selling
- 7:55 – 9:21
Open source model mix
- TSTuhin Srivastava
to them.
- EGElad Gil
How much of a shift are you seeing in terms of the types of open-source models that are being used? We've seen an evolution where two, three years ago the main thing was Mistral and a few other things, then Meta came along with Llama, and then it really shifted-
- TSTuhin Srivastava
Yep
- EGElad Gil
... in terms of the most performant models are of Chinese origin in different ways. Do you see that sort of mix reflected in terms of what's being used by your customers?
- TSTuhin Srivastava
Yeah. I think customers, at least the customers we are serving, are very... These are the fastest-growing AI companies in the world, and they're very forward-thinking. They want to use the best models, and they are optimizing. There is a subset of tasks, which I think is small today, where people really start with cost.
- EGElad Gil
Mm-hmm.
- TSTuhin Srivastava
Um, but everyone comes for capability first, because that's really where the economic growth is being unlocked, where the value's being delivered, and then they optimize. And so with that in mind, you name it: everything from GPT-OSS to the Moonshot models, to DeepSeek, to Canopy's Orpheus, which is a really good text-to-speech-
- EGElad Gil
Mm-hmm
- TSTuhin Srivastava
... model. Customers generally wanna use whatever's at the frontier. And I think the difference has just been that we have a lot more visibility into how to run these and how to run them really well,
- 9:21 – 13:07
Chinese models and geopolitics
- TSTuhin Srivastava
and secondly, that they're good now.
- EGElad Gil
There have been a number of different concerns raised about the use of Chinese models-
- TSTuhin Srivastava
Yeah
- EGElad Gil
... in particular security, or is there something embedded in the models-
- TSTuhin Srivastava
Yeah
- EGElad Gil
... or, you know, Trojan horses or other things. A, do you think there's any real concern there? And B, people often talk about how there should be US counterweights to this. From a geopolitical perspective, do you think that's legitimate, or something we should be worried about? How do you think about the-
- TSTuhin Srivastava
Yeah
- EGElad Gil
... sort of origins of these models versus their uses?
- TSTuhin Srivastava
Yeah. Look, I think these models, firstly, are fantastic. They're amazing. We work with these teams. They're truly awesome. I'd say... look, it is hard for me to see, and I could be wrong, but if I network-bound these models, they're not magically gonna be able to cross those network boundaries-
- EGElad Gil
Mm-hmm
- TSTuhin Srivastava
... into data centers. And I've never seen any real evidence, except from some very early models that I think people picked up on very quickly, that there is some agenda or bias built into these. I do think, to some extent... I think it is important to the US that we develop our own models. It would be a massive loss if there are five different labs in China creating open-source models and we're struggling to get one set up. So it's necessary. I also think it's inevitable. And, you know, the DeepSeek moment a year ago, I remember someone saying to me, and I thought it was very well said, and the world's changed a lot, but they said, "Hey, you know, we should kind of just forget-"
- EGElad Gil
Mm-hmm
- TSTuhin Srivastava
"... that this is a Chinese model. We should just act like this came from-"
- EGElad Gil
Mm-hmm
- TSTuhin Srivastava
"... from Meta and, and build, and build with that in mind."
- EGElad Gil
Mm-hmm.
- TSTuhin Srivastava
It's like, you know, I think you're kind of missing the forest for the trees. There are two scenarios, right? Either America does not ever come up with good open-source models-
- EGElad Gil
Mm-hmm
- TSTuhin Srivastava
... and there's probably a fundamental problem there, or we will get there, and we need to be ready for that world.
- EGElad Gil
Yeah, that makes sense. It's interesting because, you know, like you, I think it's very important for the US to have a strong-
- TSTuhin Srivastava
Yep
- EGElad Gil
... open-source footprint here. At least for now, it looks like the Chinese government is effectively subsidizing at least a large subset of these models, and that subsidy or surplus is effectively just being passed on to the US enterprises adopting them. In other words, it's a way for the Chinese government to effectively subsidize US enterprises-
- TSTuhin Srivastava
Yeah
- EGElad Gil
... in an indirect manner.
- TSTuhin Srivastava
Yeah.
- EGElad Gil
And I think that's a little bit lost right now. Um, but you know, it's always interesting to weigh that against some of the other concerns that are raised. So I appreciate your, your comments on this.
- TSTuhin Srivastava
Yeah. Well, and I think the concern also just becomes: what happens if we aren't able to... I think if you think about the economics here: DeepSeek, by most accounts, is a very good model.
- EGElad Gil
Mm-hmm.
- TSTuhin Srivastava
Um, you know, you could argue whether it's at the absolute frontier or not, but let's go back three months-
- EGElad Gil
Mm-hmm
- 13:07 – 14:22
Custom inference dominates
- SGSarah Guo
What has been... Actually, maybe you can just characterize the workload a little bit. Of the tokens being served on Baseten, how many of them are from custom models of some kind versus vanilla open source today?
- TSTuhin Srivastava
It is all custom. It's basically-
- SGSarah Guo
Okay
- TSTuhin Srivastava
... yeah, like it, it-
- SGSarah Guo
So like 95% plus?
- TSTuhin Srivastava
90, 95%. And I think that's really cool, to be honest. Like, look, we have two businesses. We have three business... We have four... We have three businesses right now.
- SGSarah Guo
[laughs]
- TSTuhin Srivastava
Um, and like one-
- SGSarah Guo
Should we help you count? Yeah.
- TSTuhin Srivastava
No, no. So we have dedicated inference, which is basically custom model inference: your SLA is your SLA. Then we have shared inference, which is shared inference endpoints with shared SLAs. And then we have a training business. I'd say 95% of the tokens today are on the first business, and for almost all of them, the customer is making some modifications to the model with their own data, specialized for the use case. And what's even more important is they might be compiling it in different ways. No one is just running the vanilla open-source weights. You might be customizing it for quality, but you also might be customizing it for performance.
- 14:22 – 17:10
Post training acquisition
- SGSarah Guo
You made an acquisition of a research team a few months ago.
- TSTuhin Srivastava
Yeah.
- SGSarah Guo
You've mentioned post-training customization. What was the rationale behind the acquisition? What is that team doing today?
- TSTuhin Srivastava
Yeah. So the rationale around the acquisition was, you know, we are infrastructure and product people. We were product people, and now are really good infrastructure people, and we didn't have much of a research capability ourselves. What we saw was the market moving heavily toward post-training, and that we could accelerate the market itself with post-training resources, either productized or even just as resources for that market. Parsed was a company that was a Baseten customer, so they were post-training models and running them on Baseten. I think what they realized was that they would eventually need to become an inference company. What we realized was, hey, we really needed that expertise, because it represents a way for us to get closer to the customer earlier and be able to support them more, and it just made sense to pair them together. And, as I said in the opening here, as more and more post-trained models have come up, we've realized that the demand for people, for software loops to do post-training, or for post-training expertise, is very high, and we're really, really investing in that. They're also a bunch of Australians; I like to think we had a bit of alpha there. But yeah, that's been fantastic. They're working with all sorts of customers. And it's also very interesting when you start... We were doing a lot of research on the performance side and less so on the post-training side.
It's interesting, as we've started to do a lot more research on the post-training side, you start to see how linked inference and post-training are: even when you think about stuff like quantization and when you should do that, how you train the model affects how you need to quantize for inference, and how paired these problems are-
- SGSarah Guo
Mm-hmm
- TSTuhin Srivastava
... has become very apparent. And more and more we realize that post-training and inference are two sides of the same problem. Because inference will ideally beget more post-training: inference creates data, you do evals, you can now post-train on the reward function that you found with those evals, and hopefully just set up
- 17:10 – 18:35
When to invest in custom models
- TSTuhin Srivastava
the entire loop.
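The loop just described (inference creates data, evals turn that data into a reward signal, post-training feeds an improved model back into serving) can be sketched roughly as follows. This is a toy illustration, not Baseten's stack; all function names and the length-based "reward" are hypothetical stand-ins.

```python
# Toy sketch of the inference -> evals -> post-training loop.
# run_inference, score_with_evals, and post_train are illustrative names,
# and the "reward" here is a trivial stand-in for a real eval suite.

def run_inference(model, prompts):
    """Serve the model and log (prompt, output) pairs."""
    return [(p, model(p)) for p in prompts]

def score_with_evals(logs):
    """Turn production logs into (prompt, output, reward) triples."""
    # Toy reward: prefer short outputs (stand-in for real evals).
    return [(prompt, output, 1.0 if len(output) < 20 else 0.0)
            for prompt, output in logs]

def post_train(model, rewarded):
    """Return an 'updated' model biased toward high-reward behavior."""
    high_reward = [ex for ex in rewarded if ex[2] > 0.5]
    # In reality this would be an RL / fine-tuning job on the reward signal.
    def updated(prompt):
        return model(prompt)[:19]  # toy update: emulate the rewarded style
    return updated, len(high_reward)

# One turn of the loop:
base_model = lambda p: p.upper() + "!!!"
logs = run_inference(base_model, ["hi", "a much longer prompt indeed"])
rewarded = score_with_evals(logs)
new_model, n_kept = post_train(base_model, rewarded)
```

The point is only the shape of the loop: each serving pass produces the data that the next post-training pass consumes, which is why the two businesses feed each other.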
- SGSarah Guo
Plenty of folks from Anthropic and OpenAI, Sam, Greg, et cetera, have said in recent months that inference is super strategic, inference talent is strategic, capacity is strategic. So between that and post-training, these are very difficult capabilities to gather.
- TSTuhin Srivastava
Yep.
- SGSarah Guo
Um, I imagine that lots of your customers come to you guys for advice on how to do this progression of moving to custom models. What do you tell people about the life cycle and when they should invest in that?
- TSTuhin Srivastava
Yeah, I think it's: hey, go prove to yourself with the best-in-class model that you have something worth optimizing. [laughs] And, you know, if a customer comes to us... there was that meme from, like, two years ago: "no GPUs-"
- SGSarah Guo
I was gonna say that
- TSTuhin Srivastava
... pre-product-market fit." It's "no post-training pre-product-market fit" is what [laughs] I...
- EGElad Gil
Yeah, yeah.
- TSTuhin Srivastava
Is what I'd say.
- SGSarah Guo
So the people that you're working with here are very much at scale first-
- TSTuhin Srivastava
Yeah. They, they, they are-
- SGSarah Guo
... in the use case. Yeah
- TSTuhin Srivastava
... they have a user signal that they know how to optimize, and they've shown that they can serve customer value, and that they have something special around that value. And once you have that value, it's, okay, now how can I do that better, faster, and cheaper? With the idea being that if you need to be very good at customer support, you maybe don't need to be that good at coding, and a specialized model might be a better fit for that problem, and you can do it better, faster, cheaper.
- 18:35 – 22:25
Supply crunch and data centers
- SGSarah Guo
What about the capacity side? You started by unifying capacity across all the clouds and neoclouds.
- TSTuhin Srivastava
Yeah.
- SGSarah Guo
How do you think about this when everybody keeps talking about a, a supply crunch and a multi-year supply crunch?
- TSTuhin Srivastava
Yeah. I think there's so much narrative around the supply crunch, and no matter how much we hear about it, I don't think people realize how bad it really is. There is very, very little slack compute available. We run pretty large clusters ourselves, and we run them at uncomfortably high utilization. When I say that, I mean we're at, like, mid-nineties utilization-
- SGSarah Guo
Mm.
- TSTuhin Srivastava
-um, most of the time. We sit in 18 different clouds now; we have 90 clusters around the world across those 18 clouds. And initially, we built this technology to create one runtime fabric that spans all these different clouds and to abstract that away from our customers, as a way to think about reliability, latency, failover, all these things we think are gonna be very important for mission-critical use cases. That same technology, just our ability to get compute wherever humanly possible, has been really, really helpful in our ability to get supply. And what I mean by that is we can be introduced to a new provider in a different country and have it up and running with the whole Baseten inference stack-
- SGSarah Guo
As part of the fabric. Yeah.
- TSTuhin Srivastava
Part of the fabric, in half a day. Maybe less.
- SGSarah Guo
I know.
- TSTuhin Srivastava
And that gives us enormous flexibility. But even for us, it is hard to grow. I think... yeah, I'll say it: we have a full standing PM meeting for the company where we basically ask, how do we manage capacity for the demand right now? The second part that people don't really understand is that there are also a lot of suppliers right now that are kind of grifty. They haven't run data centers before. They don't understand SLAs, especially for inference. And so even when there is capacity available... We run a lot of those, and we have redundancy, so it's fine. But there are probably a dozen good clouds, and I'd put three or four of them in the gold tier.
- SGSarah Guo
Mm-hmm.
- TSTuhin Srivastava
Um, and I think that just means that not only are we supply-crunched, we're supplier- and operationally crunched on people who can run these data centers as well. So-
- SGSarah Guo
How far ahead can you actually buy capacity right now? In other words, is there any slack in the market if you buy two years ahead or five years ahead? You know what I'm asking.
- TSTuhin Srivastava
You mean, like, contract length, or actually, "Hey, I want this in January '28"?
- SGSarah Guo
Yeah, either one. Yeah.
- TSTuhin Srivastava
Yeah.
- SGSarah Guo
I mean, it's more the, "I want this in January '28," or at least-
- TSTuhin Srivastava
Yeah
- SGSarah Guo
... I have some visibility into my future supply.
- TSTuhin Srivastava
Yeah. You could buy that, but you've also gotta remember how quickly the market is moving. And that gets balanced somewhat against the fact that the H100 is such a great chip.
- SGSarah Guo
Yeah.
- TSTuhin Srivastava
Um, and then, you know, it's crazy: it's four, four and a half years old, and the price is still going up.
- SGSarah Guo
Yeah.
- TSTuhin Srivastava
Maybe it has a useful life of nine years.
- SGSarah Guo
Yeah.
- TSTuhin Srivastava
Um, so, you know, that's good,
- 22:25 – 24:09
Longer GPU Contracts
- TSTuhin Srivastava
but at the same time, you know, yes, you can do that, but you're making a lot of bets-
- SGSarah Guo
Yeah
- TSTuhin Srivastava
... as part of that. And then in terms of... I think the big thing that's changed over the last six months is that the term length that people want has just gone up. So if you wanted a thousand, 1,024 B200s-
- SGSarah Guo
Mm-hmm
- TSTuhin Srivastava
... which is, you know, from a good cloud, right now you're not getting that on less than a three-to-five-year contract-
- SGSarah Guo
Mm-hmm
- TSTuhin Srivastava
... right now, with probably a twenty to thirty percent TCV prepay. So what actually becomes important when acquiring capacity is that you need to have enough demand to serve against it, but you also need a low cost of capital, which is actually changing the dynamic pretty significantly.
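To see why cost of capital suddenly matters, here is the back-of-the-envelope math on the contract shape just described (1,024 B200s, three-to-five-year term, 20-30% of total contract value prepaid). The dollar-per-GPU-hour rate is a made-up placeholder, not a quote from anyone:

```python
# Rough TCV/prepay arithmetic for the contract described above.
# The hourly rate is an assumed, illustrative number only.

gpus = 1024            # B200s
hourly_rate = 5.00     # assumed $/GPU-hour, hypothetical
years = 3              # low end of the 3-to-5-year term
hours_per_year = 24 * 365

tcv = gpus * hourly_rate * hours_per_year * years   # total contract value
prepay_low = 0.20 * tcv                             # 20% TCV prepay
prepay_high = 0.30 * tcv                            # 30% TCV prepay

print(f"TCV: ${tcv:,.0f}")
print(f"Prepay due up front: ${prepay_low:,.0f} to ${prepay_high:,.0f}")
```

Even at a modest assumed rate, the prepay alone runs into the tens of millions of dollars, which is why the working-capital and financing questions come up next.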
- SGSarah Guo
Does that, does that impact how you think about going public as a company?
- TSTuhin Srivastava
Oh.
- SGSarah Guo
Because arguably-
- TSTuhin Srivastava
Yeah.
- SGSarah Guo
Yeah.
- TSTuhin Srivastava
I think you'd go sooner.
- SGSarah Guo
Yeah, exactly.
- TSTuhin Srivastava
Yeah. Yeah. I think you'd need... and I think there is demand for that. But one of the realizations that we had recently, and we're software people, so we don't think like this all the time, is that our business has very interesting working capital-
- SGSarah Guo
Mm
- TSTuhin Srivastava
... um, requirements.
- SGSarah Guo
Mm-hmm.
- TSTuhin Srivastava
Like, you know... and as a result of that, it has very interesting financing-
- SGSarah Guo
Yeah, yeah
- TSTuhin Srivastava
... um, requirements, and we're not-
- SGSarah Guo
Mm
- TSTuhin Srivastava
... at least right now, we're not even going down the debt route.
- SGSarah Guo
Yeah, I mean, in a sense, there are also things you could do in terms of debt or other structures that-
- TSTuhin Srivastava
Yeah
- SGSarah Guo
... yeah.
- TSTuhin Srivastava
Yeah. And yeah, I've learned a lot about debt-
- 24:09 – 26:07
What Makes a Winner
- SGSarah Guo
Yes
- TSTuhin Srivastava
... um, recently.
- SGSarah Guo
Given the supply crunch, and inference being one of, you know, the top couple of markets-
- TSTuhin Srivastava
Yeah
- SGSarah Guo
... you're going after, you have plenty of people who understand this problem and therefore, you know, some competition. How do you think about the factors that create a dominant player here, or a winning player? Is it, as you mentioned, cost of capital? Is it access to supply? Is it software? Is it demand?
- TSTuhin Srivastava
Yeah.
- SGSarah Guo
Just being excellent, everything.
- TSTuhin Srivastava
Yeah. It, it, it's, um... Look, I, I think what's so interesting about inference is GP-
- SGSarah Guo
Is it operations, like it's such a cloud?
- TSTuhin Srivastava
Yeah, yeah, yeah. I think so. Yeah.
- SGSarah Guo
[laughs]
- TSTuhin Srivastava
I think GPUs-as-a-service is not sticky. I think that's been seen; customers generally just see that as commodity. Inference with the software layer included is incredibly sticky. You know, none of our top 30 customers have ever churned. We're talking like 400% annual NDR-
- SGSarah Guo
Mm-hmm
- TSTuhin Srivastava
... around our business. So it's very, very sticky, and I think that software layer is very important. The optimist in me says there's so much value in the software, and we will build the best software layer for inference that exists. But as is becoming clear now, access to inference compute is-
- SGSarah Guo
Yeah
- TSTuhin Srivastava
... a strategic advantage, and I think that is the strategy that even the labs are going after, which is: if we have all the compute, good luck running inference.
- SGSarah Guo
Yeah, yeah. In, in a world of constrained compute, the number one thing to own is compute.
- TSTuhin Srivastava
Yeah.
- SGSarah Guo
And so, you know, just owning it in and of itself is an asset, and I think people underappreciate that.
- TSTuhin Srivastava
Yeah. You can't make a good hot chocolate without milk, you know. [laughs]
- SGSarah Guo
Unless you're a vegan.
- TSTuhin Srivastava
Unless you're a vegan. [laughs]
- SGSarah Guo
Yeah, yeah.
- TSTuhin Srivastava
No one wants a vegan inference.
- SGSarah Guo
Yeah. [laughs]
- TSTuhin Srivastava
[laughs]
- SGSarah Guo
Well, I gotta ask you... people might want alternative milk, right?
- TSTuhin Srivastava
Yeah.
- 26:07 – 28:19
Multi Chip Future
- SGSarah Guo
So, okay, when you... the H100 is a great chip. People want a B200, they want GB200. They want, of course, tons and tons of Nvidia. When you think about making a bet several years in the future, do you believe there's a multi-chip world? What do you think happens from a compute perspective, on the chip side?
- TSTuhin Srivastava
Yeah. Um, I think, I think, you know, like diversification everywhere is a-
- SGSarah Guo
Mm-hmm
- TSTuhin Srivastava
... same way I want a world of many models. I think, you know, we want a world of many of most things. And I think-
- SGSarah Guo
You'd be sad if it didn't happen.
- TSTuhin Srivastava
Yeah. And I think everyone would be sad. I will say, to some extent... yeah, I think there will be inference-specific chips. I think you'll have, like, decode-specific chips. And we're looking at the-
- SGSarah Guo
And Nvidia said this.
- TSTuhin Srivastava
Yeah, yeah. I mean, that was the whole Groq LPU thing. I think that is very straightforward and makes sense. I think people really, really underestimate the supply chain stuff from Nvidia, like how good they are at that; CUDA, how good CUDA is; the developer ecosystem around it. To me, one of the most important things as an infrastructure company in this moment is how fast you can move, and you can move fastest with Nvidia today. I think that is the reality, and given the scale that they operate at, it's hard to see, and I'm not saying it won't happen, how anyone in the short term, in the next couple of years, is gonna be able to compete with that. Especially with so many of the other players... What you need to be able to compete here is for the ecosystem to form around you, and if you tie up all your supply with one buyer, which a bunch of the other chip providers have done, it's actually hard for that ecosystem to form. You know, if you're a big lab and you have a proprietary deal with one chip type where you get 90% of the supply, it's actually in your best interest to make sure you get 95% of the supply, so that everything that's built for you, no one else can ever use
- 28:19 – 31:08
Runtime Roadmap
- TSTuhin Srivastava
it.
- SGSarah Guo
When you think about reacting to the market, what do you think is happening with the actual workloads that you have to go invest in, right? Like, obviously, code agents and long-horizon agents over time have become-
- TSTuhin Srivastava
Yeah
- SGSarah Guo
... a big deal. People talk a lot more about CPU compute, video inference is different.
- TSTuhin Srivastava
Yep.
- SGSarah Guo
Um, I don't know if it's the sandbox stuff. Like, what's important for you guys to invest in now?
- TSTuhin Srivastava
Yeah. Look, I think for us, all the runtime stuff is obviously very important, and what that means is what chips we run on, how we run, what kinds of workloads we support. Like, do we get very good at diffusion transformers? Yes. Coding agents need sandboxes; we should go build sandboxes. There are all sorts of new speculation techniques to get faster inference; we need to do that. Even stuff like KV cache-aware routing, and, you know, that stuff's a bit old now, but continuing to be very good at that, and somewhat disentangling prefill and decode and starting to treat them as separate problems. That's something we are very focused on, and we're seeing massive gains there. That's at the runtime level. Beyond that, everything we think about is how to create more of that loop between inference and post-training, because we think that just begets more inference. And so we will build or partner on almost everything there. So, you know, we're gonna work with the best evals company in the world, like Braintrust, to make sure that's very well integrated into and around Baseten. On the sandbox side, we will partner on or build the best sandbox experience that will exist. And then we'll create the best training APIs to make it so continual learning becomes somewhat of a solved problem, not just a discrete thing. That, I think, is the core Baseten product thesis: how do we build that loop, and then everything around that becomes how do we make sure we do everything we can to ensure that gets as big as possible.
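The KV cache-aware routing mentioned here can be sketched roughly: send each request to the replica that already holds the longest matching prefix in its cache, so prefill work gets reused instead of recomputed. This is a minimal illustration with hypothetical names (`route`, `common_prefix_len`, the replica ids), not Baseten's implementation; real routers hash token blocks and track eviction rather than comparing raw strings.

```python
# Sketch of KV cache-aware routing: pick the replica whose cached
# prompts share the longest prefix with the incoming prompt, so the
# prefill for that shared prefix can be served from cache.

def common_prefix_len(a: str, b: str) -> int:
    """Length of the shared leading substring of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route(prompt: str, replica_caches: dict[str, list[str]]) -> str:
    """Return the replica id with the best cached-prefix overlap."""
    best_replica, best_overlap = None, -1
    for replica, cached_prompts in replica_caches.items():
        overlap = max(
            (common_prefix_len(prompt, p) for p in cached_prompts),
            default=0,
        )
        if overlap > best_overlap:
            best_replica, best_overlap = replica, overlap
    return best_replica

caches = {
    "replica-a": ["You are a helpful assistant. Summarize:"],
    "replica-b": ["Translate to French:"],
}
print(route("You are a helpful assistant. Summarize: the news", caches))
# → replica-a (it already holds the system-prompt prefix in cache)
```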
That's access to compute; that's on the infrastructure side, making sure we can get compute anywhere, making sure we have access to our own compute. And then I think it's all the primitives that come off of that, which just become incredibly margin accretive both for us and our customers, stuff like sandboxes and async batch inference. Like, how do we drive utilization by having a first-class batch inference experience? To me, this is what an inference cloud looks like. You are very good at inference, and then you start to do all the things tangential to or that loop into inference, and you partner where necessary and build where necessary. But we really do wanna start with that core inference story and then go down to unblock supply or create margin, and go up the stack to unlock value.
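The utilization argument behind batch inference can be sketched with a toy cost model: each forward pass carries a roughly fixed overhead (kernel launches, scheduling), so grouping requests into a batch amortizes it and the GPU time per request falls toward the marginal cost. The numbers below are made up for illustration, not Baseten measurements.

```python
# Toy model: batching amortizes fixed per-forward-pass overhead.
FIXED_OVERHEAD_MS = 40.0   # assumed cost per forward pass, regardless of batch size
PER_REQUEST_MS = 10.0      # assumed marginal compute per request in the batch

def gpu_ms_per_request(batch_size: int) -> float:
    """Average GPU milliseconds consumed per request at a given batch size."""
    return (FIXED_OVERHEAD_MS + PER_REQUEST_MS * batch_size) / batch_size

for bs in (1, 8, 64):
    print(bs, round(gpu_ms_per_request(bs), 2))
# → 1 50.0
# → 8 15.0
# → 64 10.62
```

The same logic is why async/batch APIs can be priced below interactive inference: latency-insensitive jobs can wait until a large batch forms, keeping the hardware near its marginal cost.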
- 31:08 – 33:48
Scaling Edge Cases
- SGSarah Guo
What, uh, would surprise people about some of the issues you discover only at scale? I'll give you an example. I was surprised when, uh, you guys ran into scale limitations, like fundamental limitations with some of the hyperscaler products that you were consuming.
- TSTuhin Srivastava
Yeah.
- SGSarah Guo
And I-- because I kind of think of, you know, the AWS GCPs of the world as supporting infinite scale.
- TSTuhin Srivastava
Yeah. I mean, I think... And again, for very, very large companies that run services at big scale, it's probably the same stuff.
- SGSarah Guo
Mm-hmm.
- TSTuhin Srivastava
Is that all the edge cases, um, just become-
- SGSarah Guo
You, you actually experience it.
- TSTuhin Srivastava
You experience them. I'll give you a few examples here. Yesterday, for the first time ever, we saw a kernel panic, and that only happened because some Fluent Bit worker was creating too many logs, and the scale was too big, and it was all on one node, and it was happening at the same time from two different workers. So you see all the systems-level and kernel-level problems. But then I think the craziest stuff is what you start to see with LLMs: these runtimes are pretty immature. Even how we use the KV cache is probably a little less sophisticated than most people see-
- SGSarah Guo
Mm-hmm
- TSTuhin Srivastava
...than most people see. And we are starting to see the limitations of the current and the next set of primitives that need to be built from a scale, security, and performance perspective. But I think it's really at the runtime level and the systems level. The edge cases are, I'd say, a lot more systems-level than they are LLM-specific.
- SGSarah Guo
What are the things that keep you up at night?
- TSTuhin Srivastava
Capacity. Um, I think, you know-
- SGSarah Guo
Quick answer. Yeah.
- TSTuhin Srivastava
Yeah. I think capacity. I think the other one is probably just that this market's so big, and it represents a moment when you should be as aggressive as possible. You know, we've grown a ton, obviously, over the last 12 months, the last few months, but the answer's always just go bigger, go faster. And I think that's really, really fun. It's also a little exhausting, and we are all in somewhat uncharted territory in terms of how fast and how big things can get. But I think the big one is compute. I think there's no world in which there's enough compute to get the amount of value that we wanna get out of LLMs in the next five to ten years.
- SGSarah Guo
Or we have to invent a lot of new stuff.
- TSTuhin Srivastava
Yeah.
- 33:48 – 36:44
Hiring and Leadership
- TSTuhin Srivastava
To, yeah.
- SGSarah Guo
Maybe if we just talk a little bit about what you're learning scaling. You know, 30x is an aggressive thing to go through as a company. You've brought in a lot of really amazing talent, like Danny and Samir and Stephen Dev, folks on both the technical and the go-to-market side. What do you think is working about how you are recruiting and scaling, or what's your philosophy on that?
- TSTuhin Srivastava
We were very, very flat until, I don't know, 12 to 18 months ago. I remember I went on a walk with Elad, actually, and Elad was like, "You just need leaders." And it's actually so contrary to everything. You know, as engineers, you're like, "Oh, you, you-"
- SGSarah Guo
It's all overhead.
- TSTuhin Srivastava
It's all, it's all... Everything is overhead.
- SGSarah Guo
[laughs]
- TSTuhin Srivastava
Everything is overhead. Um, and I, and I-
- SGSarah Guo
You once told me, I think... You're like, "Hey, Sarah, what about we just have engineers instead of salespeople?"
- TSTuhin Srivastava
Yeah.
- SGSarah Guo
Yeah.
- TSTuhin Srivastava
Yeah. Yeah. That. [laughs]
- SGSarah Guo
[laughs]
- EGElad Gil
Everybody learns it.
- TSTuhin Srivastava
Everyone-
- EGElad Gil
It's all the same.
- TSTuhin Srivastava
We're all, we're all... But I remember you said it so clearly at the time, Elad, and I think that's what we've noticed, which is that actually having a leadership team that you can trust is so important. The two or three things that I would say: you want people where you can give them whole problems.
- EGElad Gil
Yeah.
- TSTuhin Srivastava
And so, if you feel like you are micromanaging, if you feel like you have to be involved in everything, I think that's a bit of a cop-out as a founder, because you're just like, "I just need to be involved in everything." It's like, no, you probably don't have the right people. I think the second thing is: be very, very clear about what you're optimizing for. And if it's something generic like, "We want the smartest, hardest-working people," you can't do much with that. With us, what we cared about was, hey, actually, we don't care a lot about people who have done this before. We care about people who think from first principles. Work has to be a high priority, but they also have to be very kind and nice and care about the collaborative environment. We don't have a hero culture. Very low ego. And, you know, if you need a manager, it's probably not-
- SGSarah Guo
[laughs]
- TSTuhin Srivastava
[laughs] It's probably not the right place to be. But once you have that clear rubric, the people that will fit into it become very apparent, and the people that don't fit into it also become very apparent. And we've hired amazing people like you mentioned, but I think what's a lot more interesting is that we haven't had a ton of unnecessary turnover there. People tend to stay, 'cause we are very clear on what we want early on. It took us a while to get there, though.
- 36:44 – 38:19
Operations Pager Culture
- SGSarah Guo
What about the idea of, like, an operations culture? You know, we were talking to Alyssa Henry about this, and she's like, "Well, the hard thing about cloud is actually just operations."
- TSTuhin Srivastava
Yeah.
- SGSarah Guo
"I slept with a pager under my-"
- TSTuhin Srivastava
Yeah
- SGSarah Guo
... pillow for a decade." I don't think I've seen you detached from your Slack channel-
- TSTuhin Srivastava
Yeah
- SGSarah Guo
... for-
- TSTuhin Srivastava
My phone is buzzing right now.
- SGSarah Guo
Yeah.
- TSTuhin Srivastava
The, um... It's like I'm getting an-
- SGSarah Guo
[laughs] That's just not a strong one, yeah
- TSTuhin Srivastava
... I'm getting anxious. So, um-
- SGSarah Guo
And, and you've, you've been concerned before. Like, do people get it? Like, uh-
- TSTuhin Srivastava
Yeah
- SGSarah Guo
... you know, what is distinctive about that?
- TSTuhin Srivastava
I think, one, if you've worked at an infrastructure company... Like, we were once in a meeting with a bunch of AWS execs, and these were very senior AWS folks, and all their pagers went off multiple times-
- SGSarah Guo
[laughs]
- TSTuhin Srivastava
... during our 45-minute meeting. You know, I think it's very much just a cultural thing. And, you know, inference can't go down. Like, I think when my co-founder Amir's pager goes off, his seven-year-old says, "Is that a P0?"
- SGSarah Guo
[laughs]
- TSTuhin Srivastava
[laughs] Oh, is that a P0? And so, you know, you just have to get used to it, and that's the culture you live in, and it just changes the speed. But it also becomes a cultural thing. It rejects people that don't fit into it very, very quickly.
- SGSarah Guo
Like engineers who avoid PagerDuty.
- TSTuhin Srivastava
Yeah. You know, when we have P0s, we're like, "Everyone on the call." Like, there's been a joke that there may as well be a siren that goes off in the office-
- 38:19 – 40:41
Efficiency Drives Demand
- SGSarah Guo
[laughs]
- TSTuhin Srivastava
... when, when, when there's an incident, so.
- SGSarah Guo
So people have been talking ad nauseam in the AI community about Jevons paradox-
- TSTuhin Srivastava
Yeah
- SGSarah Guo
... um, where if you decrease the-
- TSTuhin Srivastava
Yeah
- SGSarah Guo
... cost of... It's really a question around price elasticity-
- TSTuhin Srivastava
Yeah
- SGSarah Guo
... and availability. If you decrease the cost of a good, say intelligence as a good, um, people actually consume more of it.
- TSTuhin Srivastava
Yeah.
- SGSarah Guo
Um, because of the personal or business ROI of it, the demand for it goes up-
- TSTuhin Srivastava
Yeah
- SGSarah Guo
... not down. Um, do you see this, and are you, are you working against yourself trying to make these models more efficient? Do people just use them-
- TSTuhin Srivastava
More?
- SGSarah Guo
... more or less?
- TSTuhin Srivastava
Yeah. I think you gotta think about this from a developer's perspective and a consumer's perspective. I think consumers just want the best answers and the best experience. That's somewhat governed by more intelligence, to some extent. When you go to the developers, from the developer's perspective, they would insert more intelligence if you make it cheaper. And they will insert more intelligence anyway.
- SGSarah Guo
Mm-hmm.
- TSTuhin Srivastava
But if you make it cheaper, they'll insert a hell of a lot more intelligence. And you see this with agents and such. Agents are just longer-running now, and I think that's what we have seen with the cost of inference going down, which is, folks are just like, "Okay, we can run this for longer. We can make it do a bit more work, and we'll get to a larger end." Compute scales from an inference perspective as well, and we are seeing that with almost all our customers. They either start with, "This is the quality of answer I need to get to, and this is the amount of inference I need to do to get that," or, "This is the base-level model that I can start with to get there." And the more we drive down the costs, what they realize is more intelligence just means better user experience-
- SGSarah Guo
I just want a better answer
- TSTuhin Srivastava
... better answers, better experiences, more dollars-
- SGSarah Guo
More actions
- TSTuhin Srivastava
... better answers, better experiences, more dollars, even more revenue. So yeah, I think the cost of inference going down just begets more inference. It truly is... I think we're kind of in the last market, right? Like, even if there's AGI, all that's left is inference.
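The elasticity dynamic being described can be made concrete with a toy demand curve: if demand for inference is Q = k * P^(-e) with elasticity e > 1, then total spend P * Q = k * P^(1-e) actually rises as the unit price falls, which is the Jevons-style outcome Sarah raises. The constants below are illustrative only, not data about any real market.

```python
# Toy price-elasticity model: with elasticity > 1, cutting the unit
# price of inference increases total spend on inference.

def total_spend(price: float, k: float = 100.0, elasticity: float = 1.5) -> float:
    """Total spend P*Q under constant-elasticity demand Q = k * P**(-e)."""
    quantity = k * price ** (-elasticity)
    return price * quantity

print(total_spend(1.0))  # → 100.0
print(total_spend(0.5))  # → ~141.4: halving the price raised total spend
```

With elasticity below 1 the same formula would show spend shrinking as price falls, which is the "this answer is enough" dynamic Tuhin says he does not see in his customers.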
- SGSarah Guo
Yeah. So you do not see in your customers a "this answer is enough and this action is enough" dynamic?
- TSTuhin Srivastava
No, no.
- SGSarah Guo
Yeah, it's gonna keep going for a long time,
- 40:41 – 42:34
Concierge Everything Future
- SGSarah Guo
it looks like.
- TSTuhin Srivastava
Yeah.
- SGSarah Guo
How do you view all this kind of evolving towards the future? So basically, it seems like it's gonna be one of the biggest markets of all time.
- TSTuhin Srivastava
Yeah.
- SGSarah Guo
We have this massive shift where we're moving from software and seats and digitization into actual intelligence, selling units of cognition-
- TSTuhin Srivastava
Yep
- SGSarah Guo
... selling agentic workflows. What does this all look like in a couple years? Like, what is your view of this future world?
- TSTuhin Srivastava
I think for consumers it's the best possible thing, right? Everything is somewhat smarter. You know, you get better care 'cause your doctors have access to better-
- SGSarah Guo
Mm-hmm
- TSTuhin Srivastava
... better tools. You know, there's all this stuff about there being fewer software engineers, and I think we just build more software.
- SGSarah Guo
Mm-hmm.
- TSTuhin Srivastava
I think we just build a ton more software. You know, we're not slowing down hiring on software engineers. We're just building-
- SGSarah Guo
Mm-hmm
- TSTuhin Srivastava
... more things. And for consumers, that just means better tools, more software, all those good things.
- SGSarah Guo
It's almost like everybody has their own team for everything, right? You have an agent which helps with your doctor. You have an agent-
- TSTuhin Srivastava
Yeah
- SGSarah Guo
... that helps you learn stuff. You have an agent that helps you organize your life.
- TSTuhin Srivastava
It's a concierge. It's concierge everything.
- SGSarah Guo
Yeah. Yeah, concierge everything for everyone.
- TSTuhin Srivastava
Yeah. And I think, like, what that means... So that's amazing. I think that's great. And education, same thing. You have concierge education; you get personalized access to everything. I think then you go one step back to how it affects developers and companies, and I think if you don't embrace this-
- SGSarah Guo
Mm-hmm
- TSTuhin Srivastava
... I think it's the extinction moment-
- SGSarah Guo
Mm-hmm
- TSTuhin Srivastava
... for a bunch of folks. And I don't think that means that, you know, core design needs Figma. I think what's more interesting is that all these workflow and software companies need to figure out-
- SGSarah Guo
Mm-hmm
- TSTuhin Srivastava
... what are the intelligence-inserted versions that drive all that user value for those end consumers that we talked about.
- SGSarah Guo
Yeah,
- 42:34 – 42:55
Conclusion
- SGSarah Guo
very exciting. Thank you so much for joining us today.
- TSTuhin Srivastava
Yeah. Thanks, guys. [upbeat music]
- SGSarah Guo
Find us on Twitter at nopriorspod. Subscribe to our YouTube channel if you wanna see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.
Episode duration: 42:57