No Priors

Baseten CEO Tuhin Srivastava on Custom Models, and Building the Inference Cloud

Baseten CEO and co-founder Tuhin Srivastava sits down with Sarah Guo and Elad Gil to discuss the rapid growth of AI inference demand, Baseten's 30x growth, and why inference is becoming the strategic "last market." Tuhin argues the application layer will persist because companies with unique user signals can encode value into workflows and post-train specialized models, citing examples like Abridge and support workflows. The conversation covers GPU capacity constraints, Baseten's multi-cloud fabric across 18 clouds and 90 clusters, long-term contracting dynamics, the importance of the software layer for stickiness, evolving workloads, multi-chip possibilities, and operational lessons at scale.

Sign up for new podcasts every week. Email feedback to show@no-priors.com

Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @Tuhinone

Chapters:
00:31 Baseten growth
01:55 Why the app layer wins
05:57 Serving frontier customers
07:55 Open source model mix
09:21 Chinese models and geopolitics
13:07 Custom inference dominates
14:22 Post training acquisition
17:10 When to invest in custom models
18:35 Supply crunch and data centers
22:25 Longer GPU Contracts
24:09 What Makes a Winner
26:07 Multi Chip Future
28:19 Runtime Roadmap
31:08 Scaling Edge Cases
33:48 Hiring and Leadership
36:44 Operations Pager Culture
38:19 Efficiency Drives Demand
40:41 Concierge Everything Future
42:34 Conclusion

Sarah Guo, host · Tuhin Srivastava, guest · Elad Gil, host
May 1, 2026 · 42m · Watch on YouTube ↗

EVERY SPOKEN WORD

  1. 0:00–0:31

    Intro

    1. SG

      [upbeat music] Hi, listeners. Today, Elad and I are here with Tuhin Srivastava, the founder and CEO of Baseten, the AI Inference Cloud. We're here to talk about capacity constraints for AI compute, why inference is the last market, how the workload is changing, the open source and perhaps multi-chip future, and what thirty X scale in a year looks like. Tuhin, welcome back.

    2. TS

      Hi.

    3. EG

      Great to see you.

    4. TS

      Thanks for having me.

  2. 0:31–1:55

    Baseten growth

    1. SG

      All right, you are in one of the craziest markets: AI inference. It's very important and there's a lot going on. You guys have grown 30x over the last year, and I think I can say you're expecting to do more than a billion dollars in revenue this year.

    2. TS

      Mm-hmm.

    3. SG

      What's going on? Tell us about scale.

    4. TS

      Yeah, it's been nuts. I think what's happened over the last, I want to say 24 months, but it just keeps getting bigger and bigger, is that everyone is realizing you can put AI everywhere. You have all these great options available from closed-source and open-source models. The open-source models have crossed some sort of chasm in terms of their baseline capability, and RL techniques and post-training for specialized models have become mainstream enough, with enough examples of it working, that customers are realizing they can own their inference more and more. And what that's meant for us is more of the long tail of models coming through, and customers in-housing a lot of that intelligence themselves. As the application layer just gets bigger and bigger and bigger, we're just an index on that growth, and we've been around to be able to collect the demand.

  3. 1:55–5:57

    Why the app layer wins

    1. SG

      There's an existential question in here that I think everybody is continually asking: does the independent application layer get to exist at all versus the labs? Like, how do you... You have to believe this. Why do you believe it?

    2. TS

      Yeah. Look, I think it'd be a sad thing if it didn't exist in general, and I think that's, like, my... But, you know, sadness is fine. [laughing]

    3. EG

      Sad all the time.

    4. TS

      Oh, yeah. Sadness is fine. But-

    5. SG

      It happens, yeah

    6. TS

      ... that's not the reason why I think the application layer will exist. I think it will exist for a number of reasons. One is this idea that what is valuable to a company is the user signal that they can gather, that only they can gather. To the extent that that is encoded in a model, a lot of their business will be at risk. But to the extent that it is encoded in workflows, that is where they will be able to develop a moat. A good example of that is a company like Abridge, where the clinician's edits of the notes, what they do with those notes after the fact, and the thing that happens inside the EMR three steps down, you know, that becomes a workflow that only-

    7. SG

      Can you explain what Abridge does? This is one of your customers, yeah.

    8. TS

      Sorry. Abridge is an ambient scribe used by physicians in almost all hospitals in the US. I think Elad's an investor. Great company, great team, great product. They've got this very, very deep integration into hospitals, into clinician workflows. And my argument here is that it's actually very, very hard for a frontier model company to eat away at that, because they just don't have access to that user signal. What will happen over time is that the folks who do have access to that user signal can start to post-train models on that reward signal and get long-horizon agentic models running it. To the extent that that is possible, and that signal is differentiated, unique, and somewhat rare to get access to, there will be an application layer. A support company is another example of that, where a support task isn't one-shotted. Usually at a company like Baseten, when a ticket comes in, there's, what, one to ten, 20 actions that get taken, and that is where someone can develop a specialized model.

    9. EG

      So there, there's almost two versions of this then. There's the new companies like Abridge or Decagon or some of these other things-

    10. TS

      Yeah

    11. EG

      ... that you mentioned that are doing these new types of applications that are using AI, and they sell it to customers.

    12. TS

      Yep.

    13. EG

      The other is enterprises building things in-house-

    14. TS

      Yep

    15. EG

      ... or building their own models. What proportion of the market today do you think is, um, these new application companies-

    16. TS

      Yeah

    17. EG

      ... versus enterprises just adopting AI?

    18. TS

      Yeah.

    19. EG

      And how do you think that looks in a couple years?

    20. TS

      Yeah, that's a... You know, I think you asked me the same question two years ago-

    21. EG

      I, uh-

    22. TS

      ... on, on the podcast. [laughing]

    23. EG

      I hate to be repetitive.

    24. TS

      Um, I, it, it is crazy-

    25. EG

      At least I'm consistent.

    26. TS

      It's crazy that the answer is still the same. I think if you look by inference count-

    27. EG

      Yeah

    28. TS

      ... it'd be 99% the former.

    29. EG

      Yeah.

    30. TS

      And that kind of represents the scope of the opportunity here: the majority of the market hasn't come online-

  4. 5:57–7:55

    Serving frontier customers

    1. TS

      ... today.

    2. SG

      So if the majority of your customer base today is, as you described, the, the former like application companies, AI natives-

    3. TS

      Yeah

    4. SG

      ... the, um, fast-growing... I mean, some of them are at considerable scale now-

    5. TS

      Yeah

    6. SG

      ... like the Abridge, Cursor-

    7. TS

      Yeah. OpenEvidence

    8. SG

      ... OpenEvidences of the world.

    9. TS

      Yeah.

    10. SG

      Uh, what-

    11. EG

      You know, what do they teach you? What does that push the company to do? How do you think about serving them versus evolving for the enterprise?

    12. TS

      Yeah. Firstly, you just learn a lot by building with the companies at the greatest scale doing the most interesting things. We think of it two ways. The most obvious way is just to build for the highest scale: the customers that will push you the most technologically, and everything else kind of falls into place. I think Stripe's evolution as a company showed that. Stripe now serves so many enterprises, but twelve years ago that wasn't the case; they just built for the frontier and went with them. The second way we think about this is to build for companies that are serving enterprises. So yes, we don't serve the enterprise, but our customers serve enterprises. Abridge serves enterprises. OpenEvidence, Decagon, Writer, Gamma, Clay, all these companies serve enterprises en masse, and what we actually get is a translation of their requirements: "Hey, we need this sort of data retention. These are the models that need to be deployed. These are the types of GPUs or the latencies they're okay with. These are the model requirements, from a transparency perspective, that they care about." So I think that's the more nuanced answer: if you listen to what their needs are, we get a full translation of what the enterprise would require. I'd say that by serving companies like Abridge, OpenEvidence, and Latent Health, we're probably pretty well suited to go serve the healthcare system, given that they are selling

  5. 7:55–9:21

    Open source model mix

    1. TS

      to them.

    2. EG

      How, how much of a shift are you seeing in terms of the types of open source models that are being used? And so I think we've seen an evolution where two, three years ago, I think the main thing was kind of Mistral and then a few other things, and then Meta kind of came along with Llama, and then it kind of really shifted-

    3. TS

      Yep

    4. EG

      ... in terms of the most performant models are of Chinese origin in different ways. Do you see that sort of mix reflected in terms of what's being used by your customers?

    5. TS

      Yeah. I think customers, at least the customers we are serving, and these are some of the fastest-growing AI companies in the world, are very forward-thinking. They want to use the best models, and then they optimize. There is a subset of tasks, which I think is small today, where people really do start with cost.

    6. EG

      Mm-hmm.

    7. TS

      But everyone comes for capability first, because that's really where the economic growth is being unlocked, where the value's being delivered, and then they optimize. And so with that in mind, you name it: everything from GPT-OSS all the way to Moonshot's models, to DeepSeek, to Canopy or Orpheus, which are really good text-to-speech-

    8. EG

      Mm-hmm

    9. TS

      ... models. Customers generally want to use whatever's at the frontier. And the difference has just been that we have a lot more visibility into how to run these, and how to run these really well,

  6. 9:21–13:07

    Chinese models and geopolitics

    1. TS

      and secondly, that they're good now.

    2. EG

      There have been a number of different concerns raised about the use of Chinese models-

    3. TS

      Yeah

    4. EG

      ... in particular security, or is there something embedded in the models-

    5. TS

      Yeah

    6. EG

      ... or, you know, Trojan horses or other things. Um, A, do you think there's any real concern there? And B, you know, people often talk about how there should be like US counterweights to this. From a geopolitical perspective, do you think that's something that's legitimate or something we should be worried about, or how do you think about the-

    7. TS

      Yeah

    8. EG

      ... sort of origins of these models versus their uses?

    9. TS

      Yeah. Look, firstly, these models are fantastic. They're amazing. We work with these teams; they're truly awesome. And I could be wrong, but it is hard for me to see, if I network-bound these models, that they're magically going to be able to cross those network boundaries-

    10. EG

      Mm-hmm

    11. TS

      ... into data centers. And I've never seen any real evidence, except from some very early models that I think people picked up on very quickly, that there is some agenda or bias built into these. I do think, to some extent, there is importance to the US developing its own models. It would be a massive loss if there are five different labs in China creating open-source models and we're struggling to get one set up. So it's necessary. I also think it's inevitable. And around the DeepSeek moment a year ago, I remember someone saying to me, and I thought it was very well said, and the world's changed a lot since, but they said, "Hey, you know, we should kind of just forget-"

    12. EG

      Mm-hmm

    13. TS

      "... that this is a Chinese model. We should just act like this came from-"

    14. EG

      Mm-hmm

    15. TS

      "... from Meta and, and build, and build with that in mind."

    16. EG

      Mm-hmm.

    17. TS

      It's like, you know, I think you're kind of missing the forest for the trees. There are two scenarios, right? Either America never comes up with good open-source models-

    18. EG

      Mm-hmm

    19. TS

      ... and there's probably a fundamental problem there, or we will get there, and we need to be ready for that world.

    20. EG

      Yeah, that makes sense. It's interesting because, um, you know, like you, I, I think it's very important for the US to have a strong-

    21. TS

      Yep

    22. EG

      ... open source footprint here. Um, at least for now, it looks like effectively the Chinese government is subsidizing at least a large subset of these models, and that subsidy or surplus is effectively just being passed on to US enterprises who are adopting these models. In other words, it's a way for the Chinese government to effectively subsidize US enterprise-

    23. TS

      Yeah

    24. EG

      ... in an indirect manner.

    25. TS

      Yeah.

    26. EG

      And I think that's a little bit lost right now. Um, but you know, it's always interesting to weigh that against some of the other concerns that are raised. So I appreciate your, your comments on this.

    27. TS

      Yeah. Well, and I think the concern also just becomes: what happens if we aren't able to... If you think about the economics here: DeepSeek, by most accounts, is a very good model.

    28. EG

      Mm-hmm.

    29. TS

      You know, you could argue whether it's at the absolute frontier or not, but let's go back three months-

    30. EG

      Mm-hmm

  7. 13:07–14:22

    Custom inference dominates

    1. SG

      Actually, maybe you can just characterize the workload a little bit. Of the tokens being served on Baseten, how many are from custom models of some kind versus vanilla open source today?

    2. TS

      It is all custom. It's basically-

    3. SG

      Okay

    4. TS

      ... yeah, like it, it-

    5. SG

      So like 95% plus?

    6. TS

      90, 95%. Like and, and I think that's really cool, to be honest. Like look, we have, we have two businesses. We have, we have, we have three business... We have four... We have three businesses right now.

    7. SG

      [laughs]

    8. TS

      Um, and like one-

    9. SG

      Should we help you count? Yeah.

    10. TS

      No, no. So we have dedicated inference, which is basically custom model inference: your SLA is your SLA. Then we have shared inference: shared inference endpoints, shared SLAs. And then we have a training business. I'd say 95% of the tokens today are on the first business, and for almost all of them, the customer is making some modifications to the model with their own data, specialized for the use case. And what's even more important is they might be compiling it in different ways. No one is just running the vanilla open-source weights. You might be customizing it for quality, but you also might be customizing it for performance.

  8. 14:22–17:10

    Post training acquisition

    1. SG

      You made an acquisition of a research team a few months ago.

    2. TS

      Yeah.

    3. SG

      You've mentioned, uh, post-training customization. Uh, what was the rationale behind the acquisition? What is that team doing today?

    4. TS

      Yeah. So the rationale around the acquisition was that we are infrastructure and product people. We are product people, and now really good infrastructure people, and we didn't have much of a research capability ourselves. What we saw was the market moving heavily, and that we could accelerate the market itself with post-training resources, either productized or even just as resources for that market. Parsed was a company that was a Baseten customer; they were post-training models and running them on Baseten. I think what they realized was that they would eventually need to become an inference company. And what we realized was, hey, we really needed that expertise, because it represents a way for us to get closer to the customer earlier and to support them more, and it just made sense pairing them together. As I said in the opening here, as more and more post-trained models have come up, we've realized that the demand, whether for software loops to do post-training or for post-training expertise, is very high, and we're really, really investing in that. They're also a bunch of Australians; I like to think we had a bit of alpha there. But yeah, it's been fantastic. They're working with all sorts of customers. And it's also very interesting: we were doing a lot of research on the performance side and less so on the post-training side. As we've started to do a lot more research on the post-training side, you start to see how linked inference and post-training are, even when you think about stuff like quantization and when you should do that, how you train the model affects how you need to quantize for inference, and how paired these problems are-

    5. SG

      Mm-hmm

    6. TS

      ... has become very apparent. More and more we realize post-training and inference are two sides of the same problem. Inference will ideally beget more post-training: inference creates data, you do evals, you can now post-train on the reward function you found with those evals, and hopefully set up

  9. 17:10–18:35

    When to invest in custom models

    1. TS

      the entire loop.

    2. SG

      Plenty of folks from Anthropic and OpenAI, Sam, Greg, et cetera, have said in recent months that inference is super strategic: inference talent is strategic, capacity is strategic. So between that and post-training, these are very difficult capabilities to gather.

    3. TS

      Yep.

    4. SG

      Um, I imagine that lots of your customers go to you guys for advice on like how to do this progression of moving to custom models. Like what do you tell people about the life cycle and when they should invest in that?

    5. TS

      Yeah, I think it's: go prove to yourself with the best-in-class model that you have something worth optimizing. [laughs] You know, there was that meme from, like, two years ago. It feels like there's no GPUs-

    6. SG

      I was gonna say that

    7. TS

      ... pre-product market fit. It's like no post-training pre-product market fit is what [laughs] I...

    8. EG

      Yeah, yeah.

    9. TS

      Is what I'd say.

    10. SG

      So people that you're working with here are very, very at scale first-

    11. TS

      Yeah. They, they, they are-

    12. SG

      ... in the use case. Yeah

    13. TS

      ... they have a user signal that they know how to optimize, and they've shown that they can serve customer value and that they have something special around that value. And once you have that value, it's like, okay, now how can I do that better, faster, and cheaper? The idea being that if you need to be very good at customer support, you maybe don't need to be that good at coding, and a specialized model might be a better fit for that problem, and you can do it better, faster, cheaper.

  10. 18:35–22:25

    Supply crunch and data centers

    1. SG

      What about the capacity side? You started with, you know, unifying capacity across all the clouds and neoclouds.

    2. TS

      Yeah.

    3. SG

      How do you think about this when everybody keeps talking about a, a supply crunch and a multi-year supply crunch?

    4. TS

      Yeah. There's so much narrative around the supply crunch, and as much as we hear about it, I don't think people realize how bad it really is. There is very, very little slack compute available. We run pretty large clusters ourselves, and we run them at uncomfortably high utilization. I'm saying we're at, like, mid-nineties utilization-

    5. SG

      Mm.

    6. TS

      -most of the time. We sit in eighteen different clouds now: 90 clusters around the world across 18 different clouds. Initially, we built this technology to create one runtime fabric that spans all these different clouds and to abstract that away from our customers, as a way to think about reliability, latency, failover, all the things we think are going to be very important for mission-critical use cases. That same technology, our ability to get compute wherever humanly possible, has been really, really helpful for getting supply. What I mean by that is we can be introduced to a new provider in a different country and have it up and running with the whole Baseten inference stack-

    7. SG

      As part of the fabric. Yeah.

    8. TS

      Part of the fabric in half a day. Maybe less.

    9. SG

      I know.

    10. TS

      And that gives us enormous flexibility. But even for us, it is hard to grow. I'll say it: we have a full PM standing meeting for the company where we basically ask, how do we manage capacity for the demand right now? The second part that people don't really understand is that there are also a lot of suppliers right now that are kind of grifty. They haven't run data centers before. They don't understand SLAs, especially for inference. So even when there is capacity available... we run a lot of those, and we have redundancy, so it's fine. But there are probably like a dozen good clouds, and I'd put like three or four of them in the gold tier.

    11. SG

      Mm-hmm.

    12. TS

      And I think that just means that not only are we supply-crunched, we're supplier-crunched and operationally crunched, down to the people who can run these data centers as well. So-

    13. SG

      How far ahead can you actually buy capacity right now? In other words, is there any slack in the market if you buy two years ahead, or five years ahead?

    14. TS

      You mean like contract length, or actually like, "Hey, I want this in January '28"?

    15. SG

      Yeah, either one. Yeah.

    16. TS

      Yeah.

    17. SG

      I mean, it's more the, "I want this in January '28," or at least-

    18. TS

      Yeah

    19. SG

      ... I have some visibility into my future supply.

    20. TS

      Yeah. You could buy that, but you've also got to remember how quickly the market is moving. And that gets balanced somewhat against the fact that the H100 is such a great chip.

    21. SG

      Yeah.

    22. TS

      And then, you know, it's crazy: it's four, four and a half years old, and the price is still going up.

    23. SG

      Yeah.

    24. TS

      Maybe it has a useful life of nine years.

    25. SG

      Yeah.

    26. TS

      So, you know, that's good,

  11. 22:25–24:09

    Longer GPU Contracts

    1. TS

      but at the same time, yes, you can do that, but you're making a lot of bets-

    2. SG

      Yeah

    3. TS

      ... as part of that. And then in terms of... I think the big thing that's changed over the last six months is that the term length that people want has just gone up. So if you wanted a thousand, 1,024 B200s-

    4. SG

      Mm-hmm

    5. TS

      ... which, from a good cloud, right now you're not getting on less than a three-to-five-year contract-

    6. SG

      Mm-hmm

    7. TS

      ... right now, with probably a twenty to thirty percent TCV prepay. So what actually becomes important when acquiring capacity is that you need enough demand to serve it, but you also need a low cost of capital, which is actually changing the dynamic pretty significantly.
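To make the contracting math concrete, here is a rough sketch. The dollar rate and the exact term and prepay fraction below are illustrative assumptions, not figures from the conversation; only the 1,024-GPU cluster size, the three-to-five-year term range, and the 20–30% TCV prepay range come from Tuhin's remarks.

```python
# Sketch of reserved-GPU contract economics: total contract value (TCV)
# and the upfront prepay. The $/GPU-hour rate is a made-up assumption.
HOURS_PER_YEAR = 8760

def contract_economics(gpus: int, rate_per_gpu_hour: float,
                       years: float, prepay_fraction: float) -> dict:
    """Compute TCV and the upfront prepay for a reserved GPU contract."""
    tcv = gpus * rate_per_gpu_hour * HOURS_PER_YEAR * years
    return {"tcv": tcv, "prepay": tcv * prepay_fraction}

# 1,024 B200s at an assumed $5/GPU-hour, 3-year term, 25% TCV prepaid:
econ = contract_economics(gpus=1024, rate_per_gpu_hour=5.0,
                          years=3, prepay_fraction=0.25)
print(f"TCV: ${econ['tcv']/1e6:.0f}M, prepay: ${econ['prepay']/1e6:.0f}M")
# → TCV: $135M, prepay: $34M
```

Under these assumed numbers, the prepay alone is tens of millions of dollars due before a single token is served, which is why the cost of capital matters so much in the dynamic Tuhin describes.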

    8. SG

      Does that, does that impact how you think about going public as a company?

    9. TS

      Oh.

    10. SG

      Because arguably-

    11. TS

      Yeah.

    12. SG

      Yeah.

    13. TS

      I think you'd go sooner.

    14. SG

      Yeah, exactly.

    15. TS

      Yeah. I think you need... and I think there is demand for that. But one of the realizations we had recently, and we're software people, so we don't think like this all the time, is that our business has very interesting working capital-

    16. SG

      Mm

    17. TS

      ... um, requirements.

    18. SG

      Mm-hmm.

    19. TS

      And, as a result of that, it has very interesting financing-

    20. SG

      Yeah, yeah

    21. TS

      ... um, requirements, and we're not-

    22. SG

      Mm

    23. TS

      ... at least right now, we're not even going down the debt route.

    24. SG

      Yeah, I mean, in a sense, there's also things you could do in terms of debt or other structures that-

    25. TS

      Yeah

    26. SG

      ... yeah.

    27. TS

      Yeah. And yeah, I've learned a lot about debt-

  12. 24:09–26:07

    What Makes a Winner

    1. SG

      Yes

    2. TS

      ... um, re-recently.

    3. SG

      Given the, uh, supply crunch, uh, inference being one of, you know, the top couple markets-

    4. TS

      Yeah

    5. SG

      ... you're going after, you have plenty of people who understand this problem, and therefore some competition. How do you think about the factors that create a dominant player here, or a winning player? Is it, as you mentioned, cost of capital? Is it access to supply? Is it software? Is it demand?

    6. TS

      Yeah.

    7. SG

      Just being excellent, everything.

    8. TS

      Yeah. It, it, it's, um... Look, I, I think what's so interesting about inference is GP-

    9. SG

      Is it operations, like it's such a cloud?

    10. TS

      Yeah, yeah, yeah. I think so. Yeah.

    11. SG

      [laughs]

    12. TS

      I think GPUs-as-a-service is not sticky. That's been seen; customers generally just see it as a commodity. Inference with the software layer included is incredibly sticky. None of our top 30 customers have ever churned. We're talking like 400% annual NDR-

    13. SG

      Mm-hmm

    14. TS

      ... around our business. So it's very, very sticky, and I think that software layer is very important. The optimist in me says there's so much value in the software, and we will build the best software layer for inference that exists. But I think, as is becoming clear now, access to inference compute is-

    15. SG

      Yeah

    16. TS

      ... a strategic advantage, and I think that is the strategy even the labs are going after, which is: if we have all the compute, good luck running inference.

    17. SG

      Yeah, yeah. In a world of constrained compute, the number one thing to own is compute.

    18. TS

      Yeah.

    19. SG

      And so, you know, just owning it in and of itself is an asset, and I think people underappreciate that.

    20. TS

      Yeah. You can't, you can't make a good hot chocolate without milk and, you know, the, um, [laughs]

    21. SG

      Unless you're a vegan.

    22. TS

      Unless you're a vegan. [laughs]

    23. SG

      Yeah, yeah.

    24. TS

      No one wants a vegan inference.

    25. SG

      Yeah. [laughs]

    26. TS

      [laughs]

    27. SG

      Well, I gotta ask you, people might want, um... they might, they might want alternative milk, right?

    28. TS

      Yeah.

  13. 26:07–28:19

    Multi Chip Future

    1. SG

      So, okay, like when you, you... the H100 is a great chip. People, you know, want a B200, they want GB200. They want, of course, tons and tons of Nvidia. Um, when you think about making a bet, you know, several years in the future, do you believe that there's a, like, multi-chip world? Like, what do you, what do you think happens from a compute perspective, um, on chip side?

    2. TS

      Yeah. Um, I think, I think, you know, like diversification everywhere is a-

    3. SG

      Mm-hmm

    4. TS

      ... same way I want a world of many models. I think, you know, we want a world of many, most things. Um, and I think-

    5. SG

      You'd be sad if it didn't happen.

    6. TS

      Yeah. And I, and I, I think everyone would be sad. I, I, I will say, um, to some extent, which is, um... Yeah, and I think there will be inference-specific chips. I think you have like decode-specific chips, I think. And we're, we're looking at the-

    7. SG

      And Nvidia said this.

    8. TS

      Yeah, yeah. I mean, that was, that was the whole Groq, um, LPU thing. It's like, you know... I, I think, I think that is, um, very straightforward and, and makes sense. I think people really, really, really underestimate supply chain stuff from Nvidia, like how good they are at that. CUDA, how good CUDA is, the developer ecosystem around it. Um, and, you know, we... it-- the ability... Like, to me, like one of the most important things as an infrastructure company in this moment is how fast you can move, and you can move fastest with Nvidia today. Um, and I think that is the reality, and like, it just like given the scale that they operate at, it's, um, it's hard to, it's hard to see, um, a t- it's hard to see the, the... and I'm not saying it won't happen, like the short term, like in the next couple of years, how anyone's gonna be able to compete with that. Es-especially with, you know, so much of the other, the other players. Like, what you need, um, to be able to compete here is the ecosystem to form around you, and if you tie up all your supply with one buyer, which, you know, a bunch of the other chip providers have done, it's actually hard for that ecosystem to form. You know, like if you, if you think about if, if you're a big lab and you have a proprietary deal with one chip type where you get 90% of the supply, it's actually in your best interest to make sure you get 95% of the supply to everything that's built for you, and no one else can ever use

  14. 28:19–31:08

    Runtime Roadmap

    1. TS

      it.

    2. SG

      When you think about reacting to the market, um, what do you think is, like happening with the actual workloads that you have to go invest in, right? Like, obviously, code agents and long-horizon agents over time have become-

    3. TS

      Yeah

    4. SG

      ... a big deal. People talk a lot more about CPU compute, video inference is different.

    5. TS

      Yep.

    6. SG

      Um, I don't know if it's that sandbox that's like, what, what's important for you guys to invest in now?

    7. TS

      Yeah. Look, I, I, I think there's... for, for us, all the runtime stuff is obviously very important, and what that means is like what chips we run on, how we run, what kind of workloads we support. Like how... do we get very good at diffusion transformers? Yes. Um, coding agents need sandboxes. We should go build sandboxes. Um, there's all sorts of new speculation techniques to get faster inference. We need to do that. Um, even stuff like, um, KV cache-aware routing and, you know, that stuff's a bit old now, but like getting... continuing to be very good at that and, um, somewhat disentangling prefill and decode and starting to treat them as separate problems. I think that's, you know, something we are very focused on, and we're seeing massive gains there. That's at the runtime level. Um, I'd say, you know, beyond that, you know, everything we think about is how to create more of that loop between inference, post-training, because we think that just begets more inference. Um, and so, like we, we will build or partner on almost everything there. So, you know, we're gonna work with, you know, the best evals company in the world to make sure that's very well, well integrated, like Braintrust, um, into and around Baseten. You know, we will partner with or on the sandbox side, build, build the best sandboxes experience, um, that will exist. Um, and then we'll create the, the best training APIs to make it so continual learning becomes somewhat of a solved problem. It's not just like a discrete thing. That's, I think, the core Baseten product thesis is like how do we build that loop, and then everything out, out around that becomes how do we make sure that we can do everything we can to, um, ensure that gets as big as possible.
That's access to compute, that's on infrastructure, make sure we can get compute anywhere, make sure we have access to our own compute, um... And then I think it's all the primitives that come off of that just, ju-that just become incredibly, like, margin accretive both for us and our customers, um, which is, you know, stuff like, you know, sandboxes and, like, the, um, async batch inference. Like, how do we drive utilization by having a first-class batch inference experience? To me, this is, like, what an inference cloud looks like. It's like you are very good at inference, and then you, you start to do all the things tangential or that loop into inference and partner where necessary and build where necessary. Um, but we really do wanna own, like, start with that core inference story and then go down to unblock supply or create margin and go up the stack to unlock value.
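One runtime technique mentioned above, KV cache-aware routing, can be sketched in a few lines. This is a hedged illustration, not Baseten's implementation: the class name, replica names, and the fixed-prefix hashing heuristic are all invented for the example. The idea is simply to send requests that share a prompt prefix (a common system prompt, say) to the same replica, so that replica's cached KV blocks for the prefix can be reused instead of recomputed.

```python
import hashlib

class PrefixAffinityRouter:
    """Toy sketch of KV cache-aware routing (all names and heuristics here
    are illustrative assumptions): requests whose prompts share a leading
    prefix are routed to the same replica, so its KV cache can be reused."""

    def __init__(self, replicas, block_tokens=16):
        self.replicas = list(replicas)
        self.block_tokens = block_tokens  # granularity of prefix matching

    def route(self, prompt_tokens):
        # Hash only the leading block of the prompt: requests sharing a
        # system prompt or few-shot prefix land on the same replica.
        prefix = tuple(prompt_tokens[: self.block_tokens])
        digest = hashlib.sha256(repr(prefix).encode()).hexdigest()
        return self.replicas[int(digest, 16) % len(self.replicas)]

router = PrefixAffinityRouter(["replica-a", "replica-b", "replica-c"])
shared_system = list(range(40))          # stand-in for a tokenized system prompt
r1 = router.route(shared_system + [101])  # two requests with the same prefix...
r2 = router.route(shared_system + [202])
print(r1 == r2)                           # ...land on the same replica
```

Production routers also weigh per-replica load and cache occupancy; pure prefix affinity like this can hot-spot one replica when a single prefix dominates traffic.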

  15. 31:08–33:48

    Scaling Edge Cases

    1. SG

      What, uh, would surprise people about some of the issues you discover only at scale? I'll give you an example. I was surprised when, uh, you guys ran into scale limitations, like fundamental limitations with some of the hyperscaler products that you were consuming.

    2. TS

      Yeah.

    3. SG

      And I-- because I kind of think of, you know, the AWS GCPs of the world as supporting infinite scale.

    4. TS

      Yeah. I mean, I, I think you just... And, like, again, like you-- I think very, very large companies w-like, that run services at big scale, it's probably the same stuff.

    5. SG

      Mm-hmm.

    6. TS

      Is that all the edge cases, um, just become-

    7. SG

      You, you actually experience it.

    8. TS

      You experience them. And like, you know, and you-- I, like-- I'll give you a few examples here. Like you, you see, you know, you start seeing, you know, yesterday we had for the first time ever, we saw some kernel panic, um, and that only happened because some, um, uh, Fluent Bit worker was creating too many logs and it, and the scale was too big, and it was all into one node, and it was happening two, two, two times at the same time by two different workers. Um, so you see all, like, the systems level and kernel level problems. Um, but then you start to see... I think the, the craziest stuff is that you start to see with, with LLMs, um, that these runtimes are pretty immature. Even how we use KV cache is, you know, um, you know, probably a little less sophisticated than most people see-

    9. SG

      Mm-hmm

    10. TS

      ...than most people see. And we, we, we are starting to see the, the limitations of the current and the next set of primitives that need to be built from a scale security, a performance perspective. But I, I think it's really at the runtime level and the systems level and then... But the edge cases are, I'd say, a lot more systems level than they are LLM specific.

    11. SG

      What are the things that keep you up at night?

    12. TS

      Capacity. Um, I think, I, I, I think, you know-

    13. SG

      Quick answer. Yeah.

    14. TS

      Yeah. I think capacity. I, I think the other one is probably just this market's so big and it's so, like... It, it represents, um, a moment when you should be as aggressive as possible. Um, and you know, r-r-really, you know, we, we've grown a ton, obviously, over the last 12 months, the last few months, but the answer's always just go, you know, go bigger, go faster. And I think that's really, really fun. It's also a little exhausting, and it's also, like, we are, we are all in somewhat uncharted territory in terms of how fast and how big you can go and how things can get. But I, but I think the big one is compute. I think, like, there's no world in which there's enough compute to, you know, get the amount of, the am- the amount of value that we wanna get out of LLMs in the next five to ten years.

    15. SG

      Or we have to invent a lot of new stuff.

    16. TS

      Yeah.

  16. 33:48–36:44

    Hiring and Leadership

    1. TS

      To, yeah.

    2. SG

      Maybe if we just talk a little bit about, uh, what you're learning scaling, you know, 30x is like an aggressive thing to go through as a company. Uh, you've brought in a, a lot of, um, really amazing talent like, um, Danny and Samir and Stephen Dev, folks on both the, the technical and the, um, go-to-market side. Like, what do you, what do you think is working about how you are recruiting and scaling or, or what's your philosophy on that?

    3. TS

      We were very, very flat, like, until, I don't know, 12 to 18 months ago. I remember I went on a walk with Elad, actually, and Elad was like, "You just need leaders." And I-- and, and, like, it's actually, like, so contrary to, um, everything. You know, as engineers, you're like, "Oh, you, you-"

    4. SG

      It's all overhead.

    5. TS

      It's all, it's all... Everything is overhead.

    6. SG

      [laughs]

    7. TS

      Everything is overhead. Um, and I, and I-

    8. SG

      You once told me, I think, that you, you didn't... You're like, "Hey, Sarah, Sarah, what about we just have engineers instead of salespeople?"

    9. TS

      Yeah.

    10. SG

      Yeah.

    11. TS

      Yeah. Yeah. That. [laughs]

    12. SG

      [laughs]

    13. EG

      Everybody learns it.

    14. TS

      Everyone-

    15. EG

      It's all the same.

    16. TS

      We're all, we're all... But I remember, like, you know, you, you, you said it so clearly at the time, Elad, and I, and I think that's what we've noticed, which is, like, actually having a leadership team, um, that you can trust, that you can trust, um, is, is, is so important. I, I think the, the two or three things that I would say is, like, you want people where you can give them whole problems.

    17. EG

      Yeah.

    18. TS

      And so, like, you know, if you, if you are, if you feel like you are micromanaging, if you feel like you need-- if you feel like, you know, you, you have to be involved in everything, I think that's a bit of a cop-out as a founder because you're just like, "I just need to be involved in everything." It's like, no, you probably don't have the right people. Um, I think the second thing is, um, be very, very clear what you're optimizing for, because I think when you're very, very clear what you're optimizing for, the people on the... And, like, if it's something generic like, "We want the smartest, hardworking people," like, you can't do much with that. Like, with us, what we cared about was, hey, actually, we don't care about a lot of people who have done this before. We care about first prin-... people who are thinking from first principles. Work has to be, um, a high priority, but they also have to be very kind and nice and, you know, care about the collaborative environment. We don't have a hero culture. Um, you know, very low ego. Um, and you know, if you need, if you need a manager, like, um, it's probably not-

    19. SG

      [laughs]

    20. TS

      [laughs] It's probably not the right place, um, to be. But I think once you have, when you have that clear rubric, the, the people become very apparent that will fit into it, and the people that don't, um, fit into it also become very apparent. And I think what's more, like, we've hired amazing people like you mentioned, but I think what's a lot more interesting is, like, I think we've... We haven't had a ton of, like, turnover there unnecessarily. Like, pe-people tend to work, um, 'cause we, 'cause we have a ve- we are very clear on what we want early on. It took us a while to get there, though.

  17. 36:44–38:19

    Operations Pager Culture

    1. SG

      What about the idea of, like, an operations culture? You know, we were talking to Alyssa Henry about this, and she's like, "Well, the hard thing about cloud is actually just operations."

    2. TS

      Yeah.

    3. SG

      "I slept with a pager under my-"

    4. TS

      Yeah

    5. SG

      ... pillow for a decade." I don't think I've seen you detached from your Slack channel-

    6. TS

      Yeah

    7. SG

      ... for-

    8. TS

      My phone is buzzing right now.

    9. SG

      Yeah.

    10. TS

      The, um, the, um, the, um... It's like I, I'm getting an-

    11. SG

      [laughs] That's just not a strong one, yeah

    12. TS

      ... I, I, I'm, I'm, I'm getting, I'm getting anxious. So, um-

    13. SG

      And, and you've, you've been concerned before. Like, do people get it? Like, uh-

    14. TS

      Yeah

    15. SG

      ... you know, what is distinctive about that?

    16. TS

      I, I think, I think j- like, one, I think if you've worked at an infrastructure company... Like, we, we were once in a meeting with a bunch of AWS execs, and this was, you know, like, very senior AWS folks who all their pages went off multiple times-

    17. SG

      [laughs]

    18. TS

      ... um, during our 45-minute meeting. You know, like, it's a... I, I, I think, like, it's, it's, it's very much like just a cultural thing. Um, but yeah, like I, I don't... You know how, like, inference can't go down, and like, you know, we, you know, the, you, you, you, you learn to l- like, you know... What's this? Like, I think Amir, my co-founder, when his pager goes off, his seven-year-old said, "Is that a P0?"

    19. SG

      [laughs]

    20. TS

      [laughs] Oh. Oh, is that, is that, is that a P0? And so, you know, I, I think that is... You just have to get used to it, and that's the culture you live in, and it, it, it just changes the speed. Um, but also it's, it's, you know, becomes like a, you know, a cultural thing. A- I think it's very, very... It reject, it rejects people that don't fit into it very, very quickly.

    21. SG

      Like engineers who avoid PagerDuty.

    22. TS

      Yeah. You know, when we, when we have P0s, we're like, "Everyone on the call." Like, you know, like, there's been a joke that there may as well be a siren that goes off in the office-

  18. 38:19–40:41

    Efficiency Drives Demand

    1. SG

      [laughs]

    2. TS

      ... when, when, when there's an incident, so.

    3. SG

      So people have been talking ad nauseam in the AI community about Jevons paradox-

    4. TS

      Yeah

    5. SG

      ... um, where if you decrease the-

    6. TS

      Yeah

    7. SG

      ... cost of... This is, it's a, a... It's really a, well, question around price elasticity-

    8. TS

      Yeah

    9. SG

      ... and availability. If you decrease the cost of a good, say intelligence as a good, um, people actually consume more of it.

    10. TS

      Yeah.

    11. SG

      Um, like the personal or business ROI of it, the demand for it goes up-

    12. TS

      Yeah

    13. SG

      ... not down. Um, do you see this, and are you, are you working against yourself trying to make these models more efficient? Do people just use them-

    14. TS

      More?

    15. SG

      ... more or less?

    16. TS

      Yeah. I, I think you gotta think about this from a developer's perspective and a consumer perspective. I think, like... I think consumers just want the best answers and the, and the, and the best experience. That's somewhat, um, governed by, you know, more intelligence to some extent. I think when you go to the developers, from the developer's perspective, um, they would insert more intelligence if you make it cheaper. Like, that's, you know... And they will, they will, they will insert more intelligence anyway.

    17. SG

      Mm-hmm.

    18. TS

      But if you make it even cheaper, they'll, they'll insert a hell of a lot more intelligence. And you see this with agents and such. Agents are just longer running now, and I think that's what we have seen with the cost of inference going down, which is, you know, folks are just like, "Okay, we can, we can run this for longer. We can make it do a bit more work, and we'll get to a, um, a larger end." I think, like, if, like, compute scale is from an inference perspective as well, um, and you know, I think we are seeing that with almost all our customers, which is, you know, they either, they either start with like, "This is the quality of answer I, I need to get to, and this is the amount of inference, like, I need to do to get that," or, "This is the base level model that I can start with, uh, that I can work with to get there." And I think the more we drive down the costs, um, what they realize is, um, more intelligence just means better user experience-

    19. SG

      I just want a better answer

    20. TS

      ... better answers, better experiences, more dollars-

    21. SG

      More actions

    22. TS

      ... more dollars, even more revenue. So yeah, I think, I think inference going down just begets more infer- I, like, it, it is truly... Like, I think we're kind of in a world that is, you know, it is the last market, right? Like, even if there's AGI, all that's left is inference.
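The elasticity dynamic described in this exchange can be made concrete with a toy constant-elasticity demand model (the parameter values and the elasticity figure are illustrative assumptions, not market data): whenever demand for inference is price-elastic (elasticity above 1), cutting the unit price raises total spend, which is the Jevons-style behavior.

```python
# Toy constant-elasticity demand curve: tokens = k * price^(-elasticity).
# k and elasticity are illustrative assumptions, not measured market data.
def total_spend(price_per_mtok: float, k: float = 1000.0, elasticity: float = 1.5) -> float:
    tokens = k * price_per_mtok ** (-elasticity)  # volume demanded at this price
    return price_per_mtok * tokens                # spend = price x quantity

before = total_spend(10.0)  # at $10 per million tokens
after = total_spend(1.0)    # after a 10x price cut
print(after > before)       # True: demand grows ~31.6x, so total spend grows ~3.16x
```

With elasticity below 1 the same price cut would shrink total spend, which is the "this answer is enough" world Sarah asks about; the customer behavior described here corresponds to elasticity comfortably above 1.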

    23. SG

      Yeah. So you do not see in your customers a, a, like, this, this answer is enough and this action is enough dynamic?

    24. TS

      No, no.

    25. SG

      Yeah, it's gonna keep going for a long time,

  19. 40:41–42:34

    Concierge Everything Future

    1. SG

      it looks like.

    2. TS

      Yeah.

    3. SG

      How do you view all this kind of evolving towards the future? So basically, this is one of the... It, it seems like it's gonna be one of the biggest markets of all times.

    4. TS

      Yeah.

    5. SG

      We have this massive shift where we're moving from software and seats and digitization into actual intelligence, selling units of cognition-

    6. TS

      Yep

    7. SG

      ... selling agentic workflows. What does this all look like in a couple years? Like, what is your view of this future world?

    8. TS

      I think for the c- for consumers it's, it's the best possible thing, right? Like, every- everything is somewhat s- smarter. You know, your doc- you get better care 'cause your doctors have access to better-

    9. SG

      Mm-hmm

    10. TS

      ... um, better tools. Um, there's more... You know, like, there's all this stuff about there being less software engineers, and I think we just build more software.

    11. SG

      Mm-hmm.

    12. TS

      I think we just build a ton more software and, like, you know, like, I, I see... You know, we're not slowing down hiring on software engineers. We're just building-

    13. SG

      Mm-hmm

    14. TS

      ... more things. Um, and that, for the consumers, that just means better tools, more software, um, all those, all those good things.

    15. SG

      It's almost like everybody has their own team for everything, right? You have an agent which helps with your doctor. You have an agent-

    16. TS

      Yeah

    17. SG

      ... that helps you learn stuff. You have an agent that helps you organize your life.

    18. TS

      It's a concierge. It's concierge everything.

    19. SG

      Yeah. Yeah, concierge everything for everyone.

    20. TS

      Yeah. And, and, and, and, and I think, like, what that means, what that... So that's amazing. I think that's great. And I think the, the, the... And education, same thing. You have concierge education. Like, every... You get personalized access to everything. I think then you go one step back in how it, uh, um, affects developers, I think it's, you know, um... And, and companies, I think if you don't embrace this-

    21. SG

      Mm-hmm

    22. TS

      ... I think it's the extinc- extinction moment-

    23. SG

      Mm-hmm

    24. TS

      ... for, for a bunch of folks, which is like, you know, everything needs... And I, and I don't think that means that, you know, core design needs Figma. I think that's a thing. I, I, I think, like, what, what's more, what's more interesting is just like, you know, all these workflow and software companies need to figure out-

    25. SG

      Mm-hmm

    26. TS

      ... what are the intelligent, or intelligence-inserted, versions that, that drive all that user value for those end consumers that we talked about.

    27. SG

      Yeah,

  20. 42:34–42:55

    Conclusion

    1. SG

      very exciting. Thank you so much for joining us today.

    2. TS

      Yeah. Thanks, guys. [upbeat music]

    3. SG

      Find us on Twitter at nopriorspod. Subscribe to our YouTube channel if you wanna see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.

Episode duration: 42:57
