
No Priors Ep 56 | With Baseten CEO and Co-Founder Tuhin Srivastava
Sarah Guo (host), Elad Gil (host), Tuhin Srivastava (guest)
In this episode of No Priors, hosts Sarah Guo and Elad Gil talk with Tuhin Srivastava, CEO and co-founder of Baseten, about fast AI inference, infrastructure, and enterprise adoption.
Baseten CEO Explains Fast AI Inference, Infrastructure, And Enterprise Adoption
Baseten CEO Tuhin Srivastava discusses how his company provides fast, scalable AI inference infrastructure for teams deploying large models, emphasizing "efficient code" over no-code abstractions. He contrasts training vs. inference workloads, and explains why inference is more repeatable, SLA-driven, and reliability-sensitive. The conversation covers performance optimization (e.g., speculative decoding, TRT-LLM, continuous batching), GPU supply dynamics, and how customers move from shared endpoints to dedicated and self-hosted deployments. They also explore how AI will change enterprise software economics, build-vs-buy decisions, and what defensibility looks like in rapidly growing AI markets.
Key Takeaways
Speed is the primary competitive advantage in early AI markets.
Teams that move fastest from idea to reliable deployment gain a significant edge, which pushes many to buy infrastructure instead of building it, so they can focus on proprietary models, data, and workflows.
Inference has distinct requirements from training and is more repeatable across customers.
Inference workloads demand low latency, high reliability, versioning, and CI/CD integration; these patterns recur across teams, making them well-suited for specialized platforms like Baseten.
Performance optimization is now existential for inference providers.
Staying near the state of the art in throughput and latency (via speculative decoding, batching, and engines like NVIDIA’s TRT-LLM) is critical, as users are highly sensitive to speed and open-source models can be made very fast.
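To make the speed techniques above concrete, here is a toy sketch of speculative decoding: a cheap draft model proposes several tokens ahead, and the expensive target model verifies them in a single pass, accepting the longest correct prefix. The "models" below are deterministic stand-ins for illustration only, not real LLMs or the TRT-LLM implementation.

```python
def target_next(prefix):
    # Stand-in for the large, slow target model: deterministic next token.
    last = prefix[-1]
    return 9 if last == 4 else (last + 1) % 10

def draft_next(prefix):
    # Stand-in for the small, fast draft model: usually agrees with the
    # target, but is wrong whenever the last token is 4.
    return (prefix[-1] + 1) % 10

def speculative_decode(prompt, n_tokens, k=4):
    """Generate n_tokens, counting how many target-model passes were needed."""
    out = list(prompt)
    target_calls = 0
    while len(out) - len(prompt) < n_tokens:
        # Draft proposes k tokens autoregressively (cheap calls).
        ctx = list(out)
        proposal = []
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # Target verifies all k proposals; in a real system this is one
        # batched forward pass, so we count it as a single target call.
        target_calls += 1
        ctx = list(out)
        for t in proposal:
            expected = target_next(ctx)
            if t == expected:
                ctx.append(t)
            else:
                # First mismatch: keep the target's own token and stop.
                ctx.append(expected)
                break
        out = ctx
    return out[len(prompt):len(prompt) + n_tokens], target_calls

tokens, calls = speculative_decode([0], 8, k=4)
```

Plain autoregressive decoding would need one target pass per token (8 here); because the draft is usually right, the speculative loop produces the identical output in 3 target passes, which is where the latency win comes from.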
Customers typically progress from shared endpoints to dedicated and then self-hosted deployments.
As scale, privacy needs, and SLA expectations grow, teams outgrow shared inference endpoints and either move to dedicated clusters or host models inside their own cloud accounts to gain control and reduce costs.
AI-native businesses may trade headcount for high ongoing compute spend.
For many Baseten customers, inference is the second-largest expense after payroll, yet the businesses can still be highly valuable due to leverage, markups, and software optimization that preserve healthy margins.
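As a back-of-the-envelope illustration of the margin point above (all numbers are hypothetical, not actual Baseten customer figures): even when inference is a major line item, software optimization that cuts compute spend flows directly into gross margin without touching price.

```python
def gross_margin(revenue, compute_cost, other_cogs):
    # Gross margin as a fraction of revenue.
    return (revenue - compute_cost - other_cogs) / revenue

# Hypothetical per-seat monthly unit economics.
revenue_per_seat = 30.0   # price per seat
compute_per_seat = 8.0    # inference spend per seat
other_cogs = 3.0          # hosting, support, etc.

base = gross_margin(revenue_per_seat, compute_per_seat, other_cogs)
# Halving compute spend (e.g. via batching or a smaller model) lifts
# margin with no change to pricing.
optimized = gross_margin(revenue_per_seat, compute_per_seat / 2, other_cogs)
```

Under these assumed numbers, margin rises from roughly 63% to roughly 77%, which is the "software optimization preserves healthy margins" dynamic in miniature.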
Enterprise AI adoption will likely be overestimated in the short term, underestimated long term.
Many enterprises currently count copilots and basic OpenAI integrations as their AI strategy; true large-scale, domain-specific deployments will take years to mature but could dwarf today’s usage by an order of magnitude or more.
Building AI infra in-house is often a costly distraction for most teams.
Srivastava sees teams with years of internal infra work and dedicated engineers migrating to Baseten in hours or days, freeing those engineers to work on differentiated capabilities instead of maintaining "Docker dumpster fires."
Notable Quotes
“We’re not no-code, we’re efficient code — strong abstractions that make the easy things super easy, but still make the hard things possible.”
— Tuhin Srivastava
“Speed is actually your number one advantage in AI right now. If you’re not competing on speed, you’re going to be left behind.”
— Tuhin Srivastava
“What is proprietary to them is models, data, and workflow. What is repeated for them is infrastructure.”
— Tuhin Srivastava
“We had a four-person AI infra team that had been building for two years migrate all their workloads to Baseten in 36 hours.”
— Tuhin Srivastava
“I think we’re overestimating how big enterprise will get in the next 12 to 18 months, but underestimating where it will be in three to five years.”
— Tuhin Srivastava
Questions Answered in This Episode
How should a small team decide when it’s the right time to move from shared inference endpoints to dedicated or self-hosted infrastructure?
What practical steps can AI product teams take to systematically improve latency and throughput without deeply specialized infra engineers?
How will the economics of AI-native businesses evolve as hardware gets cheaper, models get smaller, and local deployment becomes more viable?
In which verticals will AI infrastructure and model serving likely become most defensible, and where will they be quickly commoditized?
What are the realistic paths to hardware heterogeneity beyond NVIDIA, and how should startups hedge against potential supply or pricing shocks in GPUs?
Transcript Preview
Hi, listeners. Welcome to another episode of No Priors. Today, Elad and I are catching up with Tuhin Srivastava, the CEO and co-founder of Baseten, which gives teams fast, scalable AI infrastructure, starting with inference. They're one of the players at the center of the battle heating up around AI computing. Welcome, Tuhin.
Hi. Thanks for having me. Good to see you guys.
Let's start at the beginning. For any listeners who don't know, uh, what is Baseten, and how'd you start working on it?
Baseten is an infrastructure product, so we provide fast, scalable AI infrastructure for engineering teams working with large models. Currently, we're focused on inference, and we want to do a lot more after that. But, you know, for the past, say, four and a half years actually... Oh, that's, that's a long time. For, for last four and a half years, we've been, you know, cutting our teeth and, uh, trying to build this thing. Um, I think it's been pretty rewarding over the last 12 months seeing the market kind of show up and, you know, everyone get equally excited about AI infrastructure. Um, we, we started this honestly because firstly, you know, we thought ML was pretty cool in, in 2019. We thought it, it, it was going somewhere, um, and we wanted to build a picks-and-shovels business and kind of solve the problems that we were running into. Um, I think the side note here is that I wanted to start a company with my friends.
You often say that Baseten isn't no code, it's efficient code. Like, why does that difference matter?
That wasn't always the case, I'd say. I, I, I'd say, like, you know, there was times when we had elements which were definitely a bit more no-cody. I think what, what we've learned over the last three or four years is, you know, um, code is just incredibly powerful and engineers want to write code. Even in its best form, you know, you want to build, like, really, really tight abstractions, but I think the, the ability to turn the knobs under the hood is, is very, very important, and I think no code kind of, um, makes that a lot harder. I don't think it removes it, but it makes it a lot harder. So what we, what we do is just build, um, very strong intuitive abstractions that try to make the easy things super easy and, but still make the hard things possible. So, you know, you can get a lot of value, um, really quickly. But I'll... I'd say unlike a lot of other infrastructure products that have been built over the last 10 years, we're trying t- to solve against a graduation problem, which is that, you know, that we're able to support teams as they grow and scale.
And just to sort of make it a little bit more visceral for our listeners, like, what, what are the types of applications that run on Baseten? Like, what's the scale of the platform? Do you have a favorite application?