No Priors

No Priors Ep 56 | With Baseten CEO and Co-Founder Tuhin Srivastava

At a time when users are being asked to wait unthinkable seconds for AI products to generate art and answers, speed is what will win the battle heating up in AI computing. At least according to today’s guest, Tuhin Srivastava, the CEO and co-founder of Baseten, which gives customers scalable AI infrastructure, starting with inference. In this episode of No Priors, Sarah, Elad, and Tuhin discuss why efficient-code solutions are more desirable than no-code, the most surprising use cases for Baseten, and why all of their jobs are very defensible from AI.

Show Notes:
(0:00) Introduction
(1:19) Capabilities of efficient-code-enabled development
(4:11) Difference in training vs. inference workloads
(6:12) AI product acceleration
(8:48) Leading on inference benchmarks at Baseten
(12:08) Optimizations for different types of models
(16:11) Internal vs. open source models
(19:01) Timeline for enterprise scale
(21:53) Rethinking investment in compute spend
(27:50) Defensibility in AI industries
(31:30) Hardware and the chip shortage
(35:47) Speed is the way to win in this industry
(38:26) Wrap

Sarah Guo (host) · Tuhin Srivastava (guest) · Elad Gil (host)
Mar 20, 2024 · 38m · Watch on YouTube ↗

At a glance

WHAT IT’S REALLY ABOUT

Baseten CEO Explains Fast AI Inference, Infrastructure, And Enterprise Adoption

Baseten CEO Tuhin Srivastava discusses how his company provides fast, scalable AI inference infrastructure for teams deploying large models, emphasizing "efficient code" over no-code abstractions. He contrasts training vs. inference workloads, and explains why inference is more repeatable, SLA-driven, and reliability-sensitive. The conversation covers performance optimization (e.g., speculative decoding, TRT-LLM, continuous batching), GPU supply dynamics, and how customers move from shared endpoints to dedicated and self-hosted deployments. They also explore how AI will change enterprise software economics, build-vs-buy decisions, and what defensibility looks like in rapidly growing AI markets.

IDEAS WORTH REMEMBERING

5 ideas

Speed is the primary competitive advantage in early AI markets.

Teams that move fastest from idea to reliable deployment gain a significant edge, which pushes many to buy infrastructure instead of building it, so they can focus on proprietary models, data, and workflows.

Inference has distinct requirements from training and is more repeatable across customers.

Inference workloads demand low latency, high reliability, versioning, and CI/CD integration; these patterns recur across teams, making them well-suited for specialized platforms like Baseten.

Performance optimization is now existential for inference providers.

Staying near the state of the art in throughput and latency (via speculative decoding, batching, and engines like NVIDIA’s TRT-LLM) is critical, as users are highly sensitive to speed and open-source models can be made very fast.
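The episode name-checks these techniques rather than walking through them, but a toy sketch helps show why speculative decoding cuts latency: a cheap draft model proposes several tokens and the large target model only has to verify them. The sketch below is a hypothetical, greedy-decoding simplification, not Baseten's or TensorRT-LLM's implementation; draft_next and target_next are stand-ins for real model calls.

```python
# Toy sketch of speculative decoding (illustrative only).
# A cheap "draft" model proposes k tokens; the larger "target" model checks
# them and keeps the longest prefix it agrees with, so most loop iterations
# emit several tokens for roughly the cost of one target-model step.

from typing import Callable, List

def speculative_decode(
    draft_next: Callable[[List[int]], int],   # cheap model: context -> next token
    target_next: Callable[[List[int]], int],  # large model: context -> next token (greedy)
    prompt: List[int],
    max_new_tokens: int = 64,
    k: int = 4,                               # tokens drafted per verification step
) -> List[int]:
    tokens = list(prompt)
    produced = 0
    while produced < max_new_tokens:
        # 1) Draft k candidate tokens with the cheap model.
        draft = []
        ctx = list(tokens)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)

        # 2) Verify with the target model: accept the longest matching prefix.
        accepted = 0
        for i, t in enumerate(draft):
            if target_next(tokens + draft[:i]) == t:
                accepted += 1
            else:
                break

        # 3) Keep the accepted tokens, then take one token from the target so the
        #    loop always makes progress even when nothing is accepted.
        tokens.extend(draft[:accepted])
        tokens.append(target_next(tokens))
        produced += accepted + 1
    return tokens[: len(prompt) + max_new_tokens]
```

Real systems verify the draft against the target model's full token distribution with rejection sampling and batch the verification into a single forward pass, but the speedup still comes down to how often the draft model's guesses are accepted.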

Customers typically progress from shared endpoints to dedicated and then self-hosted deployments.

As scale, privacy needs, and SLA expectations grow, teams outgrow shared inference endpoints and either move to dedicated clusters or host models inside their own cloud accounts to gain control and reduce costs.

AI-native businesses may trade headcount for high ongoing compute spend.

For many Baseten customers, inference is the second-largest expense after payroll, yet the businesses can still be highly valuable due to leverage, markups, and software optimization that preserve healthy margins.

WORDS WORTH SAVING

5 quotes

We’re not no-code, we’re efficient code — strong abstractions that make the easy things super easy, but still make the hard things possible.

Tuhin Srivastava

Speed is actually your number one advantage in AI right now. If you’re not competing on speed, you’re going to be left behind.

Tuhin Srivastava

What is proprietary to them is models, data, and workflow. What is repeated for them is infrastructure.

Tuhin Srivastava

We had a four-person AI infra team that had been building for two years migrate all their workloads to Baseten in 36 hours.

Tuhin Srivastava

I think we’re overestimating how big enterprise will get in the next 12 to 18 months, but underestimating where it will be in three to five years.

Tuhin Srivastava

Baseten’s product philosophy: efficient code, strong abstractions, and graduation with customers
Inference vs. training workloads: technical, operational, and SLA differences
Performance optimization for LLMs and other model types (e.g., Whisper, Stable Diffusion)
Customer patterns: from OpenAI APIs to shared endpoints, dedicated infra, and self-hosted
Enterprise AI adoption timelines, spending patterns, and economic implications
Market structure, defensibility, and oligopolies in emerging AI verticals
GPU supply, hardware heterogeneity, and dependence on NVIDIA vs. alternatives

AI-generated summary created from the speaker-labeled transcript.
