No Priors Ep 56 | With Baseten CEO and Co-Founder Tuhin Srivastava
- 0:00 – 1:19
Introduction
- SGSarah Guo
Hi, listeners. Welcome to another episode of No Priors. Today, Elad and I are catching up with Tuhin Srivastava, the CEO and co-founder of Baseten, which gives teams fast, scalable AI infrastructure, starting with inference. They're one of the players at the center of the battle heating up around AI compute. Welcome, Tuhin.
- TSTuhin Srivastava
Hi. Thanks for having me. Good to see you guys.
- SGSarah Guo
Let's start at the beginning. For any listeners who don't know, what is Baseten, and how'd you start working on it?
- TSTuhin Srivastava
Baseten is an infrastructure product: we provide fast, scalable AI infrastructure for engineering teams working with large models. Currently we're focused on inference, and we want to do a lot more after that. For the last four and a half years, which is a long time, we've been cutting our teeth trying to build this thing. It's been pretty rewarding over the last 12 months seeing the market show up and everyone get equally excited about AI infrastructure. We started this, honestly, because we thought ML was pretty cool in 2019, we thought it was going somewhere, and we wanted to build a picks-and-shovels business and solve the problems we were running into. The side note here is that I wanted to start a company with my friends.
- SGSarah Guo
You often say that Baseten isn't no code, it's efficient
- 1:19 – 4:11
Efficient code, not no code
- SGSarah Guo
code. Like, why does that difference matter?
- TSTuhin Srivastava
That wasn't always the case, I'd say. There were times when we had elements that were definitely a bit more no-code-y. What we've learned over the last three or four years is that code is just incredibly powerful and engineers want to write code. Even in its best form, you want to build really, really tight abstractions, but the ability to turn the knobs under the hood is very important, and no code makes that a lot harder. I don't think it removes it, but it makes it a lot harder. So what we do is build very strong, intuitive abstractions that make the easy things super easy but still make the hard things possible, so you can get a lot of value really quickly. But unlike a lot of other infrastructure products built over the last 10 years, we're trying to solve the graduation problem: we're able to support teams as they grow and scale.
- SGSarah Guo
And just to make it a little more visceral for our listeners: what are the types of applications that run on Baseten? What's the scale of the platform? Do you have a favorite application?
- TSTuhin Srivastava
Everything from tiny weekend side projects all the way to companies that are pretty AI-native. We've supported foundation model companies. We work with companies like Descript, where AI is very core to the product experience. We power a lot of the AI features Patreon has shipped. But some of the more interesting use cases, from my perspective at least, are the really small teams we're giving a lot of leverage to so they can ship things very quickly. A really good example might be a company like Bland AI, which is basically building an SDK for call centers, is how I'd describe it. They're able to ship models and co-locate workloads so they can get sub-300 or sub-200 millisecond responses without months and months of infrastructure effort. On the other hand, it's really exciting to see companies become AI-enabled, because that's where we see a lot of the value over the next decade. Look at a company like PicnicHealth, which has been around for a decade and is starting to do very interesting things with the corpus of data it has gathered over those 10 years. Their model is called PicnicGPT, and it extracts information from medical records. To me, those are the really exciting use cases: you're giving leverage to companies that are good at the domain they work in. Their model might be proprietary, their data might be proprietary, but the infrastructure doesn't need to be, and we can give them an easy way to deploy that stuff without many, many people-months.
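For a concrete picture of the "deploy without people-months" workflow Tuhin describes, here is a minimal sketch in the shape of Baseten's open-source Truss packaging format: a Model class with load and predict hooks, scaffolded by `truss init` and deployed with `truss push`. The specific pipeline and field names are illustrative assumptions; consult the current Truss docs for exact signatures.

```python
# model/model.py: a minimal, illustrative Truss model. "truss init" scaffolds
# this layout; "truss push" deploys it. The pipeline choice and the "prompt"
# input key are assumptions for the sketch, not Baseten's required schema.
class Model:
    def __init__(self, **kwargs):
        self._pipeline = None

    def load(self):
        # Runs once per replica at startup: load weights here so individual
        # requests don't pay the model-loading cost (this is what makes cold
        # starts matter).
        from transformers import pipeline
        self._pipeline = pipeline("text-generation", model="gpt2")

    def predict(self, model_input):
        # Called once per request with the parsed JSON body.
        return self._pipeline(model_input["prompt"], max_new_tokens=50)
```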
- SGSarah Guo
It's become, like,
- 4:11 – 6:12
Differences between training and inference workloads
- SGSarah Guo
in vogue to compare the size of your GPU cluster; people are spending a lot of money on GPUs. We hear about 600,000 H100 equivalents, and lots of venture rounds being raised, often to train large models in some domain or another, or for ever more expensive post-training. Are training and inference workloads different?
- TSTuhin Srivastava
Yeah, I think so. They just have very different SLAs for the customer. Things that matter for inference include your cluster being somewhat co-located with where you're doing your work, whereas for training that matters a bit less; it doesn't really matter where the training is happening as long as all your GPUs are together. Even the GPU clusters themselves differ: for training, networking on the racks themselves is a very, very important piece, whereas for inference it matters a little less, because you're doing a bit more on individual GPUs and less across GPUs. From a user perspective, there's a lot more workflow in inference that's repeated across customers. You guys work with a bunch of companies training models; the state of the art there really is "give me some SSH keys and let me go at it." Whereas with inference there's definitely repeated workflow, where people are trying to get similar things out of their inference infrastructure, whether that's version management, the way they deploy, hooking it up into CI/CD, cold starts, and so on. So it seems more repeatable today in how customers solve the problem. The hardware requirements are quite different, probably a little lower to some degree, but resiliency and reliability matter a lot more: downtime is unacceptable from an inference perspective, while nodes get terminated all the time in training.
- 6:12 – 8:48
AI product acceleration
- EGElad Gil
You were quite early to this market, and I think you folks pioneered a lot of the early ML infrastructure for these sorts of use cases and applications. What has been the most surprising thing, or what did you least expect, relative to how things evolved?
- TSTuhin Srivastava
Yeah. I can answer that question at two different altitudes. From a market perspective: Elad, you have some old writing that basically says markets are all that matter. (laughs) We've felt that very viscerally, in the sense that you can build all this cool stuff, and then when the market shows up, you feel it, and it really pushes the customer forward and the needs for your product forward. So that's one thing: the acceleration we saw through the end of 2022 and into 2023 definitely took us by surprise. I can be really honest and say that from 2019 to 2022 it was pretty quiet. We had happy customers, but the demand wasn't necessarily there. From a practitioner perspective, how fast some of these teams move has really shocked me. What is really clear in early-stage AI in general, and I think enterprises are waking up to this right now, is that speed is actually your number one advantage. Things are moving so fast that if you're not competing on speed, you're going to be left behind. So there's actually a lot of propensity to buy versus build: people are happy to buy technology, where in the past people were pretty hesitant to buy infrastructure. We talk with companies all the time where we think, oh, they probably have something built out, and they're a lot less sophisticated than you'd think while handling a lot more scale, so they need the infrastructure to support that. That's probably one of the bigger surprises: how fast people need to move to be relevant. The other thing is how GPU needs have evolved. At the end of 2022 most of our customers were using T4s and A10Gs. After that it changed to A100s, and now it's H100s. The compute needs aren't going down, they're only going up, especially as these services scale.
- SGSarah Guo
You guys were just highlighted for leading on
- 8:48 – 12:08
Leading on inference benchmarks at Baseten
- SGSarah Guo
the independent artificialanalysis.ai benchmarks for highest throughput and lowest latency serving. Congrats. Can you give us some intuition for what's driving that? What makes inference hard to run fast?
- TSTuhin Srivastava
There are multiple things that are quite difficult about inference. There are the workflow headaches, which we've talked about a bit and I could talk more about. There are the scalability and reliability bottlenecks. The stuff you're talking about is performance optimization, which is really: how long does one generation take? There's a lot of work and research being done to run generations as fast as possible, to get maximal throughput and minimal latency. Historically, and it's funny to say historically because I mean over the last six months (laughs), a lot of work has been done in the research community to make these things faster. Stuff like speculative decoding, which came out sometime in the last six months, started really being used. For us, what it comes down to is how well you can use the GPU, how you can scale across multiple GPUs, and honestly how up to date you can stay with the latest things happening in open source and research. We've partnered with NVIDIA and worked really closely with their LLM engine, TRT-LLM, and that's actually driven a lot of the performance gains; we've contributed to it, and we've forked it. But the hard thing there is that a lot of the optimization you're doing is pretty low-level, and there's no real abstraction, so you either have to learn how to use the open source very well or rewrite some of these kernels yourself. If you look at something like OpenAI, what do people complain about a lot of the time? Speed. That's probably one of the core performance advantages of open source: you can get these smaller models to run faster, and that will continue to be a massive focus for us going forward. On the benchmarks, it's pretty crazy how that's evolved as well. We've gone from state of the art being 90 tokens a second, then over 100, now over 200, and for some people we're talking 300, 400. That's going to continue to be a very important place to innovate. We think the performance will get somewhat commoditized over time, especially for language models, to be honest. More and more of this stuff should run locally, to some degree. But being on top of it and making sure we're attached to the state of the art: if we're not, it's an existential risk to the business, so we have to do it.
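As a rough illustration of the speculative decoding idea mentioned above: a small draft model proposes several tokens cheaply, and the large target model checks them, keeping the agreeing prefix. The toy below uses stand-in models and greedy acceptance, and verifies tokens sequentially for clarity; real implementations verify all draft tokens in one batched target forward pass, which is where the speedup comes from, and use rejection sampling to preserve the output distribution.

```python
# Toy sketch of speculative decoding with stand-in "models". Not any specific
# library's API; purely illustrative of the control flow.
import random

random.seed(0)

def target_next(ctx):
    # Stand-in for the large model's greedy next token (deterministic in ctx).
    return hash(tuple(ctx)) % 1000

def draft_next(ctx):
    # Stand-in for a small draft model that agrees with the target ~80% of the time.
    return target_next(ctx) if random.random() < 0.8 else random.randrange(1000)

def speculative_step(context, k=4):
    # 1. The draft model cheaply speculates k tokens ahead.
    proposed, ctx = [], list(context)
    for _ in range(k):
        t = draft_next(ctx)
        proposed.append(t)
        ctx.append(t)
    # 2. The target model verifies the proposals. (Sequential here for clarity;
    #    a real implementation scores all k positions in ONE forward pass.)
    accepted, ctx = [], list(context)
    for t in proposed:
        expected = target_next(ctx)
        if t == expected:
            accepted.append(t)          # draft matched: token came "for free"
            ctx.append(t)
        else:
            accepted.append(expected)   # first mismatch: keep target's token, stop
            break
    return accepted

context = [1, 2, 3]
while len(context) < 20:
    context.extend(speculative_step(context))
print(context)
```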
- EGElad Gil
How much optimization have you been seeing for other types of models? Diffusion models, text-to-speech models, other areas like
- 12:08 – 16:11
Optimizations for different types of models
- EGElad Gil
that. I'm just curious. There seem to be different types of optimizations happening across different foundation model types as well, so I was curious: what's state of the art there, and how are you thinking about it?
- TSTuhin Srivastava
I don't have the metrics on hand, but we're seeing, and also pushing, the limits there as well. Just yesterday, for example, we were able to get Whisper running faster. There's Whisper, there's FasterWhisper, and there's Whisper on TRT, which, again, is an NVIDIA thing. What we're seeing is more and more focus on bringing these experiences as close to real time as possible. One of our customers is a company called Gamma, AI-powered storytelling software. They use Stable Diffusion image models to generate images. Not waiting four or five seconds while still getting high-quality images is, again, core to their business, to making it very fast and easy to use. We're definitely seeing that from a customer-requirements perspective. Have we been able to juice as much there? Not yet, but I think we're getting there.
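For context on the Whisper variants mentioned, here is what running the FasterWhisper flavor typically looks like with the faster-whisper package; the model size, device, and audio file below are placeholders.

```python
# Hedged example: transcribing audio with the faster-whisper package, a
# CTranslate2 reimplementation of Whisper (pip install faster-whisper).
from faster_whisper import WhisperModel

# Placeholder settings; smaller models and CPU inference also work.
model = WhisperModel("large-v3", device="cuda", compute_type="float16")

segments, info = model.transcribe("meeting.wav", beam_size=5)
print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```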
- SGSarah Guo
Speaking of the applications driven by these models still being generally, startlingly slow, including the really amazing, capable ones, from ChatGPT to Cognition to things like Pika and Midjourney: in a way consumers haven't experienced in many years, we are waiting seconds for interactions. Is your view that this will change because the models are getting smaller? Because smaller models will get more powerful, people will do distillation, people will just get better at running these things, we'll get better hardware? What's the path to not waiting 10 seconds a generation, or two minutes a generation?
- TSTuhin Srivastava
If you look at the gains in running Mistral fast, let's take that as an example, the step-function gains come from a few things. The first is running on an H100 and not (laughs) an A100. As hardware gets better, you get this leg up, so as those prices come down and availability improves, that'll be one thing; hopefully the H200 comes out next and we do that again. The next piece is software optimization: stuff like continuous batching, dynamic batching, and speculative decoding, which make it easier to parallelize or batch-process a bunch of things, make each individual generation faster, or offload work to another model. There's lots happening in that space. I also think these models are just going to get smaller, to be honest. The really, really powerful small model that does one thing well is pretty exciting; that's why stuff like Ollama is very interesting. And if you saw, Sourcegraph announced that Cody is now running locally for a lot of customers. That's an actually exciting proposition. We have to figure out where we sit in that world; it's not necessarily amazing for cloud providers like ourselves, because we want everything to run on the cloud. But things do get smaller, things do get more efficient, and we get more distilled, sharper models. It's the unbundling of models.
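A toy sketch of the dynamic batching idea Tuhin mentions: buffer incoming requests and flush a batch either when it fills up or when the oldest request has waited long enough. The batch size, timeout, and stubbed model call are illustrative assumptions; production systems such as the continuous batching in TRT-LLM or vLLM are far more sophisticated.

```python
# Toy dynamic batcher: requests queue up; a background task flushes a batch
# when it is full or when MAX_WAIT has elapsed since the first request.
import asyncio
import time

MAX_BATCH = 8
MAX_WAIT = 0.02  # flush a partial batch after 20 ms

def run_model(batch):
    # Stand-in for one batched GPU forward pass over all queued prompts.
    return [f"output for {prompt!r}" for prompt, _ in batch]

async def batcher(queue):
    while True:
        first = await queue.get()  # block until at least one request arrives
        batch = [first]
        deadline = time.monotonic() + MAX_WAIT
        while len(batch) < MAX_BATCH:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), remaining))
            except asyncio.TimeoutError:
                break
        for (_, fut), out in zip(batch, run_model(batch)):
            fut.set_result(out)  # wake up each waiting caller with its result

async def infer(queue, prompt):
    fut = asyncio.get_running_loop().create_future()
    await queue.put((prompt, fut))
    return await fut

async def main():
    queue = asyncio.Queue()
    task = asyncio.create_task(batcher(queue))
    outputs = await asyncio.gather(*(infer(queue, f"prompt {i}") for i in range(20)))
    print(len(outputs), "requests served;", outputs[0])
    task.cancel()

asyncio.run(main())
```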
- SGSarah Guo
How and when do customers choose to deploy their own models, or open-source models, or fine-tuned open-source models, on their own infra versus using public model endpoints? What guidance would you give people?
- TSTuhin Srivastava
This is a general trend we're seeing, right? You go to OpenAI, you go to Anthropic, you have an API that works. Really quickly you're like, "That's too slow," or "That's too expensive," or "I don't need something that powerful." So we often see that if customers have a lot of money, they end up paying for private deployments in Azure. They
- 16:11 – 19:01
Internal vs open source models
- TSTuhin Srivastava
still have the slowness issue. Then they go to open-source models. When you're starting out with open-source models, you go to a shared endpoint, and there are lots of great shared-endpoint providers. But there are things that might matter to you that shared-endpoint providers can't give you. Firstly, you want your own SLAs; you don't want the noisy-neighbor problem where, if Elad and I have two competing apps and Elad's app gets slammed, my app slows down. My model calls still need to be fast. In that case, you might want dedicated compute. There's also data and privacy: maybe you don't want your data going through the same infrastructure that other folks' data is going through. And honestly, at some scale it can be cheaper to run it yourself. Those are the three things. Then, when you go to larger companies, it becomes a no-brainer. Shared endpoints don't really work for large companies, and they're definitely not going to work for enterprises. A lot of the time you'll want Mistral 7B not coming off a shared-endpoint provider, maybe not even from a dedicated endpoint; you might want it self-hosted within your own AWS or GCP. That's actually where we see a lot of the world going, especially for larger customers: they have pretty good compute deals, they have their own credit system or spend commits with the cloud marketplaces, and running it on their own infrastructure solves a lot of problems and has a massive cost advantage. So there are three stages: you start with shared inference endpoint providers, you go dedicated in the cloud, and for some customers that's not enough either and you go into your own cloud.
- EGElad Gil
One prediction on the enterprise side would be that if ChatGPT only launched 15, 16 months ago and GPT-4 just came out a year ago, then most enterprises are still in a planning cycle and haven't really adopted AI at any real scale, which means for infrastructure providers like Baseten there's a huge opportunity that's about to come, right? You're already cresting this giant wave, and the wave is about to get potentially 10 times bigger. What do you view as the timeline for really mass-scale enterprise adoption of AI, and where do you think things will be in terms of order-of-magnitude usage a year from now, two years from now? I'm just curious what your view of that future is.
- TSTuhin Srivastava
(laughs) It's a good question. We've been so wrong with timescales here. But what we see right now, when we go talk to enterprises, is that copilots especially, the codegen stuff, have already made their way into the enterprise. Most enterprises we talk to,
- 19:01 – 21:53
Timeline for enterprise scale
- TSTuhin Srivastava
when you ask them, "How advanced is your AI strategy?" will tell you, "Well, everyone uses Copilot," and that's their first big foray. The next piece is using OpenAI or Anthropic in some deeper way, and I think people are starting to experiment with that now. The fear, I'd say: someone was just telling me that Pfizer, I think it was, has earmarked tens of millions of dollars for ML or AI investment over the next 12 to 18 months. That's kind of frightening to me, to some degree. It's great, it's great for us, I love to hear it as a business builder. But it also means the pressure is coming from above, and that's the ML trap we fell into in 2018 to 2020, when CIOs were buying software that wasn't really attached to real user or product value. So I'd actually say we're probably overestimating how big enterprise will get in the next 12 to 18 months, but we're underestimating where it will be three to five years from now. And I think 10x is a pretty massive underestimate. We're working with a customer right now that has four engineers and, by the end of this year, will have mid-hundreds of thousands of dollars in annual spend. That's for one use case with some thousand users, and they're already cash-flow positive as a business, which is insane to me. I can't even imagine what these workloads are going to look like when we get to the enterprise. Take customer service and chatbots, probably the number one place people think about efficiency. The volume some of these customers have... my brother is the head of AI at Sunrun, a public company that does solar panels, and that's another really good example of a company where there's so much opportunity for AI to eat away at processes. The volume is just so much higher than what we're used to thinking about from a traditional business perspective, with harder requirements, which drives spend even higher.
- SGSarah Guo
Yeah, I think one reset in stance that people should have on spending here: traditionally, looking at software companies, you got really concerned as a software investor if your cost of goods sold was affected by a lot of data processing, basically, and so you had this expectation that your average
- 21:53 – 27:50
Rethinking investment in compute spend
- SGSarah Guo
SaaS business at maturity might have 80% gross margins. Now people understand that the training businesses have a big upfront capex investment that may or may not pay off. But one of the things you're pointing out is that you can actually spend a lot on inference, on the core intelligence, and end up with a very valuable business on the other end, perhaps with fewer people. People have talked about that shape of company, but they don't really think of it as the norm yet. At least in my portfolio, we're seeing more efficiency on headcount and a lot more compute spend. And I know that for some Baseten customers, that inference compute spend is actually one of the largest items on the P&L.
- TSTuhin Srivastava
We were working with a customer and, yeah, this was a challenge for us. We asked for an upfront payment for the year of compute, and (laughs) the CTO came back to us and said, "Hey, I appreciate what you're doing, but no. After payroll, this is our second-biggest expense for the year. We're not going to do that." I think that's somewhat indicative of how much spend there is here, and probably also of how big I personally think the market can be once we start seeing mass-scale adoption. But I think that's a good point, Sarah: this is somewhat of a reset. I don't know if this looks like a normal SaaS business. Actually, I know for sure it does not, and even traditional multiples are really, really hard to think about. What's crazy about it, though, is that the most efficient businesses, through markups and software optimization, can actually drive pretty healthy margins and still have these really aggressive consumption contracts, and I think that's rare. You guys see more businesses than I do, so chime in: is that unique to this industry, or where else have you seen it?
- EGElad Gil
It's been a while since I've seen so many companies ramping so quickly. And sometimes past ramps were fake. In the internet wave of the '90s, it was startups selling to each other, bootstrapping off venture capital, and then giant telecom buildouts on a five-year cycle caused this huge revenue uplift, and suddenly there was a glut and things dropped dramatically. Here it feels like things are ramping really fast off products that are a couple of months old, which sometimes suggests there isn't defensibility. So then the question becomes: okay, how do you build defensibility? What does that mean? How do things get commoditized? There are a couple of different markets where suddenly you see three companies all go from zero to five or zero to 10 million of revenue in a year. So there's enormous demand, but what does that mean? Do they cannibalize each other? Can three more entrants come in and do the same thing? What is the basis for competition in that market? I think there's a lot of that happening too, which, at least for me, is pretty unexpected, and I think it's just because we have such a big technology capability shift that suddenly you can do things you literally couldn't do a year ago. It's kind of amazing.
- TSTuhin Srivastava
I think it's particularly exciting when you apply that to... Elad, you've cut your teeth on a bunch of different healthcare initiatives. Healthcare is a really interesting place. Look at Nuance, if you remember Nuance Communications: they had a stranglehold over this market for years, and honestly it always looked like they were kind of struggling all along the way. And then Whisper comes along, and you see that market now, note-taking for medical settings, and it's insane how fast it's growing; there clearly is real value there. And then I think the question becomes, maybe this starts with a SaaS business again: okay, what's the workflow? There's power in the defensibility of the workflow you're powering.
- EGElad Gil
Yeah, it's a really good point. The other thing people often forget is that many markets are not monopolies. Many of them are oligopolies; that's payments with Stripe and Adyen and PayPal and all these things, right? So it's possible some of these market structures are oligopoly markets, and it's also possible some actually end up winner-take-all with some network effect or data effect. But if you look at some of these categories, healthcare, to your point, is a great example where it's deal-driven, right? You have large deployments with big customers and you lock them in for multi-year deals, and if you're actually able to lock down customer bases, you can fragment an oligopoly market more easily than if you have renewals every year. So part of it is just the contractual structure of a market. People really don't talk about that kind of stuff, but I think it's really fascinating to think about through the lens of what actually is a sustainable business in each one of these categories.
- TSTuhin Srivastava
Beyond healthcare, where are these businesses going to get disrupted? Financial services is an obvious one, and funnily enough, I think they've been at the cusp of this stuff in the past. I don't actually think they're on the cusp of it now as much as you'd think. If you think about a lot of the big-data stuff 10 years ago, the hedge funds were all over that, right? They were like, "Hey, there's alpha here." And I know that,
- 27:50 – 31:30
Defensibility in AI industries
- TSTuhin Srivastava
some of them are starting to look at large models and language models and whatnot, but I do feel like they're being a bit of a laggard in terms of their adoption of these things. It might actually be because they were so deep in the other sphere, in the old ML world, that it's hard for them to turn things around quickly.
- EGElad Gil
It's just such a different capability set. Old-school machine learning, where you're effectively doing regressions and pulling patterns out of data...
- TSTuhin Srivastava
Yeah.
- EGElad Gil
...is kind of different from some of the generative stuff in terms of what it does and what it can do for you. And one of the things I've been thinking about recently, related to what you just said, is: what are the companies that just don't care about this? That might be a very good thing, because they're defensible, right? In the era of AI eating everything, what can't be eaten? Maybe those are really good things to get involved with or work on, because you're not threatened by a dozen different new startups. That becomes really hard when you start to think about some of the demos we've seen over the last couple of weeks. (laughs) I thought my job was safe until yesterday.
- SGSarah Guo
Yes. One of my partners asked me how long I thought venture capital was going to last, in terms of agent-based automation taking over, because he was all excited that he got out of software engineering at exactly the right time, before his skills became useless.
- EGElad Gil
(laughs)
- SGSarah Guo
But I tried to give him a real answer, which is: at the early stages, a lot of the data doesn't exist, right? You'd have to capture real-world data. You have, increasingly, meetings over Zoom, but you'd want to capture a lot of information about people. So much of it is access, and information about who is leaving, who is a 100x engineer, entrepreneurial, product-oriented, and moves with velocity; a lot of that isn't collected today. There's no digital trail for it, so you have this big inputs problem. Then there's the decisioning. If you think about what's actually structurally predictable, maybe, if you had all that data, people are the most identifiable piece. Maybe you have a model doing continuous learning that can learn meta-structures, like Elad is talking about: oh, this is a market that operates as an oligopoly, these are the core drivers of differentiation, these are the dimensions of competition, and so on. But that feels quite hard when you're investing in a technology landscape that is always changing, right? You're kind of always out of distribution, and you don't have the data on the people, and I don't know how you'd make decisions on whether products are any good, because you'd have to have the customer's point of view, or you'd have to have taste. Maybe models will have taste, so...
- EGElad Gil
I think Sarah is saying that her job is defensible, which I think is what everybody says. (laughs) No, no, no, not my job.
- SGSarah Guo
I'm just saying, this morning I gave it a good think, and I was like, "Should we hire people to go work on this?" And I was like, "Nah, that doesn't feel like a tenable problem this year." But I commit: if it is feasible, we're going to be first. I just feel like it'll be another six months or so. (laughs)
- TSTuhin Srivastava
So that means it'll be next month. You're giving it away. (laughs)
- EGElad Gil
It's actually in about 20 minutes. There's a company launching.
- SGSarah Guo
Wait, I gotta ask you one more question, because you're working closely with NVIDIA, you work with the hardware providers. People are really interested in this topic now. Generally, do you believe in hardware heterogeneity? There are some strong opinions on this from Databricks and others.
- 31:30 – 35:47
Hardware and the chip shortage
- SGSarah Guo
And do you still see the same supply-demand dynamics around the GPU shortage from your customers that you did at the beginning of last year?
- TSTuhin Srivastava
I think the chips have just changed. Before, there was a shortage of everything; if you wanted anything except a T4, you could not get it. That was hard. Right now there are two things we're seeing. One: it's now possible for us to acquire compute pretty quickly. We have big spend, though, so everything should be conditioned on that; we're making long-term commitments with providers, and that gives us negotiating power. Two: customers are still struggling with availability for the most premium chips, whether that's H100s or A100s. Even when there is availability, you're often looking at three to six weeks of negotiating with cloud providers, and then your rep calling in favors in exchange for something or other. The number of times conversations with cloud providers were escalated just to get people moving faster is unreal. So customers are still running into it. I do think it's getting better, and it seems like it will go away. On heterogeneity: it's probably a good thing, right, if there's more than one provider of chips. That said, I personally think it's pretty overstated how easy it is to run something that looks like CUDA, or CUDA in some form, on an AMD chip. Seems like a challenge to me. There are people who believe they've got it done. But the amount of time we spend debugging bad nodes, in a setting where we have a lot of information about our existing infrastructure, is challenging as it is; I can't imagine what that would be on chips that are untested. So over time, yes, I hope so; that'd be great. Short-term, it's really hard for me to see how we make investments beyond NVIDIA, especially when there's a crunch on the other side from customers saying, "Hey, we need this now." The other thing is that Baseten doesn't give you raw access to a GPU; it's conditioned on the inference problem. If I gave you a GPU to use on Baseten, there's not that much you could do with it except inference, just in terms of the access control we give you. That being said, our customers do end up fiddling with NVIDIA drivers, installing new versions of PyTorch, and using custom Docker images. Running those things on hardware that isn't NVIDIA, and doing it in an abstracted way... it would be easier if I were building a service on top of these AMD chips and saying, "Take this service."
But our customers do interface with the GPUs in some way, and building an abstracted service across heterogeneous hardware just sounds very, very challenging. I'm sure there are people much smarter than me who could figure it out, but for us it would add a lot of complexity and really slow down how fast we can move. Still, it's a good world when there's more than one option.
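As one hedged illustration of the "debugging bad nodes" work Tuhin alludes to, a provider might smoke-test every visible GPU before putting it into rotation, for example by running a matmul on each device and flagging errors or non-finite outputs. The workload and checks below are assumptions for the sketch, not Baseten's actual health checks.

```python
# Minimal GPU smoke test: exercise each visible device with a half-precision
# matmul and report devices that raise errors or produce non-finite values.
import torch

def check_gpus(size=4096):
    if not torch.cuda.is_available():
        raise RuntimeError("no CUDA devices visible")
    bad = []
    for i in range(torch.cuda.device_count()):
        try:
            with torch.cuda.device(i):
                a = torch.randn(size, size, device="cuda", dtype=torch.float16)
                b = torch.randn(size, size, device="cuda", dtype=torch.float16)
                c = a @ b
                torch.cuda.synchronize()  # surface async kernel errors here
                if not torch.isfinite(c).all():
                    bad.append((i, "non-finite output"))
        except RuntimeError as e:
            bad.append((i, str(e)))
    return bad

if __name__ == "__main__":
    for idx, reason in check_gpus():
        print(f"GPU {idx} failed health check: {reason}")
```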
- EGElad Gil
How do you see customers thinking about build versus buy, and how do you think that's going to be evolving over time?
- TSTuhin Srivastava
Speed is the only thing that matters in this market. What that means is that if you spend time fiddling around with your infrastructure and your service goes down when you launch it, that really hurts the end-user experience. It's something you just don't want to mess around with, and we see this from customers. Even customers where AI is their core thing
- 35:47 – 38:26
Speed is the way to win in this industry
- TSTuhin Srivastava
understand that what's proprietary to them is models, data, and workflow. What's repeated is infrastructure. The number of times in the last 12 months we've heard "we're going to build this ourselves," which is very much how infrastructure engineers were thinking a decade ago, only for them to come back three months later saying, "I have a Docker dumpster fire somewhere," we can't count. It's our super-qualifier: "Have you built it yourself?" is how we know someone is going to be a great Baseten customer, because they empathize with the pain and they know this is going to let them move a lot faster. We had a company with a four-person AI infrastructure team that had been building this for two years migrate all their workloads over to Baseten in 36 hours. That's a pretty amazing case study for them: holy crap, we can now take these four engineers and focus on what is actually our competitive, differentiated advantage. And the way we think about our business, we don't need to take something off every single customer either. We offer options where you can run this in your own environment and pay us a license fee. The way that we, and honestly other providers, are doing this is very cost-effective. I think it's crazy, to be honest, to try to build this yourself, especially at the scale some of these customers are operating at. I was looking at one of the customers we chatted with this morning, who was tinkering around, and they said they were doing a billion tokens a day. This is a six-person chatbot company with a billion tokens a day going through them. To build the infrastructure that supports that, with the elasticity, reliability, and performance, and then build the product experience around it, is impossible for a six-person team. You should take away the things other people can do just as well, if not better. That's my take, and I think that's what the market is coming around to as well: speed is a competitive advantage, and you can buy that competitive advantage without a long build cycle.
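To make "a billion tokens a day" concrete, a quick back-of-the-envelope calculation: sustained throughput works out to roughly 11,600 tokens per second, and at the ~200 tokens/second per replica discussed earlier (an assumed per-replica figure, not a measured one), constant load alone would need on the order of 58 replicas, before accounting for bursty traffic.

```python
# Back-of-the-envelope math for "a billion tokens a day". The per-replica
# throughput is an assumption borrowed from the benchmark numbers discussed
# earlier in the episode.
TOKENS_PER_DAY = 1_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60

sustained_tps = TOKENS_PER_DAY / SECONDS_PER_DAY        # ~11,574 tokens/sec
replica_tps = 200                                       # assumed tokens/sec per replica
replicas_at_constant_load = sustained_tps / replica_tps # ~58 replicas

print(f"{sustained_tps:,.0f} tokens/sec sustained")
print(f"~{replicas_at_constant_load:.0f} replicas at constant load "
      f"(more at peak, since traffic is bursty)")
```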
- SGSarah Guo
This was an awesome conversation. Thanks for doing it, Tuhin.
- EGElad Gil
Thanks for joining.
- TSTuhin Srivastava
Thanks, Sarah. Thanks a lot.
- SGSarah Guo
Find us on Twitter @nopriorspod. Subscribe to our YouTube channel if you wanna see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find
- 38:26 – 38:32
Wrap
- SGSarah Guo
transcripts for every episode at no-priors.com.
Episode duration: 38:32