- 0:00 – 0:38
Introduction
- Elad Gil
(music plays) So today on No Priors, we have Ankur Goyal, the co-founder and CEO of Braintrust. Ankur was previously vice president of engineering at SingleStore and was the founder and CEO of Impira, an AI company acquired by Figma. Braintrust is an end-to-end enterprise platform for building AI applications, providing companies like Notion, Airtable, Instacart, Zapier, Vercel, and many more with evals, observability, and prompt development for their AI products. And Braintrust just raised $36 million from Andreessen Horowitz and others. Ankur, thank you so much for joining us today on No Priors.
- Ankur Goyal
Very excited to be here.
- 0:38 – 3:05
Ankur’s path to Braintrust
- Elad Gil
Can you tell us a little bit more about Braintrust, what the product does, and how you got started in this area and in AI more generally?
- Ankur Goyal
Yeah, for sure. I have been working on AI since what one might now think of as ancient history. Back in 2017, when we started working on Impira, things were totally different, but it was still really hard to ship products that work. So we built tooling internally as we developed our AI products to help us evaluate things, collect real user data, use it to do better evals, and so on. Fast-forward a few years, Figma acquired us, and we ended up having exactly the same problems and building pretty much the same tooling. I thought that was interesting for a few reasons, some of which you pointed out, by the way, when we were hanging out and chatting about stuff. One, Impira was pre-LLM and my time at Figma was post-LLM, but the problems were the same, and I think there's some longevity implied by that: problems that existed pre-LLM are probably going to exist in LLM land for a while. And the second thing is that, having built the same tooling essentially twice, it was clear there's a pretty consistent need. I have very fond memories of the two of us hanging out and talking to a bunch of folks, like Brian and Mike at Zapier and Simon at Notion and many others. I've been in a lot of user interviews over time, and I've never seen anything resonate like the early ideas around Braintrust and everyone's desire to have a good solution to the eval problem. So we got to work and built an honestly pretty crappy initial prototype. But people started using it, and just over a year later Braintrust has iterated from people's feedback and complaints and ideas into something I think is really powerful. And yeah, that's how we got started.
- Elad Gil
Well, yeah, I remember in the early conversations we had around the company, or the idea I should say, it was meant to even potentially be open source. And it was the first time I was involved with a customer call where people would say, "We don't want you to open source it." Which I found really surprising. People really pushed on, "We want this to exist for a long time. We want to be able to pay for it." So there was this really interesting market pull. Why
- 3:05 – 5:46
Braintrust’s solution
- Elad Gil
do you think there was so much interest or need or demand for this? What does Braintrust do, and how does that really impact your customers?
- Ankur Goyal
You know, many of our early customers had actually built internal versions of Braintrust before we engaged with them. A couple things came out of that. One is it helped them gain an appreciation for how hard the problem is. Evals really sound easy: oh, it's just a for loop, and then I console.log as I go and look at the results. But the reality is, the faster you can eval, and the faster you can look at eval results, which start to get really complicated as you start doing things with agents and so on, the faster you can actually iterate and build stuff. It is actually a pretty hard problem to do evals well, and many of our early customers, who were kind of the pioneers in AI engineering, had learned that the hard way. I think the other thing is that folks like Brian, for example, saw that AI would be pervasive technology throughout the whole org, not just a project that Brian might babysit and work on with one team. Having a really consistent, standardized way of doing things was really important. I remember early on, Brian pointed me to the Vercel docs, and he said, "One of the things I love about this is that when new engineers are building UI now, they read these docs and they learn the right way to build web applications. And you have that opportunity with AI." I found that really motivating, and it really influenced how we think about things.
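For concreteness, the "it's just a for loop" version of evals looks roughly like this minimal TypeScript sketch; `runTask` and the test cases are hypothetical stand-ins for your LLM call and data:

```ts
// The "evals are just a for loop" version: run each case, eyeball the output.
type Case = { input: string; expected: string };

async function naiveEval(
  cases: Case[],
  runTask: (input: string) => Promise<string>
) {
  for (const c of cases) {
    const output = await runTask(c.input);
    // This is where it breaks down: with agents, long outputs, and hundreds
    // of cases, console.log stops being a usable way to inspect results.
    console.log({ input: c.input, expected: c.expected, output });
  }
}
```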
- Elad Gil
That makes a lot of sense. So I guess if you're swapping out GPT-4 for Claude, or you're making a change in model, or you're changing a prompt, it helps you really understand how that propagates: what sets of outcomes for users are better, what sets are worse, and how to troubleshoot them. And then it feels like you've built a whole other series of products around that to really support it.
- Ankur Goyal
One of the biggest things when you're building AI products is this uncertainty about quality. You might, for example, get really excited about a feature, build a prototype, it works on a few examples, you ship it to some users, and you realize it actually doesn't work very well. It's just really hard to go from that prototype to something that systematically works in an excellent way. I think what we have helped companies do is demystify that process. Instead of having a bunch of anxiety about, "Hey, I shipped something. I don't know if I'm ever going to get it to work well," you can implement some evals in Braintrust and then turn the crank and get really, really good outputs.
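As a rough illustration, Braintrust's TypeScript SDK structures an eval as data, a task, and scorers. A minimal sketch along the lines of its documented quickstart; the project name, dataset, and task body here are placeholders for a real use case:

```ts
import { Eval } from "braintrust";
import { Levenshtein } from "autoevals";

// A minimal Braintrust eval: a dataset, a task, and one or more scorers.
Eval("Greeting bot", {
  data: () => [
    { input: "Foo", expected: "Hi Foo" },
    { input: "Bar", expected: "Hi Bar" },
  ],
  task: async (input: string) => `Hi ${input}`, // swap in your LLM call
  scores: [Levenshtein], // string-similarity scorer from autoevals
});
```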
- Elad Gil
You know, you work with a lot of
- 5:46 – 7:58
AI tooling trends
- Elad Gil
the companies that I feel are the earliest adopters of AI into their own products. In other words, they've actually shipped products with AI in them, and they're that first wave: Notion, Airtable, Zapier, Vercel, people like that. What proportion of your customers do you think are adopting some of the things people are talking about a lot, like fine-tuning or RAG or building agents? Do you think that's a very common set of things, or is it just hype? Because I think you have a very clear picture of at least one segment of the enterprise market in terms of what people are actually doing.
- Ankur Goyal
Unambiguously, people are doing RAG. That one is simple and obvious; probably around 50% of the use cases we see in production involve RAG of some sort. Fine-tuning is interesting. I think a lot of people think of fine-tuning as an outcome, but it's actually a technique, and the outcome people are looking for is automatic optimization of their workloads. Fine-tuning is one way of doing that, and it is a very, very difficult way of automatically optimizing your use case. With our customers, we have re-benchmarked fine-tuning on their workloads every two to three months, I would say. There was a period of time after GPT-3.5 fine-tuning came out but before GPT-4o was easy to access. Now it's extremely cheap, actually, to run GPT-4o, but there was this period when it was really hard to get GPT-4o access, and GPT-3.5 fine-tuning was the only lever for some use cases to improve quality. But since then, honestly, I think almost all, if not all, of our customers have moved off of fine-tuned models onto instruction-tuned models, and they're seeing really good performance. We even talked about that early on. I remember when we were thinking about Braintrust, we thought, "Oh boy, everyone's going to need to use this to fine-tune models," and that was one of the first features we were thinking about building. And no one's really doing it.
- Elad Gil
Could you explain,
- 7:58 – 8:57
Instruction tuning vs. fine-tuning
- Elad Gil
just for the listeners, the difference between instruction-tuning and fine-tuning?
- Ankur Goyal
Yeah. I mean, I think it's kind of like the difference between writing Python code and creating an FPGA or something. With instruction-tuning, all you do is modify the prompt to include examples of how the model should behave. In some ways, it's actually very similar to fine-tuning: you're collecting data that guides how the model should behave, and then you're feeding it into a process that nudges the model towards behaving that way. Fine-tuning is a much lower-level thing, where you're actually modifying or supplementing the weights in a model so that it learns from those examples. And because it's so much lower level, it tends to be a lot slower and more expensive. There are a lot of ways you can injure the model while you're fine-tuning and actually make it worse on real-world use cases. So it's just a lot tougher to get
- 8:57 – 10:42
Open-source AI adoption
- Ankur Goyal
right.
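To make the contrast concrete, here is a minimal sketch of the prompt-level approach described above: steering the model with in-context examples rather than touching any weights. It uses the OpenAI Node SDK; the classification task and model choice are illustrative assumptions:

```ts
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// The same labeled pairs that would be training data in a fine-tuning job
// go straight into the prompt as in-context examples.
const completion = await client.chat.completions.create({
  model: "gpt-4o", // model choice is illustrative
  messages: [
    { role: "system", content: "Classify the support ticket as 'billing' or 'bug'." },
    { role: "user", content: "I was charged twice this month." },
    { role: "assistant", content: "billing" },
    { role: "user", content: "The export button crashes the app." },
    { role: "assistant", content: "bug" },
    // The actual input:
    { role: "user", content: "My invoice shows the wrong plan." },
  ],
});
console.log(completion.choices[0].message.content); // expected: "billing"
```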
- Elad Gil
And then do you see a lot of open source adoption, or mainly people using proprietary models? And are there other early technologies you see people adopting right now?
- Ankur Goyal
We are very close to a watershed moment for open source models. We saw the watershed moment for Anthropic when Claude 3 came out, and especially Claude 3.5 Sonnet has really taken off. We are very close to that, I think, with Llama 3.1, but we're not there yet. So we see very limited practical adoption of open source models, but more interest than ever.
- Elad Gil
And I think a lot of what you're seeing is also just things that are in production, right? To some extent, there's a lot of discussion in developer communities around what people are using and adopting and playing with, and then you're really focused on the market of enterprises that are shipping AI products. Obviously it can be used by hackers and developers as well, but a lot of your usage is people who have things in production, so it kind of reflects the state of the world for live systems at scale.
- Ankur Goyal
I am a developer, and I love open source software, and I have a very difficult time with the fact that every time I use an OpenAI model, I'm paying a fee per token. But then I actually look at the numbers, and of course I've looked at them with our customers too, and in some cases it's just negligibly cheap, and in the cases where it's pretty expensive, the ROI is actually really high. Most of our customers are really, really focused on providing the best possible user experience for their customers and the fastest iteration speed for their developers, and everything else is secondary. So I think until open source can really move the needle on one of those two axes, it's going to be tough for it to be adopted broadly.
- Elad Gil
The other place you've spent a lot of your career is
- 10:42 – 14:45
Future of data infrastructure and synthetic data
- Elad Gil
on databases and data infrastructure and things like that. You were the VP of engineering at SingleStore, which I think was renowned for having an exceptional database-centric team. How do you think about the data infrastructure that exists for the AI world today? What's needed? What's lacking? What works well? What doesn't?
- Ankur Goyal
The shift is that people have hoarded lots and lots of semi-useful data in data warehouses. Prior to LLMs, there was actually this whole industry around AI where companies like DataRobot, for example, would come in and help you train models on the proprietary structured data you had collected in your super proprietary data warehouse. I think the big insight, the crazy, non-intuitive thing about LLMs, is that something trained on the internet outperforms what an enterprise can produce with its own data trained on a data warehouse. And not only is the nature of the data processing problem different, but how we think about the value of data is very, very different. Just hoarding data about your claims history or transaction history might not actually be that useful. The real question is, how do you construct a model that is really good at reasoning about the problems you're working on? I think the way enterprises will collect data and leverage it in these AI processes does not look like doing ETL on a data warehouse that's (laughs) running in Amazon or something like that. It's going to totally change. And I've seen that a lot of the data that gets stored in Braintrust through people's logs never makes it to a data warehouse. People just don't really care, because if they put it in a data warehouse, what are they going to do with it?
- Elad Gil
What do you think is missing from a data infrastructure perspective? To your point, there are a couple of different steps: there's some sort of data cleaning step, there's a storage layer, there are different forms of labeling, et cetera. How do you think all these pieces evolve over the next couple of years? And related to that, the other topic people have been talking a lot about is synthetic data and how important it will be in the future. I'm curious about your views on these different areas.
- Ankur Goyal
Purely from a data standpoint, it's important to think about what you're going to do with the data and then how the infrastructure enables that. A data warehouse is really designed for ad hoc exploration of structured data, and neither of those two things is relevant in AI land. You're dealing with lots and lots of text, and you're not exploring it ad hoc using SQL queries. What we see the most advanced companies doing is actually using embeddings, and models themselves, to help them sift through tons and tons of data and find, for example, customer support tickets that are not well represented in the data they're using for their evals, or not well represented in their fine-tuning datasets, and then use those examples. So I think the workload is going to shift, and I actually think LLMs, and specifically embeddings, are going to be core to how people query data, not traditional algebraic relational indexes. That's going to be a huge shift. And there's this huge debate about vector databases and "Will traditional databases do vector database things?" I think that debate is kind of silly; relational databases are perfectly capable of adding HNSW indices. What will really be disrupted is the OLAP workload. You can't just slap semantic search and the like into the architecture of a traditional data warehouse. I think that is actually a much deeper set of things that will need to change than the OLTP workload.
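As a sketch of the embedding-based sifting described here: embed both your production tickets and your existing eval set, then surface the tickets least similar to anything you already cover. The model name and helper functions are illustrative assumptions, using the OpenAI Node SDK:

```ts
import OpenAI from "openai";

const client = new OpenAI();

// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Embed a batch of texts in one API call.
async function embed(texts: string[]): Promise<number[][]> {
  const res = await client.embeddings.create({
    model: "text-embedding-3-small", // model choice is illustrative
    input: texts,
  });
  return res.data.map((d) => d.embedding);
}

// Find live tickets that look least like anything in the eval set:
// candidates to add to eval or fine-tuning datasets.
async function findUnderrepresented(tickets: string[], evalSet: string[]) {
  const [ticketVecs, evalVecs] = await Promise.all([embed(tickets), embed(evalSet)]);
  return tickets
    .map((text, i) => ({
      text,
      // Similarity to the closest existing eval example.
      nearest: Math.max(...evalVecs.map((e) => cosine(ticketVecs[i], e))),
    }))
    .sort((a, b) => a.nearest - b.nearest) // least-covered first
    .slice(0, 10);
}
```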
- Elad Gil
This is your,
- 14:45 – 18:04
Designing technical interviews
- Elad Gil
in some sense, third startup experience, right? You joined MemSQL/SingleStore quite early. You started Impira, which Figma acquired. You're now doing Braintrust. What are the common things you've taken with you as you've done this new startup? What are the things you've implemented early? What are the things you've avoided?
- Ankur Goyal
You know, one of the things I honestly took for granted at MemSQL, but we've re-implemented at Braintrust, is having a really hard technical interview. At MemSQL, maybe we pushed it a little too far, but it was known for really strong technical excellence, and I think our interview reflected that. So that was actually one of the first things we did. Manu and I spent probably two or three days working through a bunch of really, really hard interview questions. I think it's just important that you hold the technical bar really high and try to find people who are attracted to it. For example, if you do a front-end interview at Braintrust, one of the questions involves writing some C++, and we lose a lot of candidates because of that question. But it's a good signal that maybe Braintrust isn't the right place for you to work, because we do like to hire people who are willing to jump around in areas of the stack they're unfamiliar with. So that's one of the biggest things we've carried over. Another thing I think we did really well at both Impira and MemSQL is have an obsessive relationship with our customers and really, really focus on making them successful. It's sometimes really hard to prioritize customer feedback and think about, if ten customers are asking for ten different things, what do I do? What we've done at Braintrust is be very deliberate about which customers we prioritize, especially early on, and hypothesize that the Zapiers and Notions of the world would have pretty similar use cases. If you focus on those kinds of customers, then when they ask for stuff, you can pretty readily assume that other similar customers will have the same problem. That's allowed us to be very, very customer-centric while building a product that repeats itself for more customers. And now what we're seeing is that the next wave of companies building with AI, both startups and more traditional enterprises, actually want to be engineering things like the products they admire, most of which use Braintrust. So a lot of those best practices are now built into the product, and the next batch of companies can consume them right out of the box.
- Elad Gil
Yeah, it's kind of interesting. I feel like even early on, as companies were first adopting LLMs for actual live products, they would all follow the same startup journey, or I should say technical journey, right? Initially, at least back then, they'd look into fine-tuning or some open source model or something else. They'd eventually realize they should just be using GPT-4, which was the primary model at the time.
- Ankur Goyal
(laughs) Yeah.
- Elad Gil
And then they'd go through this big loop of starting to build internal tools, and then realize that really, their focus should be on product. It was the exact same journey. I remember in the early Braintrust customer conversations, you'd talk to them and they'd say, "Oh, we don't need this." And then three months later, they'd call and say, "Okay, we really need this." And it was always roughly the same timeframe. Are you seeing any common patterns today in
- 18:04 – 19:34
Rethinking agent-based approaches
- Elad Gil
terms of, okay, companies that are now a year or 18 months into their journey using LLMs, do they always have the same things come up?
- Ankur Goyal
There are a couple things. One is, companies that are fairly deep into their journey have one or two North Star products that are pretty mature, and they're trying to figure out how to get those products to the next stage. Probably the most consistent thing I've seen is companies walking back from the illusion that totally free-form agents will solve all of their problems. Maybe two or three months ago, many of the pioneering companies went way down the agent rabbit hole, and they realized, "Wow, this is actually not the right approach." It's so hard to control performance. The error rates are really high, and they compound really quickly. So most of those companies have walked back and tried to build a different architecture, where the control flow is managed deterministically by their code, but they make LLM calls throughout the entire architecture of the product. That's probably the biggest thing we're seeing now. I don't know if there's a good term for it yet, but maybe it's this kind of pervasive AI engineering throughout a product, rather than trying to shove everything into the while loop of an agent.
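A minimal sketch of that shape: the control flow lives in ordinary code, and each LLM call is narrow and well scoped. The support-ticket task and the deterministic helpers are hypothetical, and the OpenAI client stands in for any model API:

```ts
import OpenAI from "openai";

const client = new OpenAI();

// One constrained LLM call per step.
async function llm(prompt: string): Promise<string> {
  const res = await client.chat.completions.create({
    model: "gpt-4o", // model choice is illustrative
    messages: [{ role: "user", content: prompt }],
  });
  return res.choices[0].message.content ?? "";
}

// Hypothetical deterministic helpers, stubbed for the sketch.
function lookupAccount(_ticket: string): string { return "acct_123 (Pro plan)"; }
function fetchRecentErrors(): string { return "TypeError in export.ts:42"; }

// The branching is deterministic code, not an agent's while loop;
// the LLM only fills in the narrow pieces.
async function handleTicket(ticket: string): Promise<string> {
  const category = await llm(`Answer with exactly "billing" or "bug": ${ticket}`);
  if (category.trim() === "billing") {
    const account = lookupAccount(ticket);
    return llm(`Draft a billing reply for this ticket: ${ticket}\nAccount: ${account}`);
  }
  const logs = fetchRecentErrors();
  return llm(`Draft a bug-triage reply for: ${ticket}\nRecent errors: ${logs}`);
}
```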
- Elad Gil
Yeah. The other thing I've heard you talk about in the past is the evolving role of what an AI team
- 19:34 – 23:35
Building out an AI team
- Elad Gil
does at a company. If you go back a couple of years, people were doing machine learning and they'd hire a big MLOps team, and the types of things they'd do day to day were very different from what they do today in the context of adopting AI. Even how you think about the role and who to hire has maybe shifted a bit. Could you talk a little about what you view as the evolution of the role of the data science team, the data team, the ML or AI team, et cetera?
- Ankur Goyal
Yeah. I think what's really interesting is that many of the early adopters of LLMs didn't have any ML staff when ChatGPT came out, what is it now, almost two years ago. Those companies were able to move really quickly because they started with a fresh slate. Many of the smart folks I know who are classical machine learning people or data scientists have now come around, but there was this big resistance among them early on: LLMs are not good at the things we're trying to solve, or maybe it's a scam, or something like that.
- Elad Gil
Do you think that was just a different problem set, in that traditional ML and its applications are a bit different from what gen AI can do? Or do you think it was something else?
- Ankur Goyal
Well, I went through this myself, watching the technology we built to do document extraction at Impira become totally irrelevant. Personally, I think it's an emotional thing. You try GPT-3 for the first time, and first of all, back then at least, it was kind of snarky, and that was a little bit irritating. And it was also just way better at everything than anything you could possibly train. I think that is so fundamentally disruptive to a lot of companies and a lot of people's individual identity. It just is not easy to wrap your head around if you've been doing AI and ML for a while. So I think it was largely an emotional thing. You could argue there's a cost, security, privacy, whatever element to it, but the companies on the leading edge were able to figure that out pretty quickly. Now I think more companies have come along the journey, and I've seen a lot of really smart ML and data science people embrace LLMs and bring a lot of the rigor that is still relevant around evals and measurement and prototyping and so on, and become these AI platform teams. Usually it's a combination of people with product engineering backgrounds and a few folks with statistics or data science backgrounds. They start by building a marquee product for the company, and then they evolve into a platform team that enables the n+1st project to be really successful. We see a lot of these teams forming as AI becomes more pervasive.
- Elad Gil
So if you were at an enterprise company right now and you were trying to adopt AI or LLMs, who would you hire, or what sort of capabilities would you move over into this platform team?
- Ankur Goyal
I would start with a group of really smart product engineers, because the first thing you need to ask yourself is: what parts of my product, or whatever I'm offering, can be cannibalized or completely changed by modern AI? Product engineers are generally the best people to think about that. You can get really far with a really good UI and very basic AI engineering that proves out a concept. We've seen a number of good examples of that. v0, for example, is a truly incredible piece of engineering at this point, both from an AI standpoint and from a UI standpoint, but early on it was pretty simple, and that's the right way to start. Then, as you find product-market fit, it's the right time to think about more rigor, think about fine-tuning, maybe open source models for cost, or whatever, although I think not many people are far along that journey.
- 23:35 – 25:12
Typescript as the language of AI
- Elad Gil
I thought you said something like, "TypeScript is a language of AI and Python is a language of machine learning."
- Ankur Goyal
Yeah.
- Elad Gil
Could you extrapolate more on that?
- Ankur Goyal
First of all, the vast majority of our customers use TypeScript. Early on, some of our customers were dealing with: should we use TypeScript or Python? Some teams were using TypeScript, some teams were using Python. Now almost everyone, including people who used to write Python primarily, is using TypeScript. And I think that's going to continue. There are a few reasons for that. One is that TypeScript is the language of product engineering, and product engineers are the ones driving most of the AI innovation, at least in the world we participate in. So they're just literally pulling the AI ecosystem into their world, and that is driving a lot of TypeScript adoption. Another thing is that TypeScript as a language is inherently better suited for AI workloads because of the type system. The type system basically allows you to launder (laughs) the crazy stuff that comes out of an AI model into a well-defined structure that the rest of your software system can use. Python has a pretty immature type system. They're improving, and I always get trolled on Twitter when I post about this by people who make somewhat valid arguments. But TypeScript is just a much, much better language for writing software that deals with uncertain shapes of data. I think that's actually kind of its whole point. So I think it is literally a better-suited language for working with AI.
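One common way to do that laundering in TypeScript is a runtime validator like zod, which checks the model's raw output against a schema and derives a static type from it. A minimal sketch; the invoice schema is an illustrative assumption:

```ts
import { z } from "zod";

// Define the shape you require, then force the model's raw output through
// a runtime validator before the rest of the system touches it.
const Invoice = z.object({
  vendor: z.string(),
  total: z.number(),
  currency: z.enum(["USD", "EUR", "GBP"]),
});
type Invoice = z.infer<typeof Invoice>; // static type derived from the schema

function parseModelOutput(raw: string): Invoice {
  // Throws if the model produced anything that doesn't match the
  // declared structure, so downstream code only ever sees valid data.
  return Invoice.parse(JSON.parse(raw));
}
```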
- 25:12 – 26:02
The shift away from using frameworks
- Elad Gil
Have you seen any other shifts in terms of usage of specific languages or tooling or other things with this wave of AI?
- Ankur Goyal
Yeah, I think the biggest thing I've seen over the past six months is people dropping the use of frameworks. Early on, I think people thought that AI is this really unique thing, and just like with Ruby on Rails or whatever, we're going to need new kinds of frameworks to build new kinds of AI applications. Really, I think people have walked back from that, and they now think of AI as a core part of their software engineering as a whole. So AI is now pervasively spreading throughout people's codebases, and it's not constrained to what you can create with a single framework.
- Elad Gil
Outside
- 26:02 – 27:16
Vendor consolidation among enterprises
- Elad Gil
of the areas that Braintrust touches from a tooling perspective, what do you think are other interesting emerging platforms or approaches or products or infrastructure that people are starting to use?
- Ankur Goyal
I think what we've seen from a lot of our customers is a consolidation of vendors, and this is very, very much driven by AWS. AWS has its mojo back now that they have Anthropic on Bedrock, and Anthropic, especially Claude 3 and 3.5, is really, really good. Because many companies were consolidating their vendors prior to AI, and AWS is so dominant, and now you can actually consolidate a lot of your AI stuff on AWS as well, we're seeing pretty dramatic vendor consolidation. There are some companies we talk to whose AI vendors are literally OpenAI, AWS, and Braintrust, and pretty much everything else has consolidated away. So it'll be interesting to see what happens. I certainly wouldn't underestimate AWS and the hyperscalers, especially on the infrastructure side.
- Elad Gil
Well, one of the things I think is striking is how much time you still spend coding as CEO
- 27:16 – 30:16
Coding as a CEO
- Elad Gil
(laughs) and there are a number of CEOs at different companies who continue to write code over the course of their careers to varying degrees; Tobi of Shopify would be an interesting example of that. How do you think about time spent coding versus marketing versus doing other things for the company, and why focus there?
- Ankur Goyal
My perspective on this has changed a lot over time. When I was much younger, I started leading the engineering team at SingleStore and then became a CEO, and people give you the conventional advice about what you should do with your time and who you should hire and so on. First, I think the profile of CEOs is changing, and second, I think the market is changing. In the world we're in, which is enterprise software, people really, really care about the polish of the UI they're using. Companies like Notion, for example, have really driven people's taste in those products. But when many VCs were having their formative experiences and observing the patterns they would eventually mandate among their portfolio companies, things were very different: IT bought enterprise software, and they bought it based on checklists that product managers came up with. So I think a lot of this has changed, and for me it just feels very natural to participate in that change by being very, very deep in the product. And as hard as I've tried over the past decade-plus, I just can't stop. I think I'm literally addicted to writing code. It is the fastest, most efficient, and most pleasurable way for me to participate in what we're doing as a company. So instead of trying to change that, which I've tried, at Braintrust we've engineered the company to support me spending a lot of time writing code. For example, one of the first people we hired was Albert, who was formerly an investor, and an investment banker before that. He is incredibly good at everything from selling, marketing, and dealing with ops to helping with recruiting, and working with him has freed me up to spend a lot more time on the product, whereas at Impira I spent probably half or more of my day on those other things.
- Elad Gil
Yeah, we had Jensen Huang from NVIDIA on No Priors previously, and I thought one perspective he shared that you don't hear very much, which you're now echoing, is that you should really architect the company around the CEO, versus just following the same pattern every time of what the right thing for the company is. And obviously there are areas where you just have to do the same thing every time, like sales comp. It really doesn't make sense...
- Ankur Goyal
(laughs)
- Elad Gil
... to try and reinvent that, and everybody always tries in their first startup, and by the second startup they're like, "Why did I even try that?" (laughs) It just kind of works. But the flip side is, there are certain things to delegate or not, certain things to micromanage or not, and it really varies by the person and what they love doing, and all the rest of it.
- 30:16 – 33:00
Collaborating with customers
- Elad Gil
Are there other big differences between how you've approached Braintrust and Impira, your prior startup?
- Ankur Goyal
Another thing we're really bullish on at Braintrust is people being in the office and being really comfortable being interrupt-driven. These are two battles that were very difficult for us at Impira, because we weren't very firm about them. I think the second one is actually a little more interesting. At Braintrust, if a customer complains about something, or they find something about our UI annoying, or they have an idea, we almost always fix it immediately. For a lot of engineers, that is very uncomfortable. But the right engineers have been craving that experience their entire career. So we hand-pick the people who want to be in that environment, and then, again, we engineer our roadmap and how we allocate our time to actually be able to support that. I think it's one of the key things that has made the product really good and also creates a lot of love with our customers. Not everyone has to have the same edge, but I think you have to have some edge. We identified that as something we really cared about early on and recruited a team of people who really want to do that.
- Elad Gil
Yeah, and I guess that's translated into customer adoption and some of the logos you've landed. Are there other things that have helped drive customer acquisition, and have there been unique ways you've approached go-to-market?
- Ankur Goyal
Yeah, I mean, I went to the Elad school of hard knocks (laughs) and learned a bunch of stuff early on from you. But really, the thing we did was make a list of about 50 people who we thought were leading the way in AI and say, "Let's try to figure out a way to get to these people and recruit them either as investors or as customers." I think that was probably one of the most important things we did, if not the most important. Some people, for example, were excited about Braintrust, we had known them for a while, and they invested and said, "You know what? We've already built our own version of this internally," or, "We don't care about this, but we think other people will need it, so we'd love to invest." And many of those people have now come around and started using Braintrust too. So it was about being very deliberate about who our target market was. Fifty companies is not a huge TAM in some ways, but those companies are very influential, and they've led to many, many more customers now. I think that was the most important thing.
- Elad Gil
Yeah, it feels like people really misdefine their initial customer envelope, the people they want to target. They either go too broad, doing everything from the Fortune 500 to small startups, and then they're not really building for any specific user, or they
- 33:00 – 38:28
Future of Braintrust and evals
- Elad Gil
go way too specific, maybe even in a segment that just isn't worth pursuing. So it's really interesting to see how people think about that.
- Ankur Goyal
Yeah.
- Elad Gil
Could you tell me a little more about what you view as the future of Braintrust? How does it evolve as a product and platform? And how does it change as AI changes? Is all eval eventually done by machines, or what does the future hold for us?
- Ankur Goyal
Yeah, I ask myself that question every month or so, and surprisingly little changes. Braintrust started out by solving the eval problem, and I think we did that really well. What we realized is that there's actually this whole platform that people want. One of our customers, actually Airtable early on, used our evals product to do observability. They would literally create experiments every day as if they were evals and just dump their logs into those experiments. It's pretty obvious (laughs) when someone starts doing that that they're trying to do observability in your product. We dug into why, and it turns out that in AI, the whole point of observability is to collect data into datasets that you can use to do evals, and then eventually fine-tune models or do more advanced things. But evals are still the most important element there. The next thing that happened is that some of our customers said, "Hey, I'm already doing observability and evals and stuff in Braintrust. I'm spending so much time in this product. Why do I have to go back to my IDE, which, by the way, knows nothing about my evals and nothing about my logs? Can I work on prompts in Braintrust? Can I repro what I'm seeing live? Can I save the prompts and then auto-deploy them to my production environment?" That actually scared the crap out of me, thinking from my traditional, now old-school engineering perspective.
- Elad Gil
Mm-hmm.
- Ankur Goyal
But it's what people wanted. I was talking to Martin, who just became a Braintrust daily active user quite recently, and he spends like half his day now tinkering with prompts in AI Town in Braintrust. So even for old-school engineers (laughs) like us, it's definitely the right way to do things. I see Braintrust evolving into this kind of hybrid: in some ways it's like GitHub, where you create prompts, and now you can create more advanced functionality with Python code and TypeScript code and stitch it together with your prompts in the product, all the way through to evals and observability. We're really excited about building a universal developer platform for AI. In terms of quality, having lived through the pre-LLM era, I actually think a lot of the anxieties and predictions about quality are exactly the same as they were pre-LLM. Even when we were doing document processing stuff at Impira, people were like, "Oh hey, all documents will be perfectly extracted within six months from now." And LLMs, by the way, are amazing, but document processing is still not a totally solved problem, and I think it's because people will take whatever technology they have and push it to its extreme. There are things people are trying to do today that are past the extreme. Auto-GPT is a great example of something that I think is a really productive experiment in pushing AI past what it can reasonably do. But people are always going to push things to their extreme. AI is an inherently non-deterministic thing, and so I think evals are still going to be there. We might just be evaluating more and more complex and interesting problems.
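The observability loop described above, with production logs flowing into Braintrust so they can become eval datasets, looks roughly like this in its TypeScript SDK, based on its documented logging interface; the project name and prompt are placeholders, and exact names may have evolved:

```ts
import { initLogger, wrapOpenAI } from "braintrust";
import OpenAI from "openai";

// Send traces for this project to Braintrust; "Support bot" is a placeholder.
initLogger({ projectName: "Support bot" });

// Wrapping the client logs every completion (inputs, outputs, latency)
// automatically, so production traffic accumulates into reviewable datasets.
const client = wrapOpenAI(new OpenAI());

const res = await client.chat.completions.create({
  model: "gpt-4o", // model choice is illustrative
  messages: [{ role: "user", content: "Summarize this support ticket: ..." }],
});
console.log(res.choices[0].message.content);
```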
- Elad Gil
And then what role do you think AI will play in evaling itself?
- Ankur Goyal
I mean, AI already evals itself. It's very similar to traditional math: if you're doing a math homework assignment, it's way easier to validate a proof someone gives you than to generate the proof in the first place. The same principle works for LLMs. It's way easier for an LLM, especially a frontier model, to look at the work of itself or another LLM and accurately assess it. So that's already the case; I think probably more than half of the evals people do in Braintrust are LLM-based. Some of the interesting things happening as LLMs get better, and as GPT-4 quality gets cheaper, is that people are starting to do LLM-based evals on their logs. One of the really cool things you can now do in Braintrust is write LLM- and code-based evaluators and then run them automatically on some fraction of your logs. Sometimes that even allows you to evaluate things that you're not allowed to look at. The LLM is allowed to read PII and crunch through something and tell you whether your use case is working or not, but maybe no developer or person at the company is. I think that's a really interesting unlock and probably represents what people will be doing over at least the next year.
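A hand-rolled version of that kind of LLM-based evaluator might look like the sketch below: sample a fraction of logs and have a model grade each one. The judge prompt, sampling rate, and helpers are illustrative assumptions, not Braintrust's actual implementation:

```ts
import OpenAI from "openai";

const client = new OpenAI();

// A simple LLM-as-judge scorer: ask a model to grade an input/output pair.
// In practice you might use a library scorer instead, but the shape is the same.
async function judge(input: string, output: string): Promise<number> {
  const res = await client.chat.completions.create({
    model: "gpt-4o", // model choice is illustrative
    messages: [
      {
        role: "user",
        content:
          `Question: ${input}\nAnswer: ${output}\n` +
          `Is the answer correct and helpful? Reply with only "yes" or "no".`,
      },
    ],
  });
  return res.choices[0].message.content?.trim().toLowerCase() === "yes" ? 1 : 0;
}

// Sample a fraction of production logs and score them asynchronously,
// without a human ever reading the (possibly sensitive) content.
async function scoreLogs(logs: { input: string; output: string }[], rate = 0.1) {
  const sample = logs.filter(() => Math.random() < rate);
  return Promise.all(sample.map((l) => judge(l.input, l.output)));
}
```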
- Elad Gil
Super interesting. Hey, Ankur, thank you so much for joining us today.
- Ankur Goyal
Thanks for having me.
- Sarah Guo
(techno music) Find us on Twitter @nopriorspod. Subscribe to our YouTube channel if you wanna see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way, you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.