At a glance
WHAT IT’S REALLY ABOUT
Braintrust CEO Explains Enterprise AI Evals, Data, Teams, and Tooling
- Ankur Goyal, CEO of Braintrust, describes how the company evolved from internal tooling into an enterprise platform for AI evals, observability, and prompt development, now used by leading AI-forward companies like Notion, Airtable, and Zapier.
- He details what enterprises are actually doing with LLMs—heavy use of RAG, far less fine-tuning than expected, cautious experiments with agents, and very limited production use of open-source models so far.
- The conversation covers how AI is reshaping data infrastructure (from warehouses and SQL to embeddings-based workflows), engineering stacks (TypeScript over Python, fewer AI-specific frameworks), and organizational structures (product-engineer-led AI platform teams).
- Goyal also shares startup lessons on hiring, customer-obsessed execution, vendor consolidation, and consciously architecting Braintrust—and his own CEO role—around deep, ongoing involvement in coding and product craftsmanship.
IDEAS WORTH REMEMBERING
5 ideas
Evals are a hard but critical bottleneck for serious AI products.
Superficially, evals look like a simple loop over prompts and outputs, but in production—especially with complex systems and agents—companies need fast, consistent, reusable evaluation workflows to iterate and improve quality reliably.
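The "it's just a for loop" framing can be sketched in a few lines; what production teams add on top is caching, concurrency, dataset versioning, and consistent scoring. All names here (`run_eval`, `toy_task`, `exact_match`) are hypothetical illustrations, not Braintrust's API.

```python
# Minimal eval loop: score each (input, expected) case with a task
# function and a scorer, then average. Real eval platforms layer
# reusability, speed, and consistency on top of this core shape.

def run_eval(dataset, task, scorer):
    """Run `task` over each case, score the outputs, and average."""
    results = []
    for case in dataset:
        output = task(case["input"])
        score = scorer(output, case["expected"])
        results.append({"input": case["input"], "output": output, "score": score})
    avg = sum(r["score"] for r in results) / len(results)
    return avg, results

# Toy stand-ins for a model call and a scorer (hypothetical).
def toy_task(prompt):
    return prompt.upper()

def exact_match(output, expected):
    return 1.0 if output == expected else 0.0

avg, results = run_eval(
    [{"input": "hi", "expected": "HI"}, {"input": "ok", "expected": "no"}],
    toy_task,
    exact_match,
)
```

Even this sketch hints at why evals get hard: once `task` is a multi-step agent and `scorer` is itself an LLM, keeping runs fast, reproducible, and comparable is a real engineering problem.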
RAG is mainstream; fine-tuning is niche and often unnecessary.
Roughly half of Braintrust customers’ production use cases involve RAG, while most have moved away from fine-tuning toward instruction-tuned frontier models because fine-tuning is slower, riskier, and harder to get right for many workloads.
Enterprises still favor proprietary frontier models over open source in production.
Despite strong developer interest in open source, Braintrust sees limited production adoption; OpenAI and Anthropic via AWS Bedrock dominate because they offer better UX, faster iteration, and strong ROI, which matter more than raw per-token cost.
Data infrastructure for AI is shifting from warehouses and SQL to embeddings and LLM-based querying.
Traditional data warehouses optimized for structured data and ad hoc SQL don’t fit AI workloads; advanced teams use embeddings and models to mine logs, discover underrepresented cases, and construct better eval and training datasets.
Free-form agents are being dialed back in favor of deterministic control flow with pervasive LLM calls.
Early adopters went deep on ‘fully autonomous’ agents but hit uncontrollable error rates and compounding failures, so they’re returning to architectures where code handles control flow and LLMs are invoked at many well-defined points.
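The pattern Goyal describes can be contrasted with a free-form agent loop in a few lines: code owns branching and retries, and the model is invoked only at fixed, well-defined points. `call_llm` and `handle_ticket` are hypothetical stand-ins, not any specific product's API.

```python
# "Code handles control flow, LLMs at well-defined points": a fixed
# pipeline where the model classifies and drafts, but branching is
# ordinary code rather than model-chosen actions.

def call_llm(instruction, text):
    # Placeholder: a real implementation would call a model API.
    if instruction == "classify":
        return "refund" if "refund" in text else "other"
    return f"Draft reply about: {text}"

def handle_ticket(text):
    category = call_llm("classify", text)    # LLM call #1: classification
    if category == "refund":                 # deterministic branch in code
        draft = call_llm("draft_refund_reply", text)  # LLM call #2: drafting
        return {"route": "billing", "draft": draft}
    return {"route": "triage", "draft": call_llm("draft_ack", text)}

result = handle_ticket("I want a refund for my order")
```

Because each model call sits behind a fixed step, an error stays contained to that step instead of compounding across an open-ended action loop, which is the failure mode that pushed early adopters back toward this architecture.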
WORDS WORTH SAVING
5 quotes
Evals really sound easy—‘oh, it’s just a for loop’—but it is actually a pretty hard problem to do evals well.
— Ankur Goyal
Almost, if not all of our customers, have moved off of fine-tuned models onto instruction-tuned models, and are seeing really good performance.
— Ankur Goyal
A data warehouse is really designed for ad hoc exploration on structured data, which is… neither of those two things is relevant in AI land.
— Ankur Goyal
TypeScript is the language of AI and Python is the language of machine learning.
— Ankur Goyal
People are always gonna push things to their extreme. AI is an inherently non-deterministic thing, and so I think evals are still gonna be there.
— Ankur Goyal
AI-generated summary created from a speaker-labeled transcript.