
Why Vertical LLM Agents Are The New $1 Billion SaaS Opportunities
Jake Heller (guest), Garry Tan (host), Jared Friedman (host), Diana Hu (host)
In this episode of Y Combinator's The Light Cone, hosts Garry Tan, Jared Friedman, and Diana Hu talk with Jake Heller, founder of Casetext, about why vertical LLM agents are the new $1 billion SaaS opportunities.
Legal AI Pioneer Reveals Blueprint For Billion-Dollar Vertical Agents
The episode features Jake Heller, founder of Casetext, detailing how his decade-old legal tech company pivoted almost overnight to build CoCounsel, a GPT-4-powered legal AI assistant that led to a $650M acquisition by Thomson Reuters.
He explains the long pre-LLM slog of incremental improvements in legal research, and how early access to GPT-4 transformed their product from marginally better tooling into a fundamental workflow change that lawyers could no longer ignore.
Jake outlines how they redeployed 120 employees in 48 hours, used test-driven prompt engineering, and deeply modeled expert legal workflows to move from flashy demos (70% reliability) to mission-critical performance that lawyers trust.
The discussion generalizes Casetext’s lessons to the broader opportunity for vertical AI agents, the importance of domain-specific integrations and evals, and how newer reasoning models like OpenAI o1 may further enhance agentic workflows.
Key Takeaways
Early, decisive pivots around transformative tech can create outsized outcomes.
Within 48 hours of seeing GPT-4, Casetext shifted all 120 employees to build CoCounsel; that bet turned a solid, $100M-valued company into a $650M acquisition in months.
Vertical AI agents win by deeply modeling expert workflows, not by wrapping APIs.
Casetext decomposed real legal tasks into many discrete steps that mirror how top attorneys work (search, read, annotate, outline, draft), then turned each step into carefully tested prompts and logic.
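The decomposition described above can be sketched as a pipeline of small, individually testable steps. This is a hypothetical illustration, not Casetext's actual code: all function names, the `TaskContext` structure, and the stubbed step bodies are invented for the example.

```python
# Hypothetical sketch: a vertical-agent task decomposed into discrete,
# individually testable steps that mirror how an expert works.

from dataclasses import dataclass, field

@dataclass
class TaskContext:
    query: str
    documents: list = field(default_factory=list)
    annotations: list = field(default_factory=list)
    outline: str = ""
    draft: str = ""

def search(ctx):
    # Step 1: retrieve candidate sources (stubbed).
    ctx.documents = [f"doc matching '{ctx.query}'"]
    return ctx

def annotate(ctx):
    # Step 2: mark up each retrieved source (stubbed).
    ctx.annotations = [f"note on {d}" for d in ctx.documents]
    return ctx

def outline(ctx):
    # Step 3: organize annotations into an outline (stubbed).
    ctx.outline = "; ".join(ctx.annotations)
    return ctx

def draft(ctx):
    # Step 4: produce the final work product (stubbed).
    ctx.draft = f"Memo based on: {ctx.outline}"
    return ctx

PIPELINE = [search, annotate, outline, draft]

def run(query):
    ctx = TaskContext(query=query)
    for step in PIPELINE:
        # Each step is a separately promptable, separately testable unit.
        ctx = step(ctx)
    return ctx
```

Because each stage is its own function, each can be backed by its own carefully tested prompt and evaluated in isolation, rather than asking one model call to do the whole job.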
Moving from 70% to near-100% reliability requires rigorous evals and TDD.
They built hundreds to thousands of tests per prompt, practicing test-driven development for prompting to systematically reduce hallucinations and catch regressions instead of relying on “vibes-based” prompt tweaks.
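A minimal sketch of what test-driven development for prompting might look like: a fixed suite of input/expected pairs is re-run against every prompt revision, so regressions surface as failing tests rather than "vibes." The classifier task, test cases, and `model` stub are invented for illustration; in practice `model` would be a real LLM API call.

```python
# "TDD for prompts": score each prompt revision against a fixed test suite.

def model(prompt, text):
    # Placeholder for a real LLM call; a deterministic stub so the
    # harness itself is runnable and testable.
    if "cite" in prompt and "v." in text:
        return "citation"
    return "not_citation"

# Hand-labeled cases; real suites would hold hundreds to thousands of these.
TEST_SUITE = [
    ("Smith v. Jones, 2019", "citation"),
    ("The parties met in 2019", "not_citation"),
]

def evaluate(prompt, suite):
    # Fraction of cases the prompt handles correctly.
    passed = sum(1 for text, expected in suite
                 if model(prompt, text) == expected)
    return passed / len(suite)

score = evaluate("Classify whether the text contains a legal cite.", TEST_SUITE)
```

Run before and after each prompt tweak, a drop in `score` flags a regression immediately, which is how a team can climb from demo-grade to near-100% reliability step by step.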
Mission-critical adoption hinges on trust; one bad experience can kill usage.
Because lawyers are conservative and extremely sensitive to errors, Casetext optimized first-week experience and accuracy, knowing that early mistakes would cause users to disengage for a long time.
Real IP lives in data, integrations, and business logic around the model.
Beyond LLM access, Casetext’s value came from proprietary legal datasets, specialized document-management integrations, robust OCR and preprocessing, and complex orchestration logic—making it hard to copy.
Market readiness can flip overnight when a technology feels non-incremental.
For years, lawyers ignored incremental tools, but ChatGPT made them perceive AI as inevitable, turning previously resistant $5M-a-year partners into proactive buyers who wanted to be ahead, not behind.
Next-gen reasoning models enable new prompting strategies around “how to think.”
With models like OpenAI o1, Jake is exploring prompts that not only show examples of good answers but also explicitly teach the model domain-specific thinking processes, potentially compounding performance gains.
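The idea can be illustrated by contrasting two prompt templates: one that only shows examples of good answers, and one that also spells out the expert's reasoning process. The clause-classification task and all template wording here are invented for the example, not taken from CoCounsel.

```python
# Illustrative contrast: examples-only prompting vs. also teaching the
# model a domain-specific "how to think" procedure.

EXAMPLES_ONLY = """Classify the contract clause.
Example: 'Licensee may terminate on 30 days notice.' -> termination
Clause: {clause}"""

WITH_THINKING_PROCESS = """Classify the contract clause.
How an expert reasons: first identify each party's obligations,
then ask which party the clause protects, then map that protection
to a clause category.
Example: 'Licensee may terminate on 30 days notice.' -> termination
Clause: {clause}"""

def build_prompt(template, clause):
    return template.format(clause=clause)
```

With reasoning models, the hypothesis is that the second template compounds with the model's own chain of thought, whereas earlier models mostly pattern-matched the examples.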
Notable Quotes
“It took maybe 48 hours for us to decide to take every single person at the company and shift what they were working on to 100% building this new product based on GPT-4.”
— Jake Heller
“Until the very end, until CoCounsel, a lot of what we did were, relatively speaking, incremental improvements on the legal workflow—and when there's just an incremental improvement, it's actually pretty easy to ignore.”
— Jake Heller
“By the time you’ve dealt with all of the edge cases—before you even hit the large language model—there might be dozens of things you’ve built into your application to actually make it work and work well.”
— Jake Heller
“People will pay $20 a month for the 70%, and maybe $500 or $1,000 a month for something that actually works, depending on the use case.”
— Jake Heller
“If we’re an example of anything, it’s that there’s a path and you can do it—the jobs aren’t going to go away, they’ll just be more interesting.”
— Jake Heller
Questions Answered in This Episode
What concrete steps should a founder in a different vertical take to replicate Casetext’s test-driven approach to building reliable agents?
How do you decide which parts of a complex domain workflow to automate first versus leaving to humans in the early product versions?
Where is the tipping point between using a general model like GPT-4 or o1 and needing domain-specific fine-tuning or custom models?
How should startups balance speed of iteration with the level of rigor in evals when their use case is not obviously mission critical yet?
In a world where many vertical agents are being built, what durable moats—beyond integrations and data—will actually protect billion-dollar SaaS agents over the next decade?
Transcript Preview
This is our first ever experience talking to this godlike feeling, you know, AI that was all of a sudden doing these tasks that would take me, when I practiced, like a whole day, and it's being done in a minute and a half. The whole company, all 120 of us did not sleep for those, you know, months before GPT-4. We felt like we had this amazing opportunity to run far ahead of the market.
That's why you're the first man on the moon.
Yeah. (laughs)
Welcome back to another episode of The Light Cone. I'm Garry. This is Jared and Diana. Harj is out, but he'll be back on the next one. And today, we have a very special guest, Jake Heller of Casetext. I think of Jake as a little bit like one of the first people on the surface of the moon. He created, uh, Casetext more than, I think, 11, 12 years ago, actually. And in the first 10 years, you went from zero to $100 million valuation, and then in a matter of two months after the release of GPT-4, that valuation went to a liquid exit to Thomson Reuters for $650 million. So you have a lot of lessons about how to create real value from really, like large language models. I think you were of, um, you know, our friends in YC, one of the first people to actually realize this is a sea change and revolution. And not only that, we're gonna bet the company on it, and you were super right. So welcome, Jake.
Happy to be here.
One of the cool things I think about Jake's story and reason why we wanted to bring him on today is that if you just look at the companies that good founders are starting now, it's a lot of vertical AI agents. I mean, I was trying to count the ones in S24. We have l- literally dozens of the YC companies in the last batch were building vertical-specific AI agents, and I think Jake is the founder who is currently running the most successful vertical AI agent. It's by far the largest acquisition, and it's actually deployed at scale in a lot of mission-critical situations. And the inspiration for this was, uh, we hosted this retreat a few months ago, and Jake gave an incredible talk about how he built it, and we thought that it'd be super useful for people who watched The Light Cone who are interested in this area to hear directly from one of the most successful builders in this area, how he did it. So how did you do it?
(laughs) Well, first of all, like, like a lot of these things, um, there's a certain amount of luck. Over the course of our decade-long, uh, journey, we started investing very deeply in AI, uh, and natural language processing, and we, we became close with a number of different research labs, uh, including some of the folks at OpenAI. And when it came time for them to start testing early versions, uh, we didn't realize it was GPT-4 at the time, but what was, what was GPT-4, we got a very early kind of like view of it. And so, you know, months before the public release of GPT-4, you know, we, as a company, were all under NDA, all working on this thing. And I- I'll never forget the first time I saw it, it took maybe 48 hours for us to decide to take every single person at the company and shift what they're working on from w- the projects we were then working on at the time to 100% of the company all working on building this new product we call CoCounsel based on the GPT-4 technology.