What It Actually Takes to Deploy a Voice Agent to a Fortune 500

Brooke Hopkins is the founder and CEO of Coval (S24), a simulation and observability platform for voice agents that helps enterprises test, monitor, and evaluate AI-powered phone systems at scale. Coval works with customers like Perplexity and Deepgram to process tens of millions of calls per month, and has just raised a $28.2M Series A. In this fireside, Brooke sat down with Harj Taggar, Managing Partner at YC, to talk about how her years building evaluation infrastructure and developer tools at Waymo turned out to be surprisingly transferable to the world of voice agents, why voice is emerging as the first truly productionized use case for autonomous agents, and what it took to go from a broader evals idea to a deeply focused enterprise platform, including the moment a customer offered to pay her before she'd written a single line of code. https://www.coval.dev Apply to Y Combinator: https://www.ycombinator.com/apply Work at a startup: https://www.ycombinator.com/jobs

Harj Taggar (host), Brooke Hopkins (guest)

Jun 24, 2026 · 30m

CHAPTERS

Coval’s mission: simulation + observability for voice agents at massive scale
Harj introduces Brooke Hopkins and Coval, framing the company as infrastructure for safely scaling voice agents. Brooke explains Coval’s two-part focus: testing via simulation before deployment and monitoring behavior once agents are live.
Why voice agents are emerging as the killer UI for AI
Brooke argues voice is the first truly productionized autonomous-agent use case. She explains why voice is a natural interface and why it fits environments where traditional software adoption has lagged.
Enterprise adoption: why Fortune 500s can deploy voice faster than other agents
Enterprises are moving quickly because voice workflows already exist (IVR trees, SOPs, call flows). Brooke describes how companies start with customer support and then expand voice agents into other operational and revenue-driving areas.
From replacing support to unlocking better customer experiences (airline example)
Harj and Brooke highlight the positive vision: voice agents can improve service and enable new interactions, not just cut costs. Brooke uses airlines to illustrate how voice can compress complex workflows into simple user intents.
What Coval provides: the missing ‘web infrastructure layer’ for voice
Brooke compares today’s voice stack to early web days—hard to scale, fragile under real-world conditions. Coval aims to make voice apps scalable and legible: performance, compliance risk, and product insights across huge volumes of conversations.
Where voice agents are strong vs. brittle (and what must be evaluated)
Brooke distinguishes the failure modes of LLM agents from those of human agents: LLM agents can instantly reflect policy changes, but can also fail in surprising, unacceptable ways. She outlines three major evaluation dimensions teams must track.
Building an enterprise evaluation strategy (and earning trust in results)
Coval collaborates with enterprise customers to define an evaluation strategy and create an improvement flywheel, inspired by self-driving processes. Brooke explains how Coval productizes that expertise into a repeatable platform.
Common misconceptions: why word error rate is overrated
Brooke notes that customers often overweight transcription metrics like word error rate. Real success means understanding the caller's intent and completing the workflow, and agents can be confused by conversation structure (e.g., users dumping all info at once).
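For readers unfamiliar with the metric, here is a minimal sketch of how word error rate is commonly computed: word-level edit distance divided by the number of reference words. The function name and the example utterances are hypothetical illustrations, not taken from the talk or from Coval's product. The example shows why the metric can mislead: a transcript with one wrong word can lose the key detail, while a noisier transcript can still preserve the caller's intent.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions, insertions, deletions)
    divided by the number of words in the reference transcript."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical utterances, chosen only to illustrate the point.
reference = "cancel my flight to boston tomorrow"
wrong_city = "cancel my flight to austin tomorrow"  # one word off, wrong destination
disfluent = "uh cancel flight boston tomorrow"      # noisier, intent intact

print(round(word_error_rate(reference, wrong_city), 3))  # 0.167
print(word_error_rate(reference, disfluent))             # 0.5
```

The transcript with the lower error rate is the one that would book the wrong action, which is why workflow-level checks (was the right intent captured, was the task completed) are a better measure of success than transcription accuracy alone.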
The next technical unlock: controllability across real-time voice stacks
Brooke predicts controllability—especially for real-time models—will be a major step forward. She maps voice stacks to autonomous systems loops and argues the future is neither fully cascaded nor fully monolithic, but better context sharing among specialized components.
Waymo lessons: why rare edge cases and simulation realism matter
Brooke describes her work at Waymo on dataset infrastructure and developer tools for large-scale simulation. She explains how self-driving’s emphasis on rare, high-impact examples translates to voice agent testing and simulation design.
Finding the wedge: from generic evals idea to voice via customer pull
Brooke recounts an early, broader “evals platform” concept that lacked strong traction. A first voice customer created undeniable pull—offering to pay even before software existed—revealing a painful, expensive problem worth building around.
Why Coval focused on enterprise early (and how to balance upmarket vs. startups)
Brooke explains the strategic choice to go deep with enterprise customers to understand scaling pain and build collaboration-heavy tooling. She also notes the need to keep learning from fast-moving AI startups and highlights founder-market fit with large-org workflows.
How ‘testing’ becomes the main work in agentic software development
Brooke argues AI changes the engineering time allocation: building becomes cheap, while planning, validation, rollout, and continuous monitoring dominate. Coval positions itself as testing/observability infrastructure for agent systems, akin to Datadog/Applied Intuition but for AI agents.
Founder journey, solo-founder rationale, and Coval’s roadmap: agentic evals
Brooke shares why she started a company and why solo founding fit the early phase. Harj explains YC’s criteria for funding solo founders, and Brooke closes with product priorities: agentic evaluation systems that help diagnose issues and drive self-improvement across many enterprises.