What It Actually Takes to Deploy a Voice Agent to a Fortune 500

Brooke Hopkins is the founder and CEO of Coval (S24), a simulation and observability platform for voice agents that helps enterprises test, monitor, and evaluate AI-powered phone systems at scale. Coval works with customers like Perplexity and Deepgram to process tens of millions of calls per month and has just raised a $28.2M Series A. In this fireside, Brooke sat down with Harj Taggar, Managing Partner at YC, to talk about how her years building evaluation infrastructure and developer tools at Waymo turned out to be surprisingly transferable to the world of voice agents, why voice is emerging as the first truly productionized use case for autonomous agents, and what it took to go from a broader evals idea to a deeply focused enterprise platform, including the moment a customer offered to pay her before she'd written a single line of code. https://www.coval.dev Apply to Y Combinator: https://www.ycombinator.com/apply Work at a startup: https://www.ycombinator.com/jobs

Harj Taggar (host), Brooke Hopkins (guest)

Jun 24, 2026 · 30m

Harj Taggar
[upbeat music] Today I'm thrilled to be joined by Brooke Hopkins, the founder and CEO of Coval. Coval is a simulation and evaluation platform for voice agents. They work with customers like Perplexity and Deepgram to monitor and evaluate tens of millions of customer calls per month. Today they're announcing a $28.2 million Series A round led by Norwest with participation from Base10, MAAC Ventures, and YC. Thanks so much for being here, Brooke.
Brooke Hopkins
Thanks so much for having me. Super excited.
Harj Taggar
Could you just give us, like, a quick recap of Coval and kind of what exactly does Coval do?
Brooke Hopkins
Yeah. Coval is a simulation and observability platform for voice agents, so we help you to scale your voice agents over millions of conversations so that you don't have to test your voice agents with real customers in production, and then also when you deploy your agents to production, you know what's happening out there in the wild. And so my background is from Waymo. I led their evaluation infrastructure team at Waymo, and my team was responsible for all of our developer tools for launching and running simulations. And now we're taking all those learnings from robotics, which is actually surprisingly similar to voice agents: how do you make sure that an agent is getting from point A to point B, across all the possible paths in between? We're testing and making sure that those work before production and then once it's live.
Harj Taggar
Yeah, your unique, like, background and how good of a fit working on simulation at Waymo turned out to be for, um, deploying voice agents in production was one of the big reasons we funded you. Um, I definitely wanna come back to that, but let's start with just, um, voice agents in general. It seems like voice has really taken off as the, um, killer UI and the killer app for AI. Um, why is that? What do you think is going on?
Brooke Hopkins
Yeah. I think what's so exciting about voice is, A, it's the first productionized use case for autonomous agents. It was the first use case where autonomous agents were acting on behalf of users, on behalf of companies, doing things autonomously, uh, in order to get to some objective. But then on top of that, I think voice is going to be the interface for AI. It's the most natural way to interface with a text box or interface with a headless agent, and it also meets people where they are in all sorts of different environments. So I think we're seeing voice agents particularly successful with logistics, with healthcare, with all these places where previously software systems weren't as present, because voice allows you to have, like, a single side opting into, uh, automating a certain behavior. So, for example, a small doctor's office can still have someone on the phone, and then a larger enterprise can then go in and automate some process that they're doing in order to talk to thousands of disparate, uh, customers.
Harj Taggar
Cool. Um, you have a particularly, like, front-row seat and unique insight into how these agents are being used by enterprises, like in production, in real use cases. Um, what's changed over the last 12 months in the enterprise world? Like, it seems like enterprises are especially quick to adopt voice agents, and so I'm just curious, like, has anything made it easier for them to deploy these things, or has anything changed about their attitudes or belief in the technology?
Brooke Hopkins
We're seeing enterprises deploy voice agents at massive scale and more rapidly than any other type of agent because there's a lot more infrastructure already in place for voice. So, for example, standard operating procedures for customer service. You have IVR trees or call flows that already exist. And so the leap from a call to an autonomous agent isn't quite as vast as, say, a financial services agent that's, you know, making financial decisions on your behalf. And then what we're seeing with enterprises, and particularly the enterprises that we work with, is that they might start with customer support flows-
Harj Taggar
Yeah
Brooke Hopkins
... and then realize there are all these other places in their enterprise that could benefit from autonomous voice systems. So, for example, a concierge to help their users discover more products or meeting them where they are instead of having to go through an application or driving usage, driving adoption, um, automating logistics and back office work, all sorts of things that maybe they wouldn't have otherwise focused on, but now that they have all the voice infrastructure in place, they're able to identify all these different areas.
Harj Taggar
Yeah.
Brooke Hopkins
And so I think we're going to see something very similar to what happened with web and mobile happen with voice, where people start by just putting, like, a piece of paper on the web, and that's HTML, and then you create web experiences. Or people start by just putting a website on a phone and then realize that you can create mobile apps. And we're gonna see the same thing happen with voice, where people start with things that are already happening on a phone, like customer support or, um, logistics, and they're going to branch into all of these novel voice experiences that are much more AI native.
Harj Taggar
Yeah. That's a really interesting way of looking at it because I think there's a lot of focus obviously on voice agents as replacing customer support and replacing human labor, not as much focus on the more positive vision maybe of, hey, like, actually you can use AI to, um, sell more of your product or help people find more of what they actually want, um, which will just, like, increase the need for goods and products and services to meet that demand.
Brooke Hopkins
Totally. I think this is going to be true, like [clears throat] even take airlines as an example. I think in two years or one year from now, it's going to be unacceptable to call an airline and be on hold for 20 minutes. But imagine if instead you could call your airline on your way to the airport and say, like, "Are there any flights that are, you know, 20 minutes earlier? I made it to the airport earlier."
Harj Taggar
Yep.
Brooke Hopkins
And previously, you would have to go on the web portal, you would have to go to the check-in desk, and there are going to be a lot easier ways to interface with really complex information and the complex set of decision-making that is just, um, distilled into, "This is what I'm trying to get to. How can I get there?"
Harj Taggar
Can you maybe talk us through just how does the infrastructure Coval provides your customers help them kind of go from simple use cases where there's sort of essentially just, you know, there's well-documented maybe support flows to things that are a little bit, like, harder or more interesting to do?
Brooke Hopkins
Yeah. We're trying to provide the infrastructure that allows anyone to scale voice applications. So think about what happened with web infrastructure. It used to be very hard to build a distributed web application. A DDoS attack could bring down any web app, or, um, you know, high spikes. I even remember websites going down, um, not that long ago, and now with serverless and all sorts of web infrastructure, it's really easy to build a really scalable site. And we're missing a lot of this infrastructure for voice. Today, it's still really hard to build voice applications, and we're trying to make it so that any enterprise can build a voice application that scales to millions of users, and they understand what's happening in all those conversations, where are things going wrong, where are the compliance risk problems, but also all of this product information. Like you have customers at your fingertips, and enterprises are always trying to hear from customers, understand the customer journey, opportunities for upsell, opportunities for product adoption. And so we can provide a lens into all of those customer interactions. What Coval is trying to do is provide that infrastructure so that any enterprise out there can build a voice application.
Harj Taggar
So when you're working with your customers to help them deploy these voice agents, what are some of the things that voice agents are kind of naturally good at doing out of the box, and where are the areas that they're more brittle and you have to provide more infrastructure?
Brooke Hopkins
Yeah, definitely. I think the interesting thing about voice agents versus, say, like a [human] customer support agent is that voice agents fail in totally different ways than a customer support agent might fail. So a customer support agent might struggle to keep up with new product changes. So products are constantly evolving in enterprises, and so maybe a product was deprecated or changed or the policy changed, and that is something that agents are exceptional at. The moment you change that policy, then it deploys to tens of millions of conversations. The thing that agents struggle more with is that they might trip up in more egregious ways that a human agent might not. So for example, it might say just the completely wrong thing, or it might, uh, have a vocal hallucination. Famously, voice agents will accidentally scream, or they'll start to whisper, or they'll change voices halfway through. So the voice agents never, uh, cease to be really funny, but at the same time, if you're on the other side of that, that is, you know... A customer support agent would never accidentally scream during the conversation. [chuckles] And so there's all these things that maybe you didn't have QA for before, and now you have to. And I think also QA used to be somewhat optional as like a nice to have, but ultimately you kind of just assume that things are roughly working. But with voice agents, that's not true because you have the capability for them to access all sorts of systems. There's all sorts of compliance problems and security controls that you have to have in place. And then on top of that, we now have the ability to understand vast amounts of data that's unstructured in a way that we never could before. So even if you continue to have human agents handling all these calls, there's still so much potential for being able to do things with that information. And maybe more specifically to your question, the things that people test for are threefold.
Did the agent do what it was supposed to? Did it take the right steps to get there, like workflows, making the right tool calls? And then also audio quality things like, um, background noise, interruptions, latency, all of these things that make voices sound really natural.
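Editor's sketch: the three checks Brooke lists (outcome, workflow and tool calls, audio quality) could be scored per call roughly as follows. This is a minimal illustration, not Coval's API. `CallRecord`, `evaluate_call`, the tool names, and the latency and interruption thresholds are all hypothetical, and background noise is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class CallRecord:
    """Hypothetical summary of one simulated call (not a real Coval schema)."""
    goal_reached: bool                   # did the caller's request get resolved?
    tool_calls: list[str]                # tools the agent invoked, in order
    response_latencies_ms: list[float]   # delay before each agent reply
    interruptions: int = 0               # times the agent talked over the caller


def evaluate_call(call, expected_tools, max_latency_ms=1000.0, max_interruptions=2):
    """Score one call on the three axes named in the conversation."""
    # 1. Outcome: did the agent do what it was supposed to?
    outcome_ok = call.goal_reached

    # 2. Workflow: the expected tools must appear in order (as a subsequence);
    #    extra tool calls in between are tolerated.
    remaining = iter(call.tool_calls)
    workflow_ok = all(tool in remaining for tool in expected_tools)

    # 3. Audio quality: worst-case latency and interruption count (assumed limits).
    worst_latency = max(call.response_latencies_ms, default=0.0)
    audio_ok = worst_latency <= max_latency_ms and call.interruptions <= max_interruptions

    return {"outcome": outcome_ok, "workflow": workflow_ok, "audio": audio_ok}


call = CallRecord(
    goal_reached=True,
    tool_calls=["lookup_booking", "search_flights", "rebook"],
    response_latencies_ms=[420.0, 1350.0],
    interruptions=1,
)
print(evaluate_call(call, expected_tools=["lookup_booking", "rebook"]))
```

Keeping the three results separate, rather than collapsing them into one score, makes it visible when a call reached its goal but still sounded unnatural.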
Harj Taggar
I was thinking, how do you set a customer up with that? Like, do you go in and tell them, "Here's all the criteria we're gonna evaluate, um, against," and sort of bring that expertise, or do you work with them to figure out what are the things you should be evaluating on and, um, and how do you help them build trust in your evaluations?
Brooke Hopkins
Yeah. For all of our enterprise customers, we work with them to create an evaluation strategy. I think this is one of the things that self-driving car companies did really well: how do you take a system and make sure that there are processes in place that make you get better and better over time and create this flywheel. Having seen this across lots of different self-driving car companies as well as hundreds of different voice systems, we work with enterprises to help them set up a scalable evaluation system. But then I think we really think about the Coval platform as how do we scale our expertise now that we've-- we're very early in voice. I think when we started in voice, it was really just a few YC companies that were building in voice AI, and it seemed kind of niche. And now every enterprise, every Fortune 500 out there, we're talking to, figuring out how we can help them scale their voice AI.
Harj Taggar
And now that you've done this with real enterprises a bunch of times, um, what are some of the most common-- When you're sort of setting up your infrastructure for the first time, what are some of the biggest misconceptions your customers have? Like, are there things they overvalue, or where they think, um, "we really need to evaluate against this," that turn out not to be so important? Or just what are the things you find yourself repeating a lot?
Brooke Hopkins
I think people think word error rate or transcription is more important than it actually is because really you can have a full conversation and miss lots of words and still understand what that person is saying. Like, if you've ever been on a Zoom call, you know [chuckles] that's true. But really it's about, like, did you understand the intent of the conversation, and did you get to the final step? The other part is that what's really hard for agents is maybe different than what was hard for human agents. So for example, um, starting a conversation over or saying all the information to begin with in the conversation can confuse agents because they might have a multi-step workflow.
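Editor's sketch: word error rate (WER) is conventionally the word-level edit distance between a reference transcript and the recognized one, divided by the number of reference words. The example below computes it and contrasts it with a deliberately naive keyword-based intent check. The `has_intent` helper and the sample sentences are assumptions made for illustration; real intent evaluation would be far more involved.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / max(len(ref), 1)


def has_intent(hypothesis: str, required_keywords: set[str]) -> bool:
    """Toy stand-in for intent recognition: are all key words present?"""
    return required_keywords <= set(hypothesis.lower().split())


said = "i would like to move to an earlier flight today"
heard = "like to move earlier flight today"

print(word_error_rate(said, heard))               # 0.4: four of ten words dropped
print(has_intent(heard, {"earlier", "flight"}))   # True: the request is still recoverable
```

A 40% word error rate looks alarming in isolation, yet the caller's request survives, which is the gap between transcription accuracy and intent that Brooke describes.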
Harj Taggar
From where we are now, like the next sort of step up in, um, performance and capability for, like, the voice agents in particular, what's gonna be the big unlock there, do you think? Is it better models, um, lower latency, um, better text to speech, speech to text? Like, what do you think is gonna happen?
Brooke Hopkins
Yeah, I think controllability for real-time models. So today the way most voice AI applications work is with a cascading architecture. So you have a speech-to-text model, which transcribes the conversation, an LLM, and then a text-to-speech model, which then says that thing out loud. This is actually what makes, uh, voice so similar to self-driving cars, is that you have perception, what's happening in the world around me; reasoning or planning, what should I do next; and then controls, how should I actually take that action, which maps very similarly to transcription, reasoning, and voice. And so you have this loop where you're kind of perceiving what's happening in the world around you, reasoning about what to do next, and then actually taking that action. And so autonomous systems are all very similar in this way, where they have this reasoning loop. And I think a lot of the advancements that came from self-driving were being able to bridge these different models by passing in different embeddings and different contexts, while also being able to provide expertise and focus for different models. And so in self-driving, there was actually a similar pattern of condensing and then specializing models, and I think we're seeing that with voice AI, where a single model is not going to solve everything, but also keeping all of the models separate is not going to solve everything. So we have to find some way to be able to make each of the steps be able to share embeddings, share context, while also making sure that they're specialized and good at the part that they're trying to do.
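Editor's sketch: the cascading architecture described here chains three stages per conversational turn, which line up with perception, reasoning, and controls. The class below is a minimal illustration with stand-in stages; `CascadedAgent` and the stub lambdas are hypothetical, and a real system would plug actual speech-to-text, LLM, and text-to-speech models into the same three slots.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CascadedAgent:
    """One turn = speech-to-text -> LLM -> text-to-speech."""
    transcribe: Callable[[bytes], str]    # perception: audio in, text out
    reason: Callable[[list[str]], str]    # reasoning: conversation history in, reply out
    synthesize: Callable[[str], bytes]    # controls: reply text in, audio out
    history: list[str] = field(default_factory=list)

    def turn(self, audio_in: bytes) -> bytes:
        text = self.transcribe(audio_in)
        self.history.append(f"user: {text}")
        reply = self.reason(self.history)
        self.history.append(f"agent: {reply}")
        return self.synthesize(reply)


# Stub stages so the loop runs without any real models (assumptions for the demo):
# "audio" is just UTF-8 text, and the "LLM" echoes the caller's last utterance.
agent = CascadedAgent(
    transcribe=lambda audio: audio.decode("utf-8"),
    reason=lambda history: "You said: " + history[-1].removeprefix("user: "),
    synthesize=lambda text: text.encode("utf-8"),
)

print(agent.turn(b"hello"))
print(agent.history)
```

In this sketch the stages share only text. The point made above about sharing embeddings and context between specialized models would mean passing richer state across these boundaries than a plain string.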
Harj Taggar
You mentioned a couple of times the similarities to self-driving cars, and obviously, um, you worked at Waymo for a while. Can you tell us a little bit about the work that you did at Waymo? Um, so when was that, and when did you first start seeing the parallels between sort of the simulation work there to the infrastructure you're building today?
Brooke Hopkins
Yeah. So when I started at Waymo, I was building out our dataset infrastructure. One of the really important things about self-driving cars that was pretty different from other ML systems at the time was that you cared a lot about specific examples and less about kind of general performance across the dataset as a whole. Because, like, for example, a kid in a Halloween costume is a super important example, and it doesn't matter if that's only one in a hundred million miles.
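Editor's sketch: one way to express "specific examples matter more than aggregate performance" is a release gate that checks a list of must-pass scenarios separately from the overall pass rate. This is an illustration only, not a description of Waymo's or Coval's process. `release_gate`, the scenario names, and the 99% threshold are hypothetical.

```python
def release_gate(results: dict[str, bool], must_pass: set[str], min_pass_rate: float = 0.99):
    """Combine an aggregate pass rate with a hard check on critical scenarios.

    results maps scenario name -> whether the system passed it.
    A critical scenario that is missing from results counts as failed.
    """
    if not results:
        raise ValueError("no scenario results to evaluate")
    pass_rate = sum(results.values()) / len(results)
    failed_critical = sorted(name for name in must_pass if not results.get(name, False))
    return {
        "pass_rate": pass_rate,
        "failed_critical": failed_critical,
        "ship": pass_rate >= min_pass_rate and not failed_critical,
    }


# 999 routine scenarios pass, one rare but critical scenario fails.
results = {f"routine_{i}": True for i in range(999)}
results["child_in_costume"] = False

gate = release_gate(results, must_pass={"child_in_costume"})
print(gate["pass_rate"])        # 0.999
print(gate["failed_critical"])  # ['child_in_costume']
print(gate["ship"])             # False
```

The aggregate number clears the threshold easily, so only the explicit critical-scenario check blocks the release.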
Harj Taggar
Yeah.
Brooke Hopkins
That made dataset creation very different, where it was how do you find these very specific examples so that you can make sure that your dataset is representative of what you're trying to, uh, improve in your systems. And then I went on to lead our team that was responsible for all of our developer tools. So how do you take a dataset, combine it with some configuration, and then be able to run that on distributed compute? And when I left Waymo, I realized that a lot of what was happening in AI sounded very similar to the problems I had talked about and solved a lot at Waymo-

Episode duration: 30:46

Transcript of episode eSm_9tb5ZbY
