No Priors Ep. 124 | With SurgeAI Founder and CEO Edwin Chen

No Priors · Jul 24, 2025 · 32m

Sarah Guo (host), Edwin Chen (guest), Elad Gil (host)

SurgeAI’s origin, bootstrapped growth, and billion-dollar revenue scale
Definition and importance of truly high-quality human data
Human–AI collaboration and scalable oversight for data generation
Design and complexity of rich RL environments and reward models
Limits of synthetic data and enduring need for human feedback
Problems with benchmarks, leaderboards, and “vibe-based” evaluations
Future landscape of frontier models and SurgeAI’s research ambitions

Bootstrapped Data Giant SurgeAI Redefines Quality Human Input For AI

SurgeAI founder and CEO Edwin Chen explains how his bootstrapped, 100-person company quietly built a billion-dollar business supplying high-quality human data to top frontier labs like Google, OpenAI, and Anthropic. He argues that most data vendors are “body shops” and that the real differentiator is deep, technology-driven measurement of quality and scalable human–AI collaboration. The conversation covers why synthetic data is overrated, why human evaluation remains the gold standard, and how rich RL environments with no upper bound on realism will shape the next wave of AI training. Chen also critiques misaligned benchmarks and fundraising culture, while predicting a diverse ecosystem of differentiated frontier models rather than a single commodity AI.

Key Takeaways

High-quality data goes far beyond box-ticking and basic compliance.

Chen argues that most vendors optimize for simple checks (did it follow instructions? does it have eight lines? does it mention “moon”?) instead of depth, creativity, and expert-level work, producing commodity, mediocre training data that caps model potential.

Technology-driven quality measurement is essential for human data at scale.

SurgeAI treats the problem like search ranking: they collect extensive annotator and task signals and apply ML to evaluate and weight contributions, rather than simply providing ‘warm bodies’ without any real quality instrumentation.
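
The search-ranking analogy can be sketched as quality-weighted label aggregation, where each annotator's historical agreement with gold-standard tasks becomes a weight. This is a minimal illustration, not SurgeAI's actual system; the names, weights, and scoring scheme are all assumptions.

```python
from collections import defaultdict

def aggregate_labels(annotations, annotator_quality):
    """Combine per-annotator labels for one task into a weighted consensus.

    annotations: list of (annotator_id, label) pairs for a single task
    annotator_quality: dict mapping annotator_id -> weight in [0, 1],
        e.g. that annotator's historical agreement rate on gold tasks
    """
    scores = defaultdict(float)
    for annotator_id, label in annotations:
        # Unknown annotators get a neutral weight of 0.5.
        scores[label] += annotator_quality.get(annotator_id, 0.5)
    # The label with the highest total quality-weighted vote wins.
    return max(scores, key=scores.get)

# A proven expert's "bad" outweighs two low-agreement annotators' "good",
# even though the raw headcount is 2-1 the other way.
votes = [("expert", "bad"), ("novice1", "good"), ("novice2", "good")]
quality = {"expert": 0.95, "novice1": 0.3, "novice2": 0.3}
print(aggregate_labels(votes, quality))  # bad (0.95 vs. 0.6)
```

The point of the sketch is that a raw majority vote treats all contributors as interchangeable "warm bodies," while instrumented quality signals let reliable annotators carry more weight.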

Human–AI collaboration (scalable oversight) outperforms either alone.

For complex tasks like story writing, humans increasingly start from model drafts and then substantially edit or reshape them, reserving human effort for creative, high-leverage changes while offloading rote structure to models.

Rich RL environments have effectively no ceiling on useful complexity.

Training agents in realistic, end-to-end job simulations (e.g., ...

Synthetic data is powerful but easily misused and often low value.

Many customers generate tens of millions of synthetic examples only to discard ~95% as unhelpful; Chen sees a few thousand highly curated human examples as frequently more impactful than orders of magnitude more synthetic data.
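
The curation step this describes can be sketched as a rank-and-filter pass that keeps only the top ~5% of synthetic examples by some quality score. The `score_fn` here is a stand-in for whatever signal a team trusts (a reward model, agreement with human spot-checks); nothing in this sketch is SurgeAI's actual pipeline.

```python
def curate(examples, score_fn, keep_fraction=0.05):
    """Rank synthetic examples by quality and keep only the top fraction.

    examples: list of synthetic training examples
    score_fn: callable returning a quality score (higher is better)
    keep_fraction: fraction to retain, mirroring the ~5% kept after
        discarding the ~95% judged unhelpful
    """
    ranked = sorted(examples, key=score_fn, reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:keep]

# 1,000 generated examples with toy scores; only the best 50 survive.
pool = [{"text": f"example {i}", "score": i % 100} for i in range(1000)]
kept = curate(pool, score_fn=lambda ex: ex["score"])
print(len(kept))  # 50
```

Chen's claim is that the small surviving set, or a few thousand carefully produced human examples, often beats the full uncurated pool by orders of magnitude in impact per example.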

Human evaluation remains the gold standard for model assessment.

Frontier labs rely on careful human evals that check factuality, instruction following, and writing quality; Chen warns that quick, “vibe-based” pairwise ratings and SAT-style benchmarks effectively train models toward clickbait-like behavior.

Fundraising and early hiring are often misprioritized in startups.

Chen criticizes founders who raise money for social validation and over-hire roles like PMs and data scientists early; he advocates building product directly, staying lean, and only raising if real financial constraints emerge.

Notable Quotes

We’re kind of like the biggest human data player in this space, and we hit over a billion in revenue last year with a little over 100 people.

Edwin Chen

A lot of other companies in this space are essentially just body shops. What they are delivering is not data; they are literally just delivering warm bodies.

Edwin Chen

High-quality data actually really embraces human intelligence and creativity. Otherwise, you’re basically just scaling up mediocrity.

Edwin Chen

The alternative that all the frontier labs view as the gold standard is basically human evaluation… if you don’t do this, you’re basically training your models on the analog of clickbait.

Edwin Chen

I think there’s almost an unlimited ceiling on the richness of RL environments. The more richness you have, the more the models can learn from.

Edwin Chen

Questions Answered in This Episode

How can smaller AI teams practically implement the kind of rigorous, technology-driven quality measurement for human data that SurgeAI describes?

Where is the line between useful synthetic data and noise, and how can practitioners design processes to reliably curate the valuable subset?

What would a better, widely accepted public benchmark ecosystem look like if we moved away from LMSYS-style leaderboards and SAT-like tests?

How should organizations prioritize investments between human evaluation, synthetic data generation, and complex RL environments as budgets and models scale?

In a future with many differentiated frontier models, how might application builders systematically choose and combine models for different domains and personalities?

Transcript Preview

Sarah Guo

(instrumental music plays) Hi, listeners. Welcome back to No Priors. Today, Elad and I are here with Edwin Chen, the founder and CEO of Surge, the bootstrapped human data startup that surpassed a billion in revenue last year and serves top-tier clients like Google, OpenAI, and Anthropic. We talk about what high-quality human data means, the role of humans as models become superhuman, benchmark hacking, why he believes in a diversity of frontier models, the Scale–Meta not-M&A deal, and why there's no ceiling on environment quality for RL, or the simulated worlds that labs want to train agents in. Edwin, thanks for joining us.

Edwin Chen

Great, great seeing you guys today.

Sarah Guo

Surge has been really under the radar until just about now. Um, can you give us a little bit of, uh, color on sort of the scale of the company and what the original founding thesis was?

Edwin Chen

So we hit over a billion in revenue last year. We're kind of like the biggest human data player in this space, and we're a little over 100 people. And our original thesis was... We just really believed in the power of human data to advance AI, and we had this really big focus from the start on making sure we had the highest quality data possible.

Elad Gil

Can you give people context for how long you've been around, how you got going, et cetera? I think, uh, again, you all have accomplished an enormous amount in a short period of time, and I think, you know, you've been very quiet about some of the things you've been doing, so it'd be great to just get a little bit of history and, you know, when you started, how you got started, and how long you've been around.

Edwin Chen

Uh, yeah, so we've been around for five years. I think we just hit our five-year anniversary, so we started in 2020. So before that, I can give some context. Before that I used to work at Google, Facebook, and Twitter, and basically the reason we started Surge was... I just used to work on ML at a bunch of these big companies, and the problem I kept running into over and over again was that it really was impossible to get the data that we needed to train our models. It was just this big blocker that we faced over and over again, and there was so much more that we wanted to do. Even just the basic things we wanted to do, we struggled so hard to get the data. It was really just the big blocker. But then simultaneously, there were all these more futuristic things that we wanted to build, like the next generation of AI systems. If we could barely get the data that we needed at the time just to build a simple sentiment analysis classifier, then (laughs) how would we ever advance beyond that? So that really was the biggest problem. I can go into more of that, but that was essentially what we faced.
