No Priors

No Priors Ep. 124 | With SurgeAI Founder and CEO Edwin Chen

In the generative AI revolution, quality data is a valuable commodity. But not all data is created equal. Sarah Guo and Elad Gil sit down with SurgeAI founder and CEO Edwin Chen to discuss the meaning and importance of quality human data. Edwin talks about why he bootstrapped Surge instead of raising venture funds, the importance of scalable oversight in producing quality data, and the work Surge is doing to standardize human evals. Plus, we get Edwin’s take on what Meta’s investment in Scale AI means for Surge, and whether he thinks an underdog can catch up with OpenAI, Anthropic, and other dominant industry players.

Sign up for new podcasts every week. Email feedback to show@no-priors.com

Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @echen | @HelloSurgeAI

Chapters:
00:00 – Edwin Chen Introduction
00:41 – Overview of SurgeAI
02:28 – Why SurgeAI Bootstrapped Instead of Raising Funds
07:59 – Explaining SurgeAI’s Product
09:39 – Differentiating SurgeAI from Competitors
11:27 – Measuring the Quality of SurgeAI’s Output
12:25 – Role of Scalable Oversight at SurgeAI
14:02 – Challenges of Building Rich RL Environments
16:39 – Predicting Future Needs for Training AI Models
17:29 – Role of Humans in Data Generation
21:27 – Importance of Human Evaluation for Quality Data
22:51 – SurgeAI’s Work Toward Standardization of Human Evals
23:37 – What the Meta/ScaleAI Deal Means for SurgeAI
24:35 – Edwin’s Underdog Pick to Catch Up to Big AI Companies
24:50 – The Future Frontier Model Landscape
26:25 – Future Directions for SurgeAI
29:29 – What Does High Quality Data Mean?
32:26 – Conclusion

Sarah Guo (host) · Edwin Chen (guest) · Elad Gil (host)
Jul 23, 2025 · 32m · Watch on YouTube ↗

At a glance

WHAT IT’S REALLY ABOUT

Bootstrapped Data Giant SurgeAI Redefines Quality Human Input For AI

SurgeAI founder and CEO Edwin Chen explains how his bootstrapped, 100-person company quietly built a billion-dollar business supplying high-quality human data to top frontier labs like Google, OpenAI, and Anthropic. He argues that most data vendors are “body shops” and that the real differentiator is deep, technology-driven measurement of quality and scalable human–AI collaboration. The conversation covers why synthetic data is overrated, why human evaluation remains the gold standard, and how rich RL environments with no upper bound on realism will shape the next wave of AI training. Chen also critiques misaligned benchmarks and fundraising culture, while predicting a diverse ecosystem of differentiated frontier models rather than a single commodity AI.

IDEAS WORTH REMEMBERING

5 ideas

High-quality data goes far beyond box-ticking and basic compliance.

Chen argues that most vendors optimize for simple checks (did it follow instructions, have eight lines, mention 'moon') instead of depth, creativity, and expert-level work, leading to commodity, mediocre training data that caps model potential.

Technology-driven quality measurement is essential for human data at scale.

SurgeAI treats the problem like search ranking: they collect extensive annotator and task signals and apply ML to evaluate and weight contributions, rather than simply providing ‘warm bodies’ without any real quality instrumentation.

Human–AI collaboration (scalable oversight) outperforms either alone.

For complex tasks like story writing, humans increasingly start from model drafts and then substantially edit or reshape them, reserving human effort for creative, high-leverage changes while offloading rote structure to models.

Rich RL environments have effectively no ceiling on useful complexity.

Training agents in realistic, end-to-end job simulations (e.g., a salesperson’s full digital and real-world workflow) requires massive, coherent, time-evolving environments; Chen believes more realism and diversity directly translate into greater learning.

Synthetic data is powerful but easily misused and often low value.

Many customers generate tens of millions of synthetic examples only to discard ~95% as unhelpful; Chen sees a few thousand highly curated human examples as frequently more impactful than orders of magnitude more synthetic data.

WORDS WORTH SAVING

5 quotes

We’re kind of like the biggest human data player in this space, and we hit over a billion in revenue last year with a little over 100 people.

Edwin Chen

A lot of other companies in this space are essentially just body shops. What they are delivering is not data; they are literally just delivering warm bodies.

Edwin Chen

High-quality data actually really embraces human intelligence and creativity. Otherwise, you’re basically just scaling up mediocrity.

Edwin Chen

The alternative that all the frontier labs view as the gold standard is basically human evaluation… if you don’t do this, you’re basically training your models on the analog of clickbait.

Edwin Chen

I think there’s almost an unlimited ceiling on the richness of RL environments. The more richness you have, the more the models can learn from.

Edwin Chen

SurgeAI’s origin, bootstrapped growth, and billion-dollar revenue scale
Definition and importance of truly high-quality human data
Human–AI collaboration and scalable oversight for data generation
Design and complexity of rich RL environments and reward models
Limits of synthetic data and enduring need for human feedback
Problems with benchmarks, leaderboards, and “vibe-based” evaluations
Future landscape of frontier models and SurgeAI’s research ambitions
