No Priors Ep. 65 | With Scale AI CEO Alexandr Wang

Name: No Priors Ep. 65 | With Scale AI CEO Alexandr Wang
Uploaded: 2024-05-22T12:00:00Z
Duration: 39 min
Description: Alexandr Wang, CEO of Scale AI, explains how Scale evolved from powering autonomous vehicle datasets to becoming the core “data foundry” behind nearly every major large language model and key government AI programs.

No Priors | May 22, 2024 | 39m

Sarah Guo (host), Alexandr (Alex) Wang (guest), Elad Gil (host)

Founding story and evolution of Scale AI from AV data to LLMs
The three pillars of modern AI: compute, algorithms, and data
Data abundance vs. data scarcity and the importance of frontier data
Hybrid human–AI data generation and expert-driven supervision
AI evaluation challenges, benchmarks, and public leaderboards
Enterprise and government AI applications and self-improving systems
Long-term outlook on AGI, human–AI complementarity, and industry trajectory

In this episode of No Priors, Sarah Guo talks with Alexandr (Alex) Wang, CEO of Scale AI, about data abundance, evals, and the path to AGI.

Scale AI’s Alexandr Wang on data abundance, evals, and AGI’s path

Alexandr Wang, CEO of Scale AI, explains how Scale evolved from powering autonomous vehicle datasets to becoming the core “data foundry” behind nearly every major large language model and key government AI programs.

He argues that AI’s limiting factor is shifting from compute to high-quality, expert-driven data, and outlines a vision of “data abundance” built from proprietary corpora, expert annotations, and hybrid human–AI synthetic data.

Wang emphasizes the importance of rigorous evaluations and public leaderboards to properly measure model capabilities, build trust, and support safe deployment across enterprises, governments, and consumer applications.

He believes the path to AGI will be gradual and domain-by-domain—more like curing cancer than inventing a single vaccine—with humans remaining crucial partners in guiding, critiquing, and extending AI systems over long time horizons.

Key Takeaways

Data is becoming the primary bottleneck for AI progress.

While compute spending is measured in tens or hundreds of billions, Wang argues that moving from GPT‑4 to GPT‑10 will be constrained by the availability of diverse, high-quality data rather than just more GPUs.

High-quality ‘frontier data’ matters far more than raw volume.

Enterprise and internet-scale datasets are huge, but only a small, carefully filtered subset—expert reasoning traces, agent workflows, multilingual and multimodal data—actually drives meaningful model improvements.

Hybrid human–AI pipelines will define the future of data generation.

Models can generate large amounts of initial content, but human experts are needed to correct, critique, and refine outputs to produce reliable synthetic data that meaningfully upgrades model capabilities.

Robust, held-out evaluations are essential to trust and safety.

Existing public benchmarks are often in training data and overfit; Scale is building private, regularly refreshed evals and leaderboards so labs, governments, and enterprises can accurately understand model strengths and weaknesses.

Every serious AI application will need a self-improvement loop.

Wang notes that leading labs succeed by continuously collecting usage data and evals to refine models; he expects enterprises and governments will need similar data flywheels, which Scale’s Gen AI platform aims to enable.

Human intelligence will remain complementary to AI for the long term.

He contends that humans plus models will outperform models alone for a “very, very long time,” particularly in long-horizon reasoning and goal-setting where today’s training paradigms are fundamentally limited.

The march toward AGI will be incremental and domain-specific.

Rather than a single breakthrough, Wang predicts a multi-decade process of solving many narrow capability problems with separate data flywheels, allowing society more time to adapt to increasingly capable systems.

Notable Quotes

“AI in general is the product of three fundamental pillars: the algorithms, the compute, and the data.”
— Alexandr Wang

“We as an industry can either choose data abundance or data scarcity, and we view our role to be to build data abundance.”
— Alexandr Wang

“Producing high-quality data for AI systems is near infinite impact, because even a tiny improvement in a model compounds over every future invocation.”
— Alexandr Wang

“The question is not whether a model is better than a human; the question is whether a human plus a model is better than a model alone.”
— Alexandr Wang

“The path to AGI looks a lot more like curing cancer than developing a vaccine.”
— Alexandr Wang

Questions Answered in This Episode

How can experts in fields like law, medicine, or physics practically participate in creating the ‘frontier data’ Wang describes, and how should they be compensated or credited?

What governance or standards should exist around hybrid human–AI synthetic data to prevent subtle errors or biases from being amplified at scale?

How might rigorous, independent evals and public leaderboards change competitive dynamics among major AI labs and influence regulatory policy?

For enterprises sitting on massive proprietary datasets, what concrete first steps should they take to identify the small subset of truly valuable training data?

If AGI emerges through many narrow, domain-specific advances, how should society plan for the cumulative impact—on jobs, education, and national security—over the next two decades?

Transcript Preview

Sarah Guo

(instrumental music plays) Hi, listeners, and welcome to No Priors. Today, I'm excited to welcome Alex Wang, who started Scale AI as a 19-year-old college dropout. Scale has since become a juggernaut in the AI industry. Modern AI is powered by three pillars: compute, data, and algorithms. While research labs are working on algorithms and AI chip companies are working on the compute pillar, Scale is the data foundry, serving almost every major LLM effort, including OpenAI, Meta, and Microsoft. This is a really special episode for me, given Alex started Scale in my house in 2016 and the company has come so far. Alex, welcome. I'm so happy to be talking to you today.

Alexandr (Alex) Wang

Thanks for having me. I've known you all for, uh, for quite some time, so excited to be on the pod.

Sarah Guo

Why don't we start at the beginning just for a broader audience? Talk a little bit about the founding story of Scale.

Alexandr (Alex) Wang

Right before Scale, I was studying AI machine learning at MIT, and this was the year when DeepMind came out with AlphaGo, uh, where Google released TensorFlow, so sort of the, maybe the beginning of the deep learning hype wave or hype cycle. And, uh, I remember I was at college. I was trying to, to use, uh, neural networks. I was trying to train, you know, image recognition, uh, neural networks, and the thing I realized very quickly is that, uh, these models were very much so just a product of their data, and you s- I sort of played this forward and thought through it, and, you know, these models, or AI in general, is the product of, you know, three fundamental pillars. There's the algorithms, uh, the compute and the computation power that goes into them, and the data. Um, and at that time, it was clear, you know, there were, uh, there were companies working on the algorithms, uh, labs like OpenAI or Google's labs or, you know, a number of, of AI research efforts. Uh, there were... NVIDIA was already a very clear leader in building compute for these AI systems. Um, but there was nobody focused on data, and, uh, it was really clear that over the long arc of this technology, data was only gonna become more and more important. And so in 2016, uh, dropped out of MIT, did YC, and really started Scale to solve the data pillar of the AI ecosystem and be the organization that was gonna solve all the hard problems associated with, how do you actually, um, produce and create enough data to fuel this ecosystem? And really, this was the start of Scale as the, the data foundry for AI.

Sarah Guo

It's incredible, uh, foresight because you describe it as, like, the beginning of the deep learning hype cycle. I don't think most people noticed that a hype cycle was yet going on, um, and so, uh, I just distinctly remember, uh, you know, you working through a number of early use cases, you know, building this company in my house at the time and discovering, I think far before anybody else noticed, that, um, the AV companies were spending all of their money on data. Um, how did you think about... Like talk a little bit about how the business has evolved since then 'cause it's certainly not just that use case today.
