No Priors Ep. 65 | With Scale AI CEO Alexandr Wang

Alexandr Wang was 19 when he realized that gathering data would be crucial as AI became more prevalent, so he dropped out of MIT and started Scale AI. This week on No Priors, Alexandr joins Sarah and Elad to discuss how Scale is providing infrastructure and building a robust data foundry that is crucial to the future of AI. While the company started out working with autonomous vehicles, it has expanded by partnering with research labs and even the U.S. government. In this episode, they get into the importance of data quality in building trust in AI systems, AI in the enterprise, and a possible future where we can build better self-improvement loops and where human and AI intelligence work together to produce better outcomes.

Sign up for new podcasts every week. Email feedback to show@no-priors.com

Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @alexandr_wang

0:00 Introduction
3:01 Data infrastructure for autonomous vehicles
5:51 Data abundance and organization
12:06 Data quality and collection
15:34 The role of human expertise
20:18 Building trust in AI systems
23:28 Evaluating AI models
29:59 AI and government contracts
32:21 Multi-modality and scaling challenges

Sarah Guo (host), Alexandr (Alex) Wang (guest), Elad Gil (host)

May 22, 2024 | 39m

CHAPTERS

0:00 – 0:53
Scale AI’s origin: spotting data as the missing pillar
Sarah introduces Alex Wang and frames modern AI as compute, algorithms, and data—where Scale set out to become the “data foundry.” Alex previews how this focus let Scale ride multiple waves, from AVs to generative AI.
- Scale positioned as the data pillar alongside compute and algorithms
- Scale’s role across major LLM efforts (OpenAI, Meta, Microsoft)
- Context: early deep learning era and why data mattered early
- Personal origin story: early days building Scale in 2016
0:53 – 3:07
Founding story: from MIT to YC to building a “data foundry”
Alex recounts learning ML at MIT during the AlphaGo/TensorFlow moment and realizing model performance was primarily a function of data. He dropped out, joined YC, and started Scale to industrialize data production for AI systems.
- AlphaGo/TensorFlow era as inflection point for deep learning
- Three pillars framework: algorithms, compute, data
- Gap in ecosystem: no one focused on data at scale
- Goal: solve the hard problems of producing training data
3:07 – 5:08
First wave: autonomous vehicle data infrastructure (2D + 3D sensor fusion)
Scale’s early focus was tightly centered on autonomous driving, building the first “data engine” supporting fused camera and LiDAR workflows. That infrastructure became a standard across major automotive and AV players.
- Early company focus: AVs and robotics as the prime AI use case
- Key technical need: sensor-fused labeling for 2D + 3D data
- Rapid standardization across industry players
- Scale’s strategy: lay infrastructure tracks ahead of demand
5:08 – 8:11
Second wave: government imagery + early RLHF with OpenAI
In 2019–2020, uncertainty about AI applications pushed Scale to expand into government use cases, especially geospatial/satellite imagery. In parallel, Scale partnered with OpenAI on early RLHF experiments (GPT-2 era), laying groundwork that later fed into InstructGPT and the ChatGPT lineage.
- AI application uncertainty pre-generative AI era
- Government pivot: overhead/satellite imagery data engines
- Supporting the first US DoD AI program of record; later relevance to Ukraine
- Early RLHF collaboration with OpenAI; InstructGPT as precursor to ChatGPT
8:11 – 13:08
Choosing data abundance over scarcity: the “running out of tokens” question
Alex argues data scarcity is a choice the industry can avoid by investing in “frontier data production.” As easy internet data gets exhausted, future progress (e.g., GPT-4 to GPT-10) depends on scaling high-signal data sources.
- Framing: industry can choose data abundance vs scarcity
- Bottleneck for next-gen models: scaling high-quality data
- Shift from scraping internet to forward data production
- “Frontier data” as the new competitive advantage
13:08 – 15:26
What frontier data looks like: experts, proprietary corpora, and hybrid synthetic pipelines
The discussion drills into the types of data that now matter: expert reasoning traces, agent workflow data, multilingual and multimodal content, and enterprise/government proprietary datasets. Alex highlights hybrid human+AI synthetic data as the practical path to producing high-fidelity tokens at scale.
- Frontier data examples: expert reasoning, advanced domains, agent workflows
- Multilingual + multimodal (video/audio) data needs
- Proprietary enterprise/government data as a major untapped source
- Hybrid human+AI synthetic data: AI does volume, humans ensure fidelity
15:26 – 19:55
Human expertise remains essential: the “centaur” advantage and model weirdness
Sarah and Alex explore why humans still add value even when models beat average professionals on benchmarks. Alex argues humans plus models outperform models alone because human reasoning differs from model behavior (e.g., odd failure modes) and helps critique and steer outputs over time.
- Core claim: human+model > model alone for a long time
- Evidence via model artifacts/failure modes (e.g., RoT quirks, reversal curse)
- Humans provide critique: factuality checks, reasoning corrections
- Long-horizon guidance as a uniquely human contribution
19:55 – 21:59
Scale’s funding and ecosystem strategy: why raise $1B now
Sarah asks about Scale’s billion-dollar raise and strategic investors. Alex explains Scale’s infrastructure-provider posture and the need to invest heavily in data production to match massive compute investment across the industry.
- Fundraise context: ~$1B at ~$14B valuation; strategic investors mentioned
- Positioning: serving the entire AI ecosystem, not one lab
- Ecosystem approach: align with infra providers and key platform players
- Rationale: compute spending is huge; data investment must rise to keep pace
21:59 – 23:28
Building trust in AI: evaluation as a core part of the AI lifecycle
Alex connects trust to a tight loop: collect/generate data, train models, evaluate, and iterate. He argues rigorous measurement is required for governments, enterprises, and labs to safely adopt and deploy AI systems.
- AI lifecycle loop: data → training → evaluation → iteration
- Trust requirements differ across governments, enterprises, labs
- Evaluation enables responsible development and deployment
- Confidence layer as essential infrastructure, not an afterthought
23:28 – 26:00
Why evals are hard: benchmark contamination and the need for held-out tests
The conversation explains why standard academic benchmarks can be misleading, especially due to overfitting or training-set contamination. Alex describes Scale’s work on held-out evaluations (GSM1k) and calls for transparent leaderboards and ongoing domain coverage.
- Measuring intelligence is philosophically and technically difficult
- Academic benchmarks can be compromised by overfitting/contamination
- GSM1k: held-out math eval to compare reported vs real capability
- Need for public transparency + continuous evaluation platforms
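The idea of comparing reported and held-out performance can be sketched in a few lines. This is only an illustration, not Scale's methodology: the `EvalResult` type, the function names, the model names, and all accuracy figures below are hypothetical placeholders. It assumes each model has been scored once on a public benchmark and once on a fresh test set that was never published.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    model: str
    public_acc: float   # accuracy on the public benchmark (may be contaminated)
    heldout_acc: float  # accuracy on a fresh, never-published test set


def overfit_gap(result: EvalResult) -> float:
    """Public minus held-out accuracy, in percentage points."""
    return round((result.public_acc - result.heldout_acc) * 100, 1)


def rank_by_heldout(results: list[EvalResult]) -> list[EvalResult]:
    """Leaderboard order based on held-out accuracy only."""
    return sorted(results, key=lambda r: r.heldout_acc, reverse=True)


# Made-up numbers, purely for illustration.
results = [
    EvalResult("model_a", public_acc=0.90, heldout_acc=0.78),
    EvalResult("model_b", public_acc=0.85, heldout_acc=0.84),
]

for r in rank_by_heldout(results):
    print(f"{r.model}: held-out {r.heldout_acc:.2f}, gap {overfit_gap(r)} pts")
```

In this toy data, the model with the higher public score drops below the other once ranked on held-out accuracy. A large gap suggests overfitting or contamination, though it does not prove either.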
26:00 – 29:56
Application layer reality check: agents, hype cycles, and self-improving products
Alex reflects on the post–GPT-4 application frenzy and argues the ecosystem was early relative to model limitations. He believes the next model generations will unlock more durable agents, and emphasizes building data flywheels so applications can self-improve over time.
- Post–GPT-4 surge in agent startups and application experiments
- “Hype cycle” framing: GPT-4 impressive but not sufficient for full bloom
- Future models expected to enable more reliable agentic workflows
- Self-improvement via data flywheels; Scale’s Gen AI platform focus
29:56 – 32:01
What Scale is launching: LLM “Olympics” evals + Donovan for government agents
Alex outlines upcoming launches: recurring held-out private evals with leaderboards across domains, and agentic capabilities for government users through Donovan. The aim is to institutionalize continuous benchmarking while deploying practical AI staff-officer workflows in government settings.
- Recurring held-out evals + leaderboards across math/coding/instruction/adversarial
- “Olympics for LLMs” run every few months with expanding domains
- Donovan: government-focused AI staff officer application
- Near-term government value: report writing, form filling, information transfer
32:01 – 34:41
Multimodality and scaling challenges: data scarcity, convergence, and demand for smarter models
Discussing recent OpenAI/Google releases, Alex emphasizes multimodality as a major data-scarcity frontier and notes how leading labs are converging on similar agent visions. He also argues the industry needs genuine intelligence jumps (e.g., “GPT-5-level” leaps), not only lateral modality expansions.
- Multimodal data is scarce relative to demand for personal agents
- Independent convergence of product visions (Google's Project Astra vs OpenAI's GPT-4o)
- Two explanations: obvious next step vs competitive intelligence
- Desire for “smarter” models (capability jumps) to unlock more apps
34:41 – 39:00
Alex’s contrarian AGI view and CEO focus: slow capability-by-capability progress
In rapid-fire, Alex argues AGI progress will resemble curing cancer—many hard, separate problems—rather than a single breakthrough. He closes with a CEO lesson: despite crowded markets, the technology is still early, so organizations must stay nimble as capabilities compound.
- AGI path: incremental, problem-by-problem progress over long horizons
- Limited positive transfer across modalities; separate data flywheels needed
- Skepticism that video alone becomes a world model without stronger evidence
- CEO takeaway: it’s early; prioritize organizational nimbleness and adaptation
