No Priors Ep. 65 | With Scale AI CEO Alexandr Wang
CHAPTERS
- 0:00 – 0:53
Scale AI’s origin: spotting data as the missing pillar
Sarah introduces Alex Wang and frames modern AI as resting on three pillars: compute, algorithms, and data. Scale set out to become the “data foundry” for the third. Alex previews how this focus let Scale ride multiple waves, from AVs to generative AI.
- •Scale positioned as the data pillar alongside compute and algorithms
- •Scale’s role across major LLM efforts (OpenAI, Meta, Microsoft)
- •Context: early deep learning era and why data mattered early
- •Personal origin story: early days building Scale in 2016
- 0:53 – 3:07
Founding story: from MIT to YC to building a “data foundry”
Alex recounts learning ML at MIT during the AlphaGo/TensorFlow moment and realizing model performance was primarily a function of data. He dropped out, joined YC, and started Scale to industrialize data production for AI systems.
- •AlphaGo/TensorFlow era as inflection point for deep learning
- •Three pillars framework: algorithms, compute, data
- •Gap in ecosystem: no one focused on data at scale
- •Goal: solve the hard problems of producing training data
- 3:07 – 5:08
First wave: autonomous vehicle data infrastructure (2D + 3D sensor fusion)
Scale’s early focus was tightly centered on autonomous driving, building the first “data engine” supporting fused camera and LiDAR workflows. That infrastructure became a standard across major automotive and AV players.
- •Early company focus: AVs and robotics as the prime AI use case
- •Key technical need: sensor-fused labeling for 2D + 3D data
- •Rapid standardization across industry players
- •Scale’s strategy: lay infrastructure tracks ahead of demand
- 5:08 – 8:11
Second wave: government imagery + early RLHF with OpenAI
In 2019–2020, uncertainty about AI applications pushed Scale to expand into government use cases, especially geospatial/satellite imagery. In parallel, Scale partnered with OpenAI on early RLHF experiments (GPT-2 era), laying groundwork that later fed into InstructGPT and the ChatGPT lineage.
- •AI application uncertainty pre-generative AI era
- •Government pivot: overhead/satellite imagery data engines
- •Supporting the first US DoD AI program of record; later relevance to Ukraine
- •Early RLHF collaboration with OpenAI; InstructGPT as precursor to ChatGPT
- 8:11 – 13:08
Choosing data abundance over scarcity: the “running out of tokens” question
Alex argues data scarcity is a choice the industry can avoid by investing in “frontier data production.” As easy internet data gets exhausted, future progress (e.g., GPT-4 to GPT-10) depends on scaling high-signal data sources.
- •Framing: industry can choose data abundance vs scarcity
- •Bottleneck for next-gen models: scaling high-quality data
- •Shift from scraping internet to forward data production
- •“Frontier data” as the new competitive advantage
- 13:08 – 15:26
What frontier data looks like: experts, proprietary corpora, and hybrid synthetic pipelines
The discussion drills into the types of data that now matter: expert reasoning traces, agent workflow data, multilingual and multimodal content, and enterprise/government proprietary datasets. Alex highlights hybrid human+AI synthetic data as the practical path to producing high-fidelity tokens at scale.
- •Frontier data examples: expert reasoning, advanced domains, agent workflows
- •Multilingual + multimodal (video/audio) data needs
- •Proprietary enterprise/government data as a major untapped source
- •Hybrid human+AI synthetic data: AI does volume, humans ensure fidelity
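The split described above (AI for volume, humans for fidelity) can be illustrated with a minimal Python sketch. This is not Scale's pipeline: the `Sample` type, the `model_confidence` score, and the `confidence_floor` and `audit_rate` parameters are all hypothetical names chosen for illustration. The assumption is that each model-generated sample carries some confidence score, and that humans review every low-confidence sample plus a random audit slice of the rest.

```python
import random
from dataclasses import dataclass


@dataclass
class Sample:
    text: str
    model_confidence: float  # hypothetical score in [0, 1], assumed to exist


def route_for_review(samples, confidence_floor=0.8, audit_rate=0.1, seed=0):
    """Split samples into (needs_human, auto_accepted).

    Low-confidence samples always go to a human. A random slice of the
    remaining samples is audited too, so errors in confident outputs
    can still be caught.
    """
    rng = random.Random(seed)
    needs_human, auto_accepted = [], []
    for s in samples:
        if s.model_confidence < confidence_floor or rng.random() < audit_rate:
            needs_human.append(s)
        else:
            auto_accepted.append(s)
    return needs_human, auto_accepted


# Illustrative batch; the texts and scores are made up.
batch = [Sample("a", 0.5), Sample("b", 0.9), Sample("c", 0.95)]
needs, auto = route_for_review(batch, audit_rate=0.0)
print(len(needs), len(auto))
```

With `audit_rate=0.0` only the low-confidence sample is routed to a human; raising the audit rate trades throughput for more human checking.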
- 15:26 – 19:55
Human expertise remains essential: the “centaur” advantage and model weirdness
Sarah and Alex explore why humans still add value even when models beat average professionals on benchmarks. Alex argues humans plus models outperform models alone because human reasoning differs from model behavior (e.g., odd failure modes) and helps critique and steer outputs over time.
- •Core claim: human+model > model alone for a long time
- •Evidence via model artifacts/failure modes (e.g., ROT-cipher quirks, reversal curse)
- •Humans provide critique: factuality checks, reasoning corrections
- •Long-horizon guidance as a uniquely human contribution
- 19:55 – 21:59
Scale’s funding and ecosystem strategy: why raise $1B now
Sarah asks about Scale’s billion-dollar raise and strategic investors. Alex explains Scale’s infrastructure-provider posture and the need to invest heavily in data production to match massive compute investment across the industry.
- •Fundraise context: ~$1B at ~$14B valuation; strategic investors mentioned
- •Positioning: serving the entire AI ecosystem, not one lab
- •Ecosystem approach: align with infra providers and key platform players
- •Rationale: compute spending is huge; data investment must rise to keep pace
- 21:59 – 23:28
Building trust in AI: evaluation as a core part of the AI lifecycle
Alex connects trust to a tight loop: collect/generate data, train models, evaluate, and iterate. He argues rigorous measurement is required for governments, enterprises, and labs to safely adopt and deploy AI systems.
- •AI lifecycle loop: data → training → evaluation → iteration
- •Trust requirements differ across governments, enterprises, labs
- •Evaluation enables responsible development and deployment
- •Confidence layer as essential infrastructure, not an afterthought
- 23:28 – 26:00
Why evals are hard: benchmark contamination and the need for held-out tests
The conversation explains why standard academic benchmarks can be misleading, especially due to overfitting or training-set contamination. Alex describes Scale’s work on held-out evaluations (GSM1k) and calls for transparent leaderboards and ongoing domain coverage.
- •Measuring intelligence is philosophically and technically difficult
- •Academic benchmarks can be compromised by overfitting/contamination
- •GSM1k: held-out math eval to compare reported vs real capability
- •Need for public transparency + continuous evaluation platforms
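The idea of comparing reported against real capability can be sketched in Python as follows. A model is scored on a public benchmark and on a private held-out set of comparable difficulty, and a large gap suggests overfitting or contamination. The `EvalResult` type, the `flag_suspect` helper, the 5-point threshold, and all numbers are assumptions for illustration, not results or methods from the episode.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    model: str
    public_correct: int   # correct answers on the public benchmark
    public_total: int
    heldout_correct: int  # correct answers on the private held-out set
    heldout_total: int

    @property
    def public_acc(self) -> float:
        return self.public_correct / self.public_total

    @property
    def heldout_acc(self) -> float:
        return self.heldout_correct / self.heldout_total

    @property
    def gap(self) -> float:
        # Positive gap: better on the public set than on unseen items.
        return self.public_acc - self.heldout_acc


def flag_suspect(results, threshold=0.05):
    """Return names of models whose public-vs-held-out gap exceeds threshold."""
    return [r.model for r in results if r.gap > threshold]


# Made-up numbers for illustration only.
results = [
    EvalResult("model_a", 900, 1000, 880, 1000),
    EvalResult("model_b", 920, 1000, 790, 1000),
]
suspects = flag_suspect(results)
print(suspects)
```

A gap alone does not prove contamination, since the two sets may differ slightly in difficulty, which is why the threshold is a judgment call rather than a fixed rule.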
- 26:00 – 29:56
Application layer reality check: agents, hype cycles, and self-improving products
Alex reflects on the post–GPT-4 application frenzy and argues that the application ecosystem arrived early, ahead of what the models could reliably do. He believes the next model generations will unlock more durable agents, and emphasizes building data flywheels so applications can self-improve over time.
- •Post–GPT-4 surge in agent startups and application experiments
- •“Hype cycle” framing: GPT-4 impressive but not sufficient for full bloom
- •Future models expected to enable more reliable agentic workflows
- •Self-improvement via data flywheels; Scale’s Gen AI platform focus
- 29:56 – 32:01
What Scale is launching: LLM “Olympics” evals + Donovan for government agents
Alex outlines upcoming launches: recurring held-out private evals with leaderboards across domains, and agentic capabilities for government users through Donovan. The aim is to institutionalize continuous benchmarking while deploying practical AI staff-officer workflows in government settings.
- •Recurring held-out evals + leaderboards across math/coding/instruction/adversarial
- •“Olympics for LLMs” run every few months with expanding domains
- •Donovan: government-focused AI staff officer application
- •Near-term government value: report writing, form filling, information transfer
- 32:01 – 34:41
Multimodality and scaling challenges: data scarcity, convergence, and demand for smarter models
Discussing recent OpenAI/Google releases, Alex emphasizes multimodality as a major data-scarcity frontier and notes how leading labs are converging on similar agent visions. He also argues the industry needs genuine intelligence jumps (e.g., “GPT-5-level” leaps), not only lateral modality expansions.
- •Multimodal data is scarce relative to demand for personal agents
- •Independent convergence of product visions (Google’s Project Astra vs OpenAI’s GPT-4o)
- •Two explanations: obvious next step vs competitive intelligence
- •Desire for “smarter” models (capability jumps) to unlock more apps
- 34:41 – 39:00
Alex’s contrarian AGI view and CEO focus: slow capability-by-capability progress
In a rapid-fire segment, Alex argues that AGI progress will resemble curing cancer (many hard, separate problems) rather than a single breakthrough. He closes with a CEO lesson: despite crowded markets, the technology is still early, so organizations must stay nimble as capabilities compound.
- •AGI path: incremental, problem-by-problem progress over long horizons
- •Limited positive transfer across modalities; separate data flywheels needed
- •Skepticism that video alone becomes a world model without stronger evidence
- •CEO takeaway: it’s early; prioritize organizational nimbleness and adaptation