No Priors

No Priors Ep. 65 | With Scale AI CEO Alexandr Wang

Alexandr Wang was 19 when he realized that gathering data would be crucial as AI became more prevalent, so he dropped out of MIT and started Scale AI. This week on No Priors, Alexandr joins Sarah and Elad to discuss how Scale is providing infrastructure and building a robust data foundry that is crucial to the future of AI. While the company started out working with autonomous vehicles, it has expanded by partnering with research labs and even the U.S. government. In this episode, they get into the importance of data quality in building trust in AI systems, a possible future with better self-improvement loops, AI in the enterprise, and how human and AI intelligence will work together to produce better outcomes.

Sign up for new podcasts every week. Email feedback to show@no-priors.com

Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @alexandr_wang

0:00 Introduction
3:01 Data infrastructure for autonomous vehicles
5:51 Data abundance and organization
12:06 Data quality and collection
15:34 The role of human expertise
20:18 Building trust in AI systems
23:28 Evaluating AI models
29:59 AI and government contracts
32:21 Multi-modality and scaling challenges

Sarah Guo (host), Alexandr (Alex) Wang (guest), Elad Gil (host)
May 22, 2024 · 39m

CHAPTERS

  1. 0:00 – 0:53

    Scale AI’s origin: spotting data as the missing pillar

    Sarah introduces Alex Wang and frames modern AI as compute, algorithms, and data—where Scale set out to become the “data foundry.” Alex previews how this focus let Scale ride multiple waves, from AVs to generative AI.

    • Scale positioned as the data pillar alongside compute and algorithms
    • Scale’s role across major LLM efforts (OpenAI, Meta, Microsoft)
    • Context: early deep learning era and why data mattered early
    • Personal origin story: early days building Scale in 2016
  2. 0:53 – 3:07

    Founding story: from MIT to YC to building a “data foundry”

    Alex recounts learning ML at MIT during the AlphaGo/TensorFlow moment and realizing model performance was primarily a function of data. He dropped out, joined YC, and started Scale to industrialize data production for AI systems.

    • AlphaGo/TensorFlow era as inflection point for deep learning
    • Three pillars framework: algorithms, compute, data
    • Gap in ecosystem: no one focused on data at scale
    • Goal: solve the hard problems of producing training data
  3. 3:07 – 5:08

    First wave: autonomous vehicle data infrastructure (2D + 3D sensor fusion)

    Scale’s early focus was tightly centered on autonomous driving, building the first “data engine” supporting fused camera and LiDAR workflows. That infrastructure became a standard across major automotive and AV players.

    • Early company focus: AVs and robotics as the prime AI use case
    • Key technical need: sensor-fused labeling for 2D + 3D data
    • Rapid standardization across industry players
    • Scale’s strategy: lay infrastructure tracks ahead of demand
  4. 5:08 – 8:11

    Second wave: government imagery + early RLHF with OpenAI

    In 2019–2020, uncertainty about AI applications pushed Scale to expand into government use cases, especially geospatial/satellite imagery. In parallel, Scale partnered with OpenAI on early RLHF experiments (GPT-2 era), laying groundwork that later fed into InstructGPT and the ChatGPT lineage.

    • AI application uncertainty pre-generative AI era
    • Government pivot: overhead/satellite imagery data engines
    • Supporting the first US DoD AI program of record; later relevance to Ukraine
    • Early RLHF collaboration with OpenAI; InstructGPT as precursor to ChatGPT
  5. 8:11 – 13:08

    Choosing data abundance over scarcity: the “running out of tokens” question

    Alex argues data scarcity is a choice the industry can avoid by investing in “frontier data production.” As easy internet data gets exhausted, future progress (e.g., GPT-4 to GPT-10) depends on scaling high-signal data sources.

    • Framing: industry can choose data abundance vs scarcity
    • Bottleneck for next-gen models: scaling high-quality data
    • Shift from scraping internet to forward data production
    • “Frontier data” as the new competitive advantage
  6. 13:08 – 15:26

    What frontier data looks like: experts, proprietary corpora, and hybrid synthetic pipelines

    The discussion drills into the types of data that now matter: expert reasoning traces, agent workflow data, multilingual and multimodal content, and enterprise/government proprietary datasets. Alex highlights hybrid human+AI synthetic data as the practical path to producing high-fidelity tokens at scale.

    • Frontier data examples: expert reasoning, advanced domains, agent workflows
    • Multilingual + multimodal (video/audio) data needs
    • Proprietary enterprise/government data as a major untapped source
    • Hybrid human+AI synthetic data: AI does volume, humans ensure fidelity
  7. 15:26 – 19:55

    Human expertise remains essential: the “centaur” advantage and model weirdness

    Sarah and Alex explore why humans still add value even when models beat average professionals on benchmarks. Alex argues humans plus models outperform models alone because human reasoning differs from model behavior (e.g., odd failure modes) and helps critique and steer outputs over time.

    • Core claim: human+model > model alone for a long time
    • Evidence via model artifacts/failure modes (e.g., RoT quirks, reversal curse)
    • Humans provide critique: factuality checks, reasoning corrections
    • Long-horizon guidance as a uniquely human contribution
  8. 19:55 – 21:59

    Scale’s funding and ecosystem strategy: why raise $1B now

    Sarah asks about Scale’s billion-dollar raise and strategic investors. Alex explains Scale’s infrastructure-provider posture and the need to invest heavily in data production to match massive compute investment across the industry.

    • Fundraise context: ~$1B at ~$14B valuation; strategic investors mentioned
    • Positioning: serving the entire AI ecosystem, not one lab
    • Ecosystem approach: align with infra providers and key platform players
    • Rationale: compute spending is huge; data investment must rise to keep pace
  9. 21:59 – 23:28

    Building trust in AI: evaluation as a core part of the AI lifecycle

    Alex connects trust to a tight loop: collect/generate data, train models, evaluate, and iterate. He argues rigorous measurement is required for governments, enterprises, and labs to safely adopt and deploy AI systems.

    • AI lifecycle loop: data → training → evaluation → iteration
    • Trust requirements differ across governments, enterprises, labs
    • Evaluation enables responsible development and deployment
    • Confidence layer as essential infrastructure, not an afterthought
  10. 23:28 – 26:00

    Why evals are hard: benchmark contamination and the need for held-out tests

    The conversation explains why standard academic benchmarks can be misleading, especially due to overfitting or training-set contamination. Alex describes Scale’s work on held-out evaluations (GSM1k) and calls for transparent leaderboards and ongoing domain coverage.

    • Measuring intelligence is philosophically and technically difficult
    • Academic benchmarks can be compromised by overfitting/contamination
    • GSM1k: held-out math eval to compare reported vs real capability
    • Need for public transparency + continuous evaluation platforms
  11. 26:00 – 29:56

    Application layer reality check: agents, hype cycles, and self-improving products

    Alex reflects on the post–GPT-4 application frenzy and argues the ecosystem was early relative to model limitations. He believes the next model generations will unlock more durable agents, and emphasizes building data flywheels so applications can self-improve over time.

    • Post–GPT-4 surge in agent startups and application experiments
    • “Hype cycle” framing: GPT-4 impressive but not sufficient for full bloom
    • Future models expected to enable more reliable agentic workflows
    • Self-improvement via data flywheels; Scale’s Gen AI platform focus
  12. 29:56 – 32:01

    What Scale is launching: LLM “Olympics” evals + Donovan for government agents

    Alex outlines upcoming launches: recurring held-out private evals with leaderboards across domains, and agentic capabilities for government users through Donovan. The aim is to institutionalize continuous benchmarking while deploying practical AI staff-officer workflows in government settings.

    • Recurring held-out evals + leaderboards across math/coding/instruction/adversarial
    • “Olympics for LLMs” run every few months with expanding domains
    • Donovan: government-focused AI staff officer application
    • Near-term government value: report writing, form filling, information transfer
  13. 32:01 – 34:41

    Multimodality and scaling challenges: data scarcity, convergence, and demand for smarter models

    Discussing recent OpenAI/Google releases, Alex emphasizes multimodality as a major data-scarcity frontier and notes how leading labs are converging on similar agent visions. He also argues the industry needs genuine intelligence jumps (e.g., “GPT-5-level” leaps), not only lateral modality expansions.

    • Multimodal data is scarce relative to demand for personal agents
    • Independent convergence of product visions (Astra vs 4o)
    • Two explanations: obvious next step vs competitive intelligence
    • Desire for “smarter” models (capability jumps) to unlock more apps
  14. 34:41 – 39:00

    Alex’s contrarian AGI view and CEO focus: slow capability-by-capability progress

    In rapid-fire, Alex argues AGI progress will resemble curing cancer—many hard, separate problems—rather than a single breakthrough. He closes with a CEO lesson: despite crowded markets, the technology is still early, so organizations must stay nimble as capabilities compound.

    • AGI path: incremental, problem-by-problem progress over long horizons
    • Limited positive transfer across modalities; separate data flywheels needed
    • Skepticism that video alone becomes a world model without stronger evidence
    • CEO takeaway: it’s early; prioritize organizational nimbleness and adaptation
