Dwarkesh Podcast
Mark Zuckerberg: Llama 3, $10B models, Caesar Augustus, & 1 GW datacenters
CHAPTERS
- 0:00 – 2:57
Llama 3 and the new Meta AI assistant: what’s shipping
Zuckerberg outlines what changes users will notice most: Meta AI gets a major model upgrade with Llama 3 and becomes more visible across WhatsApp, Instagram, Facebook, and Messenger. He also highlights real-time knowledge via Google/Bing integration and new creative features like fast image generation and animation.
- Llama 3 powers Meta AI and is also released openly for developers
- Meta AI positioned as a top free AI assistant with broader in-app placement
- Real-time knowledge through integration with Google and Bing
- New creative tooling: image animation and real-time image generation while typing
- 2:57 – 4:46
Under the hood of Llama 3: model lineup, benchmarks, and roadmap
They dig into the Llama 3 family: 8B and 70B released now, with a 405B dense model still training. Zuckerberg shares early benchmark numbers, positions the models as best-in-class for their size, and previews upcoming multimodal, multilingual, and longer-context variants.
- Initial Llama 3 releases: 8B and 70B; 405B dense model in training
- Benchmark claims (e.g., ~82 MMLU for 70B; ~85 MMLU mid-training for 405B)
- Small model progress: 8B approaches the capability of the largest Llama 2 model
- Roadmap: multimodality, multilinguality, and larger context windows
- 4:46 – 9:14
Why Meta bought so many GPUs: Reels, recommendations, and never being capacity-constrained again
Dwarkesh asks how Meta anticipated the GPU crunch; Zuckerberg explains it came from being behind on Reels and recommendation infrastructure. Moving from “connected” content to recommending from a vastly larger candidate set forced a step-change in training and compute needs, motivating an aggressive GPU buildout.
- Reels push required more GPU capacity for training recommendation models
- Shift to “unconnected content” expanded candidates from thousands to hundreds of millions
- Compute constraints slowed the effort to catch up to TikTok-style recommendations
- Decision rule: buy enough for current needs, then double for the unknown next wave
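As a rough illustration of why the Reels shift changed compute needs, here is a back-of-envelope sketch. It assumes, as a simplification not stated in the episode, that per-request scoring work grows linearly with the number of candidates, and it takes the lower ends of "thousands" and "hundreds of millions". It also encodes the buy-then-double rule; `candidate_growth`, `gpu_order`, and the 10,000-GPU figure are hypothetical.

```python
# Back-of-envelope sketch; assumptions are not from the episode:
#   - scoring work grows linearly with the number of candidates
#   - "thousands" -> 1,000 and "hundreds of millions" -> 100,000,000 (lower bounds)

CONNECTED_CANDIDATES = 1_000          # content from accounts you follow
UNCONNECTED_CANDIDATES = 100_000_000  # content from anywhere on the platform


def candidate_growth(before: int, after: int) -> float:
    """Factor by which scoring work grows if cost is linear in candidate count."""
    return after / before


def gpu_order(current_need: int, buffer_factor: float = 2.0) -> int:
    """Buy-then-double rule: cover today's needs, then multiply by a buffer."""
    return int(current_need * buffer_factor)


growth = candidate_growth(CONNECTED_CANDIDATES, UNCONNECTED_CANDIDATES)
print(f"candidate set grows ~{growth:,.0f}x")  # candidate set grows ~100,000x
print(gpu_order(10_000))  # 20000; 10_000 is a hypothetical current need
```

Production recommenders typically narrow candidates in stages before expensive ranking, so the real growth in heavy-model work is much smaller than this linear upper bound; the sketch only shows the direction and rough scale of the pressure.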
- 9:14 – 13:32
From FAIR to “we need AGI”: why coding and reasoning matter for consumer products
Zuckerberg describes Meta’s long-term AI investment (FAIR) and the more recent GenAI group formed after ChatGPT and diffusion breakthroughs. He explains how building assistants for social/business contexts revealed that capabilities like coding and reasoning improve performance broadly—even when users aren’t asking coding questions directly.
- FAIR started ~10 years ago as a research bet toward general intelligence
- GenAI group formed to productize foundation models across Meta’s apps/devices
- Surprising lesson: coding training improves rigor and general reasoning
- Reasoning is essential for multi-step goal completion in business/creator interactions
- 13:32 – 16:00
What “AGI” means here: capabilities, multimodality, memory, and emotional understanding
They discuss whether future Llama versions replace programmers or instead amplify human productivity. Zuckerberg frames AGI as a bundle of capabilities that improve progressively—highlighting multimodality (eventually video and 3D), memory/personalization, and a distinctive focus on emotional understanding.
- AGI as gradual addition of skills rather than a single threshold
- Multimodality roadmap: text+images now, video and 3D later
- Memory and personalization likely won’t rely only on giant context windows
- “Emotional understanding” as a key capability for social and embodied use cases
- 16:00 – 18:25
Industrial-scale inference: assistants that do tasks, plus creator and business agents
Dwarkesh presses on what could justify tens or hundreds of billions in inference spending. Zuckerberg argues Meta’s products will shift from chat to task execution and that many specialized agents will exist—especially for businesses and the ~200M creators on Meta platforms who want scalable community engagement.
- Assistant evolution: from Q&A chatbot to delegated task completion
- Multiple agents rather than one universal AI: businesses want aligned representatives
- Creator agents: creators “own/train” AIs to engage communities at scale
- Consumer use cases plus broader impact in science/health via the Chan Zuckerberg Initiative (CZI) and research
- 18:25 – 22:21
How model progress happens: tool use, agents, and moving from hand-coded scaffolding into the model
They explore whether progress is mostly scaling, new data, or better systems around models. Zuckerberg describes a pattern: ship a base model, add application logic and tool integrations, then progressively train these behaviors into the next generation to reduce brittleness and increase generality.
- Current pattern: base model + app-specific code, fine-tuning, and tool orchestration
- Llama 3 improves tool use so less must be hand-engineered (e.g., search)
- Product teams ‘hack’ interim solutions, then fold lessons into next model training
- Hand-coded systems unlock use cases but remain brittle compared to trained behaviors
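The hand-coded scaffolding this chapter describes can be pictured as an orchestration loop in application code: the app parses a tool request out of the model's text, runs the tool, and feeds the result back until the model answers. Below is a minimal sketch under loud assumptions: the `CALL name: arg` / `ANSWER:` text protocol, `stub_model`, and `search_tool` are hypothetical stand-ins, not Meta's or Llama 3's actual tool-calling format.

```python
from typing import Callable, Dict


def search_tool(query: str) -> str:
    """Hypothetical stub standing in for a real-time search integration."""
    return f"[top result for '{query}']"


# Registry mapping tool names (as the model writes them) to app-side functions.
TOOLS: Dict[str, Callable[[str], str]] = {"search": search_tool}


def stub_model(prompt: str) -> str:
    """Hypothetical model: requests a search for time-sensitive prompts,
    answers from a tool result once one is present, else answers directly."""
    if "Tool result:" in prompt:
        return "ANSWER: " + prompt.split("Tool result:", 1)[1].strip()
    if "today" in prompt.lower() or "latest" in prompt.lower():
        return "CALL search: " + prompt
    return "ANSWER: (answered from model weights)"


def run_with_tools(user_prompt: str,
                   model: Callable[[str], str] = stub_model,
                   max_steps: int = 3) -> str:
    """Hand-written loop: parse a tool call, run the tool, feed the result back."""
    prompt = user_prompt
    for _ in range(max_steps):
        output = model(prompt)
        if output.startswith("ANSWER: "):
            return output[len("ANSWER: "):]
        if output.startswith("CALL "):
            name, arg = output[len("CALL "):].split(": ", 1)
            result = TOOLS[name](arg)
            prompt = f"{user_prompt}\nTool result: {result}"
    return "no answer within step budget"


print(run_with_tools("What's the latest news?"))
print(run_with_tools("Write a haiku"))
```

The keyword check inside `stub_model` and the string parsing in `run_with_tools` are the kind of brittle logic that, per the episode, product teams write first and later fold into training for the next model generation.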
- 22:21 – 24:39
Community ecosystem: distillation to smaller models and Meta’s compute allocation realities
Asked which community fine-tunes excite him, Zuckerberg says he expects to be surprised by what developers build, and points to a gap: models smaller than 8B for certain on-device or fast classification-style tasks. He notes that Meta is focused on training the 405B model, which makes GPU allocation a constant tradeoff.
- Expectation of surprising third-party fine-tunes rather than predictable ones
- Desire for much smaller models (1–2B, even ~500M) for speed/on-device use
- Distilled small models can route/shape prompts before calling larger models
- GPU fleet priorities: major capacity tied up training the 405B model
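One reason sub-8B models matter for on-device use is memory. A back-of-envelope estimate multiplies parameter count by bytes per parameter. This sketch assumes weights only (activations, KV cache, and runtime overhead are ignored) and uses the standard storage sizes of fp16, int8, and 4-bit formats; `weight_memory_gb` is a hypothetical helper name.

```python
# Approximate weight memory = parameters x bytes per parameter.
# Assumption: weights only; activations, KV cache, and runtime overhead ignored.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for label, n in [("8B", 8e9), ("2B", 2e9), ("1B", 1e9), ("500M", 5e8)]:
    row = ", ".join(f"{p}: {weight_memory_gb(n, p):.2f} GB" for p in BYTES_PER_PARAM)
    print(f"{label:>4} -> {row}")
```

Under these assumptions an 8B model needs about 16 GB for fp16 weights, beyond the RAM of many phones, while a 500M model needs about 1 GB (0.25 GB at 4-bit).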
- 24:39 – 29:43
Scaling laws vs reality: tokens, compute-optimality, and the next bottleneck (energy)
They discuss training on massive datasets (e.g., ~15T tokens) and the observation that models can keep improving with more data. Zuckerberg argues the key uncertainty is how long exponential scaling continues; near term, GPU supply is easing, but energy permitting and power delivery become the dominant constraint.
- 70B trained on ~15T tokens and still learning near the end
- Strategic tradeoff: keep training vs move on to testing Llama 4 hypotheses
- Scaling uncertainty: exponential curves eventually hit bottlenecks
- Next major bottleneck: energy availability and permitting, not just capital
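The compute-optimality point can be made concrete with two widely used rules of thumb from the scaling-law literature; neither is stated in the episode. Training compute is approximately C ≈ 6·N·D for N parameters and D training tokens, and the Chinchilla-style heuristic puts compute-optimal training at roughly 20 tokens per parameter. A minimal sketch applying both to the 70B model and the ~15T-token figure:

```python
# Rules of thumb (assumptions, not from the episode):
#   training FLOPs       C ~= 6 * N * D   (N = parameters, D = training tokens)
#   compute-optimal data D_opt ~= 20 * N  (Chinchilla-style heuristic)
N = 70e9   # Llama 3 70B parameter count
D = 15e12  # ~15T training tokens, per the episode


def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute in FLOPs."""
    return 6 * n_params * n_tokens


def chinchilla_tokens(n_params: float, tokens_per_param: float = 20) -> float:
    """Approximate compute-optimal token count for a given model size."""
    return tokens_per_param * n_params


c = train_flops(N, D)
d_opt = chinchilla_tokens(N)
print(f"training compute ~{c:.1e} FLOPs")      # training compute ~6.3e+24 FLOPs
print(f"compute-optimal tokens ~{d_opt:.1e}")  # compute-optimal tokens ~1.4e+12
print(f"actual / optimal ~{D / d_opt:.1f}x")   # actual / optimal ~10.7x
```

By this heuristic, ~15T tokens is roughly 10x the compute-optimal amount for a 70B model, which is why a model still improving at that point is notable. The heuristic optimizes training compute only; training a smaller model on more data can still pay off when inference cost matters.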
- 29:43 – 33:55
1 GW datacenters and the physics of AI: timelines, regulation, and synthetic data as “inference-like” training
Zuckerberg explains why gigawatt-scale datacenters are a qualitatively new engineering and regulatory challenge—comparable to dedicating a major power plant to model training. He also discusses distributed training uncertainty and the growing role of synthetic data generation, blurring the line between training and inference.
- Typical datacenters: ~50–150 MW; moving to 300–500 MW or 1 GW is unprecedented
- Energy buildouts face multi-year regulatory and transmission constraints
- Distributed training may reduce colocation needs, but it’s an open question
- Synthetic data generation may become a major component of the training pipeline
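To put these datacenter sizes in energy terms, here is a sketch converting sustained power draw into annual energy. It assumes, hypothetically, constant full-load operation all year; real sites run below their rated capacity, so each figure is an upper bound for that power level. `annual_energy_twh` is a hypothetical helper name.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_energy_twh(power_mw: float, utilization: float = 1.0) -> float:
    """Energy over one year at a constant average load, in terawatt-hours.
    MW * hours = MWh, and 1 TWh = 1e6 MWh."""
    return power_mw * utilization * HOURS_PER_YEAR / 1e6


for mw in (50, 150, 500, 1000):
    print(f"{mw:>5} MW -> {annual_energy_twh(mw):5.2f} TWh/yr, "
          f"{mw / 150:4.1f}x a 150 MW site")
```

Under that assumption a 1 GW site would draw about 8.76 TWh per year, roughly 6.7x the largest "typical" 150 MW site cited in the episode.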
- 33:55 – 41:37
Is AI the most important technology ever? Computing-level shift, but constrained and gradual
Zooming out, Zuckerberg compares AI’s impact to the invention of computing itself—enabling entirely new apps and experiences. He’s skeptical of an overnight intelligence explosion due to physical constraints and expects society will have time to adapt, even if change is profound over decades.
- AI as a low-level, general innovation, akin to the shift from people not having computers to people having them
- New apps will emerge as with web/mobile, but at a more fundamental layer
- Physical constraints reduce likelihood of sudden overnight takeoff
- AI will reshape work and creativity, enabling people to do more of what they want
- 41:37 – 1:05:37
Open source risks vs concentrated power: security analogies, misinformation, and what would change Meta’s stance
They debate the dangers of open weights (e.g., stripping safety tuning, bio risks) versus the danger of a few closed model gatekeepers. Zuckerberg argues widespread, hardened deployment can improve security—analogous to open source software hardening—and says Meta will evaluate each release, balancing long-term theoretical risks with immediate harms like misinformation and election interference.
- Open weights can be misused; bad actors may remove guardrails
- Counter-risk: concentrated AI power in untrusted actors or institutions
- Security analogy: stronger defensive AI can protect systems against weaker offensive AI
- Near-term priority: real-world harms today (misinfo, fraud, election interference) while monitoring emerging behaviors
- 1:05:37 – 1:14:57
Would Meta open source a $10B model? Economics, gatekeepers, and why open ecosystems matter
Dwarkesh probes whether Meta would open source extremely expensive frontier models. Zuckerberg argues open sourcing can pay back via ecosystem optimizations (cost, efficiency), as with Open Compute, and frames open LLMs as a way to avoid an AI future dominated by API gatekeepers analogous to Apple/Google on mobile.
- Open sourcing can yield efficiency gains worth billions at Meta’s scale
- Historical precedent: Open Compute standardized supply chains and cut costs
- Strategic motive: avoid a future where closed-model providers control what developers can build
- Economic caveat: if the model itself becomes “the product,” open sourcing becomes trickier
- 1:14:57 – 1:18:37
Custom silicon roadmap, licensing arrangements, and closing reflections on focus and leadership
Zuckerberg discusses Meta’s custom silicon strategy: start with inference for ranking/recommendations to free GPUs for training, then move toward training on custom chips over time. The conversation ends with a light counterfactual about Google+, then a broader note that at large-company scale, focus—not capital—is often the scarcest resource.
- Custom silicon first targeted inference for Reels/newsfeed/ads to reduce GPU load
- Long-term plan: eventually train large models on Meta-designed chips, but not by Llama 4
- Licensing: a largely permissive Llama license, with a carve-out requiring the largest companies, such as cloud providers reselling the model, to negotiate separate terms (e.g., revenue share)
- Leadership constraint: organizational focus is the limiting factor at scale (keep the main thing the main thing)