Dwarkesh Podcast

Mark Zuckerberg — Llama 3, $10B models, Caesar Augustus, & 1 GW datacenters

Zuck on:
* Llama 3
* open sourcing towards AGI
* custom silicon, synthetic data, & energy constraints on scaling
* Caesar Augustus, intelligence explosion, bioweapons, $10b models, & much more

Enjoy!

𝐄𝐏𝐈𝐒𝐎𝐃𝐄 𝐋𝐈𝐍𝐊𝐒
* Transcript: https://www.dwarkeshpatel.com/p/mark-zuckerberg
* Apple Podcasts: https://podcasts.apple.com/us/podcast/mark-zuckerberg-llama-3-open-sourcing-%2410b-models-caeser/id1516093381?i=1000652877239
* Spotify: https://open.spotify.com/episode/6Lbsk4HtQZfkJ4dZjh7E7k?si=GOqj7hUdSaWSgi7ULWXjMA
* Me on Twitter: https://twitter.com/dwarkesh_sp

𝐒𝐏𝐎𝐍𝐒𝐎𝐑𝐒
* This episode is brought to you by Stripe, financial infrastructure for the internet. Millions of companies from Anthropic to Amazon use Stripe to accept payments, automate financial processes and grow their revenue. Learn more at https://stripe.com/
* V7 Go is a tool to automate multimodal tasks using GenAI, reliably and at scale. Use code DWARKESH20 for 20% off on the pro plan. Learn more at https://www.v7labs.com/go?utm_campaign=Dwarkesh%20Podcast%20Newsletter&utm_source=Dwarkesh-Podcast&utm_medium=Newsletter&utm_term=Paid-Email
* CommandBar is an AI user assistant that any software product can embed to non-annoyingly assist, support, and unleash their users. Used by forward-thinking CX, product, growth, and marketing teams. Learn more at https://www.commandbar.com/

If you’re interested in advertising on the podcast, fill out this form: https://airtable.com/appxGOvFLDLP5dlzv/pagFVrbHRohW6F2bZ/form

𝐓𝐈𝐌𝐄𝐒𝐓𝐀𝐌𝐏𝐒
00:00:00 - Llama 3
00:09:15 - Coding on path to AGI
00:26:07 - Energy bottlenecks
00:34:03 - Is AI the most important technology ever?
00:38:04 - Dangers of open source
00:54:40 - Caesar Augustus and metaverse
01:05:36 - Open sourcing the $10b model & custom silicon
01:16:02 - Zuck as CEO of Google+

Mark Zuckerberg (guest), Dwarkesh Patel (host)
Apr 18, 2024 · 1h 18m

CHAPTERS

  1. 0:00 – 2:57

    Llama 3 and the new Meta AI assistant: what’s shipping

    Zuckerberg outlines what changes users will notice most: Meta AI gets a major model upgrade with Llama 3 and becomes more visible across WhatsApp, Instagram, Facebook, and Messenger. He also highlights real-time knowledge via Google/Bing integration and new creative features like fast image generation and animation.

    • Llama 3 powers Meta AI and is also released openly for developers
    • Meta AI positioned as a top free AI assistant with broader in-app placement
    • Real-time knowledge through integration with Google and Bing
    • New creative tooling: image animation and real-time image generation while typing
  2. 2:57 – 4:46

    Under the hood of Llama 3: model lineup, benchmarks, and roadmap

    They dig into the Llama 3 family: 8B and 70B released now, with a 405B dense model still training. Zuckerberg shares early benchmark numbers, positions the models as best-in-class for their size, and previews upcoming multimodal, multilingual, and longer-context variants.

    • Initial Llama 3 releases: 8B and 70B; 405B dense model in training
    • Benchmark claims (e.g., ~82 MMLU for 70B; ~85 MMLU mid-training for 405B)
    • Small model progress: 8B approaches Llama 2’s biggest model capability
    • Roadmap: multimodality, multilinguality, and larger context windows
  3. 4:46 – 9:14

    Why Meta bought so many GPUs: Reels, recommendations, and never being capacity-constrained again

    Dwarkesh asks how Meta anticipated the GPU crunch; Zuckerberg explains it came from being behind on Reels and recommendation infrastructure. Moving from “connected” content to recommending from a vastly larger candidate set forced a step-change in training and compute needs, motivating an aggressive GPU buildout.

    • Reels push required more GPU capacity for training recommendation models
    • Shift to “unconnected content” expanded candidates from thousands to hundreds of millions
    • Compute constraints slowed the effort to catch up to TikTok-style recommendations
    • Decision rule: buy enough for current needs, then double for the unknown next wave
  4. 9:14 – 13:32

    From FAIR to “we need AGI”: why coding and reasoning matter for consumer products

    Zuckerberg describes Meta’s long-term AI investment (FAIR) and the more recent GenAI group formed after ChatGPT and diffusion breakthroughs. He explains how building assistants for social/business contexts revealed that capabilities like coding and reasoning improve performance broadly—even when users aren’t asking coding questions directly.

    • FAIR started ~10 years ago as a research bet toward general intelligence
    • GenAI group formed to productize foundation models across Meta’s apps/devices
    • Surprising lesson: coding training improves rigor and general reasoning
    • Reasoning is essential for multi-step goal completion in business/creator interactions
  5. 13:32 – 16:00

    What “AGI” means here: capabilities, multimodality, memory, and emotional understanding

    They discuss whether future Llama versions replace programmers or instead amplify human productivity. Zuckerberg frames AGI as a bundle of capabilities that improve progressively—highlighting multimodality (eventually video and 3D), memory/personalization, and a distinctive focus on emotional understanding.

    • AGI as gradual addition of skills rather than a single threshold
    • Multimodality roadmap: text+images now, video and 3D later
    • Memory and personalization likely won’t rely only on giant context windows
    • “Emotional understanding” as a key capability for social and embodied use cases
  6. 16:00 – 18:25

    Industrial-scale inference: assistants that do tasks, plus creator and business agents

    Dwarkesh presses on what could justify tens or hundreds of billions in inference spending. Zuckerberg argues Meta’s products will shift from chat to task execution and that many specialized agents will exist—especially for businesses and the ~200M creators on Meta platforms who want scalable community engagement.

    • Assistant evolution: from Q&A chatbot to delegated task completion
    • Multiple agents rather than one universal AI—businesses want aligned representatives
    • Creator agents: creators “own/train” AIs to engage communities at scale
    • Consumer use cases plus broader impact in science/health via CZI and research
  7. 18:25 – 22:21

    How model progress happens: tool use, agents, and moving from hand-coded scaffolding into the model

    They explore whether progress is mostly scaling, new data, or better systems around models. Zuckerberg describes a pattern: ship a base model, add application logic and tool integrations, then progressively train these behaviors into the next generation to reduce brittleness and increase generality.

    • Current pattern: base model + app-specific code, fine-tuning, and tool orchestration
    • Llama 3 improves tool use so less must be hand-engineered (e.g., search)
    • Product teams ‘hack’ interim solutions, then fold lessons into next model training
    • Hand-coded systems unlock use cases but remain brittle compared to trained behaviors
  8. 22:21 – 24:39

    Community ecosystem: distillation to smaller models and Meta’s compute allocation realities

    Asked which community fine-tunes excite him, Zuckerberg emphasizes being surprised by what developers build and points to a gap: models smaller than 8B for certain on-device or fast classification-style tasks. He notes Meta is focused on training the 405B model, making GPU allocation a constant tradeoff.

    • Expectation of surprising third-party fine-tunes rather than predictable ones
    • Desire for much smaller models (1–2B, even ~500M) for speed/on-device use
    • Distilled small models can route/shape prompts before calling larger models
    • GPU fleet priorities: major capacity tied up training the 405B model
  9. 24:39 – 29:43

    Scaling laws vs reality: tokens, compute-optimality, and the next bottleneck (energy)

    They discuss training on massive datasets (e.g., ~15T tokens) and the observation that models can keep improving with more data. Zuckerberg argues the key uncertainty is how long exponential scaling continues; near term, GPU supply is easing, but energy permitting and power delivery become the dominant constraint.

    • 70B trained on ~15T tokens and still learning near the end
    • Strategic tradeoff: keep training vs move on to testing Llama 4 hypotheses
    • Scaling uncertainty: exponential curves eventually hit bottlenecks
    • Next major bottleneck: energy availability and permitting, not just capital
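To put the token figures above in perspective, here is a rough back-of-the-envelope sketch. It uses the 70B parameter count and the ~15T token figure from this chapter. The "6 FLOPs per parameter per token" estimate and the "~20 tokens per parameter" compute-optimal rule of thumb are widely quoted approximations assumed here for illustration; they are not numbers stated in the episode.

```python
# Back-of-the-envelope arithmetic; the formulas are common rules of thumb,
# not figures given by Meta or stated in the episode.
PARAMS = 70e9    # 70B parameters (from the chapter summary)
TOKENS = 15e12   # ~15T training tokens (from the chapter summary)

# Assumption: often-quoted compute-optimal ratio of roughly 20 tokens per parameter.
RULE_OF_THUMB_TOKENS_PER_PARAM = 20


def tokens_per_param(params: float, tokens: float) -> float:
    """Training tokens seen per model parameter."""
    return tokens / params


def approx_training_flops(params: float, tokens: float) -> float:
    """Assumed approximation: about 6 FLOPs per parameter per token."""
    return 6 * params * tokens


ratio = tokens_per_param(PARAMS, TOKENS)
flops = approx_training_flops(PARAMS, TOKENS)

print(f"{ratio:.0f} tokens per parameter")
print(f"{ratio / RULE_OF_THUMB_TOKENS_PER_PARAM:.1f}x the rule-of-thumb ratio")
print(f"{flops:.2e} approximate training FLOPs")
```

Under these assumptions the 70B model sees on the order of ten times more tokens per parameter than the rule of thumb suggests, which is consistent with the chapter's point that the model was still improving late in training.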
  10. 29:43 – 33:55

    1 GW datacenters and the physics of AI: timelines, regulation, and synthetic data as “inference-like” training

    Zuckerberg explains why gigawatt-scale datacenters are a qualitatively new engineering and regulatory challenge—comparable to dedicating a major power plant to model training. He also discusses distributed training uncertainty and the growing role of synthetic data generation, blurring the line between training and inference.

    • Typical datacenters: ~50–150 MW; moving to 300–500 MW or 1 GW is unprecedented
    • Energy buildouts face multi-year regulatory and transmission constraints
    • Distributed training may reduce colocation needs, but it’s an open question
    • Synthetic data generation may become a major component of the training pipeline
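The scale jump described above can be made concrete with simple arithmetic. The sketch below uses the ~50–150 MW and 1 GW figures from this chapter; the function names and the assumption of continuous full-power operation are illustrative only.

```python
# Illustrative arithmetic only; assumes the site draws full power all year.
TYPICAL_SITE_MW = (50, 150)  # typical datacenter range from the chapter summary
TARGET_MW = 1000             # 1 GW expressed in megawatts


def equivalent_sites(target_mw: float, site_mw: float) -> float:
    """How many sites of a given size add up to the target power draw."""
    return target_mw / site_mw


def annual_energy_twh(power_mw: float, utilization: float = 1.0) -> float:
    """Energy used in one 365-day year, converted from MWh to TWh."""
    hours_per_year = 24 * 365
    return power_mw * hours_per_year * utilization / 1e6


for site_mw in TYPICAL_SITE_MW:
    count = equivalent_sites(TARGET_MW, site_mw)
    print(f"1 GW is about {count:.1f} sites of {site_mw} MW")

print(f"1 GW for a year is about {annual_energy_twh(TARGET_MW):.2f} TWh")
```

In other words, a single 1 GW facility concentrates the power of roughly 7 to 20 typical datacenters in one place, which is why the chapter compares it to dedicating a major power plant to training.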
  11. 33:55 – 41:37

    Is AI the most important technology ever? Computing-level shift, but constrained and gradual

    Zooming out, Zuckerberg compares AI’s impact to the invention of computing itself—enabling entirely new apps and experiences. He’s skeptical of an overnight intelligence explosion due to physical constraints and expects society will have time to adapt, even if change is profound over decades.

    • AI as a low-level, general innovation akin to “people didn’t have computers → people have computers”
    • New apps will emerge as with web/mobile, but at a more fundamental layer
    • Physical constraints reduce likelihood of sudden overnight takeoff
    • AI will reshape work and creativity, enabling people to do more of what they want
  12. 41:37 – 1:05:37

    Open source risks vs concentrated power: security analogies, misinformation, and what would change Meta’s stance

    They debate the dangers of open weights (e.g., stripping safety tuning, bio risks) versus the danger of a few closed model gatekeepers. Zuckerberg argues widespread, hardened deployment can improve security—analogous to open source software hardening—and says Meta will evaluate each release, balancing long-term theoretical risks with immediate harms like misinformation and election interference.

    • Open weights can be misused; bad actors may remove guardrails
    • Counter-risk: concentrated AI power in untrusted actors or institutions
    • Security analogy: stronger defensive AI can protect systems against weaker offensive AI
    • Near-term priority: real-world harms today (misinfo, fraud, election interference) while monitoring emerging behaviors
  13. 1:05:37 – 1:14:57

    Would Meta open source a $10B model? Economics, gatekeepers, and why open ecosystems matter

    Dwarkesh probes whether Meta would open source extremely expensive frontier models. Zuckerberg argues open sourcing can pay back via ecosystem optimizations (cost, efficiency), as with Open Compute, and frames open LLMs as a way to avoid an AI future dominated by API gatekeepers analogous to Apple/Google on mobile.

    • Open sourcing can yield efficiency gains worth billions at Meta’s scale
    • Historical precedent: Open Compute standardized supply chains and cut costs
    • Strategic motive: avoid a future where closed-model providers control what developers can build
    • Economic caveat: if the model itself becomes “the product,” open sourcing becomes trickier
  14. 1:14:57 – 1:18:37

    Custom silicon roadmap, licensing arrangements, and closing reflections on focus and leadership

    Zuckerberg discusses Meta’s custom silicon strategy: start with inference for ranking/recommendations to free GPUs for training, then move toward training on custom chips over time. The conversation ends with a light counterfactual about Google+, then a broader note that at large-company scale, focus—not capital—is often the scarcest resource.

    • Custom silicon first targeted inference for Reels/newsfeed/ads to reduce GPU load
    • Long-term plan: eventually train large models on Meta-designed chips, but not by Llama 4
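    • Licensing: the Llama license is largely permissive, with carve-outs for the largest companies (e.g., big cloud providers reselling the models), who are expected to negotiate terms such as revenue share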
    • Leadership constraint: organizational focus is the limiting factor at scale (keep the main thing the main thing)
