No Priors Ep. 95 | Best of 2024
CHAPTERS
- 0:00 – 0:30
2024 highlights reel setup: why these clips matter
Sarah Guo opens the episode by framing it as a year-end compilation of standout moments from No Priors’ 2024 conversations. She tees up the themes—AI infrastructure, model access, agents, multimodality, autonomy, and the road to AGI—before moving into the first featured guest.
- Year-end “best of 2024” format and intent
- Focus on people building and shaping the AI ecosystem
- Quick context-setting before jumping into guest clips
- 0:30 – 1:55
Jensen Huang: the unit of compute becomes the data center
Jensen Huang explains NVIDIA’s evolution from shipping chips to delivering data-center-scale computing systems. He argues that real performance only reveals itself when you build and test the full system—hardware, networking fabric, and software—at scale, not in theoretical peak specs.
- Progression from chip → server → rack → full data center systems
- Why end-to-end system buildout is required to validate software and performance
- Peak vs real-world performance gaps emerge at scale
- “Data center” as the new fundamental unit of computing
- 1:55 – 3:59
Jensen Huang: vertical integration, then disaggregation for every cloud
Jensen details NVIDIA’s approach: build vertically integrated supercomputer-scale systems internally, optimize full-stack, then break them into sellable components. The goal is compatibility across major cloud providers despite different control/security planes—so developers can ‘build once, run everywhere’ on CUDA.
- NVIDIA builds many configurations (air/liquid, x86/Grace, Ethernet/InfiniBand, NVLink, etc.)
- Internal supercomputers used to test and optimize systems
- Optimize end-to-end, then disaggregate into parts for customers
- Cross-cloud compatibility as a strategic requirement for CUDA ubiquity
- 3:59 – 6:11
Andrej Karpathy: exocortex access, ownership, and ‘renting your brain’
Karpathy discusses the implications of AI as an ‘exocortex’—an extension of human cognition—arguing access and control will become central. He contrasts closed-model oligopolies with open alternatives, proposing that users may rely on closed models for quality while keeping open weights as a fallback for sovereignty.
- Closed-model concentration vs open ecosystems (e.g., LLaMA)
- ‘Not your weights, not your brain’ as a control/ownership framing
- Trade-off: control vs using a ‘better brain’ via closed providers
- Fallback strategies when closed APIs fail; importance of open progress
- 6:11 – 7:13
Karpathy: the case for much smaller models via distillation
Elad presses on the smallest useful model size, and Karpathy argues future ‘cognitive core’ models could be surprisingly small—potentially around a billion parameters. He attributes today’s bloat to uncurated data and emphasizes distillation as a powerful way to transfer capability from large teachers to small students, with tools handling retrieval and external knowledge.
- Current models waste capacity memorizing low-value artifacts
- Vision of a small ‘cognitive core’ that thinks and uses tools to look up facts
- Parameter-size speculation: could be ~1B parameters
- Distillation works ‘surprisingly well’ as the key compression mechanism
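For readers unfamiliar with distillation, here is a minimal sketch of the idea, not anything described in the episode. A small student is trained to match the temperature-softened output distribution of a large teacher. The function names, logits, and temperature are hypothetical, and the sketch leaves out the hard-label term and temperature scaling factor used in common formulations.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits over three classes (illustrative values only).
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]   # nearly agrees with the teacher
far_student = [0.5, 1.0, 4.0]     # prefers a different class

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

A student whose outputs track the teacher's gets a loss near zero, so minimizing this quantity pulls the small model toward the large model's behavior.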
- 7:13 – 8:59
Bret Taylor: ‘company agents’ replace websites as the digital front door
Bret Taylor outlines a near-term opportunity: branded, customer-facing AI agents that handle the full range of a company’s web interactions. He argues this is ‘shovel-ready’ because processes and systems-of-record are well-defined; the key is designing a conversational brand experience rather than chasing AGI-level autonomy.
- Shift from websites/apps to conversational brand experiences
- Company agents as practical, deployable automation now
- Grounding in known workflows and enterprise systems-of-record
- Different meanings of ‘agent’ in enterprise vs AGI research
- 8:59 – 11:17
Building real agents: concrete deployments and expanding scope
Bret describes what his company, Sierra, builds and where the engineering effort goes, citing real deployments such as Sonos support and SiriusXM’s subscription agent. He predicts agents will evolve from customer service entry points into comprehensive interfaces for everything a business does, much as early websites look ‘quaint’ in hindsight.
- Examples: Sonos onboarding/debug support; SiriusXM ‘Harmony’ agent for account changes
- Branded experience as a core product requirement
- Agents often start in customer service but expand to full business capability
- Analogy to early web evolution and changing expectations over time
- 11:17 – 13:24
OpenAI Sora team: video models as world models on the path to intelligence
The Sora discussion emphasizes that training on video can yield emergent understanding of the physical and social world (e.g., implicit 3D). They argue visual grounding matters because much human learning is visual, and that better world modeling could improve broad intelligence beyond just generating realistic videos.
- Sora learns properties like 3D structure without explicit 3D supervision
- Visual data teaches causal/temporal dynamics (events affecting future events)
- World modeling as a core component of human-like intelligence
- Video generation framed as a stepping stone to broader capability
- 13:24 – 14:44
Prediction at scale: humans are low-fidelity, models can surpass us
Sarah probes whether models need physics-engine-level fidelity, and the guests argue humans are often approximate world predictors. They’re optimistic models can eventually outperform humans in narrow predictive accuracy (e.g., trajectories), while still acknowledging that high-fidelity prediction isn’t required for all intelligence.
- Humans’ world models are approximate; long-horizon precision is limited
- AI systems may surpass humans in specific predictive tasks
- High-fidelity prediction is beneficial but not strictly necessary for intelligence
- Trajectory prediction (e.g., throwing) as an intuitive benchmark
- 14:44 – 15:52
Karpathy: the ‘bitter lesson’—predict data, scale compute, gain capability
Karpathy connects Sora to scaling-law thinking: the most scalable path is the ‘simple’ objective of predicting data rather than hand-designed tasks. He frames video prediction as analogous to text prediction, reinforcing a unified philosophy for how capability emerges with more compute and data.
- Scaling laws favor simple objectives that improve with compute
- ‘Predict data’ as the recipe behind both LLMs and Sora-style models
- Skepticism toward overly complex, bespoke optimization targets
- Compute scaling as a driver of steadily improving predictions
- 15:52 – 19:00
Waymo’s Dmitri Dolgov: the autonomy gap is the ‘long tail of nines’
Dolgov explains that getting a prototype to drive for many miles is increasingly easy with modern transformers and VLMs, but removing the driver requires extreme reliability. The real difficulty is handling the rare edge cases—pushing from impressive demos to a system that’s demonstrably safer than humans over millions of miles.
- Difference between driver assistance and full autonomy is reliability (‘number of nines’)
- Early milestones were achievable with small teams; the long tail is the hard part
- Modern off-the-shelf models can get you surprisingly far quickly
- Safety validation at scale is the barrier to true driverless operation
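The ‘number of nines’ framing can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that reliability is measured per mile and that each mile is independent. The function names and mileage figure are hypothetical and are not Waymo’s numbers.

```python
def miles_between_failures(nines):
    """Average miles per failure when per-mile reliability has `nines` nines.

    For example, 3 nines means 99.9% of miles are failure-free,
    i.e. one failure per 10**3 miles on average.
    """
    return 10 ** nines

def expected_failures(nines, miles):
    """Expected number of failures over `miles` at the given reliability."""
    return miles / miles_between_failures(nines)

# Illustrative fleet mileage, not a real figure.
fleet_miles = 1_000_000
for nines in (3, 5, 7):
    print(nines, "nines:", expected_failures(nines, fleet_miles), "expected failures")
```

Each additional nine cuts expected failures tenfold, which is why a demo that drives well for hundreds of miles is still far from a system that can run without a driver over millions.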
- 19:00 – 20:37
Figma’s Dylan Field: AI won’t kill UI—new paradigms will add more software
Dylan argues that even if agents and chat interfaces expand, prior interaction modes won’t disappear; instead, interfaces diversify like media formats do. He’s skeptical that chat becomes the universal UI and predicts we’ll end up with more UI and more software as new interaction patterns mature.
- Agents drive interface innovation, but UI remains essential
- New media types tend to coexist rather than replace old ones
- Skepticism that chat is the dominant interface for all tasks
- Expectation of increased software/UI surface area overall
- 20:37 – 23:42
Multimodality and input: voice limits, ‘Minority Report’ fatigue, and cameras
Elad and Dylan discuss when voice, text, and other modalities make sense, noting voice isn’t desirable for navigating complex information all day. Sarah highlights ‘intelligent cameras’ as a powerful input mode because people struggle to describe visuals; capturing images/video could become a mainstream way to communicate intent to AI systems.
- Voice UI is valuable in some contexts but not universally preferred
- Gesture/AR-style interfaces may be compelling but not for constant use
- Users naturally over-extrapolate new interaction modes; exploration is healthy
- Intelligent cameras as a high-leverage, accessible input method
- 23:42 – 26:29
Scale AI’s Alexandr Wang: AGI as decades of capability-by-capability progress
Alexandr Wang frames AGI progress as more like curing cancer than inventing a vaccine: many hard subproblems with limited transfer between them. He argues limited generalization—especially across modalities—means the field must build separate data flywheels and better evaluations for niche capabilities, leading to steady progress over time.
- AGI path is ‘plodding’: many small problems, limited leverage between them
- Implication: society has time to adapt due to gradual progress
- Claim of limited cross-modality transfer; separate data flywheels required
- Need for more data and stronger evals to drive real capability gains
- 26:29 – 27:07
Wrap-up: where to find full episodes and what’s next
Sarah closes by thanking listeners and pointing to links for the full conversations featured in the compilation. She invites suggestions for future guests and questions, and shares where to follow and subscribe for weekly episodes and transcripts.
- Pointers to full episodes in the description
- Call for listener input on guests and questions for next year
- Subscription and follow options across platforms
- Transcripts and email signup at no-priors.com