No Priors Ep. 1 | With Noam Brown, Research Scientist at Meta
At a glance
WHAT IT’S REALLY ABOUT
From Poker Bots To Diplomacy: Noam Brown Redefines AI Reasoning Benchmarks
- Noam Brown recounts his path from finance to AI research, focusing on game-theoretic agents that master imperfect‑information and negotiation-heavy games like poker and Diplomacy.
- He explains why AlphaGo and large language models changed his expectations for AI progress, but argues that raw scale and next-token prediction are hitting limits without better reasoning and planning.
- Brown details how Cicero, Meta’s Diplomacy bot, combined language models, human game data, and self-play to cooperate and negotiate with humans without being detected as an AI across 40 games.
- Looking ahead, he sees the key research frontier as general-purpose reasoning and inference-time planning, with implications for theorem proving, complex code generation, negotiation agents, and real-world cooperative AI systems.
IDEAS WORTH REMEMBERING
5 ideas
Reasoning and planning are now more critical than just scaling models.
Brown argues that simply making neural networks larger has diminishing returns; massive gains (e.g., in poker and Go) came from adding search/planning at inference, not 100,000x more training compute.
Cooperation with humans is a frontier benchmark, not just competition.
Cicero’s success in Diplomacy shows that AI must model human norms, mistakes, and communication patterns to collaborate effectively, which is closer to real-world deployment than beating humans in zero-sum games.
Human data alone is insufficient for expert-level strategic play.
Supervised learning on human game logs plateaued below expert performance; combining human behavior models with self-play and planning was necessary to surpass strong humans in poker and Diplomacy.
The Turing test is no longer a useful bar for intelligence.
With language models that often avoid detection as bots in rich dialogue settings, passing a Turing-like test no longer correlates well with general intelligence or robust reasoning abilities.
Sample efficiency remains a key human advantage over current AI.
Humans can become strong at tasks like chess, Diplomacy, or market reasoning with far fewer examples than current systems, which matters in domains where data is scarce or environments are non-stationary.
WORDS WORTH SAVING
5 quotes
All research is high risk, high reward, or at least it should be.
— Noam Brown
The Turing test is no longer really a useful measure the way it was intended to be.
— Noam Brown
Everything I had done in my PhD up until that point was just a footnote compared to adding search and scaling search.
— Noam Brown
If you want truly general artificial general intelligence then this [reasoning] needs to be addressed.
— Noam Brown
It doesn’t seem crazy to me that you could have a model that can prove the Riemann hypothesis within the next five years if you can solve the reasoning problem in a truly general way.
— Noam Brown
High-quality AI-generated summary created from a speaker-labeled transcript.