No Priors Ep. 1 | With Noam Brown, Research Scientist at Meta
CHAPTERS
- 0:00 – 0:50
Game-theoretic AI beyond prompts: why Noam’s work stands out
The hosts frame Noam Brown’s research as distinct from mainstream prompt-based AI, emphasizing game-theoretic agents interacting with humans. They set up the central arc of the episode: how games like poker and Diplomacy became serious AI milestones and what that implies for real-world agents.
- •Contrast between generative/prompt-driven AI and strategic interactive agents
- •Noam’s reputation for pushing practical game-theory-driven AI forward
- •Core question introduced: what drew him to poker and Diplomacy research
- 0:50 – 2:54
From finance to algorithmic game theory: the personal path into poker AI
Noam explains his non-traditional route from algorithmic trading and the Federal Reserve into computer science research. Interest in game theory and a desire to “build” faster than economics policy work led him to grad school—and directly into poker AI.
- •Finance and trading felt like a game but not fulfilling
- •Interest in game theory emerged from thinking about market structure and pro-social incentives
- •CS offered faster iteration than economics/policy
- •Poker AI was a perfect intersection of interests and an ambitious but tractable benchmark
- 2:54 – 4:16
2012 perspective: AI felt ‘dead,’ not an AGI race
Noam describes how different the AI landscape looked when he began his PhD in 2012. Rather than pursuing AGI as a fixed goal, he focused on incremental research—until rapid progress made the field’s trajectory unmistakable.
- •AGI was largely science fiction in 2012
- •Many viewed AI as a stagnant field; career risk felt real
- •Shift happened a few years into grad school as breakthroughs accumulated
- •Motivation: learn valuable principles via hard strategic domains
- 4:16 – 6:48
AlexNet to AlphaGo: the wake-up call and why Go mattered
The conversation turns to key moments that changed perceptions, especially AlphaGo. Noam explains why Go resisted handcrafted evaluation and why AlphaGo’s pattern recognition and learned evaluation signaled a new era beyond brute-force search and manual heuristics.
- •AlexNet overturned classic handcrafted CV pipelines
- •Deep Blue showed what scaled-up search could do in chess; Go required a different approach
- •Go’s evaluation is subtle; humans can judge advantage without explicit rules
- •AlphaGo demonstrated learned pattern-matching at superhuman levels
- •Cultural impact and the AlphaGo documentary recommendation
- 6:48 – 8:30
Choosing Diplomacy after poker: picking a benchmark that required new ideas
After poker success, Noam explains why the team avoided ‘easier’ social games and chose Diplomacy as the hardest target. The key bet: an AI that negotiates in natural language and strategizes with humans still felt like science fiction in 2019.
- •2019 backdrop: GPT-2, StarCraft II, and Dota 2 breakthroughs
- •Settlers of Catan / Werewolf / Hanabi felt solvable with existing scaling
- •Goal: a domain demanding fundamentally new techniques
- •Diplomacy as a high-risk, high-reward research choice
- 8:30 – 10:54
Cicero in the wild: not getting detected, and what that revealed about humans
Noam shares the most surprising outcome from human games: Cicero was not detected as a bot across ~40 games. He explains why secrecy was crucial to avoid turning evaluation into a Turing-test meta-game and reflects on why odd messages are often attributed to human quirks rather than automation.
- •Need for blinded evaluation: humans behave differently when they suspect a bot
- •Risk of community leakage and contamination of results
- •Cicero completed the full run of games without bot detection
- •Progress in language models made humanlike negotiation plausible
- •Humans often rationalize weirdness as distraction, not ‘it’s a bot’
- 10:54 – 11:53
What Cicero actually said: empathy, trust, and the feel of human negotiation
Prompted for an example, Noam recounts a negotiation message that felt startlingly human—empathetic reassurance about border units. The exchange highlights why language + strategy together create a qualitatively different AI capability than pure gameplay.
- •Example: empathetic reassurance and explicit promise not to attack
- •Why humanlike tone and trust-building mattered in Diplomacy
- •Emotional/relational messaging surprised the researchers
- •Demonstrates power (and potential unease) of persuasive dialogue agents
- 11:53 – 15:10
Why the Turing Test is fading—and what’s missing instead: reasoning and planning
Noam argues the Turing Test is no longer a useful metric because near-human conversation doesn’t imply general intelligence. He points to reasoning and planning as key missing capabilities, contrasting next-word prediction with systems (like Cicero) that condition dialogue on an explicit plan.
- •Turing Test ‘dead’ as a meaningful bar for intelligence
- •Conversational fluency ≠ general intelligence
- •Reasoning is a major weakness of current systems
- •Cicero differs by conditioning generation on a plan
- •Chain-of-thought helps but doesn’t fully solve reasoning
- 15:10 – 15:53
Data, self-play, and ‘human compatibility’: how Cicero learned Diplomacy
The discussion dives into training data constraints and the need for self-play. Noam explains the limited but crucial dataset (webdiplomacy.net), why supervised learning alone can’t produce strong strategy, and how human norms and conventions must be modeled for cooperation to work.
- •Dataset: ~50k games, ~13M messages from webdiplomacy.net
- •Pretrain on broad internet data, then fine-tune on Diplomacy dialogue
- •Supervised learning alone yields weak play; most humans in data are not experts
- •Self-play needed—but must stay compatible with human behavior and norms
- •Without human grounding, agents may invent non-English ‘robot languages’ or non-human conventions
- 15:53 – 24:43
The real bottleneck: scaling costs and the case for more inference-time compute
Noam argues data may not be the core constraint; instead, training cost and scaling limits will bite within a few orders of magnitude. He proposes a different axis for progress: spending more compute at inference time so models can ‘think,’ improving reasoning without infeasible training-scale jumps.
- •Training costs may rise from ~$50M to $500M and beyond, but not indefinitely
- •Hard ceiling on feasible scaling; efficiency gains only go so far
- •Inference-time compute is underused relative to training-time compute
- •Longer deliberation (seconds to hours) could yield better answers
- •Connects to planning/search as a missing ingredient in today’s models
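To make the inference-time compute idea concrete, here is a toy sketch. It is not Noam's method or any production system. It uses a simple stone-taking game (single pile, take 1 to 3 stones, whoever takes the last stone wins), chosen purely for illustration. The function names `negamax` and `best_move` and the `depth` budget are assumptions of this sketch. The same fixed "model" (an evaluator that returns 0 when it runs out of budget) picks a losing move with a shallow search and the winning move once it is allowed to think a few plies deeper.

```python
def negamax(pile, depth):
    """Value for the player to move: +1 win, -1 loss, 0 unknown (budget ran out)."""
    if pile == 0:
        return -1  # the previous player took the last stone, so the mover has lost
    if depth == 0:
        return 0   # out of thinking budget: no opinion about this position
    return max(-negamax(pile - take, depth - 1)
               for take in range(1, min(3, pile) + 1))


def best_move(pile, depth):
    """Pick how many stones to take, searching `depth` plies ahead."""
    moves = range(1, min(3, pile) + 1)
    # max() keeps the first maximal element, so ties go to the smallest take.
    return max(moves, key=lambda take: -negamax(pile - take, depth - 1))


shallow = best_move(7, depth=1)  # cannot see any outcome, defaults to taking 1
deep = best_move(7, depth=3)     # sees that leaving 4 stones wins by force
print(shallow, deep)             # prints: 1 3
```

Nothing about the evaluator changes between the two calls; only the amount of computation spent at decision time does.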
- 24:43 – 27:32
What’s next for ‘games as AI benchmarks’: generality over single-game dominance
Noam reviews why games have historically been great benchmarks—objective measurement and pre-existing baselines—but argues individual recreational games are nearing diminishing returns. The next frontier is general systems that perform across many games and non-game tasks, avoiding overfitting to a single environment.
- •Games provide objective comparison to top human performance
- •Benchmarks should precede techniques to reduce ‘benchmark shopping’
- •Diplomacy may be among the hardest single games remaining
- •Future focus: generality across games plus tasks like image generation and Q&A
- •Use games as part of a broader benchmark suite, not the whole target
- 27:32 – 49:17
Human advantage and real-world agents: physical world, sample efficiency, and negotiation apps
The conversation shifts to where humans may remain strongest and what practical AI agents might do soon. Noam emphasizes robotics lagging behind, and highlights sample efficiency as a core human advantage; he also outlines near-term negotiation applications from pricing to (eventually) more nuanced contracts.
- •Physical-world manipulation/robotics remains behind software capabilities
- •Hard to name permanent human-dominant domains; creativity like novels may be temporary
- •Human edge: sample efficiency and adaptation with little data
- •Near-term: constrained negotiation agents (price, salary) seem feasible
- •Harder: complex contract negotiation with heavy nuance and world knowledge
- 49:17 – 59:09
Poker breakthroughs: hidden information, search at inference time, and demystifying optimal play
Noam gives a detailed account of his poker AI work: why poker mattered after AlphaGo, how early systems relied on abstraction (bucketing hands), and how adding real-time search during play changed everything. He describes the leap from losing to pros to crushing them, and how poker training now mirrors chess engine analysis.
- •Poker as the classic game-theory application (it appears in Nash’s original work on equilibria)
- •Key challenge vs chess/go: hidden information and beliefs about others’ knowledge
- •Early approach: k-means bucketing/abstraction to make computation tractable
- •Major leap: inference-time search/planning, roughly equivalent to scaling up the model ~100,000×
- •2017 match: decisive win vs top pros; later six-player success was cheap to train (~$150)
- •Pro poker now uses solver/bot analysis; new strategies like over-bets entered mainstream
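The bucketing idea mentioned above can be sketched as follows. This is a minimal illustration, assuming each hand is summarized by a single strength score between 0 and 1. The scores below are made up, and real poker abstractions use richer features than this. The helper name `kmeans_1d` and its quantile-style initialization are choices of this sketch, not details from the episode. Hands that land in the same bucket would then share one strategy, which shrinks the game to a tractable size.

```python
def kmeans_1d(values, k, iterations=20):
    """Plain Lloyd's k-means on scalars; returns (labels, centers)."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: spread the initial centers across the sorted values.
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]

    def assign(v):
        # Index of the nearest center.
        return min(range(k), key=lambda c: abs(v - centers[c]))

    for _ in range(iterations):
        labels = [assign(v) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # keep the old center if a bucket is empty
                centers[c] = sum(members) / len(members)
    return [assign(v) for v in values], centers


# Hypothetical strength scores for eight hands (illustrative numbers only).
strengths = [0.05, 0.10, 0.12, 0.48, 0.50, 0.55, 0.90, 0.95]
buckets, centers = kmeans_1d(strengths, k=3)
print(buckets)  # prints: [0, 0, 0, 1, 1, 1, 2, 2]
```

The trade-off is resolution: fewer buckets make the equilibrium computation cheaper, but hands that should be played differently get treated as identical.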
- 59:09 – 1:00:36
Nash equilibrium in practice: ‘you won’t lose’ and how mistakes become profit
In closing, Noam clarifies how Nash equilibrium relates to optimal poker play and why it doesn’t mean guaranteed winning every hand. The key practical point: equilibrium play is unexploitable, and opponents’ deviations accumulate into expected profit over time.
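- •In two-player zero-sum poker, a Nash equilibrium strategy guarantees non-negative expected value over time (it is unexploitable); it does not guarantee winning every hand
- •If both players play the equilibrium, each breaks even in expectation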
- •Real opponents make small mistakes; equilibrium play converts mistakes into profit
- •Modern poker wisdom: start from equilibrium, then optionally exploit deviations safely
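The equilibrium points above can be checked on a toy game. The sketch below is not poker. It is rock-paper-scissors with an extra, strictly bad action called "blunder" that loses to everything else, a game invented here for illustration. The payoff matrix and the names `PAYOFF` and `expected_value` are assumptions of this sketch. In this game the equilibrium strategy is to play rock, paper, and scissors one third of the time each. It breaks even against itself, never has negative expected value, and profits exactly as often as the opponent blunders.

```python
from fractions import Fraction

ACTIONS = ["rock", "paper", "scissors", "blunder"]
# PAYOFF[i][j]: our payoff when we play ACTIONS[i] and they play ACTIONS[j].
PAYOFF = [
    [0, -1, 1, 1],
    [1, 0, -1, 1],
    [-1, 1, 0, 1],
    [-1, -1, -1, 0],
]


def expected_value(ours, theirs):
    """Our expected payoff when both sides play mixed strategies."""
    return sum(p * q * PAYOFF[i][j]
               for i, p in enumerate(ours)
               for j, q in enumerate(theirs))


third = Fraction(1, 3)
equilibrium = [third, third, third, Fraction(0)]
# A hypothetical opponent who plays the bad action 10% of the time.
sloppy = [Fraction(3, 10), Fraction(3, 10), Fraction(3, 10), Fraction(1, 10)]
always_rock = [Fraction(1), Fraction(0), Fraction(0), Fraction(0)]
always_paper = [Fraction(0), Fraction(1), Fraction(0), Fraction(0)]

print(expected_value(equilibrium, equilibrium))    # prints: 0
print(expected_value(equilibrium, sloppy))         # prints: 1/10
print(expected_value(always_rock, always_paper))   # prints: -1
```

The last line shows the contrast: a non-equilibrium strategy such as always playing rock can be exploited, while the equilibrium strategy cannot do worse than break even in expectation.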