Lex Fridman Podcast

Noam Brown: AI vs Humans in Poker and Games of Strategic Negotiation | Lex Fridman Podcast #344

Lex Fridman and Noam Brown on how AI masters poker and Diplomacy, redefining strategy, trust, and negotiation.

Noam Brown (guest) · Lex Fridman (host)
Dec 6, 2022 · 2h 29m
- Nash equilibrium, game theory, and imperfect-information games
- Design and evolution of poker AIs: Libratus (heads‑up) and Pluribus (six‑player)
- Search vs. neural networks in games like chess, Go, and poker
- Cicero: a Diplomacy AI combining language models with RL and human data
- Trust, deception, and human‑compatible behavior in multi‑agent systems
- Human‑like AI opponents, training tools, and cheat detection challenges
- Potential real‑world implications for negotiation, diplomacy, and AI ethics

In this episode of the Lex Fridman Podcast (#344), Noam Brown discusses his work building superhuman AI systems for complex strategic games: heads‑up and six‑player no‑limit Texas Hold’em (Libratus, Pluribus) and the negotiation-heavy board game Diplomacy (Cicero).

At a glance

WHAT IT’S REALLY ABOUT

AI Masters Poker and Diplomacy, Redefining Strategy, Trust, and Negotiation

  1. Noam Brown discusses his work building superhuman AI systems for complex strategic games: heads‑up and six‑player no‑limit Texas Hold’em (Libratus, Pluribus) and the negotiation-heavy board game Diplomacy (Cicero).
  2. He explains core ideas like Nash equilibrium, self‑play, counterfactual regret minimization, and the critical role of search, arguing that poker’s imperfect information makes it even more challenging than games like chess or Go.
  3. In Diplomacy, Brown’s team combines large language models with reinforcement learning and human game data to create an AI that can negotiate, form alliances, and build trust with humans in natural language at roughly top‑human level.
  4. They explore how such systems illuminate human irrationality, trust, deception, and the limits of self‑play, and how these ideas may transfer to future NPCs, training tools, and even real‑world negotiation and decision support.

IDEAS WORTH REMEMBERING

7 ideas

Game‑theoretic ‘balanced’ play can outperform human psychological exploitation.

Libratus crushed elite heads‑up poker pros by approximating a Nash equilibrium strategy that didn’t adapt to specific opponents or do ‘mind games’, undermining the belief that reading people always beats theory.
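The equilibrium-finding idea can be illustrated with regret matching, a building block of the counterfactual regret minimization family that Brown's systems are based on. The sketch below is a toy, not Libratus's actual code: two players repeatedly play rock‑paper‑scissors, each shifting probability toward actions it regrets not having played, and their time-averaged strategies drift toward the uniform Nash equilibrium. The payoff table, starting regrets, and iteration count are illustrative assumptions.

```python
# Toy regret matching in rock-paper-scissors: a sketch of the self-play
# equilibrium-finding idea behind CFR, not Libratus's implementation.

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors
# PAYOFF[a][b]: payoff to a player choosing a against an opponent choosing b
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def strategy(regrets):
    """Regret matching: play each action proportionally to its positive regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1 / ACTIONS] * ACTIONS

def train(iterations=50_000):
    # Start from slightly asymmetric regrets so the dynamics are non-trivial.
    regrets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    strategy_sum = [[0.0] * ACTIONS, [0.0] * ACTIONS]
    for _ in range(iterations):
        strats = [strategy(r) for r in regrets]
        for p in (0, 1):
            opp = strats[1 - p]
            # Expected value of each action against the opponent's current mix.
            values = [sum(opp[b] * PAYOFF[a][b] for b in range(ACTIONS))
                      for a in range(ACTIONS)]
            expected = sum(strats[p][a] * values[a] for a in range(ACTIONS))
            for a in range(ACTIONS):
                regrets[p][a] += values[a] - expected
                strategy_sum[p][a] += strats[p][a]
    # The time-averaged strategies approach the Nash equilibrium (1/3, 1/3, 1/3).
    return [[s / iterations for s in row] for row in strategy_sum]
```

The per-iteration strategies can cycle wildly; it is the average over time that converges, which is why CFR-style methods return the average strategy rather than the final one.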

Search is at least as important as raw neural network strength.

Across chess, Go, and poker, planning ahead via search dramatically boosts performance; removing Monte Carlo tree search from Go AIs drops them from far‑superhuman to roughly human‑grandmaster strength.

Imperfect‑information games require optimizing action probabilities, not just actions.

In poker (and rock‑paper‑scissors), the value of a move depends on how often you do it; balancing bluffing and value bets so you are unpredictable is central, and Libratus explicitly optimizes these frequencies.
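The frequency idea can be made concrete with a standard textbook indifference calculation (a simplified river model, not anything specific to Libratus): if a player bets B into a pot of P, the bet should be a bluff just often enough that the opponent's call is break-even, and the opponent should call just often enough that a pure bluff is break-even.

```python
# Textbook river indifference frequencies (a simplified model, not
# Libratus's actual computation). A bet of size B into a pot of P:
#   - The caller risks B to win P + B, so calling breaks even when the
#     bettor bluffs with probability b = B / (P + 2B).
#   - The bluffer risks B to win P, so bluffing breaks even when the
#     opponent calls with probability c = P / (P + B).

def bluff_frequency(pot, bet):
    """Bluff share of the betting range that makes the opponent's call break-even."""
    return bet / (pot + 2 * bet)

def call_frequency(pot, bet):
    """Calling frequency that makes a pure bluff break-even."""
    return pot / (pot + bet)

# With a pot-sized bet (B = P), one bet in three should be a bluff,
# and the opponent should call half the time.
print(bluff_frequency(100, 100))  # 0.3333333333333333
print(call_frequency(100, 100))   # 0.5
```

Note how the value of bluffing depends entirely on how often it is mixed with value bets: the same bet becomes profitable or losing purely as a function of its frequency.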

Six‑player poker shows equilibrium‑style methods can generalize beyond two‑player zero‑sum.

Although theory gives no guarantees, Pluribus uses depth‑limited search and equilibrium‑inspired self‑play to achieve superhuman performance in six‑player games, where cooperation and more complex dynamics appear.

Self‑play alone fails in social, cooperative settings; you must learn from humans.

In Diplomacy, a self‑play‑only bot develops an alien ‘robot language’ and inhuman conventions and is quickly ostracized and crushed by humans; Cicero instead anchors its policies and language to large human datasets.

Controlling language models with explicit ‘intents’ makes dialogue strategic, not just imitative.

Cicero separates deciding what actions it wants (for itself and others) from generating messages, conditioning a language model on those intents and filtering out messages that would backfire or reveal too much.

Trust and minimal lying are crucial even in a game famous for backstabbing.

The team found that frequent lying in Diplomacy reduces long‑run performance because humans stop cooperating; Cicero is explicitly regularized to be honest or at least not obviously deceptive most of the time.

WORDS WORTH SAVING

5 quotes

In any finite two‑player zero‑sum game, there is an optimal strategy that, if you play it, you are guaranteed to not lose in expectation, no matter what your opponent does.

Noam Brown

One of the key strategies in poker is to put the other person into an uncomfortable position, and if you’re doing that, then you’re playing poker well.

Noam Brown

We played our bot against four top heads‑up no‑limit hold’em poker players, and the bot wasn’t trying to adapt to them… it was just trying to approximate the Nash equilibrium, and it crushed them.

Noam Brown

Diplomacy is a game about trust and being able to build trust in an environment that encourages people to not trust anyone.

Noam Brown

War is an inherently negative‑sum game. There’s always a better outcome than war for all the parties involved.

Noam Brown

QUESTIONS ANSWERED IN THIS EPISODE

5 questions

If Nash‑equilibrium–style play dominates in poker, where do human psychological skills still meaningfully matter, if at all?

How might the techniques behind Cicero transfer to real‑world negotiations or diplomacy without amplifying manipulation or deception?

What new kinds of video games or NPC interactions become possible once language models can negotiate, gossip, and build long‑term trust coherently?

How should we balance developing human‑like strategic AIs with the risks they pose for cheating, influence operations, or ‘deep’ persuasion?

Could the need for human‑compatible behavior in multi‑agent systems force us to rethink how we define and measure ‘intelligence’ in AI?
