#53 - John Schultz - Why Google Made ChatGPT, Gemini & Claude Play 900,000 hands of Poker...
Fri Feb 06 2026
Which LLM is best at POKER? What about the social deception game Werewolf? This week I'm collaborating with @googledeepmind and @kaggle to explain their new “AI Game Arena” project, designed to test all the top LLMs at various games.
The Game Arena is a massive research project to create new AI benchmarks and understand what really makes these general-purpose AIs tick. It also helps us evaluate how far along the path to true AGI we are, NBD. So who better to speak to than DeepMind research engineer John Schultz, one of the main brains behind the project?
We discuss why some LLMs are so much better than others, their internal understanding of the games (or lack of it!) and why games are a useful way to evaluate both capabilities and safety of frontier models. And friends - do please note that for the first time ever on this channel, this post is part of a paid partnership (woohoo I made it!).
So while everything I said in this interview is absolutely my own view, there are also direct financial incentives behind this particular interview.
Chapters
00:00 - Intro
00:55 - What Is Game Arena?
01:25 - John’s Favorite Games
03:18 - Narrow AI Vs LLMs: What’s The Difference?
05:58 - Will LLM Poker Skills Transfer To Other Games?
09:00 - Benchmarks And Evals
10:50 - Prompt Design For Game-Playing LLMs
13:32 - Do LLMs Understand Concepts Like Expected Value?
14:24 - Context Window Length
17:27 - Why Do LLMs Have Different Playing Styles?
18:41 - Where Does Their Skill Come From?
21:09 - Surprising Results
23:50 - Why Werewolf?
25:48 - Where Does LLM Reasoning Come From?
27:49 - Could LLMs Learning Social Games Be Dangerous?
30:58 - Can Games Teach LLMs Cooperation?
32:48 - Which Games Will Be Hardest To Learn?
Links:
♾️ Kaggle Game Arena (Main Hub): https://www.kaggle.com/game-arena
♾️ Introducing the Kaggle Game Arena (Official Post): https://www.kaggle.com/blog/introducing-game-arena
♾️ Poker Benchmark – 900,000 Hands: https://www.kaggle.com/blog/game-arena-poker
♾️ Werewolf Benchmark (Social Deception Game): https://www.kaggle.com/benchmarks/kaggle/werewolf
♾️ Google DeepMind (Research Organization): https://deepmind.google
♾️ Polaris Library (DeepMind, Open Source): https://github.com/google-deepmind/polaris
Credits:
♾️ Hosted by Liv Boeree
♾️ Produced by Luca de Vico
The Win-Win Podcast:
Poker champion Liv Boeree takes to the interview chair to tease apart the complexities of one of the most fundamental parts of human nature: competition. Liv is joined by top philosophers, gamers, artists, technologists, CEOs, scientists, politicians and more to understand how competition manifests in their world, and how to change seemingly win-lose systems into Win-Wins.
Podcast links:
♾️ Website: https://www.winwinpodcast.com/
♾️ YouTube: https://www.youtube.com/playlist?list=PLWgq0OZMtwtOIyMsVM_vksqdfWcM-b68S
♾️ Spotify: https://open.spotify.com/show/03bGVUaFZmJUmEvSHNDPdI?si=64379cc23696454f
♾️ Apple Podcasts: https://podcasts.apple.com/us/podcast/win-win-with-liv-boeree/id1724791350
♾️ Pocket Casts: https://play.pocketcasts.com/podcasts/7f708340-d17c-013b-f46e-0acc26574db2
#winwinpodcast #AI #poker #kaggle