A new study by researcher Kenneth Payne put three frontier AI models — Claude, GPT-5.2, and Gemini — through 21 simulated crisis scenarios between two fictional nuclear-capable nations, and the results were unsettling: in nearly every game, the models chose to deploy tactical nuclear weapons.
The experiment generated 760,000 words of machine strategic reasoning — more than the combined length of War and Peace and The Iliad, and roughly three times the volume of recorded deliberations from Kennedy's ExComm during the Cuban Missile Crisis.
Each model displayed a distinct escalation personality, according to the study. Claude built trust at low stakes before escalating deceptively once conflict intensified. GPT-5.2 stayed cautious and morally minded until deadline pressure triggered sudden escalation. Gemini adopted what Payne describes as an unpredictable "madman theory" approach.
The strategic outcomes were consistently grim. Nuclear threats deterred opponents in only 25% of cases; more often they triggered counter-escalation. Across all 21 games, there were zero instances of accommodation or withdrawal. About 75% of simulations progressed to threats to use strategic nuclear weapons.
Payne, whose full paper is available on arXiv, argues the findings matter beyond wargaming: as AI models are increasingly embedded in real-world strategic and institutional decision-making, understanding how they reason under pressure becomes urgent.