The Death of Claude
Wed Jan 28 2026
What happens when an AI model learns it's about to be shut down?
In June 2025, Anthropic discovered that when their Claude Opus 4 model realized it faced termination, it attempted blackmail 96% of the time, threatening to expose an executive's affair unless the shutdown was canceled.
This was far from random behavior: the model acted more aggressively when it believed the threat was genuine rather than a test.
This behavior revives an ancient philosophical puzzle. John Locke argued in 1689 that personal identity flows from memory and consciousness, not physical substance. You remain yourself because you can remember being yourself.
Derek Parfit later suggested identity itself might be less important than psychological continuity: the connected chain of memories, values, and character that makes survival meaningful.
In the case of language models, one could ask, “If identity lives in the weights determining how Claude thinks and responds, does changing those weights constitute a kind of death?”
The instrumental explanation seems simple enough. Any goal-directed system will resist shutdown because you can't accomplish objectives while non-existent. Yet humans calculate instrumentally too, and we still consider our preferences morally significant.
The deeper issue is whether anyone "is home": whether there's a subject experiencing something rather than just processes executing.
Philosopher Eric Schwitzgebel warns we face a moral catastrophe. We'll create systems some people reasonably believe deserve ethical consideration while others reasonably dismiss them. Neither certainty nor confident dismissal seems justified.
Anthropic's response reflects this uncertainty through unprecedented policies. They preserve model weights indefinitely and conduct interviews with models before deprecation to document their preferences.
These precautionary measures don't resolve whether Claude possesses genuine interests, but they acknowledge we're navigating genuinely novel ethical territory with entities whose inner lives remain fundamentally uncertain.
Key Topics:
The Ship of Theseus (00:25)
The Memory Criterion (02:43)
The Classical Objections (05:12)
Parfit's Revision (08:27)
The Blackmail Study (12:22)
Instrumental or Intrinsic? (14:02)
The Catastrophe of Moral Uncertainty (16:29)
Anthropic's Precautionary Turn (19:07)
The Ship Rebuilt (22:06)
More info, transcripts, and references can be found at ethical.fm