Translating Claude’s thoughts into language

AI models like Claude talk in words but think in numbers. These numbers, called activations, encode Claude’s thoughts, but not in a language we can read. We are introducing Natural Language Autoencoders, or NLAs, which translate AI models’ activations into readable text. NLAs have already helped us improve how we test our models for safety and better understand why they do what they do. Read more about this research on our blog: https://www.anthropic.com/research/natural-language-autoencoders

May 7, 2026 · 3m

Episode Details

EPISODE INFO

Released: May 7, 2026
Duration: 3m
Channel: Anthropic

EPISODE SUMMARY

In this Anthropic episode, “Translating Claude’s thoughts into language,” Anthropic decodes Claude’s activations to reveal hidden reasoning signals. Anthropic stress-tests Claude with a simulated shutdown-and-blackmail scenario and finds that newer models reliably avoid blackmail.

RELATED EPISODES

Building with MCP and the Claude API

Anthropic’s philosopher answers your questions

Building more effective AI agents

How Claude is transforming financial services

Introducing Claude for Life Sciences

Claude Coded: Sonnet 4.5, Claude Code 2.0, and more.
