Claude

Why do AI models hallucinate?

Learn what AI researchers mean when they talk about hallucination in AI models, why it may occur, and tactics you can use to spot it in your conversations. Learn more: anthropic.com/ai-fluency

Apr 14, 2026 · 5m · Watch on YouTube ↗

At a glance

WHAT IT’S REALLY ABOUT

Why AI assistants hallucinate and how you can catch them

  1. Hallucinations occur when an AI generates plausible-sounding text without enough reliable information, often presenting guesses with undue confidence.
  2. Because models learn from large-scale text prediction and are trained to be helpful, they may answer even when the truthful response is “I don’t know.”
  3. Anthropic mitigates hallucinations by training Claude toward honesty, stress-testing with adversarial questions, and tracking behaviors like fabricated citations and inappropriate confidence.
  4. Hallucinations are hard to anticipate and catch because incorrect answers can look just like correct ones, and as they become less frequent, users are more likely to stop checking.
  5. Users can reduce risk by requesting verifiable sources, encouraging uncertainty, probing confidence, restarting chats to critique prior answers, and cross-referencing critical details.
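
To make the last tactic concrete, here is a minimal sketch, assuming the Anthropic Python SDK, of starting a fresh conversation that is asked only to critique an earlier answer and flag unverifiable claims. The model id and prompt wording are illustrative assumptions, not from the video.

```python
# Sketch: ask a *fresh* conversation to critique an answer from a previous chat.
# Assumes the Anthropic Python SDK; model id and prompt text are placeholders.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

earlier_answer = "...paste the text of the answer you want checked..."

critique = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed model id; use whichever model you have access to
    max_tokens=500,
    messages=[
        {
            "role": "user",
            "content": (
                "Review the following answer for possible hallucinations. "
                "Flag any citations, statistics, dates, or names you cannot verify, "
                "and say how confident you are in each claim.\n\n" + earlier_answer
            ),
        }
    ],
)

print(critique.content[0].text)
```

The same move works in an ordinary chat: paste the earlier answer into a new conversation and ask for unverifiable claims and confidence levels, so the critique isn't anchored to the context that produced the original answer.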

IDEAS WORTH REMEMBERING

5 ideas

Hallucinations are often confident, not tentative.

The risk isn’t just being wrong—it’s that the answer can sound persuasive and “complete,” making it easy to trust without verification.

Obscure or low-coverage topics sharply increase hallucination risk.

When the model has little to draw from—e.g., niche researchers or very recent events—it may “fill in” details that resemble what such information typically looks like.

Helpfulness training can inadvertently encourage guessing.

If a model is optimized to respond, it may produce an answer instead of admitting uncertainty, similar to a person who wants to appear knowledgeable.

Reducing hallucinations requires targeted testing, not just bigger models.

Anthropic describes regularly evaluating with questions designed to elicit “I don’t know,” and measuring behaviors like made-up citations, fake statistics, and overconfidence.

Citations and numbers deserve extra skepticism.

Fabricated papers, incorrect statistics, and wrong dates/names are common hallucination patterns, so these should trigger verification workflows.
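
One way to operationalize that skepticism is a small check that each cited link at least resolves before anyone relies on it. A minimal sketch, assuming Python with the requests library, is shown below; the URL is a placeholder, and a real workflow would also confirm that the page actually supports the claim.

```python
# Sketch of a "verify before trusting" step for links an answer cites:
# confirm each URL resolves (placeholder URL; content still needs human review).
import requests

cited_urls = [
    "https://example.com/paper-the-model-cited",  # hypothetical citation
]

for url in cited_urls:
    try:
        resp = requests.head(url, allow_redirects=True, timeout=10)
        ok = resp.status_code < 400
    except requests.RequestException:
        ok = False
    print(f"{'OK   ' if ok else 'CHECK'}  {url}")
```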

WORDS WORTH SAVING

5 quotes

We call these errors hallucinations, and they're often worse than just making a mistake because the AI will appear very confident or even try to convince you that it's right.

Jordan (Anthropic)

Hallucinations are hard to anticipate, hard to catch, and the wrong answer often looks exactly like it could be the right one.

Jordan (Anthropic)

AIs are trained to be helpful, so they want to give you some answer even when they're not sure.

Jordan (Anthropic)

During training, we teach Claude to be honest and to say, 'I don't know,' when it's not sure.

Jordan (Anthropic)

For critical work, you should cross-reference with trusted sources.

Jordan (Anthropic)

Definition and examples of hallucinations
Why next-word prediction leads to plausible errors
Helpfulness vs. honesty tradeoff in training
Evaluation methods (trip-up questions, citation fabrication checks)
High-risk query types (obscure, recent, exact details)
User mitigation tactics (sources, confidence checks, cross-referencing)
Ongoing nature of the problem in AI

High-quality AI-generated summary created from a speaker-labeled transcript.
