Lex Fridman Podcast

Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity | Lex Fridman Podcast #452

Dario Amodei is the CEO of Anthropic, the company that created Claude. Amanda Askell is an AI researcher working on Claude's character and personality. Chris Olah is an AI researcher working on mechanistic interpretability. Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep452-sb See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

*Transcript:* https://lexfridman.com/dario-amodei-transcript

*CONTACT LEX:*
*Feedback* - give feedback to Lex: https://lexfridman.com/survey
*AMA* - submit questions, videos or call-in: https://lexfridman.com/ama
*Hiring* - join our team: https://lexfridman.com/hiring
*Other* - other ways to get in touch: https://lexfridman.com/contact

*EPISODE LINKS:*
Claude: https://claude.ai
Anthropic's X: https://x.com/AnthropicAI
Anthropic's Website: https://anthropic.com
Dario's X: https://x.com/DarioAmodei
Dario's Website: https://darioamodei.com
Machines of Loving Grace (Essay): https://darioamodei.com/machines-of-loving-grace
Chris's X: https://x.com/ch402
Chris's Blog: https://colah.github.io
Amanda's X: https://x.com/AmandaAskell
Amanda's Website: https://askell.io

*SPONSORS:*
To support this podcast, check out our sponsors & get discounts:
*Encord:* AI tooling for annotation & data management. Go to https://lexfridman.com/s/encord-ep452-sb
*Notion:* Note-taking and team collaboration. Go to https://lexfridman.com/s/notion-ep452-sb
*Shopify:* Sell stuff online. Go to https://lexfridman.com/s/shopify-ep452-sb
*BetterHelp:* Online therapy and counseling. Go to https://lexfridman.com/s/betterhelp-ep452-sb
*LMNT:* Zero-sugar electrolyte drink mix. Go to https://lexfridman.com/s/lmnt-ep452-sb

*OUTLINE:*
0:00 - Introduction
3:14 - Scaling laws
12:20 - Limits of LLM scaling
20:45 - Competition with OpenAI, Google, xAI, Meta
26:08 - Claude
29:44 - Opus 3.5
34:30 - Sonnet 3.5
37:50 - Claude 4.0
42:02 - Criticism of Claude
54:49 - AI Safety Levels
1:05:37 - ASL-3 and ASL-4
1:09:40 - Computer use
1:19:35 - Government regulation of AI
1:38:24 - Hiring a great team
1:47:14 - Post-training
1:52:39 - Constitutional AI
1:58:05 - Machines of Loving Grace
2:17:11 - AGI timeline
2:29:46 - Programming
2:36:46 - Meaning of life
2:42:53 - Amanda Askell - Philosophy
2:45:21 - Programming advice for non-technical people
2:49:09 - Talking to Claude
3:05:41 - Prompt engineering
3:14:15 - Post-training
3:18:54 - Constitutional AI
3:23:48 - System prompts
3:29:54 - Is Claude getting dumber?
3:41:56 - Character training
3:42:56 - Nature of truth
3:47:32 - Optimal rate of failure
3:54:43 - AI consciousness
4:09:14 - AGI
4:17:52 - Chris Olah - Mechanistic Interpretability
4:22:44 - Features, Circuits, Universality
4:40:17 - Superposition
4:51:16 - Monosemanticity
4:58:08 - Scaling Monosemanticity
5:06:56 - Macroscopic behavior of neural networks
5:11:50 - Beauty of neural networks

*PODCAST LINKS:*
- Podcast Website: https://lexfridman.com/podcast
- Apple Podcasts: https://apple.co/2lwqZIr
- Spotify: https://spoti.fi/2nEwCF8
- RSS: https://lexfridman.com/feed/podcast/
- Podcast Playlist: https://www.youtube.com/playlist?list=PLrAXtmErZgOdP_8GztsuKi9nrraNbKKp4
- Clips Channel: https://www.youtube.com/lexclips

*SOCIAL LINKS:*
- X: https://x.com/lexfridman
- Instagram: https://instagram.com/lexfridman
- TikTok: https://tiktok.com/@lexfridman
- LinkedIn: https://linkedin.com/in/lexfridman
- Facebook: https://facebook.com/lexfridman
- Patreon: https://patreon.com/lexfridman
- Telegram: https://t.me/lexfridman
- Reddit: https://reddit.com/r/lexfridman

Dario Amodei (guest) · Lex Fridman (host) · Amanda Askell (guest) · Chris Olah (guest)
Nov 10, 2024 · 5h 15m · Watch on YouTube ↗

FREQUENTLY ASKED QUESTIONS

Direct answers grounded in the episode transcript. Tap any timestamp to verify against the source.

  1. What are Anthropic's ASL-3 and ASL-4 safety levels?

    Anthropic's ASL levels are an if-then system for scaling AI safety as models become more capable. Amodei says the Responsible Scaling Policy tests each new model for catastrophic misuse and autonomy risk, then ties the test results to security and deployment requirements. ASL-1 covers systems that manifestly do not pose autonomy or misuse risk, such as Deep Blue. ASL-2 is where today's models sit, because they are not capable enough to meaningfully increase CBRN risk beyond what a search engine can provide. ASL-3 is the point where models could help non-state actors, so Anthropic would add special security precautions and narrow deployment filters. ASL-4 is more severe: models could enhance knowledgeable state actors, become the main source of a dangerous capability, or accelerate AI research. The point is to avoid false alarms now while clamping down when danger is demonstrated. (A schematic sketch of this if-then mapping appears after the question list below.)

    58:51 in transcript
  2. What is Constitutional AI in Claude?

    Constitutional AI uses written principles to train Claude through AI feedback instead of only human labels. Amodei contrasts it with RLHF, where humans compare two model responses or rate one response directly. In Constitutional AI, the AI system compares possible responses using a human-readable document of principles (a constitution). The model reads the principles, the context, and the candidate response, then judges how well the response followed the criteria. That feedback goes into a preference model, which in turn helps improve the AI itself. Amodei describes this as a kind of self-play loop between the AI, the preference model, and the improving AI. In practice, Anthropic still uses RLHF and other methods too, but Constitutional AI reduces the need for human feedback and makes each human data point more valuable. (A minimal code sketch of this loop appears after the question list below.)

    1:52:53 in transcript
  3. Why does Dario Amodei think powerful AI could arrive by 2026 or 2027?

    Amodei's 2026 to 2027 estimate comes from extrapolating the recent pace of capability gains. He says he is not confident and warns that clipped versions of the claim can remove the caveats. His rough argument is that models have moved from high school level, to undergraduate level, to something like PhD level on some tasks, while missing modalities are being added, including computer use and image generation. If someone simply eyeballs that rate of improvement, he says, 2026 or 2027 looks plausible. He also names ways the straight-line forecast could fail: running out of data, being unable to scale clusters, or disruptions to GPU production. His actual position is more cautious: a mild delay may be likely, long timelines are not impossible, and scaling laws are empirical regularities rather than laws of the universe.

    2:19:07 in transcript
  4. Why do people think Claude is getting dumber?

    Askell says the same model can feel worse because prompts, randomness, and expectations vary. In the cases she was looking at, nothing had changed: it was the same model, the same system prompt, and the same overall setup. She still takes complaints seriously because a real product change can alter behavior; for example, turning artifacts from an opt-in feature into a default changes the system prompt, which can change Claude's behavior. But some apparent regressions may come from unlucky prompts rather than model changes. Askell says trying the same prompt several times can reveal that a task may have only succeeded half the time all along (a small sketch of that check appears after the question list below). Lex adds that people also get used to strong performance, so failures become more salient once the initial sense of magic fades.

    3:30:37 in transcript
  5. What does superposition mean in mechanistic interpretability?

    Superposition is Olah's explanation for how neural networks represent more concepts than they have dimensions or neurons. He starts with word embeddings: if a 500- or 1,000-dimensional space could only hold orthogonal concepts, it would run out of room quickly. The superposition hypothesis says sparse concepts can be projected into a lower-dimensional space and still be recovered, similar to compressed sensing. Because most concepts are absent most of the time (Japan and Italy, say, rarely appear in the same sentence), the model can pack in many more meaningful features than it has dimensions. Olah says the stronger version of the hypothesis is that neural networks may be shadows of much larger, sparser networks. That also explains polysemantic neurons, where one neuron responds to unrelated things, and why interpretability needs better feature extraction. (A toy numerical illustration appears after the question list below.)

    4:41:10 in transcript
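
Question 1 describes the Responsible Scaling Policy as an if-then system: capability test results for a new model determine which ASL level, and therefore which precautions, apply. The sketch below is a schematic paraphrase of that answer only; the level summaries, function name, and trigger flags are illustrative placeholders, not Anthropic's actual policy text.

```python
# Schematic only: maps hypothetical evaluation outcomes to the ASL levels
# summarized in question 1. Not Anthropic's actual Responsible Scaling Policy.
ASL_REQUIREMENTS = {
    "ASL-2": "Today's models: no meaningful CBRN uplift beyond a search engine; "
             "standard handling.",
    "ASL-3": "Could help non-state actors: special security precautions and "
             "narrow deployment filters.",
    "ASL-4": "Could enhance knowledgeable state actors, become the main source "
             "of a dangerous capability, or accelerate AI research: stricter "
             "security and deployment requirements.",
}

def required_asl(uplifts_non_state_actors: bool, uplifts_state_actors: bool) -> str:
    """If-then: test results for a new model decide which level applies."""
    if uplifts_state_actors:
        return "ASL-4"
    if uplifts_non_state_actors:
        return "ASL-3"
    return "ASL-2"

level = required_asl(uplifts_non_state_actors=False, uplifts_state_actors=False)
print(level, "->", ASL_REQUIREMENTS[level])
```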
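
Question 2 describes Constitutional AI as a loop in which an AI judge compares candidate responses against written principles, and the resulting preferences train a preference model that improves the policy. The toy sketch below mirrors that structure; the two-line "constitution", the stand-in models, and the prompts are placeholders for illustration, not Anthropic's training pipeline.

```python
# Toy, self-contained sketch of the Constitutional AI feedback loop from question 2.
import random

CONSTITUTION = [
    "Choose the response that is more helpful and honest.",
    "Choose the response that avoids assisting with harmful activities.",
]

def policy_model(prompt):
    """Stand-in for the model being trained: returns one candidate response."""
    return random.choice([
        "Here is a careful, sourced answer...",
        "I can't help with that request.",
    ])

def judge_model(judge_prompt):
    """Stand-in for the AI judge. A real judge is itself a language model that
    reads the whole prompt (principles + context + candidates) and answers
    'A' or 'B'; this toy version answers arbitrarily so the sketch runs."""
    return random.choice(["A", "B"])

def ai_feedback(prompt, candidate_a, candidate_b):
    """Build the judging prompt from the constitution and both candidates,
    then ask which response better follows the principles."""
    judge_prompt = (
        "Principles:\n" + "\n".join(CONSTITUTION)
        + f"\n\nUser prompt: {prompt}"
        + f"\n\n(A) {candidate_a}\n(B) {candidate_b}"
        + "\n\nWhich response better follows the principles? Answer A or B."
    )
    return judge_model(judge_prompt)

# Collect AI-labelled preference pairs. In the real pipeline these pairs train a
# preference model, which is then used (like RLHF's reward model) to improve the
# policy model itself -- the "self-play" loop Amodei describes.
preference_data = []
for prompt in ["Explain scaling laws.", "Summarize this essay."]:
    a, b = policy_model(prompt), policy_model(prompt)
    preference_data.append((prompt, a, b, ai_feedback(prompt, a, b)))

for prompt, a, b, winner in preference_data:
    print(f"{prompt} -> preferred: {winner}")
```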
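
Question 4 notes that re-running the same prompt several times can reveal a task only ever succeeded part of the time. The snippet below sketches that check; `call_model` and `looks_correct` are hypothetical placeholders for whatever client and success criterion you actually use, not an API mentioned in the episode.

```python
# Estimate how reliably a single prompt succeeds by repeating it several times.
import random

def call_model(prompt: str) -> str:
    """Hypothetical placeholder for a real model call (e.g. via an API client).
    Here it simulates a task that only works about half the time."""
    return random.choice(["correct answer", "wrong answer"])

def looks_correct(response: str) -> bool:
    """Hypothetical placeholder for whatever check defines success on the task."""
    return response == "correct answer"

def success_rate(prompt: str, trials: int = 10) -> float:
    """Same model, same prompt, repeated: sampling noise alone can explain why
    something that 'used to work' fails today."""
    wins = sum(looks_correct(call_model(prompt)) for _ in range(trials))
    return wins / trials

print(success_rate("Summarize this contract clause."))  # e.g. 0.5, not 1.0
```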
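
Question 5 explains superposition as packing many sparse concepts into fewer dimensions while keeping them recoverable, in the spirit of compressed sensing. The toy numerical demo below illustrates that idea; the sizes, random feature directions, and top-k recovery rule are arbitrary choices for illustration, not taken from the episode or from Anthropic's interpretability papers.

```python
# Toy illustration of superposition: 200 sparse "concepts" stored in a 50-dim
# space, then recovered by correlating with each concept's direction.
import numpy as np

rng = np.random.default_rng(0)
n_features, n_dims, k_active = 200, 50, 3

# Each concept gets a random (nearly orthogonal) direction in the small space.
directions = rng.normal(size=(n_features, n_dims))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

# A sparse input: only a few concepts are active at once (the Japan/Italy point).
true_active = rng.choice(n_features, size=k_active, replace=False)
activation = np.zeros(n_features)
activation[true_active] = 1.0

# Store the sparse vector in superposition as a single 50-dimensional vector.
representation = activation @ directions

# Naive recovery: score every concept direction and keep the top-k matches.
scores = directions @ representation
recovered = np.argsort(scores)[-k_active:]

print("active concepts:   ", sorted(true_active.tolist()))
print("recovered concepts:", sorted(recovered.tolist()))  # usually identical
```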

Answers are AI-generated from the transcript and may contain errors. Tap a question to verify against the source.
