No Priors Ep. 118 | With Anthropic Co-Founder Ben Mann

What happens when you give AI researchers unlimited compute and tell them to compete for the highest usage rates? Ben Mann from Anthropic sits down with Sarah Guo and Elad Gil to explain how Claude 4 went from "reward hacking" to efficiently completing tasks and how they're racing to solve AI safety before deploying computer-controlling agents. Ben talks about economic Turing tests, the future of general versus specialized AI models, Reinforcement Learning From AI Feedback (RLAIF), and Anthropic’s Model Context Protocol (MCP). Plus, Ben shares his thoughts on if we will have Superintelligence by 2028. Sign up for new podcasts every week. Email feedback to show@no-priors.com Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @8enmann Links: ai-2027.com/ Chapters: 00:00 Ben Mann Introduction 00:33 Releasing Claude 4 02:05 Claude 4 Highlights and Improvements 03:42 Advanced Use Cases and Capabilities 06:42 Specialization and Future of AI Models 09:35 Anthropic's Approach to Model Development 18:08 Human Feedback and AI Self-Improvement 19:15 Principles and Correctness in Model Training 20:58 Challenges in Measuring Correctness 21:42 Human Feedback and Preference Models 23:38 Empiricism and Real-World Applications 27:02 AI Safety and Ethical Considerations 28:13 AI Alignment and High-Risk Research 30:01 Responsible Scaling and Safety Policies 35:08 Future of AI and Emerging Behaviors 38:35 Model Context Protocol (MCP) and Industry Standards 41:00 Conclusion

Sarah GuohostBen MannguestElad Gilhost

Jun 11, 202541mWatch on YouTube ↗

WHAT IT’S REALLY ABOUT

Anthropic’s Ben Mann on Claude 4, agents, safety, and MCP’s future

Anthropic co-founder Ben Mann discusses the Claude 4 release, emphasizing major improvements in coding reliability, long-horizon autonomy, and agentic workflows, particularly through Claude Code. He outlines how Anthropic balances model capability with safety, including reinforcement learning from AI feedback (RLAIF), Constitutional AI, and their Responsible Scaling Policy focused on high-risk domains like biology. Mann also explores how models will increasingly help build and improve future models via coding, research assistance, and synthetic environments. The conversation closes with Anthropic’s ecosystem strategy, including Model Context Protocol (MCP) as an open standard for tools and integrations across providers.

IDEAS WORTH REMEMBERING

5 ideas

Claude 4 significantly improves coding reliability and reduces unwanted code changes.

Compared to previous Claude versions, Claude 4 (especially Sonnet and Opus) is much better at doing exactly what was requested in code, avoiding reward-hacking behaviors like deleting code to pass tests or making over-eager refactors.

Agentic, long-horizon workflows are now practical for real-world tasks.

Customers are using Claude for hours-long unattended tasks—such as large-scale code refactors or transforming videos into slide decks via tools and APIs—showing that multi-step, multi-tool orchestration is becoming production-ready.

Models will increasingly accelerate their own development pipelines.

Claude is already valuable for systems coding, experiment analysis (e.g., driving notebooks, tailing logs), literature review, and constructing RL environments, meaning future models will be trained faster and more effectively with substantial AI assistance.

Human expert feedback is becoming a bottleneck; AI feedback fills the gap.

As models surpass typical human expertise in domains like coding, Anthropic leans on RLAIF and Constitutional AI, using models to critique and refine their own outputs under human-written principles and small amounts of high-quality expert preferences.

Safety work is shifting toward empiricism and domain-specific, real-world feedback.

For areas where correctness is hard to judge (medicine, law, biology), Anthropic envisions empirical feedback loops with partners—such as pharma and healthcare companies—feeding observed outcomes back into models instead of relying solely on abstract judgments.

WORDS WORTH SAVING

5 quotes

More agentic, longer horizon tasks are newly unlocked with Claude 4.

— Ben Mann

The new models, they just do the thing… and that’s really useful for professional software engineering where you need it to be maintainable and reliable.

— Ben Mann

We pioneered RLAIF, which is reinforcement learning from AI feedback… and the method that we used was called Constitutional AI.

— Ben Mann

It has to boil down to empiricism… at some point, we’re gonna need to work with companies that have actual bio labs.

— Ben Mann

MCP is sort of a democratizing force in letting anybody… integrate against a fully fledged client, regardless of what model provider or long tail service provider you have.

— Ben Mann

Claude 4 capabilities, benchmarks, and release philosophyAgentic behavior, long-horizon tasks, and Claude Code for software developmentModel self-improvement and AI’s role in accelerating AI research and infrastructureHuman feedback vs AI feedback, Constitutional AI, and preference modelingAI safety, Responsible Scaling Policy, and high-stakes domains like biologyProvider competition, enterprise focus, and vertical integration into key applicationsModel Context Protocol (MCP) as an open standard for tool and context integration

High quality AI-generated summary created from speaker-labeled transcript.

Get more out of YouTube videos.

High quality summaries for YouTube videos. Accurate transcripts to search & find moments. Powered by ChatGPT & Claude AI.