Anthropic

Introducing Claude Opus 4.5

Claude Opus 4.5 sets a new standard for coding, agents, computer use, and enterprise workflows. It knows when to pause and think, which means fewer wasted steps and better results. When we gave it our two-hour engineering assignment, it scored higher than any human ever has. We’re excited to see what you build. Learn more: https://www.anthropic.com/news/claude-opus-4-5

Nov 23, 2025

At a glance

WHAT IT’S REALLY ABOUT

Claude Opus 4.5 debut: stronger coding, agents, vision, efficiency

  1. Anthropic positions Claude Opus 4.5 as its best model yet, especially for coding, agentic tasks, and everyday work like spreadsheets.
  2. The speaker emphasizes increased trust and reliability, citing longer stretches of autonomous progress and solving bugs prior models struggled with.
  3. Opus 4.5 is described as more efficient because it can decide when to think before acting, leading to more correct and targeted changes.
  4. In a two-hour take-home engineering evaluation, the model reportedly scored higher than any human has in that test.
  5. Anthropic notes improved front-end and vision capabilities that make the model better at using computers, and announces availability across major cloud platforms starting today.

IDEAS WORTH REMEMBERING

5 ideas

Opus 4.5 is framed as a step-change in practical coding ability.

The speaker claims it is “the best in the world at coding” and recounts anecdotes of it finding bugs that the Sonnet model could not, implying stronger debugging and reasoning in engineering workflows.

Reliability is highlighted via reduced need for human intervention.

Rather than only citing benchmarks, the speaker stresses lived experience: longer “time between interventions” and growing trust that the model will proceed correctly on its own.

The model is positioned as more efficient through better action planning.

Opus 4.5 reportedly “knows when to think before acting,” suggesting improved internal judgment about when to deliberate versus execute, which reduces incorrect edits and rework.

A bespoke engineering test is used to signal top-tier capability.

Anthropic cites a two-hour intensive take-home task where Opus 4.5 scored higher than any human has, aiming to communicate real-world engineering competence beyond standard leaderboards.

Improved front-end and vision are tied directly to better computer use.

The transcript links stronger vision and front-end skills to being “a lot better at using computers,” implying more robust UI interaction, interpretation of visual elements, and end-to-end task completion.

WORDS WORTH SAVING

5 quotes

Claude Opus 4.5 is our best model yet.

Sholto

It's the best in the world at coding, agentic tasks, and everyday work like spreadsheets.

Sholto

What's harder to show is how it just gets it.

Sholto

We've got this take-home. It's a two-hour intensive engineering task, and in that time, the model scored higher than any human ever has.

Sholto

For the first time, it's on every major cloud platform.

Sholto

Coding performance · Agentic task execution · Spreadsheet/everyday productivity work · Reliability and reduced interventions · Think-before-act efficiency · Two-hour engineering take-home benchmark · Vision and computer-use capability · Multi-cloud availability

High-quality AI-generated summary created from a speaker-labeled transcript.
