Code with Claude London 2026: Opening Keynote

Get the latest updates from Anthropic's engineering and product leaders at the Code with Claude 2026 opening keynote in London.

May 19, 2026 · 46 min

0:00 – 0:58
Intro
1. Speaker
  [upbeat music]
0:58 – 1:30
Claude comes to London: first international Code with Claude
1. Speaker
  Morning, London.
2. Speaker
  Hello.
3. Speaker
  Hello. This is, uh, this is the first time that we've taken Claude, uh, Code with Claude outside of San Francisco, and, uh, I'm very excited to be here with everyone. Um, before we jump in, can I take a quick selfie?
4. Speaker
  Yeah. [clapping]
5. Speaker
  Is that okay? All right. Ready? [laughs] All right. I think we got it.
6. Speaker
  [laughs]
1:30 – 3:04
Learning to code on a calculator: practical “magic” and early tinkering
1. Speaker
  I, um, I wanna start today by telling you how I originally got into coding. I actually learned on a scientific calculator. And, uh, I found out that I could program my calculator to help me pass math tests and do well on homework. I was using a TI-83 calculator. I think in the UK you used Casios for this. And I found out that you could write little programs in TI-BASIC. This is like the BASIC dialect for TI calculators. And, um, I could reference the little programs when I couldn't remember what to do. It worked beautifully. I got great scores on my math tests and my homework, and then I taught my classmates how to program their calculators, and they also got great scores on their math tests and their homework. At some point, I even published a guide on the internet for how to [laughs] program scientific calculators. I was, uh, I was 13 when I wrote this. That was my introduction to software. It was very practical, and, um, as a kid, it just felt like magic. Around the same age, I figured out HTML, not to build a startup, but
3:04 – 5:08
From eBay HTML hacks to modern complexity—and the distance collapses again
1. Speaker
  to make my eBay listings look better than everyone else's so I could sell Pokémon cards. There were tables, blinking text, the whole thing, and, uh, and it worked. I sold three legendary birds for 99 cents each. [laughs] They were all holographic. [laughs] And, uh, in hindsight, I wish I would've held onto those. There was a whole generation of us like that, right? We didn't learn to program from a textbook. We learned from tinkering. And I still remember the giddy feeling that I got when one of my calculator programs ran. You made the thing, and it did what you wanted. And then somewhere along the way, programming got complicated. Uh, compilers, type checkers, build systems, package managers, 12 config files before you could write a single line of code. The distance between "I have an idea" and "It runs" just kept getting longer. What's happening now is that distance is collapsing again. You describe a problem, and the program shows up. It's the calculator feeling, except the calculator can write a distributed system. Now I get to watch people experience the same feeling of bringing ideas to life, and it's happening on a scale none of us imagined even a year ago. It's so gratifying to talk to people experiencing this every day, from individual indie devs to our biggest customers that are taking on significant challenges.
5:08 – 6:38
Real-world impact: Spotify migrations and Binti’s foster care outcomes
1. Speaker
  For instance, Spotify uses Claude Code to migrate thousands of repos. The team, led by Niklas Gustafsson, built a background agent on Claude. It reads a migration that's described in plain English, and it runs it across a fleet of agents opening PRs. It's now merging over 1,000 PRs a month into production, and it's cutting migration time by over 90%. That's real engineering hours back. And sometimes speed isn't just about efficiency. It's about delivering on a mission. Felicia Curcuru is the co-founder and CEO of Binti. Her software runs the systems that caseworkers use to place kids in foster care: the paperwork, the home visits, the licensing process. And this year, her team used the Claude API to give caseworkers back hours that they used to spend on paperwork. And they took twenty days off the process of licensing a foster family. Twenty days. It's not just an efficiency metric. That's a kid connecting with a family. That excitement of solving problems is something I hear from people
6:38 – 9:41
The capability-adoption gap and why developers matter
1. Speaker
  all the time. And I'm gonna guess that everyone here experiences this differently. Some of you are living on the frontier. Some of you are bringing along the people around you. Many of you can feel the ground shifting under us, and you want a view of maybe what's coming next. And all of us at Anthropic can feel that shifting happening too. The industry is moving at incredible speed. And that makes sense when you look at the exponential in how fast these models are getting better. Remember, a couple years ago, the frontier of model development was so good that Claude could draft a pretty good Git commit. A year ago, we were standing on stage at our first ever Code with Claude event in San Francisco. Opus 4 was the headline. And the idea that Claude could run for a few minutes and build an entire feature without a human checking in, that was crazy. Six months ago, agents started running end-to-end overnight, and I started being able to wake up to finished work. Then, uh, last month, Mythos read the entire OpenBSD source tree, and it found a twenty-seven-year-old vulnerability that survived every human reviewer, every fuzzer, every static analyzer that was thrown at it for almost three decades. The jumps keep getting bigger, and the intervals keep getting shorter. But even though model capabilities are improving on an exponential, most organizations are still adopting AI on a linear path. That means there's a growing gap between what AI can do and what it's actually doing for people. Closing that gap and translating model capability into something that people can actually use is what you all as developers do, and you're doing it. Year over year, API volume is up nearly 17x on the Claude platform. And on Claude Code, the average developer is now spending over twenty hours a week running Claude. Like you, we've been shipping a lot lately [chuckles] and we want you to walk away today with a clear picture of where we're headed, so you can plan for it,
9:41 – 12:06
Today’s roadmap: model layer, managed agents, and Claude Code primitives
1. Speaker
  and you can ride the exponential with us. This event is about how we're making our products work better for all of you. And this morning, we're gonna show you what that looks like. First, Lisa is gonna talk about our foundation. That's the model layer. And she'll explore more about what's going on with our frontier models and what's coming next. On the Claude platform, we're shipping updates to Claude Managed Agents, and Angela and Caitlin are gonna walk you through how you can securely build and deploy agents at scale. Then finally, on Claude Code, Cat and I will walk you through the latest features like, uh, the new agent view and then important new primitives like routines to let Claude prompt itself even when you're away from your computer. All of it comes back to you and what you're gonna build. Because most people are never gonna call the Claude API. Most people are never gonna run Claude in a terminal. They're gonna experience AI through something one of you built on the Claude platform. A designer exploring new directions with Canva, a lawyer getting a brief out the door faster with Legora, or a developer using any one of the world's best coding agents. You all shape what AI feels like for everyone else. We'd never be able to build everything that people need to solve their problems. This is only something that you all can do. So, uh, thanks for being here today. Thanks for partnering with us to shape what AI looks like for the world and for joining us as we all re-experience those feelings of learning to code and solve problems again. Up next, Lisa from the research PM team. [audience cheering] [upbeat music]
2. Speaker
  [on hold music]
12:06 – 13:36
Claude’s model evolution: from code proficiency to end-to-end ownership
1. Speaker
  Hi. So I joined Anthropic in 2023 and have been a part of every model launch since Claude 3. That means I've been involved in bringing 17 different versions of Claude to end users and developers. Claude has come a long way. Opus 3 launched just over two years ago and was our first model that was proficient at writing long-form code. Sonnet 3.5 (new), or really Sonnet 3.6, was the first model that could use a computer safely. Sonnet 3.7 was our first model that would think before giving answers. And last year at this event, I introduced developers to Opus 4. Though we didn't really know it at the time, Opus 4 was our first model that could write complex Excel files and PowerPoint documents. And now, with our most recent models, Opus 4.7 and Mythos Preview, we're seeing Claude is able to own outcomes end to end and apply judgment to complete tasks with high ambiguity. So we haven't slowed down. In fact, we've been accelerating. We shipped eight frontier models in the past 12 months. Each one builds on the last, helping you write better code and deliver more advanced experiences to your end users. Claude's behaviors and capabilities underpin everything else you're going to hear about today.
13:36 – 15:08
Why incremental intelligence creates new markets: tool use, thinking, long context
1. Speaker
  As Claude gets stronger, your starting line moves forward. Like Boris mentioned, we talk a lot about the exponential at Anthropic. We believe as model intelligence increases, the value of use cases increases exponentially. So consider coding. Agentic coding is far more impactful than code autocomplete. In this way, incremental intelligence creates new markets and grows the pie. As we build better versions of Claude, we are excited to see benchmarks improve, but what's really exciting is when Claude can do something that it couldn't do before. Tool use, computer use, thinking that adapts to the task at hand, agentic loops that hold a plan over hundreds of steps, long context windows. We pioneered these areas. So when you're building on Claude, you're building on the model family that shipped these capabilities first and has had the longest to make them reliable. These capabilities are essential for code, but the foundations of model intelligence have gotten strong enough to support a diversity of tasks in a range of domain verticals. So Claude can iterate on designs for marketing presentations, can redline legal documents, can build financial forecasts,
15:08 – 16:39
Customer results with Opus 4.7: coding agents, engineering tasks, and design taste
1. Speaker
  and even analyze genomic sequencing data. Let me share some examples of how customers are using our latest model, Opus 4.7. Amp, the coding agent, moved their smart mode to Opus 4.7 after it outperformed on their internal benchmarks. They also simplified their tooling since the model no longer needed as much help. Rakuten ran the model on their internal benchmark and found it resolved three times more production engineering tasks than the previous model. Intuit saw Opus 4.7 catching its own logical faults during the planning phase and accelerating execution far beyond previous Claude models. We also recently launched Claude Design from Anthropic Labs, powered by Opus 4.7. Customers are using Claude Design to build production interfaces in a single sitting, and we've heard that Opus 4.7 has a taste for visual design. Developers building on the model directly are seeing the same. We continue to hear from everyday users that Claude understands the assignment. Users view Claude as a valued thought partner since it's willing to challenge their beliefs and push back when they're wrong. So we've made tremendous progress, but models are still imperfect. Claude absolutely still has verbal tics, can be stumped by
16:39 – 18:10
What’s next: judgment, ‘infinite’ context, and multi-agent coordination
1. Speaker
  viral common-sense questions, and sometimes does more than you asked for. Watching Claude improve is what makes this industry, this moment, and my job exciting. So here's what to look forward to from here. Higher judgment and code taste. This means models you can trust with complex autonomous engineering. Context windows that feel effectively infinite as models take on longer continuous work without losing their core intent. And multi-agent coordination, powering teams of agents that can collaborate on goals too big for any one agent or model. One metric I look to, to make sense of all of this change, is task horizon: how long can a model work before losing the thread? Last year at this time, models could reliably work for minutes, and today, most users have agents that run for hours. We expect future generations of Claude to run continuously. So we will have agents that are proactive, that are always on, that know what to do without being told. These agents will be responsible for high-level goals that require judgment and collaboration. So let me give you some examples. Instead of asking Claude
18:10 – 20:57
Developer guidance: design for the next model, reduce scaffolding, upgrade via evals
1. Speaker
  to write a project update, you would ask Claude, "Keep the project on track this week." Instead of asking Claude to produce a financial forecast, Claude would own and update the forecast to keep it accurate over time. So what should you as developers do with all of this change? Since the exponential keeps moving, future Claude models will be more capable than Opus 4.7 and even Mythos Preview. You need to build for emerging capabilities, not just what works today. That means designing for the next version of Claude, not the current one. We've seen countless times that the developers who win are the ones whose architecture is ready to absorb the next big jump. Scaffolding is what we call the parts of the agent that aren't Claude, so the loops, the instructions, the tools. We're seeing that as models get smarter, the scaffolding that used to help can hold Claude back. Claude is intelligent and resourceful, and more intelligent models can often get further with generalized primitives, like a file system and a sandboxed computing environment. As models get more intelligent, you will also need to keep making harder evals and product prototypes. They are how you will notice that the exponential is moving underneath you. When a task that used to fail starts passing, that's your sign to ship something that you couldn't ship before. And finally, as the pace continues to accelerate, the teams who are getting the most out of Claude are the ones who treat model upgrades as a business opportunity. You should make upgrades easy by automating your evaluations and testing processes, and you should test models hands-on to get a feel for how the improved intelligence and capabilities will help your end users. We are seeing the exponential continue, which means Claude will keep getting smarter and keep picking up new capabilities. You, as developers, are the first to feel that.
You're the first to experiment, the first to build new products, and the first to find markets that nobody else sees yet. Now, Caitlin and Angela are gonna show you how the Claude platform can make this reality come to life.
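Noticing "a task that used to fail starts passing" is easiest when eval results from two models can be diffed directly. A minimal sketch, assuming each model's eval run is stored as a mapping from task name to pass/fail; the function names and task names are hypothetical and not part of any Anthropic tooling:

```python
from typing import Dict, List

# Hypothetical shape: task name -> True if the model passed that eval task.
EvalRun = Dict[str, bool]


def newly_passing(baseline: EvalRun, candidate: EvalRun) -> List[str]:
    """Tasks the candidate model passes that the baseline failed or never ran."""
    return sorted(
        task for task, passed in candidate.items()
        if passed and not baseline.get(task, False)
    )


def regressions(baseline: EvalRun, candidate: EvalRun) -> List[str]:
    """Tasks the baseline passed that the candidate fails or skipped."""
    return sorted(
        task for task, passed in baseline.items()
        if passed and not candidate.get(task, False)
    )


if __name__ == "__main__":
    current = {"fix_flaky_test": True, "migrate_repo": False, "plan_refactor": False}
    upgraded = {"fix_flaky_test": True, "migrate_repo": True, "plan_refactor": False}
    print("newly passing:", newly_passing(current, upgraded))
    print("regressions:", regressions(current, upgraded))
```

Running both checks on every new model release turns the newly passing list into candidate features to prototype, while the regressions list guards the upgrade itself.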
2. Speaker
  [upbeat music]
20:57 – 22:07
Claude Platform: solving outcomes + scaling production agents
1. Speaker
  So model capabilities are on this exponential, but most businesses are still on the linear. It's never been more important to take advantage of growing model intelligence to drive true business impact. But what's preventing these businesses from snapping to this exponential themselves? Well, it boils down to two key problems. The first one is just getting the right outcomes can be too difficult. To build an agent that does exactly what you need it to do, you need to do things like prompt optimization, tool construction, harness engineering. There's just so much you have to do to get it just right.
2. Speaker
  That's right. And the second problem is you need to ship fast, but you need to ship scalably at the same time. Everybody is moving insanely fast right now, and you've got to keep up. But to win, you need quality too. It's really easy to build prototypes, but it's hard to scale in production. So we've built the Claude platform to give you everything you need to ship, to get the right outcomes quickly and at scale. The platform gives you API primitives that are tuned to Claude models. It gives you infrastructure to build and scale agentic systems, and it gives you controls to operate those systems.
22:07 – 23:13
Lower cost frontier quality: the advisor strategy (executor + advisor models)
1. SPSpeaker
  That's right. Now let's go back to one of the most common problems that Caitlin and I hear all the time from businesses. Businesses need frontier-level intelligence, but at lower costs. One of the ways that we're solving this is with the advisor strategy. This is really easy to implement. All you have to do is update the tools array on the Messages API. What we're doing behind the scenes is splitting execution from advising. For execution, you can use a smaller model, which makes it a bit cheaper. But when that small model needs help, it can reach out to a larger model for advice.
2. Speaker
  Yeah. So in practice, this means you can have a Haiku or a Sonnet class model as the executor and use Opus as an advisor. We ran the advisor strategy with Sonnet as the executor and Opus as the advisor, and found that it performed way better than Sonnet alone. It actually ran even more cheaply than Sonnet on its own, because Opus advised it how to get its work done better. Eve Legal, for example, used the advisor strategy, and they told us they got frontier model quality at five times lower cost.
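The talk doesn't show the advisor tool definition itself, so the following is only a model-agnostic sketch of the executor/advisor split: stub callables stand in for a small executor model and a larger advisor model, and `Step`, `Ledger`, and `run_with_advisor` are hypothetical names, not the Messages API.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Step:
    output: str
    needs_advice: bool  # the executor signals that it is stuck


@dataclass
class Ledger:
    executor_calls: int = 0  # calls to the cheap model
    advisor_calls: int = 0   # calls to the expensive model


# Hypothetical signatures standing in for real model calls.
Executor = Callable[[str, Optional[str]], Step]  # (task, advice) -> Step
Advisor = Callable[[str, str], str]              # (task, draft) -> advice


def run_with_advisor(task: str, executor: Executor, advisor: Advisor,
                     max_consults: int = 2) -> Tuple[str, Ledger]:
    """Let the executor do the work; consult the advisor only when it asks for help."""
    ledger = Ledger()
    advice: Optional[str] = None
    step = Step(output="", needs_advice=True)
    for _ in range(max_consults + 1):
        ledger.executor_calls += 1
        step = executor(task, advice)
        if not step.needs_advice:
            break
        if ledger.advisor_calls >= max_consults:
            break  # advisor budget spent: return the executor's best attempt
        ledger.advisor_calls += 1
        advice = advisor(task, step.output)
    return step.output, ledger


if __name__ == "__main__":
    def small_model(task: str, advice: Optional[str]) -> Step:
        if advice is None:
            return Step(output="draft", needs_advice=True)
        return Step(output=f"done ({advice})", needs_advice=False)

    def large_model(task: str, draft: str) -> str:
        return "split the migration into two PRs"

    result, ledger = run_with_advisor("migrate billing", small_model, large_model)
    print(result, ledger)
```

The cost profile follows from the structure: the cheap model runs on every step, and the expensive model is only billed for the steps where the executor asks for help.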
23:13 – 24:17
Claude Managed Agents: faster to production with orchestration, outcomes, and dreaming
1. Speaker
  And that's awesome. This is perfect for a freemium model, for example, where you want to put your best foot forward with your product but have to be conscious of your costs. It's also really great in situations where you have an extremely high volume of agentic workloads, so you have to keep an eye on your ROI. All right. Now let's talk about speed and scale, the two things Caitlin said you have to do at the exact same time. Recently, we introduced Claude Managed Agents. Claude Managed Agents is an agentic harness paired with production-grade infrastructure. It allows teams to ship literally ten times faster. You can now build a production-grade agent in days, not months.
2. Speaker
  One of our favorite examples of a customer who built on Claude Managed Agents was Asana. Asana built AI teammates so that humans can collaborate directly with agents within Asana projects, delegating tasks and things like that. And Asana wanted to build for speed and scale at the same time, so they built on Claude Managed Agents.
3. Speaker
  And recently, we introduced a couple of key upgrades to Claude Managed Agents.
24:17 – 31:13
New enterprise controls: self-hosted sandboxes and MCP tunnels (Counter demo)
1. Speaker
  Multi-agent orchestration, which allows you to build fleets of agents; outcomes, which allow you to specify what success looks like for Claude, so Claude just iterates to get it done; and dreaming, where Claude can basically introspect on its previous transcripts and learn and self-improve. And today, we're introducing two more new features. The first one is self-hosted sandboxes. This means that you can now have Claude execute work on your own servers. The second is MCP tunnels. For many people who build internal agents, being able to access your internal MCP server securely is now possible on Claude Managed Agents through MCP tunnels. Claude Managed Agents spawns sandboxes when it's time to execute work, like editing files. You can still choose to use our sandboxes to do that work, but starting today, you can also use your own. And we're introducing this with first-class support for Daytona, Cloudflare, Vercel, and Modal sandboxes with super easy integration, so it's easy to get started with any of these providers. With MCP tunnels, you can have your MCP servers hosted behind your firewall on your private network so that Claude Managed Agents can access them without exposing them over the public internet. Should we check them out? Let's do it. So Caitlin and I have been working with a fictional company called Counter. Counter offers software for small businesses to build a digital storefront. Counter really wants to become an AI-native business, and one of its most common workflows is constantly running growth experiments to optimize its merchant onboarding. Caitlin has graciously offered to help build a growth agent on Claude Managed Agents for Counter. So here we are in Counter's Slack workspace. I'm in here with my colleague, Gabe. We're wearing our super cool Counter hats, and we're ready to go. So we're really excited to see how our latest growth experiment on our onboarding flow has performed.
And you can see that our new growth bot came into the Slack channel, and it told us there's a really clear winner in our most recent experiment. The simpler version of the onboarding flow is way outperforming the older, longer version of the flow, where merchants were kind of dropping off before they ever finished signing up for Counter. The agent is proactively telling us that it's calling the experiment and it's gonna get started on cleaning up the old variant, which is awesome. So let's take a look at how this agent is actually set up. To do that, I'm gonna use the Claude API CLI and the Claude developer console. So first, let's check out the agent's configuration. You see here we've got our system prompt, our skills, our tools. Let's take a look at the MCP servers. First, we've got Slack. Makes sense, right? That bot was talking to us in Slack. Then we've got our data warehouse so that the bot can go and read data about the experiments and determine what might be winning. We've got our feature flags MCP server so that the bot can call experiments or start new ones and take actions like that. But let's look at these URLs. For Slack, it's a regular public MCP URL. But for the data warehouse and feature flag services, you might notice something different here. We've got these URLs that are behind tunnel.anthropic.com. What this means is that both of these MCP servers are behind Counter's firewall and only accessible via the tunnel, so that Counter can meet its requirement that its MCP servers stay on its private network. So let's pop into the Claude developer console and see how this is set up. For Counter, they set up a gateway in their own private network. Then they establish a secure connection to Anthropic. As a result, any agent that Caitlin builds that uses the MCP tunnel is able to securely access internal MCP servers.
You can go ahead and create these MCP tunnels directly in the Claude developer console, and we give you a couple of different configuration options, so you can adjust them accordingly. Counter's next big requirement is that when the agent does things like writing and executing code, it does them on a self-hosted sandbox. It doesn't want to run these things on public infrastructure. So for our Claude Managed Agents, we can set up our environment to make sure that we're using a self-hosted sandbox. With the self-hosted sandbox, when the agent actually needs to execute work, it's gonna put a work item in a queue. Here, Counter is using Vercel's self-hosted sandbox, so Vercel will pick up that work item and spin up a sandbox in Counter's Vercel account to execute the work. One of the cool things about Claude Managed Agents is that we've got a whole bunch of really rich observability within the console experience. So let's go check out the session that resulted in that Slack thread we saw earlier. You can see here that this agent is running a whole bunch of different tools, but we can briefly take a look and make sure that the things we cared about around MCP tunnels and self-hosted sandboxing are working the way we want. First, we've got this set flag winner tool call, which means that the agent was able to work through the MCP tunnel to call the MCP server for the feature flag service. And you can see here too, when the agent decides it's time to write the code to clean up the old variant, it checks out a new Git branch, and it's doing this work on our Vercel self-hosted sandbox. Awesome. Let's head back to Slack and take a look at what this GrowthBot agent has been doing. So first off, GrowthBot is very proactive.
Not only has it figured out that it should default to the simplified variant, but it has also put together a PR to actually do that defaulting and clean it up. And when that work gets executed, it's done on Counter's servers via the self-hosted sandbox. As another sign of its proactivity, it also gave us a little screenshot. You can see the old variant, which is a bit messy, and the new one, which definitely looks cleaner. Not only is this agent proactive, but it's also collaborative. You can see here Caitlin asked, "What next?" And GrowthBot was able to detect that another great opportunity could be to solve this forty-six percent drop-off that it found in the onboarding flow.
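The self-hosted sandbox hand-off described in the demo is a producer/consumer queue: the agent enqueues a work item, and a worker on the customer's side pulls it and runs it in a sandbox the customer controls. A minimal in-process sketch using Python's standard `queue` module; `WorkItem`, `enqueue_work`, `drain`, and the sandbox name are hypothetical, and neither the Claude Managed Agents nor the Vercel integration is shown. In a real deployment the queue would sit between two separate systems rather than inside one process.

```python
import queue
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class WorkItem:
    session_id: str
    command: str  # e.g. a shell step the agent wants to run


@dataclass
class Result:
    session_id: str
    output: str
    executed_in: str  # which sandbox actually ran the command


def enqueue_work(q: "queue.Queue[WorkItem]", session_id: str, command: str) -> None:
    """Agent side: hand off work instead of executing it directly."""
    q.put(WorkItem(session_id, command))


def drain(q: "queue.Queue[WorkItem]", sandbox_name: str,
          execute: Callable[[str], str]) -> List[Result]:
    """Customer side: pull pending items and run each one in a sandbox you control."""
    results: List[Result] = []
    while True:
        try:
            item = q.get_nowait()
        except queue.Empty:
            return results
        results.append(Result(item.session_id, execute(item.command), sandbox_name))
        q.task_done()


if __name__ == "__main__":
    work: "queue.Queue[WorkItem]" = queue.Queue()
    enqueue_work(work, "session-1", "git checkout -b remove-old-onboarding-variant")
    # Stand-in executor; a real worker would call a sandbox provider here.
    for r in drain(work, "counter-sandbox", lambda cmd: f"ran: {cmd}"):
        print(r)
```

The pull direction is the point of the design: the agent never needs credentials for the customer's infrastructure, because the customer-side worker fetches work and decides where it executes.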
2. Speaker
  Now GrowthBot was able to calculate that because it was able to query the data warehouse securely through an MCP tunnel. And we love that it ended up by asking us, "Do you want me to get started on that?" Thank you very much, Claude. All right, let's wrap up. So today we introduced two big upgrades to Claude Managed Agents with self-hosted sandboxes and MCP tunnels. With this collection of feature upgrades, it's now easier than ever to become an AI-native business yourself and build agents for all of your workflows, and ultimately accelerate whatever you're building. And now Cat and Boris are going to come and talk to you about how Claude Code is making it even more fun to ship as a developer.
3. Speaker
  [applause]
31:13 – 40:08
Claude Code product layer: interfaces, org-wide adoption, and new primitives
1. Speaker
  Thanks. Angela and Caitlin just showed you how the Claude platform bridges the gap between what models can do and the agents businesses ship. We see a related challenge on Claude Code: bridging the gap between model capabilities and how every developer can benefit from them. First, I want to take some time to just thank all of the builders in the room with us today and watching online. Thank you for trusting us in your production code base back when Sonnet 3.7 was our frontier model, and back when our product was rough around the edges. Your support is what makes the team so excited to come in every day and make Claude Code better. Let's back up to why Claude Code exists. Our mission is to help every builder close the gap between having a great idea and having a shipped product. The way that we enable this is by building tools that elicit the frontier intelligence from our models and making these tools accessible to every builder. And we don't think of ourselves as having a finished roadmap to hand to you. We think of ourselves more like mountaineers just climbing alongside you in terrain that none of us has mapped before, figuring things out as we go. And we're growing with you with increasing AI capabilities and helping you manage these new challenges as they emerge. I remember, last year, when I would give Claude Code a task, I would read every single edit it tried to make, every permission prompt. I would give it super detailed guidance about what to do and what not to do, just walking it through the details of the change. Now most of us are running in auto mode to delegate permissions to Claude, and we're checking in after Claude has already tested and verified its changes and has a PR ready for us. Looking back, Claude Code started in the CLI, and the terminal is still the interface for power users who want a minimal text interface and the most control and customization. Then we added the IDE.
This gives you the same powerful agents, but you also have the ability to follow along with all the code changes as the agent is working. And we heard from you that many of you are now juggling multiple Claude Code instances, which we've affectionately been calling multi-Clauding. We've added two new interfaces to help you manage more agents. One that I use frequently is Claude Code on desktop. It's a full-screen graphical interface. It has built-in previews, a sidebar control plane, and the ability to render images and rich outputs. We've built desktop to be a single view across both your local sessions and your cloud sessions, with visual indicators of which agents are running, which ones are blocked, and which ones need your input. Next is our newest surface, the Claude Agents view in the CLI, for people who prefer to stay in the terminal. In this view, you can also see what's running, what's waiting for you, and what's done at a glance. You can reply inline to unblock. You can jump in and out of sessions, all without losing your place. The VS Code IDE extension and desktop app are both built on the Claude Agent SDK, the same one that many of you are building on as well. And many enterprises have now adopted Claude Code wall to wall. At Anthropic, we've seen that this has driven a 200% increase in the number of PRs per engineer, even as our engineering org has scaled substantially. Together with all of you, we're redefining and discovering what the future of engineering looks like by embracing the new challenges we encounter and building automations powered by Claude to tackle each one. Here's some of the feedback that we've heard from our users and what we've built with the help of everyone in this community. We heard from you that you want to spend less time on code review, so we shipped a code review product that deploys a team of agents to traverse all your code changes and auxiliary files to catch critical bugs.
Thousands of companies use this every day, including every internal Anthropic team. We heard from you that you want to code on the go, so we launched Remote Control and Claude Code on iOS and Android, so you can fire off a task no matter where you are. You no longer need to walk around with an open laptop, and you're no longer stuck at your desk. You can now go to the park, touch grass, and still get your tasks done. [laughs] We heard from you that you're also spending a lot of time babysitting your PRs: fixing flaky CI tests, addressing code review comments, and resolving merge conflicts. So we added Autofix. It listens for these events and proactively fixes each one so that your PR is always green. And we heard from you that you wanna run Claude Code on new tickets and on new customer bug reports as they come in, so we built routines. Configure once, and Claude Code can run on a schedule or in response to a webhook or API request. Work that you used to have to kick off manually now just happens. And last, we heard from you that you're landing so much code that security teams are struggling to keep up, so we built Claude Security. It scans your codebase overnight, flags vulnerabilities ranked by severity, and lets you kick off Claude Code to address each one. These primitives compose, so that together we can easily adapt to the future of engineering. Everything I've covered is something an individual can pick up today, but it's especially exciting to see how a range of companies have adopted Claude Code at the scale of their entire engineering orgs. First, Shopify powers e-commerce for millions of merchants worldwide and has built AI use into its entire engineering culture. They use Claude Code across the entire company, both in the engineering org and among non-engineers: product managers, designers, data scientists. And they're bringing it directly into their platform to stand up tools at scale.
Andrew McNamara is the director of applied AI at Shopify, and in his words, "The speed is just crazy," and Claude Code has transformed how they build their internal tools. Another example is Mercado Libre. They're Latin America's most popular e-commerce site, with over 100 million buyers. They have a team of 23,000 engineers, and their org runs on Claude Code. When that happens across an org, the work itself changes. Engineers are pointing agents at debt, tech debt that people haven't had the time to fix themselves. It's reviewed more than 500,000 PRs with human oversight and modernized more than 9,000 of their apps. Oscar Mullen leads technology and is aiming for 90% autonomous coding in a fully agent-driven PR loop by Q3 of this year. But the detail I love the most isn't just the number, it's that managers and VPs who haven't committed code in years are now shipping again. Claude Code is putting coding back in the hands of people who've spent the last decade in reviews and roadmap sessions instead of in their codebase. And we see this across the entire industry. Millions of developers are getting more product shipped at higher quality and at faster speeds. Now, let's see what this actually looks like in practice. To take you through it, please welcome the creator of Claude Code, Boris Cherny. [upbeat music] [audience applauding]
40:08 – 45:14
Live demo: async coding with desktop sessions, routines, and CI Autofix
1. SPSpeaker
  I just realized we're matching. [laughs] Thanks, Cat. Um, I'm gonna jump into the demo in a sec, but, um, first I just wanna say everything that we're showing still feels magical to me. Using Claude Code, I still get those same feelings that I got when I programmed my first calculator. And even inside Anthropic, in Slack all day we're trading screenshots of things that engineers are doing in the wild with Claude Code. It's, it's just so exciting to see. Um, and so today I'm excited to share some examples of Claude Code's magic with, uh, with all of you. For this demo, let's imagine that we're an engineer at Acme Pay. It's a payment infrastructure company. We're gonna start in the Claude Code desktop app, and we're gonna start by working on just one single task. And in this session, Claude is working on adding refunds to Acme's merchant dashboard. It's working on a full implementation: idempotency, so a duplicate webhook doesn't double refund a merchant; multi-currency handling across all the regions that Acme serves; and, uh, audit logging for the compliance team. It writes the implementation, and it's gonna verify its own work. Claude pulls up the merchant dashboard in the browser, triggers a refund, and the modal closes before the success toast appears. It's a real edge case. Claude sees the failure. It traces it back to a race condition in the optimistic update. It fixes it, and it verifies it actually works in the browser before calling the task done. Now let's zoom out. Claude Code isn't running just one session here. It's one of many sessions all running and being managed in parallel. In the Claude Desktop app, you can now see all your Claude Code sessions, which ones are running, which need your input, which have PRs that have already been merged and closed. 
Synchronous coding is now just a slice of what's happening at any given moment, and we think that going forward, more code is gonna be written in an async way. And this is why we keep talking about verification. If Claude can check its work, you can just let it run while you work on something else, and you can come back to a fully working result. A lot of my code these days is written by routines. I'm not the one doing the prompting. I'm the one that creates a routine that does the prompting. For the engineers in the room, think of it like a higher order function. Routines are a higher order prompt. For example, let's go back to the refund session we just looked at. A teammate filed a GitHub issue overnight, a routine watching the repo picked it up async, and then it kicked off the working Claude. With routines, developers can set up async automations and wake up to PRs that are ready to merge. Here's our routines view. Routines can be run on a schedule, they can get kicked off by webhooks, or even kicked off by arbitrary API calls, and you can run them locally on your machine or on remote Claude compute. Let's look at one more feature in desktop. This is CI Autofix. What it does is it watches the PR that the last session just opened, and its job is to babysit the PR to production. It's gonna autofix any comments from code review and security review, autofix CI, autorebase if there's any merge conflicts. And look at what just happened. CI flaked on a network timeout. The routine woke up, diagnosed it as a known infra issue, and retried the job. And a- actually in the Claude Code codebase, we, we tell it to fix the root cause instead, not just retry. And CI is green. The engineer that owns the PR is never gonna see the red X, and that work is off their plate. That's the shift. The default isn't, "I'm gonna prompt Claude Code," the default is now, "I'm gonna have Claude prompt Claude Code." 
Everything you just saw is available today, uh, including routines and all the latest updates to the Claude Desktop app. We're excited for you to try them out and let us know what you think. We
45:14 – 46:26
Closing: one story across layers—capability is here, speed to impact is the gap
1. SPSpeaker
  hope these features continue to let you close the gap between your ideas and shipping products. And that's really what every talk today was pointing at. Lisa's capability curve, Anjula and Caitlin's agents that run on infrastructure that you control, what Cat and I just showed you. These are three layers of one story. The capability's already here, and the remaining gap is how fast we put it to work. I encourage you to spend the rest of today exploring these layers. Research talks if you're evaluating the models, uh, Claude Platform sessions if you're building for your users, and Claude Code workshops if you wanna learn more ways to bring Claude into your day-to-day development workflow. Dive in, go deep, and start building with us. Thank you. [audience applauding] [upbeat music]

Episode duration: 46:26

Transcript of episode 6amLO7I9xdg