
Code with Claude 2026: Opening Keynote

Get the latest updates from Anthropic's engineering and product leaders at the Code with Claude 2026 opening keynote in San Francisco.

May 6, 2026 · 47m · Watch on YouTube ↗

EVERY SPOKEN WORD

  1. SP

    [upbeat music]

  2. SP

    Please welcome to the stage Chief Product Officer of Anthropic, Ami Vora.

  3. SP

    [upbeat music]

  4. SP

    Good morning, everyone. It is such a pleasure to see you all. Thank you for joining us. When I think about why I'm here today, I go back to the first time I wrote a computer program and it worked. I didn't grow up coding. I grew up in the foothills of the Appalachians. I never built my own computer. I didn't even play video games. The first time I actually tried to build anything complicated was in my college computer science classes. Now, this was so long ago, we had to wait in line to log directly into the servers because they were the only thing powerful enough to handle our ray tracers. This might be familiar to some of you, the hum of servers, the smell of old pizza, coffee, and that very specific aroma of a windowless basement computer lab. [laughs] I can still remember that feeling of hitting compile and waiting to see if my program worked, that feeling of joy, discovery, a little relief, and the excitement that I had made something that had never existed in the world before. That's what hooked me, and that's why I'm here today. But so much has changed. What I could only get by waiting in line at a college computer science lab, that's available to anyone, any day of the week, anywhere in the world. No line, no weird smells, [laughs] and no barriers, just that same feeling of excitement, joy, and relief. And I know a lot of you feel the same way. People say to me all the time, "I feel like Claude c- has given me superpowers." It's one of my favorite things to hear. And we're seeing people use those powers. For instance, Scott McVicker runs developer infra at Stripe. One of his teams had fifty thousand lines of Scala that needed to become Java before they could upgrade their JDK. The initial estimate was ten engineering weeks. They used Claude and finished in four days. And sometimes speed isn't just about efficiency. It's about what's waiting on the other side. Felicia Kakru is the co-founder and CEO of Binti. 
Her software runs the systems that caseworkers use to place kids in foster care, the paperwork, the home visits, the licensing process. This year, her team used the Claude API to give caseworkers back hours they used to spend on paperwork. That took twenty days off the process of licensing a foster family. Twenty days. It's not just an efficiency metric. That's a kid connecting with a family. And that excitement, joy, relief, that feeling of discovery, it's something I hear from everyone. I'm gonna guess, though, that everyone here experiences it differently. Some of you are living on the frontier every day. Some of you are bringing along the people around you, and some of you came here because, like me, you can feel the ground shifting under us, and you want a view of what's to come. Trust me, I feel all [laughs] of those things often in the same morning. I come to work with a plan, and then I have to tear it up by lunchtime because something new has happened. That sound familiar? And, you know, that makes sense when we step back and look at how fast these models are getting better. At Anthropic, we talk a lot about the exponential, and I think that's what we're all feeling right now. Remember a couple years ago, the frontier of model development was something so good it could write a decent email. And [chuckles] we were pretty happy about that. A year ago, we were standing on this stage, Opus 4 was the headline, and the idea that an agent could run for an hour without a human checking in felt like a stretch goal. But then six months ago, agents were running end-to-end overnight, so we'd wake up to finished work. And then last month, Mythos read the entire OpenBSD source tree and found a 27-year-old vulnerability that had survived every human reviewer, every fuzzer, every static analyzer thrown at it for almost three decades. The jumps keep getting bigger, and the intervals keep getting shorter.
But even though model capabilities are improving on an exponential, most organizations are still adopting AI on a linear path. And that means there's a gap between what AI can do and what it's actually doing for people. Closing that gap, translating model capability into something real people use to solve their problems, that's what developers do. That's what you all are doing. And we're seeing it happen. Year over year, API volume is up nearly 17X on the Claude platform. And on Claude Code, the average developer is now spending 20 hours per week running Claude. Now, like you, we've been shipping a lot lately, and we want you to walk away from today with a clear picture of where we're headed, so you can plan for it and ride the exponential with us. Let me say up front, we don't have a new model to unveil. Today is about how we're making our products work better for you, so you can close the gap for the rest of the world. And this morning, we'll show you what that looks like. First, Diane will talk about our foundation, the model layer. She'll share more about our frontier models and what's coming. On the Claude platform, we're shipping updates to Claude managed agents, Outcomes, Dreaming, multi-agent orchestration. And Angela and Caitlin will walk you through how the platform handles the infrastructure, so you don't have to. And on Claude Code, Cat and Boris will walk you through how you can use new primitives like routines to let Claude Code prompt itself even when you're away from your computer. But all this comes back to you and what you're going to build. 'Cause most people will never call the Claude API. They'll never open a terminal and type Claude. They'll experience AI through something one of you built on the Claude platform, whether that's a designer exploring new directions with Canva or a lawyer getting a brief out the door faster with Legora or a developer using any one of the world's best coding agents. So thank you. 
You all shape what AI feels like for everyone else. We'd never be able to build everything [laughs] that, um, that people need to solve their problems. That's something only you all can do. And one way we want to show our gratitude is by sharing a little exciting news. As of today, we are increasing rate limits for developers on Claude Code and the Claude platform to help you keep building and closing that gap for the world. More specifically, we are doubling Claude Code's five-hour rate limits for Pro, Max, Team, and seat-based enterprise plans. And we're raising our API limits considerably for Claude Opus. We're making this possible by expanding our compute partnerships. We're partnering with SpaceX to use all the capacity of their Colossus One data center. And we're investing this directly into individual developers and small teams. Over time, we'll continue to explore every way to help you get the best out of Claude, including our existing compute efforts and even bolder bets. So thanks for being here today. Thanks for partnering with us to shape what AI looks like for the world. Thanks for giving people superpowers. Up next, Diane, who leads our research PM team. Thank you.

  5. SP

    [cheering] [upbeat music]

  6. SP

Congratulations. Thanks, Ami. I'm Diane, and I joined Anthropic back in 2023, and I've been a part of every model since Claude 2. For those of you who are counting, that's bringing 18 versions of Claude across Haiku, Sonnet, Opus, and now Mythos to users and developers like you. We wrestled with making Opus 3 great at adhering to JSON and also making it the best at writing long-form code. With Sonnet-3.5u, or as we all now finally know it, Sonnet-3.6, we taught Claude to use a computer safely. And with Sonnet-3.7, which had a tendency to be slightly too overeager, we figured out the right way to expose that to users and developers, so you could get the most out of Claude. This time last year, with Claude 4, we figured out how to expose thinking dials in a way that worked well and addressed test-time compute. And we haven't slowed down. In the last 12 months, we shipped eight frontier models to developers and users, each one building upon the last, allowing you to write better code and the products you build to go further than ever before. The model layer underpins everything else you'll hear about today, and that's the bottom line. As model intelligence increases, your starting line moves forward, and you can do more than ever before. We talk about the exponential a lot at Anthropic. You heard a little bit from Ami as well. For me, the exponential means that as model intelligence increases, the use cases you can build and deliver to your users increase exponentially. For example, agentic coding is far more impactful than code autocomplete. And in this way, new products and new experiences create new markets and grow the pie for everyone. In research, we don't think about the exponential as just SWE-bench numbers going up. It's also about creating and tracking capabilities that previously didn't exist until we designed and created them.
Tool use, computer use, thinking that adapts to the problem, agentic loops that hold a plan over hundreds or thousands of steps, and long context windows that teach Claude knowledge it previously didn't have. And these capabilities don't just stop at code. Today, Claude can generate and iterate on visual designs, analyze and create complex work deliverables, and navigate business domains you might be a part of in open-ended, ambiguous situations. That's because the model intelligence, the core foundation, has gotten smart enough and strong enough to support all of this. When you're building on Claude, you're building on the model family that created these capabilities first and has had the most time to make them reliable. Let me make that concrete with our latest model, Opus 4.7. AMP, the coding agent, moved their entire smart mode onto Opus 4.7 because it scored the highest on their benchmarks, and they were able to simplify their tooling and change their scaffold because the model no longer needed the help. Rakuten ran it on their benchmarks and resolved three times the number of production engineering tasks they previously could. And finally, Intuit saw Opus 4.7 identify its own logical faults during the planning stage, figure out what was wrong, backtrack, and resolve it, ultimately leading to faster and cleaner execution. The day after we launched Opus 4.7, we launched Claude Design by Anthropic Labs, one of my favorite launches this year. Already, people are building production interfaces with a combination of Claude Design and Claude Code. This is because Opus 4.7 has a real taste for visual design, the right nuances to show while adhering to your design principles. We also hear from everyday users that people like to use Claude because it understands a full assignment and can figure out when to push back and question assumptions.
At the same time, as each of us already knows having built on these systems, the models are unfinished and they're works in progress. They can still be stumped sometimes by very basic things and also lose the thread when you introduce a lot of context. That's what makes this exciting, and thanks for being on this journey with us. Here's a little about what we're working on and what's ahead. First, higher judgment and better-quality code taste. This means versions of Claude that you can trust with complex autonomous engineering work. Second, context windows that feel infinite when combined with high-quality memory, so it feels like you can do a long-running task while getting better results. And finally, multi-agent coordination, powering teams of agents and instances of Claude that collaborate on big goals far too big for any single instance. The way I think about progress in model intelligence is task horizon, which is a measure of how long a version of Claude, or a model, can work autonomously while improving on its deliverables and the quality of its work. This time last year, models could work for minutes. Now, most of you and I probably have agents that are running for hours on end. And tomorrow, we'll have agents that are proactive, always on, and know what to work on without losing the thread. So what do we as developers make of all of this? The exponential will keep improving, and you need to build for emerging capabilities, not just for today's versions of Claude. This is because new models will be far more capable than the ones we have access to today. It used to be that we had to build scaffolding to prop every version of Claude up, and now scaffolding is there to actually amplify model intelligence. You used to have to design complex iterative loops, give it the right tools, figure out how to do retries, and now all of that can be folded into the right thinking and the right execution right within the model.

  7. SP

You are already seeing where this can go. Opus preview Mythos is the next point on that exponential, and it's not a small step. Therefore, how we all work with models like Claude needs to change. Here are some things we think about at Anthropic. First, you need to design for the next version of Claude, not just the current one. We've seen countless times that the developers who win are the ones who optimize their architectures to absorb the next intelligence jump, not just today's incremental accuracy. This means maintaining and creating harder evals, and building ambitious prototypes that you don't think will work today, because that's how you'll notice when the exponential is improving and moving under you. When something that previously didn't work all of a sudden starts passing, that's a sign that you probably have something magical to give to your users that didn't work before. And here's what the teams that are getting the most out of Claude have figured out. Model upgrades are a business opportunity. The teams that are getting the most out of Claude models are the ones who make upgrades cheap. This is automated evals, simple scaffolding, and ambitious prototypes and uses of capabilities that others haven't imagined yet. We believe that the exponential on slide one is gonna keep looking like that. As model intelligence increases, you as developers have the chance to get a head start: to experiment with building new use cases, creating exciting new products for your users, creating new designs and markets, and ultimately growing the pie. Everything Caitlin and Angela are about to show you will give you the tooling to make all of this possible and come to life. Thank you so much for being here. [upbeat music] So model capabilities are on the exponential, but businesses are still operating on the linear. And so as a business, it's never been more important to make sure that you're really able to harness the power of that exponential.
But what's stopping businesses from really snapping to that? Well, it really boils down to two key problems. The first one is getting the right outcomes. Getting the right outcomes is still too difficult, right? You have to be able to do a lot of prompt optimization, tool construction, harness engineering. There's still a lot of work that it takes to really steer the model to exactly where you need it to go.

  8. SP

    That's right. And the second problem is you wanna ship fast, but you wanna ship scalably at the same time. Everyone in the tech space is moving insanely fast right now, and you've got to keep up, but to win, you need quality too. It's really easy to ship prototypes, but it's really hard to scale in production. So we've built the Claude platform to give you everything you need to get those great outcomes and to ship with speed and scale at the same time. The platform comes with API primitives that are tuned to Claude models. It gives you infrastructure to build and scale agentic systems, and it gives you controls to operate those systems.

  9. SP

    That's right. So if we go back to our problem statements that we hear all the time from different businesses, one of the most common ones that Caitlin and I hear all the time is how much businesses need high intelligence, but of course, at lower cost. So one of the ways that we're solving this is with the advisor strategy. This is really easy to implement. All you have to do is update your tools array inside the messages API. And what we're doing is that we're actually providing an agent architecture that splits execution from advising. So in execution, you can pick a smaller model, it'll be a little cheaper, but then that small model, when it needs advice on what to do next, it can actually reach out to a bigger model for help.
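The advisor strategy described above could be sketched roughly as follows. The talk only says you "update your tools array inside the messages API"; the actual schema isn't shown, so the `advisor` tool type, its field names, and the model IDs below are all assumptions for illustration, not documented API.

```python
# Hypothetical sketch of the advisor strategy: a small executor model
# does the work, with a larger model registered as an on-demand advisor.
# The "advisor" tool type and its fields are assumed, not a real schema.

def build_advisor_request(executor_model: str, advisor_model: str,
                          user_prompt: str) -> dict:
    """Build a Messages-API-style payload splitting execution from advising."""
    return {
        "model": executor_model,  # smaller, cheaper model executes
        "max_tokens": 1024,
        "tools": [
            {
                "type": "advisor",       # assumed tool type
                "model": advisor_model,  # bigger model consulted when stuck
            }
        ],
        "messages": [{"role": "user", "content": user_prompt}],
    }

request = build_advisor_request(
    executor_model="claude-sonnet-4-5",
    advisor_model="claude-opus-4-5",
    user_prompt="Refactor this module, and explain the plan first.",
)
print(request["tools"][0]["model"])
```

The point of the shape is that the executor only pays for the big model on the turns where it actually asks for advice.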

  10. SP

    Yeah. So in practice, this means you could use a Haiku or a Sonnet class model to execute and Opus as an advisor. And when we ran this with Sonnet executing and Opus advising, we saw Sonnet perform way better than Sonnet alone. But more importantly, we saw Sonnet perform even more cheaply than Sonnet alone because Opus was able to advise it to get its work done better. A great example of this was eVE Legal. eVE Legal used the advisor strategy, and they told us they got frontier model quality at five times lower cost.

  11. SP

Yeah, and that's awesome. We love stuff like this because you can go ahead and use this in things like a freemium model, right? When you offer these kinds of freemium experiences to your users, you have to be conscious of the costs that you're going to incur, but obviously, you wanna make sure you have a good experience for them. It's also really great for areas where you have an extremely high volume of workloads, and of course, you'd have to be a little conscious of your ROI. So that's all great, but what about speed and scale? Those two things that Caitlin mentioned to be really difficult to achieve at the same time. Well, most recently, we introduced Claude Managed Agents. Claude Managed Agents is an agentic harness that's paired with production-grade infrastructure. Teams are able to go from prototype to production in literally a matter of days. The teams that we've worked with have been able to ship literally ten times faster with Managed Agents. Another great thing with Managed Agents that we love is that it bundles in a lot of the best practices out of the box. So for example, when you build an agent, one of the best practices that you wanna make sure that you do is, of course, give it memory. That way, the agent persists user preferences, and it gets closer to what you want it to do every single session.

  12. SP

    It's a little difficult to build memory, and so this is an example of a best practice that we actually just bundle in out of the box, and it's automatically tuned for Claude. And we wanna make sure that everyone hears this. When we do give you memory, that memory is ultimately yours, so you can take that and port it wherever you'd like.

  13. SP

    So one of our favorite examples of someone who built on Managed Agents was Notion. Notion wanted to build for speed and scale at the same time, so they chose to build on Managed Agents, and they built the ability for you to fire off Claude agents directly within their product experience for long-running, complex, autonomous tasks.

  14. SP

Yeah, and we love that feature. That's super cool. All right. Well, today we are upgrading, uh, Claude Managed Agents with three really powerful features. We're going to be introducing multi-agent orchestration so that you can actually create fleets of agents to solve really complex tasks. We're gonna introduce Outcomes, which allow you to specify exactly what success looks like, and then Claude will literally just iterate until it gets it done. And we're going to introduce Dreaming, and this one we're really excited about. With Dreaming, Claude is actually able to self-learn. It's able to inspect its previous sessions, figure out skills that it missed, lessons it should have learned, and apply those directly to memory on its own. But instead of just talking about all of these with you, we're actually gonna show you live what these things look like. So Caitlin, let's do that.

  15. SP

    Let's do it.

  16. SP

So Caitlin and I have been inspired by some of the announcements earlier today. There's greater API rate limits for Opus, and, you know, we've been hanging out with a particular space company most recently. So we've been inspired to create a little startup of our own, a fictional one, to be clear, called Lumara. And with Lumara, we've decided, why don't we build agentic software that helps us autonomously land drones on the moon? And we really care about speed and scale at the same time, so obviously, we are going to build this on Claude Managed Agents.

  17. SP

Exactly. So let's say we line up our first customer, and this hypothetical first customer wants to land drones on the moon to mine for hypothetical materials. Um, and this is a big, ambitious job, and despite all of our dreams, we're not actually aerospace engineers, so we're gonna need really awesome agents to get this work done for us. So we're gonna integrate all three of the new features that we just talked about. Um, and we did this for our first customer, and I'm gonna show you using the Claude API CLI how we actually set that up. So first things first, this is a big job, so we actually want multiple agents to help us get this work done. Um, so I'm gonna go ahead and show you the agents that we actually have set up for our customer. The first one, we have a commander agent, and really, the commander agent's job is to make sure that this whole mission goes well. Then we've got a detector agent, and the detector agent's job is to make sure that we're actually finding the sites to land on that will have high-quality mining materials. And then we've got our navigator agent, and the navigator is making sure that we're landing our drones safely and flying them to their destination. So I'm gonna go ahead and actually set up our commander to be a coordinator of the other two agents.

  18. SP

    And when this is running, what's actually happening is that the commander is spinning up a session, and then for each of these sub-agents, they have their own independent threads so that they have independent context windows. This is a very intentional design, and we found that by doing this all together and then merging in all the results, we get better performance.
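The pattern being described, a commander that fans work out to sub-agents with independent contexts and then merges their results, can be sketched in miniature. This is a minimal illustration of the orchestration shape only; the agent names mirror the demo, and a plain list stands in for each sub-agent's context window. It is not the Managed Agents API.

```python
# Sketch of commander/sub-agent orchestration: each sub-agent gets its
# own independent context (a list here, a context window in the talk),
# and the commander merges all results at the end.

def run_agent(name: str, task: str) -> dict:
    context = []  # independent context per sub-agent, not shared
    context.append(f"{name} received: {task}")
    return {"agent": name, "result": f"{name} completed '{task}'",
            "context": context}

def commander(mission: str) -> list:
    subtasks = {
        "detector": f"find high-quality mining sites for {mission}",
        "navigator": f"plan a safe descent for {mission}",
    }
    # In the real system each of these runs in its own thread/session;
    # here we just run them in turn and merge the results.
    return [run_agent(name, task) for name, task in subtasks.items()]

results = commander("lunar drone landing")
for r in results:
    print(r["agent"], "->", r["result"])
```

The design point is the isolation: because sub-agents don't share a context, each can go deep on its subtask, and only the merged summaries reach the commander.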

  19. SP

Exactly. So that's multi-agent. Let's go ahead and integrate Outcomes. So the way Outcomes work is we wanna make sure that our customer, which has very specific criteria for what they want to accomplish, can define that criteria, and then we can provision a grader agent that actually makes sure we get that outcome. And so Outcomes actually just start with a pretty simple markdown file. So here you can see a markdown file, again, really, really simple, and it just kind of outlines the criteria that show us whether a run is successful. So we want our drones to touch down softly, we want them to land on clear ground, um, and pretty importantly actually, we wanna have enough fuel in reserve so that we can get our drones safely back to Earth. Um, so in order to actually set out this rubric for our outcomes, I'm gonna go ahead and send an event, um, to our session that, uh, defines this rubric as our outcomes.

  20. SP

    Yeah, and when this is running, like Caitlin mentioned, we actually create a separate grader, and this grader agent is evaluating across the session as to whether or not in each run we've actually met the rubric that was specified. Now, of course, it could one-shot this, but more likely than not, it probably needs to iterate over a couple sessions to do it. And as Caitlin has highlighted here, you can specify the max number of iterations you want to allow this to do.
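The grader-plus-iteration loop just described can be sketched as follows. The rubric items come straight from the demo; everything else (the grading function, the loop, the toy session) is an assumed illustration of the shape, since the real grader is itself a model and Outcomes' configuration isn't shown.

```python
# Sketch of the Outcomes loop: a separate grader checks each run against
# the rubric, and the system iterates until it passes or hits the cap.

RUBRIC = [
    "drone touches down softly",
    "drone lands on clear ground",
    "enough fuel in reserve to return to Earth",
]

def grade(run_report: dict) -> bool:
    # Grader stand-in: every rubric item must be satisfied in this run.
    return all(run_report.get(item, False) for item in RUBRIC)

def iterate_until_outcome(run_session, max_iterations: int = 5):
    """Re-run the session until the grader passes it, up to max_iterations."""
    for attempt in range(1, max_iterations + 1):
        report = run_session(attempt)
        if grade(report):
            return attempt, report
    return None, report

# Toy session that only satisfies the full rubric on its third attempt.
def fake_session(attempt: int) -> dict:
    return {item: attempt >= 3 for item in RUBRIC}

attempt, report = iterate_until_outcome(fake_session)
print("passed on attempt", attempt)
```

Note the `max_iterations` cap mirrors the knob mentioned in the demo: a run that can't satisfy the rubric stops rather than looping forever.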

  21. SP

So we've integrated multi-agent, we've integrated Outcomes, and it's time to test. Our customer gave us some data on six hypothetical sites that they might wanna land their drones on for us to run some simulation sessions and see what happens. So I'm gonna pop over into Lumara's dashboard where you can see that I ran, um, a simulation against these six sites.

  22. SP

Yeah, and this is pretty good. I mean, this is a one-shot with our whole system. It's got the multi-agent architecture, it's got the Outcomes feature integrated, and you can see here that it solved four out of six sites correctly. But clearly it could've done a bit better on sites three and four, and like any two, you know, good founders, we obviously want to hill climb on this system. And normally, hill climbing is a pretty difficult process. You have to put a lot of work in to go and do that, but we're gonna show you how we are hill climbing on this with just Dreaming.

  23. SP

Yeah, so we ran the simulation yesterday. We weren't quite happy with our results, and we came into the Claude developer console into our Dreaming interface. And you can see I can actually just hit this little button that says Dream, choose a memory store where a Dreaming agent can go and look over all of those past simulation sessions and write its learnings to memory, so that all of our new sessions can actually reference those learnings in memory to do a better job. So I did this last night, um, and this is our dream that ran. Um, and you can see that we wrote a bunch of stuff to memory, which is awesome. Um, and crucially and most importantly, the agent actually chose to write a descent playbook, so that all of our additional sessions that we run going forward will have this playbook to reference, um, including all sorts of heuristics from the different missions that it ran previously. This is a really robust playbook of a whole bunch of information that we can go ahead and use. So this ran overnight last night, um, and I popped back into Lumara's dashboard this morning and ran a new simulation now that our system is upgraded.
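The Dreaming step above amounts to: review past sessions, distill lessons from the failures, and write them to a memory store that future sessions read first. A toy sketch of that loop, where a simple heuristic stands in for the model that actually does the analysis, and the session records are invented for illustration:

```python
# Toy sketch of a "Dreaming" pass: scan past session records, record a
# lesson for each failure, and summarize them into a playbook in memory.
# In the real feature a model performs this review; here it's a heuristic.

def dream(past_sessions: list, memory: dict) -> dict:
    """Review past sessions and write lessons from failed runs to memory."""
    for session in past_sessions:
        if not session["succeeded"]:
            lesson = f"Site {session['site']}: avoid '{session['failure']}'"
            memory.setdefault("lessons", []).append(lesson)
    # The demo's agent also wrote a descent playbook summarizing heuristics;
    # here that's just the deduplicated, sorted lesson list.
    memory["descent_playbook"] = sorted(set(memory.get("lessons", [])))
    return memory

memory = {}
sessions = [
    {"site": 3, "succeeded": False, "failure": "late braking burn"},
    {"site": 4, "succeeded": False, "failure": "landing on boulders"},
    {"site": 5, "succeeded": True, "failure": None},
]
memory = dream(sessions, memory)
print(len(memory["lessons"]), "lessons written")
```

New sessions would then load `memory["descent_playbook"]` into their context before planning, which is why the re-run in the demo improved on the previously failing sites without regressing the passing ones.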

  24. SP

And that's awesome. Right? We were able to actually hill climb and not regress on any of the sites that we cared about, and the two sites that could have improved actually improved. And to hill climb on this, all we had to do was have Caitlin press a button in the console called Dream. All right. Awesome. Let's wrap up. Everything that we showed you here today live in this demo is available on the Claude platform for you all to build on. Multi-agent orchestration, Outcomes, and Dreaming are making the Claude Managed Agents primitive so much more powerful, so that you can use it to construct deep, powerful agentic systems that scale. Whether you're landing drones autonomously on the moon or you're building the next big business, Claude Managed Agents is here to help you construct the agentic systems that you need and solve the problems that you're looking to solve. Now, we would love to pass it on to Cat and Boris, who will show you how Claude Code is making it even more fun to build as a developer.

  25. SP

[upbeat music] Thank you. Angela and Caitlin just showed you how the Claude platform closes the gap between what models can do and the agents businesses ship. We have a related challenge on Claude Code. We also want to close the gap between model capabilities and what every developer can actually do with them. First, I just want to thank all the developers here in this room. Thank you for trusting Claude Code on your production databases back when Sonnet 3.7 was our frontier model [laughs]

  26. SP

    [laughs]

  27. SP

And when our product was a bit rough around the edges. Your support is what makes our team so excited to come in every day and make Claude Code even better. Let's start with why Claude Code exists. Software development is being reinvented in real time. The mission of Claude Code is to close this gap between the great ideas that you all have and shipping a product to the market. And the way that we enable this is we build tools that elicit the frontier intelligence from our models, and we make them accessible to every builder. And we don't think of ourselves as having a finished roadmap for you. We think of ourselves more like mountaineers, kind of like climbing alongside you into terrain that none of us has explored before, learning together what works as we go. And we're growing with you. We're growing with increasing AI capabilities, and we're navigating these new challenges together. I still remember a year ago when I would give Claude Code a task, and I would carefully review every single edit it was trying to make, every single permission prompt, giving it really detailed feedback on what I liked and what I didn't, and just, like, holding its hand every step of the way until the result was good. I remember some of these tasks would be like one hundred, two hundred permission prompts until I got the final result. And now most of you are running in auto mode. You're delegating permissions to Claude, um, and you're checking in after Claude has done a lot of its work and has a PR for you to review. Over the last year, we've expanded the number of ways that you can use Claude. We started with the terminal, then we launched the IDE, and now we have desktop. We started with the CLI. This is still the interface for power users who want a minimal text interface, who want all the latest customizations and the most control.
Then we added IDE because a lot of you actually want the same powerful agents, but you want to follow along with all the code changes that it's making. And then based on all of your feedback for something a bit more visual, we knew where we had to go next. We launched our newest surface, Claude Code on Desktop. It's a surface that's designed for people who want this, like, full-screen graphical interface, built-in preview so you can watch as Claude develops your app, a sidebar control plane for all your agents, [clears throat] and the ability to render images and rich outputs. We've built Desktop to be a control plane not only for your local sessions, but for your remote ones as well, with visual indicators for which agents are stuck and which ones are ready to go. The IDE and the desktop app are built on the Claude Agent SDK, the same SDK that many of you are already building on. Many enterprises have adopted Claude Code tools wall to wall. At Anthropic, this has driven a two hundred percent increase in the number of PRs per engineer while keeping the same code quality bar even as our engineering team has scaled substantially. Together with you all, we're discovering and redefining the future of what software engineering looks like. We're embracing these new challenges by embracing automations powered by Claude to overcome each. I'm going to walk through a few of them right now. Here's the feedback that we heard from our users and what we've built with the help of this community. We heard from you that you want to spend less time on code review. So we shipped code review that deploys a team of agents to catch critical bugs on your behalf. Thousands of companies use this every day, including all internal Anthropic teams. We heard from you that you really want to code on the go. So we launched remote control, and we added Claude Code to the iOS and Android Claude apps so that you can fire off a task from anywhere.
You're no longer walking around with an open laptop, [laughs] balancing it, trying not to fall, and you're no longer stuck at your desk. You can now go to a park, touch grass, and still code. [laughs] We heard that you're spending a lot of time babysitting PRs, fixing flaky CI tests, addressing code review comments, resolving all the merge conflicts, so we added Autofix. It listens to all these events and proactively puts up fixes so that your PRs are always green. We heard from you that you're kicking off Claude Code tasks on new tickets and new customer bug reports, so we built routines. A routine lets you configure once, listen for webhooks, API events, or run on a schedule, and it'll kick off Claude Code automatically for you. So instead of you having to manually kick things off, Claude will handle it. And last, we heard from you that you're launching so many features that your security teams are having a hard time keeping up, [laughs] so we built Claude Security. It scans your whole code base overnight, and it can kick off Claude Code to address the vulnerabilities that it finds. All of these primitives compose together, and that helps all of us adapt to the future of what engineering looks like. Everything I've covered is something that you can pick up today. It's been especially exciting to see how a range of companies have taken these tools and adopted them at the scale of entire orgs. First, I wanted to share about Shopify. They power e-commerce for millions of merchants worldwide, and they've embedded AI across the entire engineering org and changed their culture. They're using Claude Code across the company, both on engineering teams and on non-engineering teams: design, product, data science. They're building it directly into their platform and standing up tools at scale. Andrew McNamara is the director of applied AI at Shopify, and in his words, the speed is just crazy. 
Claude Code has completely transformed how they build internal tools. Another example is Mercado Libre. They're Latin America's most popular e-commerce platform, serving over one hundred million buyers. Their org is twenty-three thousand engineers, and everyone runs on Claude Code. When that happens across an org, the work itself changes shape. Engineers are pointing agents at tech debt that no one has touched in a long time or had time for. Claude Code has reviewed more than five hundred thousand PRs with human oversight and modernized more than nine thousand apps. Oscar Mullen, who leads technology, is aiming for ninety percent autonomous coding and a fully agent-driven PR loop by Q3 of this year, and we hear this from many others across the industry. The detail I love the most here actually isn't the number. It's that a lot of the managers and VPs we talk to are getting their hands dirty in the code base again. Claude Code is putting coding back in the hands of people who've spent the last decade on roadmaps and reviews, and now they're back building. We see this across the industry. Millions of developers are getting more product shipped at a higher quality than before. Now, let's see what this actually looks like in practice. To take you through it, please welcome the head of Claude Code, Boris Cherny. [upbeat music]

  28. SP

    Thanks, Cat. Can we do a quick selfie?

  29. SP

    [laughs]

  30. SP

    Phew. All right. Before I jump into this demo, I just wanna mention something. Everything that we're showing today still feels magical to me, and I work on Claude Code every day. Even at Anthropic, we share screenshots back and forth of the cool things that people are building with Claude and things that people are doing in the wild, and honestly, I just feel excited to be on this journey together, discovering all of this. So today I'm excited to share a few more examples of what this looks like. Unfortunately, we can't all work in the lunar drone business. So for this demo, let's imagine that we're an engineer at Acme Pay, a payments infrastructure company. We're gonna start the Claude desktop app, and we're gonna start by working on a single task. In this session, Claude's working on adding refunds to Acme's merchant dashboard. It's building a full implementation: idempotency, so a duplicate webhook doesn't double-refund a merchant; multi-currency handling across all the regions Acme serves; and audit logging for the compliance team. It writes the implementation, and it's gonna verify its own work. Claude pulls up the merchant dashboard, it triggers a refund, and, ah, there's no success toast. That's a real edge case. Claude sees the failure. It traces it back to a race condition in the optimistic update. It fixes it. And it's gonna verify that it actually works in a browser before it calls the task done. Now, let's zoom out. This session wasn't running alone. It's actually one of many sessions that were all running, and being managed, in parallel. In the Claude desktop app, you can now see all your Claude Code sessions: which ones are running, which ones need your input, and which have PRs that have already been merged and closed. Synchronous coding is now just a slice of what's happening at any given moment, and we think that going forward, a lot more code is gonna start to be written in an async way. 
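The idempotency point is worth making concrete. A common pattern for it (this is an illustrative sketch, not Acme's actual code; the function, field names, and in-memory store are hypothetical) is to record each webhook's delivery ID and refuse to act on one you've already seen:

```python
# Sketch of webhook idempotency: dedupe deliveries by ID so a retried
# webhook never issues a second refund. All names here are hypothetical.

processed = set()  # in production this would be a durable store, not a set


def handle_refund_webhook(event):
    key = event["delivery_id"]  # unique per webhook delivery attempt
    if key in processed:
        return "skipped"        # duplicate delivery: do nothing
    processed.add(key)
    # issue_refund(event["charge_id"], event["amount"])  # the real side effect
    return "refunded"


# A retried delivery carries the same delivery_id, so the second call is a no-op.
print(handle_refund_webhook({"delivery_id": "wh_1", "charge_id": "ch_9", "amount": 500}))  # refunded
print(handle_refund_webhook({"delivery_id": "wh_1", "charge_id": "ch_9", "amount": 500}))  # skipped
```

The same idea appears in real payment APIs as an idempotency key sent with each request.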
And this is why we keep talking about verification. If Claude can check its work, you can just let it run while you work on something else, and you come back to a fully working result. For me personally, a lot of my code nowadays is written by routines. I'm not the one doing the prompting. I'm the one creating a routine that does the prompting. For the engineers in the room, think of it like a higher-order function: routines are a higher-order prompt. For example, the refund session that we just looked at. A teammate filed a GitHub issue overnight, a routine watching the repo picked it up async, and then kicked off the work in Claude. With routines, developers can set up async automations and wake up to PRs that are ready to merge. Here's our routines view. Routines can be run on a schedule, they can be kicked off by webhooks, or they can even be kicked off by arbitrary API calls. You can run them locally on your machine or on remote cloud compute. Let's look at one more feature. This is the CI Autofix that Cat talked about earlier, and what it's doing is watching the PR the prior session just opened. Its job is to babysit the PR to get it all the way to production. It's gonna autofix any comments from code review and security review, it's gonna autofix CI, and it's gonna autorebase if there are merge conflicts. And look at what just happened. CI flaked on a network timeout, the routine woke up, it diagnosed it as a known infra issue, it retried the job, and now it's green. And actually, in the Claude Code code base, we have it not just retry; we have it fix the root cause every time. The engineer who owns the PR is never gonna see a red X, and that work is off their plate. And that's the shift. The default isn't, "I'm gonna prompt Claude Code." The default is now, "I will have Claude prompt Claude Code." Everything you just saw is available today, including routines and the latest updates to the Claude desktop app. 
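The "higher-order prompt" analogy can be sketched in a few lines. Just as a higher-order function takes a function and returns a new one, a routine takes a prompt template and turns incoming events (webhooks, schedules) into concrete prompts. This is purely illustrative and not the actual routines API; `make_routine` and the event shape are made up for the example:

```python
# Illustrative sketch of "routines are a higher-order prompt".
# make_routine and the event fields are hypothetical, not a real API.

def make_routine(prompt_template):
    """Higher-order: takes a prompt template and returns a handler that
    renders a concrete prompt for each incoming event."""
    def handle(event):
        # A real routine would kick off a Claude Code session here;
        # this sketch just renders the prompt that would be sent.
        return prompt_template.format(**event)
    return handle


# Configure once...
fix_issue = make_routine("Fix GitHub issue #{number}: {title}")

# ...then events (e.g. a webhook firing overnight) do the prompting for you.
print(fix_issue({"number": 482, "title": "Refund toast missing"}))
# -> Fix GitHub issue #482: Refund toast missing
```

You write the template once; after that, the events do the prompting, which is the shift from "I prompt Claude Code" to "events prompt Claude Code on my behalf."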
We're excited for you to try it out and let us know what you think. We hope these features continue to close the gap between your ideas and shipping products. And that's really what every talk today was pointing toward: Diane's capability curve, Anshu and Caitlin's agents that grade and improve themselves, and what Cat and I just showed you. These are three layers of one story. The capability is already here. The only gap left is how fast we put it to work. I encourage you to spend the rest of today exploring these layers: research talks if you're evaluating the models, Claude Platform sessions if you're building for your users, or Claude Code workshops if you want to learn more ways to bring Claude into your day-to-day development work. Dive in, go deep, and start building with us. Thank you. [audience applauding] [upbeat music]

Episode duration: 47:29

Transcript of episode GMIWm5y90xA
