25 min read · 5,497 words
- SPSpeaker
[on-hold music]
- SPSpeaker
Hello, everyone. Uh, my name is Rebecca, and I'm on the Anthropic Go-To-Market team. It's wonderful to see so many developers here today at our first Code with Claude in London. Uh, we have a really special panel here today with tech leaders from monday.com, Doctolib, and Delivery Hero. Um, for those that don't know, all three companies were founded between twenty eleven and twenty thirteen, so before the rise of the LLM era. And today's conversation's really going to be about their pivot to becoming an AI-native enterprise. Um, with that, would love to pass it off to the panelists. Would love a quick intro, uh, name, role, company, um, and then what's the Claude-powered system that's now critical for your organization, and roughly how old is the codebase that it lives in? Ruslan, if you want to start.
- SPSpeaker
Hi. I'm, I'm one of the engineering leaders at monday.com, and, uh, we are the company which tries to reinvent everything about how we work and what we build for our customers. And Claude is actually powering us on this, uh, huge new mission of reinventing ourselves from a platform to manage work into a platform that helps execute work for our customers, uh, building teams of agents and, uh, tons of, uh, native AI capabilities. Um, the company is roughly, like, fourteen years old, so the codebase, uh, of a company that has been in startup mode all this time, you can imagine it's like, uh, it's a monolith, uh, a journey that takes a long time, uh, to break out of, and a lot of, like, uh, imperfections. Uh, so yeah, it's, it's quite a journey we're on, so...
- SPSpeaker
And hi, everyone. Uh, so at Doctolib, kind of we have two missions. Uh, first one is to make people healthier, and the second one is to improve the daily life of healthcare professionals. So for people, kind of we build, like, a full health companion where you can, uh, find and kind of book an appointment with a doctor, uh, patient messaging, manage your kind of personal health record. And then for healthcare practitioners, it's all the solutions that they need to do their job, from clinical, kind of financial, care cooperation, kind of manage the patients. Uh, in terms of usage of Claude, uh, we have pretty much now a hundred percent adoption of people kind of building with Claude. And building, it's not just about engineers, but it's everybody, kind of product managers, designers. And, uh, now with Claude Cowork, it's actually, like, a large proportion of the population outside of kind of tech and product that is also now beginning kind of to use Claude to do their daily work. And then in terms of, uh, kind of the codebase, uh, about half of it is, uh, a monolith, which was created, uh, now over a decade ago, kind of when we were founded. And then about half is also kind of a similar journey to what, uh, Ruslan was mentioning, uh, being built in the last couple years, uh, in terms of, uh, distributed systems and the kind of services.
- SPSpeaker
Yeah. Uh, so my name is Ulrich Schäfer. I'm, uh, VP for Tech Foundations at Delivery Hero. Delivery Hero is the world's leading local delivery network, so we're delivering food, groceries, and other things, uh, to our customers worldwide in over sixty markets. Um, what we're doing with Claude is obviously we're also rolling out Claude Code to our engineers. We're, um, using Claude also within our products. But what I want to talk about today is mostly how we build our autonomous software delivery system, uh, called HeroGen.
- SPSpeaker
Awesome. Um, Ulrich, I think with that, my next question is, none of the three of you really got to greenfield this. You all had an existing working product, a lot of customers, and a large functioning engineering org and, and many years of codebase, right? And so would love you to walk us through what you built with Claude and how it actually plugs into what was already there.
- SPSpeaker
Yeah. So, um, if we look at this autonomous software delivery agent, uh, what we wanted to achieve there-- and this morning we heard already in the keynote that when you, when you look at things like Claude, you need to build for the next model, not for the current one. We did exactly that last year. We looked at where the trajectory is going, and we decided to try to, uh, build basically an agent that takes, uh, a Jira ticket or a GitHub issue and takes it to, um, uh, production readiness in terms of a pull request that can then be merged, right? And that's, uh, what we built on, um, over the last, uh, uh, two quarters of last year, and we launched it in, in Q1, and it has gained huge traction across all of our, uh, engineers and subsidiaries in the group. So to just give you some numbers where we are right now, uh, so we're at around one hundred and seventy-three merged pull requests that go into production per day right now, uh, as an average over the last ten days. Um, and, uh, yeah, we have around seven thousand merged pull requests in total since the launch in, in February, and the trajectory is really exponential.
- SPSpeaker
Awesome. Alex, would love to hear a little bit about the experience at Doctolib as well.
- SPSpeaker
Sure. Uh, so a lot of our focus has been, like, how to get, uh, everybody kind of to start using Claude and-- but also kind of using it effectively. And, uh, one of the things that we did kind of very consciously was we said, "Okay, it's not gonna be only the teams that are, uh, working on the platform that will be working on kind of building, like, everything around Claude and everything around how we build with Claude." But we wanted to kind of really leverage kind of the expertise, the creativity, the innovation of, like, all the engineers, right? Because your best engineers are gonna be the ones who are gonna find the most interesting ways of kind of using it, of applying kind of the new, uh, technology. Uh, so what we said was the job of the platform teams is just as much about enabling, uh, and, uh, finding kind of what are those kind of best practices that people are using, and then helping, uh, accelerate those, right? So either remove bottlenecks or industrialize them and then scale them across kind of the, you know, the teams as a whole. Uh, what we ended up doing was we said, "Okay, we want everybody, like, as they're building kind of their own skills, uh, make them available to everybody else." Uh, we've ended up building a skills marketplace kind of where everything is kind of discoverable. You can see which skills get the most usage, uh, which ones are kind of trending. And then we also have a, um, like an environment that we provide for our developers that has, like, all the tools automatically connected as you start kind of working. And we packaged many of those skills kind of directly into that environment, right? So immediately, like, as you onboard, uh, at Doctolib, you get access to all of those skills. They're available to you. And then there's also kind of plug-ins for experimental skills as people are trying out new things. 
Uh, and, like, what really has worked for us as well is not just kind of doing this in silos, but, uh, kind of the most popular channel I think at the whole company right now is called Build with AI, where people are kind of sharing kind of everything that they're learning, asking questions, uh, promoting kind of the skill set they have. Uh, like it's, it's really kind of the liveliest conversation that we have.
- SPSpeaker
That's awesome. So it's not just individual adoption at the engineering level, but across the org there's some streamlining and broader organizational-
- SPSpeaker
Yeah
- SPSpeaker
... efforts.
- SPSpeaker
Yeah. The, the goal has been really to try to go down the learning curve together rather than everybody kind of doing things and, and doing amazing things, but by themselves.
- SPSpeaker
Awesome.
- SPSpeaker
Mm-hmm.
- SPSpeaker
Um, Ruslan, I know we were catching up a bit yesterday on monday's agent, but would love to hear more about the evolution and that journey.
- SPSpeaker
Yes. So monday we, we of course do everything, right? We invent, uh, reinvent how we work and what we, uh, do for our customers at the same time. I want to focus more on like product implementation and customer-facing products, uh, where we use Claude. Of course, we build, uh, platform agents on top of monday that help execute work and also invite external agents to be first-class citizens of the platform, which is, uh, creating a lot of challenges for like identity systems and, uh, permission models of course. But, uh, at the same time, uh, one of the biggest successful, uh, releases monday ever had was, uh, monday Vibe. It's a prompt-to-application, um, uh, tool that, uh, we opened to our users, and it's been growing, uh, exponentially in, in successful adoption with customers. Uh, so in short, it's basically, um, turning a simple prompt into a detailed PRD, uh, for the, uh, for the customer, to interpret their intention well and refine it well together and then, like, build a working application literally in, in minutes. Um, there is an interesting advantage, uh, that we got to boost this effort and literally do a POC for this like in a couple of days because, uh, monday invested early in like an open platform for external developers. And despite having an old codebase that has a lot of, uh, things coming with it, um, the open platform approach helped to literally contain it and, uh, literally let the Vibe coding tool do what external developers are doing using the same APIs, using the same SDKs and, uh, also like the deployment mechanism and publishing applications in exactly the same way. That really massively boosted, boosted the initial, uh, initial phase of it. And in later stages, of course, to unlock the full potential, we still deal with like as soon as you touch every feature of the platform, you want Vibe to, you know, interact with it, an application that you build to be integrated, and to build more and more complex applications. 
You need to make sure that every feature needs to be API open. Every feature needs to be, um, accessible, uh, to, to it properly, which is of course a much longer, uh, journey we are in. But, uh, that's kind of an interesting factor that, uh, we felt the early investment in that early open platform really paid off in this case.
- SPSpeaker
That's, that's awesome to hear. Um, I'd also be curious, I think as everyone in the room is probably aware, the model under all of this changes every few months, especially with the advancements in the industry. And so I'm curious to understand across the three orgs, um, when a new Claude model ships, what actually happens inside your organization that week? Maybe walk me through going from "Anthropic releases a new model" to "we've actually rolled it out into production."
- SPSpeaker
So, um, with, uh, with HeroGen, our autonomous software delivery agent, I think the, uh, key moment in time where a model change really, uh, introduced a step change was, uh, last November with the new, uh, Opus models, where our vision of this system actually working, [chuckles] um, became a reality. Because before that it was more of, uh, a fancy idea that we had, right? We, we just took a big bet that models are going to improve that much to be able to take whole features and just, uh, do all the work, uh, for the engineer. So this was a step change. Since then, with that particular system, we have, uh, um, stayed with, uh, Opus 4.5. Uh, we have not yet made bigger, uh, model changes mostly because, uh, we do not yet have the A/B testing setup and the necessary volume to make, um, good decisions, um, in terms of moving to, to a different model.
- SPSpeaker
Yeah. I think in our case, uh, kind of when we see kind of some of the new, uh, updates, uh, first of all, like it's the excitement of people going and saying, "Okay, what can I now do kind of with these models that I haven't been able to do before?" Right? Uh, because, uh, many times maybe if you're building your skills or if you're building your workflows, you were potentially kind of trying to compensate for some things. Uh, and uh, now you can go and say, "Hey, kind of do I still need to do this?" Uh, you know, the good thing is kind of if you already have an experimentation culture, like it's something that people are very much looking forward to, and they want to try that instead of having to have somebody go tell them, "Hey, uh, did you see this? Like, uh, you know, are you gonna do something about it?" So it's, uh, you get some of the natural excitement and say, "Hey, kind of, uh, what can I do now with it?" Uh, and I, I think that's, uh, that's what makes it, um, uh, kind of really a community kind of that's building like all of-- for us, all of those skills together and kind of sharing those lessons.
- SPSpeaker
And, and do you guys have a team internally at Doctolib focused on those evals of new models?
- SPSpeaker
Yeah. Uh, so when it comes to kind of the products, uh, that are AI kind of based kind of products, uh, that, uh, kind of we offer to our customers, uh, we have kind of teams that have very strict kind of evals and kind of every time kind of there's a new release, uh, we go and check, okay, like what's the, you know, like what's the performance of that? Uh, what's the trade-off between like all the different kind of variables that kind of we look at? I think when it comes to kind of more of the kind of development process, it's still like a little bit more, uh, vibes, uh, for me [chuckles] , uh, than, uh, kind of potentially it could be. Uh, where I think like as, uh, uh, as kind of the models continue to improve, it'd be kind of very interesting to see kind of how can we have kind of better verification, uh, to give us more confidence and even kind of to go faster kind of with some of the existing kind of models as things come out.
- SPSpeaker
Awesome. Ruslan, yeah, would love to hear from you, especially as you guys think about monday agents and customer-facing agents.
- SPSpeaker
Uh, yeah. So again, continuing the story of monday Vibe, uh, Vibe as an app building tool, I think, uh, it actually works as a multi-model system, right? So there's an orchestrator, for example, that uses an Opus model, and then there is like a workflow underneath, of course, that has deterministic actions and like, uh, simpler models doing different things, uh, executing different sub-actions. Um, so I think the release of a model usually impacts only one part of it. So it's like actually the end-to-end evaluation is the key here. Uh, but of course, each of them runs like, uh, their own like evals in, in specific atomic actions. So there, there was a story, for example, when we, um, migrated from Opus 4.5 to 4.6. I think that was, uh, quite a change because, yes, it brings all these amazing capabilities, but at the same time, all the system prompts we've been, uh, optimizing so far just didn't transfer well, right? It was a completely different beast, and it was, um-- it required us like literally to go and rethink and fine-tune prompt techniques, uh, for this new orchestrator to work, uh, and maximize its value for us. Uh, we actually worked-- uh, our team worked heavily with solution engineers from Anthropic that actually really helped to go like into depth and like really understand the best practices there. So I think since then, uh, this is the practice, uh, we follow for like major model releases. But a lot of smaller changes, of course, also still go through the whole end-to-end testing and then still all the good A/B testing in production and, and all these other things that normal products go through. Um, but that's kind of, uh, some of the examples I can share.
- SPSpeaker
Awesome. So it seems like there's an internal kind of evaluation phase with your own engineers and maybe, uh, in tandem with the Anthropic folks, and then there's A/B testing for the customers and, and end users.
- SPSpeaker
Yeah. It's, it's quite a journey, right? So to release such a major release for a model change is quite a journey. You cannot assume it's just compatible, uh, with the old one. It's literally treating it as a completely different thing, um, and, um, harnessing it and like in a, in a way that works for it, right? I think that's definitely the approach we take now.
- SPSpeaker
Awesome. Um, I'm also curious from the three of you, like what is the architectural or platform decision you would make differently if you were starting this journey today?
- SPSpeaker
Um, yeah. When it comes to the, uh, architecture, what we tried to do with the system was that we wanted to integrate it as well as possible into the existing software delivery ecosystem that we have in place at Delivery Hero, which is, by the way, also quite fragmented. Um, so we wanted to connect our agent with all the different systems out there like Jira, like, uh, GitHub Issues, uh, soon also, uh, GitLab, um, as a first touchpoint, integration point with, uh, the person assigning the, the task to the agent, right? So we, we tried to not change the whole interface to something else like a chat window or, or something similar. We tried to stay with the, the, the current environment that people have to drive that kind of adoption, where they just assign these tickets to the agent, right? One thing we also, of course, added was integrations into our continuous integration system. So we're running these tests with the agent. We're feeding back, uh, any problems that are, uh, that are being found during the tests to the agent to fix them, right? We also have the issue with flaky CI that, that, uh, the agent is now fixing itself. Um, other integration points that we did with our ecosystem are, for example, with our security team. So we, uh, want to build a way for security vulnerabilities that are code related to be automatically assigned to the agent, automatically fixed, so that the, uh, repository owners just have to look at the pull requests and say, "Okay, that's fine. I take it," right? Another thing from an architectural perspective, and that was what-- was a thing that really drove the success rate, and success rate is, uh, in our case, we define it as the ratio between the pull requests that are actually accepted into code, so merged, and the ones that are actively rejected by a software engineer. Uh, so one thing that drove that success rate up to eighty-five percent was, uh, what we call a council of agents. 
This is a set of different models that is looking at the same code and reviews it. The, uh, reason why we choose several different models for this is mostly that we want to avoid that a model has some sort of blind spot or some sort of bias and then doesn't detect, um, the, the issues with the code it itself generated, right? And, uh, so this was a, a pretty meaningful change, um, and actually did not drive up the cost as much as we thought. So it's very, very doable. I can just recommend that to anyone [chuckles] to, to try that out.
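A minimal sketch of the two ideas described in this turn, under stated assumptions: the "council of agents" is modeled as several independent reviewers looking at the same diff, and the success rate is read as merged pull requests divided by merged plus actively rejected ones (the speaker calls it a "ratio", so this reading is an assumption). All names here (`Reviewer`, `council_review`, `success_rate`, and the two stub reviewers) are hypothetical and do not come from HeroGen; in a real system each reviewer would call a different model instead of a string check.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical reviewer signature: takes a diff, returns the issues it found.
Reviewer = Callable[[str], list[str]]


@dataclass
class CouncilVerdict:
    approved: bool
    issues: dict[str, list[str]]  # reviewer name -> issues that reviewer raised


def council_review(diff: str, reviewers: dict[str, Reviewer]) -> CouncilVerdict:
    """Run every reviewer on the same diff; approve only if none raises an issue."""
    raised = {name: review(diff) for name, review in reviewers.items()}
    raised = {name: found for name, found in raised.items() if found}
    return CouncilVerdict(approved=not raised, issues=raised)


def success_rate(merged: int, rejected: int) -> float:
    """Assumed definition: merged / (merged + actively rejected)."""
    decided = merged + rejected
    if decided == 0:
        raise ValueError("no merged or rejected pull requests yet")
    return merged / decided


# Stub reviewers standing in for different models (illustrative only).
def reviewer_a(diff: str) -> list[str]:
    return ["leftover TODO"] if "TODO" in diff else []


def reviewer_b(diff: str) -> list[str]:
    return ["debug print left in code"] if "print(" in diff else []


verdict = council_review(
    "+ total = price * qty  # TODO: handle tax",
    {"reviewer_a": reviewer_a, "reviewer_b": reviewer_b},
)
print(verdict.approved, sorted(verdict.issues))
print(success_rate(merged=85, rejected=15))
```

The point of using several different reviewers is the one made above: an issue that one reviewer misses can still be caught by another, so a single reviewer's blind spot does not decide the outcome.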
- SPSpeaker
Awesome.
- SPSpeaker
In, in, in our case, kind of when it comes to the overall kind of product architecture, I, I mentioned kind of we're moving kind of from a more monolithic system to distributed. Uh, about half of our PRs are still going in the monolith, half are outside. And you can see a notable difference between how easy it is kind of to adopt, uh, like all the tooling, like outside of the monolith, where you have kind of smaller code bases or where you have kind of better kind of well-defined kind of patterns versus inside, right? And, uh, I think one of the kind of choices that we made about everything that kind of we're, we've been building outside has been to be very opinionated. Uh, so it means that there's a very standard way of how we build, uh, like all the services, all the applications that we do outside. That makes it a lot easier, right? And then when we go back inside the monolith, like you actually have to be able to provide a lot more context about, "Hey, here's the right way of doing things," or maybe this was the old way, right? Because over time, uh, you may have had multiple patterns and, uh, you know, kind of the model is very good at figuring out, "Hey, how have things been done before?" So you have to spend like an extra effort to be able to tell it, "Okay, this is the new way that you wanna do it. Don't just kind of follow the pattern, uh, that you see in the code base." So I think that's kind of one big learning that like, at least kind of for, for now, uh, having kind of smaller code bases that have kind of more standardization, uh, and kind of more documentation kind of built into them, uh, makes a kind of big difference in, uh, in the model kind of performance. Second one is, uh, I think kind of many of the things that, uh, you know, you kind of were okay kind of with before when it came to automation, uh, are now becoming much costlier, right? 
So when, you know, the limiting factor was, well, how long is it gonna take us to write the code, uh, you didn't care if, uh, you know, some things, you know, required a little bit more human interaction. Now that, uh, everything can be-- when it comes to coding, can be done kind of by agents, uh, you begin to constantly kind of find all of those new bottlenecks. And I think that's been like one of our kind of lessons as well, that, uh, to be able now to go faster, you need to go back and kind of rethink all of like the, you know, the many different touch points. Sometimes it's the different kind of processes, not just architecture that you've built in, and kind of question everything, uh, because like the, you know, what has worked for you before with a very different kind of distribution of, like, what people were working on, like how, like which tasks took the longest time, uh, it's not working, uh, now. So everything has to be rethought.
Episode duration: 29:10
Transcript of episode XFaeIbL-lvE
