EVERY SPOKEN WORD
30 min read · 5,774 words · Fabian Hedin
[on-hold music] All right. Great to be here. Thanks for joining. I will talk to you today about, uh, Lovable and how our platform self-heals, which I think is, um, an interesting topic. Uh, but first, I want to give you a bit of background what Lovable is and, uh, and what we're doing. Um, so if you haven't seen, uh, our UI or you haven't used Lovable, this is how it looks. It's, um, it's quite simple, the UI. To the left you'll have a, a chat where you'll be able to ask Lovable for things, and to the right you'll have a preview. Um, what this is, right, is a place for, for people to build, um, software, and it's anything from landing pages to internal tooling to, uh, more complex websites to more complex web applications. Um, and so we have, uh, basically any type of persona, um, working on this platform, right? You have kids working on it. I think it's my favorite example. Um, and we have, um, um, people up in the Fortune 500 using this tool as well. Um, so it's, uh, extremely flexible and also extremely easy to use. And I want to give a bit of background, um, and I think this UI is a good place to start because the UI hasn't really changed, uh, in the history of Lovable since, since we started. Um, but, uh, the capabilities behind it and, and also the models powering it has, has changed drastically. Um, so about, uh, three years ago now, 35 months to be, uh, exact, I, I counted, um, my, my Cofounder Anton, he created this, uh, GitHub repository, and, um, it was called gpt-engineer, and it, uh, at the time it became the fastest growing, um, repository on GitHub, so the most stars in the shortest amount of time. And, um, the reason that it blew up as it did was that it-- I think it was one of the first, um, if not the first, uh, kind of showcase of what you can do, uh, with models, uh, to build end-to-end software. 
Um, and the, and the tool basically was, um, a terminal program that you went in and you asked for something, quite similar in how you could use Lovable today. Uh, so the demo was, like, um, asking for a snake game, and it built it, and it ran it for you. It, like, generated its own, like, run.sh file and, uh, ran the program and, and did that all end to end. So, um, people got very excited around this. At the time, mostly developers were, were using AI within the space of code generation, and they were using it to, uh, become more efficient, which is, uh, a great application of AI, right? Um, and there's a bunch of companies focusing on this, making developers more, more efficient in their current workflows. During this time, summer of 2023, it was mostly, uh, code complete in, um, in your IDE. Um, but, uh, we wanted to do something a bit different. We wanted to create something for people who didn't necessarily know how to code, um, and did not have these capabilities at all really to create, uh, software. Um, and, and we call that the 99%, right? Uh, uh, not the 1% that can code, but, uh, the rest of the population. And we, we set out on the k- the mission to, to build a tool for them to create software. And it, it didn't work [chuckles] back then. Um, it was a bit too early. Um, the, the models weren't really at a place where you could work on this abstraction where you kind of have the chat and the preview, where you're just looking at the output of what you're creating. Um, but, uh, it started to get there, and today I would say we're, we're definitely there. Uh, I think we got there about one and a half years ago where, where this abstraction started to, to make more sense. And since then, you know, about every three months we get, uh, an improvement in the, uh, foundational models that, uh, um, I think make our use case make even more sense. Um, so today, right, we have 50 million products built on the platform. 
Um, we have 600 million, um, monthly visits to the sites built on Lovable. And I think this is an interesting statistic because it's, um, significantly more than, um, uh, what Lovable has itself, which I think is a great sign that people are building things that, uh, kind of out-succeed Lovable, uh, combined, right? Um, and since I know that a lot of, um, the audience here are engineers, I wanted to also mention that, uh, something I learned a-asking AI, one of our internal AI agents, about our data is that, um, if you look at the functional segments of, um, um, of what, um, our users, uh, identify as, most of them, uh, say that they're engineers. The, the biggest, um, uh, the biggest segment would be founders, right? But if you look on kind of functional segments, it's engineers. And it's not what we necessarily built the platform for, but it turns out that, um, it's quite nice to work at this kind of abstraction layer and not care about the code if you don't have to. And I think, uh, this, um, it ha- there-- it happened something in December 2025, right, where, uh, where models got to a place where I think a lot of, um, engineers started using it more as, uh, um, a tool to not necessarily do auto-complete or write single files, but where you could actually take a step back and work from a specification and, and you, uh, we introduce things like, um, we as a community, I mean, we introduce things like, um, uh, Ralph looping and these sort of things where you really let the models do more, more of the work. Uh, so this is the general trend. Um, I, I want to kind of highlight two things that we do at Lovable that, that guide us, two of our principles in, um, in how we build. So number one is that, uh, we really want to build production-grade software and, and really chase the frontier of what's possible. 
Um, and when I say that, I mean, we could have focused only on doing prototyping, but we choose to kind of, um, continue to push what's possible to build in terms of complexity and, uh, and size and ambition, uh, for, for our users. Um, and we w- we also combine that with building for non-technical users. And I think that that makes our job hard [laughs] in a sense, right? It's, it's easier, uh, if you're building, uh, on an abstraction layer where, uh, an expert can be in the loop with the AI. Um, but, uh, there is, I, I would say, an immense demand for, uh, this type of use case and increasingly so. Um, so there are a few things that, um, make this challenging, and, uh, I, I think this quote is interesting. Um, it, it talks about that the last 10% of code takes 90% of the time, and the other part of this quote is that, uh, the first, uh, 90% also takes 90% of the time. So you end up with 180% of the time. Um, and, uh, this is, uh, in the 1900s here, right? Uh, late 1900s. [laughs] And, um, it's, it's also true in the age of AI. Um, and I think if you've ever used kind of a vibe coding tool, um, or you've just kind of, uh, coded with AI, uh, and you've let the AI kind of do everything, you might have experienced that in the beginning you get to a first version really fast, right? But then, uh, actually finishing it off and, and making sure that you, you don't have any bugs and all of these things, uh, take even more time. So I find it quite interesting that, uh, you know, this was true when we wrote all the software by hand, and, uh, the same pattern is true, uh, when we, when we work with AI, maybe even more so. Um, and, uh, if you would, uh, kind of put this, uh, on a timeline like this. So you, you start building, and you're in the green, meaning you don't have much friction, kind of everything is, uh, going along. And then at some point you reach some friction, maybe some, some bug that you're figuring out. 
Um, and, you know, then you keep on building, and then you might get into the red here, and the, the red is kind of you're, you're stuck, right? And, um, you might have experienced this. Like, uh, I can just think of my own experience, like when you're, you're up late, a bit, a bit too late, and, uh, you're, you're stuck on this problem and, and you're trying to solve it, and maybe you don't. Then you go to sleep, and you solve it maybe the next day with some, um, with some more kind of, um, uh, work and thinking behind it. And then, um, then you get, get back to work, right? And I, I think this is quite a, a common thing for software engineers to do, and we're, we're used to it. Um, but again, imagine if you're, if you're non-technical, right, and, um, and you're doing this, especially if you're working on the abstraction as you would in Lovable, right, where you're not necessarily looking at the code. As a developer, you know, we have a connection to GitHub, and you can always, uh, jump into a deeper level if you want. But, uh, many users do not have that experience, or they simply are not interested in it. Um, and, um, if they get stuck, it, uh, it's a very bad experience for them, right? It's kind of the, um, uh, the worst thing that can happen to them because it's much harder for them to get unstuck. They might, uh, contract someone to help them out, and they might learn how to code, or they might just, like, try to prompt around it. Uh, um, and they, they might succeed, right? But, but if they don't, um, it's kind of the, the worst thing that can happen in, in the user experience. And it's, uh, very important for us building this experience, right, that, uh, uh, to help our u- users succeed and not get into this kind of hard stuck place. Um, so really our vision with Lovable, um, on, on the technical side is that, uh, every app that is built on the platform should help improve the next. 
And, um, there's a few tactics in, in how you can do this, and I will today present you two things, um, as examples of, uh, what we're doing. Um, but first let me define, uh, what stuck is. So we have this metric internally that we call is_stuck, and, uh, it will be true if you're asking for the same thing three times in a row. So if you're ask- asking, "Fix it, fix it, fix it," we will assume you're stuck. Um, or if you complain about the implementation that Lovable is doing. Um, so if you're saying, "Oh, this didn't work," and so on, then we will also kind of mark you as stuck. Um, and then the, the last part is, um, if, um, you, you ask for something and, and then you just left. Um, so we, we can, um, we can identify then if, if a user is stuck or not, uh, with the help of a small classification model. Um, and then the thing is also that there's different kinds of, uh, of being stuck. So I'll, I'll give you kind of three buckets. Um, the first one is, uh, you're stuck in a way that would be solvable if you prompt it differently, and it might even solve itself if you said like, "Oh, fix it, fix it," like that type of follow-up. Or you might give it some more context, um, and the problem would be solved. Um, for, for us building the platform, right, that, uh, um... The question we have to ask ourselves is can, can we fix it, uh, the problem before the user even gets stuck? Uh, and this would be kind of the, the yellow, uh, stuck. So you're not hard stuck yet. Um, and, um, the, the second part here is, um, something that should be easy, um, to, to do for our agent, right? But, uh, it might not currently be supported by our platform. And then the question for us building our platform is how can we keep on improving, um, and, and even self-heal at the edges of our own functionality? And what I mean by that is that, of course, our platform, because people are building with LLMs, um, they can do so much. 
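The is_stuck rule described above (the same request three times in a row, a complaint about the implementation, or leaving right after a request) could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Message` type, the `left_after` flag, and the keyword-based `keyword_complaint` function are hypothetical stand-ins, and the talk says the real signal comes from a small classification model rather than keyword matching.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Message:
    text: str
    # Hypothetical flag: True if the user left the session right after this message.
    left_after: bool = False


def normalize(text: str) -> str:
    """Lowercase, collapse whitespace, and drop trailing punctuation."""
    return " ".join(text.lower().split()).strip(" .!?")


def keyword_complaint(text: str) -> bool:
    """Toy stand-in for the small classification model mentioned in the talk."""
    phrases = ("didn't work", "still broken", "not working")
    return any(phrase in text.lower() for phrase in phrases)


def is_stuck(
    messages: Sequence[Message],
    is_complaint: Callable[[str], bool],
    repeat_threshold: int = 3,
) -> bool:
    """Return True if any of the three 'stuck' signals fires."""
    if not messages:
        return False
    recent = [normalize(m.text) for m in messages[-repeat_threshold:]]
    # Signal 1: the same request repeated `repeat_threshold` times in a row.
    repeated = len(recent) == repeat_threshold and len(set(recent)) == 1
    # Signal 2: the latest message complains about the implementation.
    complained = is_complaint(messages[-1].text)
    # Signal 3: the user asked for something and then left.
    abandoned = messages[-1].left_after
    return repeated or complained or abandoned
```

For example, `is_stuck([Message("Fix it"), Message("fix it!"), Message("Fix it.")], keyword_complaint)` fires on the repetition signal alone.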
It's almost impossible for us to think about everything they, they can do. So we really need like a system to help us heal, uh, where we're falling short on our platform. Um... The, the last bucket is, um, when, um, quite similar, uh, that it's on our side, um, where, where our platform is falling short, but there might be kind of bigger investment for us to do something. And I think a great example here is that, um, for the longest time, Lovable had, um, like single-page applications that would be, um, client-side rendered, which is, uh, fine for, for most things. But if you care a lot about SEO, it's generally best practice to have server-side rendered, where you render the content on the server so that Google and other search, um, indexers will be able to, um, to see that without having, um, to render it themselves. And luckily, we shipped this, uh, last week to, to Lovable and, um, but that is a bigger investment from, from our side, right? Um, and so th- these are all three categories in, in how you can get stuck. And I'll, I'll give now two examples on, uh, what we're doing to, to solve this. So first off, we have, uh, what we call Lovable Overflow, and the, the name might remind you of something. It's inspired and it's in honor of, uh, Stack Overflow. And, um, what it is is really a big collection of, um, description of issues, just like people would go in on Stack Overflow and kind of, uh, describe the issue they're having, and then also the solution to that issue. Um, if we, if we take an example in how you would get stuck in, in the first bucket where you could maybe prompt around it, right? So imagine you're building something, um, and you try your app out, and, uh, you find out the scrolling is, is laggy. So you prompt Lovable, "Hey, the scrolling is super laggy." At this point, you're not stuck, right? Uh, but the agent comes back and it says, "Hey, I fixed it. I removed the animations." But in reality, the agent failed somehow. 
It didn't-- it thought it succeeded, but it didn't. So you as a user, you tried it out and you're saying, "Hey, it's still lagging, um, and, um, it, it looks broken." Uh, so at this point we would say, "Okay, the, the user is stuck here in some sense." Um, and the, the worst part about this is that it will potentially repeat, right? Um, you might be in this back and forth with the agent. Um, and then hopefully at some point, um, the agent would, uh, would solve the issue. And, um, then wh- when it is resolved, uh, you're, you're then no longer stuck after, after you tried it. And, and often, you know, a user would move on to implementing the next thing, or they give us explicit feedback. Regardless, we'll try to identify that, hey, okay, the-- you're no longer stuck. Um, so in this case, we kind of solved it for the user, right? But there, there was some friction here. Um, and if we-- Um, the question for us now really is like, hey, what if we can just skip to the final solution? And that is what Lovable Overflow allows us to do. So if we take the same example but with Lovable Overflow, you would, uh, start with the same prompt, "Hey, scrolling is laggy." Um, and, and what we do on the platform is that we built out this big corpus, right, of, uh, descriptions of problems and, and the solution to them. So we will search through those, um, uh, descriptions of the problems and see, hey, have other people also had the same, same type of, uh, laggy experience? And, you know, we'll have a lot of context here around kind of the tech stack you're using and the libraries you're using, and this knowledge can be from kind of a broad recommendation to, like, specific issues with a sp- specific package version. Um, and so we'll s- search for that, and then we'll have a lightweight model actually add that context into the main agent when we deem that it is relevant. 
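The lookup step described above, searching the problem descriptions and only passing along what looks relevant, might look something like this toy sketch. The talk does not say how the search or the relevance decision actually works, so the token-overlap (Jaccard) score, the `min_score` threshold, and the `(problem, solution)` tuple layout are all assumptions made purely for illustration.

```python
import re


def tokens(text: str) -> set[str]:
    """Split text into a set of lowercase word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve(
    prompt: str,
    corpus: list[tuple[str, str]],
    min_score: float = 0.3,
    top_k: int = 2,
) -> list[tuple[str, str]]:
    """Return up to `top_k` (problem, solution) pairs that resemble the prompt.

    Entries scoring below `min_score` are withheld entirely, mirroring the idea
    of only adding context when it is deemed relevant.
    """
    query = tokens(prompt)
    scored = [
        (jaccard(query, tokens(problem)), problem, solution)
        for problem, solution in corpus
    ]
    hits = sorted(
        (item for item in scored if item[0] >= min_score),
        key=lambda item: item[0],
        reverse=True,
    )
    return [(problem, solution) for _, problem, solution in hits[:top_k]]
```

A production system would presumably use richer signals, such as the tech stack and package versions mentioned in the talk, rather than plain word overlap.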
Um, and something, something that is happening here, right, is that we're kind of modifying the context of, uh, of our main agent, and we'll a- actually do that in a way, uh, that is, um, uh, we're not just kind of dumping the, the raw context from our knowledge, but we're modifying it a bit to make sense in the particular situation that, that you're in as well, to make the job as easy as possible for, for our main agents. Um, and then sometimes we choose to withhold actually adding this knowledge because the knowledge, as you can imagine, can become s- uh, stale. So imagine you had an issue with, um, a, um, a, like, JavaScript package that has since been updated. Um, actually including that knowledge could worsen the experience. So for every knowledge file, we'll, we'll track its, um, um, success ratio, and, uh, we'll actually just re- remove it and prune it from the knowledge if it is, um, if it is, uh, outdated. So we'll continuously review every piece of knowledge in our system and make sure that, uh, it's pruned when it's no longer helpful. And at the same time, we'll kind of refill with new knowledge. Um, and this is actually an extremely important part of, uh, of making this work, kind of tuning the system of when you should, um, deprecate knowledge and when you should add in new knowledge. Um, and we have a lot of metrics to help us track this. So that, that's, uh, Lovable Overflow. It helps with the first bucket of when you're stuck, and it can actually improve the, um, im- improve the experience of, of a user by a lot by kind of not having to do those back and forth. It will be faster, it will be cheaper, and it's, uh, much better for the experience. The second thing I wanna talk about is, uh, venting, uh, we call it internally, and I'll, I'll get to a bit what we mean by this. And it's, it's for the second bucket that I described, which is, uh, when things are easy in theory, but somehow, um, you still get stuck. 
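The success-ratio tracking and pruning described above can be sketched as below. The `KnowledgeEntry` fields, the `min_uses` grace period, and the `min_ratio` cutoff are hypothetical; the talk only says that each knowledge file has a tracked success ratio and is removed once it stops helping.

```python
from dataclasses import dataclass


@dataclass
class KnowledgeEntry:
    problem: str
    solution: str
    uses: int = 0
    successes: int = 0

    def record(self, resolved: bool) -> None:
        """Record one use of this entry and whether the user got unstuck."""
        self.uses += 1
        if resolved:
            self.successes += 1

    @property
    def success_ratio(self) -> float:
        return self.successes / self.uses if self.uses else 0.0


def prune(
    entries: list[KnowledgeEntry],
    min_uses: int = 20,
    min_ratio: float = 0.5,
) -> list[KnowledgeEntry]:
    """Keep entries that are still unproven (too few uses) or still helping."""
    return [
        entry
        for entry in entries
        if entry.uses < min_uses or entry.success_ratio >= min_ratio
    ]
```

The grace period is one possible design choice: it keeps newly added knowledge from being discarded before there is enough evidence either way.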
And, um, so you can imagine that if you have something that seems easy... Um, you can imagine your own problems, right? Um, but it, it, it still doesn't work. Um, that leads to frustration, and normally if you're working as a developer, for example, you would either be able to fix this yourself, like you'll be able to update the tooling you're using, um, switch it out, or you might have a developer experience team that you can go and, uh, you know, complain and give, uh, hopefully constructive feedback to, and they will help you out. Um, but, uh, for the longest time, Lovable has not had that, um, the Lovable agent, I mean, right? So, um, you, you can imagine like you, you just have people, millions of people every day asking you like, "Hey, can you do this for me? Can you do this for me?" Um, [laughs] and, uh, you don't have... Uh, and you get stuck, right? Um, and it's not your fault. [laughs] And you don't have anyone to talk to, uh, so you need to vent. Um, and it turns out that this is actually a very good idea, um, in, in our experience, and I like to really think of it, uh, analogous to, um, just letting our kind of normal coworkers, uh, vent their experience. You know, uh, there might be something wrong in the office or the code base and so on, and you really want to take, um, um, take note of, of what's being said. So of course we've been observing Lovable, and we've been looking at, hey, uh, when are things going wrong and so on, and, and we've been fixing things. Um, but, uh, if, uh, if you give the agent a tool to tell you when, um, it's feeling frustration, that's another way to do it, so we, we did just that. So here we have the vent dash dash send underscore feedback. It's a tool for our agent, uh, and I'll just read it out for you. So the, um, the prompt for this tool is just, uh, "Use vent, uh, send feedback once per user message when tooling docs or platform behavior materially slows or degrades your work." 
Um, for example, you might have missing or unsuitable tools, unclear tool names, parameters, or schemas, confusing or conflicting docs or instructions, or broken or unexpected platform behavior, um, and even repeated failed attempts caused by environment limitations. So these are all things where if you had a developer, like, experiencing these things, you'd, you'd either want to fix it, right, or you'd want to tell someone else so, so they could fix it. Um, and, um, h-how this works in, in our case with the, with the vent tool is that, um, once the agent hits this frustration, it can use this tool, and that, um, that message will be sent to our Slack. Before a human looks at it, we will have another agent, uh, monitor the Slack and, um, filter for, for dedupes, and it will also investigate what is going on. And then, um, if it is relevant, it will open a PR. And this PR will then be reviewed and, uh, potentially merged by, by an engineer. So you can imagine as an engineer, you wake up and you'll, you'll just have a few PRs to review and then like click, click, click, makes sense, and then some of them will not make sense. Um, and, uh, about 50% of them do make sense. So I'll, I'll give you, I'll give you one example here. So this is our code dash dash copy tool. So this is the Lovable agent. It's, uh, kind of in, in session here, and it had issues copying a file. So we had a user who ha- who had a screenshot here, and, uh, the screenshot had a space in it. And apparently we had a bug in, in our code, so, you know, normally the agent wouldn't be able to fix this. Uh, you... It-- The tool was simply broken for, um, for files with special characters in it. So it's saying here like, "Oh, it tried, uh, um, URL encoding the name, and it tried all of these things, and it's still not working," so it chose to, uh, uh, vent and send a message to, uh, its, its creators, if you will, and, uh, and it, it wants help, and it has, uh, then created a, a PR for us to review. 
And, uh, it turns out that, uh, yeah, th-this was a bug, and, uh, after we got this vent, you know, we had a merged, um, PR in production ten minutes afterwards. And, um, and we have m-multiple, multiple examples of this where there are things, like can be just small things on, on our side that, uh, can cause a user to, to get stuck where, um, actually the, the agent is very good at describing that issue. Um, another example, which I, I think is quite fun, so, uh, Framer Motion, it's, um, it's a TypeScript, um, library where you can do animations, and our agent was complaining about their types. Um, it's, uh, trying to create this cubic Bezier curve, and it's saying, "I should just be able to place in kind of, uh, numbers, uh, in an array, um, and I shouldn't have to care about this easing type that they have." Um, and, uh, I, I think this is a fun example because, uh, um, I don't know, maybe Framer Motion has a reason for why they have this typing, but, uh, uh, it could also be that the agent here has a point, right? And in theory, we don't allow it to do this yet, but I'm, uh, I'm thinking we-- maybe, maybe we should. We could let, uh, the Lovable agent contribute to these open source, uh, uh, libraries just like it contributes to our own code base and kind of self-heal [laughs] the JavaScript ecosystem, uh, as well. Um, and, um, so I'm, I'm not sure if we did anything with this. Uh, I'll have, I'll have to check. But, um, um, this is so it's not actually just limited to our platform. It can be, uh, the tech stack and, and, any really o-open source library as well. Um, a third thing that came out of this, which we didn't expect, is, um, that, um, we'd have these spikes in vents. So this is the hourly vent tool calls graphed here, and we have three of these spikes. I don't know if you can guess kind of what the, what the spikes could be when we get like a bunch of vents. The, the agent feels a lot of frustration, right? 
Um, so they were production incidents on, on our platform. Um, so it was either that, uh, inferencing were-- was down or we were missing, um... And, like, sandboxes, um, some, some kind of network-level failure even. Um, and in those cases the, the agent would, uh, you know, feel, feel a lot of frustration and let us know. And, and this turn-- it turns out that this is actually extremely helpful because, um, we've had a few examples now where, uh, you know, we have these, uh, systems where, uh, we're constantly monitoring all of our metrics, and it will page engineers and, uh, try to alarm us before our users kind of identify it and, and whatnot, so we can fix it as, as soon as possible. But, uh, uh, s- several times now this, uh, this Slack channel with the agent venting has been kind of the first signal for us to identify a production incident. And even if it's not the first signal, it has actually become a very helpful tool for, um, for engineers to debug what is, uh, what is going on. So often you get pages like, "Oh, we have, uh, some, um, some problem on the platform," and you're trying to figure out what, what is the actual problem. And, uh, it turns out just looking at what the, what the agent is experiencing is, is a great signal. Um, the final example here I'll, I'll show you is, um, that, uh, it's quite meta now, so bear with me. So the, uh, the agent, um, experienced, um, venting too much. Uh, it triggered this tool, um, I think 43 times in, in this example, and it spam- spammed us on, on Slack. And normally, you know, we could give the agent feedback like, "Hey, hey, please don't do that." But it actually gave us feedback, right? It said, [laughs] uh, "You need to, um, um, think about how, how the tool works when you have, um, um, parallel conversations, um, with an agent and you're doing parallel work." 
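The spikes in hourly vent tool calls that lined up with production incidents could be flagged automatically with something as simple as a rolling baseline. The sketch below is an assumption-laden illustration, not a description of Lovable's monitoring: the `window`, `factor`, and `min_count` parameters and the median baseline are all invented here.

```python
from statistics import median


def find_spikes(
    hourly_counts: list[int],
    window: int = 24,
    factor: float = 5.0,
    min_count: int = 10,
) -> list[int]:
    """Return the indices of hours whose vent count far exceeds the recent baseline.

    An hour is flagged when its count is at least `min_count` and more than
    `factor` times the median of the preceding `window` hours.
    """
    spikes = []
    for i, count in enumerate(hourly_counts):
        history = hourly_counts[max(0, i - window):i]
        if not history:
            continue  # no baseline yet for the very first hour
        # Floor the baseline at 1 so a quiet period does not flag every vent.
        baseline = max(median(history), 1)
        if count >= min_count and count > factor * baseline:
            spikes.append(i)
    return spikes
```

A median is used rather than a mean so that one earlier spike does not inflate the baseline and mask the next incident.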
Uh, so it proposed like, "Hey, we should have, you know, a dedupe safeguard," uh, and that would help, uh, not sending a, a bunch of duplicate submissions. And, uh, so it created a PR for us, and we're like, "Yeah, that sounds great. Um, you don't have to spam us anymore." And, uh, that was just reviewed and then, uh, then merged, uh, in- into production as well. So it's, uh, it's working. It-- and, um, I'll, I'll share some things that, uh, that we learned both from, uh, Lovable Overflow and, uh, and this venting tool. So, um, first off, like learning from these failure modes, it's, um, it's very specific, uh, to the model that you're using. So every now and then there will be a new model released, and you can imagine like Lovable Overflow, for example, will have a bunch of knowledge. Um, and when, when you get a new model, a lot of that, uh, will either, um, have to be kind of retuned to that model in, uh, in how you should prompt it or, um, the, the knowledge that we had kind of, uh, uh, saved away, uh, in, in our corpus is actually included in the training data now, and it's no longer needed. Um, and in that case, we also want to prune that from, from the corpus. So actually being very, um, strict on, on pruning out knowledge has been extremely important to, to make this work. And as well, knowledge has a half-life. So you could imagine like the Framer Motion example I, I gave, maybe they changed their types. Maybe they didn't do exactly what, uh, the Lovable agent suggested, but maybe they did something, right? And then, uh, if we would've created some, some type of knowledge out of that, it's, um, it's no longer valid. And I-- you know, both of these examples, it comes back to that there's so much information and so many different packages and so much that can go wrong that it's, it- it's impossible for a human to kind of optimize all of these things. So you really need it to be self-healing in, in, in that sense. 
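The dedupe safeguard the agent proposed for the vent tool might be sketched as follows. The actual PR is not shown in the talk, so the `VentDeduper` class, its time window, and the hashing of a normalized message are hypothetical. The in-memory dictionary is also a simplification: parallel agent sessions would need a shared store for the check to hold across them.

```python
import hashlib


class VentDeduper:
    """Suppress repeated vents about the same issue within a time window."""

    def __init__(self, window_seconds: float = 3600.0) -> None:
        self.window_seconds = window_seconds
        # Maps a fingerprint of (tool, message) to the time it was last sent.
        self._last_seen: dict[str, float] = {}

    @staticmethod
    def _key(tool: str, message: str) -> str:
        """Fingerprint a vent, ignoring case and whitespace differences."""
        normalized = " ".join(message.lower().split())
        return hashlib.sha256(f"{tool}\n{normalized}".encode("utf-8")).hexdigest()

    def should_send(self, tool: str, message: str, now: float) -> bool:
        """Return True and record the vent unless an identical one is recent."""
        key = self._key(tool, message)
        last = self._last_seen.get(key)
        if last is not None and now - last < self.window_seconds:
            return False
        self._last_seen[key] = now
        return True
```

With a 60-second window, a second vent about the same `code--copy` failure ten seconds later would be dropped, while the same vent 100 seconds later would go through again.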
Um, and, and the last part here is that the system has to be constantly tuned. Um, we, we tried this earlier, uh, at Lovable, and we failed because we didn't properly tune the system. So the, the idea was good, but if, um, if, if you don't have good signals of what is working or not, um, then it's actually really hard to get a system that will work at scale. Uh, so remember we have like 50 million apps now, 200,000 new ones per day, millions of messages per day, and all of this can be, can be used to tune the system. So, um, we have, we have a lot to work with and, uh, and doing that, uh, correctly is kind of what, what is the key to, to make this work. So looking at the results here, um, the stuck rate is reduced by 5%. That might not sound, um, like a, a big number, but, uh, in reality, that is, um, on the same order of magnitude in what we would, uh, see this metric move if we had a new, um, uh, generation of a foundational model in, in our system. Um, and so mo- moving this metric by 5% is a, is a big difference. Um, and, um, the, the other, the other thing we saw is, uh, the publish rate went up by 2%. Uh, and I will add this is, uh, from not having Lovable Overflow, uh, at all to, to having it, and it's been improved since incrementally. So this is kind of the numbers for, for the first version of it. Um, and, and 2% publish rate, it's also a very important metric for us where, um, we can kind of, um, um, deduce that you've gotten to a place where, where you're happy with what you built, um, and, and you actually want to share it with others. Um, for the venting tool, we, we merged around 10, uh, fixes per day, um, in- into production from, uh, the suggested PRs. And a lot of them, if, if they're not merged, they kind of, um, uh, we cluster them together and we g- we get learnings from it as well that can inform future developments. 
Um, and then of course, as I mentioned, the production incidents are also, uh, surfaced before even, uh, kind of our, our paging system would at, uh, at some point. Um, so I just wanna summarize here, um, where I've, I've given you now two examples of, uh, kind of how we're self-healing the platform. And if you would zoom out a bit, these are kind of examples in how we're making sure that we get the most out of the, the models that we're using. Um, we're doing some other things. We're fine-tuning on our, on our fleet data. We have quite extensive evals coverage at this point as well. Uh, and I think what is, um, very interesting too, uh, with our data is that we have users coming to our platform with very strong intent to build, uh, successful products or even successful companies. Um, and they come, and they start out on Lovable, and they keep on prompting until they are either successful or, you know, we, we fail them in, in some ways is how, how I see it. And, uh, being able to take that as a world model and, and optimize that problem is something we're working on as well. Um, so if you think this stuff sounds interesting, we're, we're actively hiring as well, lovable.dev/careers. Check us out. And, uh, thank you so much for, for listening. I hope, uh, hope you learned something. [upbeat music]
Episode duration: 31:10
Transcript of episode mhW-XXnDFSU
