How Lovable vibecodes production software at scale

Vibecoding a beautiful, working prototype is easier than ever. Running a platform that non-developers use to ship production software is a different engineering problem. Today, software built on Lovable serves 600M+ monthly sessions. In this talk, Fabian Hedin, Cofounder and CTO of Lovable, walks through the systems Lovable built to run Claude reliably at consumer scale: the fleet-learning layer that catches coding mistakes, the eval loop that gates every model release, and the mechanisms that keep Lovable itself improving.

Fabian Hedin, host

May 20, 2026 · 31m

CHAPTERS

0:18 – 1:50
Lovable’s chat-to-preview UI and who it’s for
Fabian introduces Lovable’s simple two-pane interface—chat on the left, live preview on the right—and frames it as a platform for building everything from landing pages to full web apps. He emphasizes the wide range of users, from kids to Fortune 500 teams, and notes that the UI has stayed stable while capabilities have rapidly evolved behind the scenes.
- Chat-driven software creation with instant preview
- Supports a spectrum of outputs: pages, tools, websites, web apps
- Broad persona fit: beginners through enterprise users
- UI stability vs. major backend/model evolution
1:50 – 2:51
Origins: gpt-engineer and the shift from code-gen to end-to-end building
He traces Lovable’s roots to the gpt-engineer GitHub repo, which showcased early end-to-end software generation (e.g., generating and running a Snake game). In 2023, AI tooling mostly served developers through productivity boosts like IDE autocomplete, but Lovable aimed for a different abstraction: building software from natural language without needing to code.
- gpt-engineer’s rapid GitHub adoption as an early proof of end-to-end generation
- Terminal-based prompts that generated runnable projects
- 2023 landscape: AI mainly used for developer efficiency and autocomplete
- Lovable’s differentiated goal: full software creation for non-coders
2:51 – 3:52
Building for the “99%”: why it was too early, and why it works now
Fabian explains Lovable’s mission to empower non-technical users and why that initially failed—models weren’t reliable enough at the chat+preview abstraction. He argues the turning point came about 1.5 years ago, with frequent foundation-model improvements since then making the approach increasingly viable.
- Mission focus: non-technical creators, not just engineers
- Early failure attributed to model limitations
- Chat-to-preview abstraction becomes practical as models improve
- Ongoing acceleration: meaningful model gains every few months
3:52 – 5:54
Scale today and an unexpected core segment: engineers
He shares Lovable’s current scale (tens of millions of products and massive traffic to user-built sites) and highlights that user-built sites collectively draw more traffic than Lovable itself. Despite building for non-coders, Lovable sees many engineers using it because working from specs at a higher abstraction can be more efficient than living in code.
- 50M products built; 600M monthly visits to Lovable-built sites
- User creations ‘out-succeed’ the platform’s own distribution
- Engineers appear as the largest functional user segment
- Trend shift: engineers adopting spec-first, agent-driven workflows
5:54 – 6:55
Principles: production-grade ambition + non-technical usability
Fabian outlines two guiding principles: push toward production-grade software (not just prototypes) and keep the experience accessible to non-technical users. This combination raises the difficulty because it removes the expert ‘escape hatch’ while still expecting robust, deployable outcomes.
- Explicit choice to target production-grade outputs
- Continual push on complexity/ambition of what users can build
- Designing for non-technical users increases failure sensitivity
- High demand for non-coder software creation drives the tradeoff
6:55 – 9:57
The ‘last 10%’ problem in AI coding: where users get stuck
He argues that the classic “last 10% takes 90% of the time” rule still holds, and may be even worse, when using AI to code. Fabian describes a friction curve from smooth progress to hard stuck moments, and explains why this is particularly damaging for non-technical users who can’t dive into code to recover.
- AI accelerates first versions but finishing/debugging still dominates time
- Friction curve: green (smooth) → yellow (friction) → red (stuck)
- Engineers can recover via debugging; non-technical users often cannot
- Avoiding ‘hard stuck’ becomes a top UX and platform priority
9:57 – 10:57
Defining and detecting ‘stuck’: Lovable’s is_stuck metric
Fabian defines how Lovable operationalizes being stuck via an internal metric: repeated requests, complaints about failed implementations, or users abandoning after a request. A lightweight classifier helps identify stuck sessions, enabling interventions and system-level learning.
- is_stuck triggers: asking the same thing 3x, complaining, or leaving
- Classification model used to detect stuck behavior
- Turns qualitative frustration into measurable product signal
- Enables targeted strategies based on stuck type
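The three is_stuck triggers above can be illustrated with a small rule-based sketch. This is not Lovable’s implementation: the talk describes a lightweight classification model, whereas the code below stands in with string similarity and a keyword list. The `Session` type, the `COMPLAINT_MARKERS` phrases, and the 0.8 similarity threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

# Hypothetical complaint phrases; a real system would use a classifier instead.
COMPLAINT_MARKERS = ("still broken", "doesn't work", "not working", "you didn't")


@dataclass
class Session:
    """Assumed shape of a chat session, for illustration only."""
    user_messages: list[str]
    abandoned_after_request: bool = False  # user left right after a request


def _similar(a: str, b: str, threshold: float = 0.8) -> bool:
    # Treat near-identical wording as "the same request".
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def is_stuck(session: Session, repeat_limit: int = 3) -> bool:
    msgs = session.user_messages
    # Trigger 1: the same request shows up repeat_limit times.
    for i, msg in enumerate(msgs):
        repeats = sum(1 for other in msgs[i:] if _similar(msg, other))
        if repeats >= repeat_limit:
            return True
    # Trigger 2: the user complains about a failed implementation.
    if any(marker in m.lower() for m in msgs for marker in COMPLAINT_MARKERS):
        return True
    # Trigger 3: the user abandoned the session after a request.
    return session.abandoned_after_request
```

For example, `is_stuck(Session(["make the button blue"] * 3))` flags the session through the first trigger, while two unrelated requests in a row do not.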
10:57 – 12:59
Three buckets of stuck—and what they imply for self-healing
He groups stuck situations into three categories: promptable issues that could be solved with better context, platform/tooling gaps that should be easy but aren’t supported, and larger product investments needed to close systemic gaps (e.g., SSR for SEO). Each bucket implies a different remediation strategy, from better guidance to platform evolution.
- Bucket 1: solvable with better prompting/context (yellow stuck)
- Bucket 2: should be easy but platform/tooling falls short (edge self-heal)
- Bucket 3: requires larger roadmap investment (e.g., SSR vs CSR)
- Goal: fix issues before users reach ‘hard stuck’
12:59 – 15:33
Lovable Overflow: a Stack Overflow-like corpus to shortcut the back-and-forth
Fabian introduces Lovable Overflow: a curated database of issue descriptions and solutions derived from real user struggles. Instead of letting users bounce through repeated failed attempts, Lovable searches this corpus for relevant fixes and injects tailored context into the main agent to jump closer to the correct solution on the first try.
- Inspired by Stack Overflow: problems paired with solutions
- Example: laggy scrolling leads to repetitive agent/user loop without guidance
- Retrieval over corpus plus contextualization injected into agent prompt
- Reduces iteration cost, time, and frustration
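A minimal sketch of the retrieve-then-inject flow described above, under stated assumptions: the talk does not say how Lovable searches the corpus, so plain word overlap (Jaccard similarity) stands in for whatever retrieval is actually used. `KnowledgeEntry`, `retrieve`, `build_agent_prompt`, the `min_score` cutoff, and the prompt wording are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeEntry:
    """One problem/solution pair, in the spirit of a Stack Overflow answer."""
    problem: str
    solution: str


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: list[KnowledgeEntry],
             top_k: int = 1, min_score: float = 0.2) -> list[KnowledgeEntry]:
    """Rank entries by word overlap with the query (a stand-in for real search)."""
    q = _tokens(query)
    scored = []
    for entry in corpus:
        p = _tokens(entry.problem)
        union = q | p
        score = len(q & p) / len(union) if union else 0.0
        if score >= min_score:  # skip weak matches rather than inject noise
            scored.append((score, entry))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:top_k]]


def build_agent_prompt(user_request: str, hints: list[KnowledgeEntry]) -> str:
    """Append retrieved fixes to the request before it reaches the main agent."""
    if not hints:
        return user_request
    notes = "\n".join(
        f"- Known issue: {h.problem}\n  Suggested fix: {h.solution}" for h in hints
    )
    return f"{user_request}\n\nRelevant notes from earlier sessions:\n{notes}"
```

With a corpus entry whose problem is "scrolling is laggy on long lists", a request such as "the scrolling is laggy" would pick up that entry’s fix on the first attempt instead of after several failed rounds.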
15:33 – 17:36
Keeping retrieved knowledge fresh: relevance gating and pruning stale fixes
He explains that adding knowledge blindly can backfire when dependencies or best practices change. Lovable tracks success rates per knowledge item, withholds context when it may be harmful, and prunes outdated entries to maintain a high-signal corpus that stays aligned with evolving ecosystems and models.
- Knowledge can become stale as packages/versions evolve
- Lightweight model decides when retrieved context is relevant
- Per-knowledge success ratios drive deprecation/pruning decisions
- Continuous refresh is essential for quality at scale
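The per-item success tracking and pruning described above could look roughly like the following. This is a sketch, not Lovable’s system: the `KnowledgeStats` type, the `min_uses` and `min_ratio` thresholds, and the definition of "success" are assumptions, since the talk does not give concrete numbers.

```python
from dataclasses import dataclass


@dataclass
class KnowledgeStats:
    """Usage counters for one knowledge item (hypothetical structure)."""
    entry_id: str
    uses: int = 0
    successes: int = 0

    def record(self, success: bool) -> None:
        # Call once each time the item was injected and the outcome is known.
        self.uses += 1
        if success:
            self.successes += 1

    @property
    def success_ratio(self) -> float:
        return self.successes / self.uses if self.uses else 0.0


def prune(stats: list[KnowledgeStats], min_uses: int = 20,
          min_ratio: float = 0.5) -> tuple[list[KnowledgeStats], list[KnowledgeStats]]:
    """Split items into (kept, pruned) based on their observed success ratio."""
    kept, pruned = [], []
    for s in stats:
        # Only judge items with enough evidence; new items get a grace period.
        if s.uses >= min_uses and s.success_ratio < min_ratio:
            pruned.append(s)
        else:
            kept.append(s)
    return kept, pruned
```

The `min_uses` guard is a design choice in this sketch: it keeps a freshly added item from being pruned after one or two unlucky sessions.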
17:36 – 21:08
‘Venting’ as a tool: letting the agent report platform/tooling friction
Fabian describes a second self-healing mechanism: a ‘vent’ tool the agent can call when docs, tools, schemas, or platform behavior materially slow work. These structured complaints are routed to Slack, deduped and investigated by another agent, and often turned into PRs for engineers to review—creating a tight loop from agent frustration to production fixes.
- vent_send_feedback tool triggers on material friction and repeated failures
- Captures issues like missing tools, confusing schemas, broken behavior
- Pipeline: Slack → agent triage/dedupe/investigate → auto PR → human review
- Turns agent frustration into actionable engineering tasks
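To make the vent mechanism concrete, here is one way such a tool and a first dedupe pass might be sketched. Only the tool name `vent_send_feedback` comes from the talk; the description text, the `category` and `summary` fields, and the fingerprint-based dedupe are assumptions. In the talk, deduping and investigation are handled by a second agent, which exact-match hashing only approximates.

```python
import hashlib
import re

# Tool definition in a common JSON Schema style. The field names are hypothetical.
VENT_TOOL = {
    "name": "vent_send_feedback",  # name taken from the talk
    "description": (
        "Report docs, tools, schemas, or platform behavior "
        "that materially slowed the work."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "category": {
                "type": "string",
                "enum": ["missing_tool", "confusing_schema", "broken_behavior", "docs"],
            },
            "summary": {"type": "string"},
        },
        "required": ["category", "summary"],
    },
}


def fingerprint(vent: dict) -> str:
    """Hash the category plus a whitespace- and case-normalized summary."""
    normalized = re.sub(r"\s+", " ", vent["summary"].strip().lower())
    return hashlib.sha256(f"{vent['category']}|{normalized}".encode()).hexdigest()


def dedupe(vents: list[dict]) -> list[dict]:
    """Keep the first vent for each fingerprint, preserving arrival order."""
    seen: set[str] = set()
    unique = []
    for vent in vents:
        key = fingerprint(vent)
        if key not in seen:
            seen.add(key)
            unique.append(vent)
    return unique
```

A cheap pass like this would only collapse identical complaints from parallel sessions; grouping differently worded reports of the same bug is the part the talk assigns to the triage agent.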
21:08 – 26:11
Venting in action: real fixes, ecosystem implications, and incident detection
He shares concrete examples: a file-copy bug involving spaces/special characters fixed within minutes, and an agent complaint about Framer Motion typing that hints at broader OSS improvements. He also shows how spikes in venting correlate with production incidents, sometimes surfacing problems earlier than traditional monitoring, and notes meta-fixes like deduping vent spam in parallel sessions.
- Example: copy tool broke on filenames with spaces/special characters; fast PR+deploy
- Example: Framer Motion typing friction suggests potential OSS contributions
- Vent spikes correlated with incidents (inference down, sandbox/network failures)
- Meta example: agent proposed dedupe safeguards after spamming vents
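The observation that vent spikes line up with incidents suggests a simple alerting rule. The sketch below flags a time window whose vent count sits far above the recent baseline. The talk does not describe how Lovable detects spikes, so the mean-plus-k-standard-deviations rule, the `is_spike` name, and the `k` and `min_count` defaults are all assumptions.

```python
from statistics import mean, pstdev


def is_spike(history: list[int], current: int,
             k: float = 3.0, min_count: int = 5) -> bool:
    """Return True if `current` vents in a window is unusually high.

    history: vent counts for previous windows (e.g. one count per 5 minutes).
    """
    # Need some baseline, and ignore tiny absolute counts to limit noise.
    if len(history) < 2 or current < min_count:
        return False
    baseline = mean(history)
    # Floor the spread at 1.0 so a perfectly flat history does not
    # turn every small bump into an alert.
    spread = max(pstdev(history), 1.0)
    return current > baseline + k * spread
```

With a quiet history such as `[2, 3, 1, 2, 3, 2]`, a window with 25 vents is flagged, while a window with 5 is not.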
26:11 – 31:10
What they learned and the measurable impact on user success
Fabian summarizes key learnings: failure modes are model-specific, knowledge has a half-life, and both systems require ongoing tuning with strong success signals. He reports outcomes—lower stuck rate, higher publish rate, and a steady stream of agent-suggested fixes—positioning these loops as ways to extract more value from foundation models and improve reliability at scale.
- New models change what knowledge is needed; aggressive pruning matters
- Knowledge decays as ecosystems change; constant tuning is required
- Results: ~5% reduction in stuck rate; ~2% increase in publish rate
- ~10 agent-suggested fixes merged per day; vents help identify incidents