Lenny's Podcast

Scott Wu: Why Devin will write half of Cognition's code

Cognition's 15 engineers each run up to five Devins in parallel; about a quarter of the company's PRs are already Devin-authored, leaving engineers to act as architects who scope the work.

Scott Wu (guest), Lenny Rachitsky (host)
May 4, 2025 · 1h 32m

CHAPTERS

  1. 0:00 – 6:00

    Devin in practice: a 15-engineer team shipping with “fleets” of AI juniors

    Scott opens with a concrete snapshot of how Cognition builds Devin using Devin: small team, many parallel agents, and hundreds of PRs merged monthly. The conversation frames Devin not as a code helper, but as a workflow-native engineer producing a meaningful share of production changes.

    • 15 engineers at Cognition routinely work with multiple Devins in parallel
    • Devin is already merging several hundred PRs per month internally
    • ~25% of current PRs are Devin-authored, trending upward
    • The premise: AI will meaningfully change how engineering work gets done
  2. 6:00 – 6:51

    What Devin is: an autonomous, asynchronous software engineer embedded in team tools

    Scott defines Devin as an end-to-end engineer that can be tagged like a teammate and deliver PRs, not just suggestions. He contrasts point-solution coding assistants with an agent that can plan, execute, iterate, and integrate into existing team workflows.

    • Devin is designed for end-to-end task completion, not just code completion
    • Works asynchronously via Slack/Linear/GitHub integrations
    • Acts like a junior remote engineer you can hand tasks to
    • Emphasis on workflow handoff: issue → plan → PR → review/merge
  3. 6:51 – 9:14

    From “high school CS student” to junior engineer: capability gains + UX/product gains

    Scott describes Devin’s growth over the past year and why progress isn’t only model intelligence. Much of the leap came from productizing how humans collaborate with agents—planning, feedback loops, integrations, and ways to intervene mid-flight.

    • Early skepticism: many didn’t believe agents were feasible in early 2024
    • “Jagged intelligence”: better than humans in some areas, worse in others
    • Major upgrades came from tooling: Slack/GitHub/Linear, planning, code touch-ups
    • Agent usability is a learning curve distinct from chatbots
  4. 9:14 – 10:23

    Who uses Devin today and why: startups to Fortune 100, multiplying engineers and teams

    Scott shares the range of customers—from tiny startups to regulated enterprises—and how Devin’s value changes with context. A key theme is leverage: multiplying an engineer’s throughput and spreading knowledge across the team via accumulated context.

    • Devin is used across company sizes, including highly regulated orgs
    • Core promise: speed via parallelism and async delegation
    • Devin accumulates team knowledge and reuses it across sessions
    • Different engineering realities (startup vs bank) but similar leverage benefits
  5. 10:23 – 17:27

    Origin story: hacker houses, reinforcement learning conviction, and eight pivots toward agents

    Scott recounts Cognition’s formation through long-standing relationships and intense “hackathon mode.” The founding bet was that high-compute reinforcement learning and agent form factors would unlock practical autonomous coding, with code being ideal due to fast feedback loops.

    • Team background: competitive programming roots + prior AI/startup experience
    • Started Nov 2023 in hacker-house sprint cycles; company formed early 2024
    • Big bets: RL will drive next capability jump; agents will replace pure text completion
    • Code is ideal for RL due to executable feedback loops (tests, runtimes)
  6. 17:27 – 19:28

    Why personify the agent: the Devin name, “junior buddy” framing, and onboarding reality

    Scott explains why Devin has a human-like identity: it encourages users to delegate appropriately and collaborate effectively. The team learned that success depends on ramping Devin like a new teammate—small tasks, repo setup, and reliable verification loops.

    • Personification helps users adopt the right mental model: delegate to a junior buddy
    • Onboarding lessons: start small, set up repo/VM/testing, avoid huge re-architectures first
    • Best workflow is multi-Devin, asynchronous, with humans steering at key moments
    • Naming and interface choices reinforce autonomy + collaboration
  7. 19:28 – 25:18

    Engineering is shifting: from bricklayer to architect (and why PRs will be >50% Devin)

    Scott forecasts a near-term future where engineers spend more time on problem definition, architecture, and decisions, while agents handle much of implementation and debugging grind. He shares internal metrics and the expected trajectory toward majority AI-authored PRs.

    • Current internal metric: ~25% of PRs from Devin; expectation: >50% by year-end
    • Humans focus on definition/architecture; agents handle much of implementation work
    • Async agent workflows unlock “200–1000%” gains by removing real-time bottlenecks
    • Programming remains vital: it’s still about telling computers what to do
  8. 25:18 – 30:22

    Skills in the AI era: still learn to code, but lean into systems thinking and trade-offs

    Scott argues strongly that people should still learn to code because it builds problem decomposition and understanding of abstractions. The most valuable skills shift toward designing systems, reasoning about trade-offs, and specifying precisely what you want built.

    • “Should you learn to code?” → yes, for fundamentals and abstraction fluency
    • Understanding layers (networking, DBs, runtimes) remains a durable advantage
    • Architect-level thinking: define problems, evaluate trade-offs, design solutions
    • Jevons paradox: easier coding may increase total software built and engineers hired
  9. 30:22 – 34:38

    How Cognition actually works day-to-day: five Devins per engineer and parallel delegation

    The conversation becomes operational: how an engineer structures a day with multiple Devins running in parallel and when humans intervene. Scott highlights the adjustment period required to make async delegation feel natural and effective.

    • Typical pattern: one engineer runs up to ~5 Devins concurrently
    • Humans intervene on scoping, key architectural choices, and final verification
    • Goal is Devin doing 80–90% with humans guiding the critical 10–20%
    • Adoption grows as interface + reliability improve
  10. 34:38 – 37:23

    Live demo: Devin modifies the Devin web app, asks clarifying questions, opens a PR

    Scott demonstrates Devin working in real time: scanning the codebase, proposing a plan, asking a UI question, implementing changes, and producing a pull request. The demo showcases asynchronous collaboration, human-in-the-loop decisions, and easy verification via preview deploys.

    • Devin navigates files, identifies relevant components, and proposes implementation
    • Clarifies requirements (e.g., open link in new tab) without blocking progress
    • Produces a PR and works through a debug cycle, with CI issues visible
    • Demonstrates how small, verifiable tasks are ideal for agent execution
  11. 37:23 – 39:53

    Beyond coding: research playback and building artifacts (a quiz site) from web exploration

    Scott shows a second example where Devin researched Lenny and built a quiz website, with full step-by-step playback. The emphasis is on transparency: teams can audit what Devin did, where it looked, and what it changed.

    • Devin can browse/research, then generate an application artifact from findings
    • Playback mode shows each step taken (sources visited, actions performed)
    • Transparency is valuable for trust and debugging agent behavior
    • Illustrates broader “agent” capability beyond repo code changes
  12. 39:53 – 44:49

    Devin Wiki + large codebase understanding: indexing, diagrams, and onboarding use cases

    Scott explains how Devin builds an internal representation of a repository, exposes it via a wiki, and answers architecture questions across a large codebase. This becomes an onboarding accelerator and a “staff-like” codebase explainer despite being “junior” elsewhere.

    • Devin indexes repos and generates a navigable wiki with architecture details
    • Supports Q&A over the codebase (search + informed answers)
    • Useful for onboarding: ask “dumb questions” safely and quickly
    • Scales to very large codebases by working like humans: high-level map → zoom in
  13. 44:49 – 47:43

    Automation with Linear: label a ticket, get a scoped plan + confidence, then start execution

    Scott introduces a workflow where Linear tickets can be routed to Devin via a label. Devin scopes the work, highlights relevant files and snippets, and reports its confidence; once a human approves, it can be launched into a full execution session.

    • Linear automation: add a Devin label to trigger analysis and scoping
    • Devin surfaces key files/snippets and provides an execution plan
    • Confidence signaling helps humans decide when to delegate vs intervene
    • Makes codebase intelligence accessible to PMs and non-authors too
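
    The label-then-approve flow above can be sketched as a small decision function. This is only an illustrative sketch, not Cognition's or Linear's actual API: the label name, the 0.0 to 1.0 confidence scale, the threshold, and every class and function name here are hypothetical, since the episode summary does not specify them.

```python
from dataclasses import dataclass, field

# Hypothetical values: the summary names neither the label nor a threshold.
DEVIN_LABEL = "devin"
CONFIDENCE_THRESHOLD = 0.7


@dataclass
class Ticket:
    """A stand-in for a Linear ticket (not the real Linear schema)."""
    title: str
    labels: list[str] = field(default_factory=list)


@dataclass
class ScopingReport:
    """What the scoping pass is described as returning."""
    relevant_files: list[str]
    plan: list[str]
    confidence: float  # assumed scale: 0.0 (unsure) to 1.0 (confident)


def should_scope(ticket: Ticket) -> bool:
    """Scoping is triggered only when the ticket carries the label."""
    return DEVIN_LABEL in ticket.labels


def next_step(report: ScopingReport, human_approved: bool) -> str:
    """Decide what happens after scoping.

    Low confidence suggests the human should steer; otherwise the
    session starts only after explicit human approval.
    """
    if report.confidence < CONFIDENCE_THRESHOLD:
        return "human_steers"
    if not human_approved:
        return "await_approval"
    return "start_session"


ticket = Ticket("Open docs link in a new tab", labels=["devin"])
if should_scope(ticket):
    report = ScopingReport(["LinkButton.tsx"], ["Add target attribute"], 0.9)
    print(next_step(report, human_approved=False))
    print(next_step(report, human_approved=True))
```

    The design point is that confidence only informs the decision; in this sketch, execution never starts without the human's approval.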
  14. 47:43 – 49:07

    What Devin does best (and worst): tasks vs problems, verification loops, and steering bigger work

    Scott describes the boundary conditions for success: well-defined tasks with clear verification beat open-ended problems. Larger projects are possible but require more human steering, checkpoints, and tighter specs.

    • Best fit: well-defined tasks with easy tests/verification
    • “Give Devin tasks, not problems” as a guiding heuristic
    • Works well for bug fixes, small features, docs/tests, incremental changes
    • Bigger efforts require closer human steering and iterative alignment
  15. 49:07 – 52:57

    Product strategy debates: opinionated workflow vs general agent, and “suite of tools” vs one Devin

    Scott shares internal product tensions: how narrowly to focus on PR-centric engineering workflows vs broader uses, and whether to bundle everything into one agent experience. Devin has evolved into a suite (Search, Wiki, Linear scoping) to match real-world engineering messiness and user control needs.

    • Debate #1: how opinionated Devin should be about “the right” workflow
    • Debate #2: one comprehensive agent vs a suite of specialized tools
    • Users value control modes: ask questions without initiating full execution
    • Engineering reality is messy; multiple flows are necessary for adoption
  16. 52:57 – 1:32:31

    Competition, defensibility, and stickiness: why knowledge accumulation + team workflows matter

    Scott positions Cognition in a crowded landscape where foundation labs and IDE tools are converging on agents. He argues the practical goal isn’t an impenetrable moat, but stickiness driven by Devin learning your codebase, embedding in multiplayer team workflows, and compounding value over time.

    • Landscape: foundation labs + IDEs + agent products all moving toward autonomy
    • “Moat” reframed as “stickiness”: staying power via workflow embedding
    • Devin gets better as it learns repo/process/team context over time
    • Multiplayer value: Slack threads, PR reviews, onboarding, shared context compounding
