Lenny's Podcast: How to measure and improve developer productivity | Nicole Forsgren (Microsoft Research, GitHub)
CHAPTERS
- 0:00 – 0:34
Start with a crisp problem statement (and why teams get stuck)
Nicole opens by stressing that most developer productivity initiatives fail early because the org isn’t aligned on the actual goal. “Improve developer experience” can mean culture, inner/outer loop friction, tooling, or something else entirely—and each implies a different strategy. The chapter sets up a recurring theme: clarity first, metrics second.
- •Misalignment on the definition of the problem derails months of work
- •“DevEx” can mean culture, toolchain friction, inner/outer loop, etc.
- •Executives often underestimate how hard crisp goal-setting is
- •Different goals require different measurement approaches
- 0:34 – 2:56
Show setup + what you’ll learn (DORA, SPACE, benchmarks, pitfalls)
Lenny introduces Nicole Forsgren and frames the episode around practical measurement and improvement of developer productivity. He previews the major frameworks (DORA and SPACE), what elite performance looks like, and common mistakes companies make. Sponsor messaging also reinforces the theme: productivity needs both qualitative and quantitative signals.
- •Episode focus: measuring and improving engineering productivity and experience
- •Key frameworks: DORA and SPACE
- •Promise of benchmarks and implementation guidance
- •Emphasis on combining qualitative + quantitative insights
- 2:56 – 7:54
Nicole’s career path: from IBM engineer to DevOps research leader
Nicole walks through her “choose your own adventure” background—software engineering, systems administration, and the pain that pushed her toward research. She explains how a technical PhD helped her connect engineering practices to business outcomes. The thread throughout is using data to drive better software delivery and org outcomes.
- •Early experience: enterprise engineering + operations realities
- •Pivot to a technical PhD in MIS (management information systems) to “win with data”
- •Research focus: linking software delivery practices to ROI and outcomes
- •DORA, Google acquisition, GitHub, and Microsoft Research roles
- 7:54 – 10:06
Defining the terms: developer productivity vs. DevEx vs. DevOps
Nicole separates three commonly conflated ideas. Productivity is about outcomes over time (and must include sustainability), DevEx is the lived experience of building software (predictability, low friction), and DevOps is a set of capabilities—technical, architectural, cultural—that enable fast, reliable delivery. The distinctions matter because they change what you measure and optimize.
- •Productivity: how much gets done over time—cannot be brute-forced
- •DevEx: developers as “users” of internal systems; reduce friction/uncertainty
- •DevOps: capabilities and practices, not a tool you buy
- •Wellbeing and community effects are core to sustainable productivity
- 10:06 – 13:43
The real business conversation: faster delivery without sacrificing reliability
Lenny and Nicole discuss why leaders want speed but worry about ROI and instability. Nicole contrasts old ITIL-style change control beliefs (long approvals = stability) with evidence that speed and stability can improve together. The goal becomes a focused, data-informed conversation about constraints and guardrails—not “go faster at any cost.”
- •Common executive concerns: ROI, risk, reliability, “too fast = unstable”
- •Old change-approval batching creates instability and delays feedback
- •Use targeted questions to uncover what leaders truly fear or value
- •Translate time spent in delivery steps into business-relevant value math
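One way to read the "value math" point is as simple arithmetic on time spent waiting in delivery steps. The sketch below is an assumption about the form such a calculation could take, not a formula from the episode; the function name, parameters, and all input numbers are hypothetical.

```python
def annual_wait_cost(wait_hours_per_change: float,
                     changes_per_year: int,
                     loaded_hourly_cost: float) -> float:
    """Rough yearly cost of engineer time spent waiting on a delivery step.

    All three inputs are assumptions the team supplies; nothing here
    comes from DORA data.
    """
    return wait_hours_per_change * changes_per_year * loaded_hourly_cost


# Hypothetical inputs: 2 hours of approval wait per change,
# 500 changes per year, 100 currency units per engineer hour.
cost = annual_wait_cost(2, 500, 100)
print(cost)
```

A number like this is a conversation starter with leadership rather than a precise estimate.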
- 13:43 – 19:00
DORA explained: the four metrics and why speed and stability move together
Nicole unpacks the DORA research program and its four delivery performance metrics: lead time, deployment frequency, mean time to restore (MTTR), and change fail rate. The key insight is counterintuitive: faster teams are typically more stable because they ship smaller changes with smaller blast radii. This chapter establishes “ship small, ship often” as a quality strategy, not just a speed tactic.
- •DORA metrics: lead time, deploy frequency, MTTR, change fail rate
- •Two speed + two stability metrics; they correlate strongly
- •Small, frequent deploys reduce blast radius and simplify debugging
- •Batching changes (weeks/months) increases merge conflicts and recovery time
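The four metrics above can be computed from a simple log of deployments. The following is a minimal sketch, assuming each record carries a commit time, a deploy time, a failure flag, and a restore time; the `Deployment` record, its field names, and the sample data are hypothetical, and medians are used as one reasonable summary rather than the DORA survey method.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did the change cause a failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deploys: list, window_days: int) -> dict:
    """Summarize two speed metrics and two stability metrics."""
    if not deploys or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.failed]
    restores = [d.restored_at - d.deployed_at
                for d in failures if d.restored_at is not None]
    return {
        "lead_time": median(lead_times),                          # speed
        "deploys_per_day": len(deploys) / window_days,            # speed
        "time_to_restore": median(restores) if restores else None,  # stability
        "change_fail_rate": len(failures) / len(deploys),         # stability
    }


# Hypothetical two-day window with four deployments, one of which failed.
base = datetime(2024, 1, 1, 9, 0)
deploys = [
    Deployment(base, base + timedelta(hours=2)),
    Deployment(base + timedelta(hours=5), base + timedelta(hours=6)),
    Deployment(base + timedelta(days=1), base + timedelta(days=1, hours=3),
               failed=True,
               restored_at=base + timedelta(days=1, hours=3, minutes=30)),
    Deployment(base + timedelta(days=1, hours=4), base + timedelta(days=1, hours=8)),
]
metrics = dora_metrics(deploys, window_days=2)
print(metrics)
```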
- 19:00 – 22:33
Benchmarks for elite performance (and how precise you need to be)
Nicole shares the 2019 elite benchmarks and explains why category-level precision is usually sufficient for decision-making. She and Lenny clarify what “lead time” means in practice: moving through the deployment pipeline to early production exposure/feedback. The discussion ties metrics back to developer experience: long delays force context reloading and painful interruptions.
- •Elite benchmarks: on-demand deploys; lead time under 1 day; restore in under 1 hour; 0–15% change fail rate
- •Precision is less important than knowing your current category and improving
- •Lead time focuses on pipeline/feedback loops, not perfect “global rollout” end state
- •Long lead times degrade DevEx via context switching and rework
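Because category-level precision is enough, a check against the elite benchmarks listed above can be very coarse. This is a sketch only: the function name is hypothetical, and treating "on-demand deploys" as at least one deploy per day is an assumption made here to get a testable threshold.

```python
from datetime import timedelta


def elite_check(lead_time: timedelta,
                deploys_per_day: float,
                time_to_restore: timedelta,
                change_fail_rate: float) -> dict:
    """Report, per metric, whether a team meets the 2019 elite benchmark."""
    return {
        "lead_time": lead_time < timedelta(days=1),
        # Assumption: "on-demand" is approximated as one or more deploys per day.
        "deploy_frequency": deploys_per_day >= 1.0,
        "time_to_restore": time_to_restore < timedelta(hours=1),
        "change_fail_rate": 0.0 <= change_fail_rate <= 0.15,
    }


# Hypothetical team: fast and quick to recover, but with a 25% fail rate.
result = elite_check(
    lead_time=timedelta(hours=3),
    deploys_per_day=2.0,
    time_to_restore=timedelta(minutes=30),
    change_fail_rate=0.25,
)
print(result)
```

Returning one flag per metric, instead of a single label, keeps the focus on which area to improve next.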
- 22:33 – 25:33
Why company size doesn’t matter (and why retail outperformed)
Nicole addresses the assumption that DORA benchmarks only fit startups or only fit large enterprises. The DORA research found no meaningful performance difference by company size, even though both small and large orgs try to explain away their results. The one notable outlier was retail, which tended to perform better, likely due to market “natural selection” pressures.
- •No statistically significant performance difference between small and large companies
- •Common excuses exist on both sides: “we’re too complex” vs. “we lack resources”
- •Retail performed better as a significant outlier
- •Hypothesis: survival pressure (cloud scaling, peak events) forced excellence
- 25:33 – 29:23
How to improve: work backward from outcomes to DevOps capabilities
Nicole responds to a common critique: metrics can feel like judgment without a path to improvement. She explains that DORA is more than the four metrics—there’s a full set of capabilities that predict performance. Working backward links business outcomes to delivery performance, then to concrete technical, architectural, and cultural practices to invest in.
- •DORA isn’t just metrics; it’s a research-backed capability model
- •DevOps is capabilities (technical/architectural/cultural/lean), not “a DevOps tool”
- •Examples: automated testing, CI/CD, trunk-based dev, version control discipline
- •Architecture matters: loose coupling, cloud (done well), and enabling fast feedback
- 29:23 – 35:39
SPACE framework: choosing balanced productivity metrics beyond delivery speed
Nicole introduces SPACE as a framework for measuring complex creative work, including developer productivity. Unlike DORA’s fixed four, SPACE helps you select context-appropriate metrics across five dimensions to avoid over-optimizing one signal. The rule of thumb: use at least three dimensions at once to keep metrics in balance.
- •SPACE dimensions: Satisfaction, Performance, Activity, Communication/Collaboration, Efficiency/Flow
- •Designed for metric selection in context (not a single prescribed dashboard)
- •Use at least three dimensions to balance incentives and avoid metric gaming
- •DORA can be seen as a SPACE implementation focused on the “outer loop”
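The "at least three dimensions" rule of thumb is easy to check mechanically once each metric is tagged with a SPACE dimension. The sketch below assumes such tagging; the metric names and the short dimension labels are hypothetical choices, not a prescribed dashboard.

```python
SPACE_DIMENSIONS = {
    "satisfaction",
    "performance",
    "activity",
    "communication",   # Communication/Collaboration
    "efficiency",      # Efficiency/Flow
}


def covered_dimensions(metrics: dict) -> set:
    """Return the SPACE dimensions covered by a metric-to-dimension mapping."""
    unknown = set(metrics.values()) - SPACE_DIMENSIONS
    if unknown:
        raise ValueError(f"unknown SPACE dimensions: {sorted(unknown)}")
    return set(metrics.values())


def is_balanced(metrics: dict, minimum: int = 3) -> bool:
    """True if the metric set spans at least `minimum` distinct dimensions."""
    return len(covered_dimensions(metrics)) >= minimum


# Hypothetical metric sets: the first measures only activity.
activity_only = {"commits_per_week": "activity", "pr_count": "activity"}
mixed = {
    "dev_survey_score": "satisfaction",
    "change_fail_rate": "performance",
    "review_turnaround_hours": "communication",
}
print(is_balanced(activity_only), is_balanced(mixed))
```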
- 35:39 – 37:52
Measuring satisfaction (and why surveys aren’t “soft” data)
The conversation drills into satisfaction and wellbeing—how to measure it and why it’s predictive. Nicole argues that dismissing self-report data as “people lie” is misguided; system data can be incomplete or misleading too. The best approach triangulates: instrument what you can, survey periodically, and use each data type to reveal what the other cannot.
- •Measure satisfaction via periodic surveys; don’t over-sample continuously
- •Wellbeing correlates strongly with other productivity dimensions and failure modes
- •Avoid single-dimension activity traps (e.g., lines of code, commits, PR count)
- •People-data reveals hidden heroics or broken processes that systems won’t show
- 37:52 – 45:43
Where to go deeper: papers, tools, and Nicole’s book-in-progress
Nicole shares resources for teams trying to implement measurement well, including the SPACE paper and work on combining system + self-report data. She also previews a forthcoming book focused on the end-to-end measurement journey, especially for orgs starting from “nothing.” The goal is practical accessibility: scripts, examples, and guidance that don’t require a full data science team.
- •Recommended reading: SPACE paper (ACM Queue), people+systems data paper, Hubbard’s How to Measure Anything
- •Key idea: measurement maturity shifts from people-data heavy to more scalable system telemetry over time
- •New book aims to make measurement approachable: interview scripts, surveys, analyses
- •“Don’t let perfect be the enemy of good” in early measurement efforts
- 45:43 – 50:07
Rollout pitfalls, how the DevOps space evolved, and the communication imperative
Nicole outlines common failure modes when organizations attempt DevEx/productivity initiatives. The big ones: unclear goals, lack of top-down and bottom-up alignment, and poor communication that prevents buy-in. She also describes how the industry evolved—more companies now accept they’re tech-driven—and why communicating in audience-relevant language is a force multiplier.
- •Pitfall: unclear objective leads to “a thousand flowers bloom” and scattered effort
- •Pitfall: not combining executive sponsorship with IC-level engagement
- •Communication matters: vocabulary, motivations, and value framing differ by audience
- •Industry shift: most orgs now recognize software delivery as strategic, not optional
- 50:07 – 54:05
AI and developer productivity: new workflows, new risks, and a ‘trust’ dimension
Nicole discusses how AI coding tools change the nature of work: developers may spend more time reviewing than writing, and productivity isn’t just “tasks done faster.” She warns against simplistic conclusions like reducing headcount based on speed-ups. A key open question is whether frameworks like SPACE need an explicit dimension for trust/reliability/overreliance in AI-assisted development.
- •AI tools shift effort from writing code toward reviewing and validating suggestions
- •Productivity gains should translate into tackling harder problems, not just doing the same work faster
- •Risks: overreliance, changed mental models, cognitive load shifts, learning impacts
- •Potential evolution: add a trust/reliability dimension to productivity measurement
- 54:05 – 57:29
First steps + exemplar orgs: define it, find signals, learn from Google’s approach
Nicole gives pragmatic guidance for getting started: write down the problem crisply and identify any existing signals tied to it. She highlights Google as a model for systematic measurement, strong telemetry, and continued investment in developer surveys. A notable lesson: when survey results disagree with instrumentation, the surveys are often right—because telemetry can miss what matters.
- •Step 1: ensure the challenge/problem is written down and unambiguous
- •Step 2: find any relevant signals (broadly defined) and build from there
- •Google example: phased measurement, rich telemetry, and ongoing DevEx surveys
- •Triangulation lesson: survey data often corrects incomplete or misleading instrumentation
- 57:29 – 1:04:13
Nicole’s Four-Box framework: turning ideas into measurable hypotheses
Nicole explains a simple tool she uses on napkins: two “words” boxes (a causal hypothesis) above two “data” boxes (the metrics/proxies to test it). The framework forces clarity, helps align stakeholders before analysis, and separates disputes about the idea from disputes about the proxies or data quality. She also notes an “advanced mode”: starting from available data, then translating back into testable statements.
- •Top row (words): define the hypothesized relationship in plain language
- •Bottom row (data): choose measurable proxies for each concept
- •If results fail, debug the proxies/data—not the people—then revisit the hypothesis
- •Prevents spurious correlation hunting by requiring a communicable sentence first
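The four boxes can be written down as a small record: two words boxes for the hypothesis and two data boxes for the proxies. This is a sketch of that structure only; the class, its field names, and the example hypothesis are hypothetical illustrations, not Nicole's wording.

```python
from dataclasses import dataclass


@dataclass
class FourBox:
    cause: str              # top row (words): the thing believed to drive change
    effect: str             # top row (words): the outcome expected to move
    cause_proxy: str = ""   # bottom row (data): measurable stand-in for the cause
    effect_proxy: str = ""  # bottom row (data): measurable stand-in for the effect

    def sentence(self) -> str:
        """The communicable plain-language hypothesis."""
        return f"We believe that {self.cause} leads to {self.effect}."

    def missing_boxes(self) -> list:
        """Names of boxes still empty; analysis should wait until this is []."""
        fields = ("cause", "effect", "cause_proxy", "effect_proxy")
        return [name for name in fields if not getattr(self, name).strip()]


# Hypothetical example with one data box not yet agreed on.
box = FourBox(
    cause="faster code review",
    effect="higher developer satisfaction",
    cause_proxy="median hours to first review",
)
print(box.sentence())
print(box.missing_boxes())
```

Keeping the sentence separate from the proxies mirrors the point above: a disagreement about `effect_proxy` is a data discussion, not a rejection of the hypothesis.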
- 1:04:13 – 1:16:16
Decision-making systems + lightning round (books, shows, hiring, habits)
Nicole shares her decision-making spreadsheet approach: define options, criteria, weights, and scores—then use the process to clarify what you actually value. She emphasizes being data-informed rather than data-driven, and ties it to strategy: good strategy requires choosing what not to do. The episode closes with a lightning round covering books, media, interview questions, and small process changes that improve execution.
- •Decision process: options → criteria → weights (sum to 100%) → scoring → reflection
- •Often the exercise reveals the decision before the math is finished
- •Strategy lens: success requires not funding/doing everything
- •Lightning round topics: favorite books, shows, interview prompts, and small impactful process changes
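The decision process above (options, criteria, weights that sum to 100%, scoring) maps directly onto a weighted-sum calculation. The sketch below assumes a 1 to 5 scoring scale; the option names, criteria, weights, and scores are all hypothetical.

```python
import math


def rank_options(weights: dict, scores: dict) -> list:
    """Rank options by weighted score.

    weights: criterion -> weight in percent (must sum to 100).
    scores:  option -> {criterion -> score}; every criterion must be scored.
    """
    if not math.isclose(sum(weights.values()), 100.0):
        raise ValueError("criterion weights must sum to 100%")
    totals = {}
    for option, by_criterion in scores.items():
        totals[option] = sum(
            weights[c] * by_criterion[c] for c in weights
        ) / 100
    # Highest weighted score first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical decision with three criteria scored from 1 (poor) to 5 (great).
weights = {"impact": 50, "cost": 30, "risk": 20}
scores = {
    "build_in_house": {"impact": 4, "cost": 2, "risk": 3},
    "buy_vendor_tool": {"impact": 3, "cost": 4, "risk": 4},
}
ranking = rank_options(weights, scores)
print(ranking)
```

In the spirit of being data-informed rather than data-driven, the ranking is an input to reflection: if the top result feels wrong, that usually means a weight or criterion does not match what you actually value.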