How to measure and improve developer productivity | Nicole Forsgren (Microsoft Research, GitHub)

Dr. Nicole Forsgren is a developer productivity and DevOps expert who works with engineering organizations to make work better. Best known as co-author of the Shingo Publication Award-winning book Accelerate and the DevOps Handbook, 2nd edition and author of the State of DevOps Reports, she has helped some of the biggest companies in the world transform their culture, processes, tech, and architecture. Nicole is currently a Partner at Microsoft Research, leading developer productivity research and strategy, and a technical founder/CEO with a successful exit to Google. In a previous life, she was a software engineer, sysadmin, hardware performance engineer, and professor. She has published several peer-reviewed journal papers, has been awarded public and private research grants (funders include NASA and the NSF), and has been featured in the Wall Street Journal, Forbes, Computerworld, and InformationWeek.

In today’s podcast, we discuss:
• Two frameworks for measuring developer productivity: DORA and SPACE
• Benchmarks for what good and great look like
• Common mistakes to avoid when measuring developer productivity
• Resources and tools for improving your metrics
• Signs your developer experience needs attention
• How to improve your developer experience
• Nicole’s Four-Box framework for thinking about data and relationships

This episode is brought to you by DX, a platform for measuring and improving developer productivity: https://getdx.com/lenny

Find the full transcript at: https://www.lennysnewsletter.com/p/how-to-measure-and-improve-developer

Where to find Nicole Forsgren:
• Twitter: https://twitter.com/nicolefv
• LinkedIn: https://www.linkedin.com/in/nicolefv/
• Website: https://nicolefv.com/

Where to find Lenny:
• Newsletter: https://www.lennysnewsletter.com
• Twitter: https://twitter.com/lennysan
• LinkedIn: https://www.linkedin.com/in/lennyrachitsky/

In this episode, we cover:
(00:00) Nicole’s background
(07:55) Unpacking the terms “developer productivity,” “developer experience,” and “DevOps”
(10:06) How to move faster and improve practices across the board
(13:43) The DORA framework
(18:54) Benchmarks for success
(22:33) Why company size doesn’t matter
(24:54) How to improve DevOps capabilities by working backward
(29:23) The SPACE framework and choosing metrics
(32:51) How SPACE and DORA work together
(35:39) Measuring satisfaction
(37:52) Resources and tools for optimizing metrics
(41:29) Nicole’s current book project
(45:43) Common pitfalls companies run into when rolling out developer productivity/optimizations
(47:42) How the DevOps space has progressed
(50:07) The impact of AI on the developer experience and productivity
(54:04) First steps to take if you’re trying to improve the developer experience
(55:15) Why Google is an example of a company implementing DevOps solutions well
(56:11) The importance of clear communication
(57:32) Nicole’s Four-Box framework
(1:05:15) Advice on making decisions
(1:08:56) Lightning round

Referenced:
• Chef: https://www.chef.io/
• DORA: https://dora.dev/
• GitHub: https://github.com/
• Microsoft Research: https://www.microsoft.com/en-us/research/
• What is DORA?: https://devops.com/what-is-dora-and-why-you-should-care/
• Dustin Smith on LinkedIn: https://www.linkedin.com/in/dustin-smith-b0525458/
• Nathen Harvey on LinkedIn: https://www.linkedin.com/in/nathen/
• What is CI/CD?: https://about.gitlab.com/topics/ci-cd/
• Trunk-based development: https://cloud.google.com/architecture/devops/devops-tech-trunk-based-development
• DORA DevOps Quick Check: https://dora.dev/quickcheck/
• Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations: https://www.amazon.com/Accelerate-Software-Performing-Technology-Organizations/dp/1942788339
• The SPACE of Developer Productivity: https://queue.acm.org/detail.cfm?id=3454124
• DevOps Metrics: Nicole Forsgren and Mik Kersten: https://queue.acm.org/detail.cfm?id=3182626
• How to Measure Anything: Finding the Value of Intangibles in Business: https://www.amazon.com/How-Measure-Anything-Intangibles-Business/dp/1118539273/
• GitHub Copilot: https://github.com/features/copilot
• Tabnine: https://www.tabnine.com/the-leading-ai-assistant-for-software-development
• Nicole’s Decision-Making Spreadsheet: https://docs.google.com/spreadsheets/d/1wItAODkhZ-zKnnFbyDERCd8Hq2NQ03WPvCfigBQ5vpc/edit?usp=sharing
• How to do linear regression and correlation analysis: https://www.lennysnewsletter.com/p/linear-regression-and-correlation-analysis

Find all lightning round references at: https://www.lennyspodcast.com/

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com.

Lenny may be an investor in the companies discussed.

Nicole Forsgren (guest) · Lenny Rachitsky (host)

Jul 30, 2023 · 1h 16m

CHAPTERS

0:00 – 0:34
Start with a crisp problem statement (and why teams get stuck)
Nicole opens by stressing that most developer productivity initiatives fail early because the org isn’t aligned on the actual goal. “Improve developer experience” can mean culture, inner/outer loop friction, tooling, or something else entirely—and each implies a different strategy. The chapter sets up a recurring theme: clarity first, metrics second.
- Misalignment on the definition of the problem derails months of work
- “DevEx” can mean culture, toolchain friction, inner/outer loop, etc.
- Executives often underestimate how hard crisp goal-setting is
- Different goals require different measurement approaches
0:34 – 2:56
Show setup + what you’ll learn (DORA, SPACE, benchmarks, pitfalls)
Lenny introduces Nicole Forsgren and frames the episode around practical measurement and improvement of developer productivity. He previews the major frameworks (DORA and SPACE), what elite performance looks like, and common mistakes companies make. Sponsor messaging also reinforces the theme: productivity needs both qualitative and quantitative signals.
- Episode focus: measuring and improving engineering productivity and experience
- Key frameworks: DORA and SPACE
- Promise of benchmarks and implementation guidance
- Emphasis on combining qualitative + quantitative insights
2:56 – 7:54
Nicole’s career path: from IBM engineer to DevOps research leader
Nicole walks through her “choose your own adventure” background—software engineering, systems administration, and the pain that pushed her toward research. She explains how a technical PhD helped her connect engineering practices to business outcomes. The thread throughout is using data to drive better software delivery and org outcomes.
- Early experience: enterprise engineering + operations realities
- Pivot to a technical MIS PhD to “win with data”
- Research focus: linking software delivery practices to ROI and outcomes
- DORA, Google acquisition, GitHub, and Microsoft Research roles
7:54 – 10:06
Defining the terms: developer productivity vs. DevEx vs. DevOps
Nicole separates three commonly conflated ideas. Productivity is about outcomes over time (and must include sustainability), DevEx is the lived experience of building software (predictability, low friction), and DevOps is a set of capabilities—technical, architectural, cultural—that enable fast, reliable delivery. The distinctions matter because they change what you measure and optimize.
- Productivity: how much gets done over time; it cannot be brute-forced
- DevEx: developers as “users” of internal systems; reduce friction/uncertainty
- DevOps: capabilities and practices, not a tool you buy
- Wellbeing and community effects are core to sustainable productivity
10:06 – 13:43
The real business conversation: faster delivery without sacrificing reliability
Lenny and Nicole discuss why leaders want speed but worry about ROI and instability. Nicole contrasts old ITIL-style change control beliefs (long approvals = stability) with evidence that speed and stability can improve together. The goal becomes a focused, data-informed conversation about constraints and guardrails—not “go faster at any cost.”
- Common executive concerns: ROI, risk, reliability, “too fast = unstable”
- Old change-approval batching creates instability and delays feedback
- Use targeted questions to uncover what leaders truly fear or value
- Translate time spent in delivery steps into business-relevant value math
13:43 – 19:00
DORA explained: the four metrics and why speed and stability move together
Nicole unpacks the DORA research program and the famous four delivery performance metrics: lead time, deployment frequency, MTTR, and change fail rate. The key insight is counterintuitive: faster teams are typically more stable because they ship smaller changes with smaller blast radii. This chapter establishes “ship small, ship often” as a quality strategy, not just a speed tactic.
- DORA metrics: lead time, deploy frequency, MTTR, change fail rate
- Two speed + two stability metrics; they correlate strongly
- Small, frequent deploys reduce blast radius and simplify debugging
- Batching changes (weeks/months) increases merge conflicts and recovery time
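As a rough illustration of the four metrics above, here is a minimal Python sketch that computes them from a list of deployment records. The `Deployment` record shape and the `dora_metrics` function are hypothetical names made up for this example, and measuring lead time from commit to production deploy is an assumption about where the clock starts; a real pipeline would pull these timestamps from its own tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it cause a production failure?
    restored_at: Optional[datetime] = None   # when service was restored, if it failed


def dora_metrics(deploys: list[Deployment], window_days: int) -> dict:
    """Summarize the four delivery metrics over a window of `window_days` days."""
    if not deploys:
        raise ValueError("need at least one deployment")
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.failed]
    restores = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        # Speed metrics
        "median_lead_time": median(lead_times),
        "deploys_per_day": len(deploys) / window_days,
        # Stability metrics
        "mean_time_to_restore": (
            sum(restores, timedelta()) / len(restores) if restores else None
        ),
        "change_fail_rate": len(failures) / len(deploys),
    }


# Made-up sample data: four deploys over two days, one of which failed.
t0 = datetime(2023, 7, 1, 9, 0)
deploys = [
    Deployment(t0, t0 + timedelta(hours=2)),
    Deployment(t0, t0 + timedelta(hours=4), failed=True,
               restored_at=t0 + timedelta(hours=4, minutes=30)),
    Deployment(t0, t0 + timedelta(hours=6)),
    Deployment(t0, t0 + timedelta(hours=8)),
]
metrics = dora_metrics(deploys, window_days=2)
```

Reporting two speed numbers next to two stability numbers keeps the trade-off visible: a team cannot look good by only shipping faster or by only shipping less.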
19:00 – 22:33
Benchmarks for elite performance (and how precise you need to be)
Nicole shares the (2019) elite benchmarks and explains why category-level precision is usually sufficient for decision-making. They clarify what “lead time” means in practice: moving through the deployment pipeline to early production exposure/feedback. The discussion ties metrics back to developer experience: long delays force context reloading and painful interruptions.
- Elite benchmarks: on-demand deploys; <1 day lead time; <1 hour restore; 0–15% fail rate
- Precision is less important than knowing your current category and improving
- Lead time focuses on pipeline/feedback loops, not perfect “global rollout” end state
- Long lead times degrade DevEx via context switching and rework
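Since category-level precision is usually enough, a coarse check against the elite thresholds quoted above can be sketched in a few lines of Python. The function name `elite_gaps` is hypothetical, and treating "on-demand" as at least one deploy per day on average is an assumption made only so the check has a number to compare against.

```python
from datetime import timedelta

# Elite thresholds as quoted in the episode (2019 benchmarks).
ELITE_MAX_LEAD_TIME = timedelta(days=1)
ELITE_MAX_RESTORE_TIME = timedelta(hours=1)
ELITE_MAX_CHANGE_FAIL_RATE = 0.15
# Assumption: "on-demand" is approximated as at least one deploy per day.
ASSUMED_MIN_DEPLOYS_PER_DAY = 1.0


def elite_gaps(lead_time, restore_time, change_fail_rate, deploys_per_day):
    """Return the names of the metrics that miss the elite thresholds."""
    gaps = []
    if lead_time >= ELITE_MAX_LEAD_TIME:
        gaps.append("lead_time")
    if deploys_per_day < ASSUMED_MIN_DEPLOYS_PER_DAY:
        gaps.append("deploy_frequency")
    if restore_time >= ELITE_MAX_RESTORE_TIME:
        gaps.append("time_to_restore")
    if change_fail_rate > ELITE_MAX_CHANGE_FAIL_RATE:
        gaps.append("change_fail_rate")
    return gaps


# A team inside every threshold has no gaps.
fast_team = elite_gaps(timedelta(hours=5), timedelta(minutes=30), 0.10, 2.0)
# A team that batches changes misses on speed and on fail rate.
slow_team = elite_gaps(timedelta(days=3), timedelta(minutes=30), 0.20, 0.5)
```

Returning the list of gaps rather than a single label points at what to improve next, which matches the advice to know your current category and move from there.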
22:33 – 25:33
Why company size doesn’t matter (and why retail outperformed)
Nicole addresses the assumption that DORA benchmarks only fit startups or only fit large enterprises. Their research found no meaningful performance difference by company size—both small and large orgs try to explain away results. The one notable outlier was retail, which tended to perform better, likely due to market “natural selection” pressures.
- No statistically significant performance difference between small vs. large companies
- Common excuses exist on both sides: “we’re too complex” vs. “we lack resources”
- Retail performed better as a significant outlier
- Hypothesis: survival pressure (cloud scaling, peak events) forced excellence
25:33 – 29:23
How to improve: work backward from outcomes to DevOps capabilities
Nicole responds to a common critique: metrics can feel like judgment without a path to improvement. She explains that DORA is more than the four metrics—there’s a full set of capabilities that predict performance. Working backward links business outcomes to delivery performance, then to concrete technical, architectural, and cultural practices to invest in.
- DORA isn’t just metrics; it’s a research-backed capability model
- DevOps is capabilities (technical/architectural/cultural/lean), not “a DevOps tool”
- Examples: automated testing, CI/CD, trunk-based dev, version control discipline
- Architecture matters: loose coupling, cloud (done well), and enabling fast feedback
29:23 – 35:39
SPACE framework: choosing balanced productivity metrics beyond delivery speed
Nicole introduces SPACE as a framework for measuring complex creative work, including developer productivity. Unlike DORA’s fixed four, SPACE helps you select context-appropriate metrics across five dimensions to avoid over-optimizing one signal. The rule of thumb: use at least three dimensions at once to keep metrics in balance.
- SPACE dimensions: Satisfaction, Performance, Activity, Communication/Collaboration, Efficiency/Flow
- Designed for metric selection in context (not a single prescribed dashboard)
- Use at least three dimensions to balance incentives and avoid metric gaming
- DORA can be seen as a SPACE implementation focused on the “outer loop”
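The "at least three dimensions" rule of thumb can be expressed as a simple check on a proposed metric set. This is a minimal Python sketch; the metric names and their mapping to dimensions are illustrative assumptions, not a list prescribed by SPACE.

```python
from enum import Enum


class Dimension(Enum):
    SATISFACTION = "Satisfaction"
    PERFORMANCE = "Performance"
    ACTIVITY = "Activity"
    COMMUNICATION = "Communication/Collaboration"
    EFFICIENCY = "Efficiency/Flow"


def covered_dimensions(metrics: dict[str, Dimension]) -> set[Dimension]:
    """Return the distinct SPACE dimensions a metric set touches."""
    return set(metrics.values())


def is_balanced(metrics: dict[str, Dimension], minimum: int = 3) -> bool:
    """True if the metric set spans at least `minimum` dimensions."""
    return len(covered_dimensions(metrics)) >= minimum


# Hypothetical metric sets; names and dimension assignments are illustrative.
activity_only = {
    "pr_count": Dimension.ACTIVITY,
    "commit_count": Dimension.ACTIVITY,
}
mixed = {
    "pr_count": Dimension.ACTIVITY,
    "quarterly_satisfaction_survey": Dimension.SATISFACTION,
    "change_fail_rate": Dimension.PERFORMANCE,
}

activity_only_ok = is_balanced(activity_only)   # a single dimension
mixed_ok = is_balanced(mixed)                   # three dimensions
```

Counting distinct dimensions rather than metrics is the point: ten activity metrics still measure only one thing.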
35:39 – 37:52
Measuring satisfaction (and why surveys aren’t “soft” data)
The conversation drills into satisfaction and wellbeing—how to measure it and why it’s predictive. Nicole argues that dismissing self-report data as “people lie” is misguided; system data can be incomplete or misleading too. The best approach triangulates: instrument what you can, survey periodically, and use each data type to reveal what the other cannot.
- Measure satisfaction via periodic surveys; don’t over-sample continuously
- Wellbeing correlates strongly with other productivity dimensions and failure modes
- Avoid single-dimension activity traps (e.g., lines of code, commits, PR count)
- People-data reveals hidden heroics or broken processes that systems won’t show
37:52 – 45:43
Where to go deeper: papers, tools, and Nicole’s book-in-progress
Nicole shares resources for teams trying to implement measurement well, including the SPACE paper and work on combining system + self-report data. She also previews a forthcoming book focused on the end-to-end measurement journey, especially for orgs starting from “nothing.” The goal is practical accessibility: scripts, examples, and guidance that don’t require a full data science team.
- Recommended reading: SPACE paper (ACM Queue), people+systems data paper, Hubbard’s How to Measure Anything
- Key idea: measurement maturity shifts from people-data heavy to more scalable system telemetry over time
- New book aims to make measurement approachable: interview scripts, surveys, analyses
- “Don’t let perfect be the enemy of good” in early measurement efforts
45:43 – 50:07
Rollout pitfalls, how the DevOps space evolved, and the communication imperative
Nicole outlines common failure modes when organizations attempt DevEx/productivity initiatives. The big ones: unclear goals, lack of top-down and bottom-up alignment, and poor communication that prevents buy-in. She also describes how the industry evolved—more companies now accept they’re tech-driven—and why communicating in audience-relevant language is a force multiplier.
- Pitfall: unclear objective leads to “a thousand flowers bloom” and scattered effort
- Pitfall: not combining executive sponsorship with IC-level engagement
- Communication matters: vocabulary, motivations, and value framing differ by audience
- Industry shift: most orgs now recognize software delivery as strategic, not optional
50:07 – 54:05
AI and developer productivity: new workflows, new risks, and a ‘trust’ dimension
Nicole discusses how AI coding tools change the nature of work: developers may spend more time reviewing than writing, and productivity isn’t just “tasks done faster.” She warns against simplistic conclusions like reducing headcount based on speed-ups. A key open question is whether frameworks like SPACE need an explicit dimension for trust/reliability/overreliance in AI-assisted development.
- AI tools shift effort from writing code toward reviewing and validating suggestions
- Productivity gains should translate into tackling harder problems, not just doing the same work faster
- Risks: overreliance, changed mental models, cognitive load shifts, learning impacts
- Potential evolution: add a trust/reliability dimension to productivity measurement
54:05 – 57:29
First steps + exemplar orgs: define it, find signals, learn from Google’s approach
Nicole gives pragmatic guidance for getting started: write down the problem crisply and identify any existing signals tied to it. She highlights Google as a model for systematic measurement, strong telemetry, and continued investment in developer surveys. A notable lesson: when survey results disagree with instrumentation, the surveys are often right—because telemetry can miss what matters.
- Step 1: ensure the challenge/problem is written down and unambiguous
- Step 2: find any relevant signals (broadly defined) and build from there
- Google example: phased measurement, rich telemetry, and ongoing DevEx surveys
- Triangulation lesson: survey data often corrects incomplete or misleading instrumentation
57:29 – 1:04:13
Nicole’s Four-Box framework: turning ideas into measurable hypotheses
Nicole explains a simple tool she uses on napkins: two “words” boxes (a causal hypothesis) above two “data” boxes (the metrics/proxies to test it). The framework forces clarity, helps align stakeholders before analysis, and separates disputes about the idea from disputes about the proxies or data quality. She also notes an “advanced mode”: starting from available data, then translating back into testable statements.
- Top row (words): define the hypothesized relationship in plain language
- Bottom row (data): choose measurable proxies for each concept
- If results fail, debug the proxies/data, not the people, then revisit the hypothesis
- Prevents spurious correlation hunting by requiring a communicable sentence first
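One way to picture the four boxes is as a small data structure: two word boxes that form a sentence and two data boxes that hold the proxies. The Python sketch below is an illustration only. The `FourBox` class, the example hypothesis, and the sample numbers are all made up, and using a Pearson correlation as the test is an assumption; the episode does not prescribe a particular analysis.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class FourBox:
    cause_words: str            # top-left box: the concept believed to drive change
    effect_words: str           # top-right box: the outcome it is believed to affect
    cause_proxy: list[float]    # bottom-left box: data standing in for the cause
    effect_proxy: list[float]   # bottom-right box: data standing in for the effect

    def sentence(self) -> str:
        """The communicable hypothesis, written before any analysis."""
        return f"{self.cause_words} leads to {self.effect_words}"

    def correlation(self) -> float:
        """Pearson correlation between the two proxies."""
        x, y = self.cause_proxy, self.effect_proxy
        if len(x) != len(y) or len(x) < 2:
            raise ValueError("proxies need equal length and at least two points")
        mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
        spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
        if spread_x == 0 or spread_y == 0:
            raise ValueError("a proxy with no variation cannot be correlated")
        return cov / (spread_x * spread_y)


# Hypothetical example: per-team review turnaround (hours) vs. survey score (1-5).
box = FourBox(
    cause_words="Slower code review turnaround",
    effect_words="lower developer satisfaction",
    cause_proxy=[2.0, 4.0, 6.0, 8.0],
    effect_proxy=[4.5, 4.0, 3.5, 3.0],
)
hypothesis = box.sentence()
r = box.correlation()
```

Keeping the words and the proxies in separate fields mirrors the framework's main benefit: a surprising result can be traced to a bad proxy or bad data before anyone argues about the idea itself. A correlation alone does not establish the causal claim in the top row.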
1:04:13 – 1:16:16
Decision-making systems + lightning round (books, shows, hiring, habits)
Nicole shares her decision-making spreadsheet approach: define options, criteria, weights, and scores—then use the process to clarify what you actually value. She emphasizes being data-informed rather than data-driven, and ties it to strategy: good strategy requires choosing what not to do. The episode closes with a lightning round covering books, media, interview questions, and small process changes that improve execution.
- Decision process: options → criteria → weights (sum to 100%) → scoring → reflection
- Often the exercise reveals the decision before the math is finished
- Strategy lens: success requires not funding/doing everything
- Lightning round topics: favorite books, shows, interview prompts, and small impactful process changes
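The options → criteria → weights → scoring steps map naturally onto a weighted sum. Here is a minimal Python sketch of that arithmetic; the function name, the criteria, the 1-5 scoring scale, and all numbers are hypothetical and are not taken from Nicole's spreadsheet.

```python
def weighted_scores(
    weights: dict[str, float],
    scores: dict[str, dict[str, float]],
) -> dict[str, float]:
    """Combine per-criterion scores into one weighted score per option.

    `weights` maps criterion -> percentage and must sum to 100.
    `scores` maps option -> {criterion -> score}.
    """
    total = sum(weights.values())
    if abs(total - 100) > 1e-9:
        raise ValueError(f"weights must sum to 100, got {total}")
    result = {}
    for option, by_criterion in scores.items():
        missing = set(weights) - set(by_criterion)
        if missing:
            raise ValueError(f"{option} is missing scores for {sorted(missing)}")
        result[option] = sum(weights[c] / 100 * by_criterion[c] for c in weights)
    return result


# Hypothetical criteria, weights (percent), and 1-5 scores.
weights = {"impact": 50, "cost": 30, "learning": 20}
scores = {
    "option_a": {"impact": 4, "cost": 2, "learning": 5},
    "option_b": {"impact": 3, "cost": 5, "learning": 2},
}
totals = weighted_scores(weights, scores)
best = max(totals, key=totals.get)
```

Rejecting weights that do not sum to 100 forces the trade-offs into the open, which is where the exercise tends to reveal what you actually value.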
