The ultimate guide to A/B testing | Ronny Kohavi (Airbnb, Microsoft, Amazon)

EPISODE INFO

Released: July 27, 2023
Duration: 1h 23m
Channel: Lenny's Podcast

EPISODE DESCRIPTION

Ronny Kohavi, PhD, is a consultant, teacher, and leading expert on the art and science of A/B testing. Previously, Ronny was Vice President and Technical Fellow at Airbnb, Technical Fellow and corporate VP at Microsoft (where he led the Experimentation Platform team), and Director of Data Mining and Personalization at Amazon. He was also honored with a lifetime achievement award by the Experimentation Culture Awards in September 2020 and teaches a popular course on experimentation on Maven. In today’s podcast, we discuss:
How to foster a culture of experimentation
How to avoid common pitfalls and misconceptions when running experiments
His most surprising experiment results
The critical role of trust in running successful experiments
When not to A/B test something
Best practices for helping your tests run faster
The future of experimentation
Enroll in Ronny’s Maven class, Accelerating Innovation with A/B Testing, at https://bit.ly/ABClassLenny. Promo code “LENNYAB” will give $500 off the class for the first 10 people to use it.
Brought to you by:
Mixpanel, event analytics that everyone can trust, use, and afford: https://mixpanel.com/startups
Round, the private network built by tech leaders for tech leaders: https://www.round.tech/apply?utm_campaign=lennys-letter&utm_medium=email-ad&utm_source=email-marketing&utm_content=send-2-2023-07-27
Eppo, run reliable, impactful experiments: https://www.geteppo.com/
Find the full transcript at: https://www.lennysnewsletter.com/p/the-ultimate-guide-to-ab-testing
Where to find Ronny Kohavi:
Twitter: https://twitter.com/ronnyk
LinkedIn: https://www.linkedin.com/in/ronnyk/
Website: http://ai.stanford.edu/~ronnyk/
Where to find Lenny:
Newsletter: https://www.lennysnewsletter.com
Twitter: https://twitter.com/lennysan
LinkedIn: https://www.linkedin.com/in/lennyrachitsky/
In this episode, we cover:
(00:00) Ronny’s background
(04:29) How one A/B test helped Bing increase revenue by 12%
(09:00) What data says about opening new tabs
(10:34) Small effort, huge gains vs. incremental improvements
(13:16) Typical fail rates
(15:28) UI resources
(16:53) Institutional learning and the importance of documentation and sharing results
(20:44) Testing incrementally and acting on high-risk, high-reward ideas
(22:38) A failed experiment at Bing on integration with social apps
(24:47) When not to A/B test something
(27:59) Overall evaluation criterion (OEC)
(32:41) Long-term experimentation vs. models
(36:29) The problem with redesigns
(39:31) How Ronny implemented testing at Microsoft
(42:54) The stats on redesigns
(45:38) Testing at Airbnb
(48:06) Covid’s impact and why testing is more important during times of upheaval
(50:06) Ronny’s book, Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing
(51:45) The importance of trust
(55:25) Sample ratio mismatch and other signs your experiment is flawed
(1:00:44) Twyman’s law
(1:02:14) P-value
(1:06:27) Getting started running experiments
(1:07:43) How to shift the culture in an org to push for more testing
(1:10:18) Building platforms
(1:12:25) How to improve speed when running experiments
(1:14:09) Lightning round
Referenced:
Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing: https://experimentguide.com/
Seven rules of thumb for website experimenters: https://exp-platform.com/rules-of-thumb/
GoodUI: https://goodui.org
Defaults for A/B testing: http://bit.ly/CH2022Kohavi
Ronny’s LinkedIn post about A/B testing for startups: https://www.linkedin.com/posts/ronnyk_abtesting-experimentguide-statisticalpower-activity-6982142843297423360-Bc2U
Sanchan Saxena on Lenny’s Podcast: https://www.lennyspodcast.com/sanchan-saxena-vp-of-product-at-coinbase-on-the-inside-story-of-how-airbnb-made-it-through-covid-what-he8217s-learned-from-brian-chesky-brian-armstrong-and-kevin-systrom-much-more/
Optimizely: https://www.optimizely.com/
Optimizely was statistically naive: https://analythical.com/blog/optimizely-got-me-fired
SRM: https://www.linkedin.com/posts/ronnyk_seat-belt-wikipedia-activity-6917959519310401536-jV97
SRM checker: http://bit.ly/srmCheck
Twyman’s law: http://bit.ly/twymanLaw
“What’s a p-value” question: http://bit.ly/ABTestingIntuitionBusters
Fisher’s method: https://en.wikipedia.org/wiki/Fisher%27s_method
Evolving experimentation: https://exp-platform.com/Documents/2017-05%20ICSE2017_EvolutionOfExP.pdf
CUPED for variance reduction/increased sensitivity: http://bit.ly/expCUPED
Ronny’s recommended books: https://bit.ly/BestBooksRonnyk
Chernobyl on HBO: https://www.hbo.com/chernobyl
Blink cameras: https://blinkforhome.com/
Narrative not PowerPoint: https://exp-platform.com/narrative-not-powerpoint/
Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com. Lenny may be an investor in the companies discussed.

SPEAKERS

Ronny Kohavi: guest
Lenny Rachitsky: host
Narrator: other

EPISODE SUMMARY

In this episode of Lenny's Podcast, host Lenny Rachitsky talks with Ronny Kohavi about how to build a trustworthy experimentation culture. Kohavi, a leading authority on A/B testing, shares hard-won lessons from building experimentation platforms at Amazon, Microsoft/Bing, and Airbnb. He emphasizes that most experiments fail (often 70–90%), so organizations must embrace humility, rigorous statistics, and strong guardrails rather than chase silver bullets. The conversation covers when to start experimenting, how to define the right success metric (the overall evaluation criterion, or OEC), why trust and platform quality matter more than speed, and how to institutionalize learning from both wins and surprising failures. Kohavi also addresses the concern that experimentation kills innovation, arguing instead for a portfolio of many small optimizations plus a minority of high-risk, high-reward bets, all tested rigorously.
