The ultimate guide to A/B testing | Ronny Kohavi (Airbnb, Microsoft, Amazon)

Lenny's Podcast | Jul 27, 2023 | 1h 23m

Ronny Kohavi (guest), Lenny Rachitsky (host), Narrator

High failure rates of experiments and what they imply for product development
Designing an Overall Evaluation Criterion (OEC) and focusing on lifetime value
Building an experimentation platform and creating an experiment-driven culture
Common pitfalls in A/B testing: p-values, false positives, and sample ratio mismatches
Balancing micro-optimizations with big bets and large redesigns
Institutional learning: documenting, searching, and reusing experiment insights
When and how startups should begin running controlled experiments

Ronny Kohavi Reveals How To Build Truly Trustworthy Experiment Cultures

Ronny Kohavi, a leading authority on A/B testing, shares hard-won lessons from building experimentation platforms at Amazon, Microsoft/Bing, and Airbnb. He emphasizes that most experiments fail (often 70–90%), so organizations must embrace humility, rigorous statistics, and strong guardrails rather than chasing silver bullets. The conversation covers when to start experimenting, how to define the right success metric (OEC), why trust and platform quality matter more than speed, and how to institutionalize learning from both wins and surprising failures. Kohavi also addresses concerns that experimentation kills innovation, arguing instead for a portfolio of small optimizations and a minority of high-risk, high-reward bets—all tested rigorously.

Key Takeaways

Expect most experiments to fail—and plan portfolios accordingly.

Across Microsoft, Bing, and Airbnb, 66–92% of experiments did not improve the target metric. ...

Test everything once you have scale; even tiny changes can have huge impact.

Kohavi advocates that every code change eventually be in an experiment because small tweaks (like reordering ad lines at Bing) sometimes drive massive revenue shifts. ...

Define a clear Overall Evaluation Criterion that reflects long-term value.

Optimizing for a single short-term metric (e. ...

Trust in the experimentation platform is more important than speed.

If people don’t trust the stats, they’ll ignore or override results. ...
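One trust check named among the episode's topics is the sample ratio mismatch (SRM): the observed split of users between variants differs from the split the experiment was designed to produce. The sketch below is an illustration, not the method of any particular platform. It assumes a two-variant test, a chi-square goodness-of-fit test with one degree of freedom, and an alert threshold of 0.001; the function names and the threshold are assumptions chosen for this example.

```python
import math


def srm_p_value(control_users, treatment_users, expected_control_share=0.5):
    """Return the chi-square p-value for the observed control/treatment split.

    Assumes exactly two variants, so the test has one degree of freedom.
    """
    total = control_users + treatment_users
    if total <= 0:
        raise ValueError("need at least one user")
    if not 0.0 < expected_control_share < 1.0:
        raise ValueError("expected_control_share must be between 0 and 1")

    expected_control = total * expected_control_share
    expected_treatment = total - expected_control

    chi2 = (
        (control_users - expected_control) ** 2 / expected_control
        + (treatment_users - expected_treatment) ** 2 / expected_treatment
    )
    # With one degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)), so no external library is needed.
    return math.erfc(math.sqrt(chi2 / 2))


def has_srm(control_users, treatment_users, expected_control_share=0.5,
            threshold=0.001):
    """Flag a likely sample ratio mismatch (threshold is an assumed default)."""
    p = srm_p_value(control_users, treatment_users, expected_control_share)
    return p < threshold


if __name__ == "__main__":
    # Hypothetical counts for a test designed as a 50/50 split.
    print(has_srm(50_000, 48_000))
    print(has_srm(5_000, 5_100))
```

A flagged experiment is usually treated as untrustworthy until the cause of the imbalance is found, because the metric comparison itself may be biased.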

Beware p-value myths and the high false positive risk in environments where few experiments succeed.

A p-value of 0. ...
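The link between a low success rate and false positive risk can be made concrete with Bayes' rule. The sketch below is an illustration rather than a calculation quoted from the episode. It assumes a two-sided significance level of 0.05 (so 0.025 in the direction of improvement), statistical power of 0.8, and a prior success rate equal to the share of experiments that truly improve the metric; the function name and the defaults are assumptions.

```python
def false_positive_risk(success_rate, alpha=0.05, power=0.8):
    """Probability that a statistically significant win is not a real improvement.

    success_rate: assumed prior share of experiments with a true improvement.
    alpha: two-sided significance level; only the half in the positive
           direction counts as a false "win".
    power: assumed probability of detecting a true improvement.
    """
    if not 0.0 < success_rate < 1.0:
        raise ValueError("success_rate must be between 0 and 1")

    one_sided_alpha = alpha / 2
    # Significant wins that arise when there is no real improvement.
    false_wins = one_sided_alpha * (1 - success_rate)
    # Significant wins that arise from real improvements.
    true_wins = power * success_rate
    return false_wins / (false_wins + true_wins)


if __name__ == "__main__":
    # Success rates implied by the 66-92% failure range cited in the takeaways.
    for rate in (0.34, 0.08):
        risk = false_positive_risk(rate)
        print(f"success rate {rate:.0%}: false positive risk {risk:.1%}")
```

Under these assumptions, an 8% success rate means roughly one in four statistically significant wins is not real, which is why a significant result in a low-success-rate environment deserves replication before it is celebrated.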

Large redesigns usually fail; decompose them into testable increments.

Big-bang redesigns frequently underperform and are hard to roll back due to sunk costs and organizational momentum. ...

Institutionalize learning from surprising results, not just wins.

Teams should document experiments, run regular reviews of the most surprising outcomes (positive and negative), and make results searchable. ...

Notable Quotes

“I'm a big fan of test everything… any code change that you make, any feature that you introduce has to be in some experiment.”
— Ronny Kohavi

“Of these experiments, 92% failed to improve the metric that we were trying to move.”
— Ronny Kohavi (on Airbnb Search relevance experiments)

“We are often humbled by how bad we are at predicting the outcome of experiments.”
— Ronny Kohavi

“If you go for something big, try it out, but be ready to fail 80% of the time.”
— Ronny Kohavi

“If something looks too good to be true, investigate… hold the celebratory dinner.”
— Ronny Kohavi (on Twyman’s law)

Questions Answered in This Episode

How should a company decide the right balance between incremental experiments and high-risk, high-reward bets in its roadmap?


What’s the most practical way for a mid-stage startup to define an OEC that truly captures long-term user value, not just short-term revenue?

How can teams retrofit trust into an existing experimentation stack that may already have produced untrustworthy or biased results?

In a low-traffic product, what alternatives or adaptations to classic A/B testing does Kohavi recommend to still make evidence-based decisions?

How should leaders handle situations where a visionary, top-down idea fails in experiments but is still strongly favored by executives?

Transcript Preview

Ronny Kohavi

I'm very clear that I'm a big fan of test everything, which is any code change that you make, any feature that you introduce has to be in some experiment because, again, I've observed this sort of surprising result that even small bug fixes, even small changes can sometimes have surprising unexpected impact. And so I don't think it's possible to experiment too much. You have to allocate some time to these high-risk, high-reward ideas. We're gonna try something that's most likely to fail, but if it does win, it's gonna be a home run. And you have to be ready to understand and agree that most will fail. It's amazing how many times I've seen people come up with new designs or a radical new idea and they believe in it, and that's okay. I'm just cautioning them all the time to say, "If you go for something big, try it out, but be ready to fail 80% of the time."

Lenny Rachitsky

(instrumental music) Welcome to Lenny's Podcast, where I interview world-class product leaders and growth experts to learn from their hard-won experiences building and growing today's most successful products. Today my guest is Ronny Kohavi. Ronny is seen by many as the world expert on A/B testing and experimentation. Most recently, he was VP and technical fellow of Relevance at Airbnb, where he led their search experience team. Prior to that, he was corporate vice president at Microsoft, where he led Microsoft's experimentation platform team. Before that, he was director of data mining and personalization at Amazon. He's currently a full-time advisor and instructor. He's also the author of the go-to book on experimentation called Trustworthy Online Controlled Experiments. And in our show notes, you'll find a code to get a discount on taking his live cohort-based course on Maven.

In our conversation, we get super tactical about A/B testing. Ronny shares his advice for when you should start considering running experiments at your company, how to change your company's culture to be more experiment-driven, what are signs your experiments are potentially invalid, why trust is the most important element of a successful experiment culture and platform, how to get started if you want to start running experiments at your company. He also explains what actually is a p-value and something called Twyman's law, plus some hot takes about Airbnb and experiments in general. This episode is for anyone who is interested in either creating an experiment-driven culture at their company or just fine-tuning one that already exists. Enjoy this episode with Ronny Kohavi after a short word from our sponsors.

This episode is brought to you by Mixpanel. Get deep insights into what your users are doing at every stage of the funnel at a fair price that scales as you grow. Mixpanel gives you quick answers about your users from awareness to acquisition through retention. And by capturing website activity, ad data, and multi-touch attribution right in Mixpanel, you can improve every aspect of the full user funnel. Powered by first-party behavioral data instead of third-party cookies, Mixpanel is built to be more powerful and easier to use than Google Analytics. Explore plans for teams of every size and see what Mixpanel can do for you at mixpanel.com/friends/lenny. And while you're at it, they're also hiring, so check it out at mixpanel.com/friends/lenny.

This episode is brought to you by Round. Round is the private network built by tech leaders for tech leaders. Round combines the best of coaching, learning, and authentic relationships to help you identify where you want to go and accelerate your path to get there, which is why their wait list tops thousands of tech execs. Round is on a mission to shape the future of technology and its impact on society. Leading in tech is uniquely challenging, and doing it well is easiest when surrounded by leaders who understand your day-to-day experiences. When we're meeting and building relationships with the right people, we're more likely to learn, find new opportunities, be dynamic in our thinking, and achieve our goals. Building and managing your network doesn't have to feel like networking. Join Round to surround yourself with leaders from tech's most innovative companies. Build relationships, be inspired, take action. Visit round.tech/apply and use promo code LENNY to skip the wait list. That's round.tech/apply.

Ronny, welcome to the podcast.
