
AI for Product Managers: 10X Growth with Smart Experimentation

AI has completely transformed how we run experiments. What used to take weeks can now happen in minutes. Frederic De Todaro, CPO at Kameleoon with 12+ years helping thousands of teams, reveals exactly how AI is revolutionizing experimentation, from ideation to analysis.

⏰ Timestamps:
00:00 How AI Changed Experimentation Overview
01:54 The 4 Steps of Experimentation Framework
14:12 ADS
16:00 How AI has Changed Experimentation
21:08 User Behaviour Models
26:56 Multi-Armed Bandit vs Contextual Bandit
30:05 ADS
31:55 AI Content Generation
35:13 How Vibe Coding Changes Experimentation
41:35 Live Demo: From Idea to Running Experiment in 2 Minutes
43:36 Two-Minute Build Achievement
51:49 How to Measure AI Features Properly
54:17 Measuring RAG Systems: 3 Key Metrics
01:07:18 Best Experimentation Company: Booking.com
01:10:10 Biggest PM Mistakes in Experimentation
01:13:52 Ending

Transcript: https://www.news.aakashg.com/p/frederic-de-todaro-podcast

🏆 Thanks to our sponsors:
1. Mobbin: Discover real-world design inspiration: https://mobbin.com/aakash
2. Jira Product Discovery: Build the right thing, reliably: https://www.atlassian.com/software/jira/product-discovery
3. Product Faculty: Get $550 off: https://maven.com/product-faculty/ai-product-strategy-certificate?promoCode=AAKASH550C1
4. Maven: Get $100 off my curation of their top courses: http://maven.com/x/aakash

Key Takeaways:
1. The Build Bottleneck Is Dead. Most product ideas never get tested because building takes weeks. AI just killed this constraint: you can now go from idea to live experiment in 2 minutes using plain-English prompts.
2. Prompt Your Way to Tests. Type "change sorting to price low to high" and AI builds the variation in 2 minutes. Still run it through design, engineering, and data reviews, but now you're reviewing the actual live variation, not specs.
3. Beyond Text: Draw Your Ideas. Upload mockups or sketch rough concepts. AI transforms drawings into live experiments you can actually review: newsletter popups, onboarding flows, layout changes. Share preview links with stakeholders before going live.
4. AI Reads User Intent. Like a digital sales rep, AI scores every visitor's conversion likelihood in real time. Show discounts only to users who need them to buy, not to everyone who visits your site.
5. Failed Tests Become Wins. 80% of experiments fail overall, but AI automatically finds segments where they succeed. "Failed globally but increased mobile conversions 25%": insights that would take hours to find manually.
6. Speed vs. Accuracy Trade-offs. Multi-armed bandits optimize news headlines in hours, not weeks: perfect when time beats perfect measurement. Contextual bandits personalize every individual user's experience.
7. Humans Still Drive Strategy. PMs bring business context AI doesn't have: customer constraints, strategic priorities, success metrics. Data scientists validate statistical approaches. Designers review brand compliance. AI handles building variations fast.
8. Measure What Actually Matters. Track business metrics, not just usage: prompts needed per experiment, time from idea to live test, developer dependency rate. If you still need developers 80% of the time, AI isn't solving your bottleneck.
9. Discovery Meets Testing. User interviews reveal what people say they want. Experiments show what they actually do. Combine both for complete insight: validate problems through discovery, solutions through testing.
10. Experimentation Culture Wins. Harvard Business Review found a direct correlation between experiments run annually and revenue growth. More experiments = faster growth. AI finally makes this accessible to every team.

👨‍💻 Where to find Fred:
LinkedIn: https://www.linkedin.com/in/fdetodaro/
Kameleoon: https://kameleoon.com

👨‍💻 Where to find Aakash:
Twitter: https://www.twitter.com/aakashg0
LinkedIn: https://www.linkedin.com/in/aagupta/

#ai #experimentation #abtesting #productmanagement

🧠 About Product Growth: The world's largest podcast focused solely on product + growth, with over 180K listeners. Hosted by Aakash Gupta, who spent 16 years in PM, rising to VP of Product, this 2x/week show covers product and growth topics in depth.

🔔 Subscribe and turn on notifications to master AI-powered experimentation!

Aakash Gupta (host) · Frederic De Todaro (guest)
Aug 29, 2025 · 1h 15m · Watch on YouTube ↗

EVERY SPOKEN WORD

  1. 0:00-1:54

    How AI Changed Experimentation Overview

    1. AG

      How has AI changed experimentation?

    2. FT

      Yeah, AI has been the biggest driver of change in experimentation I've seen in all my career. I can't believe more people aren't using prompt-based experimentation today.

    3. AG

      Fred De Todaro is one of the world's most knowledgeable people on AI and experimentation. What company do you think is doing experimentation best?

    4. FT

      I'm not going to choose the usual ones like Netflix, Airbnb, Microsoft, and so on. I'm thinking about a company called Booking.com. [chuckles]

    5. AG

      What are the biggest mistakes PMs make when it comes to experimentation?

    6. FT

      I think the top three I hear the most about-

    7. AG

      [beep] How would you define the roles of PM, data scientist, and AI analyst?

    8. FT

      The PM and the data scientist have to work together to make sure that the business context is there and the constraints are known.

    9. AG

      How has AI changed experimentation over time?

    10. FT

      If you look at the past, I think there have been two big waves: the machine learning wave, which started around 2016, and then the generative AI wave, which really kicked off in 2022.

    11. AG

      What is the precise difference, and what exactly are these bandit algorithms? Talk to me a little bit about content generation. Do you have any stories yourself that your team has used?

    12. FT

      So content generation: you basically take content on your website, and then you ask the AI to come up with an alternative version of it, right? That's the goal of content generation.

    13. AG

      This is one of the coolest demos out there. Would you be willing to show us how this all works? Fred, welcome to the podcast.

    14. FT

      Hi. Thanks for having me.

    15. AG

      So Fred, has AI really changed experimentation much?

    16. FT

      Yeah, AI has been the biggest driver of change in experimentation I've seen in my career.

    17. AG

      How has AI really changed experimentation?

    18. FT

      Yeah, sure. So maybe let's take a step back for a second. If you look at experimentation, well, it usually comes down

  2. 1:54-14:12

    The 4 Steps of Experimentation Framework

    1. FT

      to four simple steps. You have an idea, then you make an assumption: if you release that feature in production, it will increase that metric by X, because of these reasons, right? Then you build the experiment, the variations. Then you set up the targeting: who are you going to target with that experiment? Is it all your users or a segment of users? And then the KPIs, which is an important step as well: how will you measure the success of that experiment? And then you look at your results, right? So it's a simple loop. You iterate as long as you learn something with your experiments.
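A minimal sketch of Fred's four-step loop as a data structure, for readers who want to see it concretely; the field names here are hypothetical, not from any particular tool:

```python
from dataclasses import dataclass, field

@dataclass
class Experiment:
    """One pass through the four-step loop: idea, assumption, build/configure, results."""
    idea: str        # step 1: the idea
    hypothesis: str  # step 2: "releasing X will move metric Y by Z because ..."
    variations: list[str] = field(default_factory=lambda: ["control", "variant_b"])  # step 3: build
    target_segment: str = "all_users"     # step 3: who sees the experiment
    primary_kpi: str = "conversion_rate"  # step 3: how success is measured
    result: str | None = None             # step 4: feeds the next iteration of the loop
```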

    2. AG

      Okay. So there's four steps, and I imagine you're gonna map this against how AI affects each of these four steps?

    3. FT

      Yeah, exactly. If you look at the hardest part of experimentation, it has always been the build phase. That's why over the past decade, most experimentation tools have been introducing visual editors to build variations, or to let anyone create variations. Even dedicated feature experimentation vendors have been introducing them very recently. But the truth is no one really delivered on that premise today. Most tools still rely a lot on developers, and they are already busy building the next features in your roadmap. So as a result, most teams do not A/B test the majority of what they deliver to their users. They run meetings just to prioritize the experiments that might be worth building, and then they wait one or two sprints to see anything go live.

    4. AG

      Yeah. I think this is an important point, right? If we look at the life cycle of experimentation, between ideate, build, configure, and analyze, build is the bottleneck. That's kind of the tragedy of the commons we're all facing, where very few teams are actually experimenting on a high percentage of their releases. And it seems like that's the area where AI could be the most interesting. Before we get into that build phase, I wanna talk a little bit about the idea phase, right? So we have here, you know, a three-out-of-five perceived difficulty for folks, but how powerful can AI be in ideation?

    5. FT

      Well, it can be powerful. When you look at the history, it became possible during the generative AI wave, back in 2022. You were able to ask an AI to give you some ideas, right? Limited to content generation at the beginning. But then, since the release of the latest GPT models, for instance GPT-4, things have changed a lot. You can really ask the AI to come up with ideas. You provide a page, and the AI will try to come up with ideas. And this is where it starts, right? Because the AI, to some extent, has or can assume some context: what's your business about? Who are your users? What framework do you use behind your website? With all that context, idea generation is possible. We'll see some of it later on, during the demonstration as well.

    6. AG

      Okay. So I think this is a really, really powerful set of techniques. The way I personally have been coaching people to deploy this, if they don't have it built into their experimentation tool, is to set up the AI with all the context you mentioned and take advantage of the latest AI features. So I'll do something like either have a really strong prompt that has context on features we've shipped in the past, the reasons we have certain messaging, the metrics we have seen, or better yet, create something like a Claude or ChatGPT project, depending on whatever AI you have access to at work, that loads it up with all of this context so you can then ask it the right questions, and maybe even include a screenshot of the page. It feels like the better the context you can engineer for the AI, the better the ideas it'll spit out.
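A minimal sketch of the context-loading approach Aakash describes, assuming you assemble the context into a single ideation prompt; the fields and wording are illustrative:

```python
def build_ideation_prompt(shipped_features: str, messaging_rationale: str,
                          observed_metrics: str, page_url: str) -> str:
    """Pack business context in front of the ask, so the AI's ideas are grounded."""
    context = (
        f"Features we've shipped and why: {shipped_features}\n"
        f"Reasons behind our current messaging: {messaging_rationale}\n"
        f"Metrics we've seen on past experiments: {observed_metrics}\n"
    )
    ask = (f"Given the page at {page_url}, propose five A/B test ideas, "
           "each with a hypothesis and a primary metric.")
    return context + ask

# Paste the result into a Claude or ChatGPT project (per Aakash's workflow),
# optionally alongside a screenshot of the page. All inputs below are made up.
prompt = build_ideation_prompt("one-click checkout", "urgency framing converts",
                               "checkout CR 3.1%", "https://example.com/cart")
```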

    7. FT

      Yeah, absolutely. Absolutely.

    8. AG

      So in the build phase, there's obviously a lot we can build with AI. People are probably familiar with building with AI IDEs like Cursor or Windsurf, and prototyping with AI prototyping tools like Lovable, Bolt, or Replit. But before we get there, we gotta stay on this configure step, right? What is the right way to use AI to configure an experiment you're gonna run? Where should you, the human, be putting in input, and where should you be relying on the AI?

    9. FT

      Yeah, it's a good question. I think the role of a product manager is still absolutely critical there, and the same for the data analyst or the data scientist. They are all working together, and better, in a way. As a product manager, your role is still to bring the business context that the AI does not have. Sometimes you need to deliver that feature for a customer; the AI does not have that context, right? The PM comes with all the internal constraints the AI does not have. Why are we working on that idea? What is the assumption in the end? And what will success look like when you ship that feature? That's the job of a product manager. And then the data scientist is still there as well, and I think they are the essential human in the loop that you need when you need high confidence in the data you are collecting in your experiments, or for things like, what is the right model to use? Is it RAG versus a fine-tuned model, and so on. So the data scientist is still very, very important, and they are also going to challenge the AI's output. Is the data biased, to some extent? Are the results plausible? That's the job of the data scientist. So if I had to sum up, the AI makes the whole experimentation process faster and more accessible to the whole organization. For example, the AI can fully identify patterns and suggest recommendations when an experiment is non-conclusive. You would normally do that by digging into the data manually, and that's beyond human scale; there are so many data points to crunch. The AI is fully capable of doing that. But then the human in the loop is very, very important, both at the product management level and at the data scientist level.

    10. AG

      I think that really sums up very nicely the profiles each person needs to take, right? So if I were to repeat it back: the product manager is gonna be the person taking in all the context and helping come up with the initial idea, where they feed that context into an AI system. That AI system gives you some initial configuration options. The PM might sanity check that, and then the PM needs to bring in or work with the data scientist, or have a process where a data scientist comes in and further challenges and hardens the assumptions. So there are kind of two human-in-the-loop steps, but the PM is really using the AI to make sure that first version is strong.

    11. FT

      Exactly. Exactly. And then the AI can really build that variation for you, as long as you have very good hypotheses and some metrics on which you will back up your measurement there.

    12. AG

      And then the analysis phase. How would you define the roles of PM, data scientist, and AI analyst?

    13. FT

      Yeah. So like I said, I think the AI is today fully capable of giving you a summary of what happened when that experiment was running for your users. It is also able to identify patterns, like when your experiment is not delivering what you expected. Sometimes by breaking down the data by, I don't know, device type and so on, you will spot some opportunities to do better, and the AI can do that for you. You don't have to do that anymore. Many experimentation vendors have released such capabilities as part of their offering, because there is nothing very complex there, and the human at that stage does not have to be involved, basically. Again, the PM and the data scientist have to work together to make sure that the business context is there, the constraints are known, and the metrics on which you will measure the success of that experiment are clearly defined and measurable. If you run an experiment but you are not able to measure the success, what's the point, right? The AI will not help you. So again, the PM, the data scientist, and the AI are now working together, and better, in a way.

    14. AG

      So if we had to summarize: in the ideate phase, sure, AI is doing some of the work, but it's really just a copilot. In the build phase, AI is getting to the point where we're seeing a future where it can really do most of the build for you. And in the configure phase, again, AI can really do most of it with some human-in-the-loop review.

    15. FT

      Yes.

    16. AG

      And then in the analysis phase, again, it can do most of the basics as long as you have predefined the right metrics, the right North Star, the right guardrails, the right secondary metrics. It can go ahead and look at those. And so it's really that ideate phase where humans need to stay super connected, but for all these other steps, it feels like AI is going to be taking over most of the workload.

    17. FT

      Yes. Yes. Exactly. Exactly.

    18. AG

      So for PMs, product sense, coming up with great ideas, is really gonna be the most important thing to becoming a great experimenter.

    19. FT

      Exactly, exactly. I mean, if you look at what a product manager does in the end, they identify features to improve the user experience but also the business, right? And I don't think that what's preventing PMs from experimenting today is a lack of ideas. Usually they already have a long backlog of ideas to test. For sure, AI can come in and validate some of these ideas. You provide some context, and the AI can tell you, "Yes, this idea has a high chance to succeed," especially when the AI is fully connected to what I call the experimentation knowledge base, because maybe you have been running tons of experiments in the past. The AI can tell you, "This thing has been tested already; here are the results." And maybe you didn't know it, because another team had already run that experiment in the past. So the AI can also bring context from history. I like to call it UX memory, in a way, because you come up with an idea as a product manager, and the AI will pop up saying, "This thing has already been tested, and here are the results discovered at that time. Do you want to proceed with that idea? Because maybe you can test it again; your users may have evolved, you have new behaviors, your website has changed," and so on. So that's something really important: the AI can also come with UX memory, which is very, very important.

  3. 14:12-16:00

    ADS

    1. AG

      Today's episode is brought to you by Jira Product Discovery. If you're like most product managers, you're probably in Jira tracking tickets and managing the backlog. But what about everything that happens before delivery? Jira Product Discovery helps you move your discovery, prioritization, and even roadmapping work out of spreadsheets and into a purpose-built tool designed for product teams. Capture insights, prioritize what matters, and create roadmaps you can easily tailor for any audience. And because it's built to work with Jira, everything stays connected from idea to delivery. Used by product teams at Canva, Deliveroo, and even The Economist. Check out why, and try it for free today, at atlassian.com/product-discovery. That's A-T-L-A-S-S-I-A-N.com/product-discovery. Jira Product Discovery: build the right thing. Today's episode is also brought to you by the AI PM Certification on Maven. Run by Miqdad Jaffer, a product leader at OpenAI, this is not your typical course. It's eight weeks of live cohort-based learning with a leader at one of the top companies in tech. OpenAI just doesn't stop shipping, and this is your chance to learn how. Run along with Product Faculty and Mo Ali, the course has a 4.9 rating with 133 reviews. Former students come from companies like OpenAI, Shopify, Stripe, Google, and Meta. The best part: your company can probably cover the cost. So if you want to get $500 off, use my code AAKASH25 and head to maven.com/product-faculty. That's M-A-V-E-N.com/P-R-O-D-U-C-T-F-A-C-U-L-T-Y. I want to dive into that in a second. But before we get there, let's take the historical point

  4. 16:00-21:08

    How AI has Changed Experimentation

    1. AG

      of view on this. How would you break this down historically? How has AI changed experimentation over time?

    2. FT

      Yeah, if you look at the past, I think there have been two big waves in the end. The machine learning wave, which started around 2016, when we started to see some machine learning algorithms coming out. And then the generative AI wave, which really kicked off in 2022. If you look at the machine learning phase, I think AI helped remove some key bottlenecks: finding the right users to target with AI targeting, for instance, or allocating traffic to the best variant, and we saw multi-armed bandits and contextual bandits being released there. And analyzing results as well; you don't have to use generative AI to analyze results, machine learning can do that for you. They have all been designed, to some extent, to help teams optimize faster and adapt in real time, right? And then there was the generative AI wave, and that changed everything. We saw the rise of content generation to quickly create copy variations, because at the beginning, I think models were only good at this, [chuckles] to be honest. So many vendors outside of the experimentation world started to include content generation. If you go to Notion, you had "generate that content for me," "I want something more fun," and so on. You started to see this kind of feature being released in any tool. Then AI assistants started to pop up as well, often RAG-based, to help answer any question, interpret results, or support users inside the platform. That was one year later. And more recently, especially since the end of 2023, things are still accelerating so much. With GPT-4o, it was a new paradigm, and new models, again like GPT-4, changed what people thought was even possible. Suddenly, vibe coding became mainstream, like nine months ago. And if you look at experimentation, it changes everything as well, because it means you can turn any idea into a running experiment just by prompting an AI. That's why, at Kameleoon, we released prompt-based experimentation, so that anyone in the organization, HR, finance, product managers, could directly come up with an idea: "I want to improve that there." You provide that prompt to an AI, and the AI will come up with a variation and an experiment that you can test in minutes, and not anymore in a sprint or two, depending on your organization's velocity.

    3. AG

      So let's go step by step on some of these for people, now that we've got the broad strokes of how things go, right? In the beginning, we used AI to improve targeting. Can you say a little more about this? Is this about AI selectively showing which person gets which experience? Or how do you use AI correctly in targeting?

    4. FT

      Yeah. So the main purpose of AI targeting is to predict what you will do, right? Let's say you browse a website. The AI is fully capable of giving you a score, which is a kind of conversion intent: what is your likelihood to convert? With AI targeting, we try to replicate what a salesperson does when someone enters a shop. A good salesperson will understand whether you are about to buy or not, depending on how you are dressed, how you move in the shop, what kind of products you look at, and so on. The salesperson will score you. We try to reproduce that behavior on the web. As soon as a visitor lands on a website, depending on what you do, what you consult, and so on, the AI will score you. And then customers can use that directly to target the right users. Instead of creating a segment manually, you just use the score provided by the AI. It changes everything at the configure stage, where you set up the segment you want to target with that experiment. You don't do that anymore; you just use the score from the AI. It's very similar to multi-armed bandits or contextual bandits, but the purpose is a little bit different.

  5. 21:08-26:56

    User Behaviour Models

    1. AG

      I think this is actually really powerful, and I want people to look at the left side of the screen. I want people to look at these older things, because these are actually probably the most mature and, in my opinion, really strong to just use in production, right? So if I'm understanding this correctly, these are the old-school machine learning models, essentially really strong gradient descent algorithms, really strong regression models, which look at user behavior to say, "Okay, this is a user who maybe has very high intent to churn, so maybe we should give them a bigger coupon or something else to keep them here." Or they look at the behavior and say, "This is a user who is almost ready to purchase, so maybe we should just help them get to their purchase right away." So it's really about ranking your users in real time, based on their behavior, and showing them personalized experiences.

    2. FT

      Yeah, absolutely. Absolutely. Many customers have been using that for showing a coupon to the right user. I'm sure every time you visit a website, you always have that pop-up [chuckles] that comes up, providing you 10% straight away. Well, instead of doing that, maybe you should keep that 10% and provide it to the users who are in need of that discount to go to the next step and buy. That is the main goal of AI targeting.
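A minimal sketch of the conversion-intent scoring Fred describes, assuming a scikit-learn logistic regression over simple session features; the features, training data, and thresholds are illustrative, not Kameleoon's actual model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative session features: [pages_viewed, seconds_on_site, viewed_pricing, cart_adds]
X_train = np.array([[2, 30, 0, 0], [8, 400, 1, 1], [1, 10, 0, 0], [6, 250, 1, 0]])
y_train = np.array([0, 1, 0, 1])  # 1 = the session converted

model = LogisticRegression().fit(X_train, y_train)

def conversion_score(session: list[float]) -> float:
    """Score a live visitor's likelihood to convert, like Fred's digital sales rep."""
    return float(model.predict_proba([session])[0, 1])

# Show the 10% coupon only to mid-intent visitors who need the nudge, not to everyone.
score = conversion_score([4, 120, 1, 0])
show_coupon = 0.2 < score < 0.6  # hypothetical band: below it likely lost, above it converts anyway
```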

    3. AG

      Love it. And you said it's adjacent and similar to multi-armed bandits and contextual bandits. I feel like I wanna really tease this out for everyone listening. What is the precise difference, and what exactly are these bandit algorithms?

    4. FT

      Yeah, sure. If you look at a standard A/B test, usually your goal is to know whether variant B actually improves a specific metric. So you split your traffic equally, and when you reach a sample size, when you have enough data, you can make your decision, right? With a multi-armed bandit, the focus is on performance. The AI will gradually allocate more traffic to the best-performing variant based on the data you have at that time. But there is a trade-off. I like multi-armed bandits, but there is a trade-off: you lose some accuracy in the data, because the AI starts making decisions based on a smaller sample. Which is fine, to some extent. In some use cases it makes a lot of sense, like in media, where you sometimes need to know within a day which article headline drives more clicks or engagement. You don't care that much about the accuracy; your goal is to make sure you push the best headline to all your users. That's a multi-armed bandit. A contextual bandit, on the other hand, is a little bit different. You have a bunch of variations, let's say 10 or 100, and your goal is to push variation one to the group of users that prefers version one, and version two to the segment of users that are in love with that variation. So your goal is to learn and adapt. It's still based on machine learning algorithms, and the goal is really to push the best variant to each individual user.

    5. AG

      So it sounds like if you have something highly time-sensitive and you're okay with potentially slightly less confidence in the results, go for one of these multi-armed or contextual bandit type things.

    6. FT

      Yes.

    7. AG

      And within those, can you just tease out really carefully so that I understand it, what's the difference between multi-armed and contextual?

    8. FT

      What's the difference? Sorry, you said?

    9. AG

      Between multi-armed bandit and a contextual bandit.

    10. FT

      Contextual bandit. So with a multi-armed bandit, your goal is to push the best variant to all users, right? With a contextual bandit, you push to each user the variation that you know performs best for them. So in the end, it's like having a website which is fully personalized: every user will not see exactly the same offering, the same content. That's highly advanced. I'd like to say it's more tailored for personalization, where your goal, again, is to adapt the content based on each user's preference in the end.
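A minimal sketch of the distinction, assuming Beta-Bernoulli Thompson sampling; the contextual variant here simply keeps one bandit per user segment, a deliberate simplification of what production systems do:

```python
import random

class ThompsonBandit:
    """Multi-armed bandit: gradually shift traffic to the single best variant for ALL users."""
    def __init__(self, n_arms: int):
        self.wins = [1] * n_arms    # Beta prior alpha per arm
        self.losses = [1] * n_arms  # Beta prior beta per arm

    def choose(self) -> int:
        # Sample a plausible conversion rate per arm; serve the arm that sampled highest.
        samples = [random.betavariate(w, l) for w, l in zip(self.wins, self.losses)]
        return samples.index(max(samples))

    def update(self, arm: int, converted: bool) -> None:
        self.wins[arm] += converted
        self.losses[arm] += not converted

# Contextual flavor (simplified): learn per context, so mobile and desktop users
# can each converge on a different winning variation.
bandits = {"mobile": ThompsonBandit(3), "desktop": ThompsonBandit(3)}
arm = bandits["mobile"].choose()
bandits["mobile"].update(arm, converted=True)
```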

    11. AG

      So when should you be using which, like which industries, which types of organizations should have the head space of using one or the other?

    12. FT

      I think media, as I said, makes a lot of sense for multi-armed bandits, because usually you don't want to run a long experiment on headlines or article titles; your goal is to go with the best one as soon as possible. Contextual bandits appeal to any kind of industry. If you have, let's say, a coupon, for instance, 10%, 20%, and 40%, well, you let the AI push the right percentage depending on each user. That's one usage of contextual bandits. Or if you have multiple homepages with different content, and your goal is not to A/B test or to know which homepage is best, your goal is to push the best homepage to each individual user. Then you are really in the hyper-personalization way of seeing experimentation.

    13. AG

      It sounds like

  6. 26:56-30:05

    Multi-Armed Bandit vs Contextual Bandit

    1. AG

      probably your e-commerce and your social media types of companies are doing a lot more contextual bandits, where they have a lot more users and the ability to personalize, and probably more your B2B types of things are using multi-armed, where they just have one experience that they can refer to. Is that fair at a high level, or is there some... What is the nuance to that?

    2. FT

      Yeah. What's for sure is that you need a lot of traffic for both, usually, because as you need to make decisions very fast, you need a lot of traffic, right? If you have less than, I would say, 100K visitors per month, maybe that's not the best approach. But when you look at media, they have millions of visitors every month, so that's not a problem at all. And for contextual bandits, as the goal again is to learn and act based on that learning, well, you need to learn a lot of behaviors before starting to push variation one to this user or variation two to that user. So it appeals to customers with a significant amount of traffic in the end.

    3. AG

      All right. And then let's marinate a little bit on this opportunity detection. This was happening right at the end of the machine learning wave. What does this enable you to do in your experimentation?

    4. FT

      Yeah. So the reason we've built, or we've seen, this kind of feature is that if you look at the broad statistics of experimentation, less than 20% of experiments are successful, meaning that when you ship that feature, only 20% of the time will it increase the metric you want to improve with that experiment. Most of the time when that happens, you see a lot of data analysts trying to drill down into the data to understand why, and to see whether there is an opportunity to do better. Because in the end, maybe that variation does not work for all of your users, but if you drill down, you start to understand that maybe that feature works well for mobile users. Instead of doing that manually, AI opportunity detection does it automatically. As soon as an experiment is non-conclusive, or is showing a downlift on a metric, the AI will start crunching the data and point out some opportunities to do better. Like, again: this variation is performing well on mobile devices; maybe if you personalize the experience for mobile users and show that variation, you will increase the performance of your website and ultimately its revenue. That's the main goal of AI opportunity detection: to spot opportunities to do better.
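A minimal sketch of that segment scan, assuming a two-proportion z-test per segment via scipy; segment names, counts, and the significance threshold are illustrative:

```python
from scipy.stats import norm

def segment_lift(conv_b: int, n_b: int, conv_a: int, n_a: int):
    """Relative lift of variant B over control A in one segment, plus a z-test p-value."""
    p_b, p_a = conv_b / n_b, conv_a / n_a
    pooled = (conv_b + conv_a) / (n_b + n_a)
    se = (pooled * (1 - pooled) * (1 / n_b + 1 / n_a)) ** 0.5
    z = (p_b - p_a) / se
    return p_b / p_a - 1, 2 * (1 - norm.cdf(abs(z)))

# A globally flat experiment, broken down by device type (counts are made up):
segments = {"mobile": (300, 2000, 240, 2000), "desktop": (210, 3000, 225, 3000)}
for name, (cb, nb, ca, na) in segments.items():
    lift, p = segment_lift(cb, nb, ca, na)
    if lift > 0 and p < 0.05:
        print(f"Opportunity: personalize for {name} ({lift:+.0%}, p={p:.3f})")
# -> flags mobile (+25%), the kind of insight Fred says the AI surfaces automatically
```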

    5. AG

      Yeah. If 80% [chuckles] of what you're shipping isn't gonna hit your metrics, you wanna make sure that you're still learning-

    6. FT

      Exactly.

    7. AG

      ... something from that 80%. And so this is really gonna help you in that step.

  7. 30:05-31:55

    ADS

    1. AG

      Before we dive deeper, let's talk about something every PM faces: getting alignment on product decisions. You know that feeling when you're trying to explain a user flow to engineering or justify a design choice to leadership, and you're just describing it with your hands? That's where Mobbin comes in. Mobbin is the world's largest library of real-world mobile and web app designs from industry-leading apps like Airbnb, Uber, and Pinterest. Instead of spending hours taking screenshots or hunting for inspiration, you can instantly find exactly how successful products handle onboarding, paywalls, checkout flows, whatever you're facing. Over 1.7 million product builders use Mobbin to benchmark against best-in-class products and show their teams proven solutions. Whether you need to convince stakeholders there's a better way to handle user activation or research how top apps approach feature discovery, Mobbin gives you the visual proof to back up your product decisions. Check out mobbin.com/aakash. That's M-O-B-B-I-N.com/A-A-K-A-S-H, and get 20% off your first year. Today's episode is also brought to you by Maven. The problem with most courses online, like Udemy, is there's no live component, and the instructors aren't experts in their fields, they're professors. At Maven, you get direct live access to experts and operators from the world's best tech companies. You can't get that access anywhere else, in any university, and you usually can't find them on YouTube either. I've featured so many of Maven's experts in the newsletter and podcast for that reason. To help you out, I've put together a collection of courses I recommend at maven.com/x/aakash. This includes courses like AI Prototyping for PMs, Product Sense for PMs, and getting an AI PM certification. Visit it now at M-A-V-E-N.com/x/A-A-K-A-S-H. Talk

  8. 31:55-35:13

    AI Content Generation

    1. AG

      to me a little bit about content generation. I'm curious, do you have any stories yourself that your team has used where AI content generation has led to a win?

    2. FT

      Yeah, we see that a lot. So content generation: you take content on your website, and then you ask the AI to come up with an alternative version of it, right? That's the goal of content generation. We have seen users put that kind of experiment in place when they create some pop-ups, for instance, or some messaging for banners and so on. They want to make sure they have the best-performing content, so they use AI for that. They will generate content to make sure, again, that they increase engagement. Because sometimes you come up with a version and you like to engineer it with a prompt. You're asking, "Give me a better version for that goal." I'm sure everybody is doing it today. Even [chuckles] myself, I'm doing it every day. I go and I say, "Okay, I'm doing a podcast with Aakash. Can you provide me the best thing I should say there?" I provide my content, and then the AI will generate something which is appealing for a podcast.

    3. AG

      Love it. Talk to me about RAG Assist.

    4. FT

      Yeah, RAG Assist. So this is the era of chatbots, basically. We have that in our tool today. The idea is to have an AI that users can interact with to get answers to anything, like, "I have that experiment there. Tell me what is the best, I don't know, stats engine to use for that experiment." We know that it is increasingly complicated: you have frequentist, Bayesian, sequential testing, and so on. So users are like, "Okay, which one should I choose?" You can go and talk with that AI. The AI will ask for some context, like, "Okay, what is that experiment about?" And then the AI will tell you, "For that experiment, you should use CUPED, for instance, because I know that you have some history of data on the metric you want to improve. If you use that, you can decrease the duration of your experiment by maybe two weeks." That's the main goal of AI Assist. We've also seen that kind of feature be very useful for generating some code. For instance, some of our customers put in place feature flags. Well, it's very annoying to look at the documentation, because [chuckles] you have a lot of methods to look at and so on. So you can go and talk with that AI and say, "Look, I have this feature. I want to release it behind a feature flag. I want to variabilize some behavior of a feature. Provide me the code." And so you can directly interact with that AI, and it will generate the steps to put in place, using some SDKs, so that you can go live with your feature flags as soon as possible. That's the first part.
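A minimal sketch of the kind of feature-flag gate Fred says the assistant can generate; this is a generic illustration with a hypothetical flag store, not Kameleoon's actual SDK API:

```python
class FeatureFlags:
    """Hypothetical in-memory flag store; a real SDK fetches assignments from the vendor."""
    def __init__(self, assignments: dict[str, str]):
        self.assignments = assignments

    def variation(self, flag_key: str, default: str = "off") -> str:
        return self.assignments.get(flag_key, default)

flags = FeatureFlags({"new_checkout": "on", "default_sort": "price_low_to_high"})

# Variabilize behavior behind the flag, as in Fred's example.
checkout_flow = "new" if flags.variation("new_checkout") == "on" else "legacy"
sort_order = flags.variation("default_sort", default="alphabetical")
```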

    5. AG

      This is huge. I feel like this is like the copilot, right?

    6. FT

      Yeah.

    7. AG

      So this is all about, especially if it's built into your experimentation platform, you being able to converse with it. Now,

  9. 35:13-41:35

    How Vibe Coding Changes Experimentation

    1. AG

      you dropped two pieces of experimentation knowledge there that people's ears probably perked up at, and I want to at least give them the quick overview. So you talked about frequentist and Bayesian statistics. Can you unpack that for us first?

    2. FT

      Yeah, sure. So there are usually two methods to analyze an experiment. We look at the statistical confidence of the results, because an experiment is basically a statistical experiment: you take a decision based on the data you have at one point in time. You can use a frequentist approach, which is usually what most users will use; it is the default statistical engine. And then there is Bayesian, which gives you a probability: that variation has an 80% chance of being better than the original, the control. So it's like having two schools here. Some will love Bayesian, some will love frequentist. What we see is that usually 90% of the market goes with frequentist. And then you have CUPED, which is another statistical method which is very powerful, especially if you have a lot of returning users, in the B2B industry for instance. It's very powerful because it takes some data as an input, so that when you start your experiment, you actually have two weeks of metrics already available. Ultimately it means you can get results much faster than if you start from scratch. They are all designed to answer a specific case, but in the end, your goal is to make sure you are taking the decision with the best accuracy. All these statistical engines are very powerful and well known by our customers and users in the experimentation industry.

    3. AG

      And it's CUPED, C-U-P-E-D-

    4. FT

      Exactly.

    5. AG

      ... right?

    6. FT

      Yeah.
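A minimal sketch of the CUPED idea Fred outlines (Controlled-experiment Using Pre-Existing Data), assuming NumPy and using each user's pre-experiment metric as the covariate; the data is synthetic:

```python
import numpy as np

def cuped_adjust(metric: np.ndarray, pre_metric: np.ndarray) -> np.ndarray:
    """Subtract the variance explained by pre-experiment behavior (theta * centered covariate)."""
    theta = np.cov(metric, pre_metric)[0, 1] / np.var(pre_metric)
    return metric - theta * (pre_metric - pre_metric.mean())

rng = np.random.default_rng(0)
pre = rng.gamma(2.0, 10.0, 5000)           # two weeks of pre-experiment revenue per user
post = 0.8 * pre + rng.normal(0, 5, 5000)  # in-experiment metric, correlated with pre
adjusted = cuped_adjust(post, pre)
print(np.var(post), np.var(adjusted))  # adjusted variance is far smaller -> faster decisions
```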

    7. AG

      All right. So one of the coolest things that you talked about as you were previewing the Gen AI wave was how vibe coding has really changed experimentation. Can you unpack this further? What has vibe coding enabled with experimentation?

    8. FT

      Yeah, it's a good question. What vibe coding has really unlocked is this ability to take a product idea from a concept to a live prototype in just a few hours, right? It helps you build things faster, and I love it. I think we are all using vibe coding in the product management ecosystem today, like Lovable or Bolt. You have your idea, and you get a prototype you can then test with your users. But as a product manager, my goal isn't just to build and deliver faster. It's to learn faster, right? You want to make sure that when you ship that feature, you are able to learn some insights, so that you are sure that feature is actually being used and improves the user experience, but also the business outcomes, which are also very important. And that's important because that way I can kill features, I can drop the features that don't bring any value to my users or to the business. So that's what vibe coding unlocked: you go from an idea to a prototype you can test with users. But the real question to me isn't "can you build it?" It's "should you build it?", right? And that's why experimentation is very important, because experimentation answers that question well: should you build that feature or not? Vibe coding won't help that much with that. For sure, you can test a prototype very quickly, and again, it's outstanding, so you can test it with a few users, and that's great. But the truth is, and I want to say it loud [chuckles] here, you will only really know whether it works when that feature goes live to all your users, because I've seen it so many times. Users say they want something. They say they like it when you show a prototype, and then once you launch it broadly, well, nothing happens, right? And I'm sure you've seen that as well as a-

    9. AG

      Yes.

    10. FT

      ... as a PM in the past. And so prompt-based experimentation sits right at the crossroads of vibe coding and experimentation. We actually call it internally "vibe experimenting," because it's the same concept in the end. You can test ideas in minutes just by prompting the AI, and you do it right from your website or your web application. You don't do it from a blank page. You go to your website, you prompt your idea, the AI will generate that feature or that variation, and then you can test it directly with all your users. So you get the best of both worlds: the speed and creativity provided by vibe coding and prototyping, but directly with your web apps, and, I would say, the structure, or maybe the discipline, of real testing to make the right product calls. You are sure that when you get the data about that experiment, you know whether that feature will improve your metric. And this is very, very important.

    11. AG

      For my money, this is one of the coolest demos out there. Would you be willing to show us how this all works?

    12. FT

      Yeah, sure. So let's say I want to create

  10. 41:35-43:36

    Live Demo: From Idea to Running Experiment in 2 Minutes

    1. FT

      an experiment on this shop here. It's a classic shop. You have some products there: I have a catalog, I have some products I can see, I have some pagination. I can click there, see the product, and add that product to the cart, right? So what I can do is directly prompt my ideas using Kameleoon. I'm going to do that directly from our product. The way it works here: you create a new experiment, then you have a choice here, "by prompt." You give it a name, "Live." I will select my website here, then the page I want to optimize, and then you create it. Then your website will open directly there. And very similarly to what you can see with GPT, you have on the right side a UI that allows you to prompt any changes, right? So let's say I want, for instance, to change the sort here, and instead of having it ordered alphabetically, I want to order it price high to low, or price low to high. What I can do is directly prompt that, with a simple prompt, which would be this one here. So I can say, "Okay, change the default sorting option to price low to high." Then you hit that button there, and the AI will start extracting all the context it gets from that page: what is the framework behind it, where is the sorting being defined. As you can see here, I haven't selected anything. I'm just saying, change the default sorting option to price low to high.

  11. 43:36-51:49

    Two-Minute Build Achievement

    1. FT

      The sort is there; here are the default options which are currently available. I'm not providing anything, because the AI is fully capable of understanding what I want to do and localizing where the sorting is being done on the website. And then it starts to show what the AI is going to do. So where is the target element? You can see that it has found that element there. Then, what the AI has to do: what kind of behavioral changes it has to put in place. The JavaScript, because this thing will generate, in the end, some CSS and some JavaScript, depending on your prompt. The CSS impact and accessibility will be there as well. So the AI is not building this blindly. It is going to do every check: okay, is this thing also available on mobile? It knows it has to be accessible as well, so it must not break that when it generates the code. All that context is taken from the website. You can see that the JavaScript has started to be generated there. And maybe, but I'm not even sure here, some CSS will be required. No, no visual changes required. And so you can see there that the sorting has been updated. You know, price-

    2. AG

      And this was totally live, folks. Like-

    3. FT

      It was live.

    4. AG

      ... that just happened in two minutes. So it went through all those systematic steps. It very much reminded me of a chain-of-thought reasoning model, where it showed you, "Okay, these are the five things we're gonna do." It went ahead and did those correctly, and boom, this is vibe experimenting.

    5. FT

      Exactly. So it's straight from your website. It's not a prototype that was created; I can ship that experiment straight away. I can go there and say, "Okay, I want to finalize that," or I can even simulate, which generates a preview I can then share with other stakeholders. But if I want to ship that feature live, I just have to hit that finalize button there. And then-

    6. AG

      Seems, like, amazing for a small team, right? Where a PM might be empowered to go ahead and ship those changes. What is the right workflow at a bigger company where you have designers and engineers? How should you stage-gate this properly?

    7. FT

      Yeah. I think what's important here is that execution is definitely much faster. As you can see, in two minutes, boom, I have something that might have taken maybe three days to build, because I would have to ask a designer to provide a mock-up. Then that mock-up gets delivered to engineers with specification documents, and then it lands in a sprint. If you are lucky and the sprint hasn't started, then it's good. Otherwise, you will have to wait for that sprint to end. So you end up waiting three to four weeks before that experiment goes live. Here-

    8. AG

      Oh

    9. FT

      ... in two minutes I have that experiment live. So execution is definitely much faster, as I said, with PBX. But that does not mean you can break everything and skip collaboration. What I think works best here is keeping the same checkpoints, in the end, that you would have in a normal product process, but keeping them lighter and much faster. Before launching, I can generate that simulation link here; you have that CTA there, which I can provide to all stakeholders. The designer can check if the variation fits the brand and the UX. We'll see some advanced experiments maybe later, but if you are shipping something with significant changes, the designer can check everything: the brand, the UX, the style. Engineers can jump in and directly look at the code. We haven't seen that in the previous screen, but I can click on the JavaScript section and see the code, so the engineers can look at what has been generated by the AI. And the data team can directly look at this part here, where they can see: okay, what are the goals? What are we going to measure? Is it the add-to-cart? Is it the number of orders? They can directly look at the metrics and say, "Okay, this metric is good," or, "This one is missing, and you need to configure that as well." That's really important. And as we've seen, the great thing here is everything is built directly on your website, so you can directly test everything. You can do the reviews directly on the real thing, not on specifications or mockups. The designer and the engineer will see exactly what your users will see when the experiment goes live into production. So that makes a total change here.

    10. AG

      And how many times have we seen a huge disconnect between the PRD, the Figma, and what actually gets shipped to production? This is really allowing you access to, okay, this is what we're about to ship to production. What changes do we need to make from a design point of view or from an engineering scalability point of view? And I think that the AI is super powerful. What else can it do besides text prompts? Can it take in sketches?

    11. FT

      Exactly. So let me show you that. I'm going to open an experiment I created just before. If you look at the experiment there, there are some tools as well. If I want to start from a mockup, I can directly import that mockup there. Let's say you have a new layout with two products per row instead of four, and you are adding a dropdown and so on. You can directly upload that mockup and say to the AI, "Build that version for me." And again, you don't have to provide a lot of context. You can sketch as well. You can use the sketch feature to, I don't know, build some popups if you want. You can do everything there. So you can say, "I want an image." Sorry, because I'm very bad at drawing. Uh-

    12. AG

      Better than me.

    13. FT

      [laughs] It's very hard with a mouse there. So you can draw freely.

    14. AG

      Yeah, impossible.

    15. FT

      Exactly. And then you say, "Okay, build that newsletter popup." Of course, you need to provide a little bit of context: what is the purpose of that experiment. If you don't do that, the AI will come up with some questions as well, like, "Do you want me to generate an image?" I can maybe show you one that I've built here so that you can see the output. All right. So you can see here, same story: use this sketch to build a newsletter popup with a left column. Then the AI comes back to me saying, "Okay, in order to complete this task, you need to provide the missing information. What kind of image do you want?" So I said, "Generate one for me. I want a basketball player wearing shoes." The AI will provide me two images; I can choose one with a white background or with a black background. And then, same story: the reasoning will be shared with you, and you end up with some JavaScript there and some CSS. And then you have something there. Sorry, because there is a sketch there [chuckles] on top of it. So that's what the AI built here, from that prompt and from that sketch. You can see that I drew some lines there; it understood that there are some inputs which will allow the user to share their email and so on. I can go from a mockup as well.

  12. 51:49 - 54:17

    How to Measure AI Features Properly

    1. FT

      Same story here: I provided a mockup, and that's the output there. The mockup added more information, like why you should join that newsletter. Same story, it built the whole behavior. So you can create elements that don't yet exist, like popups and banners, and you can completely change the layout. As long as it can be done on the front-end side, you can build it, including onboarding flows. I know product managers also love to add flows that point out features, and you can do that very graphically [chuckles] with that AI. It can build everything.

    2. AG

      That's amazing, right? This is the power of what you've been playing with in your AI prototyping tools, but it's actually happening in context, in your experimentation system. It's configuring the experiment for you. It's even coming up with your hypothesized variables. All the steps we talked about, ideate, build, configure, analyze: it can help you with all four of those. But interestingly, that build step we talked about, the one that's really causing all the trouble, where people are too reliant on developers, it's just completely obliterating that step.

    3. FT

      Exactly. You can see in that last example where I asked the AI to add a quick view and an add-to-cart action directly on the product listing. These features weren't there, so the AI added them. I can open the quick view, I can add the product to the cart, and I asked the AI to add a little notification as well. And then you can iterate. By prompting the AI, I can say, "That notification should maybe include a reminder of the product that was added to the cart," if it makes sense. So you can really iterate with the AI. It's not just one prompt. You can keep adding prompts until you reach the best version you want to ship to your users.
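
      [For illustration: a follow-up prompt like the one described here might yield a small script along these lines. The names and styling are invented, not the tool's actual output.]

      ```typescript
      // Hypothetical sketch of the add-to-cart notification an extra
      // prompt might produce. All names and styles are invented.
      function showAddToCartToast(productName: string): void {
        const toast = document.createElement("div");
        // Remind the shopper which product was just added to the cart.
        toast.textContent = `${productName} added to your cart`;
        toast.style.cssText =
          "position:fixed;bottom:16px;right:16px;padding:12px 16px;" +
          "background:#222;color:#fff;border-radius:6px;z-index:9999;";
        document.body.appendChild(toast);
        // Auto-dismiss after three seconds.
        setTimeout(() => toast.remove(), 3000);
      }
      ```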

    4. AG

      All right. So that about covers this amazing change we've seen with AI affecting experimentation across all features. Now, can you break down the second side of this topic for us? How do we measure and

  13. 54:17 - 1:07:18

    Measuring RAG Systems 3 Key Metrics

    1. AG

      experiment with AI features themselves?

    2. FT

      Yeah, I think there are some key indicators you have to look at. The first one is usage, or adoption: are people using your feature? The second one is the outcome: is the feature fixing the problem users came for? Is it helping them complete more tasks faster, activate more experiments, and so on? And the last one is the experience. Once the feature is live, you want to ask users for feedback about it. With AI features, you usually see the quick thumbs up / thumbs down everywhere, because it's one of the only ways to learn whether the answer provided by the AI is accurate or missing some context. So it's very important to include those as well. And if we look at what we follow for prompt-based experimentation, it's not just about usage. It's about how quickly our users can go from an idea to a running experiment, right? Because in the end, that is ultimately the main goal of experimentation. So we look at things like: how many prompts it takes to create an experiment. If it's too many, something is off; maybe the AI isn't producing the variation code the user wants. The second thing is how long it takes from the first prompt to a live experiment. Very important as well. If it's faster than any of our other methods, that's a big win for us, right? Instead of taking days, maybe it took five minutes to put your experiment into production. And the last one, which is also very important for us, is how often developers still need to step in. Before prompt-based experimentation, I would say maybe 80% or more of experiments needed code. That's the reality of our market. If we drop that number to, let's say, 20%, which is our target at Kameleoon, we know it's a massive game changer for the industry, because we haven't seen that anywhere else in the last decade. So those are the metrics we follow. Again, it's not only about usage, it's about the value we want to generate for our users: the feature needs to let them run experiments faster, get to a running experiment faster, and so on.
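
      [For illustration: the three PBX health metrics listed here could be computed from a simple event log along these lines. The schema and field names are invented; Kameleoon's internal tracking will differ.]

      ```typescript
      // Hypothetical session log for prompt-based experimentation.
      interface PbxSession {
        experimentId: string;
        promptCount: number;      // prompts sent before launch
        firstPromptAt: Date;
        liveAt: Date | null;      // null if the experiment never launched
        developerStepIn: boolean; // did an engineer have to write code?
      }

      function pbxHealthMetrics(sessions: PbxSession[]) {
        const launched = sessions.filter((s) => s.liveAt !== null);

        // 1. How many prompts it takes to create an experiment.
        const avgPrompts =
          launched.reduce((sum, s) => sum + s.promptCount, 0) /
          Math.max(launched.length, 1);

        // 2. Minutes from the first prompt to a live experiment.
        const avgMinutesToLive =
          launched.reduce(
            (sum, s) =>
              sum + (s.liveAt!.getTime() - s.firstPromptAt.getTime()) / 60_000,
            0
          ) / Math.max(launched.length, 1);

        // 3. How often developers still need to step in
        //    (the target discussed here: from ~80% down to ~20%).
        const devDependencyRate =
          sessions.filter((s) => s.developerStepIn).length /
          Math.max(sessions.length, 1);

        return { avgPrompts, avgMinutesToLive, devDependencyRate };
      }
      ```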

    3. AG

      I love this. It's almost structured like the five whys, right? Are people using it? Is it solving the problem? Was it helpful or reliable? And then you translate that to your experiment, like you've done with prompt-based experimentation. I'm curious, what was the North Star metric you looked at, and how did you determine the North Star metric for success?

    4. FT

      Yeah, it's a good question. At Kameleoon, we look at the number of experiments that run daily on our platform. We've been following that for many years. We came up with that North Star metric because it's simple to follow and to measure; we don't have to ask data scientists for access to it. The second reason is that it's very easy for anyone in the organization to understand, right? It's not something very complex or specific to a product team. The number of experiments that run daily, that's the metric we look at, because we know that when that metric drops, it means there's some churn at Kameleoon; it means we have fewer customers creating campaigns every day. And if it goes up, it means either we're growing the business with new customers, or we have more and more users creating experiments. This is critical for prompt-based experimentation because, again, our goal is to have more users creating experiments, so it's not limited to one team of users, not limited to product teams or engineering teams. We really want to democratize experimentation across the organization. So the more experiments are created, the better. It means prompt-based experimentation is delivering on its promise, which is, again, to go from an idea to an experiment in minutes, not days or weeks as it is in most organizations today.

    5. AG

      So I think there are a lot of hidden takeaways there for people listening. The first is that the North Star metric you chose is basically your business's overall North Star metric. You're not just choosing some random North Star metric specifically for this AI feature. If it's a big AI swing, you want it to move the whole business. The second is that you didn't choose a revenue-oriented metric. You actually chose a product input metric that you can change, but that you know is highly correlated with impacts on your output metrics, like revenue. Are those fair takeaways?

    6. FT

      Exactly. Exactly. Because with that North Star metric, everyone plays by the same rules of the game. Product teams can improve features that drive it forward; that's why we introduced prompt-based experimentation, it's a product initiative as well, and I know my product managers can deliver on that and increase that North Star metric. Customer success can help users get more value out of the product, and their goal is to increase the North Star metric, because if they have happy customers who know how to use the product well, those customers will ultimately create more experiments. Marketing can work on publishing benchmarks per industry and per market, so our customers can compare with each other. So every department and division in the organization works toward improving that North Star metric. It's not just revenue, because not every team can act directly on revenue; it's very hard in the end. But if you find a proxy metric that is highly correlated to revenue and, again, actionable by any team, that's something that can change the face of an organization, because everybody is moving in the same direction in the end.

    7. AG

      So that's how you measure a game-changer feature like PBX, something that's really changing how people use your platform. Talk to us a little bit about how you measure RAG systems.

    8. FT

      Yeah. So RAG systems, again, this is usually what powers an assistant like that one. Same story: I think it's very important to look at the business-oriented metrics, adoption, usage, outcome, experience. And then there are also some technical metrics to look at. At Kameleoon, we looked at them before shipping that feature into production. There are three things we look at at Kameleoon, and I think they apply to any organization using a RAG system. The first is accuracy: is the answer really grounded in documentation that actually exists? For instance, if someone asks, "How is traffic split in a multi-armed bandit?" and the AI answers, "Traffic is always split evenly across all variants," that's a miss, right? So that's a bad-

    9. AG

      Yeah

    10. FT

      ... that's a bad answer in terms of accuracy. Relevance is very important as well. Relevance is all about whether the answer actually addresses the user's question. For instance, if I ask something like, "What are the ways to target feature flags, and what are the different options?" and the AI answers, "You need to create a new experiment," that's a miss too; it doesn't answer the targeting part of the question. And the last one is context quality: were the documents it used actually helpful? If it pulls outdated documentation, or not the most recent release notes, something is off as well. These are good signals to check whether the AI is giving answers you can trust. Okay? Now, [chuckles] I just wanted to add one thing, because I'm sure people will ask, "Okay, but how do I do that? How do I make sure accuracy, relevance, and so on are good?" One method we've seen work well at Kameleoon is to use an LLM, a large language model, as a judge. You have a judge that evaluates the performance of your RAG system. For instance, for response relevance, which is, again, how well the chatbot's answer matches the user's question, one method is to use an LLM to generate two or three new questions based on the answer provided by the first LLM. For instance, if I ask, "How do I enable AI targeting in Kameleoon?" and the chatbot answers, "You need to go to the experiment settings, enable the AI targeting toggle, and save," that's a good answer, right? You then ask the judge LLM to generate two or three questions based on that answer, like, "Where do I find the AI targeting setting?" or, "How do I activate AI targeting?" If those questions are very similar to the question the user actually asked, it means the relevance is there. You can do the same for the retrieved context, context relevance, where the LLM scores the answer, and you can do it for accuracy as well. That's what works well today, I think.
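
      [For illustration: a minimal sketch of the LLM-as-a-judge method described above, assuming the OpenAI Node SDK as the judge. The model name, prompt wording, and scoring scheme are all assumptions, not Kameleoon's implementation.]

      ```typescript
      import OpenAI from "openai";

      const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

      // Judge response relevance: reverse-engineer questions the answer
      // would satisfy, then check how close they are to the real question.
      async function judgeRelevance(userQuestion: string, ragAnswer: string) {
        // Step 1: generate two or three questions from the answer.
        const gen = await client.chat.completions.create({
          model: "gpt-4o-mini",
          messages: [{
            role: "user",
            content:
              `Write three short questions that the following answer fully ` +
              `answers, one per line:\n\n${ragAnswer}`,
          }],
        });
        const generated = (gen.choices[0].message.content ?? "")
          .split("\n")
          .filter((q) => q.trim().length > 0);

        // Step 2: score each generated question against the real one
        // (0 = unrelated, 1 = same intent) and average the scores.
        let total = 0;
        for (const q of generated) {
          const scored = await client.chat.completions.create({
            model: "gpt-4o-mini",
            messages: [{
              role: "user",
              content:
                `On a scale from 0 to 1, how similar in intent are these ` +
                `two questions? Reply with a number only.\n` +
                `A: ${userQuestion}\nB: ${q}`,
            }],
          });
          total += parseFloat(scored.choices[0].message.content ?? "0");
        }
        return generated.length ? total / generated.length : 0;
      }
      ```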

    11. AG

      So it's that combination of looking at faithfulness, relevance, and context quality, operationalizing those into metrics, and then building a good eval suite, potentially using an LLM as judge to look at things like response relevance, context relevance, and faithfulness. That's the overall recipe for measuring an AI feature. Are there any big mistakes you see teams making while measuring these AI features?

    12. FT

      Yeah. I guess one of the biggest mistakes is to stop at usage, because most of what we see is that when users are unhappy with an AI feature, they just stop using it, right? Usage is one piece of the cake, but you need to look at the outcome and the user experience as well. The insights I've been sharing here should be pretty useful for that too.

    13. AG

      Amazing. Before you go, I have an experimentation expert here with us, so I have a couple questions for you. First, what

  14. 1:07:18 - 1:10:10

    Best Experimentation Company Booking.com

    1. AG

      company do you think is doing experimentation best?

    2. FT

      Yeah. I'm not going to choose the usual ones like Netflix, Airbnb, or Microsoft. I'm thinking of a company called Booking.com, [chuckles] very well known as well. They've built such a strong culture of experimentation. Nothing goes live into production, whether you're shipping a feature on the website or on the mobile application, without going through experimentation. So I think it's a great model for any company that wants to learn fast and build with confidence.

    3. AG

      And I think they test almost everything, right? So they've really built that experimentation culture from the top.

    4. FT

      Yes. Exactly. Exactly.

    5. AG

      And I think that's incredibly rare, but it's incredibly common amongst winning companies.

    6. FT

      Yeah. And my advice, just so I don't scare everyone: you don't have to go all in and test everything from day one. At Kameleoon we've gone step by step on this topic ourselves; we're also shipping features and experimenting with them. I think you have to start simple. If you're not doing any experimentation today, first use feature flags just to push code safely into production. At least you have an on/off switch and can see how it behaves in production. Then you introduce what we call progressive rollouts, or targeted delivery, where you ship the feature to, let's say, 20% of your users, or to a segment of users, and you learn from that, right? You learn from the usage: are they using the feature? Are they complaining about the user experience? Then you improve the feature before releasing it to every single user. And once you're ready, you start adding A/B testing to make data-driven decisions at scale. I do believe people have put up barriers because of the high build cost, and that's why I think AI is going to remove that barrier, so you can hopefully push more things behind experiments, because you're able to ship them without having to bother your developer teams.

    7. AG

      Yeah. AI is a game changer for becoming a company like Booking. You have no excuse at this point.

  15. 1:10:10 - 1:13:52

    Biggest PM Mistakes in Experimentation

    1. AG

      What are the biggest mistakes PMs make when it comes to experimentation?

    2. FT

      Yeah. I wouldn't say there are mistakes so much as common misconceptions about experimentation. I'll mention three of them here, but there are others, of course. The top three I hear the most are: it will slow down our delivery; we don't have enough traffic; and, because we do product discovery, we're fine, we've learned a lot from discovery, so why should we bother with experimentation? None of these objections is entirely true for me. Experimentation actually helps teams build better products. There are many studies about it. One I like to cite was done by Harvard Business Review, and it basically says there is a direct correlation between the number of experiments you run annually and your revenue growth. So the more you experiment, the faster you can grow. If we look at each objection: the first one, well, experimentation actually speeds things up; it's not slowing things down. With experimentation you can release early versions of a feature, learn from real users actually using it, and improve as you go. That way you get to the right feature faster in the end. You don't have to build the best feature before shipping it. You ship the first version, iterate with it, learn, and then ship the final version to all your users. Like I always say to my own team: fast gets good quicker than good gets fast. The key message here is: do not wait for the perfect feature. Ship early and iterate until you reach your goal. You will learn so much more in a short period of time than if you don't experiment. The second one is always a little funny to me, because teams often find themselves validating features with just a few users. They have a prototype, they show it to 10 users, and then they think all their users will love the feature, right? So why would traffic suddenly be a problem for experimentation? That doesn't make sense to me. And traffic is only one part of it. The number of conversions is also very important when you look at the sample size you need. You need a good number of visitors, but you also need to look at the metric: the more conversions you have, the faster you will get insights from your experiment. And the last one, that product discovery is enough. I would rather say discovery and experimentation go hand in hand. Discovery gives you early signals; that's a fact, and that's why it's very important to do product discovery. But experimentation tells you exactly what users do when they interact with the feature. Product discovery tells you what users say they want; experimentation tells you what they actually do. So it's very important to me to have both, so you get the full picture in production.
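
      [For illustration: the point that conversions matter as much as raw traffic follows from the standard two-proportion sample-size approximation, sketched below. The z-values correspond to the common 95% confidence / 80% power setup; this is a back-of-the-envelope check, not a replacement for a proper power analysis.]

      ```typescript
      // Approximate sample size per variant for an A/B test on a
      // conversion rate: n ≈ 2 * (zAlpha + zBeta)^2 * p * (1 - p) / delta^2.
      // Lower baseline conversion rates need more visitors to detect
      // the same absolute lift.
      function sampleSizePerVariant(
        baselineRate: number, // e.g. 0.03 for a 3% conversion rate
        absoluteLift: number, // minimum detectable effect, e.g. 0.006
        zAlpha = 1.96,        // 95% confidence, two-sided
        zBeta = 0.84          // 80% power
      ): number {
        const p = baselineRate;
        return Math.ceil(
          (2 * (zAlpha + zBeta) ** 2 * p * (1 - p)) / absoluteLift ** 2
        );
      }

      // A 3% baseline with a 20% relative lift (0.6 points absolute):
      console.log(sampleSizePerVariant(0.03, 0.006)); // ≈ 12,675 per variant
      ```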

    3. AG

      Amazing.

  16. 1:13:52 - 1:15:03

    Ending

    1. AG

      This episode, our complete course on AI experimentation, is coming out an episode after our complete course on AI discovery with Teresa Torres. I think it's the perfect combo of content, the perfect ending for our episode today. Fred, this has truly been a masterclass. If people wanna find more, if they wanna connect with you, where should they go?

    2. FT

      Well, directly on LinkedIn. On LinkedIn, I'll answer most of the questions I get.

    3. AG

      Amazing. Thank you so much for your time today.

    4. FT

      Yeah. It has been a pleasure. Thanks for having me.

    5. AG

      So if you wanna learn more about how to shift to this way of working, check out our full conversation on Apple or Spotify Podcasts. And if you want the actual documents that we showed, the tools and frameworks and public links, be sure to check out my newsletter post with all of the details. Finally, thank you so much for watching. It would really mean a lot if you could make sure you are subscribed on YouTube, following on Apple or Spotify Podcasts, and leave us a review on those platforms. That really helps grow the podcast and support our work so that we can do bigger and better productions. I'll see you in the next one.

Episode duration: 1:15:11
