Aakash Gupta

AI for Product Managers: 10X Growth with Smart Experimentation

AI has completely transformed how we run experiments. What used to take weeks can now happen in minutes. Frederic De Todaro, CPO at Kameleoon with 12+ years helping thousands of teams, reveals exactly how AI is revolutionizing experimentation from ideation to analysis.

⏰ Timestamps:
00:00 How AI Changed Experimentation Overview
01:54 The 4 Steps of Experimentation Framework
14:12 ADS
16:00 How AI has Changed Experimentation
21:08 User Behaviour Models
26:56 Multi-Armed Bandit vs Contextual Bandit
30:05 ADS
31:55 AI Content Generation
35:13 How Vibe Coding Changes Experimentation
41:35 Live Demo: From Idea to Running Experiment in 2 Minutes
43:36 Two-Minute Build Achievement
51:49 How to Measure AI Features Properly
54:17 Measuring RAG Systems: 3 Key Metrics
01:07:18 Best Experimentation Company: Booking.com
01:10:10 Biggest PM Mistakes in Experimentation
01:13:52 Ending

Transcript: https://www.news.aakashg.com/p/frederic-de-todaro-podcast

🏆 Thanks to our sponsors:
1. Mobbin: Discover real-world design inspiration https://mobbin.com/aakash
2. Jira Product Discovery: Build the right thing, reliably https://www.atlassian.com/software/jira/product-discovery
3. Product Faculty: Get $550 off https://maven.com/product-faculty/ai-product-strategy-certificate?promoCode=AAKASH550C1
4. Maven: Get $100 off my curation of their top courses - http://maven.com/x/aakash

Key Takeaways:
1. The Build Bottleneck is Dead. Most product ideas never get tested because building takes weeks. AI just killed this constraint - you can now go from idea to live experiment in 2 minutes using plain English prompts.
2. Prompt Your Way to Tests. Type "change sorting to price low to high" and AI builds the variation in 2 minutes. Still run it through design, engineering, and data reviews - but now you're reviewing the actual live variation, not specs.
3. Beyond Text: Draw Your Ideas. Upload mockups or sketch rough concepts. AI transforms drawings into live experiments you can actually review - newsletter popups, onboarding flows, layout changes. Share preview links with stakeholders before going live.
4. AI Reads User Intent. Like a digital sales rep, AI scores every visitor's conversion likelihood in real time. Show discounts only to users who need them to buy, not everyone who visits your site.
5. Failed Tests Become Wins. 80% of experiments fail overall, but AI automatically finds segments where they succeed. "Failed globally but increased mobile conversions 25%" - insights that would take hours manually.
6. Speed vs Accuracy Trade-offs. Multi-armed bandits optimize news headlines in hours, not weeks - perfect when time beats perfect measurement. Contextual bandits personalize every individual user's experience.
7. Humans Still Drive Strategy. PMs bring business context AI doesn't have - customer constraints, strategic priorities, success metrics. Data scientists validate statistical approaches. Designers review brand compliance. AI handles building variations fast.
8. Measure What Actually Matters. Track business metrics, not just usage: prompts needed per experiment, time from idea to live test, developer dependency rate. If you still need developers 80% of the time, AI isn't solving your bottleneck.
9. Discovery Meets Testing. User interviews reveal what people say they want. Experiments show what they actually do. Combine both for complete insight - validate problems through discovery, solutions through testing.
10. Experimentation Culture Wins. Harvard Business Review found a direct correlation between experiments run annually and revenue growth. More experiments = faster growth. AI finally makes this accessible to every team.

👨‍💻 Where to find Fred:
LinkedIn: https://www.linkedin.com/in/fdetodaro/
Kameleoon: https://kameleoon.com

👨‍💻 Where to find Aakash:
Twitter: https://www.twitter.com/aakashg0
LinkedIn: https://www.linkedin.com/in/aagupta/

#ai #experimentation #abtesting #productmanagement

🧠 About Product Growth: The world's largest podcast focused solely on product + growth, with over 180K listeners. Hosted by Aakash Gupta, who spent 16 years in PM, rising to VP of product, this 2x/week show covers product and growth topics in depth.

🔔 Subscribe and turn on notifications to master AI-powered experimentation!

Aakash Gupta (host), Frederic De Todaro (guest)
Aug 29, 2025 · 1h 15m

CHAPTERS

  1. 0:00 – 1:54

    Why AI is the biggest shift in experimentation (and why most teams still under-test)

    Aakash introduces Frederic De Todaro and frames the core problem: experimentation is powerful, but most organizations don’t A/B test most releases because the process is too slow and developer-dependent. Frederic argues AI is the biggest change he’s seen in his career and can make experimentation far more accessible.

    • AI is the biggest driver of change in experimentation over the last decade
    • Most product teams don’t test the majority of shipped work
    • The build phase is the long-standing bottleneck due to developer constraints
    • AI creates an opportunity to increase experimentation velocity and coverage
  2. 1:54 – 14:12

    The 4-step experimentation loop and where AI helps most

    Frederic outlines a simple experimentation framework: idea/assumption → build variations → configure targeting & KPIs → analyze results. The discussion maps how AI can accelerate each stage, especially build and analysis, while keeping humans accountable for context and decision quality.

    • Four steps: ideate/hypothesize, build, configure (targeting + KPIs), analyze
    • Experimentation is an iterative learning loop
    • AI can speed up all steps, but build has historically been hardest
    • Good hypotheses and measurable metrics remain prerequisites for success
  3. 14:12 – 16:00

    AI-assisted ideation: better ideas through better context (and ‘UX memory’)

    They discuss how generative AI improves the ideation phase when it has strong context—product history, constraints, prior experiments, and even screenshots. Frederic highlights the value of connecting AI to an experimentation knowledge base so it can surface previously run tests and results, creating organizational “UX memory.”

    • GenAI can generate experiment ideas if given rich business/product context
    • Using projects/prompts with historical context improves output quality
    • AI can warn: “this was tested before” and share past results
    • ‘UX memory’ helps avoid duplicated work and strengthens ideation
  4. 16:00 – 21:08

    Human-in-the-loop roles: PM vs data scientist vs AI analyst

    Frederic clarifies the division of labor: PMs provide business context and constraints; data scientists ensure methodological rigor, bias checks, and model choices; AI automates pattern finding and summarization. Both speakers emphasize that AI boosts speed, but humans are still needed for accountability and correctness.

    • PM role: business context, constraints, hypothesis, definition of success
    • Data scientist role: validate plausibility, detect bias, select methods/models (RAG vs fine-tune)
    • AI role: summarize results, detect patterns, suggest next steps for inconclusive tests
    • AI makes experimentation faster and more accessible, but humans ensure trust
  5. 21:08 – 26:56

    Two waves of AI in experimentation: ML (2016) → GenAI (2022+)

    Frederic gives a historical model: the machine learning wave improved targeting, traffic allocation, and analysis; the generative AI wave enabled content generation, assistants, and rapid creation of variations via prompting. The key leap is moving from tool-assisted testing to prompt-based experimentation that can ship in minutes.

    • ML wave (around 2016): AI targeting, multi-armed/contextual bandits, result analysis aids
    • GenAI wave (from 2022): content generation and RAG-based assistants
    • Late 2023+: models and ‘vibe coding’ accelerate idea-to-experiment dramatically
    • Prompt-based experimentation aims to remove sprint-level build delays
  6. 26:56 – 30:05

    AI targeting: predicting intent to personalize who sees what

    They explain AI targeting as real-time intent scoring (likelihood to convert, churn risk, etc.) based on behavior signals. This score replaces manual segmentation and enables smarter targeting—like showing discounts only to users who need them rather than giving coupons to everyone.

    • AI targeting generates an intent/conversion score from on-site behavior
    • Replaces manual segment-building during experiment configuration
    • Common use case: targeted discounts/coupons to prevent unnecessary margin loss
    • Enables personalized experiences based on predicted propensity
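To make the intent-scoring idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the signal names, the hand-picked logistic weights, and the "middle band" rule for who sees the discount are hypothetical, and this is not how Kameleoon's model is described in the episode. A real system would learn the weights from historical sessions.

```python
import math

# Hypothetical weights for a toy logistic intent model (illustration only).
WEIGHTS = {"pages_viewed": 0.15, "cart_adds": 0.9, "returning": 0.6}
BIAS = -2.0


def intent_score(signals: dict) -> float:
    """Return a conversion-likelihood score in (0, 1) from behavior signals."""
    z = BIAS + sum(w * float(signals.get(name, 0)) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def should_show_discount(score: float, low: float = 0.3, high: float = 0.7) -> bool:
    """Assumed rule: offer the discount only to hesitant visitors in the middle band."""
    return low <= score < high


# Three made-up visitors: just browsing, hesitant, and already committed.
browsing = {"pages_viewed": 2, "cart_adds": 0, "returning": 0}
hesitant = {"pages_viewed": 6, "cart_adds": 1, "returning": 0}
committed = {"pages_viewed": 10, "cart_adds": 2, "returning": 1}

for name, visitor in [("browsing", browsing), ("hesitant", hesitant), ("committed", committed)]:
    score = intent_score(visitor)
    print(f"{name}: score={score:.2f} discount={should_show_discount(score)}")
```

With these toy numbers only the hesitant visitor falls inside the band, which mirrors the point above: visitors who would buy anyway do not get a coupon.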
  7. 30:05 – 31:55

    Multi-armed bandit vs contextual bandit: performance vs personalization

    Frederic distinguishes standard A/B testing from bandit approaches. Multi-armed bandits quickly shift traffic toward the best-performing variant (trading off statistical accuracy for speed), while contextual bandits learn which variant works best per user/context to power hyper-personalization—both typically requiring substantial traffic.

    • A/B testing: equal split to reach high-confidence conclusion
    • Multi-armed bandit: reallocates traffic to best performer faster; less accuracy early on
    • Contextual bandit: chooses best variant per user/context; enables personalization
    • Both approaches work best with high traffic; contextual bandits need even more data to learn preferences
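The traffic-shifting behavior of a multi-armed bandit can be sketched with Thompson sampling, one common bandit algorithm. The episode does not say which algorithm any particular tool uses, so treat the choice as an assumption; the variant names and conversion rates below are made up for the simulation.

```python
import random


class ThompsonBandit:
    """Beta-Bernoulli Thompson sampling over a fixed set of variants."""

    def __init__(self, variants, seed=None):
        self.rng = random.Random(seed)
        # [alpha, beta] per variant, starting from a uniform Beta(1, 1) prior.
        self.stats = {v: [1, 1] for v in variants}

    def choose(self):
        # Sample a plausible conversion rate per variant and show the highest draw.
        draws = {v: self.rng.betavariate(a, b) for v, (a, b) in self.stats.items()}
        return max(draws, key=draws.get)

    def update(self, variant, converted):
        # A conversion increments alpha; a non-conversion increments beta.
        self.stats[variant][0 if converted else 1] += 1


def simulate(true_rates, visitors=5000, seed=0):
    """Count how often each variant is shown when the bandit allocates traffic."""
    bandit = ThompsonBandit(true_rates, seed=seed)
    env = random.Random(seed + 1)  # separate stream for simulated visitor behavior
    shown = {v: 0 for v in true_rates}
    for _ in range(visitors):
        variant = bandit.choose()
        shown[variant] += 1
        bandit.update(variant, env.random() < true_rates[variant])
    return shown


# Hypothetical true conversion rates, unknown to the bandit.
shown = simulate({"headline_a": 0.05, "headline_b": 0.10})
print(shown)
```

Unlike an equal-split A/B test, most traffic ends up on the stronger headline, at the cost of a less precise estimate for the weaker one. A contextual bandit would extend this by keeping separate statistics (or a model) per user context rather than one global pair per variant.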
  8. 31:55 – 35:13

    Opportunity detection: learning from the 80% of ‘failed’ experiments

    Because most experiments don’t produce an overall lift, teams spend time drilling into segments to find where a variant might work. AI opportunity detection automates those breakdowns (device, segment, etc.) to propose actionable follow-ups like targeting the winning variant only to certain users.

    • Less than ~20% of experiments show an overall positive lift
    • Analysts often manually slice data to find segment-level wins
    • AI can automatically detect where/why results differ across segments
    • Turns inconclusive tests into next-step hypotheses (e.g., mobile-only targeting)
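The segment slicing that opportunity detection automates can be sketched in a few lines of Python. The row format and the numbers are invented for illustration; they are chosen so the variant loses overall but wins on mobile, matching the kind of finding described above.

```python
from collections import defaultdict


def segment_lifts(rows, segment_key):
    """Relative lift of 'variant' over 'control' within each segment value."""
    # counts[segment][arm] = [visitors, conversions]
    counts = defaultdict(lambda: {"control": [0, 0], "variant": [0, 0]})
    for row in rows:
        cell = counts[row[segment_key]][row["arm"]]
        cell[0] += row["visitors"]
        cell[1] += row["conversions"]

    lifts = {}
    for segment, arms in counts.items():
        (c_visitors, c_conv), (v_visitors, v_conv) = arms["control"], arms["variant"]
        if c_visitors == 0 or v_visitors == 0 or c_conv == 0:
            continue  # not enough data to compute a relative lift
        control_rate = c_conv / c_visitors
        variant_rate = v_conv / v_visitors
        lifts[segment] = (variant_rate - control_rate) / control_rate
    return lifts


# Made-up aggregated results, for illustration only.
rows = [
    {"device": "desktop", "arm": "control", "visitors": 1000, "conversions": 100},
    {"device": "desktop", "arm": "variant", "visitors": 1000, "conversions": 80},
    {"device": "mobile", "arm": "control", "visitors": 1000, "conversions": 40},
    {"device": "mobile", "arm": "variant", "visitors": 1000, "conversions": 50},
]
lifts = segment_lifts(rows, "device")
for segment, lift in lifts.items():
    print(f"{segment}: {lift:+.0%}")
```

Slicing by many segments increases the chance of a false positive, so a segment-level win like this is best treated as a hypothesis for a follow-up test rather than a conclusion.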
  9. 35:13 – 41:35

    GenAI for experimentation: content generation and RAG assistants inside tools

    Frederic shares practical uses of GenAI: generating alternative copy for banners/popups and providing an in-product assistant to answer experimentation questions and generate SDK/feature-flag code. They briefly cover statistical approaches (frequentist vs Bayesian) and CUPED as examples of what an assistant can recommend.

    • Content generation: create copy variations quickly for messaging, banners, popups
    • RAG assistant: answers “which stats engine should I use?” and can guide configuration
    • Assistant can generate code snippets for feature flags/SDK integrations
    • Stats concepts referenced: frequentist vs Bayesian; CUPED for faster experiments with historical data
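As a rough illustration of why CUPED can shorten experiments, the sketch below applies the standard adjustment, y_adj = y - theta * (x - mean(x)) with theta = cov(x, y) / var(x), where x is each user's pre-experiment value of the metric. The data is synthetic and the function name is an assumption; in a real experiment theta is typically estimated on pooled data from all arms.

```python
import random
from statistics import mean, pvariance


def cuped_adjust(metric, pre_metric):
    """Return CUPED-adjusted values: y - theta * (x - mean(x))."""
    x_bar, y_bar = mean(pre_metric), mean(metric)
    cov = mean([(x - x_bar) * (y - y_bar) for x, y in zip(pre_metric, metric)])
    theta = cov / pvariance(pre_metric)
    return [y - theta * (x - x_bar) for x, y in zip(pre_metric, metric)]


# Synthetic users whose in-experiment metric is strongly tied to their history.
rng = random.Random(42)
pre = [rng.gauss(50, 10) for _ in range(2000)]
post = [x + rng.gauss(0, 3) for x in pre]

adjusted = cuped_adjust(post, pre)
print(f"mean:     {mean(post):.2f} -> {mean(adjusted):.2f}")
print(f"variance: {pvariance(post):.2f} -> {pvariance(adjusted):.2f}")
```

The mean is unchanged while the variance drops sharply, so the same effect can be detected with fewer visitors. The gain depends entirely on how well historical behavior predicts the metric.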
  10. 41:35 – 43:36

    From vibe coding to ‘vibe experimenting’: prompt-based experimentation concept

    They contrast rapid prototyping (“can you build it?”) with experimentation (“should you build it?”). Prompt-based experimentation sits between the two: generate real, testable variations directly on production surfaces so teams can learn from actual behavior at scale, not just prototype feedback.

    • Vibe coding accelerates prototyping, but doesn’t validate real-world impact
    • Users often say they like prototypes but behavior differs in production
    • Prompt-based experimentation creates variations directly on the real site/app
    • Goal is faster learning and better product decisions, not just faster shipping
  11. 43:36 – 51:49

    Live demo: idea → running experiment in ~2 minutes (prompt-to-variation)

    Frederic demonstrates creating an experiment on an e-commerce catalog by prompting a change (default sorting to price low-to-high). The system extracts page context, identifies elements, generates JS/CSS, runs checks (including accessibility and mobile considerations), and produces a shippable variation rapidly.

    • Workflow: create experiment → open target page → prompt desired change
    • AI localizes the relevant UI element and generates implementation code
    • Outputs include target element, behavior changes, JS/CSS, and checks
    • Result: a working, previewable, shippable variation created in minutes
  12. 51:49 – 54:17

    Governance at scale: keeping design/engineering/data checkpoints—just faster

    They discuss how larger orgs should use prompt-based experimentation without bypassing collaboration. The proposed approach: keep the same stage gates (design review, engineering code review, data/metrics review) but make them lighter through simulation links, code visibility, and in-context previews of what will ship.

    • Prompt-based speed doesn’t eliminate the need for cross-functional review
    • Simulation/preview links help stakeholders review the real experience
    • Design reviews ensure UX/brand consistency; engineers review generated code
    • Data teams validate goals and instrumentation before launch
  13. 54:17 – 1:07:18

    Beyond text prompts: generating variations from mockups and sketches

    Frederic shows how experiments can be generated from uploaded mockups or quick sketches, including creating net-new UI elements like a newsletter popup. The AI asks clarifying questions (e.g., image needs), generates assets, and produces the required front-end code to implement the experience.

    • Mockup-to-variation: upload a design and prompt “build this version”
    • Sketch-to-variation: rough drawings can define layout/components
    • AI can generate images/assets and ask clarifying questions when needed
    • Supports more advanced UI changes (popups, layout changes, onboarding flows)
  14. 1:07:18 – 1:10:10

    Measuring AI features: adoption, outcomes, experience—and the right North Star

    The conversation shifts to how to evaluate AI features themselves. Frederic recommends tracking adoption, task outcomes, and user experience feedback (thumbs up/down), plus value-focused operational metrics like time-to-live and reduced developer involvement; he shares Kameleoon’s North Star: daily experiments running.

    • Core AI feature KPIs: adoption/usage, outcomes (task success), experience (feedback)
    • Operational metrics for prompt-based experimentation (PBX): prompts per experiment, time from prompt to live, % dev intervention
    • Kameleoon North Star: number of experiments running daily (correlated with churn/growth)
    • Choose actionable metrics that align teams and correlate with business outcomes
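The three operational metrics named above are simple to compute once each experiment is logged. The sketch below assumes a hypothetical `ExperimentLog` record; the field names and sample values are illustrative and do not come from any real product's schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class ExperimentLog:  # hypothetical record shape
    prompts: int
    first_prompt_at: datetime
    live_at: datetime
    needed_developer: bool


def pbx_metrics(logs):
    """Summarize prompt-based experimentation logs into three operational metrics."""
    return {
        "prompts_per_experiment": sum(log.prompts for log in logs) / len(logs),
        "median_minutes_to_live": median(
            (log.live_at - log.first_prompt_at).total_seconds() / 60 for log in logs
        ),
        "dev_intervention_rate": sum(log.needed_developer for log in logs) / len(logs),
    }


# Made-up logs for four experiments.
t0 = datetime(2025, 1, 1, 9, 0)
logs = [
    ExperimentLog(2, t0, t0 + timedelta(minutes=2), False),
    ExperimentLog(5, t0, t0 + timedelta(minutes=30), True),
    ExperimentLog(3, t0, t0 + timedelta(minutes=10), False),
    ExperimentLog(2, t0, t0 + timedelta(minutes=4), False),
]
metrics = pbx_metrics(logs)
print(metrics)
```

The median is used for time-to-live so that one slow experiment does not dominate the summary; that is a design choice in this sketch, not something stated in the episode.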
  15. 1:10:10 – 1:13:52

    Measuring RAG systems: accuracy, relevance, context quality (+ LLM-as-judge)

    Frederic provides a practical measurement model for RAG assistants: accuracy (faithfulness), relevance (does it answer the question), and context quality (are retrieved documents current and useful). He describes using an LLM-as-judge approach to score and validate these qualities and warns against stopping at usage alone.

    • Three RAG metrics: accuracy/faithfulness, response relevance, context quality
    • Failures include confidently wrong answers, off-topic answers, or outdated sources
    • LLM-as-judge can generate similar questions and score relevance/quality
    • Big mistake: measuring only usage instead of outcomes and reliability
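One way to structure an LLM-as-judge evaluation around the three metrics is sketched below. The judge is passed in as a function so that a real LLM call can be plugged in; the `toy_judge` here is only a stand-in that uses crude word overlap and makes no model call, and all names and sample data are hypothetical.

```python
from typing import Callable, Dict, List

CRITERIA = ("accuracy", "relevance", "context_quality")

# A judge takes (criterion, question, answer, retrieved_docs) and returns a score in [0, 1].
Judge = Callable[[str, str, str, List[str]], float]


def evaluate_rag(samples: List[dict], judge: Judge) -> Dict[str, float]:
    """Average each criterion's judge score across all samples."""
    totals = {criterion: 0.0 for criterion in CRITERIA}
    for sample in samples:
        for criterion in CRITERIA:
            totals[criterion] += judge(
                criterion, sample["question"], sample["answer"], sample["docs"]
            )
    return {criterion: totals[criterion] / len(samples) for criterion in CRITERIA}


def _words(text: str) -> set:
    return set(text.lower().split())


def _overlap(needle: set, haystack: set) -> float:
    return len(needle & haystack) / len(needle) if needle else 0.0


def toy_judge(criterion, question, answer, docs):
    """Stand-in for an LLM call: word-overlap heuristics, illustration only."""
    doc_words = _words(" ".join(docs))
    if criterion == "accuracy":  # is the answer supported by the retrieved docs?
        return _overlap(_words(answer), doc_words)
    if criterion == "relevance":  # does the answer address the question?
        return _overlap(_words(question), _words(answer))
    # context_quality: do the retrieved docs relate to the question?
    return _overlap(_words(question), doc_words)


samples = [
    {
        "question": "which stats engine",
        "answer": "use bayesian engine",
        "docs": ["use bayesian engine for low traffic"],
    }
]
scores = evaluate_rag(samples, toy_judge)
print(scores)
```

Reporting the three scores separately, instead of one blended number, keeps the failure modes listed above distinguishable: an answer can be well supported by its sources and still miss the question.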
  16. 1:13:52 – 1:15:11

    Experimentation culture: Booking.com example + common PM misconceptions

    Frederic points to Booking.com as a model experimentation organization where nearly everything ships through tests, but advises adopting it step-by-step. He addresses common objections—experimentation slows delivery, lack of traffic, and “discovery is enough”—arguing that experimentation accelerates learning, traffic isn’t the only limiter, and discovery must be paired with real behavioral validation.

    • Booking.com highlighted as best-in-class experimentation culture
    • Adopt progressively: feature flags → rollouts/targeting → A/B testing at scale
    • Misconceptions: ‘it slows delivery,’ ‘we lack traffic,’ ‘discovery is enough’
    • Experimentation reveals what users do (not just what they say) and correlates with growth
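The progressive path from feature flags to rollouts to A/B tests can share one mechanism: deterministic bucketing of users. The sketch below hashes a flag name and user ID into a stable bucket; the function names are hypothetical and this is a generic technique, not a specific vendor's SDK.

```python
import hashlib


def bucket(user_id: str, flag: str) -> float:
    """Map (flag, user) to a stable number in [0, 100)."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest[:8], 16) / 0x100000000 * 100


def is_enabled(user_id: str, flag: str, rollout_percent: float) -> bool:
    """Feature flag with a percentage rollout: 0 is off, 100 is on for everyone."""
    return bucket(user_id, flag) < rollout_percent


def assign_variant(user_id: str, experiment: str, variants=("control", "variant")) -> str:
    """Even split across variants, stable for a given user and experiment."""
    share = 100 / len(variants)
    index = min(int(bucket(user_id, experiment) // share), len(variants) - 1)
    return variants[index]


users = [f"user-{i}" for i in range(10000)]
at_10 = sum(is_enabled(u, "new-sorting", 10) for u in users)
in_variant = sum(assign_variant(u, "sorting-test") == "variant" for u in users)
print(f"10% rollout reaches {at_10} of {len(users)} users")
print(f"A/B split puts {in_variant} of {len(users)} users in the variant")
```

Because the bucket depends only on the flag name and user ID, raising the rollout percentage keeps already-exposed users exposed, and the same user always sees the same variant.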
