AI for Product Managers: 10X Growth with Smart Experimentation

AI has completely transformed how we run experiments. What used to take weeks can now happen in minutes. Frederic De Todaro, CPO at Kameleoon with 12+ years helping thousands of teams, reveals exactly how AI is revolutionizing experimentation from ideation to analysis.

⏰ Timestamps:
00:00 How AI Changed Experimentation Overview
01:54 The 4 Steps of Experimentation Framework
14:12 ADS
16:00 How AI has Changed Experimentation
21:08 User Behaviour Models
26:56 Multi-Armed Bandit vs Contextual Bandit
30:05 ADS
31:55 AI Content Generation
35:13 How Vibe Coding Changes Experimentation
41:35 Live Demo: From Idea to Running Experiment in 2 Minutes
43:36 Two-Minute Build Achievement
51:49 How to Measure AI Features Properly
54:17 Measuring RAG Systems: 3 Key Metrics
01:07:18 Best Experimentation Company: Booking.com
01:10:10 Biggest PM Mistakes in Experimentation
01:13:52 Ending

Transcript: https://www.news.aakashg.com/p/frederic-de-todaro-podcast

--

🏆 Thanks to our sponsors:
1. Mobbin: Discover real-world design inspiration https://mobbin.com/aakash
2. Jira Product Discovery: Build the right thing, reliably https://www.atlassian.com/software/jira/product-discovery
3. Product Faculty: Get $550 off https://maven.com/product-faculty/ai-product-strategy-certificate?promoCode=AAKASH550C1
4. Maven: Get $100 off my curation of their top courses - http://maven.com/x/aakash

--

Key Takeaways:
1. The Build Bottleneck is Dead. Most product ideas never get tested because building takes weeks. AI just killed this constraint: you can now go from idea to live experiment in 2 minutes using plain-English prompts.
2. Prompt Your Way to Tests. Type "change sorting to price low to high" and AI builds the variation in 2 minutes. Still run it through design, engineering, and data reviews, but now you're reviewing the actual live variation, not specs.
3. Beyond Text: Draw Your Ideas. Upload mockups or sketch rough concepts. AI transforms drawings into live experiments you can actually review: newsletter popups, onboarding flows, layout changes. Share preview links with stakeholders before going live.
4. AI Reads User Intent. Like a digital sales rep, AI scores every visitor's conversion likelihood in real time. Show discounts only to users who need them to buy, not to everyone who visits your site.
5. Failed Tests Become Wins. 80% of experiments fail overall, but AI automatically finds segments where they succeed. "Failed globally but increased mobile conversions 25%" is the kind of insight that would take hours to find manually.
6. Speed vs Accuracy Trade-offs. Multi-armed bandits optimize news headlines in hours, not weeks, which is ideal when speed matters more than precise measurement. Contextual bandits personalize the experience for each individual user.
7. Humans Still Drive Strategy. PMs bring business context AI doesn't have: customer constraints, strategic priorities, success metrics. Data scientists validate statistical approaches. Designers review brand compliance. AI handles building variations fast.
8. Measure What Actually Matters. Track business metrics, not just usage: prompts needed per experiment, time from idea to live test, developer dependency rate. If you still need developers 80% of the time, AI isn't solving your bottleneck.
9. Discovery Meets Testing. User interviews reveal what people say they want. Experiments show what they actually do. Combine both for complete insight: validate problems through discovery and solutions through testing.
10. Experimentation Culture Wins. Harvard Business Review found a direct correlation between experiments run annually and revenue growth. More experiments = faster growth. AI finally makes this accessible to every team.
---

👨‍💻 Where to find Fred:
LinkedIn: https://www.linkedin.com/in/fdetodaro/
Kameleoon: https://kameleoon.com

👨‍💻 Where to find Aakash:
Twitter: https://www.twitter.com/aakashg0
LinkedIn: https://www.linkedin.com/in/aagupta/

#ai #experimentation #abtesting #productmanagement

🧠 About Product Growth:
The world's largest podcast focused solely on product + growth, with over 180K listeners. Hosted by Aakash Gupta, who spent 16 years in PM and rose to VP of Product, this 2x/week show covers product and growth topics in depth.

🔔 Subscribe and turn on notifications to master AI-powered experimentation!

Aakash Gupta (host) · Frederic De Todaro (guest)

Aug 29, 2025 · 1h 15m

CHAPTERS

0:00 – 1:54
Why AI is the biggest shift in experimentation (and why most teams still under-test)
Aakash introduces Frederic De Todaro and frames the core problem: experimentation is powerful, but most organizations don’t A/B test most releases because the process is too slow and developer-dependent. Frederic argues AI is the biggest change he’s seen in his career and can make experimentation far more accessible.
1:54 – 14:12
The 4-step experimentation loop and where AI helps most
Frederic outlines a simple experimentation framework: idea/assumption → build variations → configure targeting & KPIs → analyze results. The discussion maps how AI can accelerate each stage, especially build and analysis, while keeping humans accountable for context and decision quality.
14:12 – 16:00
AI-assisted ideation: better ideas through better context (and ‘UX memory’)
They discuss how generative AI improves the ideation phase when it has strong context—product history, constraints, prior experiments, and even screenshots. Frederic highlights the value of connecting AI to an experimentation knowledge base so it can surface previously run tests and results, creating organizational “UX memory.”
16:00 – 21:08
Human-in-the-loop roles: PM vs data scientist vs AI analyst
Frederic clarifies the division of labor: PMs provide business context and constraints; data scientists ensure methodological rigor, bias checks, and model choices; AI automates pattern finding and summarization. They emphasize that AI boosts speed, but humans are still needed for accountability and correctness.
21:08 – 26:56
Two waves of AI in experimentation: ML (2016) → GenAI (2022+)
Frederic gives a historical model: the machine learning wave improved targeting, traffic allocation, and analysis; the generative AI wave enabled content generation, assistants, and rapid creation of variations via prompting. The key leap is moving from tool-assisted testing to prompt-based experimentation that can ship in minutes.
26:56 – 30:05
AI targeting: predicting intent to personalize who sees what
They explain AI targeting as real-time intent scoring (likelihood to convert, churn risk, etc.) based on behavior signals. This score replaces manual segmentation and enables smarter targeting—like showing discounts only to users who need them rather than giving coupons to everyone.
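To make the idea of a real-time intent score concrete, here is a minimal sketch assuming a simple logistic model over a few behavior signals. The signal names, weights, and the 0.2 to 0.6 "hesitant" band are hypothetical illustrations, not Kameleoon's model; a production system would learn its weights from historical sessions.

```python
import math
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical behavior signals collected during one visit."""
    pages_viewed: int
    cart_items: int
    returning_visitor: bool
    seconds_on_site: float


# Hypothetical weights: a real intent model would learn these from past sessions.
WEIGHTS = {
    "pages_viewed": 0.15,
    "cart_items": 0.8,
    "returning_visitor": 0.6,
    "seconds_on_site": 0.002,
}
BIAS = -3.0


def conversion_score(s: Session) -> float:
    """Logistic score in (0, 1): estimated likelihood that this visitor converts."""
    z = (
        BIAS
        + WEIGHTS["pages_viewed"] * s.pages_viewed
        + WEIGHTS["cart_items"] * s.cart_items
        + WEIGHTS["returning_visitor"] * float(s.returning_visitor)
        + WEIGHTS["seconds_on_site"] * s.seconds_on_site
    )
    return 1.0 / (1.0 + math.exp(-z))


def should_offer_discount(score: float, low: float = 0.2, high: float = 0.6) -> bool:
    """Assumed policy: target only the hesitant middle band. Likely buyers
    convert without a coupon; very cold visitors rarely convert even with one."""
    return low <= score < high


if __name__ == "__main__":
    visitor = Session(pages_viewed=8, cart_items=1, returning_visitor=True, seconds_on_site=300)
    score = conversion_score(visitor)
    print(f"score={score:.2f}, offer discount: {should_offer_discount(score)}")
```

The band policy is one way to express "show discounts only to users who need them": the score replaces a hand-built segment, and the threshold decides who sees the offer.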
30:05 – 31:55
Multi-armed bandit vs contextual bandit: performance vs personalization
Frederic distinguishes standard A/B testing from bandit approaches. Multi-armed bandits quickly shift traffic toward the best-performing variant (trading off statistical accuracy for speed), while contextual bandits learn which variant works best per user/context to power hyper-personalization—both typically requiring substantial traffic.
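The difference between the two bandit types can be sketched in a few lines. This assumes Bernoulli conversions and uses Thompson sampling, one common bandit algorithm; the episode does not say which algorithm Kameleoon uses. The contextual version is the simplest possible one, an independent bandit per discrete context such as device type, whereas real contextual bandits usually share a model across contexts. All class and function names are hypothetical.

```python
import random


class ThompsonBandit:
    """Multi-armed bandit using Beta-Bernoulli Thompson sampling."""

    def __init__(self, variants, rng=None):
        self.variants = list(variants)
        self.rng = rng or random.Random()
        # Beta(1, 1) prior: a uniform belief about each variant's conversion rate.
        self.alpha = {v: 1 for v in self.variants}
        self.beta = {v: 1 for v in self.variants}

    def choose(self):
        # Draw one plausible conversion rate per variant and serve the highest draw:
        # better-looking variants get more traffic, uncertain ones still get explored.
        draws = {v: self.rng.betavariate(self.alpha[v], self.beta[v]) for v in self.variants}
        return max(draws, key=draws.get)

    def update(self, variant, converted: bool):
        if converted:
            self.alpha[variant] += 1
        else:
            self.beta[variant] += 1


class PerContextBandit:
    """Simplest contextual bandit: one independent Thompson bandit per context
    (e.g. device type), so each context learns its own winner."""

    def __init__(self, variants, rng=None):
        self.variants = list(variants)
        self.rng = rng or random.Random()
        self.bandits = {}

    def _bandit(self, context):
        if context not in self.bandits:
            self.bandits[context] = ThompsonBandit(self.variants, self.rng)
        return self.bandits[context]

    def choose(self, context):
        return self._bandit(context).choose()

    def update(self, context, variant, converted: bool):
        self._bandit(context).update(variant, converted)


def simulate(bandit, true_rates, rounds, rng):
    """Hypothetical simulation for a PerContextBandit: true_rates[context][variant]
    is the real conversion rate. Returns how often each (context, variant) was served."""
    served = {}
    contexts = list(true_rates)
    for _ in range(rounds):
        ctx = rng.choice(contexts)
        variant = bandit.choose(ctx)
        converted = rng.random() < true_rates[ctx][variant]
        bandit.update(ctx, variant, converted)
        served[(ctx, variant)] = served.get((ctx, variant), 0) + 1
    return served
```

If mobile users prefer variant B and desktop users prefer A, the per-context bandit routes each device to its own winner, which a single `ThompsonBandit` cannot do. Splitting the data by context is also why these methods need substantial traffic: every context has to accumulate enough conversions on its own.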
31:55 – 35:13
Opportunity detection: learning from the 80% of ‘failed’ experiments
Because most experiments don’t produce an overall lift, teams spend time drilling into segments to find where a variant might work. AI opportunity detection automates those breakdowns (device, segment, etc.) to propose actionable follow-ups like targeting the winning variant only to certain users.
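The segment drill-down that opportunity detection automates can be sketched as a two-proportion z-test run per segment. This is an assumed, simplified stand-in for whatever Kameleoon actually runs: `ArmStats`, `find_opportunities`, and any segment data fed to them are hypothetical. The Bonferroni correction is included because slicing a result many ways will eventually produce a "winning" segment by chance alone.

```python
import math
from dataclasses import dataclass


@dataclass
class ArmStats:
    visitors: int
    conversions: int

    @property
    def rate(self) -> float:
        return self.conversions / self.visitors


def two_proportion_test(control: ArmStats, variant: ArmStats):
    """Return (relative lift, z, two-sided p-value) for variant vs control."""
    pooled = (control.conversions + variant.conversions) / (control.visitors + variant.visitors)
    se = math.sqrt(pooled * (1 - pooled) * (1 / control.visitors + 1 / variant.visitors))
    z = (variant.rate - control.rate) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return (variant.rate - control.rate) / control.rate, z, p_value


def find_opportunities(segments: dict, alpha: float = 0.05):
    """Scan {segment_name: (control, variant)} for significant positive lifts.
    Bonferroni: each segment is tested at alpha / number_of_segments."""
    threshold = alpha / len(segments)
    hits = []
    for name, (control, variant) in segments.items():
        lift, z, p = two_proportion_test(control, variant)
        if z > 0 and p < threshold:
            hits.append((name, lift, p))
    # Largest relative lift first.
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

A flagged segment is a hypothesis for a targeted follow-up test (for example, serving the variant only on mobile), not a proven win.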
35:13 – 41:35
GenAI for experimentation: content generation and RAG assistants inside tools
Frederic shares practical uses of GenAI: generating alternative copy for banners/popups and providing an in-product assistant to answer experimentation questions and generate SDK/feature-flag code. They briefly cover statistical approaches (frequentist vs Bayesian) and CUPED as examples of what an assistant can recommend.
41:35 – 43:36
From vibe coding to ‘vibe experimenting’: prompt-based experimentation concept
They contrast rapid prototyping (“can you build it?”) with experimentation (“should you build it?”). Prompt-based experimentation sits between the two: generate real, testable variations directly on production surfaces so teams can learn from actual behavior at scale, not just prototype feedback.
43:36 – 51:49
Live demo: idea → running experiment in ~2 minutes (prompt-to-variation)
Frederic demonstrates creating an experiment on an e-commerce catalog by prompting a change (default sorting to price low-to-high). The system extracts page context, identifies elements, generates JS/CSS, runs checks (including accessibility and mobile considerations), and produces a shippable variation rapidly.
51:49 – 54:17
Governance at scale: keeping design/engineering/data checkpoints—just faster
They discuss how larger orgs should use prompt-based experimentation without bypassing collaboration. The proposed approach: keep the same stage gates (design review, engineering code review, data/metrics review) but make them lighter through simulation links, code visibility, and in-context previews of what will ship.
54:17 – 1:07:18
Beyond text prompts: generating variations from mockups and sketches
Frederic shows how experiments can be generated from uploaded mockups or quick sketches, including creating net-new UI elements like a newsletter popup. The AI asks clarifying questions (e.g., image needs), generates assets, and produces the required front-end code to implement the experience.
1:07:18 – 1:10:10
Measuring AI features: adoption, outcomes, experience—and the right North Star
The conversation shifts to how to evaluate AI features themselves. Frederic recommends tracking adoption, task outcomes, and user experience feedback (thumbs up/down), plus value-focused operational metrics like time-to-live and reduced developer involvement; he shares Kameleoon’s North Star: daily experiments running.
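The value-focused metrics discussed here (prompts per experiment, time from idea to live, developer dependency rate, and experiments running per day) are simple to compute once each experiment is logged. A minimal sketch assuming a hypothetical `ExperimentRecord` log format; the field names are illustrative, not Kameleoon's schema:

```python
from dataclasses import dataclass
from datetime import date, datetime
from statistics import fmean, median
from typing import List, Optional


@dataclass
class ExperimentRecord:
    """Hypothetical log entry for one experiment built with an AI assistant."""
    idea_at: datetime             # when the idea was first entered
    live_at: Optional[datetime]   # None if it never went live
    ended_at: Optional[datetime]  # None if still running
    prompts_used: int             # prompts needed to reach a shippable variation
    needed_developer: bool        # did a developer have to step in?


def prompts_per_experiment(records: List[ExperimentRecord]) -> float:
    return fmean([r.prompts_used for r in records])


def median_hours_to_live(records: List[ExperimentRecord]) -> float:
    hours = [
        (r.live_at - r.idea_at).total_seconds() / 3600
        for r in records
        if r.live_at is not None
    ]
    return median(hours)


def developer_dependency_rate(records: List[ExperimentRecord]) -> float:
    return sum(r.needed_developer for r in records) / len(records)


def experiments_running_on(records: List[ExperimentRecord], day: date) -> int:
    """North Star style count: experiments live at any point on `day`."""
    return sum(
        1
        for r in records
        if r.live_at is not None
        and r.live_at.date() <= day
        and (r.ended_at is None or r.ended_at.date() >= day)
    )
```

Tracking `developer_dependency_rate` over time directly checks the takeaway that if developers are still needed 80% of the time, the AI feature is not removing the bottleneck.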
1:10:10 – 1:13:52
Measuring RAG systems: accuracy, relevance, context quality (+ LLM-as-judge)
Frederic provides a practical measurement model for RAG assistants: accuracy (faithfulness), relevance (does it answer the question), and context quality (are retrieved documents current and useful). He describes using an LLM-as-judge approach to score and validate these qualities and warns against stopping at usage alone.
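An LLM-as-judge harness for the three RAG dimensions can be sketched as one rubric prompt per dimension plus an aggregation step. This assumes a 1 to 5 integer scale and passes the model call in as a plain function, so no particular LLM vendor API is implied; the rubric wording and all names are assumptions based on the descriptions above.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Callable, Dict, List


@dataclass
class RagSample:
    question: str
    answer: str
    contexts: List[str]  # retrieved documents shown to the assistant


# The three dimensions from the episode; the rubric wording is an assumption.
RUBRIC: Dict[str, str] = {
    "faithfulness": "Is every claim in the answer supported by the retrieved context?",
    "relevance": "Does the answer actually address the user's question?",
    "context_quality": "Are the retrieved documents current and useful for this question?",
}


def judge_prompt(sample: RagSample, dimension: str) -> str:
    context = "\n---\n".join(sample.contexts)
    return (
        f"You are grading a RAG assistant on {dimension}.\n"
        f"Criterion: {RUBRIC[dimension]}\n"
        f"Question: {sample.question}\n"
        f"Retrieved context:\n{context}\n"
        f"Answer: {sample.answer}\n"
        "Reply with a single integer from 1 (worst) to 5 (best)."
    )


def evaluate(samples: List[RagSample], judge: Callable[[str], str]) -> Dict[str, float]:
    """Average 1-5 score per dimension. `judge` is any function that sends a
    prompt to an LLM and returns its text reply."""
    scores: Dict[str, List[int]] = {d: [] for d in RUBRIC}
    for sample in samples:
        for dimension in RUBRIC:
            score = int(judge(judge_prompt(sample, dimension)).strip())
            if not 1 <= score <= 5:
                raise ValueError(f"judge returned out-of-range score {score}")
            scores[dimension].append(score)
    return {d: fmean(v) for d, v in scores.items()}
```

Injecting `judge` keeps the harness testable with a fake judge and lets the real model be swapped without touching the scoring logic; the resulting scores complement, rather than replace, usage numbers.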
1:13:52 – 1:15:11
Experimentation culture: Booking.com example + common PM misconceptions
Frederic points to Booking.com as a model experimentation organization where nearly everything ships through tests, but advises adopting it step-by-step. He addresses common objections—experimentation slows delivery, lack of traffic, and “discovery is enough”—arguing that experimentation accelerates learning, traffic isn’t the only limiter, and discovery must be paired with real behavioral validation.
