Aakash Gupta – AI for Product Managers: 10X Growth with Smart Experimentation
CHAPTERS
- 0:00 – 1:54
Why AI is the biggest shift in experimentation (and why most teams still under-test)
Aakash introduces Frederic De Todaro and frames the core problem: experimentation is powerful, but most organizations don’t A/B test most releases because the process is too slow and developer-dependent. Frederic argues AI is the biggest change he’s seen in his career and can make experimentation far more accessible.
- 1:54 – 14:12
The 4-step experimentation loop and where AI helps most
Frederic outlines a simple experimentation framework: idea/assumption → build variations → configure targeting & KPIs → analyze results. The discussion maps how AI can accelerate each stage, especially build and analysis, while keeping humans accountable for context and decision quality.
- 14:12 – 16:00
AI-assisted ideation: better ideas through better context (and ‘UX memory’)
They discuss how generative AI improves the ideation phase when it has strong context—product history, constraints, prior experiments, and even screenshots. Frederic highlights the value of connecting AI to an experimentation knowledge base so it can surface previously run tests and results, creating organizational “UX memory.”
- 16:00 – 21:08
Human-in-the-loop roles: PM vs data scientist vs AI analyst
Frederic clarifies division of labor: PMs provide business context and constraints; data scientists ensure methodological rigor, bias checks, and model choices; AI automates pattern finding and summarization. The group emphasizes that AI boosts speed, but humans are still needed for accountability and correctness.
- 21:08 – 26:56
Two waves of AI in experimentation: ML (2016) → GenAI (2022+)
Frederic gives a historical model: the machine learning wave improved targeting, traffic allocation, and analysis; the generative AI wave enabled content generation, assistants, and rapid creation of variations via prompting. The key leap is moving from tool-assisted testing to prompt-based experimentation that can ship in minutes.
- 26:56 – 30:05
AI targeting: predicting intent to personalize who sees what
They explain AI targeting as real-time intent scoring (likelihood to convert, churn risk, etc.) based on behavior signals. This score replaces manual segmentation and enables smarter targeting—like showing discounts only to users who need them rather than giving coupons to everyone.
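The intent-scoring idea can be sketched as a simple logistic model over behavior signals. This is a minimal illustration, not the method described in the talk: the signal names, weights, and threshold below are all invented.

```python
import math

# Hypothetical behavior signals and weights (invented for illustration).
WEIGHTS = {"pages_viewed": 0.15, "minutes_on_site": 0.05, "cart_adds": 0.9}
BIAS = -2.0

def intent_score(signals: dict) -> float:
    """Real-time likelihood-to-convert score in [0, 1] via a logistic model."""
    z = BIAS + sum(w * signals.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def show_discount(signals: dict, threshold: float = 0.5) -> bool:
    """Only surface the coupon to visitors unlikely to convert on their own."""
    return intent_score(signals) < threshold

hesitant = {"pages_viewed": 2, "minutes_on_site": 1, "cart_adds": 0}
committed = {"pages_viewed": 10, "minutes_on_site": 20, "cart_adds": 2}
print(show_discount(hesitant), show_discount(committed))  # True False
```

In production, the weights would come from a model trained on historical conversion data rather than being hand-set, but the targeting decision reduces to exactly this kind of score-and-threshold check.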
- 30:05 – 31:55
Multi-armed bandit vs contextual bandit: performance vs personalization
Frederic distinguishes standard A/B testing from bandit approaches. Multi-armed bandits quickly shift traffic toward the best-performing variant (trading off statistical accuracy for speed), while contextual bandits learn which variant works best per user/context to power hyper-personalization—both typically requiring substantial traffic.
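The traffic-shifting behavior of a multi-armed bandit can be sketched with Thompson sampling, one common bandit algorithm (the variants and conversion rates below are invented; a contextual bandit would additionally condition the choice on user features).

```python
import random

random.seed(42)

# True conversion rates per variant, unknown to the algorithm (invented numbers).
TRUE_RATES = {"A": 0.05, "B": 0.10}
wins = {v: 0 for v in TRUE_RATES}
losses = {v: 0 for v in TRUE_RATES}

def choose() -> str:
    """Thompson sampling: draw from each arm's Beta posterior, play the best draw."""
    return max(TRUE_RATES, key=lambda v: random.betavariate(wins[v] + 1, losses[v] + 1))

for _ in range(5000):
    arm = choose()
    if random.random() < TRUE_RATES[arm]:
        wins[arm] += 1
    else:
        losses[arm] += 1

traffic = {v: wins[v] + losses[v] for v in TRUE_RATES}
print(traffic)  # traffic shifts heavily toward the better-converting variant "B"
```

This illustrates the trade-off in the summary: traffic flows to the winner quickly, but because allocation is unequal and adaptive, the final estimates are less suited to classical significance testing than a fixed 50/50 split.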
- 31:55 – 35:13
Opportunity detection: learning from the 80% of ‘failed’ experiments
Because most experiments don’t produce an overall lift, teams spend time drilling into segments to find where a variant might work. AI opportunity detection automates those breakdowns (device, segment, etc.) to propose actionable follow-ups like targeting the winning variant only to certain users.
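The segment-drilling step can be sketched as a per-segment lift computation over an experiment that is flat overall (all numbers below are invented for illustration):

```python
# Hypothetical per-segment results: (segment, control visitors, control
# conversions, variant visitors, variant conversions). Invented numbers.
results = [
    ("mobile",  1000, 40, 1000, 58),
    ("desktop", 1000, 60, 1000, 42),
]

def lift(c_vis, c_conv, v_vis, v_conv):
    """Relative change in conversion rate, variant vs control."""
    return (v_conv / v_vis - c_conv / c_vis) / (c_conv / c_vis)

# Overall result: pooled across segments, the variant shows no lift at all.
overall = lift(sum(r[1] for r in results), sum(r[2] for r in results),
               sum(r[3] for r in results), sum(r[4] for r in results))

# Flag segments where the variant clearly outperforms despite the flat overall.
opportunities = [seg for seg, cv, cc, vv, vc in results if lift(cv, cc, vv, vc) > 0.10]

print(round(overall, 3), opportunities)
```

A real opportunity-detection system would also run significance checks per segment, since slicing by many dimensions multiplies the chances of finding noise that looks like a win.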
- 35:13 – 41:35
GenAI for experimentation: content generation and RAG assistants inside tools
Frederic shares practical uses of GenAI: generating alternative copy for banners/popups and providing an in-product assistant to answer experimentation questions and generate SDK/feature-flag code. They briefly cover statistical approaches (frequentist vs Bayesian) and CUPED as examples of what an assistant can recommend.
- 41:35 – 43:36
From vibe coding to ‘vibe experimenting’: prompt-based experimentation concept
They contrast rapid prototyping (“can you build it?”) with experimentation (“should you build it?”). Prompt-based experimentation sits between the two: generate real, testable variations directly on production surfaces so teams can learn from actual behavior at scale, not just prototype feedback.
- 43:36 – 51:49
Live demo: idea → running experiment in ~2 minutes (prompt-to-variation)
Frederic demonstrates creating an experiment on an e-commerce catalog by prompting a change (default sorting to price low-to-high). The system extracts page context, identifies elements, generates JS/CSS, runs checks (including accessibility and mobile considerations), and produces a shippable variation rapidly.
- 51:49 – 54:17
Governance at scale: keeping design/engineering/data checkpoints—just faster
They discuss how larger orgs should use prompt-based experimentation without bypassing collaboration. The proposed approach: keep the same stage gates (design review, engineering code review, data/metrics review) but make them lighter through simulation links, code visibility, and in-context previews of what will ship.

- 54:17 – 1:07:18
Beyond text prompts: generating variations from mockups and sketches
Frederic shows how experiments can be generated from uploaded mockups or quick sketches, including creating net-new UI elements like a newsletter popup. The AI asks clarifying questions (e.g., image needs), generates assets, and produces the required front-end code to implement the experience.
- 1:07:18 – 1:10:10
Measuring AI features: adoption, outcomes, experience—and the right North Star
The conversation shifts to how to evaluate AI features themselves. Frederic recommends tracking adoption, task outcomes, and user experience feedback (thumbs up/down), plus value-focused operational metrics like time-to-live and reduced developer involvement; he shares Kameleoon's North Star metric: the number of experiments running daily.
- 1:10:10 – 1:13:52
Measuring RAG systems: accuracy, relevance, context quality (+ LLM-as-judge)
Frederic provides a practical measurement model for RAG assistants: accuracy (faithfulness), relevance (does it answer the question), and context quality (are retrieved documents current and useful). He describes using an LLM-as-judge approach to score and validate these qualities and warns against stopping at usage alone.
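The three-dimension measurement model can be sketched as an evaluation harness. The judge here is stubbed with canned scores so the sketch is runnable; in practice it would be a model API call that receives the rubric, question, answer, and retrieved context and returns a 1-5 score (criteria names follow the summary; everything else is invented).

```python
# Rubrics for the three RAG quality dimensions described above.
CRITERIA = {
    "faithfulness": "Is every claim in the answer supported by the retrieved context?",
    "relevance": "Does the answer actually address the user's question?",
    "context_quality": "Are the retrieved documents current and useful?",
}

def evaluate(samples, judge):
    """Average each criterion's judge score across a sample set."""
    scores = {c: [] for c in CRITERIA}
    for sample in samples:
        for criterion, rubric in CRITERIA.items():
            scores[criterion].append(judge(criterion, rubric, sample))
    return {c: sum(v) / len(v) for c, v in scores.items()}

# Stubbed judge returning canned 1-5 scores (a real judge prompts an LLM).
canned = {"faithfulness": [5, 4], "relevance": [5, 5], "context_quality": [3, 2]}
calls = {c: iter(v) for c, v in canned.items()}
samples = [{"q": "How do I start an A/B test?"}, {"q": "What is CUPED?"}]
report = evaluate(samples, lambda c, rubric, s: next(calls[c]))
print(report)  # surfaces context_quality as the weak dimension
```

Breaking the report out per dimension is what makes this more actionable than a raw usage count: a low context-quality score points at retrieval, not at the generation model.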
- 1:13:52 – 1:15:11
Experimentation culture: Booking.com example + common PM misconceptions
Frederic points to Booking.com as a model experimentation organization where nearly everything ships through tests, but advises adopting it step-by-step. He addresses common objections—experimentation slows delivery, lack of traffic, and “discovery is enough”—arguing that experimentation accelerates learning, traffic isn’t the only limiter, and discovery must be paired with real behavioral validation.