Aakash Gupta: This AI Expert's Method Will Change How You Do Customer Research
CHAPTERS
Why AI user research feels unreliable—and the human-style fix
Aakash frames the problem: AI analysis can be messy and hallucinatory. Caitlin sets the core principle for making AI reliable—replicate rigorous human research workflows instead of jumping straight to “insights.”
The rigorous workflow: analysis first, then verification, then synthesis
Caitlin breaks down what “good analysis” means across interviews, surveys, and short feedback. The goal is to force structured, stepwise work that mirrors how researchers manually comb through raw data.
Tools and scope: surveys + interviews, plus automation vs. agentic workflows
They outline what will be demonstrated: a Claude-based workflow in the web UI, and an “agentic” parallelized version in Claude Code. Caitlin distinguishes common automation tools (Zapier/n8n) from more advanced agent setups.
AI-moderated interviews: promise, quality variance, and why craft still matters
Caitlin explains AI moderation tools and the wide quality range in their questioning and bias handling. Aakash underscores that even leading AI labs can ship mediocre interviewing, reinforcing the need to learn research craft.
Step 0 (Interviews): load context separately to prevent instruction overload
Caitlin demonstrates a dedicated “context-only” prompt before any analysis. This reduces model overload and ensures the LLM understands business goals, product tiers, and domain specifics—like onboarding a new teammate.
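The "context first, analysis later" pattern can be sketched as a conversation where the opening turn carries only background and explicitly defers analysis. The wording and the example business context below are illustrative assumptions, not Caitlin's exact prompts.

```python
# Sketch of a "context-only" opening turn, assumed structure (not the
# exact prompts from the video). The first message loads business context
# and explicitly tells the model NOT to analyze anything yet.

business_context = (
    "Product: B2B analytics SaaS with Free, Pro, and Enterprise tiers. "
    "Goal: understand churn drivers among Pro-tier customers."
)

messages = [
    # Turn 1: context only -- the model should just confirm understanding,
    # the way a new teammate would read the brief before touching data.
    {
        "role": "user",
        "content": (
            "Here is the business context for an upcoming analysis. "
            "Do not analyze anything yet; confirm your understanding "
            "and ask clarifying questions.\n\n" + business_context
        ),
    },
    # Turn 2 (sent later, after the model confirms): the actual
    # analysis prompt plus the transcript files.
]
```

Keeping context in its own turn means later analysis prompts stay short, which reduces the instruction overload the chapter describes.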
Why Claude for analysis + structuring transcripts with markdown to beat token limits
Caitlin explains her preference for Claude’s nuance, while noting Gemini’s strength in frequency/accuracy. She also shares a practical tactic: converting transcripts into structured markdown to improve both token handling and analytical accuracy.
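The markdown-structuring tactic can be sketched in a few lines. The raw "Speaker: utterance" input format and the heading scheme here are assumptions for illustration, not the exact structure shown in the video.

```python
# Sketch: convert a raw "Speaker: utterance" transcript into structured
# markdown, one section per speaker turn. Input format is an assumption.

def transcript_to_markdown(raw: str, title: str = "Interview") -> str:
    """Turn 'Speaker: utterance' lines into markdown sections."""
    lines = [f"# {title}", ""]
    for turn in raw.strip().splitlines():
        if ":" not in turn:
            continue  # skip lines without a speaker label
        speaker, text = turn.split(":", 1)
        lines.append(f"## {speaker.strip()}")
        lines.append(text.strip())
        lines.append("")
    return "\n".join(lines)

raw = "Moderator: Why did you cancel?\nParticipant: Pricing felt unfair."
print(transcript_to_markdown(raw))
```

Explicit headings give the model clear turn boundaries to anchor on, which is the accuracy gain the chapter points to alongside better token handling.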
Step 1 (Interviews): per-participant extraction—value anchors, fragile points, stability
Instead of asking for themes, Caitlin forces Claude to analyze each participant file in detail. The prompt extracts retention drivers and churn risks with explicit rating definitions and quote rules, producing a granular foundation for later synthesis.
Verification step (Interviews): contradiction checking to prevent cherry-picking
Caitlin runs a stress test to find contradictions within each participant’s statements and adjust risk ratings. This catches subtle inconsistencies that AI might otherwise blur when it tries to tell a coherent story.
Survey analysis setup: why you must code open-text responses before counting
Switching to churn survey data, Caitlin warns against immediately asking for top themes or frequency counts. She introduces coding (qualitative labeling) as the necessary first step to avoid miscategorization and missed nuance.
Survey Step 1: inductive open coding with rules + compute counts safely
Caitlin demonstrates a structured coding prompt: one primary, mutually exclusive code per response, clear inclusion criteria, and an emphasis on building a reusable codebook. She also recommends doing the math in code (e.g., Python/pandas) to avoid LLM arithmetic errors.
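The "count in code, not in the LLM" step can be sketched with pandas: once each response carries a single primary code, frequencies are a one-liner. The codes and data below are illustrative, not from the video.

```python
# Sketch: after coding, compute frequencies deterministically in pandas
# instead of asking the LLM to count. Codes/data are illustrative.
import pandas as pd

coded = pd.DataFrame({
    "response_id": [1, 2, 3, 4, 5],
    # One primary, mutually exclusive code per response (from the codebook).
    "primary_code": ["pricing", "missing_feature", "pricing",
                     "onboarding_friction", "pricing"],
})

counts = coded["primary_code"].value_counts()      # exact counts
share = (counts / len(coded)).round(2)             # exact proportions
print(pd.DataFrame({"count": counts, "share": share}))
```

Because the arithmetic runs in pandas, the frequency table is reproducible and auditable, while the LLM's role stays limited to the qualitative coding itself.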
Add emotional intensity ratings: beyond sentiment to prioritize what to fix
Rather than generic sentiment analysis (where churn feedback is “all negative”), Caitlin adds an intensity layer to distinguish soft exits from genuinely angry detractors. She calibrates the scale with definitions and few-shot examples including the reasoning behind labels.
Audit Step 3 (Surveys): AI checks its own coding and intensity for overstatement
Caitlin runs an audit prompt to recheck high-intensity ratings and code consistency, surfacing corrections that would otherwise undermine credibility. The output shows recoding and downgrading exaggerated intensity levels, making conclusions safer to present.
Claude Code agentic version: parallelize interview + survey analysis and export outputs
Caitlin shows how to encode the workflow into Claude Code agents (“Interview Analyzer Lite” and “Survey Analyzer Lite”) that pull from reusable context files and run analyses in parallel. The agent workflow outputs structured markdown reports with executive summaries, per-participant detail, quotes, and traceability.