Aakash Gupta
This AI Expert's Method Will Change How You Do Customer Research
Aakash Gupta and Caitlin Sullivan on a rigorous, multi-step AI workflow for trustworthy customer research analysis.
In this episode, Aakash Gupta and Caitlin Sullivan explore a rigorous, multi-step AI workflow for trustworthy customer research analysis. The core premise: good AI research mirrors rigorous human research by separating analysis, verification, and synthesis rather than jumping straight to themes.
At a glance
WHAT IT’S REALLY ABOUT
A rigorous, multi-step AI workflow for trustworthy customer research analysis
- Good AI research mirrors rigorous human research by separating analysis, verification, and synthesis rather than jumping straight to themes.
- A Step 0 “context load” prompt onboards the model with business goals and product details to reduce wrong assumptions and instruction drop-off.
- Interview analysis is strengthened by per-participant extraction (e.g., value anchors and fragile points) followed by contradiction checks to prevent cherry-picking and hallucinations.
- Survey analysis should start with inductive coding before counting frequencies, then add calibrated emotional intensity ratings to prioritize what matters most.
- Agentic workflows in Claude Code can parallelize survey and interview analysis, output structured markdown deliverables, and cut analysis time dramatically—while still requiring audits and human judgment.
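The parallel structure described in that last point might look like the generic sketch below. This is plain Python standing in for Claude Code's agent orchestration, not its actual mechanics; the function names and output paths are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Generic sketch of running the two analysis tracks in parallel and writing
# structured markdown deliverables, as described in the episode. These
# functions are hypothetical stand-ins, not Claude Code's subagent API.
def run_survey_analysis() -> str:
    return "# Survey findings\n..."  # placeholder for the survey-coding pipeline

def run_interview_analysis() -> str:
    return "# Interview findings\n..."  # placeholder for per-participant extraction

with ThreadPoolExecutor(max_workers=2) as pool:
    survey_future = pool.submit(run_survey_analysis)
    interview_future = pool.submit(run_interview_analysis)
    # Each track produces its own markdown deliverable.
    Path("survey_findings.md").write_text(survey_future.result())
    Path("interview_findings.md").write_text(interview_future.result())
```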
IDEAS WORTH REMEMBERING
7 ideasDon’t start with synthesis; start with granular analysis.
The workflow forces the model to comb through each file/response first (like a human researcher would) before summarizing themes, which reduces missed nuance and overconfident generalizations.
Separate “context loading” from task prompts to prevent instruction loss.
A dedicated Step 0 prompt onboards the model on goals and product/tier details and ends with “do not run analysis yet,” improving focus and reducing incorrect product assumptions.
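As a rough illustration, a Step 0 prompt might be assembled like this. The section headers and parameters are assumptions rather than Sullivan's exact template; only the closing instruction is quoted from the episode.

```python
# Minimal sketch of a Step 0 "context load" prompt. Field names and structure
# are illustrative assumptions; the final line is quoted from the episode.
def build_context_load_prompt(business_goal: str, product_details: str,
                              tiers: str, icp: str) -> str:
    return f"""You are assisting with customer research analysis.

BUSINESS GOAL
{business_goal}

PRODUCT AND TIERS
{product_details}
{tiers}

IDEAL CUSTOMER PROFILE
{icp}

Internalize this only. Do not run analysis yet."""
```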
Per-participant extraction creates traceability and better foundations.
Extracting value anchors, fragile points, quotes, and a churn/stability rating per participant replicates line-by-line human review and produces evidence you can later synthesize confidently.
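A per-participant record could be sketched like this. The fields mirror the primitives named in the episode; the types and rating scale are assumptions.

```python
from dataclasses import dataclass, field

# Illustrative schema for one participant's extraction. Value anchors,
# fragile points, quotes, and a stability rating come from the episode;
# the 1-5 scale and Python types are assumptions.
@dataclass
class ParticipantExtraction:
    participant_id: str
    value_anchors: list[str] = field(default_factory=list)      # what keeps them using the product
    fragile_points: list[str] = field(default_factory=list)     # where the relationship could break
    supporting_quotes: list[str] = field(default_factory=list)  # verbatim evidence for traceability
    stability_rating: int = 3  # assumed 1 (likely churn) to 5 (stable)
```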
Add a verification pass specifically designed to catch contradictions.
Having the model re-scan for conflicting statements (and defining what counts as a contradiction) prevents cherry-picking one narrative when the participant’s account is inconsistent.
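A verification prompt in this spirit might look like the sketch below. The contradiction definition's exact wording is an assumption; the episode's point is that you must supply one explicitly.

```python
# Sketch of a contradiction-check prompt that defines the term before the
# model re-scans the transcript. Wording is illustrative, not verbatim.
def build_contradiction_check_prompt(transcript: str, extraction_summary: str) -> str:
    return f"""Re-scan the transcript below against the extracted findings.

A contradiction means the participant makes two statements that cannot both
be true about the same behavior, preference, or fact (not mere hedging or
conversational imprecision).

For each contradiction found, quote both statements verbatim and state which
extracted finding it undermines. If none are found, say so explicitly.

EXTRACTED FINDINGS
{extraction_summary}

TRANSCRIPT
{transcript}"""
```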
For surveys, code first—then count.
Inductive open coding (with rules like mutually exclusive primary codes) produces a defensible codebook and prevents the model from miscategorizing or forcing responses into premature themes.
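The ordering is the key discipline: every response gets exactly one primary code before anything is counted. A minimal sketch with hypothetical churn responses:

```python
from collections import Counter

# "Code first, then count." Responses and codes are hypothetical; the rule
# enforced is one mutually exclusive primary code per response.
coded_responses = [
    {"response": "Too expensive once the trial ended", "primary_code": "pricing"},
    {"response": "Switched to a tool my team already uses", "primary_code": "competitor"},
    {"response": "Couldn't figure out the setup", "primary_code": "onboarding_friction"},
]

# Counting happens only after every response carries exactly one primary code.
frequencies = Counter(item["primary_code"] for item in coded_responses)
print(frequencies.most_common())
```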
Calibrate rating tasks with definitions and few-shot reasoning.
For sentiment/intensity, the model must be shown what “frustrated vs angry” means and why; otherwise churn surveys become uniformly “negative” and not decision-useful.
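A calibrated rating prompt might look like the following sketch. The 1-5 scale matches the "level 4-5" items mentioned in the audit step; the anchor definitions and few-shot examples are illustrative assumptions.

```python
# Sketch of a calibrated intensity-rating prompt with definitions and
# few-shot reasoning. Anchors and examples are assumptions, not the
# episode's exact rubric.
RATING_PROMPT = """Rate the emotional intensity of each churn response from 1 to 5.

Definitions:
- 3 (frustrated): blocked or annoyed, but matter-of-fact in tone.
- 5 (angry): blame, strong language, or a stated refusal to return.

Examples with reasoning:
- "The export kept failing and support never replied." -> 4
  (concrete repeated failure plus an unmet support expectation)
- "Didn't end up needing it this quarter." -> 1
  (no negative emotion; circumstantial churn)

Rate the following responses the same way, showing your reasoning first:
"""
```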
Audit the model’s own work as a “CYA” step before presenting.
An audit pass (e.g., recheck all level 4–5 intensity items, spot code inconsistencies, recompute counts via code) often finds errors or exaggerations that would surface in stakeholder Q&A.
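The "recompute counts via code" part of the audit is straightforward to sketch. The file and column names below are hypothetical stand-ins for the coding step's output.

```python
import pandas as pd

# Audit sketch: recompute frequencies in code instead of trusting the model's
# arithmetic, and re-surface every level 4-5 intensity item for human recheck.
# File and column names ("primary_code", "intensity") are assumptions.
df = pd.read_csv("coded_survey_responses.csv")  # hypothetical export from the coding step

# 1. Recompute counts independently of the model's reported numbers.
print(df["primary_code"].value_counts())

# 2. Pull every high-intensity item so a human can verify the rating and the
#    quote before it goes into a stakeholder deck.
for _, row in df[df["intensity"] >= 4].iterrows():
    print(row["primary_code"], "|", row["response"])
```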
WORDS WORTH SAVING
5 quotes
Good AI customer research and analysis actually looks like replicating the way that we do rigorous analysis as humans.
— Caitlin Sullivan
What most people do… is jumping straight ahead to synthesis, and that's exactly what we don't wanna do.
— Caitlin Sullivan
Internalize this only. Do not run analysis yet.
— Caitlin Sullivan
When we're working with survey responses or short customer feedback, we want to code first.
— Caitlin Sullivan
I’ll call this the CYA way to use AI. Cover your ass.
— Aakash Gupta
QUESTIONS ANSWERED IN THIS EPISODE
5 questions
What exactly goes into your Step 0 context doc (business goal, product tiers, ICP, hypotheses), and what do you intentionally leave out to avoid biasing analysis?
In your per-participant interview step, why did you choose “value anchors / fragile points / stability rating” as primitives—what other primitives do you use for different research goals (activation, pricing, onboarding)?
How do you decide when a contradiction is meaningful signal (real behavior inconsistency) versus just normal conversational imprecision—and how should that affect synthesis?
For inductive survey coding, how do you handle responses that genuinely contain two distinct reasons for churn if you require one mutually exclusive primary code?
What are the most common failure modes you see when PMs run AI analysis (e.g., forced categorization, quote fabrication, frequency errors), and which audit checks catch each one best?