Aakash Gupta
This AI Expert's Method Will Change How You Do Customer Research
Aakash Gupta and Caitlin Sullivan on a rigorous, multi-step AI workflow for trustworthy customer research analysis.
In this episode, Aakash Gupta and Caitlin Sullivan explore a rigorous, multi-step AI workflow for trustworthy customer research analysis. The core premise: good AI research mirrors rigorous human research by separating analysis, verification, and synthesis rather than jumping straight to themes.
At a glance
WHAT IT’S REALLY ABOUT
A rigorous, multi-step AI workflow for trustworthy customer research analysis
- Good AI research mirrors rigorous human research by separating analysis, verification, and synthesis rather than jumping straight to themes.
- A Step 0 “context load” prompt onboards the model with business goals and product details to reduce wrong assumptions and instruction drop-off.
- Interview analysis is strengthened by per-participant extraction (e.g., value anchors and fragile points) followed by contradiction checks to prevent cherry-picking and hallucinations.
- Survey analysis should start with inductive coding before counting frequencies, then add calibrated emotional intensity ratings to prioritize what matters most.
- Agentic workflows in Claude Code can parallelize survey and interview analysis, output structured markdown deliverables, and cut analysis time dramatically—while still requiring audits and human judgment.
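The parallel structure described in that last point might look like the generic sketch below. This is plain Python standing in for Claude Code's agent orchestration, not its actual mechanics; the function names and output paths are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Generic sketch of running the two analysis tracks in parallel and writing
# structured markdown deliverables, as described in the episode. These
# functions are hypothetical stand-ins, not Claude Code's subagent API.
def run_survey_analysis() -> str:
    return "# Survey findings\n..."  # placeholder for the survey-coding pipeline

def run_interview_analysis() -> str:
    return "# Interview findings\n..."  # placeholder for per-participant extraction

with ThreadPoolExecutor(max_workers=2) as pool:
    survey_future = pool.submit(run_survey_analysis)
    interview_future = pool.submit(run_interview_analysis)
    # Each track produces its own markdown deliverable.
    Path("survey_findings.md").write_text(survey_future.result())
    Path("interview_findings.md").write_text(interview_future.result())
```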
IDEAS WORTH REMEMBERING
7 ideasDon’t start with synthesis; start with granular analysis.
The workflow forces the model to comb through each file/response first (like a human researcher would) before summarizing themes, which reduces missed nuance and overconfident generalizations.
Separate “context loading” from task prompts to prevent instruction loss.
A dedicated Step 0 prompt onboards the model on goals and product/tier details and ends with “do not run analysis yet,” improving focus and reducing incorrect product assumptions.
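As a rough illustration, a Step 0 prompt might be assembled like this. The section headers and parameters are assumptions rather than Sullivan's exact template; only the closing instruction is quoted from the episode.

```python
# Minimal sketch of a Step 0 "context load" prompt. Field names and structure
# are illustrative assumptions; the final line is quoted from the episode.
def build_context_load_prompt(business_goal: str, product_details: str,
                              tiers: str, icp: str) -> str:
    return f"""You are assisting with customer research analysis.

BUSINESS GOAL
{business_goal}

PRODUCT AND TIERS
{product_details}
{tiers}

IDEAL CUSTOMER PROFILE
{icp}

Internalize this only. Do not run analysis yet."""
```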
Per-participant extraction creates traceability and better foundations.
Extracting value anchors, fragile points, quotes, and a churn/stability rating per participant replicates line-by-line human review and produces evidence you can later synthesize confidently.
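A per-participant record could be sketched like this. The fields mirror the primitives named in the episode; the types and rating scale are assumptions.

```python
from dataclasses import dataclass, field

# Illustrative schema for one participant's extraction. Value anchors,
# fragile points, quotes, and a stability rating come from the episode;
# the 1-5 scale and Python types are assumptions.
@dataclass
class ParticipantExtraction:
    participant_id: str
    value_anchors: list[str] = field(default_factory=list)      # what keeps them using the product
    fragile_points: list[str] = field(default_factory=list)     # where the relationship could break
    supporting_quotes: list[str] = field(default_factory=list)  # verbatim evidence for traceability
    stability_rating: int = 3  # assumed 1 (likely churn) to 5 (stable)
```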
Add a verification pass specifically designed to catch contradictions.
Having the model re-scan for conflicting statements (and defining what counts as a contradiction) prevents cherry-picking one narrative when the participant’s account is inconsistent.
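A verification prompt in this spirit might look like the sketch below. The contradiction definition's exact wording is an assumption; the episode's point is that you must supply one explicitly.

```python
# Sketch of a contradiction-check prompt that defines the term before the
# model re-scans the transcript. Wording is illustrative, not verbatim.
def build_contradiction_check_prompt(transcript: str, extraction_summary: str) -> str:
    return f"""Re-scan the transcript below against the extracted findings.

A contradiction means the participant makes two statements that cannot both
be true about the same behavior, preference, or fact (not mere hedging or
conversational imprecision).

For each contradiction found, quote both statements verbatim and state which
extracted finding it undermines. If none are found, say so explicitly.

EXTRACTED FINDINGS
{extraction_summary}

TRANSCRIPT
{transcript}"""
```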
For surveys, code first—then count.
Inductive open coding (with rules like mutually exclusive primary codes) produces a defensible codebook and prevents the model from miscategorizing or forcing responses into premature themes.
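The ordering is the key discipline: every response gets exactly one primary code before anything is counted. A minimal sketch with hypothetical churn responses:

```python
from collections import Counter

# "Code first, then count." Responses and codes are hypothetical; the rule
# enforced is one mutually exclusive primary code per response.
coded_responses = [
    {"response": "Too expensive once the trial ended", "primary_code": "pricing"},
    {"response": "Switched to a tool my team already uses", "primary_code": "competitor"},
    {"response": "Couldn't figure out the setup", "primary_code": "onboarding_friction"},
]

# Counting happens only after every response carries exactly one primary code.
frequencies = Counter(item["primary_code"] for item in coded_responses)
print(frequencies.most_common())
```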
Calibrate rating tasks with definitions and few-shot reasoning.
For sentiment/intensity, the model must be shown what “frustrated vs angry” means and why; otherwise churn surveys become uniformly “negative” and not decision-useful.
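A calibrated rating prompt might look like the following sketch. The 1-5 scale matches the "level 4-5" items mentioned in the audit step; the anchor definitions and few-shot examples are illustrative assumptions.

```python
# Sketch of a calibrated intensity-rating prompt with definitions and
# few-shot reasoning. Anchors and examples are assumptions, not the
# episode's exact rubric.
RATING_PROMPT = """Rate the emotional intensity of each churn response from 1 to 5.

Definitions:
- 3 (frustrated): blocked or annoyed, but matter-of-fact in tone.
- 5 (angry): blame, strong language, or a stated refusal to return.

Examples with reasoning:
- "The export kept failing and support never replied." -> 4
  (concrete repeated failure plus an unmet support expectation)
- "Didn't end up needing it this quarter." -> 1
  (no negative emotion; circumstantial churn)

Rate the following responses the same way, showing your reasoning first:
"""
```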
Audit the model’s own work as a “CYA” step before presenting.
An audit pass (e.g., recheck all level 4–5 intensity items, spot code inconsistencies, recompute counts via code) often finds errors or exaggerations that would surface in stakeholder Q&A.
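The "recompute counts via code" part of the audit is straightforward to sketch. The file and column names below are hypothetical stand-ins for the coding step's output.

```python
import pandas as pd

# Audit sketch: recompute frequencies in code instead of trusting the model's
# arithmetic, and re-surface every level 4-5 intensity item for human recheck.
# File and column names ("primary_code", "intensity") are assumptions.
df = pd.read_csv("coded_survey_responses.csv")  # hypothetical export from the coding step

# 1. Recompute counts independently of the model's reported numbers.
print(df["primary_code"].value_counts())

# 2. Pull every high-intensity item so a human can verify the rating and the
#    quote before it goes into a stakeholder deck.
for _, row in df[df["intensity"] >= 4].iterrows():
    print(row["primary_code"], "|", row["response"])
```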
WORDS WORTH SAVING
5 quotes
Good AI customer research and analysis actually looks like replicating the way that we do rigorous analysis as humans.
— Caitlin Sullivan
What most people do… is jumping straight ahead to synthesis, and that's exactly what we don't wanna do.
— Caitlin Sullivan
Internalize this only. Do not run analysis yet.
— Caitlin Sullivan
When we're working with survey responses or short customer feedback, we want to code first.
— Caitlin Sullivan
I’ll call this the CYA way to use AI. Cover your ass.
— Aakash Gupta
QUESTIONS ANSWERED IN THIS EPISODE
5 questions
What exactly goes into your Step 0 context doc (business goal, product tiers, ICP, hypotheses), and what do you intentionally leave out to avoid biasing analysis?
In your per-participant interview step, why did you choose “value anchors / fragile points / stability rating” as primitives—what other primitives do you use for different research goals (activation, pricing, onboarding)?
How do you decide when a contradiction is meaningful signal (real behavior inconsistency) versus just normal conversational imprecision—and how should that affect synthesis?
For inductive survey coding, how do you handle responses that genuinely contain two distinct reasons for churn if you require one mutually exclusive primary code?
What are the most common failure modes you see when PMs run AI analysis (e.g., forced categorization, quote fabrication, frequency errors), and which audit checks catch each one best?