Aakash Gupta

This AI Expert's Method Will Change How You Do Customer Research

AI for user research is unreliable. But Caitlin Sullivan, one of the world's leading experts in user research, knows exactly how to fix it. In this episode, she demos the complete workflow for analyzing surveys and interviews with AI, using Claude, Claude Code, and agentic workflows that cut analysis time in half without hallucinating.

Complete write-up: https://www.news.aakashg.com/p/caitlin-sullivan-podcast

----

Timestamps:

0:00 - Intro
1:54 - What Good AI Research Actually Looks Like
8:22 - Step 0: Loading Context Into Claude
11:34 - Why Claude Is the Best Model for Analysis
16:12 - Step 1: Per-Participant Analysis Prompting
26:06 - Step 2: Verification & Contradiction Checking
34:51 - Survey Analysis: Why You Must Code First
46:18 - Adding Emotional Intensity Ratings
51:31 - Step 3: Auditing AI's Own Work
57:42 - Claude Code: The Agentic Parallel Version
1:09:01 - Final Output & Results

----

🧠 Key Takeaways:

1. Replicate the human process - Good AI analysis mirrors how experienced researchers work: comb through data first, then synthesize. Never jump straight to "give me themes."
2. Use multi-step prompting - Load context in one prompt, run per-participant analysis in the next, then verify. Cramming everything into one prompt degrades quality.
3. Code before you count - For surveys, apply inductive coding labels to every response before asking for patterns. Skipping this step leads to miscategorized, unreliable results.
4. Always audit AI's work - Force the model to re-check its own analysis. It catches contradictions, overexaggerated intensity ratings, and miscoded responses regularly.
5. Claude wins on nuance, Gemini wins on frequency - Claude gives more thorough, complete analysis by default. Gemini surfaces top-frequency themes faster but misses smaller patterns.
6. Define everything explicitly - Quotes, ratings, emotional intensity levels, contradiction types. If you assume the model shares your definitions, you'll get inconsistent results.
7. Markdown files beat raw transcripts - Converting transcripts to structured markdown improves accuracy and helps you work around token limits on non-Max plans.
8. Parallelize with Claude Code agents - Set up agent markdown files for interview and survey analysis, then run both simultaneously. Cuts total analysis time in half again.

----

🏆 Sponsors:

1. Maven: Get 15% off Caitlin’s courses with code AAKASHxMAVEN - https://bit.ly/4rHCCrb
2. Pendo: The #1 software experience management platform - http://www.pendo.io/aakash
3. Jira Product Discovery: Plan with purpose, ship with confidence - https://www.atlassian.com/software/jira/product-discovery
4. Kameleoon: AI experimentation platform - http://www.kameleoon.com/
5. Amplitude: The market-leader in product analytics - https://amplitude.com/session-replay?utm_campaign=session-replay-launch-2025&utm_source=linkedin&utm_medium=organic-social&utm_content=productgrowthpodcast

----

👨‍💻 Where to find Aakash:

Twitter: https://www.x.com/aakashg0
LinkedIn: https://www.linkedin.com/in/aagupta/
Newsletter: https://www.news.aakashg.com
Premium Bundle: https://bundle.aakashg.com

Where to find Caitlin:

LinkedIn: https://www.linkedin.com/in/caitlindsullivan/
Maven: https://bit.ly/4rHCCrb

#aitools #userresearch

----

🧠 About Product Growth: The world's largest podcast focused solely on product + growth, with over 200K listeners.

🔔 Subscribe and turn on notifications to get more videos like this.

Aakash Gupta (host) · Caitlin Sullivan (guest)
Feb 12, 2026 · 1h 12m · Watch on YouTube ↗

EVERY SPOKEN WORD

  1. 0:00–1:54

    Intro

    1. AG

      AI for user research is messy. It's unreliable. But what if I told you I know how to fix it?

    2. CS

      There's a real art to doing this right, but if you do it the right way, then AI actually isn't gonna hallucinate on you, and you're gonna cut your analysis time in half. The key is actually replicating the way that you would do things in a rigorous way as a human, and just doing it like that with AI.

    3. AG

      This is Caitlin Sullivan, one of the world's leading experts in user research, and in today's episode, she's gonna give you all the tools, the demos, the workflows so that you can do amazing user discovery and customer research enabled with AI. These results will genuinely shock you. What does good AI customer research and analysis actually look like?

    4. CS

      So I think there's actually a really simple answer, which is it looks like replicating the way that we do rigorous analysis as humans.

    5. AG

      You've chosen Claude. If people have the option to choose between all the AI models, why have you chosen Claude?

    6. CS

      I keep coming back to Claude because by default it does a more thorough, more nuanced, uh, analysis than the other two platforms.

    7. AG

      And then there's two big types of analysis that people tend to run, surveys and interviews. What are we gonna do today on the surveys front?

    8. CS

      I'm gonna demo how you can use any of the large language models to run a few key steps that I just talked about to get really solid results.

    9. AG

      Wow, that's insane. Before we go any further, do me a favor and check that you are subscribed on YouTube and following on Apple and Spotify podcasts. And if you want to get access to amazing AI tools, check out my bundle, where if you become an annual subscriber to my newsletter, you get a full year free of the paid plans of Maven, Arise, Relayapp, Dovetail, Linear, Magic Patterns, DeepSky, Reforge Build, Descript, and Speechify. So be sure to check that out at bundle.aakashg.com, and now onto today's episode.

  2. 1:54–8:22

    What Good AI Research Actually Looks Like

    1. AG

      Caitlin, welcome to the podcast.

    2. CS

      Hey, Aakash. Thanks for having me.

    3. AG

      What does good AI customer research and analysis actually look like?

    4. CS

      So I think there's actually a really simple answer, which is it looks like replicating the way that we do rigorous analysis as humans. I'm not reinventing the wheel with pretty much anything I do. I have just taken what I know to be true, doing really solid research as an individual contributor and as the head of user research, and replicating those processes with AI.

    5. AG

      Amazing. So let's break this down for people in terms of what the actual tasks are and what the actual tools are.

    6. CS

      So analysis actually looks fairly similar, whether you're analyzing interview transcripts or survey responses or other kinds of short customer feedback. We're gonna use a similar set of kind of phases of analysis across all of them. So good analysis, rigorous analysis with AI or as an individual human doing it, is gonna involve actual analysis, where you dig through data and you pick it apart, you look at what's in there closely, put it back together in different ways and see what pops out at you; then verification or stress testing, basically seeing how the findings you have hold up under pressure; and then finally synthesizing. But what most people do, both as humans and also especially with AI, is jumping straight ahead to synthesis, and that's exactly what we don't wanna do.

    7. AG

      Hmm. And then there's two big types of analysis that people tend to run, surveys and interviews. What are we gonna do today on the surveys front?

    8. CS

      I'm gonna demo how you can use any of the large language models to run a few key steps that I just talked about to get really solid results out of any analysis that you do with surveys or short customer feedback, and I'm gonna do that in Claude in the normal web UI, but you can replicate this process in ChatGPT or Gemini. I do that all the time as well.

    9. AG

      Awesome. And then we're gonna show you everything here in blue. So we're gonna cover the Claude analysis process, but also show you the agentic version with ChatGPT Codex. You could use other tools like Zapier, n8n. Are people using those frequently in the field?

    10. CS

      So I'm gonna do the agentic part in Claude Code in my terminal actually, but, um, yeah, people are using the other tools. The more advanced agentic tools, I would say, are not commonly used by most of my clients. But what a lot of people are already doing, because it's a more familiar space, is automating a lot of these processes rather than making them fully agentic. So you can use n8n, Zapier, make.com, any of these automation tools to link up with your chosen large language model and sort of put some of these prompts in place to happen automatically for you.

    11. AG

      Awesome. And then the other side, of course, everybody knows is interviews. And so there's the human-moderated side of interviews. What are we gonna cover for people there?

    12. CS

      I'm gonna basically replicate the same sort of process across survey analysis and interview analysis. What's worth knowing for interview analysis is that things can get really complicated really fast. AI is built to understand structured information best, so with really long transcripts and sometimes, you know, messy transcripts where speaker labels aren't particularly good, we can run into a lot of problems with AI. So I wanna show you how being systematic in terms of the sequence of prompts we use and the types of prompts we use can actually ensure that you get much better results even when we're dealing with a tougher space like interview transcripts.

    13. AG

      Awesome. So we're gonna use transcripts for ChatGPT and Claude. If you had a video, that is actually possible now with Gemini 3. We're not gonna demo that, but that is something to keep in mind. And then the agentic version, we'll show you guys how to do that in Claude Code and do that in parallel with the survey analysis. Now, a lot of people have been talking about AI-moderated surveys, so can you just talk to us about what these are, what the tools are, and how you might use these?

    14. CS

      So there are already a lot of AI moderation tools on the market. There's a huge list of them. They are tools that run sessions with customers for you. So the idea is sort of that you can scale your team's ability to talk to customers by having all of these sessions run when you're sleeping or making your coffee or spending time with your family, instead of having to line up, you know, 15 people to talk to every day. So as we talked about before the recording started, there has been a little bit of discussion this past week where the team behind Claude, Anthropic, launched their own study using an AI moderation tool of their own, and it surprised some people in our space in terms of the quality of the questions that were being asked. So this space is a little bit tricky. There's a huge range of quality between the different AI moderation tools, to be honest. Some of them are asking really, really good questions and handling bias, not asking leading questions, and also doing decent analysis after collecting data. Some of them are not quite there yet, and I've seen the questions that Claude was asking in the Anthropic team study, and I also wasn't particularly impressed.

    15. AG

      Hmm. I can't emphasize this enough for you guys listening. In my 16 years of product management, one of the things I learned is to respect the craft of user research. Respect the people who are working on it in your company. Try to get some time shadowing the best PMs who do user research and the best user researchers in your company, because even Anthropic's interviewer couldn't do a good job, right? And so you wanna think about this as a skill that you're gonna continue to improve throughout your career. We could cover it all, but we don't think the AI interviewers are quite the most important thing for you to learn. So we're gonna cover what's in blue for you guys: how to analyze surveys, how to analyze interviews. So without further ado, can you get us started, Caitlin?

  3. 8:22–11:34

    Step 0: Loading Context Into Claude

    1. CS

      Yeah. Okay, so as I said, we have to go through multiple steps in the analysis process to get something really rigorous and reliable out. And I mentioned that we have to start with kind of picking apart the pieces of the data before we do synthesis, which is unfortunately what most people don't do. Most people jump straight to synthesis. Another piece of the puzzle here is, of course, giving Claude or any of these large language models enough context that it can do a really good job for us. And I know that this metaphor is really overused at this point of saying that it's like your junior assistant or something. I prefer to think of these models at this point not as juniors, but like someone who's new to your team. So if someone new joined your team, they don't necessarily have no skill in this space, but they don't really have the expertise in the domain or about your customers that you have. They might not even know exactly what kind of feature functionality you have built in if they haven't been fully onboarded. So the first step that I use in a lot of especially interview analysis is just loading context, and I make this an individual prompt rather than building it into anything else, because large language models actually can be a little bit overloaded. If you try to paste a two-page-long prompt in one go, a lot of the models will still have trouble handling that, and you'll see that they drop some of the instructions even if you said, "This is really important." So I'm just gonna copy my first prompt here, which is only about context.
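
She's demoing this in the Claude web UI, but the same two-turn structure can be sketched against the Anthropic Python SDK. A minimal sketch, where the model id, the goal numbers, and the prompt wording are illustrative placeholders, not Caitlin's actual setup:

```python
# Minimal sketch of "Step 0": send context as its own turn, with no task yet.
# Assumes the official `anthropic` SDK and ANTHROPIC_API_KEY in the environment;
# the model id and all prompt contents below are illustrative.
import anthropic

CONTEXT_PROMPT = """\
PROJECT CONTEXT
- Business goal: reduce churn of premium subscribers by 15%.
- Product: Flow, a meditation and anxiety-management app with on-demand therapy.
- Tiers: free (limited library) vs. premium (full library, bookable therapy).

Internalize this only. Do not run analysis yet.
"""

client = anthropic.Anthropic()
history = [{"role": "user", "content": CONTEXT_PROMPT}]

reply = client.messages.create(
    model="claude-sonnet-4-20250514",  # illustrative model id
    max_tokens=300,
    messages=history,
)
history.append({"role": "assistant", "content": reply.content[0].text})

# The Step 1 analysis prompt is appended to `history` as the next turn,
# so context loads once and never competes with task instructions.
```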

    2. AG

      If you're enjoying this masterclass in AI analysis, you will love Caitlin's cohort-based course. This runs for the month of March, where Caitlin will walk you through not just these first level workflows that we've shown here, but the next layer deeper of complex workflows, the agent markdown files, everything you need. She will help you get to the next level of craft using AI for discovery and user research. For my money, it's one of the most important skills for PMs, but not just PMs, designers, engineers, go-to-market people, and obviously user researchers. So if you enjoy what she's saying now, take her course. You will learn even more. She'll walk through all of this in detail. And the best thing is you can get a discount on the course using my code, link in the show notes. So check her course out and back to today's episode. Today's podcast is brought to you by Pendo, the leading software experience management platform. McKinsey found that 78% of companies are using gen AI, but just as many have reported no bottom line improvements. So how do you know if your AI agents are actually working? Are they giving users the wrong answers, creating more work instead of less, improving retention or hurting it? When your software data and AI data are disconnected, you can't answer these questions. But when you bring all your usage data together in one place, you can see what users do before, during, and after they use AI, showing you when agents work, how they help you grow, and when to prioritize on your roadmap. Pendo Agent Analytics is the only solution built to do this for product teams. Start measuring your AI's performance with Agent Analytics at pendo.io/aakash. That's P-E-N-D-O.I-O/A-A-K-A-S-H. And obviously you've chosen Claude. If people have the option to choose between

  4. 11:34–16:12

    Why Claude Is the Best Model for Analysis

    1. AG

      all the AI models, why have you chosen Claude?

    2. CS

      I've chosen Claude. It's my favorite. It has been for a long time, and I keep checking myself on that, doing tests across the three major ones, meaning Gemini, Claude, and ChatGPT. And I keep coming back to Claude because by default it does a more thorough, more nuanced analysis than the other two platforms. I will say, though, I do like Gemini quite a lot, because it seems to be fine-tuned a bit more for accuracy and focusing on frequency of findings. So you're more likely to get kind of a top five themes or patterns from Gemini really quickly that are solidly evidenced than you will in Claude or ChatGPT, just because that's kind of what it lifts up. But you'll have to push Gemini more than you have to push Claude to give you the complete picture and all of the smaller themes or the outliers and things like that.

    3. AG

      Okay. Excited to see how Claude does.

    4. CS

      So with my context prompt here, I'm just gonna get all of my files ready. I'm just dragging them over. And I want to quickly point out, as I send this context prompt, that I'm using markdown files, and this is important because in Claude, if you don't have a Max account like I'm using right now, if you have Pro even, you've probably experienced uploading a bunch of transcripts and hitting this token wall. Even though Claude's token limits have been expanding and expanding and, you know, it can handle a lot more at once now, you can kind of circumvent that issue by transforming your interview transcripts into markdown files. But this is something that I teach in my course, because you have to do it right. You have to do it in a way that really maintains all of the most important information from those transcripts and carries it over into markdown files. But what you get out of a markdown file instead of just using the transcript is you've turned it into a format that's more structured than the original transcript, which means that it's not just better for file handling, it actually helps these models do a more accurate job as well.

    5. AG

      Love it. And just the high level, like what does that transformation into markdown look like? What do people need to focus on?

    6. CS

      You're basically telling it to take certain types of information, and it's not gonna take everything from the transcript. There's a lot of bloat in your transcripts. Not every conversation necessarily needs to be carried over. Some people will argue with that, but especially for a lot of the sort of product research that I've done, you have a specific focus. You're not always doing exploratory research. So you can usually have a large language model comb through a transcript and extract the right kinds of information that are most relevant to the topic areas you're exploring or the hypotheses you have, and turn that into a document that already has tagging or groupings of conversations under certain headers and things like that, putting in metadata, a little bit of extra coding that makes it easier for the model to make sense of it.
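
As a rough illustration of the target shape (the headers, tags, and metadata fields here are hypothetical, not the template from her course):

```python
# Hypothetical structure for one interview converted to markdown. The point
# is the shape: metadata up front, then verbatim quotes grouped under topic
# headers, so the model gets structure instead of raw back-and-forth dialogue.
PARTICIPANT_MD = """\
# Participant P3 - Premium tier, 7 months subscribed

## Metadata
- Interview date: 2026-01-14
- Focus: retention / churn risk

## Topic: What keeps them subscribed
> "I do the breathing exercises every day, usually between meetings." [12:40]

## Topic: Frustrations despite subscription
> "I don't love that I won't always talk to the same person." [31:05]
"""

with open("participant_03.md", "w") as f:
    f.write(PARTICIPANT_MD)
```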

    7. AG

      And then how would you break down this context prompt? What are the key elements here?

    8. CS

      So it looks like a lot at first glance, I realize, 'cause some people share half of this, or a third of this, within a prompt with actual tasks. But I would call out what's most important here: I do have a bit of project context. I have a business goal, reduce churn. The percentage is kind of arguable here, though I would say in my experience, saying 15% versus 50%, like five zero, will make a difference. And then I have product context. So just like I was saying about onboarding this new team member, I want to make sure that it fully understands what kind of functionality is involved in the service that we offer, what the differences are between the product tiers, so that in its reading through of people's experiences or comments about the product, it fully understands what kind of functionality would actually be present and what they would be using when they have that experience. Otherwise, if we don't include something like this, I've seen hugely wrong guesses, you know, gross assumptions by large language models about what someone has as part of their account, kind of making things up or just trying to piece together bits of information it had that were entirely wrong.

    9. AG

      Mm-hmm. Point it in the right direction of what your product is actually doing in that case makes sense.

    10. CS

      Exactly. Yeah. What's important

  5. 16:12–26:06

    Step 1: Per-Participant Analysis Prompting

    1. CS

      here, though, is I said that for a context prompt, we actually are just loading context. So at the end, I said, "Internalize this only. Do not run analysis yet." It's really important that you don't let it jump ahead and start doing tasks, because I haven't given any instructions here for how to do analysis. So it hasn't done analysis yet. I'm just gonna jump to the next prompt and send that now.

    2. AG

      And I think this is a really cool lesson for people. I don't think many people have done this two-step prompting, right?

    3. CS

      Yeah.

    4. AG

      First step, context. Second step, analysis. Most people try to shove it into one prompt.

    5. CS

      Exactly. But this will get you much better results. I mean, it will basically help the large language model focus on one thing at a time, which means that each individual step will be done more thoroughly.

    6. AG

      Mm-hmm.

    7. CS

      So let me start working on this step so that I can talk through it at the same time, 'cause it will take a minute. What's really important here is I've started analysis, but I'm not saying, "Hey, go find me some themes and patterns." What I said earlier about starting rigorous analysis means doing actual analysis and not synthesis. It actually means that we need to tell it to go look at every single file that I just uploaded. I'm basically asking it to pull out value anchors for each participant, and I'm telling it that means what's keeping them subscribed, and then fragile points (the formatting here is a little wonky): what frustrations exist despite subscription. And I'm spelling things out in detail. So where I say I want it to give ratings, right, like how strong, how big a deal is this, how strong a fragile point is this, I'm explaining what a high fragile point means. Like, what is a high rating? So I'm not letting it make any assumptions here. I'm trying to be pretty explicit in every case about how I would rate things. I've also started this prompt with quote selection rules, which seems a little funny, because I think most people just assume large language models know what a quote looks like. But if you prompt in Gemini and then you prompt in Claude, you will see that they interpret quotes [chuckles] completely differently. Gemini will give you much shorter snippets of customer statements, and Claude will give you, you know, a longer sentence or even a short passage. So I like to, you know, avoid making the assumption that Claude in this case has the same understanding of quotes that I do, and I spell out what a quote looks like to me.
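
For orientation, here is a minimal sketch of what a per-participant prompt with quote selection rules could contain, held as a Python string constant; the section names follow what she describes on screen, while the exact wording and rating scale are illustrative, not her actual prompt:

```python
# Sketch of the Step 1 per-participant analysis prompt (wording illustrative).
PER_PARTICIPANT_PROMPT = """\
QUOTE SELECTION RULES
- A quote is a verbatim passage of one to three full sentences.
- Keep ums, uhs, pauses, and fumbled language exactly as spoken.
- Never paraphrase, trim hedges, or merge statements from different moments.

For EACH uploaded participant file, one participant at a time, extract:
1. VALUE ANCHORS - what is keeping them subscribed (quote + short label).
2. FRAGILE POINTS - frustrations that exist despite the subscription.
   Rate each low / moderate / high, where HIGH means the participant
   explicitly connects the frustration to considering cancellation.
3. STABILITY RATING - overall churn risk for this participant, with the
   evidence used.

Output one table per participant. Do not synthesize cross-participant
themes yet.
"""
```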

    8. AG

      Very interesting. Yeah, I've also noticed that sometimes the AI will, like you've had here, take away some of the hedges and qualifiers. It'll try to, like, clean up the quote as if it were cleaning it up for a news article. But in the user research case, we don't want to clean up the quote, so we're giving that instruction to the AI.

    9. CS

      Yeah, and in that case, you could even add a line here that says something like, "Leave ums, uhs, pauses," you know, fumbled language, things like that. I sometimes include that in here too to keep things exactly as they are with the customer's voice.

    10. AG

      And can you define some of these words for me? 'Cause I'm not totally familiar with them, and I can imagine a lot of the audience is trying to guess at exactly what they mean. Value anchors, fragile points, stability rating: what are those? Those seem like cool terms of art in the user research discipline people could benefit from learning.

    11. CS

      Let me give you a bit of background quickly about the data I'm using here too. So I have eight interview transcripts from a synthetic user study that I created based on a study that I actually ran, and it's all about improving retention and decreasing churn. So these are all people who had a premium tier account on a fictional app called Flow, a meditation and anxiety management app that also has on-demand therapy. And so what I'm doing here is a bit of a retention analysis, where I'm trying to figure out what's keeping people subscribed and then what's likely to break their experience. So "what frustrations exist despite subscription" basically means the flip side of the things they're finding really valuable about the service: what could actually push them to cancel anyway? And I want to collect both sides of that, because I'm not assuming that when someone is super happy, they're just happy. No one is as one-dimensional as that, as a customer or as a human being. So I want to know, okay, if they're happy about using the service and they continue to pay, that's great, but what would cause them to churn anyway if it continued or increased over time?

    12. AG

      Makes sense.

    13. CS

      So we have some results here, and I've told it to put this into a table, so it followed my instructions really well. Basically, you know, the value anchors are in a table here for participant one. This is by participant, so this is the first participant here: value anchors, then fragile points, and then I also ask for a stability rating, so overall, how much risk is there that this person will churn.

    14. AG

      Hmm.

    15. CS

      And then I have a few key themes that I asked it to call out too, and then we start participant two, and it just looks like that for every participant. So what I said earlier about making sure we actually pick through the data and kind of look at it closely first is what I've forced the large language model to do here. But if you jump to asking for themes, patterns, pain points, whatever it is, you're actually forcing it into, like, synthesis mode instead of real, picking-through-the-data analysis mode. So I usually start with something like this, where I force it to look at every file and look for specific pieces of information so that it actually combs through properly.

    16. AG

      And that's replicating the human process. That was the high level advice you gave us at the beginning, which is we're always gonna analyze the specific interviews, so let's not have the AI jump that step. Let's have it analyze those first.

    17. CS

      Exactly. So as a very experienced user researcher, you always comb through. You know, we used to just sit with one interview transcript and look through it manually, every line, copying and pasting into Miro or Sticky Notes or whatever, and so I'm basically replicating that process here.

    18. AG

      Awesome.

    19. CS

      So the kind of information that we get here is gonna be much more granular, much more specific. Other than this bottom section here, it's not mapping things into themes. It's really just picking up the foundational elements that a really good analysis will be built on. So we have things like this full passage about someone who is doing breathing exercises with this app every day, usually between meetings, and so on, all of the things that she's using. And so the strength of this in terms of stickiness with the product is strong, because there is daily integration into the work routine, multiple features used regularly, and so on. Let me see, what else is there? It's maybe more than what she expected, 'cause she thought it would just be meditation, but "it's become this whole system for managing my new role." So dealing with stress at work is kind of now dependent on this app. That's a good sign. And then the therapy sessions can be booked spontaneously, but "I don't love that this means I won't always talk to the same person," so there's something interesting there. It's labeled this as moderate. I would almost say, okay, well, you know, this is where my perspective comes in, because I don't just have AI do analysis without thinking about it myself. But this seems very positive, and the rest is starting to get a little bit more into the fragile points. So this is the sort of thing that I want to be pulled out, because if I were doing this as a researcher, I would be pulling out all of these quotes myself and then trying to categorize them, label them, give them ratings, and so on. So this is really what it looks like to do good analysis, whether you're doing it with AI or not, but it's doing it much faster for me.

    20. AG

      And there are a lot of tools that I think help with this. Um, when I was at a firm, we used a tool called Dovetail. What is your take on some of those tools, and which ones do you like?

    21. CS

      So I use a few that are less talked about. The two that I've continued to return to, because I think they're doing a particularly good job, are Breda.ai, I can send you that for the show notes, and Reveal. They have completely different approaches, and there are so many different tools out there that really do things very differently from each other, so it's hard to compare them all kind of apples to apples. But Breda is getting to a pretty good, nuanced level of analysis fast, because they have really good prompting in the back end. And Reveal maps the customer comments in a bit of a grid and does a bit of hypothesis testing for you, so it lays things out in a way that, kind of like this, you're able to see things yourself and make your own comparisons in Reveal, versus just having a finished analysis for you. Dovetail, I honestly hear mixed reviews about, from researchers especially, and I have had mixed experiences with it, but I haven't tested it in about five months, so I wouldn't dare to give it a final rating right now. But a tool like that is really meant to get you to insights as fast as possible, and you don't have control in any of these platforms over exactly what you're looking for and exactly the type of analysis that you want it to do. So I tend to stick to the large language models for the most part, unless the time-to-insight pressure is so extreme that I, you know, feel better uploading things into a platform and having it do that for me, but it's rare.

    22. AG

      So the platform is gonna take some of this control

  6. 26:06–34:51

    Step 2: Verification & Contradiction Checking

    1. AG

      that we're getting from these prompts, from this step-by-step prompting process out of your hands.

    2. CS

      Yeah.

    3. AG

      And that's the power of this workflow. We're kind of showing you guys the advanced version here. This is what an advanced user researcher would do.

    4. CS

      Yeah. And obviously, the teams behind all of those tools have worked tirelessly on those prompts. I've also worked with the team at Maze to kind of develop the prompts behind their AI analysis and AI moderation. So I know what it looks like behind the scenes, and they're all working very hard to figure out the right prompting sequences to do this for you. But as someone who's used to doing research especially, I prefer to kind of have more control over the granularity in the analysis, or focusing it on a retention analysis in a certain way, and so on, over using a tool to do it for me.

    5. AG

      Makes sense. So is there a step three now prompting?

    6. CS

      There is a step three.

    7. AG

      So I had said two-step before, but it's actually a three-step prompting process.

    8. CS

      It's three steps here, and I would say it's four if synthesis is added. So the context is almost like a Step 0, I would say. You kind of just load the background information. So I would call this three steps in this workflow, which is condensed compared to what I usually do. So for the most rigorous possible version of this, you have more steps, because you need more feedback loops with AI. But you'll see one version of this here. Let me just space this out so people can see this more easily. Just gonna let it do its thing while I talk. So this is what I referred to earlier as a stress testing or verification step. We need this because even if you think everything looks really good, which, you know, most people go, "Oh, AI did this analysis for me and it looks really solid," it's usually not as solid as you think it is. And what happens down the line is then a product manager or a designer or researcher, you know, presents some findings in a meeting and then can't trace them back, or realizes there are actually lots of holes in this insight or set of insights. But if you push your model through an additional step for verification, or I usually call this some kind of audit step, then it very often catches its own mistakes. And I get questions all the time from people who are really rigorous or kind of know that these models have a lot of bias built in. They ask me, "Yeah, but how is it gonna critique its own work, you know, with its bias toward its own work?" The truth is, they do find mistakes. They find mistakes all the time. I've tried splitting the process between tools too, but actually running this in the same model that you're doing analysis in does find issues. So let's see if it found some things. So in this particular prompt... Actually, let me bring this back up again. In this version of a verification check, I'm asking for contradictions in user statements. So it's very common that one person in a transcript will be telling a story one way, like they say, you know, "Well, first I got my shopping cart when I was going grocery shopping, and then I looked for the apples or whatever," in a certain order. The second time around, in a different part of the conversation, they tell the story of the user journey in a completely different order. Or they contradict whatever they said earlier in some way. "Well, you know, I use this every day." The equivalent of, like, "I go to the gym every day" or five times a week, and then, "Oh, when was the last time you went to the gym?" "Oh, well, it was probably Saturday," and it's Wednesday today, right? So very often, if you're doing a good interview, you find some contradictions in what someone does and says, or what someone said the first time around and then what they said the second time around. But AI is very likely to blur that, kind of cherry-pick the stories that it likes or that match what you asked for most closely, and forget about or ignore the second thing they said that is actually the opposite. So I'm putting it through this pass where I'm checking for contradictions, and that will make the full analysis in the end much more bulletproof, kind of preventing the cherry-picking of the story that best aligns, and getting the full picture. So to spare you the details, I think people can look at this on the screen, stop the video if they want to. But I'm basically asking for contradictions, and I'm defining what contradictions look like to me.
I've even spelled out what is not a contradiction, and a few examples. So I'm consistently giving examples, definitions, and then what that looks like to me, to be really, really clear. And just so you see a few contradictions here: so a rating was revised from, let's see, original... No, this one is confirmed, sorry. This one was moderate risk in the first place, like at risk of churning, and then it was revised to high risk, so upgraded, because it double-checked what the person was saying, the stories they were telling, and actually realized that there's more risk involved. So this is the sort of correction that I'm looking for. I wanna make sure that it has combed through the data again, made sure that the claims, the statements that it made, were actually verified.
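
A sketch of what such a verification pass might contain, reconstructed from her description (the definitions and output labels are illustrative):

```python
# Sketch of the Step 2 verification / contradiction-check prompt.
VERIFICATION_PROMPT = """\
Re-read every participant file and audit your Step 1 analysis.

A CONTRADICTION is two statements by the same participant that cannot
both be true, e.g. "I use this every day" vs. "the last time was
probably Saturday" (said on a Wednesday).

NOT a contradiction: a change of opinion the participant acknowledges,
or different answers to genuinely different questions.

For every value anchor, fragile point, and stability rating, mark it:
- CONFIRMED (cite the exact supporting passages), or
- REVISED (new rating plus the contradicting passages), or
- REMOVED (claim not actually supported by the transcript).
"""
```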

    9. AG

      You know what kills momentum? Spending 30 minutes on a brainstorm with Claude and then copy-pasting it into a slide tool to create the slides. That loop is dead. Gamma now works inside Claude through Claude Connectors, so you can go from thinking to a finished presentation without ever leaving the conversation. Here's what that actually looks like. You're brainstorming a product strategy in Claude. Mid-conversation, you say, "Turn this into a deck." Gamma generates it right there. You keep refining your thinking, and the presentation updates with you. No copying, no pasting, no starting over in another tool. And it gets better. Connect Gmail, Notion, or GitHub alongside Gamma, and Claude will use those connections to pull live data straight into your slides. Weekly business reviews, feature PRDs, all generated from the data you're already using. Your deck finally keeps up with your ideas. Try it now. Connect Gamma to Claude at gamma.app. Go from thinking to presenting in one flow. Claude helps you think, Gamma helps you show it. Today's episode is brought to you by the experimentation platform Kameleoon. Nine out of 10 companies that see themselves as industry leaders and expect to grow this year say experimentation is critical to their business, but most companies still fail at it. Why? Because most experiments require too much developer involvement. Kameleoon handles experimentation differently. It enables product and growth teams to create and test prototypes in minutes with prompt-based experimentation. You describe what you want, Kameleoon builds a variation of your webpage, lets you target a cohort of users, choose KPIs, and runs the experiment for you. Prompt-based experimentation makes what used to take days of developer time turn into minutes. Try prompt-based experimentation on your own web apps. Visit kameleoon.com/prompt to join the waitlist. That's K-A-M-E-L-E-O-O-N.com/prompt. Today's episode is brought to you by Amplitude. Replays of mobile user engagement are critical to building better products and experiences, but many session replay tools don't capture the full picture. Some tools take screenshots every second, leading to choppy replays and high storage costs from enormous capture sizes. Others use wireframes, but key moments go missing, creating gaps in your understanding. Neither approach gives you a truly mobile experience. Amplitude does things differently. Their mobile replays capture the full experience, every tap, every scroll, and every gesture, with no lag and no performance hit. It's the most accurate way to understand mobile behavior. See the full story with Amplitude. Mm. So this is how we increase the certainty that it hasn't hallucinated, and it's giving us the right analysis based on the quotes.

    10. CS

      Exactly. So it's a bit of an analysis check in the first place. It's also an exaggeration check in some cases. You can run an audit to make sure that you're staying as close to the data as possible, that no statements were made by AI in the process that are a bit over- or understated. So you can use it for lots of different things. This is just one version of it.

    11. AG

      Excited to see where we go from here.

    12. CS

      So I would end this process for this kind of very condensed workflow version with synthesis. I think, if it's okay with you, I would just skip a synthesis step because I think most people know how to do synthesis. Most people would just say, "Summarize this," or, "Give me the key themes, the top three action items," that sort of thing. And I would hop into survey analysis. Does that work?

    13. AG

      Let's do it.

    14. CS

      Okay. So let me open a new

  7. 34:51–46:18

    Survey Analysis: Why You Must Code First

    1. CS

      window.

    2. AG

      The key takeaway here, what we did is we didn't jump into synthesis. We did this, whatever you wanna call it, two, three, four-step process to make sure that we increase the certainty in the results, in the quality of the analysis, and replicate the human process.

    3. CS

      Exactly. So this is what I'm dealing with here. This is a very basic churn survey, but pretty similar to what I've ended up working with in the past. We've got some automatically logged information. We've got a column like Reason Text, where the customer chose a category from a dropdown, and there are a lot of other categories, so this isn't particularly well organized in the first place. Then we have Reason Detailed Text, and this was an open text field where people could just say something about why they were canceling. Pretty basic. And then we have what tier of service they were on at the end. So that's the file that I will be uploading now.

    4. AG

      Like what all of you guys have.

    5. CS

      So let me get my file in here. Just dragging it over. And what most people would do here is similar to what I said before: kind of toss in a CSV file and say, "Show me the biggest pain points from the column with the open responses," or, "Count all response types in this other column," where, you know, we had the type of cancellation logged. But that's actually not what I wanna do here, because a lot can go wrong if you start there. Similarly to the interviews, just jumping into pattern finding, a lot can go wrong, a lot can be missed, and a lot can be miscategorized. So when we're working with survey responses or short customer feedback, we want to code first.

    6. AG

      And for people who don't know, what does coding mean? It's not software engineering.

    7. CS

      No, definitely not. I'm not a particularly technical person, so definitely not that type of coding. We're talking about applying labels to things, putting things in buckets that help us make sense of them. So normally, if you're doing data analysis, you're looking at all of these responses, and there's an open text response field, like, you know, a customer feedback form or feature request form, that sort of thing. You have to kind of make sense of what people are talking about in those message responses, and label them with something before you can start grouping and saying, "Well, most people complain about this," or, "Most people are requesting this." That coding needs to have an approach, right? You need a bit of a methodology for the coding in the first place. You can do inductive coding, where you look at the data available and you try to make sense of it from what's there, or you can do deductive coding, when you already have a taxonomy, you already have parts of the product, and you wanna, you know, label all of your data with those features or customer types or things like that. So you might already have something that you like to use as labeling, but in this case, I'm using inductive open coding, where I'm saying, "Let themes emerge from the data rather than forcing preset categories." And what's important about this: I very often use inductive coding, at least at first, because I let AI try to see what I may not have noticed about the data and apply codes from a more open perspective first. If you just apply your own tags and say, "Here's the list of tags I want you to code this with," what very often happens is it will start force-fitting. It will take the quotes that it sees and try to make them make sense in the set of kind of required buckets that you gave it, and that doesn't lead to the most accurate coding and analysis.

    8. AG

      Mm-hmm. Makes sense. So this goes back to the craftsmanship of research that you wanna be applying, and so we have this first step here where we code.

    9. CS

      And I'm gonna let it start working, 'cause it will take a couple of minutes on this. But I also have rules here, so I'm not just saying, "Go ahead and code this however you want, you know. Be open. Find some interesting things for me." I'm also saying, "Each response needs one primary code. They need to be mutually exclusive, not overlapping." I want really tidy little boxes. I want all of the quotes to be in one box, categorized, right, in a way that I can make sense of later, not having too many kind of messy, overlapping things. This is really up to the person doing the analysis with AI. Anyone watching could decide to take a completely different approach, and that's fine. But what's most important here is just that there are rules, that there are criteria, that there are kind of lines that we need AI to paint within. We don't just kind of let it run this however it feels like running it.
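
A condensed sketch of a coding prompt with rules of this kind; the wording is illustrative, and the column name comes from the demo survey she describes:

```python
# Sketch of an inductive open-coding prompt for the churn survey CSV.
CODING_PROMPT = """\
Code the open-text column "Reason Detailed Text" in the attached CSV
using INDUCTIVE OPEN CODING: let codes emerge from the data; do not
force preset categories.

RULES
- Read every response before proposing any codes.
- Each response gets exactly one primary code.
- Codes must be mutually exclusive, each with a one-line definition and
  one example response (this is the codebook).
- Output, in this order: (1) the codebook, (2) the full CSV with a
  primary-code column added, (3) frequency counts per code.
- Do all counting in code, not in prose.
"""
```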

    10. AG

      Mm.

    11. CS

      And one last thing while it's still working on this, most likely. Yep. I finish with a line that is maybe becoming obsolete, but not quite yet. I've always included this as a bit of a fail-safe. In the past, all of the models have made mistakes with math calculations, and a lot of the time, the calculation errors on a data set like this come from running tasks with math in text instead of in code, which is a really goofy thing, because we know that all of these tools have been coding for many, many months at this point. But, you know, they are originally text-based machines, and so I've seen so many cases in my students' projects that they've submitted, and in my own work, of saying, you know, "Calculate how many people said this," and then seeing that the answer is different if you prompt it three times, because it doesn't do it in an actual math equation.

    12. AG

      Mm-hmm. Yeah. [laughs] They're notoriously bad at math unless you tell them, "Use code."

    13. CS

      Yeah. So I'm hoping I can remove that. In my latest tests with all the models, I, you know, got really consistent results without having that in as well. But I still am a little anxious about taking it out, so I [laughs] just leave it in all of my prompts. Better safe than sorry.

    14. AG

      Good takeaway for anybody watching. If you wanna be safe on your math, tell it to code it.

    15. CS

      Yeah.
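
Since the recoded CSV comes back as a download, you can also redo the arithmetic yourself. A minimal pandas sketch, assuming a hypothetical file name and a `primary_code` column added by the model:

```python
# Recompute the frequency counts from the recoded CSV instead of trusting
# math done in prose. File and column names here are assumptions.
import pandas as pd

df = pd.read_csv("churn_survey_recoded.csv")
counts = df["primary_code"].value_counts()
shares = (counts / len(df) * 100).round(1)

print(shares)  # e.g. missing_favorites_or_save    23.5
```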

    16. AG

      What I love about Claude these days is how much of the thinking is exposed, and how much planning it's doing, and how much step-by-step, and how you can even click into some of those and just watch it work. I find myself just, like, getting fascinated with this and getting better at coding and stuff like [laughs] that by doing it.

    17. CS

      Yeah, and you don't really need to be a Python expert here to just see, you know, how it's thinking through things like team-specific issues. It's already kind of providing an overarching code group here: invoice reimbursement, primarily a team issue, and then, you know, logging. If you find any of these terms, then code it like that, right? So this is how it's thinking through, "I found these topics, and this is how I'm gonna start coding them."

    18. AG

      They seem to have brought a lot of the Claude Code functionality into the Claude user interface, so if people have just been stuck in Claude Code, the user interface version has gotten quite a bit better in the last couple months. If people are curious, it's almost always gonna import pandas, which is, like, a really common data science library. So as you watch this think and learn, you get to learn a little bit more deeply too.

    19. CS

      Yeah, exactly.

    20. AG

      And so if you were doing this as a human, what would've been the process?

    21. CS

      More or less the same. I mean, again, what I'm showing you is a bit of a condensed process compared to what I would usually do. I have more steps involved. So one way I might have done this, as a human or with AI as well, is actually to break this prompt into multiple steps. So human-wise, right, I would be reading all of the responses. If we didn't have a codebook already, if we didn't have a taxonomy or a set of tags that we wanted to apply, and we really were doing this from inductive open coding, I would read through all the data, or, you know, a sample of it at least, and start coming up with the labels that I think most apply from what I see. So that's this step of creating a codebook. Like, what are the labels, the codes that I'm identifying? Why do I think they should exist? What's the definition of them? How would we know if something else should fit into this group or not? Giving some examples, and then applying that code to other responses. You could even say in between these two there's a step of testing it, like taking another sample from the data set and checking: do these codes that I just came up with actually apply to this new sample, or am I missing something, right?

    22. AG

      Mm-hmm.

    23. CS

      So you're doing this kind of chunk: look at it, create codes, apply it to the next chunk. Did that fit? Yes or no? Okay, iterate. Keep going. And this is really how I would've done things manually as well. So I've just laid out the steps that I would follow, and I've given them to Claude. But here, then, is where things do get a little bit better if you have a slightly more complex, stretched-out process where you separate these steps into individual prompts, like first going through the data and thinking about what the codebook might be, and then I could give it another set of data and say, "Do these still fit for this?" for example, and so on.

    24. AG

      Nice. Okay. And looks like it's got a response for us.

    25. CS

      It's got a response. I'm gonna close this version 'cause it has things down here, hopefully. Yeah, it's still generating, but the dominant churn driver is clear, 23.5%. What I actually am more interested in is this. So there's some quantification. I want to see the codes. Let me see. Usually it gives me a codebook.

    26. AG

      This is already so much more than I would have done as a PM. This is cool.

    27. CS

      So you've got a bit of the codebook in here, and what's funny is it didn't give me a codebook; usually it gives me a codebook. But it has basically coded everything again, so I have a downloadable version of my CSV file with all of the codes applied to it. So every single line, every row, is now coded. And then I have my codebook here. I think this is the codebook. Usually from this sort of prompt, it gives me a codebook, it gives me the recoded CSV file, and then it also gives me its kind of summary of the analysis. So that's what this is. What I actually don't need yet, from my perspective, is this full analysis, but I want the quantification. I want to see first how it coded things, the codes that it came up with, to see, do I agree, or do I feel like it coded things in a funny way that I want it to iterate on? And then I want to see how many responses it coded as each of those codes. So missing favorites or save functionality, because this fictional app didn't have a way to save your favorite meditation sessions so you can return to them, this was a big chunk, 23.5%. And yeah, that's like a clear one that rises to the top, and then we have some other things, like switch to competitor, content gaps, didn't resonate. But to be clear, what's really important about the way that I structured this prompt is actually the order of things. I'm forcing it to go through the coding and the analysis in this order. I'm not saying, "Hey, provide frequency counts." I'm saying, "First, look at the data, then create the codes, then apply them, and then decide what the frequency counts are." Because if you jump to "give me some patterns," it hasn't coded anything or made sense of it in a way that we would agree with or that makes sense to us before it starts counting things, and that doesn't make sense.

    28. AG

      Hmm. I think this is a more generalized advice for anybody watching. This step-by-step prompting is how you get the AI to follow your human process, and that's our North Star here.

    29. CS

      Yeah. So let me jump into the next step. You can keep asking me questions about that if you want to, but just to keep things moving, copy-pasting prompts doesn't-

    30. AG

      What would you call this step? Yeah, if the last step was coding.

  8. 46:18–51:31

    Adding Emotional Intensity Ratings

    1. CS

      I'm actually adding a little bit of a step here for the survey analysis, because I want to compare this a little bit with, for example, sentiment analysis. For a lot of people, sentiment analysis is a natural thing to do on survey responses like NPS or churn. The problem with sentiment analysis, and even a prompt like this, is that we're assuming that a large language model will understand sentiment and rate things in the way that we would. So first of all, with any rating task, like, does this fit into negative, positive, or neutral sentiment, or which buckets do customer statements fit into, we need to explain what we would rate things at or how we would label them. So that's what I'm doing here. But also, with a churn survey, for example, there's a pretty strong likelihood that a large language model will decide that everything is negative, and that doesn't really help us. So instead of saying, "Hey, categorize or group things by sentiment," I'm having it add intensity ratings, so that we can make more sense of which of the problems that are causing churn are actually most worth us working on as a product team.

    2. AG

      Hmm.

    3. CS

      So if people are having a soft exit where the reasons are circumstantial, like, "Things in my life just changed. There's no hard feelings. I'll come back if things change back again," well, this isn't necessarily something that we're probably gonna wanna fix. Whereas if someone is really frustrated, if it's, you know, a group of people who very much fit in our ICP group, or they are high-value customers and they're frustrated or angry, then these are probably the things we want to focus on.

    4. AG

      And what's happening in this calibration example?

    5. CS

      So all of this is about how to rate things. What do we call a soft exit versus an angry exit versus, you know, a vocal detractor? What's the difference between frustrated and angry? It would be a mistake to assume that Claude or any of the large language models know the difference or interpret the difference the same way that we would. And this is where it gets a little bit subjective, but, you know, as someone doing a lot of research, I have beliefs about some of these things, or I have an idea about what an angry customer looks like from my domain, right? Like from this Flow app or whatever I've been working on. I know the difference between angry and frustrated, or you could kind of come up with your own definition. So I want it to follow my lead, or the way that I've grouped people in the past, so that I have comparable results, something like that. So it's really important to me that it doesn't just kind of come up with its own rating scales or ways of labeling things, but that it follows my lead and way of thinking.

    6. AG

      So when you're defining something, put some few-shot examples so that it's very explicit and clear what that means for the LLM. That's what I'm taking away here. Is that right?

    7. CS

      Yeah. And what's actually really important about few-shot is reasoning: not just showing what the example looks like, but why as well.

    8. AG

      Ooh, okay. So giving it that reasoning so it can replicate you.
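
A sketch of what a calibration block with reasoning might look like; the scale anchors and examples are illustrative, not her actual definitions:

```python
# Sketch of a few-shot calibration block for emotional intensity ratings.
# Each example carries the reasoning ("->"), not just the label.
CALIBRATION_PROMPT = """\
Rate each churn response 1-5 for emotional intensity.

1 = SOFT EXIT. "My schedule changed, I'll probably come back."
    -> Circumstantial, no blame, explicitly open to returning.
3 = FRUSTRATED. "I kept losing the meditations I liked; there's no way
    to save them." -> Concrete product gap, wasted effort, calm tone.
5 = ANGRY / VOCAL DETRACTOR. "Charged me again after I cancelled.
    Telling everyone to avoid this app." -> Blame plus active detraction.

Do NOT rate everything high just because this is a churn survey.
"""
```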

    9. CS

      So here today, it's putting things into TXT files, which is really funny, because usually it outputs everything in the normal view. But here we have the analysis. And we see most people are clustered in these, like, level two, three, and four here. There are far fewer angry exits than neutral and frustrated, which at least is a good thing. But we wanna see how they line up, how they stack up with these different codes as well, right? So this is kind of a stacking of prompts, getting more specific or more detailed in our analysis, or adding a layer to the analysis. So we did coding, and now we're determining the emotional strength of these different codes. So this is the point here. What I wanted to see was, okay, how frustrated are people who are missing favorites or save functionality, for example? Pretty frustrated. I mean, it's in the middle, level three, but there are more level threes than anything else, so that means they're pretty frustrated, and that would say to me, "Hey, this is something we need to work on."

    10. AG

      Makes sense.

    11. CS

      But we have all of the codes ranked by intensity. We have a little bit of a visualization because Claude decided to go a little bit beyond the brief here. I didn't ask for any visualization, but hey, I'll take it.

    12. AG

      So overall, you're happy with these results?

    13. CS

      I am happy with these results because, um, I can also check them. Because I had that exportable CSV with the recoded responses, and I always ask for these kinds of outputs at the end, I can double-check things. First of all, from the interview analysis, that the quotes are real. If it tells me, "Hey, this quote came from this participant on this page," or, "It was quote number whatever in my markdown file," then I can go back and make sure that quote exists and that it came from the participant it said it did. And in this case, when it quantifies things, since it's already given me the full set of recoded responses, I can go do my own math there too if I want to.
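
      The standing instruction she's describing can be as short as a couple of lines at the end of an analysis prompt; a sketch (the wording is illustrative, not her exact prompt):

      ```
      For every quote you use, cite the participant ID, the source file, and
      the location (quote number, line, or timestamp) so I can verify that it
      exists. For the survey, export the full recoded dataset as CSV
      (response ID, original text, code, intensity rating) so I can re-run
      the counts myself.
      ```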

    14. AG

      Awesome. Where do we go from here?

    15. CS

      So we still need an audit, that verification step. Just like we did with the

  9. 51:31 - 57:42

    Step 3: Auditing AI's Own Work

    1. CS

      interview analysis, we need to check, or we need AI to check its own work here.

    2. AG

      Hmm. So it's gonna select random responses, check for code consistency.

    3. CS

      Yeah.

    4. AG

      Okay.

    5. CS

      What's really important here is, first of all, simply that you do a step like this, like I said before. We just need to force it to do another pass through its own work, and it will usually find something that's a little off. Sometimes what it finds is big calculation errors, although that's rare if you have really good prompts and you push it through this step-by-step process. Sometimes they're small, though. Sometimes it goes, "Hey, I would change the coding a little bit here, because I think there's another group that deserves its own code that I mixed in with some other groups." And then, you know, the quantification changes a little bit. So that's the sort of thing I want to see here, because I want to stay as close to the data as possible. I don't want to exaggerate things. I don't want to blow up responses into super angry exits in the previous step if they weren't super angry exits. I want to be as realistic as possible.

    6. AG

      Mm-hmm.

    7. CS

      So here, for example, it's rechecking level four, because I asked it specifically to check that level. For the intensity rating audit: recheck all responses rated four or five. We don't want to go running around saying, "Oh, people are super, super angry, and we have to fix this thing right away because they're all level four or five." We want to know: is that true? Should this actually be priority number one? Are they actually really angry, or did we exaggerate a little?
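
      A sketch of an audit prompt along these lines (the specific checks are reconstructed from the discussion, not copied from her prompt):

      ```
      Audit your own analysis before we continue:

      1. Select 10 random responses and re-derive their codes from scratch.
         Flag any you would now code differently, and explain why.
      2. Intensity rating audit: recheck every response rated 4 or 5. Quote
         the exact language that justifies the rating, and downgrade anything
         that doesn't hold up.
      3. Re-run all counts and percentages from the coded CSV. Report any
         calculation that changed.

      Output a corrected dataset plus a list of every correction you made.
      ```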

    8. AG

      I think this is a really interesting way to use AI. I'll call this the CYA way to use AI: cover your ass. Whatever could come back to bite you when you're using AI in a project, have AI check it. I love this lesson.

    9. CS

      Oh, CYA... Yeah. That's a good name for it. It's less common with researchers, because I think they naturally feel more cautious with AI and a bit more protective of this work. But I've heard a lot of product managers I've worked with, and friends, and designers as well, say things like, "Yeah, I came up with all of these really nice-sounding insights, and then when I went digging into the data myself, or when I presented it, someone asked questions I couldn't answer." They didn't do a step like this, and if they had, I bet a few of the holes that could be poked in the insights would have been highlighted here.

    10. AG

      So if we review the three prompts so far, the first prompt was coding, the second prompt was quantification, and the third prompt is verification. Is that the right summary?

    11. CS

      So the first prompt was coding and quantification in one. The second one was rating the emotional intensity, you could say.

    12. AG

      Hmm.

    13. CS

      And the third is auditing the results of everything so far.

    14. AG

      Got it.

    15. CS

      So here you can see it's corrected things. It says "Survey final corrected" at the top. So it's recoded, potentially, the primary codes and also the intensity levels. Let me see. So, corrections made: the coding, eight corrections. Some things were missing. Oh, "missing favorites" was the old code, and the new code is actually "lack of control over selection." So it's gotten more specific about the code it feels should be applied there. And because it's surfacing this, and because it had recoded the whole thing originally, I can go through this myself, and in a normal workflow scenario I would. I would go through and double-check: do I agree with these? Based on my understanding of the data, do I agree, and would I label them the same way?

    16. AG

      And I think this is a really important step, because it gave us a preview of the synthesis even though we didn't ask, and its synthesis was, like, missing favorites is driving all the churn. And I was like, wow, if that's true, then you can really quickly solve a lot of your churn problems. But going back and coding it, it realized, oh, there's some nuance here beyond just favorites.

    17. CS

      And here, exactly as I expected, the intensity rating for at least two things was redone, so it had exaggerated the frustration, right? It dropped some things down from angry to frustrated, which could be meaningful, especially if it found more issues like that.

    18. AG

      Mm-hmm.

    19. CS

      So now we have-

    20. AG

      You've got it to check its work. What is the next step?

    21. CS

      So from here, same as with interviews, I would just do a synthesis. Like I said, I would say: wrap up the top issues that are worth fixing. I would maybe ask for things per tier, per segment based on the service tiers people were using, like individual premium accounts versus team accounts, because those might be a little different. We see things like billing and account admin issues, or technical issues, right? Technical issues might cut across both, but billing and account issues would only apply to team account users, for example, or invoice issues, things like that. So I might do a synthesis like that. But I think, again, most people are decent enough at asking for a wrap-up of the insights and not as good at pushing through this. So I would jump to the mega version of this, I would say, which is, if it's okay with you that I skip synthesis again, I would jump-

    22. AG

      Yeah, no, I think people can imagine it on their own, right?

    23. CS

      Yeah.

    24. AG

      And I'd say how a PM and a user researcher do it might be different too, so build it to your own use case. Like, as a PM, I'd probably be saying, "Okay, draft the first version of my product strategy document based on this. What are the key problems we need to solve? Across those, please include one or two quotes from the specific survey responses, and make them linkable so that I can trace each one back if needed." But a user researcher might have different goals, so build the synthesis prompt for your goals.
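
      A sketch of that PM-flavored synthesis prompt (the structure and numbers are illustrative):

      ```
      Using the corrected, audited analysis above, draft the first version of
      a product strategy document:

      - The top 3-5 problems we need to solve, ranked by intensity and
        frequency.
      - For each problem: one or two verbatim quotes, each cited with the
        response ID and source file so I can trace it back.
      - Break out any problem that differs by tier (individual premium vs.
        team accounts) into separate recommendations.
      ```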

    25. CS

      Exactly, yeah. And all of these prompts can be modified for your goals, right? I've done this in a particular way, with retention and churn as the focus, but this can be used for any kind of analysis you're doing, any kind of dataset. It's supposed to be a flexible model. That's why I developed it this way. But I developed it, like I said, based on the way people like me have been doing analysis for decades anyway, so it's not reinventing the wheel.

    26. AG

      Awesome. So what does

  10. 57:42 - 1:09:01

    Claude Code: The Agentic Parallel Version

    1. AG

      the agentic parallelized mega version of this look like? This somewhat scary but awesome Claude Code is here.

    2. CS

      [chuckles]

    3. AG

      Oh, you just install it with, like, a simple command. It's like npm something something. You might need to install npm, in which case it'll have you use Homebrew to install that. So it can feel a little intimidating to install, but once you have it installed, you just type in the word "claude" and you're here.
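
      For reference, a minimal sketch of that install path (the npm package name below is Anthropic's documented one at the time of writing; the Homebrew step only applies on macOS):

      ```
      # If you don't have Node/npm yet (macOS example, via Homebrew):
      brew install node

      # Install Claude Code globally:
      npm install -g @anthropic-ai/claude-code

      # Then, from any project folder, just type:
      claude
      ```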

    4. CS

      Yeah. So I'm not using anything special, like a code editor or VS Code. I usually use the terminal exactly as it is, especially for demo videos, because most people who aren't used to the terminal will download it and see it like this. So I want to keep it consistent. What's worth knowing in the first place is that you have slash commands, and agents are what I'm going to be working with right now. So if I just look at my agents, I have a bunch set up already. I add new ones, delete them, and iterate on them all the time. But what I'm going to use right now are my Survey Analyzer Lite and my Interview Analyzer Lite, and this is exactly what you just saw. I've set up pretty much the same workflows, mostly, but to work a little more on autopilot. So I set up those agents beforehand. The biggest difference between what you've seen me do step by step in the web UI and this is that I have programmed those agents with system prompts that refer to and retrieve information from more files. So instead of being in the web UI, uploading things every time, and having an individual prompt for everything, I have a context document it retrieves from instead of me prompting, "Hey, here's my context." And I can change that context over time, so I can use this agent for different things and update the context when our team information, project information, customer information, feature set, whatever it is, changes. I can keep that updated. But it's doing things a bit more automatically than my step-by-step process in the web UI.

    5. AG

      Hmm. So just to summarize for folks, I heard three inputs: your context document, which is similar to the context prompt we saw; your agent markdown files, and in this case she's got two, a Survey Analyzer Lite and an Interview Analyzer Lite agent markdown file; and then your actual data or responses.

    6. CS

      Yeah. And for some reason my pasted prompt wasn't showing for you. I really want you to see it. Give me a second. There we go. So this prompt, what you see here, is really simple compared to the other things, right? That's because I'm basically loading all of the steps automatically in the system prompt. I built in a little extra feedback-loop-type stuff as well, and then it's pulling from documents. So I don't need to break this up into individual prompts. I just send one prompt to kick things off using my pre-programmed agents. I'm calling the Interview Analyzer Lite here. It's not a slash command; I just write it in. And the Survey Analyzer Lite. And I'm basically telling it to refer to the markdown files here. I can hit Enter, and in a moment it will tell me, "Okay, I'm starting that." So it's reading the markdown files I just referred to for each agent. Oh, it's searching for them. This happens all the time: if I didn't give a file path here and just named the markdown file, it has to go hunting for them. Sometimes I write the full path wrong too, and then it has to go hunting. But the great thing about Claude is it will troubleshoot that on its own, so you don't normally have to go, "Oh, wait, go look over here. It's there." It will just find it in your files for you. So it found them.
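
      A sketch of what a kickoff prompt in this style could look like (the agent names match the demo, but the file paths and wording are hypothetical):

      ```
      1. Use Interview Analyzer Lite (agents/interview-analyzer-lite.md) on
         the transcripts in data/interviews/.
      2. Use Survey Analyzer Lite (agents/survey-analyzer-lite.md) on
         data/churn-survey.csv.

      Run both agents in parallel; they don't depend on each other. Write
      each final analysis to its own markdown file in outputs/.
      ```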

    7. AG

      Sometimes it might ask you for, like, a permission or something. You usually just hit, like, two, I think, and then it'll just, like, keep going forward.

    8. CS

      Yeah. I have something on that I don't necessarily recommend to everyone. You can see at the bottom this red text, "Bypass permissions on." You can type in a command to remove that request, which means, unfortunately, if I were to go beyond this prompt, it could do anything it wants on my computer and destroy all of my files. So I don't recommend you do that every time, but for a demo like this it's really helpful, because otherwise it would ask me every five seconds whether I really want it to take the next action.

    9. AG

      And I think it's called YOLO mode, right?

    10. CS

      Yeah.

    11. AG

      You can just type in, like, /YOLO, and it'll do that.
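
      For reference, the switch behind that nickname is a launch flag, used with the same caution Caitlin gives:

      ```
      # Launches Claude Code with permission prompts disabled. It can then
      # read, write, and delete files without asking, so only use it for
      # low-stakes, sandboxed work.
      claude --dangerously-skip-permissions
      ```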

    12. CS

      So you can see it's hunting through my files, in my Dropbox, under projects, AI tests, live demos, Aakash, right? I have these files set up here, so you can see the paths, and it's reading everything it needs based on what I told it to do in those system prompts you didn't see. And it says it now has the paths for the interviews, and it's reading all of the files named here, so I can make sure it has read the right interview files. What's really cool about this, I mean, the reason I'm showing this in the first place, is that you just saw me do these analysis processes where, if I weren't talking so much in between, the survey analysis and the interview analysis, individually, for that amount of data and that number of prompts, took about 20 minutes each. That means you would spend 40 minutes total, which is still remarkably fast if you felt that was complete with a short synthesis prompt afterward. And usually it takes me a little longer than that, because I have an even more rigorous process with more steps and feedback loops built in, but that's still just really fast. This allows us to do both of those things in parallel. Because the interview analysis and the survey analysis don't build on each other, they don't need to be sequential. So instead of asking, "I have one hour to analyze all of this stuff. Will I have enough time?", well, yes, you have enough time if you do it like this, because you just let your agents, with a pretty solid process, do both things at once, and then you triangulate on your own at the end, or you put that into AI and have it triangulate for you. You're cutting the time in half by parallelizing this.

    13. AG

      And how did we parallelize it?

    14. CS

      So simply because I have agents set up that run the processes I went through manually in Claude in the web UI-

    15. AG

      Yeah

    16. CS

      ... I can call them both here. So in this prompt, where I say, "Number one, use Interview Analyzer Lite," that agent has all the prompts I just showed you in Claude, plus a few extra feedback loops. So it's basically saying, "Do that," meaning what I did in Claude's web UI, "and then also do the survey version, but do them at the same time."

    17. AG

      Nice. Yeah, sometimes I've done this where I would open it in Cursor and have multiple terminal windows, but I think this is actually an even cleaner way.

    18. CS

      It is really clean. You can just follow the whole thing here too. There will be sections of the process where it says, "Expand with Command O," or something like that. So you can really look at the whole process it's going through, just like you would in the web UI, and make sure it's following the instructions and doing things properly. And I've told it to output everything, like output the final analysis to markdown files as well, so I'll be able to see them in my files and review them there. I also kind of like seeing how many tokens it's using.

    19. AG

      Yeah, you always want to know how many tokens you're burning. While this is running, could we look into one of those agent markdown files maybe and start to teach people about what those look like?

    20. CS

      Yeah. Gotta stop sharing, though. So this should look familiar if anyone really stops and looks at the prompts I've used. This is the interview methodology markdown, and I've got the project context and the product context, everything that was already in those prompts before, right? I'm just feeding it all of that background information in this file, and then also the definitions of contradiction types, things like that. All of those pieces of information where I'm trying to teach it to analyze, identify, or label information the way that I would, that's built in here as a sort of context load it can retrieve from.
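
      A skeletal reconstruction of what such an agent methodology markdown could contain, based on what's described here (all section contents are illustrative):

      ```
      # Interview Analyzer Lite

      ## Project context
      Meditation app, premium-user retention study. Retrieve team, product,
      and customer background from context/project-context.md.

      ## Process
      1. Read every interview markdown file in full before analyzing.
      2. Per participant: pull out value anchors, friction points, and churn
         risk, each with verbatim quotes and their locations.
      3. Verification pass: confirm every quote exists in its source file and
         is attributed to the right participant.
      4. Audit pass: check for contradictions and overstated ratings.

      ## Definitions
      - Contradiction types, with an example and reasoning for each.
      - Intensity scale: 1 (neutral) through 5 (vocal detractor).

      ## Output
      Write the final analysis to a markdown file, executive summary first.
      ```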

    21. AG

      And does an agent markdown file need any special formatting or instructions?

    22. CS

      I mean, it depends on what you're trying to do. There are ways of getting a little trickier about it, but generally it's all grounded in being clear and specific, just the way you would be in prompting. And for markdown files specifically, it's about structure, right? The reason we use markdown files is to help the large language model retrieve information in structured, organized ways that make it easier for it to do the processes faster and more accurately.

    23. AG

      Hmm, okay. And Claude can help you create these agent files. If you're nervous about them, you can put in some of these prompts and say, "Hey, create me a nice Interview Analyzer Lite," and it'll work with you on it.

    24. CS

      Exactly. So let's see how Claude is doing over here. Still working. You can see it's running Interview Analyzer Lite here. So it says, "Both agents are still analyzing the data." Let me wait for them to complete.

    25. AG

      Generally, do you find yourself using Claude Code more or the UI more for this these days?

    26. CS

      It really depends on what I'm trying to do, but if I have everything based in my files, then I find Claude Code in the terminal to be a more efficient way of doing things. Like I said earlier, I don't consider myself a super technical person. I have a marketing, then design, then research background, so I'm not coming from coding. And once I got used to the whole interface not being particularly, you know, design-y, [chuckles] I started really appreciating being able to use slash commands to call things and start tasks with a couple of keystrokes. I think it's really interesting to automate these processes that I would normally do step by step manually, and just learn how to make it call the right information from different documents, or let it find the right information within my documents without me having to go, "Oh, where did I put that?" Or even, for repository cases where someone asks, "What do we already know about X?", well, if you have Claude Code, and it's actually synced to all of your company files, your project files, then it can just go figure that out for you. You need to know how to prompt really well. You need to be clear and explicit, and you need to know how to structure documents in a way that makes it easier for your large language model to call them. But if you figure that out, then a process like this goes really, really fast, and much more easily. So, the interview analysis is complete here. The survey analysis is complete, and it has a few quick findings, but what's more important is that it has output things to a file. So I'm going to stop the screen share here and go over to where I hope it saved it.

  11. 1:09:01 - 1:12:19

    Final Output & Results

    1. CS

      So here's our meditation app premium user retention analysis. There's an executive summary in here because of my instructions. So we have how many stable subscribers and moderate-risk subscribers there are, and so on, from the interview analysis, right? So this is my interview analysis results markdown file. Phase one, data confirmation; phase two, individual retention analysis. There's a little more in here, but it's basically doing the same thing. We have the value anchors analysis and the results here: this particular therapy chat feature directly addresses a gap that traditional therapy couldn't fill, and so on. We've got quotes and timestamps from this participant. So it's done exactly what it did when I was doing this step by step in the browser UI, per participant, pulling out everything. Then we've got P2, participant 02, and all of the value anchors from this person. It's the output you've already seen, just now in a markdown file. What I usually do from here is format this differently, put it in Pages or Google Docs or something, make it look a little nicer, and then pluck out things that stood out to me. It has, you know, that executive summary at the top, so most of the time that's pretty much ready. I can copy-paste it and share it with someone. But it has all of the details I would need if I want to review this on my own as well, like you saw in the browser, and even some tables.

    2. AG

      Epic. I mean, this is a sick workflow. So-

    3. CS

      And I have another... Yeah, sorry. Go ahead.

    4. AG

      Oh, go ahead.

    5. CS

      I have another file like this for the survey analysis too, just to be clear.

    6. AG

      Cool. So if I were to summarize for you guys what we covered today: we gave you a masterclass in how to do discovery with AI. Discovery is the most important skill for building great products, and we've covered examples of how to do that both for surveys and for interviews. And we've shown you how to do it with Claude step by step using multi-step prompting, or with Claude Code, building an agent to do it. Now you have all the tools. It's up to you to go apply this, see where you get stuck, drop a comment below on this video, or find Caitlin online on LinkedIn or somewhere else. Maybe take her course to learn more deeply, and really embrace this new way, because the best products are built with the best user understanding, and this is your roadmap. Caitlin, thank you so much for sharing all these valuable insights for free.

    7. CS

      It was my pleasure. Thanks for having me.

    8. AG

      See you later, everyone. I hope you enjoyed that episode. If you could take a moment to double-check that you have followed on Apple and Spotify Podcasts, subscribed on YouTube, left a rating or review on Apple or Spotify, and commented on YouTube, all these things will help the algorithm distribute the show to more and more people. As we distribute the show to more people, we can grow the show, improve the quality of the content and the production to get you better insights to stay ahead in your career. Finally, do check out my bundle at bundle.aakashg.com to get access to nine AI products for an entire year for free. This includes Dovetail, Maven, Linear, Reforge Build, Descript, and many other amazing tools that will help you as an AI product manager or builder succeed. I'll see you in the next episode.

Episode duration: 1:12:29
