Aakash Gupta

AI Product Metrics Interview – Execution Case Explained

We break down the complete AI product metrics framework: the 40-minute case walkthrough, the visual framework approach, and why output metrics matter more than you think.

Full Writeup: https://www.news.aakashg.com/p/ai-success-metrics-interview

----

Timestamps:
0:00 - Why AI Product Execution Interviews Matter
1:18 - The Underlord Case: Live Mock Interview Begins
4:29 - Why I Pulled Up The Product Live (Don't Skip This)
6:42 - The User Segmentation Push-Back
9:16 - Building The Visual Framework in Real-Time
12:52 - Value Enumeration: The 4 Core Values
17:24 - The Positive Metrics Bank
23:00 - North Star Selection: Why Exports Won
25:47 - Breaking Down The North Star (3 Vectors)
31:00 - Trade-offs & Guardrails Deep Dive
34:12 - The "Genie Metric" Curveball
35:01 - The Output Metrics Miss (My 8.5/10 Moment)
38:08 - The Power Move: Post-Interview Follow-Up Strategy

----

Key Takeaways:

1. Visual frameworks are non-negotiable - Draw your structure live. The interviewer needs to follow you for 40 minutes; without a visual anchor, even great ideas get lost. This separates you from every other candidate.

2. Always push back on user assumptions - Bart said "beginners." I challenged it: Underlord is on the homepage, so the metrics need to work for ALL users. This type of thinking turns 8/10 answers into 10/10 answers.

3. Build a metrics bank BEFORE choosing a North Star - Generate 15+ positive metrics first (time, volume, adoption, discovery, engagement, retention, output), then evaluate. Don't pick your favorite and work backwards.

4. Control for complexity in time metrics - Underlord might INCREASE time to edit because users do more. Create a table: 1 tool = 3 min, 2 tools = 4 min. If Underlord takes longer for the SAME complexity, that's a problem.

5. The output metrics mistake cost me 2 points - I forgot upgrades, renewals, and referrals; Bart had to prompt me with the "genie metric" question. Always include input metrics AND output metrics. Non-negotiable.

6. North Star selection needs explicit reasoning - Don't just pick; evaluate out loud. "Time to export doesn't work because... Number of tools is hard to operationalize because... Number of exports works because..." Show your math.

7. AI guardrails are different from traditional metrics - Hallucination rate. Support requests: no increase. Build evals that verify the AI actually did what it claimed. This is table stakes.

8. Break down your North Star by segments - New users vs. power editors. Free vs. paid. Short-form vs. long-form. And by equation: sessions × completion rate × exports per session. That makes it operationalizable.

9. The eval-driven approach for discovering failures - Ship to 10%. Collect traces. Review failure modes weekly. Create synthetic evals. Add new guardrails. This is the Hamel Husain / Shreya Shankar methodology.

10. The 30-minute follow-up wins jobs - Take your framework, find the gaps, fix them, mock up a dashboard, and email the interviewer. At Descript, you'll be the ONLY candidate who does this. Immediate differentiation.

----

👨‍💻 Where to find Dr. Bart Jaworski:
LinkedIn: https://www.linkedin.com/in/drbartpm/
Land PM Job: https://www.landpmjob.com

👨‍💻 Where to find Aakash:
Twitter: https://www.x.com/aakashg0
LinkedIn: https://www.linkedin.com/in/aagupta/
Newsletter: https://www.news.aakashg.com

#productmetrics #pminterview

----

🧠 About Product Growth: Aakash Gupta's newsletter with over 200K subscribers.

🔔 Subscribe and turn on notifications to get more videos like this.

Aakash Gupta (host) · Dr. Bart Jaworski (guest)
Jan 30, 2026 · 40m · Watch on YouTube ↗

EVERY SPOKEN WORD

  1. 0:00–1:18

    Why AI Product Execution Interviews Matter

    1. AG

      When it comes to PM case interviews, there are two major buckets: product sense and product design, and product execution and product success metrics. AI has its own flavor of both of these. AI product sense, AI product execution. Today, we are giving you the very first full mock interview on YouTube ever published for AI product execution and AI product success metrics. How do you measure the success of launching GPT 5.2? How would you look at a dashboard of metrics for Claude Code? These are really interesting questions, and these are the questions that companies like OpenAI, Anthropic, Meta, Google, all of the best AI companies are asking. So if you want to land a PM job at one of the top AI companies, if you want to earn that five, six, seven hundred thousand dollars salary... You know the average stock grant at OpenAI per year is one and a half million dollars. If you want to land an OpenAI PM job where you're making one point eight, one point nine million dollars, you have to nail the AI product success metrics interview. So today you are going to get an end-to-end case example of how to nail those with a ten out of ten response. And why are we delivering this value to you for free on YouTube?

  2. 1:18–4:29

    The Underlord Case: Live Mock Interview Begins

    1. AG

      Well, Dr. Bart and I have been running a cohort program where we take thirty PMs, highly qualified PMs, who have PM experience but want to become AI PMs, and we help them do that. We just are completing nine weeks of the twelve-week program of our first cohort, and already thirty percent of the cohort has jobs. People have landed jobs at OpenAI, Anthropic, and other companies like that. So we are going to share the types of questions that they were asked, how-- the coaching that we gave them so that they could answer correctly. And if you want to go through a similar cohort experience, be sure to check out landpmjob.com. That's where you can see all the details for this cohort. We have-- It is a premium cohort. It is not cheap. It is expensive. But if you are in India, if you are in Europe, we do adjust the pricing to your purchasing power. So be sure to check out cohort two. We already have sold out one-third of the cohort. There's just a few weeks left for you to secure your spot. So if you want to get a spot to get interview answers like this, be sure to check out landpmjob.com. And now into today's mock interview.

    2. BJ

      Hi, Aakash. Welcome to Descript. I'm Bart. I'll be your interviewer. How are you today?

    3. AG

      Good. Excited about this case interview today.

    4. BJ

      So listen, time is limited. I want to get the best out of you. So let's skip the formalities, the weather talk, and jump right into it. We here in Descript are launching a new feature called Underlord. So tell me, how would you measure its success?

    5. AG

      Okay, this is a fascinating question. Maybe to start, can we align together on what Underlord is? I'll just pull up the Descript website, and we can just take a look at the feature. Does that sound good?

    6. BJ

      Perfect.

    7. AG

      Awesome. So I'm gonna share my screen. So let's take a look and see what is the Underlord feature. So it looks like we have different options. It looks like it's an AI co-editor to help you do anything. It's a chat-based video editor. So you guys have made it really replace your overall AI editor, which is what I'm used to using. So maybe what we can do is we can jump into a quick project, and we can see how Underlord works. So Underlord, you click it on the bottom right. Okay, you can ask it for suggestions. It can give you creative ways you can enhance a composition. Very cool. So it's gonna be... It's like your editing partner, and it looks like it can even execute on some of these. Let's go ahead and see what it says. Multicam layouts, B-roll and screen recordings. These are all really good. So this is giving me a good sense of how this product feature works. It's like your buddy as your editor buddy. And then what I want to confirm is, like, what type of tool use it has access to. So do you have access to do any of these edits, or are you limited? I'm just gonna check what kind of tools this agent has access to. Okay, so you guys have built it so that it can take access of basically all tools. So I just wanted to confirm that in the first... I am the PM for Underlord, and it has access to all the tools in Descript. Is that right?

    8. BJ

      That's correct. We built it as a natural language

  3. 4:29–6:42

    Why I Pulled Up The Product Live (Don't Skip This)

    1. BJ

      alternative to old style editing, which can be frustrating, where you want-- you know what you want to achieve, but you don't know how. There's lots of menus and... Well, we people always communicated with language, so why not do essential thing like video editing in twenty-first century with your common words. And you are tasked to, well, assess whether we were successful in this mission or not.

    2. AG

      Awesome. So we're the PM of Underlord. We've confirmed that it has access to all of Descript tools. So you guys have created, you know, MCP or API hooks into all of the editing features. Um, can I take a minute to structure my thoughts and get a sense of where we should go from here?

    3. BJ

      Take your time.

    4. AG

      All right. All right. So I just created a quick framework. I'm not gonna bother to draw all these lines in here 'cause that's just gonna distract us, but you can assume that I've, like, drawn lines from all of these features. So we do the clarifications, which we talked about. We'll talk a little bit more about the product and what the product does, the users, the value that the product provides. Talk about positive metrics. So cr-create a bank of positive metrics. Identify a North Star metric. Break down that North Star. Talk about trade-offs and guardrails, and end with a summary. How does that sound?

    5. BJ

      Like a very structured answer. I'm happy and looking forward to hearing it.

    6. AG

      Awesome. Cool. So in terms of clarifications, I don't have too many clarifications. I guess I'm just curious, did you guys have a certain type of user in mind for this feature when you built it, or should I just make some assumptions and think about what that might be?

    7. BJ

      Think of people who are starting their editing journey, who want to have an amazing videos like top creators, but don't have the tools to do it yet, as in skills and knowledge are a little bit not tech savvy enough just to like find the right, mm, manual or a video, but still are very determined to achieve their goal.

    8. AG

      Okay, great. That's really helpful context. So let's see if we can connect those, uh-- Perfect. So now we talk about the product a little bit, and

  4. 6:42–9:16

    The User Segmentation Push-Back

    1. AG

      from the product angle, one of the things that we confirmed already is that it has access to all of Descript's tools. So I'm gonna go ahead and start to make some of this smaller here. Let's go ahead and make this smaller so we can start to look at stuff. So has access to all of Descript's tools. Basically, the user can chat with it and request anything. Um, it's also kind of like your editing buddy, right? So we saw that example that we demoed at the beginning where, um, the editing buddy suggested exactly how to improve the recording that I had. So I think those are the main features of the product. Is there anything else I should be aware of before I move into users?

    2. BJ

      No, carry on to users.

    3. AG

      Awesome. So now we think about the users. So you already told us a bit about the users, so actually what I'm gonna do is I wanna just think about these people a little bit further. People starting their editing journey, top creators, right? So, um, there's gonna be people with low editing fluency. They're not gonna understand, you know, how to use an editor, and so they're gonna be very reliant on Underlord. And then there's probably gonna be people with like medium editing fluency, probably nobody with high, 'cause you said they're starting their editing journey. But these people, they're-

    4. BJ

      Users, it's not limited to a particular user, but probably if you have the muscle memory, you don't need, uh, the Underlord to help you.

    5. AG

      Yeah. Well, I imagine there are some use cases for high editing fluency too, right? Because an AI agent might be able to perform an edit like 10 times or find 10 clips. So it might be able to do things like at scale for you.

    6. BJ

      Or save you time by removing all the mm, uh, or double edits or add a B-roll. Definitely there is value for any user using Underlord. Uh, and well, the question is how, how well should we, um, profile that to certain specific user, and where we will get the most success, and how do we understand the success?

    7. AG

      Awesome. Perfect. So in terms of like prioritizing user groups, because Underlord is on the homepage, I really feel like the success metrics we need to have need to accommodate any user. And I know that you mentioned that we should think about people starting their editing journey, but I was wondering if I could push back on that a little bit, because I imagine that the biggest power user of Descript is like an editor, somebody with high editing fluency. So I'm thinking like even though our success metric needs to care about

  5. 9:16–12:52

    Building The Visual Framework in Real-Time

    1. AG

      people starting their editing journey, we need to make sure it satisfies his high editing fluency. Does that make sense?

    2. BJ

      It does, yes.

    3. AG

      Okay, great. So in terms of users, we're not gonna prioritize any user is just the note I'm gonna put here. Um, and the reason being is because this is on our homepage, and this is like a persistent agent. So be- for those two reasons, we're gonna make sure that our success criteria, um, really satisfies everybody. So let's talk about the value that they get. We talked a little bit like they save time, they do it at scale, but I think the value, uh, enumerating it in even more detail is gonna help us identify the right positive metrics. So one of the things, right, is it basically reduces the time it takes to edit, right? Then it also, um-- So basically what we're doing here is like time to export or time to publish. We're kind of reducing that. It also hopefully allows you to do more edits, right? So we basically wanna see like more exports, more publishes. Um, as I'm thinking about value, I also wanna go ahead and just talk about some of the trade-offs and guardrails, right? So we need to be really careful, and there's this concept of AI evals, right? And so we'll talk about that a little bit more when we get here. But we need to be really careful that it's not, um, you know, it's not hallucinating that it did an edit. [chuckles] And we also need to make sure that it has, you know, like a high quality bar for those edits. Um, those are some of the things we'll need to make sure. And so as we think about value, that reminds me of that. These more exports and publish and time to export and publish, I feel like these are almost positive metrics, so I'll move those into the positive metrics. So as I think about value a little bit more, is there more value we need to think about? Let's see here. Probably another thing it does, like you said, for first-time editors, I imagine for new customers, one of the key values we're trying to drive from this feature is getting people to complete their first edit, right? 
Because I bet there's a huge amount of churn or drop-off from people who they record something, but then they never even complete a first edit, right? So first edit completion rate for new users on first, uh, vid file. That would be another one. So we're, we're kind of tying these metrics to the value, if you can see here, some of these positive metrics. Let me think a little bit more about this value. And you know what? Maybe like to just get us inspired, why don't we go back and just take a look at what we were looking at earlier, right, with this. So I'm gonna shift back into the product for a moment here. It can do things like add chapter markers. Okay, so that's almost like, um, helping you create like your YouTube description. Um, so if I switch back then into this tab, it's like, um-How would I even describe this? Um, gives you the info to write up the video and publish, you know? Things like chapter markers. And if I turn back to this tab and I look a little more, create scene boundaries to split the change the layouts, change the caption. So things like... Okay, so each one of these features, there's gonna be like a feature specific value. So what I mean by feature specific value is that for the captions, for instance, we're gonna need to make sure that the captions look good, that they're accurate, that they're in the language they want, right? Those are just some basic things, for example,
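The "first edit completion rate for new users" metric described above can be sketched as a small calculation. This is an illustrative sketch only; the event names and fields are assumptions, not Descript's actual analytics schema.

```python
# Hypothetical sketch: first-edit completion rate for new users.
# "recorded_file" and "completed_first_edit" are made-up event types.

def first_edit_completion_rate(events):
    """Share of new users who recorded a file AND completed a first edit."""
    recorded = {e["user"] for e in events if e["type"] == "recorded_file"}
    completed = {e["user"] for e in events if e["type"] == "completed_first_edit"}
    if not recorded:
        return 0.0
    return len(recorded & completed) / len(recorded)

events = [
    {"user": "u1", "type": "recorded_file"},
    {"user": "u1", "type": "completed_first_edit"},
    {"user": "u2", "type": "recorded_file"},  # recorded but dropped off
    {"user": "u3", "type": "recorded_file"},
    {"user": "u3", "type": "completed_first_edit"},
]
print(first_edit_completion_rate(events))  # 2 of 3 new users completed a first edit
```

The drop-off Aakash bets on would show up here as users who appear in the recorded set but never in the completed set.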

  6. 12:52–17:24

    Value Enumeration: The 4 Core Values

    1. AG

      for captions. [lip smack] So because there's so many things going on here, I'm gonna go ahead and take one second. I know it's a little bit distracting, but I'm gonna take one second to just draw these arrows out so it's a little bit easier in case we revisit this information later. So it's just gonna help us trace everything back. So there's captions as an example. Another example might be like we just saw chapter markers, right? I'm not gonna do every feature, but I'm just enumerating for one or two so that we get a good sense of what type of success we're looking at. So for chapter markers, we're gonna have to look at, um, they're s- they're smartly divided. I think for chapters, you also wanna make sure that, uh, they start at a good place, um, that they're not too short or not too long, right? So some of these features, these are the same metrics we would use for that specific tool. And so what I'm thinking after enumerating all these is let's not focus on the tool specific metrics. Let's assume that the PM of that tool is responsible for these type of metrics. Is that okay with you?

    2. BJ

      Sounds fair.

    3. AG

      Okay. So what I'm gonna do is I'm gonna click on all of these boxes, and I'm gonna go ahead and say these are like red, like... Or not red. Let's say like these are like yellow. Like these are not our focus. Yellow, not our focus. Some other PM. So I think we've enumerated the value enough. I think the major values, and let's go ahead and draw arrows to those just 'cause I wanna summarize them. Reduce the time it takes to edit, allows you to do more edits, gets people to complete their first edit, and gives you the info to write up. I think these are the four major values. Did I miss anything?

    4. BJ

      I don't think so. It sounds good.

    5. AG

      Awesome. So now we need to get the four-- We need to, across these four major values, let's go ahead and make sure that we have all the right metrics. So what I'm gonna do is I'm gonna give myself some more space here. There we go. And why don't we go ahead and start to connect some of this up just so that we look at it and it looks pretty. I know it takes a second, but for somehow for me visually, it helps me. Okay. So now we're into positive metrics. So we talked about reduce the time it takes to edit. Now I wanna make sure that we have sort of a holistic idea of time to export, publish. That's right. And then sort of the flip side of this, we need to think about the negative metrics or the trade-offs and guardrails allos- associated with it. So ti- um, hallucinating that it did an edit, that's one. Another one is just like increasing time it takes to edit, right? I think there was this hilarious study with GitHub Copilot where engineers were spending more time like editing GitHub Copilot, and then it made them slower than just writing code by hand, right? So we don't want that same thing here. So we wanna really make sure we reduce the time it takes to export, publish. I'm trying to think about what other metrics I may... Basically, what I wanna look at is, um, the AI makes a change, and then the user changes what the AI did, right? So it's like edit the AI. We wanna reduce editing of the AI, so that would actually be like a negative metric we can put down here. [lip smack] And I'm wondering like, I'm wondering if I have the greatest framework here because I just separated positive and negative. What I'm gonna do just to make it easier for us to tie everything back, so I'm gonna change this arrow just so we know what value we were talking about when we come back to this in a second. Editing of the AI. Okay. What other things in terms of positive metrics? Reduce the time it takes to edit, time to export, publish. Is there anything else around that? 
I guess it's just like makes you feel like a better editor, right? And so-

    6. BJ

      Perhaps something to do with the, uh, overall quality of the final edit, like unlocking things that you would not consider yourself.

    7. AG

      I love that. Thank you for bringing that up. Like, um, unlocking features you wouldn't use. And that was one of the core values we saw of the product, right? It suggested, "Here's 12 ways to improve that recording that we had." So unlocking new features the user didn't use. So essentially what we would look at in this type of a metric if we were to operationalize this is like they never used, let's say, Studio Sound, and now they do. So it's like, uh, number of AI tools used going up is essentially the metric we're looking at there. So engagement, retention, I think both of these are different, right?

  7. 17:24–23:00

    The Positive Metrics Bank

    1. AG

      We could operationalize these differently, right? On the engagement side, it would be something like they're opening-- they're spending more time per week. Well, is that right? We don't want them to spend more time because we just talked about reducing the time-

    2. BJ

      No, no, no

    3. AG

      ... per edit. It's like they're editing more videos in Descript.

    4. BJ

      We want to increase their efficiency really.

    5. AG

      Yeah. Editing more videos in Descript, and then retention, right? Like people are coming back, you know, day seven, day 30, these types of retention metrics. Okay. So let's do a quick time check here. How much time do we have? We are about 20 minutes in. Okay, I think we're doing fine on time. Let's keep going. In terms of this level of depth, because I just wanna make sure we choose a good North Star. So I wanna apply the same sort of rigor to all of these values. So if we think about allows you to do more edits. Yeah, more exports, publishes, we already talked about that. New number of AI tools, that's kind of similar, so I'll also draw a line there. So that, that makes that metric seem pretty attractive. Number of AI tools used. Okay, I'll keep that in mind. I won't spend too much time making these arrows look good. Okay. And then is there anything else around allows you to do more edits? More exports, publishes. Well, I guess, like, when you have a long form, it's, like, more short forms per long form, right? If I create a 90-minute video and I create 10 clips instead of five, that would probably be a win for Underlord.

    6. BJ

      And for the users.

    7. AG

      Exactly. And then getting people to complete their first edit. This seems like one of the most important, absolutely one of the most important, but at the same time, as we just talked about, our power user for Descript, since this is going on the homepage, is gonna be somebody who is a professional editor. Okay. And then gives you the info to write up the video plus publish. So basically, like, copy, paste, edit, copy, paste this info, plus, like, accept it with minimal edits. Those would be the two metrics that we would look at. Okay. So now I want to talk through what should be the North Star metric if you feel like we've covered all the positive metrics. Do you think so?

    8. BJ

      I believe so, yeah. Let's move on.

    9. AG

      Okay. Just for the sake of time, you know. In a real-life scenario, I might think about this a little bit more in detail, but given it's an interview, let's continue forward. So, um, what should be the North Star metric? So let me just talk about the pros and cons of each of these. How about that? So time to export publish, the pro is this could be an all-encompassing metric. The con is it doesn't really cover quality, and you never know, like, Underlord could lead to a higher time to export because people are using more AI tools. So that, I think, is leaning me then more towards something like number of AI tools used, and we might even operationalize that further into, like, number of AI tools used with no edits. You know? Like, they, they use Studio Sound and they accepted it. They didn't change it any further. They used the automatic layouts and they accepted it. They didn't go in and change more layouts. That's what I mean by this. So there would be... I'll give you something sim- Like, when we truly operationalize this, we'll have to do this for each tool, which does make it a little bit difficult. Um, but I think we could come up with a synthetic summary metric for it. Okay. So this could be a good one, just based on the pros and cons of this. Editing more videos in Descript. Now, that seems like something that really covers everything, right? And even if we did our AB test, let's say we rolled it out 90/10, 90 pe- 10% of people get Underlord, 90 don't. If we start to see the Underlord group editing more videos in Descript, let's say, like, we operationalize this as, like, number of exports in seven or 30 days, then we get a really good signal that they're, they're liking the Underlord experience overall. And to that point around liking the Underlord experience, I think one thing we should also think about, and we haven't put that here, so let's add that in here, is, like, uh, support, um, requests from users, right? 
Because theoretically, with Underlord, there should be less support requests. If we're creating more support requests, then that's a bad thing, right? So I almost think that's one of the, that's one of the guardrail metrics we also need to think about, is if we were to choose a North Star like time to export, number of AI tools, engagement, retention, any of these, we would want to see that they don't have supports increased. So right now, where I'm leaning is something like number of exports in seven to 30 days with a guardrail of support tickets. Um, allows you to do more edits. So more exports published. Yeah, that's similar to this number of exports. More short forms per long form. That's also captured in this number of exports. First edit completion rate. So we will actually see this in number of exports because this would be zero for people who have a first edit completion rate.
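The 90/10 rollout comparison described above can be sketched roughly as follows. All numbers are invented for illustration; this is not Descript's pipeline, just the shape of the comparison: exports per user in a fixed window, treatment (Underlord) vs. control.

```python
# Illustrative sketch of the proposed North Star comparison:
# exports per user in a 30-day window, by experiment group.

from statistics import mean

def exports_per_user(export_counts):
    """Mean exports per user over the measurement window."""
    return mean(export_counts) if export_counts else 0.0

control = [1, 0, 2, 1, 1]    # exports in 30 days, users WITHOUT Underlord
treatment = [2, 1, 3, 2, 2]  # exports in 30 days, users WITH Underlord

lift = exports_per_user(treatment) / exports_per_user(control) - 1
print(f"lift: {lift:.0%}")  # treatment exporting more => good signal for Underlord
```

In practice this would be paired with the support-ticket guardrail: the lift only counts if support requests per user do not increase.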

    10. BJ

      Mm-hmm.

    11. AG

      Copy, paste this info. This, I think, is just not as important. So what I'm gonna do is I'm gonna go ahead and give this, again, kind of like a yellow coloring, like it's not as important. It's important, but it's not a North Star. So based on everything we've talked about and all the pros and cons, I'm leaning towards this as a North Star, and support requests as one of the key guardrails. What do you think?

    12. BJ

      That's where the team is not already using it, but go ahead. It's, it's your North Star. I think we can proceed with it.

    13. AG

      Sorry, what did you say, though?

    14. BJ

      No, I mean, I, I think that's the actual one we are using, but again, it's, it's your North Star. I'm here to observe your process, and I like what I'm seeing. So go ahead.

    15. AG

      Oh, awesome. [laughs] I love to hear that that's what you guys are using too. So

  8. 23:00–25:47

    North Star Selection: Why Exports Won

    1. AG

      yeah, let's break this metric down. We've kind of discussed, like, that's our North Star. So just to summarize where we've gone, right? We figured out who our users are, that we're prioritizing for everybody. Given the value, we really care about number of exports in seven to 30 days. So if we were to break this down, there's, like, a couple different ways we want to break down our North Star, right? Number one is probably by that user type. We already talked about how important it is to look at this for new users. Let's go ahead and make this black again. So we care about how this looks for new users versus power editors. And then another way to break this down, so that's, like, one vector of breakdown. So let me, let me actually give that its own vector, right? So this is, like, user type. Another way to break this down is, like, type of export. So it's, like, short form versus long form. We talked about that. Another way to break this down is literally the equation. So let me just think about the equation. I usually like to think about that for just a second before choosing a North Star. So what's the equation gonna be? The equation is gonna be number of... And we also want to look at how much they're editing the AI. So accept it with minimal edits. This needs to be, like, one of those key guardrails. So let me go ahead and make both these red and thicken them up so that we remember them. Okay, so the equation, I'm just trying to think, what is the equation? Let's go back to the product and just look at the product together. So the equation is basically they hit this Export button, and then what happens after that, right? They choose it, they publish. Okay. So, and then, yep. Okay. So it's really number of publishes in that timeframe. That's the metric. Does that sound right?
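The equation breakdown mentioned in the takeaways (sessions × completion rate × exports per session) can be written out as a minimal sketch. The factor names and the example numbers are hypothetical.

```python
# Minimal sketch of the North Star equation:
# exports = sessions x session completion rate x exports per completed session

def north_star_exports(sessions, completion_rate, exports_per_completed_session):
    return sessions * completion_rate * exports_per_completed_session

# e.g. 1,000 editing sessions, 60% reach a shippable edit,
# each of those producing 1.5 exports on average:
print(north_star_exports(1_000, 0.60, 1.5))  # 900.0
```

Decomposing the metric this way makes it operationalizable: each factor maps to a lever the team can move independently.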

    2. BJ

      It, yep. Yeah, it does.

    3. AG

      Perfect. So we have our equation for our North Star. We have it broken down in three different ways, user type, type of export. I feel like we've broken it down sufficiently. Are we good to move into trade-offs and guardrails?

    4. BJ

      Go ahead.

    5. AG

      Awesome. And then for the summary, I wanna kind of talk about the whole document and what the executive write-up would look like to, uh, Andrew Mason and Laura Burkhauser, our CEO and co-founder. So let's first talk about these trade-offs, guardrails. Okay. So the journey we've taken so far is that we looked at high editing, medium editing, low editing, so we need to think about all three of those. Okay. So for trade-offs, guardrails, we need to make sure it's not increasing the time it's edit, uh, taking to edit, and it, they're, they have minimal edits. So, okay. This is another important red one. So for each of these red ones, and then there's hallucination that it didn't edit. We need to make sure it didn't do that. So I think let's operationalize all of these. So I feel like it's like a less than five percent hallucination rate. No, probably less than one percent hallucination rate. Then, um, we want to see less than

  9. 25:47–31:00

    Breaking Down The North Star (3 Vectors)

    1. AG

      ten percent increase in time to edit. Although, in reality, I would have to look at the data and see, like, if people are using a bunch more tools, maybe this isn't a strong guardrail. Um, assuming no additional tool use per edit, or, uh, broke down, broken down by number of tools used per edit. So I'm thinking of a table. Essentially, you could think of a table where we would say, like, "Okay, one tool use, our A group took three minutes. Our B group took two mins forty." That's fine. Two tool use, our A group took four minutes, our B group took five minutes. That would be a problem, right? Because now they're using the same number of tools, but it took longer. So this would be negative, and then this would be positive, is kind of what I'm describing. Does that make sense?
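The complexity-controlled table Aakash describes can be sketched directly: average time-to-edit broken down by number of tools used, so Underlord (group B) is only flagged if it is slower at the SAME complexity. The data is made up to match the spoken example.

```python
# Sketch of the complexity-controlled time-to-edit comparison.
# Keyed by tools used; values are (group_A_minutes, group_B_minutes).

avg_minutes = {
    1: (3.0, 2.67),  # 1 tool: B faster at the same complexity -> fine
    2: (4.0, 5.0),   # 2 tools: B slower at the same complexity -> a problem
}

for tools, (a, b) in avg_minutes.items():
    verdict = "positive" if b <= a else "negative"
    print(f"{tools} tool(s): A={a:.2f}m, B={b:.2f}m -> {verdict}")
```

Without the breakdown by tool count, the raw "time to edit" average would conflate "Underlord is slow" with "Underlord users attempt more."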

    2. BJ

      Perfect.

    3. AG

      Awesome. So increasing time it takes to edit, we've operationalized that. Hallucination, we agree on that. Let's go ahead and give that a-

    4. BJ

      Sorry. I think you're narrating the Miro board while I'm still seeing, um, uh, the Descript. Yeah. Okay.

    5. AG

      Thank you so much. Increased time it takes to edit. What I had done is just drawn out this A and B case. So ten percent, less than ten percent increase in time to edit, but I kind of mentioned that ten percent's arbitrary. We'd have to really look at the data, but this is what I mean by broken down by number of tools. One, two tools. You can see us kind of imagining three, four. And then hallucinating an edit it didn't actually make. So basically, we're gonna create an eval around, did it actually do what it said it did? And we would want less than one percent hallucination rate. Then support requests per user. So basically, what we would want is that, uh, they don't go up, right?

    6. BJ

      Mm-hmm.

    7. AG

      Uh, less than zero percent increase, or we'll just... That's kind of confusing language. No increase. Um, and then accepted with minimal edits. So essentially, what we're saying here is that, um, it's hard to operationalize this one, but it's kind of like I was describing earlier with some of these tools... You're gonna have to do it on a feature level. Studio sound with no edits. Auto layout with no edits. And as we kind of talked about this, we're gonna assume that the AI tool PM itself does it. So really, what we're talking about is coordination-level, um, edits. So how do we operationalize coordination-level edits? I think a couple things. This is probably people rage interacting with the chat, right? So saying, "I already told you that," or, "You're dumb," or, "You're stupid," or something, right? We could operationalize that as various statements expressing rage. So what we would want is, like, these rage interactions with the chat. It's hard to... Again, I'd have to look at the data, but I'm gonna go ahead and say something like less than twenty percent of chats are rage interactions with the chat. We'd have to look at the data, associate the data with retention, associate the data with engagement. So we don't know exactly, but that's the high level of what I mean by this guardrail. Does that make sense?
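
The rage-interaction guardrail AG describes could be sketched as a simple keyword classifier. This is only an illustration: the phrase list, helper names, and twenty percent threshold mirror the conversation, and a production eval would use a trained classifier over real chat traces:

```python
# Sketch: flag chats containing frustrated user phrases, then check that
# fewer than 20% of chats are "rage interactions". Phrases are examples
# from the conversation; a real system would learn these from data.
RAGE_PHRASES = ["i already told you", "you're dumb", "you're stupid"]

def is_rage_chat(messages):
    """True if any user message in the chat contains a rage phrase."""
    return any(p in m.lower() for m in messages for p in RAGE_PHRASES)

def rage_rate(chats):
    """Share of chats with at least one rage interaction."""
    return sum(is_rage_chat(c) for c in chats) / len(chats)

# Hypothetical chats: each is a list of user messages
chats = [
    ["add captions to clip 2"],
    ["I already told you that, remove the filler words"],
    ["studio sound on track 1", "great, export it"],
    ["you're stupid, that is not what I asked"],
    ["auto layout please"],
]

GUARDRAIL = 0.20  # fewer than 20% of chats should be rage interactions
print(f"rage rate: {rage_rate(chats):.0%}")  # 2 of 5 chats are flagged
```

As AG notes, the threshold itself would have to be calibrated against retention and engagement data before it could be trusted as a guardrail.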

    8. BJ

      Yes. Yes, of course. Just wondering if that wouldn't concern users that we are basically listening in on their conversations. But at the same time, we have to police our own responses, so we don't do anything out of the ordinary by accident. So I guess-

    9. AG

      We could definitely create a setting for users, um, where they check or uncheck if their conversations are used, um, for training data. And if people uncheck, then we can give them the option to report a particular conversation. Does that sound good?

    10. BJ

      It does. Thank you.

    11. AG

      Okay. So we can definitely handle that curveball here. All right. I would potentially, like, think more about trade-offs, but just looking at time, since we're about thirty-five minutes in, I feel like we got a pretty holistic sense of the trade-offs. Do you agree?

    12. BJ

      Yes.

    13. AG

      Awesome. So now I think we're ready for the summary. So if I were the PM for this feature, I would create a dashboard for myself, looking at things like number of exports as my North Star, and I would communicate that to everyone: "Hey, number of exports is my North Star." But I would also add into my dashboard many of these other metrics that we looked at. So I would want all three of these metrics in my dashboard.

    14. BJ

      Mm-hmm.

    15. AG

      These are, like, my positive metrics. I would also want first edit completion rate, 'cause that's really important. So this might be, like, my high-level positive metric dashboard, and then I'd create this negative metric. So it's kind of like these eight metrics would kind of... Or these nine metrics, I guess, would be my major dashboard that I'm looking at in order to assess the success of Underlord. On the positive side, I'm looking at number of exports, that they're editing more videos, their day seven, day thirty, day sixty retention, their number of AI tools used, and their edit completion rate. And on the negative metric side, I'm looking at increasing the time it takes to edit, hallucinating that they did an edit, support requests per user, and accepting it with minimal edits. And the final thing I would say is I kind of am in the Hamel Husain, Shreya Shankar camp of AI evals, which is you have to look at production and synthetic data to do traces to classify more negative scenarios and create evals around them. So I'm just gonna go ahead and say, like, eval-driven use, neg metrics. Does that make sense?
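
The nine-metric dashboard AG summarizes, with number of exports as the North Star, can be written down as a small config sketch. The metric names and thresholds come from the conversation; the structure and the `violations` helper are illustrative assumptions:

```python
# Sketch of the Underlord success dashboard: positive metrics plus
# negative/guardrail metrics, each guardrail with its discussed threshold.
NORTH_STAR = "number_of_exports"

POSITIVE_METRICS = [
    "number_of_exports",
    "videos_edited_per_user",
    "d7_d30_d60_retention",
    "ai_tools_used_per_edit",
    "first_edit_completion_rate",
]

# guardrail metric -> maximum acceptable value in the A/B comparison
NEGATIVE_METRICS = {
    "time_to_edit_increase": 0.10,     # <10% increase at the same tool count
    "hallucinated_edit_rate": 0.01,    # <1% "said it edited but didn't"
    "support_requests_increase": 0.00, # no increase per user
    "rage_interaction_rate": 0.20,     # <20% of chats
}

def violations(observed):
    """Return the guardrails whose observed value exceeds its threshold."""
    return [m for m, limit in NEGATIVE_METRICS.items()
            if observed.get(m, 0.0) > limit]

print(violations({"hallucinated_edit_rate": 0.02, "rage_interaction_rate": 0.1}))
```

The eval-driven point at the end of the turn fits here too: each negative metric would be backed by an eval built from classified production and synthetic traces.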

  10. 31:00–34:12

    Trade-offs & Guardrails Deep Dive

    1. BJ

      Yes.

    2. AG

      Awesome. So this is the dashboard I would look at, um, if I were the PM of the Descript Underlord feature.

    3. BJ

      Looks awesome, though I want to, uh, throw you a little curveball here, because your process was awesome. You've given me everything a PM would need to see to evaluate the success of the feature, and I'd like you now to put this process aside, put aside all the technical constraints and feasibility, and give me an alternative metric, like a genie standard, that, if you could use it, you would use to evaluate success from more of a user perspective and from a business perspective. For example, I used to work in Skype, and my dream metric was to be able to count how often people say, "Oh, you're on-- Oh, you're on mute?" Just to get this bad experience out of the way. So we would do automated tools to, like, mute and unmute you automatically, or unmute you when we detect that you want to speak. Of course, it could never happen. Privacy, listening in to users, no can do. But that was a dream success metric. What would be your dream success metric?

    4. AG

      Yeah, I'm just brainstorming based on what you were saying. So, like, people basically saying, "That was awesome," or, "That was great," I think is kind of the analog of what you just described with Skype. In-app surveys all being five out of five, no support tickets or questions. Um, really, really good social chatter. You know, like people posting on X and YouTube and showing up on Reddit in an editing forum. Um, just connecting it back to the business goals of Descript, seeing more people sign up for Descript, right? Seeing more people upgrade to higher plans. You know, actually, now that we say that, I kind of missed that whole bucket of metrics. So let me actually go back there and just summarize those for a second. I probably should have included in these positive metrics some output metrics. And in the A/B test, we probably wanna look at these too, right? So some output metrics might be things like upgrading plan, right? Renewing, um, referring more people. So all the things that tie back to revenue, I think those actually should go into my dashboard, so I'm really glad you asked that question. And going back to the genie metric. So I guess the ultimate genie metric is, like, a combination of "that was awesome" chats, plus more revenue per user, plus higher K factor, more referrals. Um, it would be like a synthetic metric putting those three together. I feel like that would be my genie metric. Does that sound right?
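
The composite "genie metric" AG lands on, awesome-chat rate plus revenue per user plus K factor, could be rolled up into one synthetic score. The weights, normalization cap, and sample numbers below are entirely hypothetical; the sketch only shows how the three signals combine:

```python
# Sketch of a synthetic "genie metric" blending the three signals AG names.
def genie_metric(awesome_chat_rate, revenue_per_user, k_factor,
                 weights=(0.4, 0.4, 0.2), revenue_cap=50.0):
    # Normalize revenue per user to 0-1 against an assumed cap; the other
    # two inputs are already rates (a K factor above 1 means viral growth).
    signals = (
        awesome_chat_rate,
        min(revenue_per_user / revenue_cap, 1.0),
        min(k_factor, 1.0),
    )
    return sum(w * s for w, s in zip(weights, signals))

# Hypothetical before/after-launch readings of the three signals
before = genie_metric(awesome_chat_rate=0.05, revenue_per_user=20.0, k_factor=0.3)
after = genie_metric(awesome_chat_rate=0.12, revenue_per_user=28.0, k_factor=0.5)
print(f"before={before:.3f} after={after:.3f}")
```

A real version would need the weights justified against retention and revenue data, but the point of a genie metric is the combination, not the calibration.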

    5. BJ

      Oh, that's, like, very, um, personal, so thank you for sharing. I was just wondering how you'd react, in an otherwise supportive interview-

    6. AG

      Awesome.

    7. BJ

      To that level of, like, doubt and questioning. And I love that you, uh, immediately reapproached your previous answers to, like, get the most out of them, given that there was a hint of, "Maybe I want to hear something more, but I don't wanna tell

  11. 34:12–35:01

    The "Genie Metric" Curveball

    1. BJ

      you, like, directly."

    2. AG

      Yep. Awesome. All right, guys, that's the end of the mock interview. So I personally would give myself an eight and a half out of ten, because Bart needed to remind me about those output metrics. So one of the things you wanna do, and maybe you should even build it into your framework, is always remember: I had an input metric here. Always keep this in mind. You can even, like, create little sticky notes. I like to do this sometimes before an interview: always remember input plus output metrics. And you might even wanna think about, like, some sort of leading versus lagging. I didn't end up talking about that. Um, these are really good. So I did talk about North Star and guardrail, which is, like, one good way to break down metrics, but also consider input, output, leading, lagging. Luckily, I had a nice interviewer in Bart, and as you guys saw, I went back and fixed my answer. And so that's the thing. When

  12. 35:01–38:08

    The Output Metrics Miss (My 8.5/10 Moment)

    1. AG

      somebody gives you a hint, don't just assume, "Okay, my other answer was good." Go back and fix your other answer. And any interview that anybody gives, even if I were the VP of Product at Descript, if I were Laura Burkhauser at Descript, I could improve. So always take your answers and drop them into Claude and ChatGPT. Create a Claude skill trained on my articles around this so that you can get feedback and improve further. But this is the high level for a good success metrics interview. Are there any other things you would say I did particularly well that people should make sure to do, Bart?

    2. BJ

      As always, the visual structure of the answer is so helpful, especially for the interviewer. This is a long answer that took us, what, forty minutes? Exactly. And a human being might get confused, miss a detail, or not keep up with all the narration. And here, even though I don't see the whole screen all the time, I do have more stimuli than only speech to get me to focus, to remember, to understand what is being presented to me. And if you are to forget everything in this video and remember one piece of functional advice, remember to do all you can to get your answers understood: allow the interviewer to follow them easily and get the best out of what you're saying. And I loved-- As I said, I loved how Aakash picked up on my hint that maybe there's something more, without me telling him, because I don't know about you, Aakash, but I had tons of interviewers that were smiling, that were friendly, that were patting me on the back all the time, just to fail me a few days later during the feedback session, which always came as a shock. I mean, this person who told me I did everything right actually thought that I failed. That's one thing that you need to remember: they are not in the business of being your cheerleader. They are in the business of finding the best PM. And they don't usually want to disturb you, but they are also not there as an examiner that will push you to the right answer. They might do that, don't get me wrong. But somebody may be rude and come at you with something like, "Well, why are you asking me? It's your answer." Like, I've had those too. So while this is an interview and there's a dialogue, remember that you own your answer, and you have the full time to deliver your best, to make sure you said all the right things.
And especially with such visual anchor, be ready to go back to a story point where you missed some important bullet points and add them. Without this visual context, it would be very, very hard to reapproach certain aspects of the answer. But that makes it so easy.

    3. AG

      The extra power move on top of this

  13. 38:08–40:35

    The Power Move: Post-Interview Follow-Up Strategy

    1. AG

      is once you create your visual anchor, after the interview, drop it into ChatGPT or Claude, either a custom GPT trained on my articles or a Claude skill trained on my articles, and say, "What could I have done better?" Add those into your visual anchor. Then send an email to your interviewer half an hour later. Say, "Hey, I really loved our case discussion. It was such a fascinating topic that I had to think about it more, and here are a couple other things I thought about." [laughs] So in this case, we just thought about input, output, leading, lagging, right? That's something we needed to fix. So you could say, "Hey, I thought about the input, output metrics, the leading, lagging indicators, and here's what I was thinking, and I just created a little mock-up of those metrics." Maybe you even go further if you want, and you say, "Oh, I also AI prototyped a version of the dashboard just so we could see what the A/B test final dashboard would look like." If you go that extra mile at a company like Descript, a $500 million company, you're gonna be the only person who does that. Nobody else is gonna do that. And so you're immediately gonna stand out. They're gonna say, "This is somebody who really loves Descript, who is going above and beyond." So that can also really help you. That won't help you as much at the Metas and Googles of the world, because they're instructed not to look at that type of information. But pretty much everywhere else, it will help you. So if you guys want coaching like this, the type of practice with Bart and me, the type of coaching, feedback on your interview transcripts that's not just from an AI, but from people-

    2. BJ

      [laughs]

    3. AG

      ... then be sure to check out our landpmjob.com cohorts. The next cohort starts at the beginning of February, just a few weeks from now. We are gonna be helping 30 people land their dream jobs. Are you gonna be one of those 30 people? Let us know, and we'll see you in the next video.

    4. BJ

      See you there. Thank you. Bye-bye.

    5. AG

      I hope you enjoyed that episode. If you could take a moment to double-check that you have followed on Apple and Spotify Podcasts, subscribed on YouTube, left a rating or review on Apple or Spotify, and commented on YouTube, all these things will help the algorithm distribute the show to more and more people. As we distribute the show to more people, we can grow the show, improve the quality of the content and the production to get you better insights to stay ahead in your career. Finally, do check out my bundle at bundle.aakashg.com to get access to nine AI products for an entire year for free. This includes Dovetail, Mobbin, Linear, Reforge Build, Descript, and many other amazing tools that will help you as an AI product manager or builder succeed. I'll see you in the next episode.

Episode duration: 40:44

Transcript of episode yN8bm9Ul_ls
