EVERY SPOKEN WORD
115 min read · 23,313 words- 0:00 – 2:20
Why AI PM skills are becoming career-critical
- AKAman Khan
Companies are laying off entire teams, entire orgs, and PMs are sort of grouped up into that. I almost never see it where an AI PM team is gonna be laid off to some degree.
- AGAakash Gupta
Can anyone become an AI PM?
- AKAman Khan
I think we're all kind of feeling it, right? Like, as product managers, the expectations on us, we kind of know our role is changing.
- AGAakash Gupta
What is the right way to teach this material? What is the right sectioning of this material? And we've come up with five steps for you guys. So we're gonna go through AI prototyping, which is kind of the heart and soul of it all. We'll go into observability on top of our prototype, evals on our prototype, the difference between RAG, fine-tuning, and prompt engineering, and then we'll end with working with AI engineers, working with researchers.
- AKAman Khan
So let's hop into AI prototyping.
- AGAakash Gupta
So for AI PMs, you'd really recommend they learn Cursor over the other tools?
- AKAman Khan
I would recommend getting familiar with it, definitely. Yeah.
- AGAakash Gupta
When it comes to people creating AI PM content, Aman Khan is amongst the most insightful and informed, and that's because he's been an AI PM since 2019. He worked at Cruise on self-driving cars. He's worked with Spotify on their AI systems, and now he works at Arize, one of the leading observability and evals companies. So if we go back then and we compare those three terms, that fine-tuning, prompt engineering, RAG, how do those all compare?
- AKAman Khan
I think it's helpful to have just, like, a really quick diagram here of, like, what is each thing. It kind of depends on what your goal is. So if your goal is to adjust the tone or the instructions, I think prompt engineering is really helpful for that. With RAG, you can provide context over a lot of data. Fine-tuning is, think of this as adjusting the model layer a little bit, so it's actually taking the LLM and making it more specialized.
- AGAakash Gupta
Working with AI engineers and researchers, working on these longer development timelines, how can AI PMs master that?
- AKAman Khan
Yeah. So I think this is where-
- AGAakash Gupta
Really quickly, I think a crazy stat is that more than 50% of you listening are not subscribed. If you can subscribe on YouTube, follow on Apple or Spotify podcasts, my commitment to you is that we'll continue to make this content better and better. And now on to today's episode. Welcome to the podcast, Aman.
- AKAman Khan
Thanks so much for having me, Aakash. It's great to be here. I'm, I've been waiting for this one for a long time. I'm so excited to speak to you, so...
- 2:20 – 4:22
Can anyone become an AI PM? Defining “AI-powered PM” vs “AI product PM”
- AGAakash Gupta
Yeah, I think that there's no better person to really give us a crash course in all of the key AI PM skills as they stand here in June 2025. But before we even get there, I need to know, can anyone become an AI PM?
- AKAman Khan
Yeah, I mean, I think the, the whole narrative here of, like, you know, I think we're all kind of feeling it, right? Like, as product managers, the expectations on us, we, we kind of know our role is changing. Our stakeholders are expecting more from us. Our customers are expecting more from us. And I think we're already feeling that role of AI in our day-to-day life more and more. I mean, that's the reason why that, that narrative is really sticking. It's that, you know, can any PM become an AI PM? And I really think to just define what an AI PM is, it's really some flavor of either adopting AI in your day-to-day workflow, think of this as, like, an AI-powered PM, or building AI into your product, which is, you can think of that as, like, an AI product PM. And I really don't think that, you know, being an AI PM is not an either/or. I really view it more as an X, meaning, like, you can think of yourself as a fintech X AI PM or a healthcare X AI PM. And the reason I say that is because AI is really powering your workflows as a product manager rather than taking the job you have away. You really want to be able to take that core insight and knowledge and specific industry, uh, sort of knowledge that you have and apply that towards the field using AI at, you know, sort of to power those workflows. So that's really how I view it. I think, I think every PM will become some flavor of AI PM, either using those tools or building around them if you aren't already. And I wouldn't view it as mutually exclusive with the type of product managing you, you might already be doing. So that's kinda how I view it. Think of it as, like, more of an accelerator on top of the workflows you already have.
- 4:22 – 5:52
Real-world AI use cases across “edge case” PM roles
- AGAakash Gupta
Agreed. And I think that people often come up with the edge cases like, "Hey, I'm an internal tools PM," or, "I work in this really regulated industry." But in the last few weeks and months, I've been talking to exactly those types of PMs implementing AI. I talked to an experimentation PM who is dealing with the problem that everybody else has a slight variation on their PRD template by getting an LLM to convert that into a clear output of what the hypothesis is, what the North Star metric is, what the golden rule metrics is. So, genius use case to standardize input into his experimentation system. I've been talking to people over in the financial industries. They're working on new credit models based on AI. So it seems like whatever exception you draw up, there's gonna be a counterpoint to that exception, and just about every PM needs to learn how to build AI features.
- AKAman Khan
I think that's, that's totally true. Like, uh, and I, I think there's definitely a feeling where there's maybe some am- some amount of, like, hesitation or, you know, un- unsure of, like, wanting to brand or label yourself as like, "Oh, I'm an AI PM." Like, kind of worried you might be jumping on some sort of, like, hype train. But I really urge folks to think about what the market for product management looks like and what the roles and skill sets will require in the future. And that's really why I think that, you know, the, the sooner that you kind of think of yourself as an AI PM building in fintech, building in healthcare, the faster you'll kind of adopt those tools, the faster you'll become a leader in, in your own space in that, uh, using, using AI as well.
- 5:52 – 6:31
The 5-step AI PM crash course roadmap
- AGAakash Gupta
So enough talking. Let's get into the five skills. You and I have been going back and forth on what is the right way to teach this material, what is the right sectioning of this material, and we've come up with five steps for you guys. So we're gonna go through AI prototyping, which is kind of the heart and soul of it all. We'll go into observability on top of our prototype, evals on our prototype, the difference between RAG-fine-tuning and prompt engineering, and then we'll end with working with AI engineers, working with researchers. All right, so now we're gonna get into these skills, starting with AI prototyping. So maybe even consider opening up your browser alongside Aman as we walk you through these key skills.
- 6:31 – 8:47
Why Cursor for prototyping (vs Bolt/Lovable/Replit/v0)
- AKAman Khan
Okay. So let's hop into AI prototyping. Um, so what we've got here, if you haven't seen this tool before, this is Cursor. Cursor is basically a fork of VS Code, which is a really common tool used by developers for actually... You know, has been u- used for many years to kind of write and iterate on code in an IDE, which is an interactive developer environment. We're gonna hop into using Cursor as our prototyping tool, just because the amount of improvements that have been made to it in sort of the recent weeks and months have made it really my go-to tool for prototyping, even relative to some of the others right now. Um, y- you know, there's, there-- And just to maybe linger on that point for a moment, there's a lot of tools out there like Lovable, Bolt, Replit, Vercel, v0, and I think they all have their place when it comes to prototyping. For instance, you know, Vercel is really strong at front end. Lovable and Bolt is really easy to deploy and get started with. Uh, Replit is really powerful for Python-based applications and having an agent built in. But the reason I really like Cursor is just because of the amount of control and flexibility it gives me to be able to iterate on specific components. Um, I completely admit, like it, there's a little bit of a learning curve to get started with using Cursor, but I promise you, if you, you know, spend a little bit of time on being able to just be able to kind of feel comfortable with the interface, you're gonna get a lot more out of the tool just because of the, the sort of the features and components it has built into it from a usability perspective. Um, maybe just to-
- AGAakash Gupta
So for AIPMs-
- AKAman Khan
Yeah
- AGAakash Gupta
... you'd really recommend they learn Cursor over the other tools?
- AKAman Khan
I would recommend getting familiar with it, definitely. Yeah. I, I think that the other tools are gonna keep improving, and they're really helpful for building a really quick and dirty mock, uh, you know, you know, to build like just a quick UI. But if you really wanna get a little bit deeper than that and understand, "How do I implement, let's say, an agent next?" Or, uh, "Can I have more control over the system?" You're going to need a tool like Cursor, um, definitely.
- AGAakash Gupta
Okay, cool.
- AKAman Khan
Yeah.
- AGAakash Gupta
Yeah, maybe we can prototype like a agentic system since that's what's hot.
- 8:47 – 25:22
Live build: Trip Planner agent in Cursor using LangGraph + UI
- AKAman Khan
Yeah, absolutely. I think that's a great idea. So let me go ahead and actually start, uh, here from scratch. So when you first load up Cursor, you're gonna get, um, you know, the screen where you can either set up a repo or set up a directory. It doesn't really matter what you get started with here. I have a starting point of a, of a workspace, but the two t- this t- the two commands that you wanna kinda hit right off the bat on your laptop are Command+T, which pulls up your terminal. And don't worry, you can actually just type in natural language instructions here to get started with, uh, terminal commands as well. So that's actually running the code on your computer. And then Command+K, which is, uh, really how you spin up the, the, the agent, um, which you're, which you're gonna be using for, uh... [keyboard clacking] Oh, looks like that command's... Oh, sorry. Not Command+K. Uh, what you're gonna wanna do is actually hit Command+L to pull up the agent, and the agent is this new kind of s- somewhat new feature in Cursor that allows you to, uh, go ahead and, and actually it will write the code for you and actually run the code for you too. What I've been using recently is Claude 4 Sonnet. Claude 4 is just, I think, a massive improvement on top of previous models here when it comes to understanding commands and writing code. So really, I just go ahead and start typing in what I want this agent to do, and it's able to kind of get started from there. Um, let's take an example. So when we were talking about agent-based systems, I kind of pulled this up. This is in our, uh, repo in Arize. It's a fully open source repo. It's a workflow for actually using CrewAI, which is a very kind of popular framework these days for setting up agent-based systems. Um, you can either use CrewAI, there's a ton of others out there. Uh, it doesn't really matter, but all I'm, all I wanted to do is pull up some example context that I can use and plug in so that I kinda know what the output looks like. So this is a, a notebook that just creates a CrewAI agent, really just starts spinning up a workflow for research and deciding, you know, being able to do some market research. It doesn't matter, but it's mo- more so just for grounding the, the agent in the first place. And what I'm gonna do is actually rebuild and re-architect this system entirely on the fly just using this code example. So the instructions I'm gonna give are, "Build me a Trip Planner Agent using... Instead of CrewAI, use LangGraph," which is just another framework, really just to show, like, it doesn't matter which agent framework you use, you can be really flexible here. "The Trip Planner should have a front end I can use as an application." So what I'm doing here is, is basically defining I want this agent to have a UI that I can actually click and interact with further. So we've got kinda two components in here, uh, which is, "Build me a Trip Planner Agent. Here's the framework to use." And then if you need to, you know, y- what you used to be able to do, uh, in the sort of before was actually use @web, and, um, and @web allows the agent to go and sort of search the internet as well and take action, actually search, look at documents, and take that information and apply it in the code. So let's go ahead and-
- AGAakash Gupta
Interesting
- AKAman Khan
... and hit Enter here and see the agent sort of go off on its own and, and see what it generates.
- AGAakash Gupta
And it's not a very complex prompt, really
- AKAman Khan
No, and that, that's sort of the beauty of it. Like I, I know that there's a ton of upfront work you can do to make that initial shot, uh, sort of what we kinda call in, in prompting is like the first shot or zero shot better. But just know that the workflow I really promote i-in terms of people getting comfortable with these tools is just being able to iterate, and so knowing how to ask the right questions from the agent to give it what you want. So let's take a look here. So it says, "I'll help you build a trip planner agent using LangGraph instead of CrewAI with a front-end application." And so it'll actually go in and read the tutorial and understand what the components are. Great, and it says, "Okay, I'm gonna go ahead and take a look at this implementation and create the LangGraph trip planner." So it's gonna go ahead and create a new directory for me. So, so it didn't even matter what my starting point was because the agent can actually create folders on your machine, it can create directories, it can pull in the right data, and even import packages. So it's actually going off and doing this live. It's creating the requirements for the agent in the first place.
- AGAakash Gupta
Today's episode is brought to you by Miro. Let me ask you something. How many tools are you juggling just to get a single project across the finish line? One for brainstorming, another for planning, something else for tracking tickets. That's where Miro comes in. It becomes an all-in-one collaboration workspace. Whether you're consolidating user research from several interviews, developing and synthesizing product briefs or a wireframe, or project managing development, Miro brings everyone into the same space. It's fast, intuitive, and fully loaded with features like project templates, two-way Jira sync, and integration with software like Draw.io and PlantUML. Miro's AI features can be used to synthesize elements in a board to develop a ready-to-review product requirements document in seconds. If you're tired of tab overload and scattered workflows, try Miro. Head to miro.com and see why over 90 million users choose Miro to guide from idea to outcome. Today's episode is brought to you by Jira Product Discovery. If you're like most product managers, you're probably in Jira tracking tickets and managing the backlog. But what about everything that happens before delivery? Jira Product Discovery helps you move your discovery, prioritization, and even roadmapping work out of spreadsheets and into a purpose-built tool designed for product teams. Capture insights, prioritize what matters, and create roadmaps you can easily tailor for any audience. And because it's built to work with Jira, everything stays connected from idea to delivery. Used by product teams at Canva, Deliveroo, and even The Economist, check out why and try it for free today at atlassian.com/product-discovery. That's A-T-L-A-S-S-I-A-N.com/product-discovery. Jira Product Discovery, build the right thing. That chain of thought reasoning is really useful too. It seems like that's one of the important steps for people, is to actually understand a little bit of what's going on and not just ignore that, but start learning. And as you do that 10, 20 times, then you really get used to it.
- AKAman Khan
Absolutely, and being able to... You know what's, what's really cool about this is you can actually go in and see what are the files that it referenced, um, be able to see, okay, here's what it's referencing. So if you need to, you can always pause it, pause the agent, and say, "Hey, I actually want you to go take a look at this part of the code or this resource." And what's really cool, and we can kind of-- we'll kinda show this, is you can even paste in images, and using those images, the agent can actually, you know, kind of infer, "Oh, this is what I want the UI to look like or not." So it's a really, really powerful multimodal-based, uh, agent that can write code. Um, so now it's actually writing the code of the file itself.
- AGAakash Gupta
And here, what is going on behind the scenes? It's using those websites and the Phoenix, uh, thing that we started off with, or it's writing a lot of scratch code. What's going on?
- AKAman Khan
Yeah, so w- so actually, you know, the repo I kind of gave was just a starting point, and it can really just be any directory. What's going on underneath the hood is the agent has kicked off a search. It-- Well, first it took the prompt and the context I gave the agent, and it said, "Let me build a plan." And that's actually the first step in all of this, is actually to generate a plan, uh, which it, it doesn't show here, but there is a, a, a basically a, a chain of thought for the plan here. So if you hit this, uh, thought for four seconds, you'll see what the agent is thinking it should do. And it says, "I should first look at the CrewAI tutorial, design a LangGraph-based trip planner, then build a front-end application, and then integrate it all together." And so-
- AGAakash Gupta
Got it
- AKAman Khan
... that's just like you would give this to an engineer if you were... [chuckles] Hopefully, you're giving better requirements to an engineer. But you don't have to-
- AGAakash Gupta
[laughs]
- AKAman Khan
... give really good requirements to your agent because the agent-
- AGAakash Gupta
Yeah
- AKAman Khan
... will just, like, make sense of whatever you've given it and try its best to figure it out, so it's very robust to that. So once you've got that plan, then the agent executes code on your machine to actually implement that plan. The first thing it did was create a directory that it's writing code into, so it's actually created a new folder, uh, in my, my workspace where it's actually writing code, uh, you know, on my machine, and the code is really just text files. In this case, it's Python. Um, you can define different frameworks if you want to. I could have said use Python or use React, but in this case, I've just let the agent go off and do its thing. I, I wasn't super prescriptive. Um, so great. So it actually built the back end here, which you can see, and now it's gonna build the React front end. So it's going and actually building, uh, you know, building the UI components. And just to zoom out for a moment-To be able to do this even a year ago would've been really, really challenging because you're just not gonna get robust UI components and, uh, sort of an agent that understands what goes into building a React application with a degree of, like, confidence and understanding when there's errors, how to fix those errors. So let me show you an example here. The agent actually built a file, and what it found was that there was an error in the file. And on the fly, it went back and it's rewriting parts of the file because the agent can actually see and read errors as they, as they come up, and can... is able to go back and iterate on those same files, uh, all within the chat window as well.
- AGAakash Gupta
All right.
- AKAman Khan
Okay. In, in this case, there's times when you as a human might need to intervene s- particularly when there's files being deleted. So it might prompt you to either accept to do something or not. Um, I generally just let, you know, I'll, I'll take a look at what the agent is asking me to review, and then I'll either accept or reject based on that. And it can actually tweak its path based on whether or not you accept or reject the suggestion. Okay, great. So now it's going ahead and creating a directory and installing all of the components for the UI right now.
- AGAakash Gupta
So it seems like from zero to one, it might be a teeny bit slower than a Lovable, Bolt, or v0. But after zero to one, the power to edit more, implement more, Cursor is allowing more potential.
- AKAman Khan
Absolutely. I think that one way to think of it is, like, the application that we've given it is... And I'm using a pretty capable model as well. I'm using Claude 4, which is a reasoning model, so it's a little bit slower than some of the faster models out there. But the reason that you, you know, one reason to actually start here is you just have complete control over the files. You can go in, and if you wanted to, one thing I do is actually I'll take a look at the file, and I'll ask the agent. You can reference a specific file here in the context window, and you can actually say, "What is going on in this file?" And you can just have a conversation with the code, uh, you know, with the agent a- on top of your code. So even if you, you know, wanna know, "Do I really need this file? Um, can you make this better? I noticed this one thing. My engineer pointed something out," you're just gonna have a lot more control over that system using Cursor. So it's, it's worthwhile to invest a little bit more time to get this thing set up on your machine.
- AGAakash Gupta
Okay. And is it possible to do those types of things in parallel or while it's working, while it's generating, you can't chat until this generation is done?
- AKAman Khan
Yeah. I, I think, you know, we could try actually. So I've started a new chat. Let's see. [keyboard clacking] Okay. Yeah, so, so with Cursor, you actually, yeah, you can't have multiple... I don't believe you can have multiple chat windows going at the same time. I think you can only do one chat, but maybe someone can prove me wrong there.
- AGAakash Gupta
[chuckles]
- AKAman Khan
Yeah.
- AGAakash Gupta
So now that it's done, you could chat with it, though.
- AKAman Khan
Yeah. Let's see. So this one, I think it, I think we may have broken it actually. Continue building. [keyboard clacking] Which is, you know, part of, part of exploration. Did it actually finish or not? Uh, let's see. Okay, so, so what's funny is, like, the system is super robust, right? Aakash, like, we, we just broke it because I went to another tab and I came back and I'm like, "Hey, you know what? I'm sorry for interrupting your work. Just keep doing what you were doing."
- AGAakash Gupta
Okay.
- AKAman Khan
And it's just like, "Oh, okay. Here's what I was doing before. Let me just recap it for myself. I'm just gonna keep going on my way." Right? It's sort of like-
- AGAakash Gupta
Nice
- AKAman Khan
... tapping an engineer on the shoulder. Like, I'm not worried about breaking my code anymore or, like, changing one line because I know and I have so much confidence that the system is going to be able to recover from some of those mistakes when it comes from starting from scratch, for sure.
- AGAakash Gupta
Mm-hmm.
- 25:22 – 45:51
Debugging with an agent: dependency errors, ports, and resilience
- AKAman Khan
Okay. So it's writing all of this. Uh, and what, what we can actually do is it's given me enough information here where I can actually try to run this live. So, uh, I don't need to have it finish the Docker file. Um, and it's sort of just running tests, so it actually knows to actually test the agent itself-
- AGAakash Gupta
Nice
- AKAman Khan
... make sure that the back end is working. And as long as it's working, then it'll actually move on to the next step. Um, let's go ahead and... So it's got local host. We've got it here. I have a feeling that this might break, uh, but we will try it because it didn't ask me for an OpenAI key. So let's see what happens. Okay. Let's first actually go to here. We'll go to the back end. And so all I've done now is I'm looking at the README, and it actually s- lays out the quick start steps for how to spin up this system. So all you have to do is copy/paste these lines of code. So I can go to LangGraph. Um, and the, a little bit of like the terminal commands is helpful to know, like CD dot dot to navigate your directory structure. Um, but we're gonna go ahead and-
- AGAakash Gupta
And dot dot like moves you up a level, right?
- AKAman Khan
Exactly. Dot dot moves you up a level. Exactly.
- AGAakash Gupta
And then LS lists the files in there?
- AKAman Khan
Yeah. Um, so we're gonna try to go to CD agent.
- AGAakash Gupta
And CD is just change directory.
- AKAman Khan
Change directory, exactly. Pip, uh, is your Python, so, um, let's make sure we're in... We're gonna create a new Python, um, environment. So let's call this...
- AGAakash Gupta
So we created a workspace. We set up a project. We have an environment. Now-
- AKAman Khan
Yeah. So, so one thing you definitely want to be careful of when you're running Python on your machine, every developer will have faced this at some point, it's sort of a rite of passage, is you don't want to be writing, uh, you know, updating packages and installing packages to your Python locally. You want to be using virtual environments, uh, because your system Python is sort of... You want to kind of keep that protected. Um-
- AGAakash Gupta
Okay
- AKAman Khan
... 'cause that can really break things. So what we've done is just created, uh, a virtual environment using a tool called Conda, and if you have, if you get stuck on that or you're like, "What is going on here?" Uh, don't worry. The agent will actually... You can, you can also specify use a virtual environment or, you know, how should I do this on my machine, and it will help you out. It'll guide you through those steps as well.
- AGAakash Gupta
Okay.
- AKAman Khan
Yeah. Okay. So now we've kind of done that. Now we're going to install our requirements. So we just, we're in the right directory, and now we're just gonna hit pip install requirements, and if this works, it will actually install those requirements into my Python virtual environment.
- AGAakash Gupta
Okay.
- AKAman Khan
Great. Um, set up environment variables. Okay. So let me go here, uh, and open up that file. So we actually want to, uh, name this .env. So actually, let's see what environment variables it needs. I'm not even sure. So we're just gonna run the Python main and see what pops up.
- AGAakash Gupta
So we'll go back and do the env variables after.
- AKAman Khan
Exactly, yeah. 'Cause I'm kind of curious. I, you know, I could go in and read it. Okay, module not found. So it looks like it hit a problem here. This is great, um, because what that means is like I can go in and just copy that error and say, "Hey, you hit this error." Let's see what, let's see how it fixes that. So I'm gonna-
- AGAakash Gupta
Okay
- AKAman Khan
... hit skip, stop. And all I mostly do as my workflow is copy the terminal, paste it in and say, you know, just literally just give it back to it and say, "Hey, there's this bug here." And it will read the terminal lines and understand what's going on and try and infer that.
- AGAakash Gupta
Nice.
- AKAman Khan
So this is what I mean where I'm like, where I say, "Don't be scared about things breaking." They're going to break. What matters is how you can work with the agent to fix your problems.
- AGAakash Gupta
Yep. Today's episode is brought to you by Maven. If you're enjoying this episode with Aman, you'll love his course on Maven, today's podcast sponsor. The problem with most courses online, like Udemy, is there's no live component, and the instructors aren't experts in their fields. They are professors. At Maven, you get direct live access to experts and operators from the world's best tech companies. You can't get that access anywhere else, in any university, and you usually can't find them on YouTube either. I've featured so many of Maven's experts in the newsletter and podcast for that reason. To help you out, I've put together a collection of courses I recommend at maven.com/x/akash. This includes courses like AI prototyping for PMs, product sense for PMs, and getting an AI PM certification. Visit it now at M-A-V-E-N.com/x
- AKAman Khan
X/aakash
- AGAakash Gupta
Today's episode is brought to you by Amplitude. Replays of mobile user engagement are critical to building better products and experiences. But many session replay tools don't capture the full picture. Some tools take screenshots every second, leading to choppy replays and high storage costs from enormous capture sizes. Others use wireframes, but key moments go missing, creating gaps in your understanding. Neither approach gives you a truly mobile experience. Amplitude does things differently. Their mobile replays capture the full experience, every tap, every scroll, and every gesture, with no lag and no performance hit. It's the most accurate way to understand mobile behavior. See the full story with Amplitude. And also setting aside enough time to just persevere. [chuckles]
- AKAman Khan
Definitely, yeah. Okay, now it's gonna actually try to run these commands. Let's see if that works. Okay, so I have to enter an OpenAI key. That's what I was kind of expecting. So it actually built this .env file, and great. Let me go ahead and insert an OpenAI key. Hopefully, this works. Um, I'm gonna go ahead and move this to a different window so I don't blast this on the internet.
- AGAakash Gupta
Yeah. [laughs]
- AKAman Khan
Okay. Yeah, don't share your OpenAI keys or your keys, uh, you know, widely. Uh, that's definitely something to keep in mind. Um-
- AGAakash Gupta
That'll be expensive fast.
- 45:51 – 50:15
Testing the prototype: what makes it ‘agentic’ and how outputs are composed
- AGAakash Gupta
Oh, wow.
- AKAman Khan
So that is the application you just built. Um, we just built it, and all it took was giving a couple of examples, persevering through the Python dependencies that we hit-
- AGAakash Gupta
Yeah
- AKAman Khan
... and we have a real prototype here. You have a UI that can point to the back end and actually, uh, you know, actually service, serve up requests that we wanna make. So that's a, that's a real prototype. Let's go ahead and test it now. What do you think, Aakash?
- AGAakash Gupta
Yeah, let's see it. And-
- AKAman Khan
Okay
- AGAakash Gupta
... let's also just explain, like, we were trying to create an agentic system, so where are the agents involved here?
- AKAman Khan
Yeah. So great question. So we went ahead and, um... It's funny, it actually, it does list them out here in the UI, but you can specify whatever agents you want. Some of these were actually determined in the example that I gave, but let's go ahead and, and kind of break down, like, what are the agents here. So the agents that we've built in, and again, this is fully customizable, so you can give different agents for more specific tasks. The agents here are a research specialist, so that's an agent that's, like, an expert on doing research on a specific geography like climate, you know, the attractions, et cetera. You have a planner agent that can plan day by day, so for a specific day, what, what should we do for, uh, you know, for this trip? You have a budget advisor and a local curator. So the budget advisor just takes a budget and actually, you know, does analysis on that to, to... based on the user's input here, which you're gonna, you're gonna put in your budget, and then a local curator to kind of find, um, you know, maybe off the beaten path things. But these are, these agents, you can kind of think of them as LLMs and prompts and contexts that you've packaged and wrapped together to f- perform a specific task. It's a lot like saying, "I'm an expert on a specific area, and I'm just gonna go ahead and focus on giving the best possible output for that specific thing." Like, the budget-
- AGAakash Gupta
Yeah
- AKAman Khan
... agent is gonna be really good at budgeting. The planner agent is gonna be really good at making plans. So that's kinda how I would view these different systems a little bit.
- AGAakash Gupta
Cool.
- AKAman Khan
So let's go ahead and give this a shot. So we're gonna say we're gonna go to Spain. Um, you know, let's say we're gonna do, like, a quick Europe trip. I know you, you recently did, like, a summer trip to Europe. Uh, let's go ahead and click Spain. We can say, um, let's type in we're going for one week. We're gonna give a budget of, let's just say, like, $1,000. And maybe some interests we can give are food, and then we can even click the travel style. So let's say we wanna go a little bit more adventure. And these are, this form is fully programmatic, right? So if I wanted to, I could go back here and say, um, you know, change the form color, change the form fields. This is too long. Change how it looks and feels. But you've given something that kind of, uh, is, is a good starting point, um, for, for, you know, a, a prototype you might wanna build for yourself. Okay. I'm gonna click Plan My Trip. And what's going on in the background, and, you know, this, this loading state's not great, and that's probably something I could ask the, you know, the agent to go and improve on the loading state. But what it's actually gonna do is build an itinerary for me here.And, uh, and it's gonna take the inputs I have. Great, and it says, "Here's a seven-day itinerary for Spain, food and adventure." And it's actually given me a day-to-day, sort of hour-to-hour level analysis of what I could be doing in different cities.
- AGAakash Gupta
Nice.
- AKAman Khan
As well as accounting for travel.
- AGAakash Gupta
It likes to pack its days.
- AKAman Khan
Yeah.
- AGAakash Gupta
[chuckles]
- AKAman Khan
So-
- AGAakash Gupta
But it was pretty detailed. It's faster than a real human could plan or would've been for sure, or you reading up 10 Google search results, right?
- AKAman Khan
Right. Or even if you pasted... Like, think of this interface here, right? Like, what you've basically done is you've wrapped those prompts of, "Plan me a trip to Spain for one week with a budget of $1,000, interests are food, and in this range," and you've created something that's a lot more programmatic on top of that. You've created a prototype that you could actually go and deploy. And you can build so much more on top of this, right? You can have it reference, like maybe you want to use a specific API to help you book the flight or suggest flights. You can hook it up to that. You can give it access to search. And so it's a fully programmatic system that you can really go in and, and tweak on the fly in your Cursor environment as well. Okay.
- AGAakash Gupta
All right.
- 50:15 – 1:01:26
Observability: tracing the agent graph and seeing the real execution path
- AKAman Khan
So we've gotten, we've gotten, uh, an output here. Now, this is really helpful to just get started, but I think we want to go one level deeper, right? Like, as a product manager, just being able to look at this, like I'm not, I'm not really sure what's going underneath the hood unless I go and read the code. And that kind of takes us to what observability is. And observability is sort of a, a key part of being able to understand your AI application.
- AGAakash Gupta
Yes.
- AKAman Khan
So let's go ahead and, and, and hop to that. So-
- AGAakash Gupta
Yes
- AKAman Khan
... so what we did when we actually built the system is we've, we've added what's called tracing, and tracing is a really standard way of looking at the calls that your server is making. Um, that's actually, uh, related to the tool that I, I'm kind of working on, which, which helps with observability and with, with, uh, with sort of tracing applications. Um, and so let's go ahead and look at some specific examples here. So, um, so these are some requests that I've made, which are basically to this agent-based system, and, um, this is one we just made, which was, uh, Spain, one week. In this case, I clicked sailing, um, and adventure. And what-
- AGAakash Gupta
This diagram is really cool. What are we seeing here on the bottom left?
- AKAman Khan
Yeah. So this is, uh, actually the same agent that you just built in code represented graphically. So, uh, so what you actually have is a, a way for you to visually see what is your agent-based system doing. And remember, we, we were, you asked a great question, which was like, "What agents can we build?" And we have a research agent, we have a local experiences agent, we have a budget agent, and all of that goes into an itinerary, and that's what the output is. And so-
- AGAakash Gupta
Nice
- AKAman Khan
... what's, what's really helpful looking at this is when you are thinking about going one step further from your AI prototype to building a prototyped agent or an agent application, being able to visually see what are the paths that the agent is taking to accomplish a goal. You can see what happens here is when I give the input, it kicks off three different agents in parallel to generate an output, and all of those go into the itinerary. Remember, I didn't even really define this. I gave this to Cursor to go write, and I said, "Cursor, go ahead and build an agent-based system," and this is the architecture it developed and came back to me. And, uh, and that's what, what gives you the output that you get on the other end. And so what I've gotten is one level deeper on Cursor, and that's actually a really key point, which is it's, it's kind of tough to do this with like Bolt and LevelDB. You're not gonna get this representation the same way. And so if you want to see what's going on underneath the hood, you kind of need to use-- You have to be a little bit more in the code to be able to define how to get those outputs.
- AGAakash Gupta
And you could probably use Windsurf too, right?
- AKAman Khan
That's right, yeah. So as long as you, as long-- Really, what matters is that you can edit the code. Um-
- AGAakash Gupta
Yeah
- AKAman Khan
... so whether that's Windsurf or Cursor, um, you know, being able to add to the code, uh, tracing is really what matters. And you know, I work on this tool, but there's a lot of other tools out there for tracing as well. I think what matters is like, what's your workflow? So I'm, you know, don-don't take my word for it. Go out and try a tool and implement tracing or ask your, work with your engineer to implement tracing, and you'll be able to kind of get a visualization like this, um, at the end of the day.
- AGAakash Gupta
And what was the steps involved with implementing tracing?
- AKAman Khan
Yeah. So, um, so that, I think it's probably easier to just kind of show what that looks like here, which is, this is our docs for Arize, and, um, so we actually have a whole section on tracing. Tracing is, think of this as like the units of work that your code is making, and it's, it's actually, because this is, you know, we're taking software best practices and applying them to this like AI agent world, it's actually fairly straightforward these days. All you really have to do is install a tracing package and wrap your code in a sort of a decorator. That is a fancy word for saying, "Take this process or this function and call that a span or a trace," so that when you're actually running that code, it picks up that unit, and it puts it into what you see here, which is the UI for each of the steps that the agent is taking. So it's, it's really, the short answer, Aakash, is like it's a line of code that you, uh, that you implement on top of, um, on, on top of your functions.
- AGAakash Gupta
So you could probably just point the agent to this doc, and it would figure it out.
- AKAman Khan
Totally. That's actually how I did it as well. Yeah.
- AGAakash Gupta
Yeah.
- AKAman Khan
So like, and, and so, so that's where the example that I kind of gave it had tracing in it already-
- AGAakash Gupta
Yeah
- AKAman Khan
... as a starting point. But what you can do is, is just literally copy-paste thisType it in here, and I could say, you know, implement tracing. I, I won't need to do that now because it already has it, but it will be able to go and infer, "Okay, here are the steps I need to go and implement the tracing."
- AGAakash Gupta
Nice. And then you get that awesome thing that we were looking at. Can you break down the top left, what we're reading as well? It looks like there's, like, multiple levels there. So it's like budget, then what are we seeing after that?
- AKAman Khan
Yeah, exactly. So these are the agents that we've defined now, right? So this is a multi-agent system using LangGraph, and I've got a budget here, which is... You can look at the, the input. Let's take a look at the input really quick. So this isn't a chas- chat-based agent, right? Like, uh, I think everyone, a lot of people, it makes sense, right? You wanna build something with chat, but what if you just take a form and take these inputs and actually put them into here? That's what this looks like. Spain, one week, the budget, let's, in this case, it's sailing, and then the adven- the travel style, and those are inputs to the system, and then those get kicked out to each of these agents. Let's go ahead and take a look at what's going on, and this is really the budget agent, the local experiences agent, research agent. That agent has its own sort of tool that it has access to here, and let's go ahead and, and go one level deeper at the prompt. So this is the system prompt of the agent, and the system prompt says, "Analyze budget requirements for a one-week trip to Spain," and here's the budget. So it actually plumbed in the, the budget from the form, and it says, "This is what you should do. Include a breakdown of all of these things." These are all things that you would think about when you're developing a budget, like what would I spend money on when I'm traveling? And so the agent has actually defined for itself in the system prompt, "How should I take this $1,000 and best allocate it for accommodations?" And what's interesting is that it's, you know, it's actually kind of gone and done a pretty wide search for different tiers of options because-
- AGAakash Gupta
Yeah
- AKAman Khan
... it's not really making a decision on what, uh, you know, what is the, um, the range of the type of trip I wanna take. It's offloading that to another agent to make that decision. All it's doing is saying, "I have $1,000. What can I do with $1,000? How should I think about spending that money?" And let the other agents decide how to best, uh, pull that together. So then that goes into what's called this, like, analyzing this budget, uh, tool, which takes that destination, the week, and the budget, and that's, that's basically-- Uh, think of this as, like, pulling out the, a structured JSON that goes into the, into the system prompt. So-
- AGAakash Gupta
Mm-hmm
- AKAman Khan
... these tools are basically ways for you to get data from an unstructured way or from some one format and put it into another format, and it's really important to think about tools or functions as ways for you to get, uh, you know, sort of... Think of them as like API calls or ways to get data from a system for your agent to use, and that's what this little, uh, icon kind of represents here, um, is a, is a tool. And then this is, uh, the actual LLM call, and this is the top-level agent which wraps all of that together. And, and you'll notice, like, this, this kind of looks complex if this is the first time you are seeing a system like this, right? Like, what are all of these lines? There's all these boxes and colors. But I would really stress, like, this type of system is, is truly an MVP in today's world of agents. So if your team or you're an AI PM and you're thinking about building an agent-based system, your first starting point would probably look something like this. It's not really gonna look a ton simpler, to be honest, for a, for multiple agents.
- AGAakash Gupta
Mm-hmm.
- AKAman Khan
It's more likely that there will be multiple calls being made to different services and taking data out of that and putting them back in to then use for LLM calls. So just wanted to set that kind of context, which is, like, your starting point is to see what's going on underneath the hood and try to understand LLMs, LLM calls, tool calls, agents, and how they all sort of ladder up into this overall system.
- AGAakash Gupta
And we even get the time. So we were saying, like, "Oh, this might be a little bit slow," but this is breakdown. If you wanted to observe it, which is what we're talking about here, this is how you can break down. Okay, maybe we chip off some time with the budgeting one, and then you could go in and you could work on that.
- 1:01:26 – 1:09:59
Prompt iteration in a playground: model choice, tone, length, and conversion hooks
- AKAman Khan
So why does this matter, right? Well, I think you actually pointed out something really, uh, useful, right? Like, which is, this is kind of long. Like, I don't, I don't know if I'm gonna read all of this. It's, it's really detailed, but does it really need to be this detailed? And is this really the tone that I want the agent to have? What if I wanted this agent to offer a discount to users or act extra friendly? Well, that's really where prompt engineering comes in. Um, and this is another kind of core part of the, the workflows we were talking about, which is, so we've done prototyping, we've done observability. Now let's see what parts of the agent stack we can change and iterate on, and what the output looks like. And to that end, we have, we have RAG, prompt engineering, fine-tuning, right? And so we're gonna kind of go through each of those and see what the impact is on the end output of your agent.
- AGAakash Gupta
Okay.
- AKAman Khan
So, so I've got, um, a model here. I can change the model if I want to. Let's try a slightly different one.
- AGAakash Gupta
4o mini. Uh, Andre yesterday said, "Don't use ever," right? [laughs]
- AKAman Khan
Exactly, yeah. [laughs] I think it's being deprecated. I'm surprised, uh, you know, uh, maybe it still works, but let's see. Um, uh, and that's the thing. I mean, that, that's, honestly, it's a real... I know we joke, but, like, that's a really good point. Like, these models are gonna change all the time, right? Like, I, I love the like, "Oh, this new model came out. Here's my prompting guide for it." Those prompting guides, a lot of them, they do end up getting out of date when there's a new model or, you know, the old-- You know, how you work with these new models changes. So what you could do is say like, "I'm changing to this new model. How should I prompt the system?" And you can generate a new prompt based on this as well.
- AGAakash Gupta
Mm-hmm.
- AKAman Khan
We actually have a, a tool that lets you kind of generate, um, new prompts as well, um-
- AGAakash Gupta
Okay
- AKAman Khan
... in, in this, in the, in the product. But let's say I wanna just make some really specific tactical changes to this. So I'm gonna go ahead and say, um...
- AGAakash Gupta
I feel like we don't need a detailed day-by-day plan. Can we just delete that part and make it more like a day-by-day event summary or something like that?
- AKAman Khan
Yeah. That's a good point. So what I'm doing here is actually changing the, the prompt, and, um, what you can do is actually save, uh, you know, you could save this prompt and say, "I want to iterate on it in the system," and say like-
- AGAakash Gupta
Mm-hmm
- AKAman Khan
... "This is my travel agent prompt."
- AGAakash Gupta
Yeah, that's like the detailed version.
- AKAman Khan
Exactly, yeah. And, and then what we can do is sort of pull in that same travel agent prompt here.
- AGAakash Gupta
Mm-hmm.
- AKAman Khan
So now I'm actually iterating on the same prompt, but when I save it, I'll save it as a new version. So let's go ahead-
- AGAakash Gupta
Okay
- AKAman Khan
... and it's a good best practice. So we're gonna say, instead of a day-by-day plan, we're gonna do, um, okay, I think that's what it said before. We're gonna say, uh, give me, give me a, uh, so, so actions. Give me a day-to-day plan. Doesn't need to be super detailed, right?
- AGAakash Gupta
Okay.
- AKAman Khan
'Cause I don't think we're planning out our lives like day, hour to hour. It, it's really helpful when you're doing generation to also say, because we're giving all of this as context, this is all RAG to some degree of context that the agent is using. And we're gonna say, you know, max 1,000 characters, because we don't want it to go on super long.
- AGAakash Gupta
And when you say RAG, RAG is retrieval augmented generation, which means kind of like condensing a lot of knowledge, right? Is that what it's about?
- AKAman Khan
Yeah. Good point. So yeah. So in case you've heard this term before and you're like, "What is that thing?" RAG is retrieval augmented generation. It is, uh, I like to think of it as giving, you know, when you're thinking about like doing a test or let's say you, you, you go to a doctor, uh, the doctor might be super specialized. You can kind of think of specialization as like fine-tuning. And when the doctor is kind of answering your questions, wouldn't it be great if they just had access to like the internet or to a textbook? And that's what RAG is. RAG is basically getting access to a specific part of the data of your overall dataset that is useful to answer an, a question on the spot. So-
- AGAakash Gupta
Mm-hmm
- AKAman Khan
... um, so that's like the context that helps you answer a question basically or, or perform a task. Um-
- AGAakash Gupta
Okay
- AKAman Khan
... so it's like think of it as like pulling out a page from a notebook or a textbook.
- AGAakash Gupta
Mm-hmm.
- AKAman Khan
Um, now it's really hard to find the right page, and that's a whole 'nother area of study, but that's really what, uh, that's really what RAG is underneath the hood. It's just like pulling out data and using it. Um, so another, another few things we can do is say, um, you know, always answer in a super friendly tone, because it kind of sounds robotic to me. I feel like this itinerary, like I don't know if I would wanna use this. I kind of want like, you know, something that might feel a little bit more interactive. Um-
- AGAakash Gupta
Mm-hmm
- 1:09:59 – 1:30:04
Evals: from vibe coding to measurable quality (LLM-as-judge + human labels)
- AGAakash Gupta
So how do you set up the right evals to start to like, in an automated way, understand whether you should adopt the latest and greatest prompt instead of kind of just as a human looking at it each time?
- AKAman Khan
Yeah, good question. Um, so evals are really helpful when you're actually making changes to your system and being able to quantify. Okay, now I can, I can kind of-- This is what you kind of call vibe coding, right? So we've kind of come up to this point. I would say everything up until this point is pretty much vibe coding because you're kind of, like, looking at the-- You're basically giving text. You know, this whole time we've been vibe coding, we've been giving text to the agent, it's generating output. You have an agent system, you've made tweaks to the prompt, and you're like, "Looks good, looks fine." But I think that going one step beyond that is actually being able to run evals. And evals are the way that you can more quantify your system overall. So I, I like to joke, it's like going from vibe coding to thrive coding because you're going one step deeper, right? So, so what we can do is take some of these examples, and I, I actually ran a few of these, uh, yesterday on a similar agent, same, same agent system. And what I can do is actually build a data set and a l- a very, very common workflow. I mean, look, we work, you know, with some of the, the leading, like, companies in AI, like Uber, Reddit, Instacart, Duolingo, all these companies. And the reason that we're, we're building these tools to have a data set is because you want to be able to make a change to your system and know that, you know, quantitatively what the impact that change is having with evals. So that was a long-winded way of saying I basically constructed a couple of examples here, and let's go ahead and delete these, um, so that we can actually just do this from scratch. I, I don't wanna, uh, these are what we're gonna jump to later, so. Okay, cool. Okay, so what do we have here? So this is what's called a data set of the same data that you saw earlier, and specifically, I took the itinerary step, and I've constructed a set of examples I'm going to use to iterate on top of, and that's really what these are. So if I go in, you can see it was the same prompt that we were editing in the prompt playground, and what we're gonna do is actually, uh, run evals on top of this system and see what, i- if we're making the system better or worse, and that's really what-
- AGAakash Gupta
So-
- AKAman Khan
You can think of evals as basically a way for you to underst- Oh. Um, you can think of evals as a way for you to, uh, understand are you making your system better or worse, just very simply. Okay.
- AGAakash Gupta
Yep.
- AKAman Khan
So, um, so what we're gonna do is, uh, let me refresh this page. Just kind of look here. Okay, so let's create our first experiment, and we're gonna go back into that prompt playground. But now I'm actually pulling in the data that you just saw. So I've picked and hand sampled those examples that I want to use for iteration. I can take the same, I can do the same thing here, which is it has the same inputs, and I have those, those outputs, and now what I want to do is I actually made that change to that prompt that we were talking about earlier, and I saved it to the, to the prompt hub. And so I'm gonna pull in this latest version of the prompt, and you'll see this is the same prompt we made edits to before. And now what I can do is let me go ahead and A/B test this, and let's kind of make this a little more authentic. We said that this was GPT-4o mini, so we're gonna do an A/B test, apples to apples. And we'll do the same thing of hitting Run All, but now instead of on one example, we're generating this on a data set of, like-10 or 12 examples here. So it's basically giving you an output that you can use for experimentation. So this is generating a new output on that data. Um-
- SPSpeaker
Okay.
- AKAman Khan
Yeah. So to back up for a second, you have your initial dataset of examples that you've kind of built on top of. Even if you don't have that initial dataset, it's really just I could go in, and I could go and just add... Basically, what I did was I just re-entered, you know, instead of going to Spain, I want to go to Tokyo, and instead of one week, I want to make it two weeks. The budget could be like $500. And a lot of times what you kind of call this when you're building an application is bootstrapping a dataset, and it's just a way for you to get started. You can synthetically generate that data too, um-
- SPSpeaker
Yeah
- AKAman Khan
... so using an LLM if you wanted to. Okay. And then now what, what we've done is once we have that dataset, we can pull this in, and we've regenerated the prompts, and it looks like it actually generated the experiments for this. So let's go ahead and go back here. And so those experiments are the outputs from, uh, the, the prompt playground that we had before. So this is new outputs on the original prompt, and I can compare this to, uh, the, the sort of the change, the change prompt that I used as well. So I've got two prompts side by side next to each other now. And again, like, if you're using this system, it's kind of hard to read. It's kind of hard to say, like, is one better than the other? Uh, I don't really know. So let's go ahead and, uh, run some evals on here. So I've got these evals set up, but let's, let's go through the process of talking through what an eval is. So there are... Think of this as there are basically three types of evals that you have options to use right now. One of them is human labels, and it's, it's really important to go in and label this data yourself, and you can go and actually go through the dataset and label the data. We'll kind of talk through that. We'll come back to that one. And that's really an important role for an AI PM is to know when I'm looking at an output, is this what I want the LLM or the agent to actually generate? Like, is this good or bad? Because you're ultimately determining the end user experience as a PM. You're saying, like, good or bad, and that's what the label is. The second option is to use code, and so you can do things like checking code, um, basically to say, uh, it's like a Python-based eval, which y- you know, Python eval could be things like check for instances of, like, is a competitor referenced in the LLM's output? And those are really just think of those as ways of, you know, writing code to, to generate evals. And then the third option that we're gonna be kind of focusing on here is actually using an LLM to check the work of the other agent. And so you think of these as, like, eval types of agent systems that are-
- SPSpeaker
Yeah
- AKAman Khan
... really used to kind of scale up your feedback. So-
- SPSpeaker
And this is what almost everybody's using these days, these LLM as judge systems, where they create, like, almost, like, numeric scores with various dashboards to look at things.
- AKAman Khan
Exactly, yeah. So great point. So when we say, like, you know, there's, there's a lot of buzz around, uh, evals are the secret, and they're the moat. What people say when they're saying, like, evals are the secret to a great AI product experience, what they're saying is that you need a reliable way to scale up the feedback on your system, and the way that you can do that is using LLMs as a judge or a grader on the output. So that's what an eval as a judge or eval system looks like with LLMs. And I'm gonna break this down a little bit further for you, which is what we can do is basically give an eval template. And the same way that we had an agent basically, um, going in and saying, like, "Generate an iter- itinerary, generate a budget," what I'm doing is actually creating an, an eval which sets the, the role, which is saying, "You are examining written context, content. Here's the text." And then I've given the text from that we just generated as the output, and we're stuffing that into here as context. Then I'm giving the agent a task which says, "Examine the text and determine whether the tone is friendly or not." Friendly tone as defined. And then I'm defining and giving an example of, like, "Here's what I mean when I say evaluate for friendliness. Um, please focus heavily on the concept of friendliness." Then I'm gonna give it an action, which is based on the information, the context, give an output label of friendly or robotic based on the information that you have. So again, we've, we've said, we've given the agent a role, we've given it context, we've given it an example of what is good or bad, and then we've given it the action to perform. And those four steps are really all you need to get an eval in place, um, to at least get started. Now, uh, what I will kind of caveat and say as we run this, and so once I've defined that, I can actually set these up here, and I've got another one here, and we'll kind of quickly go through this one. This is, like, checking if we offered a discount to the user based on the, the email. So this is, this is text that says, "Determine whether the text contains an offer for a discount." And this was, might be something we want to check for, right? Did we, did we actually accomplish that goal of giving a discount to a user? I can go ahead and just run these on the system and say, "Select the experiments." Those are the two experiments we have. And I'm just gonna hit Run. And while that's going off, I'll kind of go back here, and this should run pretty fast, but what I'm basically doing is getting an LLM-generated label on all of those rows that you just saw. And that's really helpful for me to then go one level deeper and say, "Was my LLM correct or was the judge correct?" And I can basically go in and fine-tune that even further. So-
- SPSpeaker
Mm-hmm
- AKAman Khan
Okay, and we did, I, I'll, uh... One small note is we, like, flipped the order of operations here because experiment, uh, two... Okay, so it looks like, um, experiment two was, like, the second one that generated, and then experiment one was the first one that finished, which was the better one. So we're actually thinking this, like, backwards, this chart. But this was, like, the one that took a really long time. This was the older prompt. Um, think of this as the old prompt. And you can see, okay, it was actually... I guess the LLM as a judge did note that as friendly, uh-
- AGAakash Gupta
Mm-hmm
- AKAman Khan
... instead of robotic. But it looks like it offered us a discount 0% of the time.
- AGAakash Gupta
Mm-hmm.
- AKAman Khan
And then if I go to the one that was faster, the new updated prompt, the LLM judge actually did mark, uh, all of the responses as friendly, and then it offered a discount 100% of the time. So it actually went in and checked the outputs and said, "Did you give a discount or not?"
- AGAakash Gupta
Yeah. What if we did something like give it, like, a friendliness score? Maybe that'll give us, like, more dispersion.
- AKAman Khan
Yeah, exactly. So, so what we-- when we generate a, uh, the label, [clears throat] um, when we generate the label, we actually do also get a score with it as well. So w- we've just assigned a, a one or a zero as the output.
- AGAakash Gupta
Yeah.
- AKAman Khan
And then you could go in and say, um, "Give me a friendly score from one through five." Fun fact, Akash, for people that are listening to this, uh, this is a best practice is actually to use text to ground the output of the LLM judge. The reason-
- AGAakash Gupta
Mm
- AKAman Khan
... for that, instead of numbers, is although this technology is amazing, LLMs are still really bad at being able to understand numbers, um, fun fact-
- AGAakash Gupta
Mm
- AKAman Khan
... just from a tokens perspective. So I said, "Give me a score."
- AGAakash Gupta
They're not consistent.
- AKAman Khan
Yeah, if I say, you know, "Put a one or a two," it won't really be able to give you the justification for why it picked one versus two.
- 1:30:04 – 1:38:02
How Bolt/Lovable ‘magic’ works: system prompt + tools + code rendering
- AKAman Khan
Yeah. So I think that, like, it, it's helpful to zoom out and see, like, what do all of these things really mean when you build a product? And to that, I'm gonna j- go to just Excalidraw and just sort of whiteboard some of this. Um, by the way, uh, just as a note, do you think we should start here, or should we go up to the cursor or, like, the Bolt example? Like, what, what do you feel like would be more natural?
- AGAakash Gupta
Um, maybe we start with the Bolt diagram.
- AKAman Khan
Okay.
- AGAakash Gupta
Yeah.
- AKAman Khan
Perfect. Awesome. So when you think about pulling all of these concepts together, I think it's really helpful to go from you've built this initial system, but what does this look like in practice when you go from prototype to production? And I think it's helpful to, like, look at great tools out there that we all kind of have used or tried, uh, at some, at some point, and, like, that are really taking a lot of attention from the AI product mindset, and try to understand how they work a little bit more. Maybe we can use this as an example to just go through, like, how Bolt works at a really high level, just to pull all of this together. Um, so if you haven't used Bolt yet or, or, you know... I, I like to do this thing in person, like, where I'll ask, "How many people have heard of Bolt or Lovable?" Everyone raises their hand.
- AGAakash Gupta
[chuckles]
- AKAman Khan
Okay, how many people have, like, actually tried to use the tool? And, like, half the hands go down. And I think that's part of the problem.
- AGAakash Gupta
[chuckles]
- AKAman Khan
But I do recommend, like, you know, we jumped into the deep end with Cursor. If you haven't tried Bolt yet, please go and try it. It's really straightforward. Just ask it to do the same prompt we just gave, uh, Cursor, and you'll get a good A/B test feeling of, like, what's different between these systems. Um, so once we've built something in Bolt, what you'll kind of notice is it's, it's a workflow which also generates code and gives you a UI as a prototype. It kind of feels like magic, right? Like, uh, I don't, I, I feel like you-- You know, I had this feeling when I first tried. I was like, "Wow. Holy cow. It just knows exactly what to do and built this UI in, like, one second with everything that I asked for."But it's not magic. And let's talk about what's going on underneath the hood a little bit more, and I want to preface and say, like, this is just from reading the code, and that's why it's so important to be able to read code so that you can interpret, uh, you know, what's going on with your AI product. So what you can do is Bolt is, uh, has their code hosted on GitHub, like an open source version. And if you go in, I thought, "Wow, this is gonna be really sophisticated," but really at a high level, Bolt contains a system prompt, which y- we just saw what a prompt was with an agent, a system prompt. You're gonna notice a lot of similarities here, and you'll see, "You are Bolt, an expert AI assistant, an exceptional senior software developer with vast knowledge across multiple programming languages." So what Bolt really is, is it's basically a really big, good prompt, which is doing the same things we just talked about. You're setting the role. You are a developer. You're setting context. You're saying you are operating in an environment called a web container. You're generating tools or implicit tool calling. I call it implicit tool calling because you're, you're referencing the tools in the context. You're not explicitly calling a service externally, but you're sort of setting, uh, you know, here's what's available to you to be able to implement something. So it says, "Prefer using Vite," which is maybe just a framework here instead of implementing a custom web server. So that's like saying, "Don't go off and use a tool that's not Vite." Um, and then you set priorities in the instructions. So you can see literally the prompt contains, "Important: use valid markdown. Ultra important: do not be verbose." So you're really setting what the output looks like in your prompt. Very important, you're then providing few shot examples of what good looks like. Once Bolt contains all of this information in the prompt, really all that's going on is it's taking this user input request and that's being fed into the agent system with the same context we just talked about, the system prompt, the user prompt, and then access to all of those tools in the prompt itself, which are structuring the problem, picking the right framework. There's a concept of a terminal and then retrieving context above. And these are-- think of these are just like components in your prompt here. All of that goes into an LLM, and then you get generated code. And that generated code, I thought this was gonna be way more sophisticated. Like I-- You know, with Cursor, if you-- there's like a three-and-a-half-hour, like, interview with the Cursor founders on Lex Fridman. It's fascinating. I was like, "Wow, Cursor is a really sophisticated problem, uh, to solve." And, and then I was shocked at, like, you can get such a good result with Bolt and Lovable because really all they're doing is generating code and rendering that code. So it's just going into basically an environment which takes the code that's written and is just able to run the code, executes it. And if there's a problem, it will go back and fix itself, similar to the agent that you just saw. But it's, it's really why is this important to note? Because [clears throat] Bolt is even simpler to some degree in terms of the system you see here than the, the agent that we wrote in Cursor to some degree. Because what, what's really the secret sauce here is you can take generated code, break it up into files, and then render that code, and you get an, uh, you get a UI. Now, if you try to use Bolt to make external API calls, for instance, or call other services or images, it gets a lot harder to do because of what Bolt is wired up to. It's a closed box, basically. You can't go in and plug in external things to it very easily. And maybe just to recap that, it's really just a system prompt, goes into reasoning to generate, "What do I need to do? Let me make a plan." The same thing we just saw with that a-- the, the Cursor agent. What tools do I have? Takes that context, generates code, deploys it, and then renders that code, and then based on user feedback, it can, you know, iterate on that.
- AGAakash Gupta
Mm-hmm.
- AKAman Khan
And what that really means is, I like to think about this as product principles for a full stack coding agent, where you can pull together prompting, your system prompt, prompt engineering, reasoning in the form of agents. Re- really it's like agent-based reasoning or chain of thought reasoning. Tool calling, in this case, it's implicit tool calling, and RAG, which is the context from that you've just provided the agent. All of that goes into an LLM, and you get the generated code. So it's kind of constructing your... You could fine-tune the model more if you wanted to, the LLM layer. You can update your RAG and change what context is provided. You can change your prompts, and that's, that's a huge component of this as well. Uh, and then you kind of string all that together with evals, and you can basically get this really s- you know, kind of slick generated code on the other end. I actually put evals in green because from what I could tell, Bolt isn't running evals on the fly, and that's actually an opportunity. I think, like, this is an example where if you're an AI PM, take note of what are the opportunities in the system where you can go and actually improve on the system overall. For instance, there could be a version of Bolt that never makes mistakes. Like, you could have it running an eval on the fly. Right now, Bolt is, uh, when you're, when you're actually running Bolt, it can break because it's making mistakes in code. You could have an eval that's run that checks is the code correct or not, and that would be an example of running evals to actually improve on the system, make Bolt even more reliable. And so those are all-- I think of this as like tearing down a product out there and thinking, "Oh, what are the opportunities to make this better?"
- AGAakash Gupta
Mm-hmm.
- AKAman Khan
And that's, that's really helpful to pull all of this together.
- 1:38:02 – 1:43:08
RAG vs prompt engineering vs fine-tuning: choosing the right lever
- AGAakash Gupta
So if we go back then and we compare those three terms thatFine-tuning, prompt engineering, RAG. How do those all compare, and when do we use what?
- AKAman Khan
Yeah, good question. So you've still got this system here where this is all kind of, you know, different components. When should you use what, right? That's sort of your, your, the note of your question.
- AGAakash Gupta
Yeah.
- AKAman Khan
And I kind of, um, I think it's helpful to have just, like, a really quick diagram here of, like, what is each thing. So let's look at prompt engineering. We have a-- It kind of depends on what your goal is. So if your goal is to adjust the tone or the instructions, I think prompt engineering is really helpful for that. Uh, so that's basically changing... We, we kind of did that just now on the fly with our agent and with Bolt. You can see, you can change the instructions and the tone. That's Bolt, h- that's how Bolt literally works. With RAG, you can provide context over a lot of data. So if you need to give the agent access or the, the tool, your AI, uh, application access to data internally in your system, that's when you would use RAG, which is using that data to create a generation on top with the context. So you're stuffing that into the prompt. Fine-tuning is, think of this as adjusting the model layer a little bit. So it's actually taking the LLM and making it more specialized. What's really useful for fine-tuning is sort of style to make sure it always responds a certain way, increases the reliability. And then I put distribution, but distribution is, like, giving it more data in the LLM itself that makes it more specific or specialized. It's useful, uh, I think, you know, when you think about, um, the-- It's, it's useful to think about, like, what's the effort of each of these. Prompt engineering relative to some of the others is really, really low effort, actually. All you have to do is have access to the prompts, change those prompts, and then get the eval result to understand are you making the system better or worse. RAG is a little bit more complicated, where you have a database now, so how you retrieve information from the database can have an impact on how much work this is, actually. And then thinking about, like, fine-tuning, if you change the model layer, you change this variable that might have a lot more impact on the rest of the system. And so I kind of view this a little bit more as, like, medium to high today, um, because it requires a bit more sort of specialization to adjust the model. That being said, the impact is really important to think about too, right? Like as, as a PM, we're always thinking: What's the effort? What's the impact? The impact of prompt engineering is really high. Um, and in fact, that means that, you know, a small change to the prompt can get you 10, 20, 30, even more percent gains on your eval scores. Uh, think about it that way. Like, if you're designing your AI product around your evals, like evals are your requirements now, then you really wanna think about how can you have the highest impact on those, and I think prompt engineering is huge. RAG is another really high impact way to improve on your system. A lot of times this might mean, like, adding RAG to your system when it doesn't have it already, so just adding more context or adding better context. And then I think fine-tuning, it sort of depends on what you're trying to do. Fine-tuning is really helpful for saving cost, which might be a very serious concern as you scale up, or reducing latency in your system. So if you want the, the model to be faster, fine-tuning can be really useful for that. Another helpful way to think about this to some degree, and this is not perfect, so don't feel-- feel free to, like, grill me in the comments that I got this wrong, but, um, but you know, I... This is my mental model I use. I think of prompt engineering as giving really clear instructions to, like, an engineer or to an employee, because what you're trying to do, the more specific you are, the better result you're gonna get. Like in the beginning of this video, if I had given clearer instructions to the agent, I may have gotten my tool, my product out faster. But we wanted to see what it looked like to just, like, try to prototype with, with not great instructions, and that's, that's the output that you get. With RAG, I think about this as like a doctor having access to the medical textbook at all times, meaning it can, you know, this, this agent can go and look things up to get more information. Like if I copy-pasted a doc into Cursor, it's gonna go to that website and read the doc to try to understand what I'm asking it to do, and that can be really helpful when you're asking it for stuff that it doesn't know in its memory already. And then sort of last but not least, fine-tuning is sort of like going from college to specializing in a career. What's kind of interesting is that these models are so good now at generalizing that you kind of trade off things when you go to specialization. Like, you know, a lot of people view hallucinations as things you want to remove, but I think hallucinations are a feature, not a bug of these models. And so just note that when you change the model architecture by fine-tuning, you might be getting rid of some of the generalization that can be really helpful, um, for a, a, a production application. So this is my mental model for, like, prompt engineering, RAG, and fine-tuning, and how they all kind of come together when you're thinking about building an AI product system.
- 1:43:08 – 1:48:18
Working with AI engineers/researchers: evals as requirements and shared tooling
- AGAakash Gupta
Awesome. So we covered four of the five skills. The final skill, and I don't know if we'll need screen fursh- share for this, you tell me, is working with AI engineers and researchers, working on these longer development timelines. How can AI PMs master that?
- AKAman Khan
Yeah. So I think this is where I'll come back to, like, what are we, what are we kind of talking about when we say our, our job is changing as AI PMs? And I think about this as the, the expectation on AI PMs is changing from our stakeholders. And our stakeholders, when we think about who are they right now, they're not necessarily just engineers that were working in their own way either. Like e- the way engineers are working and how they're expected to work is changing too. And if you're working on AI products specifically, you might have data scientists or AI engineers that are also ramping up to using gen AI in their workflows as well. So they're gonna be using data to make decisionsAnd I think the best way I can think about here is your job now has become to get a little bit more in the details of what it actually takes to ship an AI product by understanding the core concepts and principles of what goes into great AI products, understanding how they work and when to use what tools, and then sort of very importantly, last but not least, when you work with an AI engineer, is to know what they're thinking, what they need from you. So let's take an example of what that means tactically. When, when we're thinking about evals, an AI engineer might be looking at an example and saying like, "Was this agent good or bad in this case?" Like you, as an AI PM, should be able to answer that question because you are representing what the end ep- sort of experience for a customer looks like, right? Like you are the, ultimately the person that's like on the hook if the AI product is successful or unsuccessful. And so I think I really view it as you want to be in the details of what the team is working on and how it works. You want to be a little bit more in the details of the data, of is the experience good or bad, and can you give that feedback back to the team to know what to go and improve on? And then I think last but not least, like being able to interact in the same platform and work in the same, the same tools as your AI engineers is gonna help that communication much more. Like when I talk to AI engineers, they're often like... They-- What they come back and tell me is, "I can't believe my PM is still sending me Google Docs of PRDs and saying like, 'Go and implement this thing.' I wish that they would just be able to look at the system as a whole of like what the agents actually look like and what they're calling, and be able to tell me, is this correct or not? Or I wish that they were actually looking at customer data and telling me what's good or bad." And so I think that that's really important, is to speak the same language as your engineering team now, as they're ramping up on building around AI as well. And this is really an opportunity for AI PMs to stand out, you know, from other, other sort of product managers that maybe haven't ramped up in that way either. Like my-- I would argue that the stronger you are at communicating with data, communicating around what are the concepts to implement in your product, the more impact and sort of influence you're gonna have as an AI product manager on your organization and on your leadership. Because to be honest, you might even be able to influence at a higher level than even what you were able to do before because you're able to communicate around these terms more powerfully.
- AGAakash Gupta
So should PMs be writing AI evals?
- AKAman Khan
Oh, I think absolutely. I think, I think evals are... I would really reframe this as like, you know, we kind of mentioned evals are kind of what tells you what's good or bad about your system. But what if evals were your requirements instead of your AI product? So like you come back to the team and instead of saying, you know, when you think about a PRD product requirements doc, what if that actually looked like an evals requirements doc? And instead of a doc, it's actually just here's the data, here's the eval score. Now you guys go and improve on this eval and show me how you're improving on that. And that is a really interesting position to be in because you can work with your engineering team on getting the right data in place, you can be hands-on with them, and you'd get to determine like what's good and bad at the end of the day, which is what the end product experience actually looks and feels like. Um, and I guarantee you, like the, what we just saw, a lot of the code, like AI engineers, they wanna be thinking about the right model. They wanna be thinking about context. They wanna be thinking about prompting. But they're not necessarily always gonna be thinking about the end user experience the same way that you will. Like, they're thinking about implementation, so they need someone to give that feedback of end user experience, and I think evals are a really good way to represent that.
- 1:48:18 – 2:04:26
What not to do + a 2-hours/week plan + the AI PM job market reality
- AGAakash Gupta
Awesome. So that's our mini crash course. We've talked a little bit about how to become an AI PM, and in other places you've talked in even more detail. What should you not do?
- AKAman Khan
Yeah. That's a really good question. I think... I kind of view this as like, what are some of the things I see people doing today that, you know, from an AI PM, like they could be doing... Like what are people not doing today is one way to look at this, like where there's a set of opportunities here. [smacks lips] Um, I think generally, if you are thinking about what we just walked through from a product, project perspective, imagine if you had side projects kind of going all the time to help you kind of use these tools. So very common mistake I see is AI PMs don't, you know, I'll talk to AI PMs and I'll say like, "What are you working on on the side?" That's actually, by the way, like a bit of an interview hack, is like, it's actually like my first interview is usually I ask like, "Aside from work, like what are you building on the side?" And the reason for that is you can immediately tell what someone is interested in, and you see that they're curious, you see what their interests are, but you also see that they're taking initiative and they're trying to build and use these tools on the side. So you can immediately gauge like how close are they to actually building products. How much do they really care about building products? So that's a very common mistake I think I see is like, you know, if you don't have side projects, you might kind of end up having a bad time in the interview process. And I think a classic example of someone who does this really well and posts about it is Claire, um, Claire Vo. Uh, like she has, she had ChatPRD, you know, two years ago out, right? Like I actually used an early version of that system and I'm like, "It's not that great. Like, what's different about this than like talking to ChatGPT?" But then you realize that she's been using this side projectTo learn about the stack every single week on her weekends. And so the architecture has changed, the models have changed. So very classic example kind of building off of that is like you don't have side projects. If you wait until the models get better, you're gonna be left behind as well. So Claire kind of took that project and kind of kept iterating on it and building on it. So now when the models get really good, the product gets better, and you already have all of that scaffolding and experience that you've built up building the side projects to take advantage of the newest models. So just imagine like if you already have some product ideas or problems that you want to solve in your day-to-day life, then you may as well start getting building on them now so that when the models do get better, you're not waiting on that. You actually already have something in place to just kind of plug and play the model. And then I'll kind of take another l- uh, sort of step back and say like on the other end of the spectrum, a very common mistake I think I see AI PMs make is trying to automate too much of their job off of the bat. So what I mean by that is you want to use AI as like a second brain to save costs by doing things like analysis and deep research, and maybe even taking some action on your behalf. But I would be really careful about, you know, automating too much right off of the bat. So kind of leverage the fact that these reasoning models are really good at being able to do analysis and push and poke your ideas a bit. So let me give you an example of that. Some of my favorite prompts when I use a reasoning model, and when I say reasoning model, I mean like o3 or now the new Claude 4 as well, which can basically run, uh, the LLM sort of runs for longer and thinks about what response to give you. And one prompt I'll use really commonly is, "Give me five alternative solutions to what we just talked about and rank them in order of risk or ability to accomplish a goal, and then give me pros and cons of each." So what that does is it helps me interrogate my own thinking a bit without trying to automate too much away or just give, get, take the first solution from the, the LLM and try to go implement it. So my recommendation is don't try to automate too much and just take the first recommendation. Like push back a little bit and learn how to work with your, your LLMs and agents to get more out of them. Um, another example of that is like, "Help me simulate follow-up questions from a customer or a vendor in a space I might not be as experienced in," and say, "What are some of the questions they might ask me? What are some responses I should give? And then what are some follow-up questions based on that?" And so now you can actually show up to an interview much more prepared for what direction some, some person might take the questioning. And that's like another recommendation of, you know, you can't really automate that because at the end of the day, you're still gonna be on the other side of the screen or, or a conversation, and you should kind of be able to anticipate what someone is gonna say. So those are a couple of examples. I'd say three examples, you know, have side projects. If you don't have side projects, that's a really common mistake. Don't wait until the models get better. Like now is a great time to just start building something and, and swapping your components in and out. And then don't try to automate too much right off the bat. Learn how to use AI as a second brain to scale up your analysis and research.
- AGAakash Gupta
Amazing. So a lot of that sounds like it could be a lot of work, especially the side projects.
- AKAman Khan
Yeah.
- AGAakash Gupta
But if somebody just has two hours a week and they want to become an AIPM, what are the exact steps they should follow?
- AKAman Khan
Yeah. Good question. And I, I feel this myself, right? Like we, we [chuckles] we have, we have to, you know, we have commitments. We have commitments to our jobs, to our families, to other people in our lives, and we can't, you know, trying to ramp up here, it feels like a lot, especially when the space is changing so rapidly, and it's hard to keep up, definitely. The three things I think about whenever I feel like there's something new or something I'm trying to ramp up on when it comes to AI specifically, it's really just three steps. I'd recommend to try the tools yourself firsthand. I then recommend to build AI intuition, and then apply that AI intuition. And let's kind of talk about each of those in a little bit more, uh, depth. So when I say try the tools, I don't mean like go and try to like implement them into your company right off the bat. I don't mean like go try and im- you know, go back to your CPO and say like, "We need an AI agent." I mean just try the tools for your own kind of use cases and, and day-to-day life. Like an example is, you know, I wanted to build an AI storybook generator, um, for someone in my life, and I was, you know, like for, for a young child and say, "Let me build images specific for this person based on a theme and based on, you know, what, uh, what I might be trying to like tell for like a bedtime story or something like that," right? Like really nice use case for, for a kid. And I think when I tried to use the tools, I actually found where they fell short and what was hard about them. And so when I actually started using the tools myself, I started realizing, "Huh, this is what is possible, and here's what's hard about using these tools." So when you do that, you kind of, you gain a sense of AI intuition a little bit of like what's possible today and what's hard. The next is, I think, trying to tear down AI products with, by building AI intuition like we just did with Bolt. So when you see an experience that feels magical, try to go one level deeper and understand how the system works a bit so you can see how these kind of buzzwords like MCP, RAG, like how they come together to actually form the system you just used, and they feel a little bit less like hypey and a little bit more real, right? Like the same way we just said with Bolt, like it's not using a ton of complexity. It's actually just prompting, and we kind of learned that by tearing the system down. You can do this by watching YouTube videos, talking to other AI PMs. You can try to look at code as you get more proficient, or copy-paste the code into an LLM and ask it to explain it to you. And then you can just try to recreate the product a little bit yourself and see what's hard about that.And then I think last but not least is actually try to apply these two things. Apply your curiosity and what you've learned, and try to build something that you can keep going on the side as a side project, where you can actually try to build your own product. And that way you're always kind of motivated to try to try a new technology and see if it makes your product better or worse. So those are-- I think if you have two hours a week, you can at least try a tool, uh, uh, pick a tool, whatever it might be. This week for me, it's gonna be V- Veo 3 because that one just launched, and I wanna go and try and understand, "Okay, how good is this thing really? Can I build something around it?" And then I'll kind of learn what the boundaries are, what the edges are of the technology, so that I know, "Okay, here's what-- here's the opportunities around that. Here's what I can take back as learning." And the whole goal is really just to keep learning, so that you can apply that in your day job or for projects you might be building.
- AGAakash Gupta
So I was really excited to have you on. I consider you, Tal, Colin, Pavel, like four of the best AIPM creators. But there's, there's a little group that's been criticizing us, talking all about AIPM, and what they keep saying is, "Where are the AIPM jobs?"
- AKAman Khan
Yeah.
- AGAakash Gupta
So are there really that many AIPM jobs out there?
- AKAman Khan
Yeah. I think-- So I think part of it is, like, I'll be honest, I don't know if the hiring managers have, like, rebranded, uh, LinkedIn jobs to say, like, AIPM just yet. What I have been noticing is, like, you'll see PM jobs that say product manager, like comma AI, and that's, that in my mind is sort of an AIPM job. What I will say is that the, the space, the, the PM space is really catching up here. So we're a little bit ahead of the curve, and that's kind of where you wanna be a little bit from a technology perspective, is ahead of the wave, so that when the wave really comes, you're able to ride that wave. Just like in surfing, right? Like you don't wanna be behind the wave. You're gonna miss it. So think about-- Here's a mental model for you or like a, a, you know, sort of a thought exercise, is think about, like, realistically, what is the overall number of PM jobs out there today? And think, "Okay, maybe it's in the thousands in your local city, wherever you might be living." But then think about in three-year times, how many PM jobs like that will there be? And what's the ratio of AIPM today relative to where we'll be in three years? So when you think about that, you might notice, like in the headlines, you'll see companies are laying off entire teams, entire orgs, and PMs are sort of grouped up into that. But it's really rare and very-- I think like, you know, I almost never see it where an AIPM team is gonna be laid off to some degree.
- AGAakash Gupta
Yep.
- AKAman Khan
And so what you're trying to do is future-proof your career a little bit by being ahead, by basically either taking an AIPM position or positioning yourself as a, what could be an AIPM to fill that role internally at your company. Because that's where companies are starting to invest their resources when it comes to product management headcount, because of the opportunity around this technology. So, uh, like coming back to an earlier point, I don't think it's an either/or. Like, I don't think it's like, you know, AIPM or bust. I think it's like AI as an opportunity that intersects with the type of product management you might be doing already, like fintech, healthcare, growth, and it's just that AI is another way to leverage that up a bit more.
- AGAakash Gupta
And to prove everybody who's getting mad at us about these AIPM jobs, while Aman was talking, I decided to search on LinkedIn, right? Pro- AI product manager, and like he said, it's that product manager comma AI. You're getting over 1,000 results just in New York City.
- AKAman Khan
Right.
- AGAakash Gupta
So this is a real job out there. I've examined the compensation of this in other articles. These AIPMs are getting paid 20% to 30% more than regular PMs. We have given you the full toolkit today. If people wanna go deeper, Aman, where can they find you online? Tell us more about what else you're doing outside of appearing on podcasts and doing your day job as a director of product.
- AKAman Khan
Yeah, for sure. Well, so you can find me on, uh, amank.ai. That's my website. I'm also on LinkedIn, if you just search for Aman Khan. Uh, and Twitter, we can kind of plug all the socials. Um, I think for, for me, I really am trying to be as helpful as possible to people that are trying to make this transition in their careers as well, to going from product management to either building around AI in their own products or using AI in the day-to-day. So my goal is really to try to just give away as much for free, give away, you know, as much as I possibly can to people, so that they can understand, you know, how this technology is gonna impact their j- day job, in the same way I wish people were doing that for me in, in the past. And so my recommendation is, like, pick, you know, content curators or creators, like yourself, Aakash. You do such an incredible job of bringing on, like, you know, extremely talent-rich people that are, you know, able to share unique perspectives. And I learn something every time I watch, you know, your, your videos and read your content. And I think that that's really what I'm, I'm aiming to do, is just try to give my perspective of being at the edge of building AI products, what I see, so that you can go kind of build that into your own companies or your own life. Kind of taking like-- Think of this as like taking the bleeding edge of AI and trying to make it more approachable for people a bit more.
- AGAakash Gupta
Love it. And you also, I'm just looking, you have a course on Maven. What's that all about?
- AKAman Khan
Yeah. So this one is a pretty recent addition, actually, to the offering. So, um, I kind of view this as a way for-- You know, you can think of this as like, you could go get a gym membership and, like, watch YouTube videos, and I think that that's useful. This is sort of more of like personal training, uh, is how I view it. So, like, you should really only take this course, to be honest, if you arekind of you've already... You're in the early stages of the curve, and you've maybe gotten to a point where you've built a prototype, and you feel a little bit comfortable in Cursor and some of the workflows we showed, that's really gonna be the starting point for this course. So think going from Cursor prototype to real production application using evals and some of the workflows we just showed. The goal for me here is to give you an H1 or H2 strategy doc that you can take back to your leadership team for what an AI product could look like in your organization. And to do that, I wanna help you build the foundations of trying the products out at the early stages, but really going into, you know, day-to-day workflows of what AI product management looks like when you're building these products. So yeah, that's this. The course is kicking off on July 1st. Uh, it's gonna be a relatively small cohort to get started with, uh, and based on that, we're gonna try and run that more repeatably, really as being sort of a way for you to bounce ideas off of, and really in a structured way to kind of give you the tools to build this H1, H2 AI strategy.
- AGAakash Gupta
[smacks lips] All right, guys. So if you wanna go deeper, you can check out my code. It's in the description to get a little bit of a discount off of Aman's course. I can personally vouch this man knows his stuff. Good luck to you on your AIPM journey. Find both of us on LinkedIn if you need more, and we'll see you next time.
- AKAman Khan
Thanks for having me on, Aakash. This was awesome.
- AGAakash Gupta
I really hope you guys enjoyed that episode. It would mean a ton to me and the team if you could please subscribe on YouTube, follow on Apple and Spotify podcasts, and leave a rating and review. Those ratings and reviews really help grow the show and help other people discover the show, and they help fund the production so that we can do bigger and better productions. Can't wait to share the next episode with you. Until then, see you later.
Episode duration: 2:04:35
Install uListen for AI-powered chat & search across the full episode — Get Full Transcript
Transcript of episode Ej4pBDaHspk
