CHAPTERS
- 0:00 – 2:20
Why AI PM skills are becoming career-critical
Aman frames the shifting expectations for product managers and argues that AI PM teams are less likely to be cut amid broader layoffs. The conversation sets the stakes: PM roles are changing, and AI competence is becoming a differentiator.
- •PM expectations are rising as AI changes workflows and products
- •AI-focused PM work is viewed as more resilient during layoffs
- •The market is rewarding PMs who can build with AI now
- •AI is positioned as an accelerator, not a replacement, for PM craft
- 2:20 – 4:22
Can anyone become an AI PM? Defining “AI-powered PM” vs “AI product PM”
Aman answers whether any PM can become an AI PM by redefining the term. He distinguishes using AI to enhance PM work from building AI into products, and emphasizes AI as an overlay on domain expertise rather than a standalone identity.
- •Two modes: adopting AI in PM workflow vs building AI features
- •AI PM is an intersection (e.g., fintech × AI), not a separate track
- •Domain knowledge remains core; AI amplifies execution
- •Most PMs will become some flavor of AI PM over time
- 4:22 – 5:52
Real-world AI use cases across “edge case” PM roles
Aakash challenges common objections (regulated industries, internal tools) by sharing recent examples of PMs implementing AI anyway. Aman encourages embracing the label early to lead within your domain as hiring expectations evolve.
- •Internal tools: LLMs standardize messy PRDs/templates into structured hypotheses/metrics
- •Finance: AI-driven credit models and decisioning are actively being built
- •Hesitation about “hype” vs pragmatic career positioning
- •Earlier adoption helps you become the AI leader in your product domain
- 5:52 – 6:31
The 5-step AI PM crash course roadmap
Aakash introduces the episode’s structure: five practical skill areas to become effective as an AI PM. This becomes the backbone for the rest of the discussion and demo.
- •Step 1: AI prototyping (core hands-on skill)
- •Step 2: Observability (understanding what’s happening under the hood)
- •Step 3: Evals (measuring quality systematically)
- •Step 4: Prompting vs RAG vs fine-tuning (choosing the right lever)
- •Step 5: Working with AI engineers and researchers
- 6:31 – 8:47
Why Cursor for prototyping (vs Bolt/Lovable/Replit/v0)
Aman explains why he prefers Cursor as a PM prototyping environment: more control, deeper iteration, and better extensibility for agentic systems. They compare strengths of other tools and clarify when quick UI generators are enough.
- •Cursor is a VS Code fork optimized for AI-assisted coding and iteration
- •Other tools excel at fast mocks (front-end, deployment, Python agents), but offer less control
- •Cursor shines when you want to implement agents, customize components, and manage codebase complexity
- •There’s a learning curve, but higher long-term leverage for PMs
- 8:47 – 25:22
Live build: Trip Planner agent in Cursor using LangGraph + UI
They start from scratch and prompt Cursor’s agent to create a Trip Planner using LangGraph with a front-end. Aman explains key shortcuts, model choice (Claude 4), and how lightweight prompting plus iteration drives progress.
- •Key Cursor actions: terminal vs agent commands; natural language terminal guidance
- •Agent creates folders, installs dependencies, writes backend + React frontend
- •@web-style browsing/context enables the agent to reference docs/tutorials
- •Iterative workflow: ask, inspect, adjust—rather than perfect upfront prompts
- 25:22 – 45:51
Debugging with an agent: dependency errors, ports, and resilience
The demo hits realistic issues—missing modules, conflicting requirements, Node warnings, ports in use. Aman shows the core PM prototyping loop: copy errors, paste to agent, let it fix, and keep moving until it runs.
- •Use terminal errors as feedback; paste them into the agent to diagnose/fix
- •Expect Python/Node dependency friction when going beyond no-code tools
- •Agents can recover from interruptions and resume work with minimal guidance
- •Best practice: prototype on greenfield projects vs modifying complex production repos
- 45:51 – 50:15
Testing the prototype: what makes it ‘agentic’ and how outputs are composed
With the UI running, they clarify where agents live in the system and how responsibilities are divided (research, planning, budget, local curation). They test a Spain itinerary and discuss how turning a prompt into a form-based product changes usability and extensibility.
- •Agents = specialized prompt/context packages for sub-tasks (research, budget, planner, curator)
- •Form inputs become structured parameters; output becomes more productized than chat
- •Prototype can evolve: add search, APIs (booking), different UI flows
- •Demonstrates the speed advantage over manual planning and multi-search workflows
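To make the "agents as prompt/context packages" idea concrete, here is a minimal sketch of form inputs feeding role-specific prompts. The four roles come from the episode. The field names, the prompt wording, and the helpers `build_prompts` and `build_planner_prompt` are assumptions for illustration, not the code Cursor generated in the demo.

```python
from dataclasses import dataclass


@dataclass
class TripRequest:
    """Structured parameters a form might collect (field names are hypothetical)."""
    destination: str
    days: int
    budget_usd: int
    interests: str


# Each "agent" is modeled as nothing more than a role-specific prompt template.
# The roles follow the episode; the wording is an assumption.
AGENT_PROMPTS = {
    "research": "Research {destination}: key sights and practical tips for a {days}-day trip.",
    "budget": "Break down a {budget_usd} USD budget for {days} days in {destination}.",
    "local": "Suggest local experiences in {destination} for someone interested in {interests}.",
    "planner": "Write a {days}-day itinerary for {destination} using these notes:\n{notes}",
}


def build_prompts(req: TripRequest) -> dict[str, str]:
    """Fill the three sub-agent templates from the same structured request."""
    fields = vars(req)  # str.format ignores keyword arguments a template does not use
    return {name: AGENT_PROMPTS[name].format(**fields) for name in ("research", "budget", "local")}


def build_planner_prompt(req: TripRequest, notes: dict[str, str]) -> str:
    """Compose the final planner prompt from the sub-agents' outputs."""
    joined = "\n".join(f"[{name}] {text}" for name, text in notes.items())
    return AGENT_PROMPTS["planner"].format(notes=joined, **vars(req))


if __name__ == "__main__":
    request = TripRequest("Spain", 7, 2000, "food")
    for name, prompt in build_prompts(request).items():
        print(name, "->", prompt)
```

Because the form fixes the parameters, each sub-agent prompt stays short and predictable, which is one reason a form can feel more productized than free-form chat.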
- 50:15 – 1:01:26
Observability: tracing the agent graph and seeing the real execution path
Aman introduces observability as the bridge from prototype to understanding: tracing reveals the agent graph, parallel steps, prompts, tool calls, and latency. They show how tracing provides a visual map of the architecture Cursor generated.
- •Tracing visualizes the multi-agent workflow and the path taken per request
- •Shows parallel execution and how sub-agent outputs feed the final itinerary
- •Observability enables debugging, performance tuning, and architecture comprehension
- •Implementation is now lightweight: install a package and wrap functions in spans
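The "wrap functions in spans" idea can be sketched with the standard library alone. This is a stand-in for a real tracing package, not the tool used in the episode: `SPANS`, `span`, `traced`, and the three toy functions are hypothetical names, and the sub-steps run sequentially here rather than in parallel.

```python
import time
from contextlib import contextmanager
from functools import wraps

SPANS = []   # in-memory trace store; a real tool would export these somewhere
_stack = []  # names of currently open spans (single-threaded sketch only)


@contextmanager
def span(name):
    """Record one unit of work with its parent span and latency in milliseconds."""
    record = {"name": name, "parent": _stack[-1] if _stack else None, "ms": None}
    SPANS.append(record)
    _stack.append(name)
    start = time.perf_counter()
    try:
        yield record
    finally:
        record["ms"] = (time.perf_counter() - start) * 1000
        _stack.pop()


def traced(fn):
    """Decorator that wraps every call to fn in a span named after the function."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        with span(fn.__name__):
            return fn(*args, **kwargs)
    return wrapper


@traced
def research(destination):
    return f"notes on {destination}"


@traced
def budget(destination):
    return f"budget for {destination}"


@traced
def plan_trip(destination):
    # The two sub-steps are recorded as children of the plan_trip span.
    return " | ".join([research(destination), budget(destination)])


if __name__ == "__main__":
    plan_trip("Spain")
    for s in SPANS:
        print(f"{s['name']} (parent={s['parent']}): {s['ms']:.3f} ms")
```

The parent field is what lets a trace viewer draw the agent graph: each span knows which call it ran under, so the execution path can be reconstructed per request.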
- 1:01:26 – 1:09:59
Prompt iteration in a playground: model choice, tone, length, and conversion hooks
They move from observing to improving: adjust prompts to change level of detail, friendliness, and even add growth hooks like email capture + discount. They compare models and show how small prompt tweaks can dramatically change latency and output.
- •Prompt engineering controls tone, structure, brevity, and product behaviors
- •Model swaps affect speed/cost; prompts can become outdated as models evolve
- •A/B compare prompt versions using the same upstream agent context
- •Constraints (e.g., max length) can reduce latency and improve UX clarity
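A rough sketch of comparing two prompt versions against the same upstream context, the way a playground does. The prompt texts, the `max_words` value, and `fake_model` are all assumptions; `fake_model` is an offline stand-in so the example runs without an API key, and a real model call would be passed in as `call_model`.

```python
from typing import Callable

# Version A: the baseline. Version B: friendlier, length-limited, with a growth hook.
PROMPT_A = "You are a travel planner. Write a detailed itinerary.\n\nContext:\n{context}"
PROMPT_B = (
    "You are a friendly travel planner. Write an itinerary in at most {max_words} words. "
    "End by offering a discount in exchange for the reader's email.\n\nContext:\n{context}"
)


def fake_model(prompt: str) -> str:
    """Offline stand-in for an LLM call (hypothetical)."""
    return f"(model output for a {len(prompt.split())}-word prompt)"


def compare(context: str, variants: dict[str, str],
            call_model: Callable[[str], str] = fake_model,
            max_words: int = 150) -> dict[str, dict[str, str]]:
    """Render every prompt variant with the SAME context and collect the outputs."""
    results = {}
    for name, template in variants.items():
        prompt = template.format(context=context, max_words=max_words)
        results[name] = {"prompt": prompt, "output": call_model(prompt)}
    return results


if __name__ == "__main__":
    upstream = "Research notes: 7 days in Spain, food-focused, mid-range budget."
    for name, result in compare(upstream, {"A": PROMPT_A, "B": PROMPT_B}).items():
        print(name, "->", result["output"])
```

Holding the context fixed is the point of the comparison: any difference in output can then be attributed to the prompt change rather than to different upstream agent results.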
- 1:09:59 – 1:30:04
Evals: from vibe coding to measurable quality (LLM-as-judge + human labels)
Aman explains how to evaluate changes systematically using datasets, experiments, and evaluators. They demonstrate LLM-as-judge evals (friendliness, discount presence), discuss best practices (text labels over numeric), and show aligning judges with human annotations.
- •Build or bootstrap datasets (hand-curated or synthetic) to test changes at scale
- •Three eval types: human labels, code-based checks, LLM-as-judge grading
- •Best practice: use text-based labels; include explanations for auditability
- •Meta-eval: compare judge outputs vs human labels to detect misalignment
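The eval ideas above can be sketched in a few lines: a code-based check that returns a text label, an LLM-as-judge prompt that asks for a label plus an explanation, and a meta-eval that measures how often the judge agrees with human labels. The label names, the regular expression, and the judge prompt wording are assumptions; no model is called, so the judge labels in the usage example are made up for illustration.

```python
import re


def has_discount(text: str) -> str:
    """Code-based check: return a text label rather than a numeric score."""
    pattern = r"\bdiscounts?\b|\d+\s*% off"
    return "discount_present" if re.search(pattern, text, re.IGNORECASE) else "discount_missing"


# LLM-as-judge prompt: asks for a text label AND an explanation for auditability.
JUDGE_TEMPLATE = (
    "You are grading a travel itinerary for tone.\n"
    "Answer with one label, 'friendly' or 'not_friendly', "
    "followed by a one-sentence explanation.\n\n"
    "Itinerary:\n{output}"
)


def build_judge_prompt(output: str) -> str:
    return JUDGE_TEMPLATE.format(output=output)


def agreement(judge_labels: list[str], human_labels: list[str]) -> tuple[float, list[int]]:
    """Meta-eval: agreement rate plus the row indexes where judge and human disagree."""
    if len(judge_labels) != len(human_labels):
        raise ValueError("label lists must have the same length")
    if not judge_labels:
        raise ValueError("need at least one labeled example")
    disagreements = [i for i, (j, h) in enumerate(zip(judge_labels, human_labels)) if j != h]
    rate = 1 - len(disagreements) / len(judge_labels)
    return rate, disagreements


if __name__ == "__main__":
    print(has_discount("Sign up with your email for 10% off your first booking."))
    judge = ["friendly", "friendly", "not_friendly"]      # made-up judge outputs
    human = ["friendly", "not_friendly", "not_friendly"]  # made-up human annotations
    rate, rows = agreement(judge, human)
    print(f"agreement={rate:.2f}, review rows {rows}")
```

The disagreement rows are the useful output: reading those examples is how you decide whether to fix the judge prompt or revisit the human labels.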
- 1:30:04 – 1:38:02
How Bolt/Lovable ‘magic’ works: system prompt + tools + code rendering
They deconstruct Bolt as an example of an AI product pipeline: a strong system prompt, implicit tool constraints, code generation, execution, and iteration. The takeaway is that many “magical” products are understandable once you inspect prompts, tooling, and runtime limits.
- •Bolt relies heavily on a carefully crafted system prompt and environment constraints
- •Implicit tool calling via instructions (framework choices, terminal model)
- •Generated code is executed and rendered; errors trigger iterative repair
- •Opportunity: integrate evals to reduce code-generation mistakes and boost reliability
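The "execute, catch the error, feed it back" loop described above can be sketched as follows. This is an assumption about the general pattern, not Bolt's actual implementation: `fake_generate` is a scripted stand-in for a model, and `exec` is used only because the example is self-contained (a real product would run generated code in a sandbox).

```python
def run_candidate(source: str):
    """Run generated code; return None on success or an error string to feed back."""
    try:
        # WARNING: exec on untrusted code is unsafe outside a sandbox.
        exec(compile(source, "<generated>", "exec"), {})
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def repair_loop(generate, task: str, max_attempts: int = 3):
    """Ask for code, run it, and pass any error back to the generator until it runs."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        source = generate(task, feedback)
        feedback = run_candidate(source)
        if feedback is None:
            return source, attempt
    raise RuntimeError(f"still failing after {max_attempts} attempts: {feedback}")


def fake_generate(task: str, feedback):
    """Scripted stand-in for a model: first answer is buggy, second one is fixed."""
    if feedback is None:
        return "total = price * 2"            # fails: price is undefined
    return "price = 10\ntotal = price * 2"    # the "repaired" version


if __name__ == "__main__":
    code, attempts = repair_loop(fake_generate, "compute a total")
    print(f"succeeded after {attempts} attempt(s):\n{code}")
```

An eval step would slot in next to `run_candidate`: code that runs without raising is only the minimum bar, so further checks on the output could be added before accepting a candidate.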
- 1:38:02 – 1:43:08
RAG vs prompt engineering vs fine-tuning: choosing the right lever
Aman provides a practical decision framework: use prompt engineering for behavior and tone, RAG for injecting relevant knowledge, and fine-tuning for specialization, reliability, cost, or latency. He also weighs effort vs impact and offers memorable mental models.
- •Prompt engineering: low effort, high impact for instructions/tone/output shape
- •RAG: adds context from a corpus; retrieval quality becomes the hard problem
- •Fine-tuning: changes the model layer; higher effort and tradeoffs in generality
- •PM framing: effort vs impact; prompts often yield big eval gains quickly
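To illustrate how RAG differs from prompt engineering alone, here is a toy sketch that retrieves documents by keyword overlap and injects them into the prompt. Real systems typically use embeddings and a vector store; the overlap scoring, the made-up `CORPUS`, and the helpers `retrieve` and `build_prompt` are simplifying assumptions.

```python
import re

# Made-up documents for illustration only.
CORPUS = [
    "Cancellation policy: bookings can be cancelled up to 14 days before departure.",
    "Valencia note: the example cooking class runs on weekends.",
    "Madrid note: the example walking tour starts at noon.",
]


def tokens(text: str) -> set[str]:
    """Lowercase word set used for the naive overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return up to k documents sharing the most words with the query."""
    q = tokens(query)
    ranked = sorted(corpus, key=lambda doc: len(q & tokens(doc)), reverse=True)
    return [doc for doc in ranked[:k] if q & tokens(doc)]


def build_prompt(query: str, corpus: list[str], k: int = 2) -> str:
    """Prompt engineering sets the instruction; retrieval supplies the knowledge."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, corpus, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    print(build_prompt("What is the cancellation policy?", CORPUS, k=1))
```

Even in this toy version, the quality of the answer depends on whether `retrieve` surfaces the right document, which is the sense in which retrieval quality becomes the hard problem.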
- 1:43:08 – 1:48:18
Working with AI engineers/researchers: evals as requirements and shared tooling
The final skill is collaboration: AI PMs need to operate closer to the data, traces, and evaluation metrics that drive model behavior. Aman argues PMs should write/own evals because they encode the desired user experience better than long PRDs.
- •Stakeholders expand beyond engineers to AI engineers/data scientists/researchers
- •PMs should judge outputs, label data, and define ‘good vs bad’ via evals
- •Shift from PRDs to ‘evals as requirements’ to drive iterative improvement
- •Shared tooling (traces, prompt hubs, datasets) improves team alignment and velocity
- 1:48:18 – 2:04:35
What not to do + a 2-hour/week plan + the AI PM job market reality
Aman closes with anti-patterns and a realistic learning plan: try tools, build intuition by tearing down products, then apply it in small side projects. They address skepticism about AI PM roles, citing job volume and compensation, and end with where to follow Aman and his course.
- •Don’t wait for models to get better; ship small side projects to learn continuously
- •Don’t over-automate immediately; use AI as a ‘second brain’ and pressure-test ideas
- •2-hour routine: try tools → build intuition → apply via a small build loop
- •AI PM roles exist (often ‘PM, AI’); reported pay premiums and strong demand
