CHAPTERS
Why “AI PM” is becoming the default (and safer) PM path
Aman frames the shifting expectations on PMs and argues that most PMs will become some flavor of AI PM: either AI-powered in their workflow, or an AI product PM building AI into products. They also discuss why AI-related product work appears more resilient amid layoffs and why labeling yourself an “X × AI PM” is more accurate than a full rebrand.
The 5-step AI PM skill stack overview (the roadmap for the episode)
Aakash and Aman outline a structured crash course: AI prototyping, observability, evals, understanding RAG vs fine-tuning vs prompt engineering, and collaborating with AI engineers/researchers. This sets up the roadmap for the rest of the discussion and clarifies how each skill builds on the previous one.
AI prototyping tool choice: why Cursor (vs Bolt, Lovable, Replit, v0, etc.)
Aman compares popular AI prototyping tools and argues Cursor’s strength is control and flexibility once you go beyond quick mockups. He acknowledges a learning curve but emphasizes Cursor becomes essential for deeper work like agents, debugging, and iterating on real components.
Live build: creating an agentic Trip Planner (LangGraph) with a React frontend
They demonstrate Cursor’s agent mode to generate a full-stack prototype: a LangGraph-based multi-agent trip planner with a UI. The segment highlights the minimal prompting needed, iterative workflow, and how the agent plans, creates directories, writes files, and installs dependencies.
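To make the architecture concrete, here is a minimal sketch (not the episode’s actual code) of what a LangGraph trip planner could look like, assuming the langgraph package and stubbing out the LLM calls:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class TripState(TypedDict):
    destination: str
    research: str
    itinerary: str
    budget: str

# Each "agent" is just a node function that reads state and returns updates.
def research_agent(state: TripState) -> dict:
    return {"research": f"key sights and logistics for {state['destination']}"}  # stub for an LLM call

def planner_agent(state: TripState) -> dict:
    return {"itinerary": f"day-by-day plan built from: {state['research']}"}

def budget_agent(state: TripState) -> dict:
    return {"budget": "estimated daily costs for the plan"}

graph = StateGraph(TripState)
graph.add_node("research", research_agent)
graph.add_node("planner", planner_agent)
graph.add_node("budget", budget_agent)
graph.set_entry_point("research")
graph.add_edge("research", "planner")
graph.add_edge("planner", "budget")
graph.add_edge("budget", END)

app = graph.compile()
print(app.invoke({"destination": "Lisbon", "research": "", "itinerary": "", "budget": ""}))
```

In the demo, Cursor’s agent generates a fuller version of this plus the React frontend; the point is that the multi-agent structure is just a graph of functions.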
Debugging as a PM skill: dependency conflicts, ports, and iterative recovery
The build hits common real-world issues (missing modules, yanked versions, Node/Python conflicts, ports in use). Aman models a pragmatic workflow: copy/paste terminal errors back into the agent, simplify scope when needed, and keep moving—building comfort with “breaking and fixing” as part of AI prototyping.
Demoing the working prototype: agents, roles, and user experience in the Trip Planner
They run the finished UI and explain where “agents” exist in the architecture (research, planning, budgeting, local curator). The output illustrates how a form-driven interface wraps prompts into a programmatic product surface, and how PMs can tweak the UI and fields to evolve the experience.
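As an illustration of that “form wraps prompts” idea (a hypothetical sketch, not code from the episode), the UI fields can simply be interpolated into the prompt the agents receive:

```python
# Hypothetical: UI form fields folded into the prompt behind the scenes.
# Every field a PM adds, removes, or renames changes what the model sees.
PROMPT_TEMPLATE = """You are a travel planner.
Destination: {destination}
Dates: {start_date} to {end_date}
Budget: {budget} USD
Interests: {interests}

Produce a day-by-day itinerary that stays within the budget."""

def build_prompt(form: dict) -> str:
    return PROMPT_TEMPLATE.format(**form)

print(build_prompt({
    "destination": "Kyoto",
    "start_date": "2025-04-01",
    "end_date": "2025-04-05",
    "budget": 1500,
    "interests": "food, temples",
}))
```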
Observability 101: tracing the agent system to see what’s happening under the hood
Aman introduces observability as the next layer after prototyping: tracing reveals what calls happen, in what order, and how agents interact. They show a graphical DAG-style view of the multi-agent system, emphasizing that without tracing you can’t confidently debug latency, failures, or unexpected behavior.
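Real teams would use a tracing platform (LangSmith, Langfuse, and similar tools), but a hand-rolled sketch shows the core idea: record what was called, in what order, and how long it took:

```python
import functools
import time

TRACE: list[dict] = []  # in-memory stand-in for a tracing backend

def traced(agent_name: str):
    """Wrap an agent so every call logs its order and latency."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            TRACE.append({"agent": agent_name,
                          "latency_s": round(time.perf_counter() - start, 3)})
            return result
        return inner
    return wrap

@traced("research")
def research_agent(destination: str) -> str:
    time.sleep(0.1)  # stand-in for an LLM call
    return f"notes on {destination}"

research_agent("Lisbon")
print(TRACE)  # the call sequence and per-agent latency you debug against
```

Without this layer, a slow or failing agent inside the DAG is invisible.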
Prompt iteration with observability: A/B models and prompts to improve latency and tone
They use a prompt playground workflow to modify itinerary generation: shorten outputs, change tone, and add an email/discount call-to-action. By comparing model choices and prompt variants, they demonstrate how small instruction changes can drastically affect output quality and latency.
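A rough sketch of that A/B loop (call_model, the model names, and the prompts here are placeholders, not the episode’s):

```python
import time

def call_model(model: str, prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    time.sleep(0.05)
    return f"[{model}] reply to: {prompt[:40]}..."

VARIANTS = {
    "baseline": "Write a detailed 5-day itinerary for {city}.",
    "short+cta": ("Write a 5-day itinerary for {city} in under 150 words, "
                  "warm tone. Close by offering an email discount code."),
}

for model in ("model-a", "model-b"):  # hypothetical model names
    for name, template in VARIANTS.items():
        start = time.perf_counter()
        call_model(model, template.format(city="Lisbon"))
        print(f"{model:8s} {name:9s} {time.perf_counter() - start:.2f}s")
```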
From vibe coding to eval-driven development: building datasets and running experiments
Aman explains that beyond subjective “looks good,” teams need evals to quantify whether changes improve the system. They create/describe a dataset of representative cases, run experiments across versions, and position dataset bootstrapping (including synthetic generation) as a practical starting point.
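A toy version of that harness (the dataset, checks, and generate function are invented for illustration):

```python
# A few representative cases; real datasets mix logged traffic,
# hand-written edge cases, and synthetic generations.
DATASET = [
    {"input": "3 days in Tokyo, $500 budget", "must_mention": "budget"},
    {"input": "family weekend in Rome", "must_mention": "day"},
]

def generate(version: str, request: str) -> str:
    """Placeholder for the system under test (prompt v1 vs v2)."""
    return f"({version}) day-by-day itinerary for {request} with budget notes"

def run_experiment(version: str) -> float:
    passed = sum(case["must_mention"] in generate(version, case["input"]).lower()
                 for case in DATASET)
    return passed / len(DATASET)

for version in ("v1", "v2"):
    print(version, f"{run_experiment(version):.0%}")  # compare versions, not vibes
```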
Evals types: human labels, code checks, and LLM-as-judge (plus best practices)
They break evals into three approaches and focus on LLM-as-judge for scalable feedback. Aman shows how to structure judge prompts (role/context/examples/action), why textual labels can be more reliable than numeric scales, and how explanations help diagnose judge behavior.
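Here is what a judge prompt following that role/context/examples/action structure might look like (an assumed template, using textual labels and a required explanation):

```python
JUDGE_PROMPT = """Role: You are a strict evaluator of travel itineraries.

Context: The itinerary below was generated for this user request:
{request}

Examples:
- "Day 1: Louvre. Day 2: Louvre." -> bad (repetitive, no logistics)
- "Day 1: Louvre am, Seine walk pm (~$40)." -> good (specific, budgeted)

Action: Label the itinerary 'good' or 'bad' and explain why in one
sentence, so mistakes in the judge itself are easy to diagnose.

Itinerary:
{itinerary}"""

def judge(request: str, itinerary: str) -> str:
    prompt = JUDGE_PROMPT.format(request=request, itinerary=itinerary)
    # Placeholder: send `prompt` to an LLM and parse the textual label.
    return "good"
```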
Calibrating the judge: comparing human labels vs LLM judge and iterating alignment
Aman demonstrates the need to evaluate the evaluator—comparing human judgments to judge outputs to detect misalignment. They discuss that PMs should be in the loop labeling “good vs bad” and hint at emerging workflows (“prompt learning”) that optimize prompts using human feedback.
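Calibration can start as simply as measuring agreement on a shared set of cases (illustrative numbers only):

```python
# Human labels vs LLM-judge labels on the same cases.
human = ["good", "bad", "good", "good", "bad"]
llm_judge = ["good", "bad", "bad", "good", "bad"]

agreement = sum(h == j for h, j in zip(human, llm_judge)) / len(human)
print(f"judge/human agreement: {agreement:.0%}")  # 80% in this toy set

# Disagreements are where the judge prompt needs another iteration.
print("review cases:", [i for i, (h, j) in enumerate(zip(human, llm_judge)) if h != j])
```

A confusion matrix or Cohen’s kappa would be the natural next step once the labeled set grows.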
Demystifying RAG vs prompt engineering vs fine-tuning (effort vs impact)
They clarify when to use each technique: prompting for instructions/tone, RAG for dynamic knowledge/context, fine-tuning for specialization/reliability/cost/latency tradeoffs. Aman adds practical mental models (instructions to an employee, doctor with textbook, specialization after college) and stresses prompt engineering’s outsized ROI.
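To ground the RAG side of that comparison, a toy retrieval step (word overlap standing in for embeddings and a vector store) shows how dynamic knowledge reaches the prompt:

```python
KNOWLEDGE = [
    "Lisbon's Tram 28 is least crowded before 9am.",
    "Tokyo subway day passes cover most city sights.",
]

def retrieve(query: str) -> str:
    # Toy scoring: count shared words. Real RAG uses embedding similarity.
    overlap = lambda doc: len(set(query.lower().split()) & set(doc.lower().split()))
    return max(KNOWLEDGE, key=overlap)

def build_prompt(query: str) -> str:
    return f"Using only this context: {retrieve(query)}\nAnswer: {query}"

print(build_prompt("When should I ride Tram 28 in Lisbon?"))
```

Prompting changes the instructions, RAG changes the available knowledge, and fine-tuning changes the model itself.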
Working with AI engineers & researchers: requirements become evals, and PMs go deeper in data
Aman argues AI PMs must collaborate differently: understand system behavior, read traces, participate in data labeling, and communicate in the same artifacts engineers use (not just PRDs). He reframes evals as the new “requirements doc,” aligning teams around measurable quality.
Common mistakes and the 2-hours-a-week plan to become an AI PM
They close with actionable career guidance: build side projects to develop real intuition, don’t wait for “better models,” and avoid over-automating your job—use AI as a second brain to challenge thinking. Aman offers a simple 2-hours/week loop: try tools, build intuition by deconstructing products, then apply learning in a small project.
AI PM jobs reality check: titles, demand, and compensation premium
They address skepticism about AI PM job availability: roles may be labeled “PM, AI” rather than “AI PM,” but demand is visible in job searches. They argue positioning as an AI-capable PM future-proofs careers and cite claims of higher comp for AI-oriented PM roles.