CHAPTERS
Why “AI PM” is becoming the default (and safer) PM path
Aman frames the shifting expectations on PMs and argues that most PMs will become some flavor of AI PM: either AI-powered in their workflow, or an AI product PM building AI into products. They also discuss why AI-related product work appears more resilient amid layoffs and why labeling yourself an “X × AI PM” is more accurate than a full rebrand.
The 5-step AI PM skill stack overview (the roadmap for the episode)
Aakash and Aman outline a structured crash course: AI prototyping, observability, evals, understanding RAG vs fine-tuning vs prompt engineering, and collaborating with AI engineers/researchers. This sets up the roadmap for the rest of the discussion and clarifies how each skill builds on the previous one.
AI prototyping tool choice: why Cursor (vs Bolt, Lovable, Replit, v0, etc.)
Aman compares popular AI prototyping tools and argues Cursor’s strength is control and flexibility once you go beyond quick mockups. He acknowledges a learning curve but emphasizes Cursor becomes essential for deeper work like agents, debugging, and iterating on real components.
Live build: creating an agentic Trip Planner (LangGraph) with a React frontend
They demonstrate Cursor’s agent mode to generate a full-stack prototype: a LangGraph-based multi-agent trip planner with a UI. The segment highlights the minimal prompting needed, iterative workflow, and how the agent plans, creates directories, writes files, and installs dependencies.
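To make the architecture concrete, here is a minimal sketch (not the episode’s actual code) of what a LangGraph trip planner could look like, assuming the langgraph package and stubbing out the LLM calls:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class TripState(TypedDict):
    destination: str
    research: str
    itinerary: str
    budget: str

# Each "agent" is just a node function that reads state and returns updates.
def research_agent(state: TripState) -> dict:
    return {"research": f"key sights and logistics for {state['destination']}"}  # stub for an LLM call

def planner_agent(state: TripState) -> dict:
    return {"itinerary": f"day-by-day plan built from: {state['research']}"}

def budget_agent(state: TripState) -> dict:
    return {"budget": "estimated daily costs for the plan"}

graph = StateGraph(TripState)
graph.add_node("research", research_agent)
graph.add_node("planner", planner_agent)
graph.add_node("budget", budget_agent)
graph.set_entry_point("research")
graph.add_edge("research", "planner")
graph.add_edge("planner", "budget")
graph.add_edge("budget", END)

app = graph.compile()
print(app.invoke({"destination": "Lisbon", "research": "", "itinerary": "", "budget": ""}))
```

In the demo, Cursor’s agent generates a fuller version of this plus the React frontend; the point is that the multi-agent structure is just a graph of functions.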
Debugging as a PM skill: dependency conflicts, ports, and iterative recovery
The build hits common real-world issues (missing modules, yanked versions, Node/Python conflicts, ports in use). Aman models a pragmatic workflow: copy/paste terminal errors back into the agent, simplify scope when needed, and keep moving—building comfort with “breaking and fixing” as part of AI prototyping.
Demoing the working prototype: agents, roles, and user experience in the Trip Planner
They run the finished UI and explain where “agents” exist in the architecture (research, planning, budgeting, local curator). The output illustrates how a form-driven interface wraps prompts into a programmatic product surface, and how PMs can tweak the UI and fields to evolve the experience.
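As an illustration of that “form wraps prompts” idea (a hypothetical sketch, not code from the episode), the UI fields can simply be interpolated into the prompt the agents receive:

```python
# Hypothetical: UI form fields folded into the prompt behind the scenes.
# Every field a PM adds, removes, or renames changes what the model sees.
PROMPT_TEMPLATE = """You are a travel planner.
Destination: {destination}
Dates: {start_date} to {end_date}
Budget: {budget} USD
Interests: {interests}

Produce a day-by-day itinerary that stays within the budget."""

def build_prompt(form: dict) -> str:
    return PROMPT_TEMPLATE.format(**form)

print(build_prompt({
    "destination": "Kyoto",
    "start_date": "2025-04-01",
    "end_date": "2025-04-05",
    "budget": 1500,
    "interests": "food, temples",
}))
```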
Observability 101: tracing the agent system to see what’s happening under the hood
Aman introduces observability as the next layer after prototyping: tracing reveals what calls happen, in what order, and how agents interact. They show a graphical DAG-style view of the multi-agent system, emphasizing that without tracing you can’t confidently debug latency, failures, or unexpected behavior.
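Real teams would use a tracing platform (LangSmith, Langfuse, and similar tools), but a hand-rolled sketch shows the core idea: record what was called, in what order, and how long it took:

```python
import functools
import time

TRACE: list[dict] = []  # in-memory stand-in for a tracing backend

def traced(agent_name: str):
    """Wrap an agent so every call logs its order and latency."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            TRACE.append({"agent": agent_name,
                          "latency_s": round(time.perf_counter() - start, 3)})
            return result
        return inner
    return wrap

@traced("research")
def research_agent(destination: str) -> str:
    time.sleep(0.1)  # stand-in for an LLM call
    return f"notes on {destination}"

research_agent("Lisbon")
print(TRACE)  # the call sequence and per-agent latency you debug against
```

Without this layer, a slow or failing agent inside the DAG is invisible.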
Prompt iteration with observability: A/B models and prompts to improve latency and tone
They use a prompt playground workflow to modify itinerary generation: shorten outputs, change tone, and add an email/discount call-to-action. By comparing model choices and prompt variants, they demonstrate how small instruction changes can drastically affect output quality and latency.
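A rough sketch of that A/B loop (call_model, the model names, and the prompts here are placeholders, not the episode’s):

```python
import time

def call_model(model: str, prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    time.sleep(0.05)
    return f"[{model}] reply to: {prompt[:40]}..."

VARIANTS = {
    "baseline": "Write a detailed 5-day itinerary for {city}.",
    "short+cta": ("Write a 5-day itinerary for {city} in under 150 words, "
                  "warm tone. Close by offering an email discount code."),
}

for model in ("model-a", "model-b"):  # hypothetical model names
    for name, template in VARIANTS.items():
        start = time.perf_counter()
        call_model(model, template.format(city="Lisbon"))
        print(f"{model:8s} {name:9s} {time.perf_counter() - start:.2f}s")
```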
From vibe coding to eval-driven development: building datasets and running experiments
Aman explains that beyond subjective “looks good,” teams need evals to quantify whether changes improve the system. They create/describe a dataset of representative cases, run experiments across versions, and position dataset bootstrapping (including synthetic generation) as a practical starting point.
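A toy version of that harness (the dataset, checks, and generate function are invented for illustration):

```python
# A few representative cases; real datasets mix logged traffic,
# hand-written edge cases, and synthetic generations.
DATASET = [
    {"input": "3 days in Tokyo, $500 budget", "must_mention": "budget"},
    {"input": "family weekend in Rome", "must_mention": "day"},
]

def generate(version: str, request: str) -> str:
    """Placeholder for the system under test (prompt v1 vs v2)."""
    return f"({version}) day-by-day itinerary for {request} with budget notes"

def run_experiment(version: str) -> float:
    passed = sum(case["must_mention"] in generate(version, case["input"]).lower()
                 for case in DATASET)
    return passed / len(DATASET)

for version in ("v1", "v2"):
    print(version, f"{run_experiment(version):.0%}")  # compare versions, not vibes
```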
Evals types: human labels, code checks, and LLM-as-judge (plus best practices)
They break evals into three approaches and focus on LLM-as-judge for scalable feedback. Aman shows how to structure judge prompts (role/context/examples/action), why textual labels can be more reliable than numeric scales, and how explanations help diagnose judge behavior.
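Here is what a judge prompt following that role/context/examples/action structure might look like (an assumed template, using textual labels and a required explanation):

```python
JUDGE_PROMPT = """Role: You are a strict evaluator of travel itineraries.

Context: The itinerary below was generated for this user request:
{request}

Examples:
- "Day 1: Louvre. Day 2: Louvre." -> bad (repetitive, no logistics)
- "Day 1: Louvre am, Seine walk pm (~$40)." -> good (specific, budgeted)

Action: Label the itinerary 'good' or 'bad' and explain why in one
sentence, so mistakes in the judge itself are easy to diagnose.

Itinerary:
{itinerary}"""

def judge(request: str, itinerary: str) -> str:
    prompt = JUDGE_PROMPT.format(request=request, itinerary=itinerary)
    # Placeholder: send `prompt` to an LLM and parse the textual label.
    return "good"
```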
Calibrating the judge: comparing human labels vs LLM judge and iterating alignment
Aman demonstrates the need to evaluate the evaluator—comparing human judgments to judge outputs to detect misalignment. They discuss that PMs should be in the loop labeling “good vs bad” and hint at emerging workflows (“prompt learning”) that optimize prompts using human feedback.
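Calibration can start as simply as measuring agreement on a shared set of cases (illustrative numbers only):

```python
# Human labels vs LLM-judge labels on the same cases.
human = ["good", "bad", "good", "good", "bad"]
llm_judge = ["good", "bad", "bad", "good", "bad"]

agreement = sum(h == j for h, j in zip(human, llm_judge)) / len(human)
print(f"judge/human agreement: {agreement:.0%}")  # 80% in this toy set

# Disagreements are where the judge prompt needs another iteration.
print("review cases:", [i for i, (h, j) in enumerate(zip(human, llm_judge)) if h != j])
```

A confusion matrix or Cohen’s kappa would be the natural next step once the labeled set grows.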
Demystifying RAG vs prompt engineering vs fine-tuning (effort vs impact)
They clarify when to use each technique: prompting for instructions/tone, RAG for dynamic knowledge/context, fine-tuning for specialization/reliability/cost/latency tradeoffs. Aman adds practical mental models (instructions to an employee, doctor with textbook, specialization after college) and stresses prompt engineering’s outsized ROI.
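To ground the RAG side of that comparison, a toy retrieval step (word overlap standing in for embeddings and a vector store) shows how dynamic knowledge reaches the prompt:

```python
KNOWLEDGE = [
    "Lisbon's Tram 28 is least crowded before 9am.",
    "Tokyo subway day passes cover most city sights.",
]

def retrieve(query: str) -> str:
    # Toy scoring: count shared words. Real RAG uses embedding similarity.
    overlap = lambda doc: len(set(query.lower().split()) & set(doc.lower().split()))
    return max(KNOWLEDGE, key=overlap)

def build_prompt(query: str) -> str:
    return f"Using only this context: {retrieve(query)}\nAnswer: {query}"

print(build_prompt("When should I ride Tram 28 in Lisbon?"))
```

Prompting changes the instructions, RAG changes the available knowledge, and fine-tuning changes the model itself.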
Working with AI engineers & researchers: requirements become evals, and PMs go deeper in data
Aman argues AI PMs must collaborate differently: understand system behavior, read traces, participate in data labeling, and communicate in the same artifacts engineers use (not just PRDs). He reframes evals as the new “requirements doc,” aligning teams around measurable quality.
Common mistakes and the 2-hours-a-week plan to become an AI PM
They close with actionable career guidance: build side projects to develop real intuition, don’t wait for “better models,” and avoid over-automating your job—use AI as a second brain to challenge thinking. Aman offers a simple 2-hours/week loop: try tools, build intuition by deconstructing products, then apply learning in a small project.
AI PM jobs reality check: titles, demand, and compensation premium
They address skepticism about AI PM job availability: roles may be labeled “PM, AI” rather than “AI PM,” but demand is visible in job searches. They argue positioning as an AI-capable PM future-proofs careers and cite claims of higher comp for AI-oriented PM roles.