Aakash Gupta

If you only have 2 hrs, this is how to become an AI PM

Every PM has to build AI features these days. And that means a completely new skill set: AI prototyping, observability akin to telemetry, AI evals as the new PRDs, understanding RAG vs fine-tuning vs prompt engineering, and working with AI engineers. So this week, I bring you a 2-hour crash course on becoming a better AI PM.

I teamed up with Aman Khan. When it comes to people creating AI PM content, Aman is amongst the most insightful and informed. And that's because he's been an AI PM since 2019. He worked at Cruise on self-driving cars. He's worked with Spotify on their AI systems. And now he works at Arize, one of the leading observability and evals companies.

🎥 Timestamps:
Can Anyone Become AIPM? - 0:00
5 AIPM Skills Overview - 5:52
Skill 1: AI Prototyping - 6:31
Ad: Miro - 13:35
Ad: Atlassian - 14:50
Building Trip Planner Agent - 15:27
Ad: Maven - 29:46
Ad: Amplitude - 30:40
Skill 2: Observability - 50:34
Skill 3: Evals - 1:10:10
RAG vs Fine-Tuning vs Prompt Engineering - 1:29:54
Bolt Teardown - 1:30:32
Skill 5: Working With Engineers - 1:43:24
Don't Make These Mistakes - 1:48:33
2 Hours Weekly Plan - 1:53:55
AIPM Jobs Exist - 1:57:45
Aman's Resources - 2:00:48
Outro - 2:04:00

Podcast transcript: https://www.news.aakashg.com/p/aman-khan-podcast

💼 Check out our sponsors:
1. Miro: The innovation workspace is your team’s new canvas - http://miro.pxf.io/PO4WZX
2. Jira Product Discovery: Plan with purpose, ship with confidence - https://www.atlassian.com/software/jira/product-discovery
3. Maven: Get $100 off Aman’s course with my code ‘AAKASHxMAVEN’ - https://maven.com/aman-khan/thriving-as-an-ai-pm?utm_campaign=aakash-gupta&utm_medium=affiliate&utm_source=maven&promoCode=AAKASHxMAVEN
4. Amplitude: Test out the #1 product analytics and replay tool in the market - https://bit.ly/4hl25RG

👀 Where to Find Aman:
LinkedIn: https://www.linkedin.com/in/amanberkeley/
X: https://x.com/_amankhan
Substack: https://amankhan1.substack.com/
Company: https://arize.com/
Course: https://maven.com/aman-khan/thriving-as-an-ai-pm?utm_campaign=aakash-gupta&utm_medium=affiliate&utm_source=maven&promoCode=AAKASHxMAVEN

👨‍💻 Where to find Aakash:
Twitter: https://www.twitter.com/aakashg0
LinkedIn: https://www.linkedin.com/in/aagupta/
Instagram: https://www.instagram.com/aakashg0/

🔑 Key Takeaways:
1. Cursor beats Bolt for serious AI PMs. While Bolt is great for quick mockups, Cursor gives you the control you need to build real agent systems and understand what's happening under the hood.
2. Observability comes before evals. Just like regular products need telemetry for analytics, AI products need traces for evals. Point Cursor to documentation and it adds what you need.
3. Vibe coding doesn't scale. Looking at outputs and deciding if they "feel good" works for prototypes, but not production. You need systematic evals to measure what "good" actually means.
4. Most PMs fine-tune too early. Aman showed a prompt outperforming a fine-tuned model. Start with prompting (95% of results), add RAG for external data, only fine-tune for cost/speed.
5. Your evals need evals. When your LLM judge marks outputs as "friendly" while your human labels say "robotic," that mismatch tells you exactly where to improve your system.
6. Use text labels, not numbers. LLMs understand "friendly vs robotic" better than 1-5 scales. They're trained on language, not mathematics.
7. AI engineers want data, not docs. Stop sending Google Docs with requirements. They want you labeling datasets and defining success through evals.
8. Bolt is just a really good prompt. Aman tore down Bolt's architecture: it's system prompts + tool calling + code generation. The "magic" isn't magic.
9. Side projects are your interview hack. When Aman asks "What are you building?" he can immediately gauge curiosity, initiative, and hands-on experience.
10. Don't automate yourself too early. Use AI as a second brain for analysis, but don't try to automate your entire job. Learn to work with reasoning models to push your thinking.

Aman Khan (guest), Aakash Gupta (host)
Jun 15, 2025 · 2h 4m

CHAPTERS

  1. 0:00 – 2:20

    Why AI PM skills are becoming career-critical

    Aman frames the shifting expectations for product managers and argues that AI PM teams are less likely to be cut amid broader layoffs. The conversation sets the stakes: PM roles are changing, and AI competence is becoming a differentiator.

    • PM expectations are rising as AI changes workflows and products
    • AI-focused PM work is viewed as more resilient during layoffs
    • The market is rewarding PMs who can build with AI now
    • AI is positioned as an accelerator, not a replacement, for PM craft
  2. 2:20 – 4:22

    Can anyone become an AI PM? Defining “AI-powered PM” vs “AI product PM”

    Aman answers whether any PM can become an AI PM by redefining the term. He distinguishes using AI to enhance PM work from building AI into products, and emphasizes AI as an overlay on domain expertise rather than a standalone identity.

    • Two modes: adopting AI in PM workflow vs building AI features
    • AI PM is an intersection (e.g., fintech × AI), not a separate track
    • Domain knowledge remains core; AI amplifies execution
    • Most PMs will become some flavor of AI PM over time
  3. 4:22 – 5:52

    Real-world AI use cases across “edge case” PM roles

    Aakash challenges common objections (regulated industries, internal tools) by sharing recent examples of PMs implementing AI anyway. Aman encourages embracing the label early to lead within your domain as hiring expectations evolve.

    • Internal tools: LLMs standardize messy PRDs/templates into structured hypotheses/metrics
    • Finance: AI-driven credit models and decisioning are actively being built
    • Hesitation about “hype” vs pragmatic career positioning
    • Earlier adoption helps you become the AI leader in your product domain
  4. 5:52 – 6:31

    The 5-step AI PM crash course roadmap

    Aakash introduces the episode’s structure: five practical skill areas to become effective as an AI PM. This becomes the backbone for the rest of the discussion and demo.

    • Step 1: AI prototyping (core hands-on skill)
    • Step 2: Observability (understanding what’s happening under the hood)
    • Step 3: Evals (measuring quality systematically)
    • Step 4: Prompting vs RAG vs fine-tuning (choosing the right lever)
    • Step 5: Working with AI engineers and researchers
  5. 6:31 – 8:47

    Why Cursor for prototyping (vs Bolt/Lovable/Replit/v0)

    Aman explains why he prefers Cursor as a PM prototyping environment: more control, deeper iteration, and better extensibility for agentic systems. They compare strengths of other tools and clarify when quick UI generators are enough.

    • Cursor is a VS Code fork optimized for AI-assisted coding and iteration
    • Other tools excel at fast mocks (front-end, deployment, Python agents), but offer less control
    • Cursor shines when you want to implement agents, customize components, and manage codebase complexity
    • There’s a learning curve, but higher long-term leverage for PMs
  6. 8:47 – 25:22

    Live build: Trip Planner agent in Cursor using LangGraph + UI

    They start from scratch and prompt Cursor’s agent to create a Trip Planner using LangGraph with a front-end. Aman explains key shortcuts, model choice (Claude 4), and how lightweight prompting plus iteration drives progress.

    • Key Cursor actions: terminal vs agent commands; natural language terminal guidance
    • Agent creates folders, installs dependencies, writes backend + React frontend
    • @web-style browsing/context enables the agent to reference docs/tutorials
    • Iterative workflow: ask, inspect, adjust—rather than perfect upfront prompts
  7. 25:22 – 45:51

    Debugging with an agent: dependency errors, ports, and resilience

    The demo hits realistic issues—missing modules, conflicting requirements, Node warnings, ports in use. Aman shows the core PM prototyping loop: copy errors, paste to agent, let it fix, and keep moving until it runs.

    • Use terminal errors as feedback; paste them into the agent to diagnose/fix
    • Expect Python/Node dependency friction when going beyond no-code tools
    • Agents can recover from interruptions and resume work with minimal guidance
    • Best practice: prototype on greenfield projects vs modifying complex production repos
  8. 45:51 – 50:15

    Testing the prototype: what makes it ‘agentic’ and how outputs are composed

    With the UI running, they clarify where agents live in the system and how responsibilities are divided (research, planning, budget, local curation). They test a Spain itinerary and discuss how turning a prompt into a form-based product changes usability and extensibility.

    • Agents = specialized prompt/context packages for sub-tasks (research, budget, planner, curator)
    • Form inputs become structured parameters; output becomes more productized than chat
    • Prototype can evolve: add search, APIs (booking), different UI flows
    • Demonstrates speed vs manual planning and multi-search workflows
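The idea that agents are specialized prompt packages fed by structured form inputs can be sketched roughly as follows. This is an illustration only, not the code Cursor generated in the demo: the `TripRequest` fields, the prompt wording, and the `build_agent_prompts` helper are all assumptions made for the example. Only the four agent roles (research, budget, planner, curator) come from the episode.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TripRequest:
    """Structured form inputs (field names are hypothetical)."""
    destination: str
    days: int
    budget_usd: int


# Each "agent" is modeled as a role-specific prompt template.
# The wording of these templates is invented for illustration.
AGENT_PROMPTS = {
    "research": "List key facts a visitor to {destination} should know.",
    "budget": "Split a ${budget_usd} budget across {days} days in {destination}.",
    "planner": "Draft a {days}-day itinerary for {destination}.",
    "curator": "Suggest local experiences in {destination}.",
}


def build_agent_prompts(req: TripRequest) -> dict[str, str]:
    """Fill every agent's template from the same structured request."""
    params = asdict(req)
    return {name: tpl.format(**params) for name, tpl in AGENT_PROMPTS.items()}
```

Because the form produces named fields rather than free text, each sub-agent receives exactly the parameters it needs, which is one way to read the "more productized than chat" point above.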
  9. 50:15 – 1:01:26

    Observability: tracing the agent graph and seeing the real execution path

    Aman introduces observability as the bridge from prototype to understanding: tracing reveals the agent graph, parallel steps, prompts, tool calls, and latency. They show how tracing provides a visual map of the architecture Cursor generated.

    • Tracing visualizes the multi-agent workflow and the path taken per request
    • Shows parallel execution and how sub-agent outputs feed the final itinerary
    • Observability enables debugging, performance tuning, and architecture comprehension
    • Implementation is now lightweight: install a package + wrap functions/spans
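To make "wrap functions/spans" concrete, here is a minimal standard-library sketch of what a tracing wrapper does conceptually. It is not the Arize or OpenTelemetry API; the `traced` decorator, the `SPANS` list, and the stand-in agent functions are hypothetical names chosen for the example.

```python
import time
from functools import wraps

SPANS = []    # finished spans, in completion order
_stack = []   # names of spans that are currently open


def traced(name):
    """Record a span (name, parent, duration) around each call of the function."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            parent = _stack[-1] if _stack else None
            _stack.append(name)
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                _stack.pop()
                SPANS.append({
                    "name": name,
                    "parent": parent,
                    "seconds": time.perf_counter() - start,
                })
        return wrapper
    return decorator


@traced("research_agent")
def research(destination):
    return f"notes about {destination}"  # stand-in for an LLM call


@traced("plan_trip")
def plan_trip(destination):
    return {"itinerary": research(destination)}
```

After calling `plan_trip("Spain")`, `SPANS` holds one entry per wrapped call with its parent and latency, which is the raw material a tracing UI would draw as the agent graph.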
  10. 1:01:26 – 1:09:59

    Prompt iteration in a playground: model choice, tone, length, and conversion hooks

    They move from observing to improving: adjust prompts to change level of detail, friendliness, and even add growth hooks like email capture + discount. They compare models and show how small prompt tweaks can dramatically change latency and output.

    • Prompt engineering controls tone, structure, brevity, and product behaviors
    • Model swaps affect speed/cost; prompts can become outdated as models evolve
    • A/B compare prompt versions using the same upstream agent context
    • Constraints (e.g., max length) can reduce latency and improve UX clarity
  11. 1:09:59 – 1:30:04

    Evals: from vibe coding to measurable quality (LLM-as-judge + human labels)

    Aman explains how to evaluate changes systematically using datasets, experiments, and evaluators. They demonstrate LLM-as-judge evals (friendliness, discount presence), discuss best practices (text labels over numeric), and show aligning judges with human annotations.

    • Build or bootstrap datasets (hand-curated or synthetic) to test changes at scale
    • Three eval types: human labels, code-based checks, LLM-as-judge grading
    • Best practice: use text-based labels; include explanations for auditability
    • Meta-eval: compare judge outputs vs human labels to detect misalignment
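Two of the ideas above, a code-based check and a meta-eval of an LLM judge against human labels, fit in a few lines. This is a sketch under assumptions: the function names and label strings are invented for the example, and the judge labels are passed in as plain data rather than produced by a real model call.

```python
def has_discount(output: str) -> str:
    """Code-based check that returns a text label rather than a number."""
    return "discount_present" if "discount" in output.lower() else "discount_missing"


def judge_agreement(judge_labels, human_labels):
    """Meta-eval: agreement rate between judge and human labels.

    Returns (agreement_rate, indices_of_mismatches).
    """
    if len(judge_labels) != len(human_labels):
        raise ValueError("label lists must have the same length")
    if not human_labels:
        raise ValueError("need at least one labeled example")
    mismatches = [
        i for i, (j, h) in enumerate(zip(judge_labels, human_labels)) if j != h
    ]
    return 1 - len(mismatches) / len(human_labels), mismatches
```

For example, with judge labels `["friendly", "friendly", "robotic", "friendly"]` and human labels `["friendly", "robotic", "robotic", "friendly"]`, the agreement rate is 0.75 and the mismatch is at index 1. The mismatched rows are the ones worth reading first when refining the judge prompt.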
  12. 1:30:04 – 1:38:02

    How Bolt/Lovable ‘magic’ works: system prompt + tools + code rendering

    They deconstruct Bolt as an example of an AI product pipeline: a strong system prompt, implicit tool constraints, code generation, execution, and iteration. The takeaway is that many “magical” products are understandable once you inspect prompts, tooling, and runtime limits.

    • Bolt relies heavily on a carefully crafted system prompt and environment constraints
    • Implicit tool calling via instructions (framework choices, terminal model)
    • Generated code is executed and rendered; errors trigger iterative repair
    • Opportunity: integrate evals to reduce code-generation mistakes and boost reliability
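The "errors trigger iterative repair" step can be pictured as a loop that feeds the last error back into the generator. The sketch below is an assumption about the general shape of such a loop, not Bolt's actual implementation: `repair_loop` and `fake_generate` are hypothetical, the generator is a stub instead of a model call, and the check is only a syntax compile, whereas a real product would run the code in a sandbox.

```python
def repair_loop(generate, task, max_attempts=3):
    """Ask `generate` for code until it compiles, passing back the last error."""
    error = None
    for attempt in range(1, max_attempts + 1):
        code = generate(task, error)
        try:
            # Syntax check only; stands in for "execute and render".
            compile(code, "<generated>", "exec")
            return code, attempt
        except SyntaxError as exc:
            error = str(exc)
    raise RuntimeError(f"no valid code after {max_attempts} attempts: {error}")


def fake_generate(task, error):
    """Stub model: emits broken code first, then a fix once it sees an error."""
    if error is None:
        return "print('hi'"   # missing closing parenthesis
    return "print('hi')"
```

Calling `repair_loop(fake_generate, "say hi")` succeeds on the second attempt, after the first syntax error is fed back to the stub.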
  13. 1:38:02 – 1:43:08

    RAG vs prompt engineering vs fine-tuning: choosing the right lever

    Aman provides a practical decision framework: use prompt engineering for behavior and tone, RAG for injecting relevant knowledge, and fine-tuning for specialization, reliability, cost, or latency. He also weighs effort vs impact and offers memorable mental models.

    • Prompt engineering: low effort, high impact for instructions/tone/output shape
    • RAG: adds context from a corpus; retrieval quality becomes the hard problem
    • Fine-tuning: changes the model layer; higher effort and tradeoffs in generality
    • PM framing: effort vs impact; prompts often yield big eval gains quickly
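The decision framework can be written down as a tiny helper, mostly as a way to make the ordering explicit: prompting first, RAG when outside knowledge is needed, fine-tuning only for cost or latency reasons. The function and parameter names are hypothetical, and the two boolean inputs are a deliberate simplification of the tradeoffs Aman describes.

```python
def choose_levers(needs_external_knowledge: bool, cost_or_latency_bound: bool) -> list[str]:
    """Return levers in the order to try them (simplified rule of thumb)."""
    levers = ["prompt engineering"]      # always the starting point
    if needs_external_knowledge:
        levers.append("RAG")             # inject relevant context from a corpus
    if cost_or_latency_bound:
        levers.append("fine-tuning")     # highest effort, so it comes last
    return levers
```

For instance, `choose_levers(True, False)` returns `["prompt engineering", "RAG"]`.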
  14. 1:43:08 – 1:48:18

    Working with AI engineers/researchers: evals as requirements and shared tooling

    The final skill is collaboration: AI PMs need to operate closer to the data, traces, and evaluation metrics that drive model behavior. Aman argues PMs should write/own evals because they encode the desired user experience better than long PRDs.

    • Stakeholders expand beyond engineers to AI engineers/data scientists/researchers
    • PMs should judge outputs, label data, and define ‘good vs bad’ via evals
    • Shift from PRDs to ‘evals as requirements’ to drive iterative improvement
    • Shared tooling (traces, prompt hubs, datasets) improves team alignment and velocity
  15. 1:48:18 – 2:04:35

    What not to do + a 2-hours/week plan + the AI PM job market reality

    Aman closes with anti-patterns and a realistic learning plan: try tools, build intuition by tearing down products, then apply it in small side projects. They address skepticism about AI PM roles, citing job volume and compensation, and end with where to follow Aman and his course.

    • Don’t wait for models to get better; ship small side projects to learn continuously
    • Don’t over-automate immediately; use AI as a ‘second brain’ and pressure-test ideas
    • 2-hour routine: try tools → build intuition → apply via a small build loop
    • AI PM roles exist (often ‘PM, AI’); reported pay premiums and strong demand
