Aakash Gupta
The ONE AI Skill Every Product Manager NEEDS in 2026
Episode Details
EPISODE INFO
- Released
- July 11, 2025
- Duration
- 1h 34m
- Channel
- Aakash Gupta
EPISODE DESCRIPTION
Today, we’ve got some of our most requested guests yet: Hamel Husain and Shreya Shankar, creators of the world’s best AI Evals cohort. You’ll learn everything you need to know about AI evals: how to build them, common mistakes to avoid, and much more. If I were you, I’d stop everything, binge-watch it right now, and make an action plan to execute tomorrow. We’ve also done a newsletter deep dive with them; check it out: AI Evals: Everything You Need to Know to Start - https://www.news.aakashg.com/p/ai-evals

🎥 Timestamps:
- Preview - 00:00
- Three reasons PMs NEED evals - 02:06
- Why PMs shouldn't view evals as monotonous - 04:40
- Are evals the hardest part of AI products solved? - 06:23
- Why can't you just rely on human "vibe checks"? - 07:37
- Ads - 12:11
- Are LLMs good at 1-5 ratings? - 14:06
- The "whack-a-mole" analogy without evals - 15:45
- Hallucination problem in emails (Apollo story) - 16:26
- How Airbnb used machine learning models - 21:22
- Evaluating RAG systems - 23:56
- Ads - 29:52
- Hill climbing - 31:42
- Red flag: suspiciously high eval metrics - 35:51
- Design principles for effective evals - 39:02
- How OpenAI approaches evals - 42:42
- Foundation models are trained on "average taste" - 44:39
- Cons of fine-tuning - 49:36
- Prompt engineering vs. RAG vs. fine-tuning - 51:27
- Introduction of "The Three Gulfs" framework - 53:00
- Roadmap for learning AI evals - 56:04
- Why error analysis is critical for LLMs - 01:01:41
- Using LLM as a judge - 01:08:29
- Frameworks for systematic problem-solving in labels - 01:10:15
- Importance of niche and qualifying clients (pro tips) - 01:17:42
- $800K for first course cohort! - 01:18:43
- Why end a successful cohort? - 01:20:15
- GOLD advice for creating a successful course - 01:25:49
- Outro - 01:33:39

Podcast transcript: https://www.news.aakashg.com/p/hamel-shreya-podcast

💼 Check out our sponsors:
1. The AI Evals Course for PMs & Engineers: Get $800 off with this link - https://maven.com/parlance-labs/evals?promoCode=ag-product-growth
2. Jira Product Discovery: Plan with purpose, ship with confidence - https://www.atlassian.com/software/jira/product-discovery
3. Vanta: Automate compliance, security, and trust with AI (get $1,000 with our link) - https://www.vanta.com/lp/demo-1k?utm_campaign=1k_offer&utm_source=product-growth&utm_medium=podcast
4. Product Faculty: Get $500 off the AI PM certification with code AAKASH25 - https://maven.com/product-faculty/ai-product-management-certification?promoCode=AAKASH25

👀 Where to Find Hamel & Shreya:
- Hamel’s LinkedIn: https://www.linkedin.com/in/hamelhusain/
- Shreya’s LinkedIn: https://www.linkedin.com/in/shrshnk/

👨💻 Where to find Aakash:
- Twitter: https://www.twitter.com/aakashg0
- LinkedIn: https://www.linkedin.com/in/aagupta/
- Instagram: https://www.instagram.com/aakashg0/

🔑 Key Takeaways:
1. Stop Guessing. Eval Your AI. Your AI isn’t an MVP without robust evaluations. Build in judgment, or you’re just shipping hope. Without evaluation, AI performance is a happy accident.
2. Error Analysis = Your Superpower. General metrics won’t save you. You need to understand why your AI messed up; only then can you fix it rather than just wish it worked better.
3. 99% Accuracy is a LIE. Suspiciously high metrics usually mean your evaluation setup is broken. Real-world AI is never perfect. If your evals say otherwise, they’re flawed.
4. Fine-Tuning is a Trap (Mostly). Fine-tuning is expensive, brittle, and often unnecessary. Start with smarter prompts and RAG. Only fine-tune if you must.
5. Your Data’s Wild. Understand It. You can’t eyeball everything. Without structured evaluation, you’ll drown in noise and never find the patterns or fixes that matter.
6. Models Fail to Generalize. Always. Your AI will break on new data. Don’t blame it; adapt it. Use RAG, upgrade inputs, and stop expecting out-of-the-box magic.
7. Your Prompts Are S**T. If your AI is bad, it’s probably your fault. The cheapest, most powerful fix? Sharpen your prompts. Clearer instructions = smarter AI.
8. Let AI Teach You. Seriously. LLM judges aren’t just scoring your outputs; reviewing how your AI fails is the best way to learn what great outputs should look like.

#ai #aievals #aiproducts #aiprompt

🧠 About Product Growth: The world's largest podcast focused solely on product and growth, with over 175K listeners. Hosted by Aakash Gupta, who spent 16 years in PM, rising to VP of Product, this 2x/week show covers product and growth topics in depth.

🔔 Subscribe and like the video to support our content! And turn on the bell for notifications.
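The "LLM as a judge" pattern from the takeaways can be sketched roughly like this. This is a minimal illustration, not the guests' actual method: the grading criteria are invented, `build_judge_prompt` and `parse_verdict` are hypothetical helpers, and the actual LLM API call is omitted (any chat-completion client would slot in).

```python
import json

# Hypothetical judge template: binary pass/fail with a reason tends to be
# easier to calibrate than 1-5 ratings (a point raised in the episode).
JUDGE_PROMPT = """You are grading an AI assistant's email draft.
Criteria: no hallucinated facts, correct recipient name, clear ask.
Return JSON: {{"pass": true or false, "reason": "<one sentence>"}}

Input: {user_input}
Output to grade: {model_output}"""

def build_judge_prompt(user_input: str, model_output: str) -> str:
    """Fill the judge template with the example under evaluation."""
    return JUDGE_PROMPT.format(user_input=user_input, model_output=model_output)

def parse_verdict(raw: str) -> dict:
    """Parse the judge LLM's JSON verdict; fail closed on malformed output."""
    try:
        verdict = json.loads(raw)
        return {"pass": bool(verdict["pass"]), "reason": verdict.get("reason", "")}
    except (json.JSONDecodeError, KeyError, TypeError):
        return {"pass": False, "reason": "unparseable judge output"}
```

In practice you would send `build_judge_prompt(...)` to a strong model, run `parse_verdict` on its reply, and, crucially, spot-check the judge's verdicts against your own labels before trusting the aggregate numbers (the "who validates the validators" problem Shreya's work addresses).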
SPEAKERS
Aakash Gupta
Host. Product leader and creator (Aakash Gupta channel), formerly VP of Product at Apollo.io, focused on AI product management.
Hamel Husain
Guest. AI/ML practitioner and consultant (ex-Airbnb, GitHub) specializing in LLM evaluation methods and production eval pipelines.
Shreya Shankar
Guest. AI evaluation researcher and educator at UC Berkeley; co-teaches LLM evals and authored work on validator reliability (e.g., "Who Validates the Validators").
EPISODE SUMMARY
In this episode, "The ONE AI Skill Every Product Manager NEEDS in 2026," Aakash Gupta talks with Hamel Husain and Shreya Shankar about AI evals: the core PM skill for shipping reliable AI products. Evals let PMs encode their taste and product judgment into the development critical path instead of relying on unscalable "vibe checks."