Aakash Gupta
The ONE AI Skill Every Product Manager NEEDS in 2026
Episode Details
EPISODE INFO
- Released
- July 11, 2025
- Duration
- 1h 34m
- Channel
- Aakash Gupta
EPISODE DESCRIPTION
Today, we’ve got some of our most requested guests yet: Hamel Husain and Shreya Shankar, creators of the world’s best AI Evals cohort. You’ll learn everything you need to know about AI evals: how to build them, common mistakes to avoid, and much more. If I were you, I’d stop everything, binge-watch it right now, and make an action plan to execute tomorrow. We’ve also done a newsletter deep dive with them; check it out: AI Evals: Everything You Need to Know to Start - https://www.news.aakashg.com/p/ai-evals

🎥 Timestamps:
- Preview - 00:00
- Three reasons PMs NEED evals - 02:06
- Why PMs shouldn't view evals as monotonous - 04:40
- Are evals the hardest part of AI products solved? - 06:23
- Why can't you just rely on human "vibe checks"? - 07:37
- Ads - 12:11
- Are LLMs good at 1-5 ratings? - 14:06
- The "whack-a-mole" analogy without evals - 15:45
- Hallucination problem in emails (Apollo story) - 16:26
- How Airbnb used machine learning models - 21:22
- Evaluating RAG systems - 23:56
- Ads - 29:52
- Hill climbing - 31:42
- Red flag: suspiciously high eval metrics - 35:51
- Design principles for effective evals - 39:02
- How OpenAI approaches evals - 42:42
- Foundation models are trained on "average taste" - 44:39
- Cons of fine-tuning - 49:36
- Prompt engineering vs. RAG vs. fine-tuning - 51:27
- Introduction of "The Three Gulfs" framework - 53:00
- Roadmap for learning AI evals - 56:04
- Why error analysis is critical for LLMs - 01:01:41
- Using LLM as a judge - 01:08:29
- Frameworks for systematic problem-solving in labels - 01:10:15
- Importance of niche and qualifying clients (pro tips) - 01:17:42
- $800K for first course cohort! - 01:18:43
- Why end a successful cohort? - 01:20:15
- GOLD advice for creating a successful course - 01:25:49
- Outro - 01:33:39

Podcast transcript: https://www.news.aakashg.com/p/hamel-shreya-podcast

💼 Check out our sponsors:
1. The AI Evals Course for PMs & Engineers: Get $800 off with this link - https://maven.com/parlance-labs/evals?promoCode=ag-product-growth
2. Jira Product Discovery: Plan with purpose, ship with confidence - https://www.atlassian.com/software/jira/product-discovery
3. Vanta: Automate compliance, security, and trust with AI (get $1,000 with our link) - https://www.vanta.com/lp/demo-1k?utm_campaign=1k_offer&utm_source=product-growth&utm_medium=podcast
4. Product Faculty: Get $500 off the AI PM certification with code AAKASH25 - https://maven.com/product-faculty/ai-product-management-certification?promoCode=AAKASH25

👀 Where to Find Hamel & Shreya:
- Hamel’s LinkedIn: https://www.linkedin.com/in/hamelhusain/
- Shreya’s LinkedIn: https://www.linkedin.com/in/shrshnk/

👨💻 Where to find Aakash:
- Twitter: https://www.twitter.com/aakashg0
- LinkedIn: https://www.linkedin.com/in/aagupta/
- Instagram: https://www.instagram.com/aakashg0/

🔑 Key Takeaways:
1. Stop Guessing. Eval Your AI. Your AI isn’t an MVP without robust evaluations. Build in judgment, or you’re just shipping hope. Without evaluation, AI performance is a happy accident.
2. Error Analysis = Your Superpower. General metrics won’t save you. You need to understand why your AI messed up; only then can you fix it rather than just wish it worked better.
3. 99% Accuracy is a LIE. Suspiciously high metrics usually mean your evaluation setup is broken. Real-world AI is never perfect. If your evals say otherwise, they’re flawed.
4. Fine-Tuning is a Trap (Mostly). Fine-tuning is expensive, brittle, and often unnecessary. Start with smarter prompts and RAG. Only fine-tune if you must.
5. Your Data’s Wild. Understand It. You can’t eyeball everything. Without structured evaluation, you’ll drown in noise and never find the patterns or fixes that matter.
6. Models Fail to Generalize. Always. Your AI will break on new data. Don’t blame it; adapt it. Use RAG, upgrade inputs, and stop expecting out-of-the-box magic.
7. Your Prompts Are S**T. If your AI is bad, it’s probably your fault. The cheapest, most powerful fix? Sharpen your prompts. Clearer instructions = smarter AI.
8. Let AI Teach You. Seriously. LLM judges aren’t just scoring your outputs; reviewing how your AI fails is the best way to learn what great outputs should look like.

#ai #aievals #aiproducts #aiprompt

🧠 About Product Growth: The world's largest podcast focused solely on product and growth, with over 175K listeners. Hosted by Aakash Gupta, who spent 16 years in PM, rising to VP of Product, this 2x/week show covers product and growth topics in depth.

🔔 Subscribe and like the video to support our content! And turn on the bell for notifications.
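The "LLM as a judge" pattern from the takeaways can be sketched roughly like this. This is a minimal illustration, not the guests' actual method: the grading criteria are invented, `build_judge_prompt` and `parse_verdict` are hypothetical helpers, and the actual LLM API call is omitted (any chat-completion client would slot in).

```python
import json

# Hypothetical judge template: binary pass/fail with a reason tends to be
# easier to calibrate than 1-5 ratings (a point raised in the episode).
JUDGE_PROMPT = """You are grading an AI assistant's email draft.
Criteria: no hallucinated facts, correct recipient name, clear ask.
Return JSON: {{"pass": true or false, "reason": "<one sentence>"}}

Input: {user_input}
Output to grade: {model_output}"""

def build_judge_prompt(user_input: str, model_output: str) -> str:
    """Fill the judge template with the example under evaluation."""
    return JUDGE_PROMPT.format(user_input=user_input, model_output=model_output)

def parse_verdict(raw: str) -> dict:
    """Parse the judge LLM's JSON verdict; fail closed on malformed output."""
    try:
        verdict = json.loads(raw)
        return {"pass": bool(verdict["pass"]), "reason": verdict.get("reason", "")}
    except (json.JSONDecodeError, KeyError, TypeError):
        return {"pass": False, "reason": "unparseable judge output"}
```

In practice you would send `build_judge_prompt(...)` to a strong model, run `parse_verdict` on its reply, and, crucially, spot-check the judge's verdicts against your own labels before trusting the aggregate numbers (the "who validates the validators" problem Shreya's work addresses).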
SPEAKERS
Aakash Gupta
Host. Product leader and creator (Aakash Gupta channel), formerly VP of Product at Apollo.io, focused on AI product management.
Hamel Husain
Guest. AI/ML practitioner and consultant (ex-Airbnb, GitHub) specializing in LLM evaluation methods and production eval pipelines.
Shreya Shankar
Guest. AI evaluation researcher and educator at UC Berkeley; co-teaches LLM evals and authored work on validator reliability (e.g., "Who Validates the Validators").
EPISODE SUMMARY
In this episode, "The ONE AI Skill Every Product Manager NEEDS in 2026," Aakash Gupta talks with Hamel Husain and Shreya Shankar about AI evals: the core PM skill for shipping reliable AI products. Evals let PMs encode their taste and product judgment into the development critical path instead of relying on unscalable "vibe checks."