Episode Details
EPISODE INFO
- Released: May 23, 2026
- Duration: 39m
- Channel: Claude
EPISODE DESCRIPTION
The team built a rubric-driven, replayable eval system from real user projects that reports quality, cost, latency, error, and token signals in under 6 hours per model change. It evolved into a development flywheel powered by real user dissatisfaction signals.
EPISODE SUMMARY
In this episode of Claude, “Evals for taste: Hill-climbing a slide-generation agent” explores building actionable evals to iteratively improve slide-generation agents quickly. Evals are positioned as the actionable bridge between subjective “vibes” and measurable signals for improving AI agents before issues hit production.
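As a rough illustration of how rubric scores and operational metrics might be rolled up into the five signals the description lists (quality, cost, latency, errors, tokens), here is a minimal sketch. It is not the system from the episode: the `RunRecord` fields, the rubric criteria, and their weights are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunRecord:
    """One replayed project run. Every field here is a hypothetical stand-in."""
    rubric_scores: dict[str, float]  # criterion name -> score in [0, 1]
    cost_usd: float
    latency_s: float
    tokens: int
    errored: bool


# Hypothetical rubric: criterion name -> relative weight (assumed, not from the episode).
RUBRIC_WEIGHTS = {"layout": 0.4, "content": 0.4, "style": 0.2}


def quality(record: RunRecord) -> float:
    """Weighted average of rubric scores; a missing criterion counts as 0."""
    total_weight = sum(RUBRIC_WEIGHTS.values())
    weighted = sum(
        weight * record.rubric_scores.get(name, 0.0)
        for name, weight in RUBRIC_WEIGHTS.items()
    )
    return weighted / total_weight


def summarize(records: list[RunRecord]) -> dict[str, float]:
    """Aggregate per-run records into one set of signals for a model change."""
    if not records:
        raise ValueError("need at least one replayed run")
    return {
        "quality": mean(quality(r) for r in records),
        "cost_usd": mean(r.cost_usd for r in records),
        "latency_s": mean(r.latency_s for r in records),
        "tokens": mean(r.tokens for r in records),
        "error_rate": sum(r.errored for r in records) / len(records),
    }
```

Running `summarize` over the same set of replayed projects before and after a model change would give two comparable signal sets, which is the kind of side-by-side comparison hill-climbing depends on.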