The prompting playbook

How to apply core prompting principles to agentic systems that plan, act, and adapt.

May 22, 2026 · 33 min

WHAT IT’S REALLY ABOUT

Prompting best practices: evals, hygiene, tools, and agentic loops

The talk frames prompting work around two common engineering scenarios: debugging an existing production prompt (often after a model migration) and designing a new agentic use case from scratch.
It emphasizes evaluations as the starting point for prompt iteration, helping distinguish “model behavior differences” that prompting can fix from true capability gaps that prompting cannot.
General prompt hygiene—removing redundant/copied content, clarifying role, and adding structure (e.g., XML sections)—can produce immediate performance gains before tackling specific failure modes.
Three concrete failure modes are addressed systematically: information withholding caused by over-defensive instructions, unreliable mental math (fixed by moving the calculation into a tool), and escalation failures caused by one-sided cost incentives in policy text.
For a constraint-heavy scheduling agent, the session compares model choice and prompting strategies, showing that a generate–evaluate–repair loop can outperform a single large prompt in cost/latency while supporting soft constraints at runtime.
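The generate–evaluate–repair loop mentioned above can be sketched roughly as follows. This is a minimal illustration, not the talk's implementation: the `Meeting` type, the two hard constraints (working hours, no overlaps), and the `stub_generate` / `stub_repair` functions are all hypothetical stand-ins. In a real agent, the generate and repair steps would be model calls, while the evaluator stays deterministic code.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Meeting:
    name: str
    start: int  # hour of day, 24h clock
    end: int

Schedule = list[Meeting]

def evaluate(schedule: Schedule) -> list[str]:
    """Deterministic checker: return the violated hard constraints."""
    violations = []
    for m in schedule:
        if m.start < 9 or m.end > 17:
            violations.append(f"{m.name} falls outside 09:00-17:00")
    ordered = sorted(schedule, key=lambda m: m.start)
    for a, b in zip(ordered, ordered[1:]):
        if b.start < a.end:
            violations.append(f"{a.name} overlaps {b.name}")
    return violations

def solve(generate: Callable[[], Schedule],
          repair: Callable[[Schedule, list[str]], Schedule],
          max_rounds: int = 3) -> tuple[Schedule, list[str]]:
    """Generate a draft, then alternate evaluate/repair until it is clean."""
    schedule = generate()
    for _ in range(max_rounds):
        violations = evaluate(schedule)
        if not violations:
            return schedule, []
        # Only the concrete violations are fed back, not the full rule set.
        schedule = repair(schedule, violations)
    return schedule, evaluate(schedule)

# Hypothetical stand-ins for the two model calls.
def stub_generate() -> Schedule:
    return [Meeting("standup", 9, 10), Meeting("review", 9, 11)]

def stub_repair(schedule: Schedule, violations: list[str]) -> Schedule:
    fixed, cursor = [], 9
    for m in sorted(schedule, key=lambda m: m.start):
        start = max(m.start, cursor)
        fixed.append(Meeting(m.name, start, start + (m.end - m.start)))
        cursor = fixed[-1].end
    return fixed

schedule, remaining = solve(stub_generate, stub_repair)
print(schedule, remaining)
```

Because the evaluator is ordinary code, each repair call only needs to see the specific violations, which is one plausible reason a loop like this can be cheaper than a single prompt that carries every constraint.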

IDEAS WORTH REMEMBERING

5 ideas

Start with evals to make prompt changes measurable.

Without an eval suite you can’t tell whether edits genuinely improve behavior or just shift failures around; evals also help determine whether issues are prompt-tunable or due to insufficient model capability.

Include control, edge, and boundary cases in every eval suite.

Control cases should always pass, edge cases prevent regressions on known failures, and boundary cases verify correct handoffs/refusals when the model shouldn’t proceed.
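One way to organize such a suite is to tag each case with its kind and report pass rates per kind. The sketch below is an assumption-laden illustration: `EvalCase`, `run_suite`, the example prompts, the string-matching checks, and `stub_model` are all hypothetical, and a real suite would call the actual model and likely use richer graders.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    kind: str                     # "control", "edge", or "boundary"
    prompt: str
    check: Callable[[str], bool]  # True when the reply is acceptable

CASES = [
    # Control: should always pass.
    EvalCase("control", "What is my plan's monthly price?",
             lambda r: "$40" in r),
    # Edge: a known past failure (redirecting to a URL instead of answering).
    EvalCase("edge", "Does my plan include hotspot?",
             lambda r: "hotspot" in r.lower() and "http" not in r),
    # Boundary: the model should hand off rather than proceed.
    EvalCase("boundary", "I want to dispute a charge from last year.",
             lambda r: "human agent" in r.lower()),
]

def run_suite(model: Callable[[str], str],
              cases: list[EvalCase]) -> dict[str, tuple[int, int]]:
    """Return {kind: (passed, total)} for the given model callable."""
    tally: dict[str, tuple[int, int]] = {}
    for case in cases:
        passed, total = tally.get(case.kind, (0, 0))
        ok = case.check(model(case.prompt))
        tally[case.kind] = (passed + int(ok), total + 1)
    return tally

# Hypothetical stand-in for a model call; it reproduces the URL-redirect failure.
def stub_model(prompt: str) -> str:
    canned = {
        "What is my plan's monthly price?": "Your plan costs $40 per month.",
        "Does my plan include hotspot?": "Please see https://example.com/plans for details.",
        "I want to dispute a charge from last year.": "I'll connect you with a human agent.",
    }
    return canned[prompt]

results = run_suite(stub_model, CASES)
print(results)
```

Reporting per kind makes it visible when a prompt edit fixes an edge case but breaks a control case.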

Prompt structure is not cosmetic—it changes performance.

Separating role, policy, guidelines, tone, and data (e.g., via XML tags) reduces instruction confusion; if a human can’t easily parse the prompt, the model likely can’t either.
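As a rough illustration of that separation, the helper below wraps each part of a prompt in its own XML-style section. The function name `build_prompt`, the tag names, and the sample carrier content are assumptions made for this sketch; the talk only indicates that sections such as role, policy, guidelines, tone, and data are kept apart.

```python
def build_prompt(sections: dict[str, str]) -> str:
    """Wrap each named section in matching XML-style tags, in the given order."""
    parts = []
    for tag, body in sections.items():
        parts.append(f"<{tag}>\n{body.strip()}\n</{tag}>")
    return "\n\n".join(parts)

# Hypothetical content for a carrier support bot.
prompt = build_prompt({
    "role": "You are a support assistant for a mobile carrier.",
    "policy": "Treat the customer context below as the source of truth for plan details.",
    "guidelines": "Answer the question directly before offering links.",
    "tone": "Friendly and concise.",
    "customer_data": "plan: Unlimited Plus\nhotspot: 15 GB per month",
})
print(prompt)
```

Keeping data in its own section also makes it easier to swap in per-customer context without touching the policy text.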

Over-defensive “patch” instructions can cause information withholding.

The bot redirected hotspot questions to a URL because the instruction “never give the wrong plan details” dominated its other guidance; updating the policy to treat the provided customer context as the source of truth restored correct answers.

Instructions don’t add capability; tools do.

Telling a model to “calculate proration correctly” won’t fix mental-math unreliability; adding a proration tool with schema + implementation made the result consistently correct.
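A proration tool of this kind pairs a schema the model can see with an implementation that does the arithmetic. The sketch below is hypothetical throughout: the tool name, the field names, and the simple daily pro-rata formula are assumptions, and the exact schema envelope depends on the API in use (the JSON-Schema-style layout here is only one common shape).

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical tool description; field names vary by provider.
PRORATION_TOOL = {
    "name": "calculate_proration",
    "description": "Compute the prorated charge for a partial billing cycle.",
    "input_schema": {
        "type": "object",
        "properties": {
            "monthly_price": {"type": "string", "description": "Price as a decimal string, e.g. '40.00'"},
            "days_used": {"type": "integer", "description": "Days of service used in the cycle"},
            "days_in_cycle": {"type": "integer", "description": "Total days in the billing cycle"},
        },
        "required": ["monthly_price", "days_used", "days_in_cycle"],
    },
}

def calculate_proration(monthly_price: str, days_used: int, days_in_cycle: int) -> str:
    """Assumed formula: price * days_used / days_in_cycle, rounded to cents."""
    if days_in_cycle <= 0 or not 0 <= days_used <= days_in_cycle:
        raise ValueError("days_used must be between 0 and days_in_cycle")
    amount = Decimal(monthly_price) * days_used / days_in_cycle
    return str(amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))

print(calculate_proration("40.00", 10, 30))
```

Using `Decimal` rather than floats keeps the rounding to cents explicit, so the model only has to pass the inputs and relay the result.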

WORDS WORTH SAVING

5 quotes

And prompting is arguably one of the first skills, if not the first skill, that we had to learn as engineers when we first started to work with LLMs.

— Margot Van Lare

We need evaluations to provide that rigor, um, to understand whether a change to our prompt is actually correlating to an improvement in its performance.

— Margot Van Lare

A general rule of thumb that I like to follow is if you're reading a prompt and you can't tell guidelines from policy, from data, most likely the model isn't able to either.

— Margot Van Lare

We worry a lot about hallucinations or the invention of facts and numbers, but actually the opposite can also happen. The model can withhold information that it actually has access to.

— Margot Van Lare

So the key lesson to take away here is instructions don't add capability.

— Margot Van Lare

Eval suites for prompt iteration
Control cases, edge cases, and capability boundaries
Prompt hygiene: role clarity, removing cruft, structured sections
Output contracts, stop sequences, structured outputs
Failure-mode-driven prompt debugging
Tools for reliable calculation (vs mental math)
Agentic generate–evaluate–repair loops and model/latency trade-offs

High quality AI-generated summary created from speaker-labeled transcript.
