State-Of-The-Art Prompting For AI Agents

Y Combinator · May 30, 2025 · 31 min

Garry Tan (host), Jared Friedman (host), Diana Hu (host), Harj Taggar (host)

- Detailed anatomy of a state-of-the-art customer support agent prompt (ParaHelp)
- Prompt architecture: system, developer, and user prompts for vertical AI
- Meta prompting, prompt folding, and example-driven refinement of prompts
- Designing escape hatches and debug channels to reduce hallucinations
- Evals as the core data moat and product–user fit for AI agents
- The "forward deployed engineer" model and founders embedded with customers
- Model selection and personalities (Claude, LLaMA 4, GPT-4o, Gemini 2.5) and rubric-based scoring

In this episode of Y Combinator's The Light Cone, Garry Tan, Jared Friedman, Diana Hu, and Harj Taggar go inside frontier AI startups to unpack state-of-the-art prompting for AI agents: meta prompting, evals, and forward-deployed founders.

Inside Frontier AI Startups: Meta Prompting, Evals, And Forward-Deployed Founders

The episode dissects how top AI startups are actually building high-performing agents, using a detailed ParaHelp customer-support prompt as a case study. The hosts explain emerging prompt architectures (system/developer/user prompts), meta prompting techniques, and patterns like prompt folding, example-driven refinement, and giving models explicit escape hatches. They argue that evals, not prompts, are the true competitive moat, and connect this to the founder’s role as a “forward deployed engineer” deeply embedded in users’ workflows. Different model personalities (e.g., GPT-4o, Gemini 2.5, Claude, LLaMA 4) and toolchains like Gemini’s thinking traces are highlighted as crucial levers for debugging and scaling agents.

Key Takeaways

Treat complex prompts like code: structured, modular, and heavily commented.

The best production prompts, like ParaHelp’s, are multi-page documents with clear sections (role, task, plan, constraints, output format) and markdown-style structure or XML-like tags, which makes them more interpretable for both humans and LLMs.
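As a hedged illustration of that structure (section headings, tag names, and wording are invented here, not taken from ParaHelp's actual prompt), such a prompt might be skeletoned like this:

```python
# Hypothetical skeleton of a structured agent prompt. All section names and
# XML-style tags are invented for illustration -- this is not ParaHelp's prompt.
SUPPORT_MANAGER_PROMPT = """\
# Role
You are a manager of customer service agents. You approve or reject the
tool calls your agents propose.

# Task
Given a support ticket and a proposed tool call, decide: approve or reject.

# Important
- Only approve tools listed in the "Available tools" section.
- If you lack the information to decide, say so instead of guessing.

# Output format
<decision>approve | reject</decision>
<reason>One short sentence.</reason>
"""
```

The point is not the exact wording but the shape: clearly delimited sections a human can skim and an LLM can anchor on.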

Separate system, developer, and user prompts to balance reuse and customization.

Use a general-purpose system prompt to define the agent’s core behavior, a developer prompt to inject customer- or workflow-specific logic, and a user prompt for end-user inputs; this prevents becoming a bespoke consulting shop while still handling per-customer nuances.
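A minimal sketch of how the three layers could be assembled into one chat-style request. The role names follow the OpenAI-style convention; whether your provider exposes a distinct developer role is an assumption to verify:

```python
def build_messages(system_prompt, developer_prompt, user_message):
    """Assemble the three prompt layers into a chat-style message list.

    Hypothetical sketch: the system prompt is shared across all customers,
    the developer prompt carries per-customer workflow logic, and the user
    message is the raw end-user input.
    """
    return [
        {"role": "system", "content": system_prompt},        # core agent behavior
        {"role": "developer", "content": developer_prompt},  # customer-specific logic
        {"role": "user", "content": user_message},           # end-user input
    ]
```

Keeping the layers separate means one system prompt can serve every customer while the developer layer absorbs the per-account nuance.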

Use meta prompting and prompt folding to let LLMs improve their own prompts.

By asking a powerful model (e.g., …)
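The general meta prompting loop can be sketched like this; `call_model` is a hypothetical stand-in for an LLM client, and the wording of the rewrite request is invented for illustration, not ParaHelp's:

```python
def improve_prompt(call_model, current_prompt, failure_examples):
    """Meta prompting sketch: ask a stronger model to rewrite a prompt.

    `call_model` is a placeholder for your LLM client. The pattern: feed the
    current prompt plus observed failures back to a capable model and ask it
    to produce an improved version ("prompt folding").
    """
    rewrite_request = (
        "You are an expert prompt engineer. Here is a prompt:\n\n"
        f"{current_prompt}\n\n"
        "Here are cases where an agent using this prompt failed:\n\n"
        + "\n".join(f"- {ex}" for ex in failure_examples)
        + "\n\nRewrite the prompt to fix these failures. "
        "Return only the improved prompt."
    )
    return call_model(rewrite_request)
```

In practice this is run iteratively: ship the rewritten prompt, collect new failures, and fold them back in.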

Always give agents real escape hatches and a debug channel.

To reduce hallucinations, explicitly allow the model to say it lacks information or to write a “complaint”/debug field describing missing context; reviewing this debug info yields a to-do list for improving the prompt and surrounding system.
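One way this pattern might look in practice (field names like `debug_note` are assumptions for illustration, not a documented schema): the output format includes an explicit escape-hatch decision plus a free-text debug field, and a small helper turns those fields into a to-do list:

```python
import json

# Illustrative response schema: the model may decline via "needs_more_info"
# and explain what was missing in "debug_note". Field names are invented.
EXAMPLE_RESPONSE = json.dumps({
    "decision": "needs_more_info",  # "approve" | "reject" | "needs_more_info"
    "debug_note": "No refund policy for enterprise plans was provided.",
})

def collect_todos(raw_responses):
    """Harvest the agent's debug notes into a to-do list for prompt fixes."""
    todos = []
    for raw in raw_responses:
        response = json.loads(raw)
        if response.get("debug_note"):  # skip empty/absent notes
            todos.append(response["debug_note"])
    return todos
```

Reviewing the collected notes periodically turns hallucination risk into a concrete backlog of missing context to add.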

High-quality evals are a stronger moat than the prompts themselves.

Evals encode the real-world reward function of niche users (e.g., …)
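A deliberately minimal sketch of an eval harness; real evals encode far richer, domain-specific reward functions, and `agent` here is just a placeholder callable:

```python
def run_eval(agent, cases):
    """Score an agent against graded cases; returns the pass rate in [0, 1].

    Each case supplies an input and a `check` predicate encoding what a
    correct answer looks like for this niche -- that predicate is where the
    hard-won domain knowledge (the moat) actually lives.
    """
    passed = sum(1 for case in cases if case["check"](agent(case["input"])))
    return passed / len(cases)
```

The harness itself is trivial; the value is in the case set, which is why the cases, not the prompt, are the crown jewels.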

Founders must act as forward deployed engineers embedded with customers.

The winning pattern is technical founders who sit physically with users, rapidly encode their workflows into prompts and evals, and return with demos that make buyers feel uniquely understood—enabling six- and seven-figure vertical AI deals unusually fast.

Choose models based on personality and task fit, not just benchmarks.

Different models behave differently with the same rubric—some strictly follow scoring rules (like “soldiers”), others flexibly reason about exceptions (like “high-agency employees”), so model choice should match whether you need rigid adherence or nuanced judgment.
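That choice can be made explicit in code. A hypothetical routing sketch (model names are placeholders, not recommendations):

```python
# Map temperaments to models. The names are invented placeholders; the idea
# is that different workflow steps want different model "personalities".
MODELS = {
    "rigid": "strict-scorer-model",   # follows a rubric to the letter, like a soldier
    "flexible": "high-agency-model",  # reasons about exceptions and edge cases
}

def pick_model(task_kind):
    """Route a workflow step to a rigid or flexible model temperament."""
    needs_rigid = task_kind in {"rubric_scoring", "format_validation"}
    return MODELS["rigid" if needs_rigid else "flexible"]
```

A multi-stage agent can then mix both temperaments, using the rigid model to grade and validate and the flexible one to plan and converse.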

Notable Quotes

Meta prompting is turning out to be a very, very powerful tool that everyone's using now.

Garry Tan

They actually don't consider the prompts to be the crown jewels. Like the evals are the crown jewels, because without the evals, you don't know why the prompt was written the way that it was.

Jared

It kind of actually feels like coding in 1995. The tools are not all the way there. We're in this new frontier.

Garry Tan

A good way of thinking about it [is] that founders should think about themselves as being the forward deployed engineers of their own company.

Jared

Personally, it also kind of feels like learning how to manage a person, where it's like, how do I actually communicate the things they need to know in order to make a good decision?

Garry Tan

Questions Answered in This Episode

How can small teams systematically build and maintain high-quality evals without sitting in-person with every type of end user?

What concrete criteria should guide when to split logic between system, developer, and user prompts versus moving it into code or tools?

How do you decide when to rely on meta prompting and prompt folding versus manually rewriting prompts by hand?

What’s the best way to design and operationalize “escape hatches” so that agents stop, ask for help, and surface useful debug info instead of hallucinating?

As models’ personalities diverge, how should startups design systems that combine multiple models—rigid and flexible—within a single multi-stage workflow?

Transcript Preview

Garry Tan

Meta prompting is turning out to be a very, very powerful tool that everyone's using now. It kind of actually feels like coding in, you know, 1995. Like, the tools are not all the way there. We're, you know, in this new frontier. But personally, it also kind of feels like learning how to manage a person (laughs) where it's like, how do I actually communicate, uh, you know, the things they need to know in order to make a good decision?

Speaker

(intro music)

Garry Tan

Welcome back to another episode of The Light Cone. Today, we're pulling back the curtain on what is actually happening inside the best AI startups when it comes to prompt engineering. We surveyed more than a dozen companies and got their take right from the frontier of building this stuff, the practical tips. Jared, why don't we start with an example from one of your best AI startups.

Jared Friedman

I managed to get an example from a company called ParaHelp. ParaHelp does AI customer support. There are a bunch of companies who- who are doing this, but ParaHelp is doing it really, really well. They're actually powering the customer support for Perplexity and Replit and Bolt and a bunch of other, like, top AI companies now. So, if you- if you go and you, like, email a customer support ticket into Perplexity, what's actually responding is, like, their AI agent. The cool thing is that the ParaHelp guys very graciously agreed to show us the actual prompt that is powering this agent, um, and to put it on screen on YouTube for the entire world to see. Um, it's, like, relatively hard to get these prompts for vertical AI agents 'cause they're kind of like the crown jewels of the IP of these companies, and so very grateful to the ParaHelp guys for agreeing to basically, like, open source this prompt.

Garry Tan

Diana, can you walk us through this very detailed prompt? It's super interesting, and it's very rare to get a chance to see this in action.

Diana Hu

So, the interesting thing about this prompt is actually, first, it's really long, it's very detailed. In this document, you can see it's like six pages long, just scrolling through it. The big thing that a lot of the best prompts start with is this concept of, uh, setting up the role of the LLM. You're a manager of a customer service agent, and it breaks it down into bullet points what it needs to do. Then the big thing is telling the- the task, which is to approve or reject a tool call, because it's orchestrating agent calls from all these other ones. And then it gives it a bit of the high level plan. It breaks it down step by step. You see steps one, two, three, four, five, and then it gives some of the important things to keep in mind that it should not kind of go weird into calling different kinds of tools. It tells them how to structure the output, because a lot of things with agents is you need them to integrate with other agents. It's almost like gluing the API call. So, the- it's important to specify that it's gonna give certain, uh, output of accepting or rejecting and in this format. Then this is sort of the high level section, and one thing that the best prompts do, they break it down sort of in this markdown type of style, uh, formatting. So, you have sort of the heading here, and then later on, it goes into more details on how to do the planning, and you see this is like a sub-bullet part of it. And as part of the plan, there's actually three big sections, is how to plan, and then how to create each of the steps in the plan, and then the high level example of the plan. One big thing about the best prompts is they outline how to reason about the task, and then a big thing is giving it- giving it an example, "And this is what it does." And one thing that's interesting about this, it- it looks more like programming than writing English, because it has this, uh, XML tag kind of format to specify sort of the plan. 
We found that it makes it a lot easier for LLMs to follow, because a lot of LLMs were post-trained in RLHF with kind of XML type of input, and it turns out to produce better results.
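For readers following along, the XML-tag plan style Diana describes might look roughly like this (tag names are invented for illustration and are not copied from ParaHelp's actual prompt):

```python
# Hypothetical example of a plan expressed in XML-style tags inside a prompt,
# mirroring the structure Diana walks through. Tag names are invented.
PLAN_EXAMPLE = """\
<plan>
  <step>
    <action_name>get_order_details</action_name>
    <description>Look up the order referenced in the ticket.</description>
  </step>
  <step>
    <action_name>check_refund_policy</action_name>
    <description>Confirm the order is eligible for a refund.</description>
  </step>
</plan>
"""
```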
