YC Root Access

Advanced Context Engineering for Agents

Dexter Horthy, founder of Human Layer, shares what his team has learned about scaling coding agents in real-world software projects. He walks through why naive back-and-forth prompting fails, how spec-first development keeps teams aligned, and why "everything is context engineering." From compaction strategies to subagents and planning workflows, he shows how intentional context management turns AI coding from prototypes into production.

Chapters:
00:09 - The Origin of Context Engineering
00:46 - Key Talks and Insights from AI Engineering
01:45 - Challenges with AI in Complex Systems
03:12 - The Shift to Spec-First Development
04:03 - Advanced Context Engineering for Coding Agents
04:48 - Intentional Compaction in Context Management
05:45 - Optimizing Context Utilization
07:27 - The Role of Subagents in Context Control
08:48 - Frequent Intentional Compaction
11:00 - Practical Implementation and Workflow
11:12 - Case Study: Fixing a Rust Code Base
11:59 - Insights on Effective Coding Practices
12:44 - Reviewing Features, Research, and Plans
13:30 - Conclusion and Future Directions

Dexter Horthy, host
Aug 25, 2025 · 14m · Watch on YouTube ↗

EVERY SPOKEN WORD

  1. 0:00–0:09

    Intro

    1. DH

      Hi, everybody. Uh, my name's Dex. I'm the founder of a company called Human Layer. Uh, I was in the fall twenty-- apparently we're all YC founders on stage today. I was in the fall '24 batch.

  2. 0:09–0:46

    The Origin of Context Engineering

    1. DH

      Um, I'm giving you a little itty bitty history of context engineering and the term. Um, long before Tobi, and Andrej, and Walden were tweeting about this in, uh, June, on April 22nd, uh, we wrote a, a weird little manifesto called Twelve-Factor Agents, um, Principles of Reliable LLM Applications. And then on June 4th, um, and shouts out, I did not know swyx was gonna be here, but he's getting a shout-out anyways, uh, he changed the title of the talk to Context Engineering, um, to give us a shout-out for that. Um, so everyone's been asking me, "What's next?" We did the context engineering thing, um, we talked about how to build good

  3. 0:46–1:45

    Key Talks and Insights from AI Engineering

    1. DH

      agents. Um, I will point out my two favorite talks from AI Engineer this year. Um, incidentally, the only two talks with more views than Twelve-Factor Agents. Um, [laughs] number one is, uh, Sean Grove, The New Code. Um, he talked about how, um... If I can figure out how to use this. Um, he talked about how we're all vibe coding wrong, and how the idea of sitting and talking to an agent for two hours and figuring out and exactly specifying what you wanna do, and then throwing away all the prompts and committing the code is basically equivalent to, um, if you're a Java developer and you spend six hours writing a bunch of Java code, and then you compile the JAR, and then you check in the compiled asset, and you throw away the code. In the future where AI is writing more and more of our code, the specs, the, the s- the description of what we want from our software is the important thing. Um, and then we had the Stanford study, which was a super interesting talk. Um, they ingested data from a hundred thousand developers at companies of all sizes, giant enterprises down to

  4. 1:45–3:12

    Challenges with AI in Complex Systems

    1. DH

      small startups. Um, and they found that, like, AI in software engineering leads to a lot of rework. So even if you get benefits, you're actually throwing half of it away 'cause it's kinda sloppy sometimes. Uh, and it just doesn't work for complex tasks or brownfield tasks. Um, so old code bases, legacy code bases, things like that. Um, and especially for h- complex brownfield tasks, it can be counterproductive. Um, not even that it doesn't really help that much, but it can actually slow people down. Um, and this kinda matched my experience. Uh, talking to lots of smart founders, it's like, "Uh, yeah, coding agents are good for prototypes." Even Amjad from Replit, um, was on a podcast six months ago, and he's like, "Yeah, our product managers use this to build prototypes. And then when we figure it out, we give it to the engineers, and they build production." Um, doesn't work in big repos, doesn't work for complex systems. Maybe someday when the models get smarter, we'll be able to have AI write all of our code. But that is what context engineering is all about. How do we get the most out of today's models? Um, so I'm gonna tell you a story about kind of a journey we've been on the last couple months of learning to do better context engineering with AI-generated code. Um, so I was working with one of the best AI coders I've ever met. Um, they were shipping... Every couple days, I would get a two thousand line PR of Go code, and this was not a CRUD app or a Next.js API. This was complex systems code with race conditions and shutdown order and all this crazy stuff. Um, and it-- I just couldn't review it. I was like, "I, I hope you know I'm not gonna read this next two thousand lines of Go code."

  5. 3:12–4:03

    The Shift to Spec-First Development

    1. DH

      Um, and so we were forced to adopt spec-first development because it was the only way for everyone to stay on the same page. And I actually learned to let go. I still read all the tests, but I no longer read every line of code because I read the specs, and I know they're right. And it took a long time, and it was very uncomfortable. But over eight weeks or so, we made this transformation, and now we're flying. We love it. So I'm gonna talk about a, a couple of things we learned on this process. Um, I know it works because I shipped six PRs last Thursday, and I haven't opened a non-markdown file in an editor in almost two months. Um, so the goals. I didn't set these goals. I was forced to adopt these goals. Uh, but the goals are: works well in big, complex code bases, solves big, complex problems, no slop, we're shipping production code, and everyone stays on the same page. Oh, and spend as many tokens as possible. [laughing] Um, this is Advanced Context Engineering for Coding

  6. 4:03–4:48

    Advanced Context Engineering for Coding Agents

    1. DH

      Agents. Um, I wanna talk about the most naive way to use a coding agent, which is to shout back and forth with it until you run out of context or you give up or you cry. Um, and you say, "No, do this. No, stop. You're doing it wrong." Um, you can be a little bit smarter about this. Um, basically, if you notice the agent is off track, a lot of people have done this. I've seen some people from OpenAI post about this. This is pretty common. If it's really screwing up, you just, you just stop, and you start over and you say, "Okay, try again, but make sure not to try that because that doesn't work." Uh, if you're wondering when you should consider starting over with a fresh context, if you see this, it's probably time to start over and try again. [laughing] Um, we can be smarter about this, though.

  7. 4:48–5:45

    Intentional Compaction in Context Management

    1. DH

      Um, and this is what I call Intentional Compaction. So it's not just start over and I'm gonna tell you something different, put my same prompt in with a little bit of steering. But even if we're on the right track, if we're starting to run out of context, um, be very intentional with what you commit to the file system and the agent's memory. I think /compact is trash. I never use it. Um, we have it write out a progress file very specifically, which is, like, my vibe of what I found works really well for these things. Uh, and then we use that to onboard the next agent into whatever we were working on. Um, what are we compacting? Why-- Like, how did I get to this? Lots of people have instincts about what works here. Um, so the question is like, what takes up space in the context window? Looking for files, understanding the flow, doing edits, doing work. If you have MCP tools that return big blobs of JSON, that's gonna flood your context window with a bunch of nonsense. Um, so what should we compact? We'll get onto, like, what exactly goes in there. Um, but it looks something like this. Um, and I'll talk about the structure of this file a little bit more.
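The progress file described above can be sketched in code. This is a minimal illustration, not Human Layer's actual format: the section names (Goal, Done, Next steps, Key files, Learnings) are assumptions about what a useful compaction file contains, based on the talk's description of onboarding the next agent.

```python
# A sketch of "intentional compaction": instead of trusting automatic
# /compact, the agent writes a structured progress file that the next
# agent reads to pick up the work with a fresh context window.
from dataclasses import dataclass, field

@dataclass
class Progress:
    """One compaction artifact, written out before the context fills up."""
    goal: str                                            # what we're trying to do
    done: list[str] = field(default_factory=list)        # completed steps
    next_steps: list[str] = field(default_factory=list)  # remaining work
    key_files: list[str] = field(default_factory=list)   # file:line refs that matter
    learnings: list[str] = field(default_factory=list)   # dead ends and gotchas

    def to_markdown(self) -> str:
        """Render the file the next agent reads to pick up the work."""
        sections = [f"# Goal\n{self.goal}"]
        for title, items in [
            ("Done", self.done),
            ("Next steps", self.next_steps),
            ("Key files", self.key_files),
            ("Learnings", self.learnings),
        ]:
            sections.append(f"## {title}\n" + "\n".join(f"- {item}" for item in items))
        return "\n\n".join(sections)

# Hypothetical example task; the file paths are illustrative.
progress = Progress(
    goal="Fix the race condition in worker-pool shutdown ordering",
    done=["Reproduced the race with the -race detector"],
    next_steps=["Drain the worker pool before closing the results channel"],
    key_files=["internal/worker/pool.go:142"],
)
print(progress.to_markdown())
```

The payoff of the structure is that the next agent starts from a few hundred tokens of distilled state instead of replaying the full tool-call history.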

  8. 5:45–7:27

    Optimizing Context Utilization

    1. DH

      Um, why are we obsessed with context? Because LLMs are pure functions. I think Jake said a lot of interesting things about this. The only thing, other than, like, training your own models and messing with the temperature, the only thing that improves the quality of your outputs is the quality of what you put in, which is your context window. Um... And in a coding agent, your agent is constantly looping over determining what's the right next tool to call, what's the right next edit to make, and the only thing that determines its ability to do that well is what is in your context window going in. And, uh, we'll throw this one in too. Everything is context engineering. Everything that makes agents good is context engineering. So we're gonna optimize for correctness, completeness, size, and trajectory. I'm not gonna talk a lot about trajectory 'cause it's very vibes-based right now, um, but to invert that, the worst thing to have in your context window is bad info. Second worst thing is missing info, and then just too much noise. And if you wanted an equation, we made this dumb equation. Um, Geoff figured this out. Uh, well, Geoff-- Lots of people are figuring this out, but Geoff Huntley works on Sourcegraph Amp, um, and I know Beyang was supposed to be speaking tonight. I'm sure-- I hope he will appreciate this talk, uh, in the spirit of what they've been talking about. Um, you got about a hundred and seventy thousand tokens. The less of them you use to do the work, the better results you will get. Um, he wrote this thing called Ralph Wiggum as a Software Engineer, um, and he talks about, hey, this is the dumbest way to use coding agents and it works really, really well, which is just to run the same prompt in a loop overnight for twelve hours while he's asleep in Australia and put it on a live stream. I actually think that he's being humble. It's a very, very smart way to use coding agents if you understand LLMs and context windows.
Um, I'll link that article as well. Um, I'll put up a QR code at the end with everything.
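The "dumb equation" itself isn't reproduced in the transcript, but the rule of thumb (a roughly 170k-token budget, and the fewer of those tokens you burn before doing the work, the better) can be sketched. The 170k figure and the forty-percent ceiling come from the talk; the helper names and the decision of measuring incoming tool results are illustrative assumptions.

```python
# A toy sketch of budgeting a context window: track utilization and
# trigger an intentional compaction before appending a tool result
# would push past the threshold.
CONTEXT_BUDGET = 170_000  # approximate usable tokens, per the talk
COMPACT_AT = 0.40         # keep utilization under ~40%

def utilization(used_tokens: int) -> float:
    """Fraction of the context budget consumed so far."""
    return used_tokens / CONTEXT_BUDGET

def should_compact(used_tokens: int, incoming_tokens: int) -> bool:
    """True when appending the next result would cross the threshold,
    i.e. time to write a progress file and start a fresh context."""
    return utilization(used_tokens + incoming_tokens) > COMPACT_AT

print(should_compact(50_000, 10_000))  # 60k/170k ~ 0.35, still under -> False
print(should_compact(60_000, 10_000))  # 70k/170k ~ 0.41, over -> True
```

Big MCP tool responses show up here directly: one 20k-token JSON blob can eat an eighth of the whole budget in a single call.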

  9. 7:27–8:48

    The Role of Subagents in Context Control

    1. DH

      Um, the next step is you can do inline compaction with subagents. A lot of people saw Claude Code subagents, and they jumped in, and they said, "Okay, cool. I'm gonna have my product manager and my data scientist and my front-end engineer," and, like, maybe that works, um, but they're really about context control. And so a really common task that people use subagents for when they're doing this kind of, like, high-level coding agents is they will find-- You wanna find where something happens, or you wanna understand how information flows across multiple components of a code base. Um, you will say, maybe you'll steer it to use a subagent. A lot of models have in their system prompts to use a subagent automatically. And you say, "Hey, go find where this happens." And then the m- parent model will call a tool that says, "Go give this message to a subagent." The subagent goes and finds where the file is, returns it to the parent agent. The parent agent can get right to work without having to have the context burden of all of that reading and searching. Um, and the ideal subagent response looks something like this. And I'm not gonna talk about how we made this or where it comes from yet. Um, there's a lot to be said about subagents. The challenge of, like, playing telephone and, like, you care about the thing that comes back from the subagents. How do you prompt the parent model to prompt the child model about how it should return its information? Uh, if you've ever seen this thing, we're doing basically, uh, what is it? Uh, a stochastic system. This is a d- deterministic system, and it gets chaotic. Imagine with non-deterministic systems.
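The parent/subagent exchange described above can be sketched as follows. `spawn_subagent` is a stand-in for a real agent runtime (not a real API); here it fakes the compact reply to show the shape of the exchange, including the key move of the parent telling the child exactly how to format its answer.

```python
# A sketch of subagents as context control: the child burns its own
# context searching the repo, and only a short structured answer
# enters the parent's window.
def spawn_subagent(task: str, response_format: str) -> str:
    """Stand-in for dispatching a task to a subagent with a fresh
    context. A real runtime would search files here; we fake the
    compact result to show the shape of the exchange."""
    # The subagent reads and greps as much as it needs, then returns
    # only this short, structured answer (illustrative paths).
    return (
        "Answer: shutdown ordering is decided in internal/worker/pool.go:142\n"
        "Relevant files:\n"
        "- internal/worker/pool.go\n"
        "- cmd/server/main.go"
    )

def parent_step(question: str) -> str:
    # The parent prompts the child about *how* to report back, which
    # keeps the "telephone game" between models under control.
    response_format = (
        "Reply with 'Answer: <file:line>' plus a short 'Relevant files' list."
    )
    return spawn_subagent(task=question, response_format=response_format)

# Only these few lines enter the parent's context, not the hundreds of
# lines of search output the subagent consumed to produce them.
reply = parent_step("Where is shutdown ordering decided?")
print(reply.splitlines()[0])
```

The design choice worth noting: the compression happens at the boundary between agents, so the parent never pays the token cost of the search trajectory, only of its conclusion.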

  10. 8:48–11:00

    Frequent Intentional Compaction

    1. DH

      Um, so what works even better than subagents and the thing that we're doing every day now is what I call frequent intentional compaction, building your entire development workflow around context management. Um, so our goal all the time is to keep context utilization under forty percent, and, uh, we have three phases: research, plan, and implement. Um, the research is really, like, understand how the system works and all the files that matter and perhaps, like, where a problem is located. This is our research prompt. It's really long. It's open source. You can go find it. This is the output of our research prompt. It's got file names and line numbers so that the agent reading this research knows exactly where to look. It doesn't have to go search a hundred files to figure out how things work. Um, the planning step is really just, like, tell me every single change you're gonna make, not line by line, but, like, include the files and the ch- snippets of what you're gonna change and be very explicit about how we're going to test and verify at every step. So this is our planning prompt. This is one of our plans. Um, and then we implement, and we go write the code. And honestly, if the plan is good, I'm never shouting at Claude anymore. And if I'm shouting at Claude, it's 'cause the plan was bad. And the plan is always much shorter than the code changes, sometimes, most of the time. Um, and as you're implementing, we keep the context under forty percent. So constantly, we update the plan. We say, "This is done. Onto the next phase, new context window." Um, this is our implement prompt. These are all open source. I'll tell you where to find them. Um, this is not magic. You have to read this shit, or it will not work. And so we build it around intentional human review steps because a research file is a lot easier to read than a two thousand line PR. But you can stop problems early. This is our Linear workflow for how we move this stuff through the process.
      Um, and I want to talk... Does anyone know what code review is for? Anybody? Yeah, me neither. [laughing] Um, code review is about a lot of things, but the most important part is mental alignment, keeping the people on the team aware of how the system is changing and why as it evolves over time. Um, I can't read two thousand lines of Golang every day, but I can sure as heck read two hundred lines of an implementation plan. Um, and if the plans are good, that's enough because we can catch problems early, and we can maintain shared understanding of what's happening in our code.
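The research/plan/implement loop can be sketched schematically. The phase names echo the talk; `run_phase` is a stand-in for launching an agent in a fresh context window, and the prompt-file names are illustrative. The detail the sketch emphasizes is that each phase reads only the previous phase's artifact, never the raw conversation history.

```python
# A schematic of frequent intentional compaction as a workflow: three
# phases, each in a fresh context, each producing an artifact that
# humans can review before the next phase starts.
def run_phase(name: str, prompt_file: str, inputs: list[str]) -> str:
    """Stand-in for launching an agent with a fresh context window,
    seeded only with the phase prompt plus the prior artifact."""
    return f"{name}.md produced from {prompt_file} + {inputs}"

artifacts: list[str] = []
for phase, prompt in [
    ("research", "research_prompt.md"),    # how the system works, file:line refs
    ("plan", "plan_prompt.md"),            # every change, with snippets and tests
    ("implement", "implement_prompt.md"),  # write the code, update plan as you go
]:
    # Each phase sees only the previous artifact, which is what keeps
    # utilization low; humans review research.md and plan.md here,
    # which is the intentional review step the talk insists on.
    out = run_phase(phase, prompt, inputs=artifacts[-1:])
    artifacts.append(out)

print(len(artifacts))
```

Reviewing the two intermediate artifacts is where the leverage is: a wrong line in research.md or plan.md is caught in a two-hundred-line read instead of a two-thousand-line PR.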

  11. 11:00–11:12

    Practical Implementation and Workflow

    1. DH

      Um, so putting this into practice, uh, I do a podcast with another YC founder named Vaibhav. He builds BAML. I don't know. Has anyone in here u-used BAML before? All right, we got a couple BAML guys. Um, I decided-- I didn't tell Vaibhav

  12. 11:12–11:59

    Case Study: Fixing a Rust Code Base

    1. DH

      I was doing this. We decided to see if we could one-shot a fix to a three hundred thousand line Rust code base. Um, and the episode is seventy-five minutes, and we go through the whole process of all the things that we tried and what worked and what didn't work and what we learned. Um, I'm not gonna go into it. I'll just give you a link. But we did get it merged. The PR was so good, the CTO did not know I was doing it as a bit, and he had merged it by the time we were recording the episode. Um, so confirmed, works in brownfield code bases and no slop. It got merged. Um, and I wanna see if it could solve a c-complex problem. So I sat down with, uh, the Boundary CEO, and for seven hours, we sat down, and we shipped thirty-five thousand lines of code. Um, a little bit of it was generated, but we wrote a lot of code that day. And, uh, he estimated that was one to two weeks of work, roughly. Um, so it can solve complex problems. It can add Wasm support to a programming language.

  13. 11:59–12:44

    Insights on Effective Coding Practices

    1. DH

      Um, and so the biggest insight from here that I would ask you to take away is that a bad line of code is a bad line of code. And a bad part of a plan can be hundreds of bad lines of code. And a bad line of research, a misunderstanding of how the system works and how data flows and where things happen can be thousands of bad lines of code. And so you have this hierarchy of where do you spend your time, and yes, the code is important and it has to be correct, but you can get a lot more for your time by focusing on specifying the right problem and what you want and by understanding, making sure that when you launch the coding agent, it knows how the system works. And of course, our CLAUDE.md and our slash commands are, like, we basically, like, test those for weeks before anyone's allowed to change them.

  14. 12:44–13:30

    Reviewing Features, Research, and Plans

    1. DH

      Um, so we review the re-research and plans, and we have mental alignment. Um, I don't have time to talk about this one because I think I'm already over. Uh, but how did we do? Um, we, we did the goals. I didn't, I didn't ask for these goals, but they were thrust upon me, and we solved them. Uh, we spent a whole lot of tokens. This is a team of three in a month. Um, these are credits by the way. Um, but I don't think we're going... I don't, I don't think we're switching to the max plan 'cause this is working well enough that I'm... It's, I mean, it's worth, it's worth spending 'cause it saves us a lot of time as engineers. Um, our intern, Sam, is here somewhere. He's shipped two PRs on his first day. On his eighth day, he shipped like 10 in a day. This shit works. Um, we did the BAML thing, and again, I, I don't, I don't look at code anymore. I just read specs. [sighs] So

  15. 13:30–14:36

    Conclusion and Future Directions

    1. DH

      what's next? I kind of maybe think coding agents are gonna get a little bit commoditized, but the team and the workflow transformation will be the hard part. Getting your team to embrace new ways of communicating and structuring how you work is gonna be really, really hard and uncomfortable for a lot of teams. Um, people are figuring this out. You should try to figure this out, 'cause otherwise you're gonna have a bad time. Um, we're trying to help people figure this out. We're working with everybody from six-person YC startups to 1,000-person, uh, public companies. Um, there is a... Oh, we're doing an event tomorrow on hyper-engineering. Uh, it is very, very close to capacity, but if you come find me after this and give me a good pitch, there are a couple spots left. Um, and there's a link to the video where we talk about this for 90 minutes, and, uh, me and Vaibhav bust, bust each other's balls for a while. That is Advanced Context Engineering for Coding Agents. Thank you. [audience applauding] [upbeat music]

Episode duration: 14:37


Transcript of episode IS_y40zY-hc
