Stanford Online | Stanford CS153 Frontier Systems | The AI Native Company: How One Founder Becomes a 1000x Engineer
- SPSpeaker
We are super lucky to have with us today Garry Tan and Diana Hu from YC. [audience applauding] Before we dive in, I'm going to do a couple minutes of, sort of, warm-up. This is a really special lecture for a couple reasons. One, this class, CS153, which, as you know, Mike and I started teaching four years ago, which was Security at Scale, small group, fifty people, was inspired by... It's sort of a composite of several different classes that have been taught at Stanford by Silicon Valley leaders. And when I was an undergrad, sophomore year, Peter Thiel taught the first version of Zero to One. It was CS183, How to Start a Startup, and that became the book Zero to One. The following year, YC taught a version of that class that Sam put together, and Garry was at YC at the time. Or I think you had just started Initialized.
- GTGarry Tan
Yeah.
- SPSpeaker
Right? And, uh, and so those are the spiritual descendants of this cla- and then there's CS43N, which was Terry Winograd's class we've talked about. That was Computers and the Open Society, which was the first freshman seminar I took. And so over the years, we've tried to, you know, sort of take the best parts of all those classes and, and bring it together in 153. But I think it's just really poetic to have Garry back because based on many of the, you know, things Garry learned here at Stanford, he went out and sort of took the spirit of Stanford out to Silicon Valley. Um, and to have him back and be able to talk about, you know, um, all of his work and, and now with Diana helping to update some of the, the, the sort of YC philosophy that I wanna, we're gonna talk about, uh, it's sort of a close-the-loop moment. So thank you guys for coming back.
- GTGarry Tan
Yeah.
- SPSpeaker
It's really appreciated. Yeah.
- GTGarry Tan
Thanks for having us.
- SPSpeaker
Oh, no, this is, this is the fun part. So, um, I'm gonna let y- b- before we sort of dive in, I, I'd like to give a cou- couple minutes of sort of context on why I think this is an important lecture for you guys. So as you know, 153 is a systems class. You know, you've heard up and down the stack from, uh, land, power, shell, and energy, like Scott Nolan at General Matter, to the chip layer, we had Jensen last week. Um, there's a full rewrite of systems going on to unblock bottlenecks on frontier progress, right, in the world. Um, one of those things that you need, that we need to unblock bottlenecks on is capital. And, uh, as you heard from Ben Horowitz a few weeks ago, you know, Mark and Ben came up with a system to try and scale the deployment of capital in Silicon Valley over ten years ago and are now thinking through how to update that system. And, and YC is very similar, and I'd like to connect the dots between lecture one, where we talked about the compute bottleneck, right? And if you remember, one of the reasons I, I talked about how bottleneck is, uh, compute is a bottleneck today is because we're in the pre-standardization of compute era. And if you zoom back to the Industrial Revolution, one of the things that allowed this very important thing called electricity to become a stable sort of resource, uh, a, a piece of infrastructure that lots of people could develop on and access, was the development of standards, right? One of them was AC/DC, and then we had institutions enforce those standards. Um, one of those institutions was w- with utility companies that developed a grid to coordinate the production, demand, and supply of electricity. In the capital world, um, when I was, uh, getting, showing up... When I showed up in Silicon Valley 20- in 2011, we were in the pre-standardization of capital, of venture capital. It was a complete mess. 
You know, there were a bunch of VC firms who were all trying to do their own deals and figure out how to negotiate with founders and so on. And into that mess stepped, um, Paul Graham and Jessica Livingston and introduced a new standard for how capital should be allocated, and that was called the SAFE. How many people have heard of the SAFE? There we go. Okay, so this is living proof. The, at the time, it wasn't legible to me how profound the SAFE was. It was basically a two-page legal document that YC put up online and said, "Here's how we're gonna fund startups. It's called a SAFE, Simple Agreement for Future Equity." And at the time, I was like, I was a founder, uh, sorry, st- student here, and I saw it, I was like, "Okay, whatever, legal document." In hindsight now, it's so obvious to many of us in the ecosystem that that was a pivotal moment in the history of Silicon Valley, where, you know, the YC team saw what was going o- on, realized... At that point, there was another, um, we, we were living through the rise of the cloud and SaaS era, right? AWS and GCP and so on had started to make compute, uh, quite accessible, and that had redu- reduced the marginal cost of innovation in the valley. But, but venture capital was still, hadn't caught up with that era. You know, it didn't cost much to produce software, and so there was a sort of moment of abundance we knew we were going to go through back then. But to get capital out to innovators like you guys, there was so much, there, there was a, there, there was sort of a venture capital bottleneck at that time, which now seems cute given the numbers we're, we- we're living through today. But at that time, it really did feel like it was hard to get time with VCs and get good deals and so on. And so when into that mess stepped YC and published the SAFE, it became a standard for how early-stage startups, um, you know, were gonna be funded. 
And then by enforcing it, YC became an institution that standardized seed-stage funding. And I think the arc of Silicon Valley would have looked very different without that one document.
Okay, and so it's very obvious to me, as at AMP we live through this every day on the compute side. We might even at some point open source a standard agreement for future compute, something like that. But, you know, we look to what YC has done as somewhat of a spiritual ancestor for the work we're doing. And so it's very cool to have you guys back. Within that context, I hope this gives you a little bit of a connect-the-dots moment for why lecture one and this lecture are parallels. And systems design is not just something you do in engineering. You can do it in any domain you're in to try and accelerate the pace of progress and unblock bottlenecks. Is this making sense to people? Can I get a yes? Yes. Come on, it's spring quarter, guys. Can I get a yes? Yes. Yes. Okay, thank you. All right. With that, over to you guys. Thank you so much for coming. Why don't we start with, you know, introductions about yourself, how you got here, and then you can dive in.
- GTGarry Tan
Absolutely. Hey, hey, everyone. I'm Garry Tan. Uh, I was, uh, Stanford class of '03. I took a lot of classes in here. I fell asleep in this lecture hall a great many times. Thank you so much for bringing me back. This is-- It's great to be back to the Farm. And, uh, any-- every time I come back to the Farm, I'm like, you know, uh, sort of shocked that I get to be up here, uh, because, like, it-- I feel like I just blinked and I was in your seat. And, uh, you know, zooming out, that's actually desperately what I want for every single one of you, is like, how do we, like, you, you know... What we're talking about here is there's a grand shift, like all those historical things, like literally the new standards are being established right now, and there are people in this room who are actually gonna be the people who establish those things. And then Diana and I and the team at YC, we're hoping that we're, uh, you know, the SAFE was a legal instrument. Uh, what we're gonna talk about today is actually code. And not just code, markdown as code. [laughs] Like, literally the new... And, you know, we're gonna link it all the way over to, uh, what a startup is, what people in this room are going to be spending your entire lives, uh, building the railroad for the rest of society over. Like, you know, for our, our generation, uh, we were building the internet and we were building mobile phones and we were building social networks, and your generation is going to create the cognitive layer for all of society. And this, I mean, what we're talking about is just like stuff that we're, like, these are our hunches even. Like, you guys are gonna go and actually build it. And so, you know, thank you for bringing us back. I mean, Diana, do you want to introduce yourself?
- DHDiana Hu
Yeah. Thank you for having us. I'm Diana. I'm one of the general partners at YC, and we are living through an exciting time, as you all know, with all the capabilities that AI is unlocking. And we have a lot of interesting things to share with all of you in this lecture. We've seen unprecedented growth from a lot of the companies in our portfolio that have gone from zero to tens of millions of dollars in revenue within one year, which was impossible before. It would have taken four or five years to get to basically Series B level traction.
- GTGarry Tan
And, like, hundreds of millions of dollars in capital. I mean, it's just a different moment right now.
- DHDiana Hu
Different world. And we're gonna tell you how these founders have done it. And we're gonna go through really what it means to build a company now to be AI native.
- GTGarry Tan
So with that, um, I mean, it's a pretty packed lecture, so we're gonna just get right in. I mean, AI is gonna change the unit of production. Like, you know, when I was sitting in your seat, uh, I knew that I needed to raise money, I needed to hire a lot of people. This was about me learning how to like, you know, uh, create a new cult. Like, you know, Palantir was like that. Um, YC, you know, ultimately, it's a religion, right? Like, this is something that we believe that nobody else believes yet, right? Um, that is still true. Um, all the things we're gonna talk about, like a team is still valuable, human beings are still valuable, but it's not going to be just humans. It's going to be humans in concert with agents, with memory and evals and a customer loop. So by the p- by a- by the end of this talk, you're gonna understand what we're talking about. Right now, it sounds like a bunch of buzzwords. We don't want this to be a bunch of buzzwords. We want you to take these ideas and actually implement them and remake society, and we think you will do that. Um, let's see. Yeah, in 2010, like, uh, I mean, I'll tell you my personal story. Uh, in 2008, I got into YC. We raised about $4 million. I hired, you know, 10 people. We created Posterous, which is a dead simple blog platform, and, uh, you know, we sold that to Twitter, uh, three years later for $20 million. And, uh, honestly, like I was able to create like everything, all the software we made over two years with 10 people and all that capital, but me with a $200 a month, uh, Claude Code Max plan. And anyone in this room could do that, and it, it didn't take like two years. It took about, uh, five days, right? So I experienced that speed up recently. Um, you know, I created Garry's List, and then it cr- uh, that caused me to create GStack. We're gonna talk about what those things are. 
But, you know, as Diana said, we're in 2026 now, and so a six-person team can hit $10 million in revenue with just the things that we're talking about today. And a lot of you already know this, so it might be review, but for some of you, this is like some astonishing good news. So let's talk about GStack. This is something that I discovered. Late last year, I saw Steve Yegge, a famous blogger and engineer, I believe he was an early Googler. He wrote that, you know, "People using AI coding agents are 10X to 100X more productive than engineers using Cursor and chat today." And then at Anthropic, they're about 1,000X as productive as Googlers were in 2005. And I was like, "What is going on?" And so I had to try it. I opened Claude Code, and of course, I ended up writing, I'm around like a million lines of code in, which is really, really crazy.
Let's see. Let's just talk about the things that you might read on the internet. These are all wrong. It's not just AI slop. Actually, you know, yes, LLMs are very verbose, and some of it is boilerplate, but when you create your own software factory, this is actually what you're fighting. This is actually what you're preventing from happening by default. Yes, there are hallucinations. Yes, those are actually the things that we're trying to control. Can you make demo code very quickly? Yes. But how do you get it to production? Well, you actually have to get to 100% or 80 to 90% test coverage. That's actually one of the main reasons why plan-eng-review as a skill exists. That's the number-one-with-a-bullet skill that I use about 20 times a day to get to 80, 90% test coverage so that I am not shipping slop. I'm shipping something that is actually literally usable and that I rely on every day in production. This is very controversial.
I've gotten in trouble over this. I apologize to people for ... You know, who, who, like, you know, took my trolling as serious. Like, you know, is LOC gameable and, uh, something that might be, you know, not usable? Like, actually, yes. Like, LOC on its own can be wrong. But on the other hand, like, if you have tests, if ... You know, the, the real measure of whether or not these things work is actually look down, and does it work for you? Does it work for your customers? Are people actually paying? That's actually the true metric. Uh, you know, LOC might be a garbage metric. But I might argue that in the age of ... There's nothing in Claude Code or the model or the harness or any ... or GStack or any of these things that tell the model to write as many lines of code as possible. Like, if anything, the reverse is probably true. Like, we're trying to write as dis- dense and concise code as possible to serve the purpose, and, you know, I think that that's something that's quite important to talk about. Um, this is my experience. Like, I got to 87,000 stars. Um, my other project, GBrain, is 13,000 stars. So, I mean, basically, for someone who was not coding at all in December of last year, I have more than 100,000 GitHub stars, and about 15,000 people use it every single day. Um, you know, it's hundreds of thousands of skill invocations. And so, I don't know. This is sort of what I'm learning. Um, you know, last year, uh, probably before Claude, uh, 4 point 5- uh, Opus 4.5 came out, we were talking about copilots. Uh, today, I think we're really talking about a software factory. And so if you use GStack, you'll understand this is actually what's happening. Um, what I discovered is that, um ... And this was mo- more or less by accident. 
As I was writing half a million lines of code for, uh, recreating my startup that I created, like, two other times previously, but doing it in about five days or, you know, during the course of, like, several months creating GStack, I realized that it's actually really useful to pull out specific, uh, personas of what is already in the latent space. And so, um, the most famous skill that a lot of people use that I ... You know, it's actually interestingly a distillation of what we already do at Y Combinator when YC ... We have 15 partners, 16 partners at YC. When you have an idea and you're doing office hours with us, we're mainly asking questions about: What's the problem? Who's the customer? How do you know that? And then what are we building, right? And so that's what the office hour skill is. We basically took, uh, actually three, four months of, like, transcripts across, like, thousands of conversations, distilled that into something very, very potent, and then I had to distill that down by 90%, and then that's what is shipped in open source in /office hours in GStack. Um, but, you know, as I went, like, uh, it turns out th- there are lots of different things that I like to use, um, to actually make it easier and b- you know, far better to ... You know, the product that you can create, uh, with coding agents can be better if you're literally pulling out the latent space for a particular vibe and, like, thing that you're trying to go for. So plan CEO review, for instance, my favorite thing about that is, uh, it asks the question, okay, well, it has context. It knows what you're trying to build. Uh, what is the 10X version of that? What is the platonic ideal of that? And so, you know, when I was a product manager at, uh, both Palantir and Microsoft and, like, a founder for my startups, like, that was what ... That ... When I thought about product, that's what I wanted to do. 
I wanted to figure out, like, what is the perfect manifestation of the thing that we could build? And then when I build a, um ... What I, what I'm building right now needs to be on a road map that is a straight line from where we wanna go from where we are now. And then the other thing that I discovered as we were doing this stuff is that you can boil the ocean. You know, who here remembers that term, boil the ocean? Like, if you go and work someplace, you're gonna go into a meeting where people start saying things that are a little too scary, and then immediately people in that room are gonna say, "Whoa, whoa, whoa. Let's not boil the ocean." And my response to that based on my experience with, uh, coding agents and what's happening right now is actually let's boil the ocean. You know, the, the things that you can do, like, uh, basically you sitting in front of one of these terminals can ... You can do the work of about 500 to 1,000 people, and if that's true, then, like, all of the expectations that we currently have in society around what a founder can do, what a company can do, what a small team can do, what you can do sitting in front of a computer, they're actually 1,000X wrong, right? And actually, what's funny is that's baked into the model weights. Like, who here has asked Claude Code before, like, "How long is this going to take?" And it'll give you, "Oh, it's gonna take about three weeks to code all of this stuff." And then you press Approve on the plan, and then literally it's done in about an hour. So, I mean, all of us have experienced that. Like, the models themselves have not caught up to this new reality that we can actually boil the ocean. So anyway, use GStack. Like, there's a lot of stuff in there. Uh, we have very little time, so I feel like I need to skip ahead. Like, you know, GStack was basically my understanding of building open source and putting it out there, and I'm still working on it. 
But the new thing that I've been working on, that everyone at YC has been, you know, just completely immersed in, is OpenClaw and Hermes Agent, and they're actually teaching us brand-new primitives on how to think about code, how to think about markdown, and how those things work together to do real work.
And so this is somewhat obvious, but I have to say it because anytime I would build an agentic system and it broke, it would every single time break because something was wrong about what I was trying to do. I was either trying to do deterministic work, things that should be in code, in my markdown skill, or I was trying to do latent stuff, the things that my agent should be doing using the LLM, in the code. And a concrete example, for instance, is, you know, we spend a lot of time trying to curate the experience of people at YC events. Anyone, actually, can just use Claude. You don't even need Claude Code. You could use ChatGPT. Put in, you know, bios of, like, eight people coming to your dinner party, and you could have it go and Google that person, run a dossier, and then figure out who should sit next to who. That's very easy to do in latent space. But try to do that with an eight hundred-person dinner party or with the six thousand people that are coming to startup school. You can't do it. The model's not big enough. It hallucinates. It doesn't work. And so what do you do? Well, that's the perfect example of, you know, you need to make the latent space work with the deterministic space. And so how do you actually do that? Here's a toy. The toy example here is, well, what is a skill? Who here has played with a skill or used a skill file? So a skill file is actually-- I mean, it sounds facile.
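The split described above, latent work for judgment and deterministic code for scale, can be sketched as follows. This is an illustration only, not code from the talk: the `seat` function, the guest data, and the `shared_interest_score` stand-in are all hypothetical. In a real system the scoring step might be an LLM call over two bios at a time, which keeps each prompt small, while ordinary code owns the loop and guarantees that every guest is seated exactly once.

```python
from typing import Callable, Dict, List, Set

Guest = str


def shared_interest_score(a: Set[str], b: Set[str]) -> int:
    """Stand-in for the latent step.

    A real system might ask an LLM how well two bios match; this
    hypothetical version just counts shared interests.
    """
    return len(a & b)


def seat(
    guests: Dict[Guest, Set[str]],
    table_size: int,
    score: Callable[[Set[str], Set[str]], int] = shared_interest_score,
) -> List[List[Guest]]:
    """Deterministic greedy seating.

    Code owns the loop and the bookkeeping, so no guest can be dropped
    or seated twice, however many guests there are.
    """
    unseated = sorted(guests)  # sorted so the result is reproducible
    tables: List[List[Guest]] = []
    while unseated:
        table = [unseated.pop(0)]  # seed each table with the next guest
        while len(table) < table_size and unseated:
            # Pick the unseated guest with the best total score against
            # everyone already at this table (ties go to the first name).
            best = max(
                unseated,
                key=lambda g: sum(score(guests[g], guests[t]) for t in table),
            )
            unseated.remove(best)
            table.append(best)
        tables.append(table)
    return tables


# Made-up guest list for illustration.
guests = {
    "ana": {"ai", "robotics"},
    "bo": {"ai"},
    "cy": {"fintech"},
    "di": {"fintech", "ai"},
}
tables = seat(guests, table_size=2)
for number, table in enumerate(tables, start=1):
    print(f"table {number}: {', '.join(table)}")
```

The design choice here is that the model never sees the whole guest list. Scaling to hundreds of guests becomes a question of how many small scoring calls the code makes, not of how much fits in one context window.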
I mean, if you go on Twitter and believe, like, the haters, they're gonna say like, "Ha ha, it's just a bunch of markdown files. Who cares," right? But the big difference now with LLMs is, like, L-- like, you can actually do real work with this stuff. Um, you know, the thing that keeps coming back over and over again is that you can do real investigations about it. And so, you know, basically, what is a skill? It's basically just a run book. Like, you know, even, you know, if, if you've ever thrown an event and you need to throw that event over and over again, what do you do? You go into your notebook, and you just write down, "Well, one, we need to do the secure a venue. Two, like, let's figure out who should come." Like, it's just this any human being or agent should be able to look at it and say, "Okay," like, "after I read, like, one, two, three, four, five, six," like however many steps it is, maybe it's branching. It could be very complicated, actually. Um, you know, do I know how to do that thing, right? This, you know, this is a very simple concept, but the really cool thing is that you can actually make it call code, and that's what I find myself doing inside of OpenClaw and Hermes all the time. And this, this is where it links to what you guys are doing as founders, and this is the pattern that we're seeing inside every YC founder or, uh, inside every YC startup now. Like, we're not picking up the phone and doing it ourselves, just like we're not opening VS Code and writing code ourselves. Like, every-- Like, Claude Code revolutionized how we write code, and we don't o-open, like I-- You know, me, Karpathy, and tons of other people in this room probably don't open the editor at all, right? Um, the same thing is happening with OpenClaw and Hermes Agent. So all non-technical or process-oriented things in knowledge work are now-- You can do it in OpenClaw. Like, you can have Twilio call someone. 
You can use Gemini Live to h- actually, like, book a thing or, like, buy a thing or here's my credit card. Like, all of these things. You know, like that-- Do you-- Who here remembers that Google demo where, like, they stood up on one of their conferences and they're, like, so proud? Like, you know, "Gemini can now call and, like, get you an appointment," and then they never ship that thing. Y-you don't need to wait for them to ship that anymore, 'cause you can have that yourself, and that's, like, the most empowering thing. So code is code. I mean, the concrete example I have is, like, who here uses OpenClaw, and, uh, it always, for some reason, thinks that you're in Greenwich, uh, in the UK? Like, it's always-- And so this is a perfect example of, like, uh, I had to write code in TypeScript. It's context-now.mjs, and I have tests for it, and then I have it built into my system so that I don't rely on the latent space to do it. It just tells me, "Here's the time," and then actually, "Here's the things that are coming up." And if I don't do that, like, left to its own devices, the latent space will be like, "Oh yeah, it's three AM." Like, "Why are you still up?" And it's like, "What are you talking about? It's the afternoon right now." Um, the next important thing that we discovered, like anyone who has used Claude Code a lot has probably seen this, uh, error message at the top of Claude saying, "Your Claude.md is forty thousand tokens or forty thousand lines," or something like that. Um, and then you Google around, and you're like, "Okay, well, how do I fix that?" Well, how you fix it is actually a resolver. So a resolver is actually really important because, uh, it's amazing how much you have to spend time getting this right. Um, you-- Claude is a whole bunch of instru-- Claude.md is a whole bunch of instructions of, on how to do things that you develop. 
Like, you got mad that Claude Code did this or that or wrote the change log in a certain way, and you say, "Hey, I don't want it like that. Don't do it like that anymore." Well, turning it into a proper resolver means that you take that instruction, and it's like, anytime you have to write to the change log, load changelog.md. And so suddenly, you don't need that in your context. The agent itself knows, "Oh, okay, here's this master directory of all the things I know how to do, and I need to load the instruction only when I actually need it." It sounds so simple, it's kind of obvious, but this is actually the core of having a really great agent. It's having a resolver. When I need to check signatures, I want it to actually go to my executive assistant skill, who is a particular person. Well, I need it to look up in my brain repo how to do that, and I have a skill, a specific code path, and it's not a code path. It's like a markdown code path, right? I call it a skill pack. I have a skill pack specifically for that thing. I did it once, and then that's where...
Here's another primitive that I discovered that I find myself doing about 20 times a day when I'm using OpenClaw or Hermes Agent. It's called Skillify. So, you know, you're sort of going up one level in abstraction. So let's use one of these examples. You know, save this article. Well, I do that once. I look at the input, I look at the output, I get the agent to do exactly what I want. And then once I have it in a position where I like it, I actually tell it, "Skillify." And then on the right, that's actually what the skill says. You know, this is a summarized version of it. I have an article on X about it if you wanna see all the full details. But long story short, you write the skill, you write the code, and then here's the part that is actually broken in Hermes Agent.
I think they're about to fix this, actually. But, um, it's not enough to do it once. You actually need to test it. Um, you have to-- It's like, kinda like, uh, if you work in a finance organization, like think about all the people, like 10 or 20% of people who work in some of these organizations just do compliance. And you're like, "What are all these people doing?" Actually, like, in an agentic system, this is exactly the illustration of that. Like, look at all these steps. Writing the skill and writing the code is only two out of the ten steps. All of the rest of it is making sure that this messy system, that is kinda more like a human system than perfect, beautiful beam of light code, can still work and do work that you want, right? Okay, so you want-- You did it some-- You did something in Claude Code, you-- Or sorry, you did something in OpenClaw, you made it work, then you say, "Skillify." What does it actually do? Well, you have to write unit tests for the actual code. You have to write LLM evals for the skill file. Then you have to write an integration test. Then you have to make sure that there's a resolver trigger in agents.md, and then you have to test that. You need an, uh, LLM as judge eval to make sure that when that thing comes up, it's broad enough that it actually gets triggered. And then, uh, there's this other concept that you can look up in GBrain called CheckResolvable that is very important. You want it to be dry. Don't repeat yourself, otherwise you end up with, like, 1,000 skills that do all the same thing. Um, you need end-to-end smoke test. And then, you know, ultimately, you need a schema. You need to figure out where does this live in my memory and my repo. Um, so we're going really fast, but you know, that's why memory is actually really important. And so my next project, uh, that is out now that I'm working on is called GBrain. 
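The Skillify steps listed above can be written down as a plain checklist that code, rather than the model, enforces. This is a hedged sketch: the step names below are hypothetical labels paraphrasing the talk, not identifiers from OpenClaw, Hermes Agent, or GBrain, and grouping them into exactly ten items is one reading of the spoken list.

```python
from typing import List, Set

# Hypothetical labels for the steps named in the talk, in spoken order.
SKILLIFY_STEPS: List[str] = [
    "write_skill_markdown",
    "write_code",
    "unit_tests_for_code",
    "llm_evals_for_skill",
    "integration_test",
    "resolver_trigger_in_agents_md",
    "llm_judge_trigger_eval",
    "check_resolvable_dry",
    "end_to_end_smoke_test",
    "schema_and_filing_location",
]


def missing_steps(done: Set[str]) -> List[str]:
    """Return the steps still open, in checklist order."""
    unknown = done - set(SKILLIFY_STEPS)
    if unknown:
        raise ValueError(f"unknown steps: {sorted(unknown)}")
    return [step for step in SKILLIFY_STEPS if step not in done]


def is_skillified(done: Set[str]) -> bool:
    """A skill only counts as finished when every step is done."""
    return not missing_steps(done)


# Writing the skill and the code covers only two of the steps.
done_so_far = {"write_skill_markdown", "write_code"}
remaining = missing_steps(done_so_far)
print(f"{len(remaining)} of {len(SKILLIFY_STEPS)} steps still open:")
for step in remaining:
    print(" -", step)
```

Keeping the checklist in code means the agent cannot quietly declare a skill finished after the first two steps.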
It's actually a three-layer memory system built on top of what Karpathy already talked about with his knowledge wiki. So I started with a knowledge wiki as well, and then it started falling over because it just uses Grep. Um, and so I had to add, uh, you know, vector search, um, you know, RRF fusion, backlinks. I added a graph database as a type knowledge graph. I'm about to add, uh, an ep- uh, epistemology, um, system so that we know that things are take-- they're, they're like hunches or, um, beliefs by specific people or world knowledge. And I wanna track when things sort of... You know, what's funny about, uh, uh, maybe this is very specific to me, like, I'm super fascinated with the idea that people in this room are gonna go on to... Like, your, your journey as a founder literally is that you have a hunch. You think that, like, the world needs X. Nobody believes that yet. But you know, I want my knowledge system to be able to track. Like, "Oh, well, I heard so and so, this person in this room, this person in a red shirt right here, he tweeted this," and nobody else believed that yet, right? But he's gonna go and spend like a year, two years, five years proving it correct. And then if my GBrain is actually working properly, it's gonna spot that, and it's gonna be like, "Oh, actually, like, here's-- At Stanford, there was this one person who believed X, and then they manifested it." And so I don't know. I-- For me, like philosophically, I'm, I'm fascinated by knowledge systems, like, truly capturing what's going on, and that's sort of what we ha-- You know, I, I think about this... Like, I'm just building software for myself. Like, this is the stuff that we have to think about. And, um, I don't know. I-- If you spot in my, in my, um, voice, like I'm excited about this because I'm building again, and I'm building for myself, and then we're open sourcing this stuff because we want all of you to actually be able to do it. Um, I feel like I need to expand on like... 
You know, one of the things that GBrain does is like it's a very specific schema for my use case. But, you know, one of the last things I need to do before I go to V1, hopefully in the next couple weeks, is I actually need to make, uh, fully dynamic ontology, which is a great buzzword from-- that I've learned from Palantir back in the day. I mean, that's what-- You know, right now it's built-- it's, the schema is built for me, but there's no reason why it can't be built for you, whether you're a researcher, whether you're a journalist, whether you're a politician. Like, each person's gonna have a different schema. We need to support all of those things. So zooming out, I'm about to pass it over to Diana to take it all the way home. Like, I sort of gave you the primitives that we're learning literally, like, week by week. Like, I didn't even know about ski- uh, Skillify until it flew out of my hands at like 3:00 a.m. using OpenClaw, and then I put it on X and that went viral and... I mean, I'm just learning as I go. I'm not an expert, you know. Some- sometimes it's like, uh, my favorite line from, uh, Alan Watts, who, if you guys know Alan Watts, is, uh, he walks-- He goes to a room like this. He ge- used to give lectures, and he would say, "I am not a guru. I am just an entertainer." So, uh, you know, that's, uh... I wanna pass this over. I mean, we're talking about the agentic company. Diana's gonna tell you a b- a lot more about it. But like the, the concepts that I just talked about, like one of the weirder things we realized is these actually map to the company. So a skill is, you know, sort of a squishy human being who's an employee, who has a capability. A resolver is the org chart, like who handles what. Like, how does it happen? Like it's, you know, the filing rules, where it goes in the brain is the internal process. Where does the information live? 
CheckResolvable is this thing that makes sure that the resolver works for like the set of things that you want to get done, and that's like audit and compliance. Like, I, you know, when I was sitting in your seat, I had no idea why so many people in so many human organizations had to spend so much time on audit and compliance. But now, at age 45, building a lot of agentic systems and looking at Skillify and how much time I spend just trying to make the things, like, frigging work, you know? Uh, I actually understand now. Like, human systems are very messy, and that's what CheckResolvable is. And in the end, like, you know, the funniest thing is what a trigger eval is. Like, you would think, like, "Oh, well, of course, it's in the trigger, it's in the resol- you know, in, in agents.md, it should just work, right?" But no, you even have to check that. Like, that itself is its own latent space, squishy operation that you have to check. And that's, you know, in an org, those are performance reviews. So, um, with that, I wanna hand this over to Diana to take us to the actual applied portion that will actually help you.
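[Editor's sketch] To make the skill / resolver / CheckResolvable / trigger-eval vocabulary concrete, here is a toy version. Everything in it is hypothetical: the skill names, the keyword-based `resolve` function, and the sample requests are invented for illustration and are not how GBrain or Skillify implement these ideas. In a real agent the routing would be done by a model reading something like agents.md, which is why it needs its own check.

```python
# Hypothetical skill registry: skill name -> what it does.
SKILLS = {
    "summarize_meeting": "Turn a meeting transcript into notes",
    "file_note": "Store a note in the right place in the knowledge base",
}

# Hypothetical resolver: trigger phrase -> skill that should handle it.
RESOLVER = {
    "summarize": "summarize_meeting",
    "recap": "summarize_meeting",
    "save": "file_note",
}


def resolve(request):
    """Return the first skill whose trigger phrase appears in the request."""
    text = request.lower()
    for trigger, skill in RESOLVER.items():
        if trigger in text:
            return skill
    return None


def check_resolvable():
    """Structural audit: routes to missing skills, and skills with no route."""
    dangling = sorted(s for s in set(RESOLVER.values()) if s not in SKILLS)
    unreachable = sorted(s for s in SKILLS if s not in RESOLVER.values())
    return dangling, unreachable


def trigger_eval(cases):
    """Behavioral check: list (request, expected, actual) for every misroute."""
    return [(req, want, resolve(req)) for req, want in cases if resolve(req) != want]


cases = [
    ("Please summarize today's standup", "summarize_meeting"),
    ("Save this idea for later", "file_note"),
    ("Write up what happened in the call", "summarize_meeting"),
]
problems = check_resolvable()
misroutes = trigger_eval(cases)
print(problems, misroutes)
```

The structural check passes here, yet the third request still fails to route. A wiring check alone is not enough, so the routing behavior has to be evaluated separately.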
- DHDiana Hu
So I think a couple of things that Garry went over are a lot of the details on how you could implement it with a lot of the building blocks. And if we really backtrack and step now a couple layers up, one of the key concepts of building a AI native company is you need to change fundamentally how companies are run. I think normally today, pre-AI companies are basically run as a open loop. People make decisions, and a lot of those, um, decisions take a while to come back and is basically lossy. There's no concrete tight feedback loop. If a lot of you have studied control systems... How many of you have taken control systems and know the difference between open loops and closed loops? Uh, the problem with open loop systems is as error accumulates, the systems become more erroneous, and then it goes off the rails. As opposed to, let's say, closed loop system, very famous closed loop systems could be like PID controllers. You have a tight feedback loop into the controller so that a lot of the error stays within check, and this is how a lot of, uh, robotic systems work a lot better. So we're basically now, with AI, have the capability to take a lot of these lossy information of how companies run into becoming a closed loop system. So what that means fundamentally today for old school companies, information lives in people's, in people's head in a org. They have a, a lot of side conversations, DMs in Slack. They have a lot of meeting notes that are not written. They have just vibes, how they feel about a particular decision, and all very lossy. This is basically how decision in companies are made. And now, the ability is to ch- change all of that into a closed loop system where you tie these agents that Garry described and how to implement it into basically the fabric of how you make decisions for a company. So the idea is that you would have a agent like a Hermes or OpenClaw embedded into all the decision-making. 
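[Editor's sketch] The open-loop versus closed-loop contrast can be seen in a toy simulation. The plant model, the drift value, and the controller gains below are all assumptions chosen for illustration. The open-loop run follows a plan made up front and never looks at the result, while the closed-loop run feeds the measured error back through a PID controller (with the derivative gain set to zero here).

```python
class PID:
    """Textbook discrete PID controller."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(closed_loop, steps=200, dt=0.1, target=1.0, drift=-0.2):
    """Toy plant: position changes by (command + drift) * dt on every step."""
    position = 0.0
    pid = PID(kp=2.0, ki=1.0, kd=0.0)
    for step in range(steps):
        if closed_loop:
            # Feedback: the command depends on the error measured right now.
            command = pid.update(target - position, dt)
        else:
            # Open loop: a fixed plan that assumes no drift
            # (push at speed 1.0 for one time unit, then stop).
            command = 1.0 if step * dt < 1.0 else 0.0
        position += (command + drift) * dt
    return position


print("open loop final position:  ", round(simulate(False), 3))
print("closed loop final position:", round(simulate(True), 3))
```

With the unmodelled drift, the open-loop run ends far from the target, while the feedback run settles on it. That is the property the talk wants company decision-making to have.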
And what it means, the agent needs to have read access to every single artifact that the company produces. So for some of you that might be working on some projects in school, you could have a small version of this. You could have a agent that basically connects to your GitHub code base, connects to your Discord, and even start recording all the meetings you have with your teammates as you make progress. And as you get all this context, the agent can then suggest what are the best next items to work on or bug fixes. And-
- GTGarry Tan
You put it in your GBrain.
- DHDiana Hu
Put it in your GBrain, and the memory context, and this is how you start embedding this agentic system that starts building the system and self-healing. So that's one of the things that we're seeing companies do where they can pull these crazy stats of one employee making, in the revenue per company, at, at least like $1 or $2 million, which now the public comps is, I, I don't know, take like a, like a Salesforce. Maybe the employee comps of how much revenue they bring in is under six figures. So this is, this is huge. It's at least a 10X based on what we're seeing on the startups. And what does this look specifically is when agents are able to read the full state. In practice, we actually implemented this also at YC with our engineering team. We're basically able to cut the sprint time in half and produce 10X the amount of work. And some of you h- may have read this, uh, blog post from, uh, Jack Dorsey about the agentic organization. How many of you have read that post? Some of you are familiar with this concept. And I think he talks a lot about now making a organization very flat and basically getting n- less need for middle management, because middle management used to be just all about this lossy information routing. You end up basically having three roles in a company. One is, uh, everyone starts building, so everyone becomes effectively a individual contributor that ships something. And even people that are non-technical, you now have the power to build with all these tools. So even a salesperson could be building their whole pipeline of, uh, calls and meetings and automate all of that. And then the other person is the DRI, who tends to be... Some of you are familiar with this term from Apple. How many of you know DRI? The concept of a directly responsible individual, that every outcome in a company trace down to a particular owner that owns the outcome. And the way it works is that the DRI orchestrates with the IC to make sure something gets done. 
For example, a goal for a company might be we need to increase the revenue by 3X by the end of the week. They're responsible to orchestrate all the things that need to happen to get there. They work with the sales team to get all the calls booked, with engineering team to ship all of these, and that tends to be oftentimes the founder. Now, the new role that comes into this AI native, uh, organization is sort of a, we call it a AI founder. I mean, this is kind of a... If, if you... Here, Garry, he re- really much embodies this, is you're living at the edge of the future with all the tools. In order to get your company to run fast, you've gotta be trying all the tools. Everything is changing and moving so quickly. I mean, literally, we had this big revolution with agentic coding that just happened end of last year with, uh, Claude 4.5 when it came out. That's when things started to work. But if you were not building, if you were not at the edge, you would not be able to bring all those innovations into your company. So that's one of the things that we're seeing the best founders at YC do.
- GTGarry Tan
Yeah. There are people who are still, uh, operating, like, copilot level from last-
- DHDiana Hu
Yeah
- GTGarry Tan
... year, and it's like, "Not gonna make it, bro."
- DHDiana Hu
They're not gonna make it. [laughs] Now, the other thing that gets talked a lot about is in order to build all these agentic systems to avoid, quote, unquote, "The AI slop," is you... What cannot be delegated is really this concept of, uh, taste. How many of you have been hearing a lot on the taste is what's gonna be durable? I think that, and a lot of you agree with this, right? Uh, coding, let's just call it, shipping code is going to zero the cost of it. But what is not going to zero is the taste to build something good, the taste to discern what's good or bad, and as part of that, that really manifests in terms of evals into the systems for how you build all these agents. And what that means is that generic benchmarks won't make it whether your product works. I know sometimes people are trying to just hit some generic public benchmark, MMLU. Doesn't tell you whether your product or, or agents are really working or upsetting the user. A lot of, uh, the product that a lot of you, if some of you want to hopefully start companies, raise your hand maybe. Yeah?
- GTGarry Tan
All right.
- DHDiana Hu
Great. So part of it, the actual judge ultimately of whether something is good is whether users really want it. And with that, is gonna be different in every single domain. There's no way to automate that. And how can you tell? I think the agent, you will have to go into all the details deep. Did it follow the instructions? Was the answer correct? Did it preserve the customer trust? Was it something that was spewing correctly or incorrectly? Did it actually hit the business goals? Did it comply with the domain rules? So a lot of these things that Garry talked about in terms of, uh, resolvers and skillifying it and improving the system apply here. But in order to do that, you still need the human in the loop to tell when something goes wrong and to basically label a particular interaction or pipeline or workflow that is incorrect. And that is something that is, that you're gonna have to own and do, and painstakingly actually look through all the traces. I mean, this is how, Garry, you go through a lot of the system too. You read through the traces and click when it's wrong or right and decide to skillify it, right?
- GTGarry Tan
Yeah. Well, what's cool though is, like, uh, once you get, like, the basics going, my favorite thing that I haven't released yet but I will release is, uh, cross-modal eval.
- DHDiana Hu
Oh, yeah.
- GTGarry Tan
So you know, I'm gonna, about to add to Skillify where you can actually have the frontier models of Opus, GPT 5.5, and DeepSeek V4, uh, all evaluate the inputs and the outputs, and then rate it, and then feed it back to the original sub-agent saying, "You know, this is the rating, and here's what you need to do for the next try," and then you actually iterate. And so you can meta-prompt to get something that is 10 times better than the first version of what it is. I mean, this... What's weird is, like, these abstractions are basically stacking, 'cause that's what... I learned that from GStack. A lot of YC founders said, "Well, I, I like Claude Code, but that's like my ADHD CEO, and then Codex is my, uh, you know, nearly non-verbal 200 IQ, uh, CTO, and I need both of them to do cross-modal analysis," and then it ships with zero bugs. So these are all things that are, like, stacking. Like, we're just discovering these things, like, week to week right now.
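[Editor's sketch] The loop described here (several models rate an output, the ratings go back to the original sub-agent, and it tries again) can be outlined as follows. This is not the unreleased Skillify feature. The generator and both judges are deterministic stubs standing in for real model calls, and the target score and round limit are assumed values.

```python
from statistics import mean
from typing import Callable, List, Tuple

# (task, output) -> (score from 0 to 10, feedback note; empty if satisfied)
Judge = Callable[[str, str], Tuple[float, str]]
# (task, feedback collected so far) -> new output
Generator = Callable[[str, List[str]], str]


def refine(task: str, generate: Generator, judges: List[Judge],
           target: float = 8.0, max_rounds: int = 5):
    """Generate, have every judge rate the result, feed notes back, repeat."""
    feedback: List[str] = []
    history = []
    for _ in range(max_rounds):
        output = generate(task, feedback)
        verdicts = [judge(task, output) for judge in judges]
        score = mean(s for s, _ in verdicts)
        history.append((output, score))
        if score >= target:
            break
        # Carry every non-empty note into the next attempt.
        feedback += [note for _, note in verdicts if note]
    return history


# Stubs: in practice each of these would be a call to a different model.
def stub_generate(task, feedback):
    return f"Answer to: {task}" + "".join(f" [{note}]" for note in feedback)


def wants_example(task, output):
    return (9.0, "") if "add an example" in output else (4.0, "add an example")


def wants_sources(task, output):
    return (9.0, "") if "cite sources" in output else (5.0, "cite sources")


history = refine("explain closed-loop companies", stub_generate,
                 [wants_example, wants_sources])
for output, score in history:
    print(score, output)
```

Using judges with different blind spots is the point of the design. A single judge can only push the output toward its own preferences.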
- DHDiana Hu
And this is effectively the section on all the founders here would be the ones building the evals and exactly that. As part of, uh, doing this cross-modal evaluation, you have to start with being able to capture a lot of the traces, and the way you capture the traces is gonna be very context dependent on the product you build. And, uh, if you're building a, let's say, a video application, it's very different than a speech application, consumer model, B2B SaaS, all very different. And then you need to convert a lot of the failure cases, and you have to detect when they fail into actual evals that you use. And then the step three is to be able to replay this constantly into the system to, in order to self-heal and improve the system and improve the prompts automatically, which is exactly what Garry's describing that he's gonna ship. He's doing, like, a general version. But for each of you, you can build all of these. These are still the same principles.
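[Editor's sketch] The three steps above (capture traces, turn labelled failures into evals, replay them against the system) can be sketched as a small pipeline. The `Trace` and `EvalCase` shapes, the substring-based pass check, and the refund-policy example are assumptions for illustration. A real product would capture richer traces and use a domain-specific grader.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trace:
    user_input: str
    agent_output: str
    label_ok: bool       # human judgment after reading the trace
    expected: str = ""   # correction supplied by the human when it was wrong


@dataclass
class EvalCase:
    user_input: str
    expected: str


def traces_to_evals(traces: List[Trace]) -> List[EvalCase]:
    """Step 2: every labelled failure with a correction becomes a regression case."""
    return [EvalCase(t.user_input, t.expected)
            for t in traces if not t.label_ok and t.expected]


def replay(agent: Callable[[str], str], cases: List[EvalCase]) -> List[EvalCase]:
    """Step 3: rerun the cases and return the ones the agent still gets wrong."""
    return [c for c in cases if c.expected not in agent(c.user_input)]


# Step 1: captured traces (hypothetical support-bot example).
traces = [
    Trace("What is the refund window?", "Refunds are not possible.",
          label_ok=False, expected="30 days"),
    Trace("Where is my order?", "It ships tomorrow.", label_ok=True),
]
evals = traces_to_evals(traces)


def agent_v1(question):
    return "Refunds are not possible."


def agent_v2(question):
    return "You can request a refund within 30 days."


print(len(replay(agent_v1, evals)), len(replay(agent_v2, evals)))
```

Running `replay` after each prompt or skill change is what closes the loop, because a failure someone labelled once keeps getting checked from then on.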
- GTGarry Tan
Can we meta-prompt here for a second? Like, you're sitting here listening to a lecture about this stuff, but the lecture is totally useless if you don't go and open your own Hermes agent and OpenClaw and, like, load up your own GBrain and, like, actually use the... Like, there are 40 skills that you can test out and try inside GBrain, and some of it is, like, make your own. Like, basically, ha- do stuff and then skillify your own stuff, and then release it open source too and see what other people want, you know? Like, that's... We're, we're sort of, like, getting there together, and so the exhortation is, like, not only are we meta-prompting, um, the machines themselves, we, we need to meta-prompt one another to be better and to be able to fuse with the machines in a new and more profound way every single day.
- DHDiana Hu
Now, the last section we're gonna go over is that for some of you here, here in the audience that are excited to start a company, this is probably one of the best times in history ever to start a company, and this is not an overstatement. You might have heard this from other lecturers that came here. Is that right? The times right now are, are unprecedented. And part of it is we're seeing this, a lot of the wedge in practice is you pick a painful workflow, you go inside deep into the customers, and you basically become the forward deployed engineer. And what that looks like, we've seen it across many industries, and these are examples of companies that have done this crazy growth that I'm telling you, that have gone zero to eight figures in revenue within a year. For example, Salient is this company that's doing, uh, voice agents for loan servicing. They closed some of the top banks in the US, and the way they did it is they built agents how Garry described it. Other companies, Happy Robot as well, that closed a Series B recently last year and 10X their revenue in a year. Same thing, they embedded themselves with freight forwarders and built the best agents to automate a lot of that cruddy work with truckers and coordinating timelines. And then the other one is, uh, Reducto. I don't know how many of you may have heard of this company that's, uh, doing document processing. The other opportunity is there's just so much tooling that needs to be built for all these tools. Just the fact of doing better document processing is making all of the other agents better because they all need to now read documents. But if you increase it, it improves RAG and memory and brain to be a lot better. So Reducto is another of these teams that are growing. So what, what this means is that a lot of these companies are seeing all these impressive growth is they're, they aren't just demoing like AI or, or some sort of side project, they're actually deploying full solutions. 
And part of it, if you want to start a company in this fashion, you basically go undercover because some of you, a lot of you probably n- don't have necessarily a background, like the founders of Salient or Happy Robot did not come from a finance background or logistics.
- GTGarry Tan
Not in the training set.
- DHDiana Hu
Not in the training set. But the way they became experts is they actually shadow or took a job and learned the depths of everything that had to be done with it, and then they were able to automate a lot of the repetitive labor and handle a lot of messy domains into this latent space that Garry described. And all these workflows before were just done by like phone or email, spreadsheets, and all very random places where agent embedded into all the system could just create a solution that would just work. And I guess the other thing is we want to show you this, this graph that Anthropic posted in terms of the deployment in different industries, and we're seeing that right now, I think a lot of you... I don't know if a lot of you are in computer science. How many of you are a little bit afraid of the CS jobs after you graduate? I mean, there's a real fear because, yeah, for this chart taken by Anthropic, 50% penetration into the usage of these tools. But what is interesting, there's this giant white space in all these other domains in terms of like back office, finance, data, academics, cybersecurity, customer service. This is like a huge white space. There's room for hundreds and hundreds of AI unicorns that are waiting to be started, perhaps by some of you in the room. Um-
- GTGarry Tan
I guarantee it.
Episode duration: 47:14
Transcript of episode Lri2LNYtERM