Y Combinator

How to Make Claude Code Your AI Engineering Team

GStack is an open-source toolkit built by YC President & CEO Garry Tan that turns Claude Code into an AI engineering team, with skills for office hours, design, code review, QA, and browser testing. Use it with Claude Code, Codex, or Cursor. It's free and open source: https://github.com/garrytan/gstack

In this video, Garry walks through how GStack works, starting with Office Hours, a skill modeled after real YC partner sessions that pressure-tests your idea before you write a line of code. He demos it live, going from idea through adversarial review, design mockups, and automated QA in a single session.

00:00 – AI Just Changed Coding Forever
00:09 – From YC to Building With AI
01:07 – Why AI Coding Feels So Different
02:45 – Turning AI Into a Real Team (GStack)
03:45 – Let’s Build an App Live
05:23 – The Question That Kills Most Ideas
07:13 – This Idea Just Got Way Bigger
08:38 – The “Feels Illegal” AI Hack
10:50 – Upgrading the Idea in Real Time
12:44 – Breaking + Fixing the Plan
14:25 – AI Designs the App
16:59 – The Full System Explained
18:00 – Running Multiple AI Engineers
20:00 – Shipping 10x Faster
21:20 – The Only Thing That Matters Now

Apply to Y Combinator: https://www.ycombinator.com/apply
Work at a startup: https://www.ycombinator.com/jobs
https://www.conductor.build/

Garry Tan (host)
Apr 23, 2026 · 21m

EVERY SPOKEN WORD

  1. 0:00–0:09

    AI Just Changed Coding Forever

    1. GT

      [upbeat music]

  2. 0:09–1:07

    From YC to Building With AI

    1. GT

      Hi, I'm Garry, President and CEO of Y Combinator. I'm also an engineer who spent the first decade of my career building software full-time. I studied computer systems engineering at Stanford, then was employee number ten at Palantir, where I was an engineer, designer, and product manager all at once. I co-founded Posterous, a microblogging platform that sold to Twitter, and I also built the first version of Bookface, YC's internal social platform and knowledge base. Basically, I've written a lot of code in my career, and I'm here to tell you, we are in a completely new era of building software, the agent era. It turns out the way to get agents to do real work is the same way humans have always done it, as a team with roles, with process, with review. I built GStack to encode this three weeks ago, and now it has more GitHub stars than Ruby on Rails.

  3. 1:07–2:45

    Why AI Coding Feels So Different

    1. GT

      In this video, I wanna explain how it can help you build with agents. I've coded more in the past two months than I did in all of 2013, which is the last time I worked really, really hard as an engineer. I started playing with Claude Code back in January after hearing people like Andrej Karpathy and Boris Cherny say they weren't manually writing any code anymore. And I got completely hooked. Along the way, I've essentially built all of my startup Posterous, which took two years, ten million dollars, a co-founder, and a team of ten engineers to build. Out of the box, the model wanders. It doesn't know your data well, so it guesses. And guessing at that scale is how you get plausible-looking code that silently breaks. The bottleneck here is not the model's intelligence. As long as you set the models up right, they are already smart enough to do extraordinary work on your code base. This is backwards. The scaffolding should be trivially thin. GStack is my implementation of the thin harness, fat skills approach. It's an open source repo that I built that turns Claude Code into an AI engineering team for you. Skills that act like a team of specialists. Office Hours is one of those skills. It's actually modeled exactly after what we go through at YC as a partner doing office hours with startups. It starts by asking six forcing questions

  4. 2:45–3:45

    Turning AI Into a Real Team (GStack)

    1. GT

      for you to reframe your product before you start building. Let me show you how it works. The best way to get started with GStack is, uh, actually Conductor. And so we're gonna go in quick start, and GStack is actually built into Conductor right now. You just click GStack, and today, we're gonna make a tax app. Uh, it's gonna go into your Gmail and fish out all of your 1099s because it's tax day as of today. GStack is actually a set of skills, and the first one that we're actually gonna use is called Office Hours. This is actually the distilled version of what is thousands and tens of thousands of hours that the sixteen YC partners have spent many, many years honing and perfecting. And, uh, this is a distilled-down, ten-percent-strength version of what we do at YC every day. So as you can see, uh, Conductor actually just drops you right in

  5. 3:45–5:23

    Let’s Build an App Live

    1. GT

      there. We're in YC office hours now, and I'm trying to do a startup to help people get all their 1099-INTs out of their Gmail and financial institutions. Many banks will email you with, uh, new tax documents, but some won't. So we need to both search the user's inbox and accept URLs to go and search and download the 1099-INT PDFs. Cool. That's our startup idea. It's just, uh, something to help people with their taxes, and it's something that I had to deal with just yesterday. So the user wants to s- do office hours about a startup idea, um, and it's starting on office hours. So one of the things you'll notice is, um, we have Garry mode on. And, uh, what that does is it actually shows you all of the reasoning traces. So that's one of the things I really like about, um, using GStack. It actually... With Conductor, you actually get to see exactly what, um, the model is thinking as it does it. Uh, right now, it's just sort of getting started, but now it's starting to do some context. Okay. This is a fresh project with an initial commit. No prior design docs. We're in startup mode. [beep] Oh, and here, this is, uh, this is sort of the first thing that happens. You know, your model with the skill, uh, of office hours does a lot

  6. 5:23–7:13

    The Question That Kills Most Ideas

    1. GT

      of thinking. It searches the web sometimes and figures out, well, what are, what are you trying to do? Here's the question that determines everything else. What's the strongest evidence that you have that someone actually wants this? This is actually one of the most important questions to ask yourself when you're trying to decide, should I work on project x or work on that startup? I actually just experienced this recently, so I'm gonna click that. I had this experience, uh, just the other day. [beep] So what happened exactly? How many bank accounts do we have, and which one sent, uh, a 1099? Well, I have a few, and then the process of hunting down is actually quite painful. Okay, the pain is real, but the consequence is friction and annoyance, not actual penalties. Well, I mean, frankly, my accountant, uh, sends me annoyed emails, and sometimes they get a phone call because they're just trying to do my... their job, and, uh, I'm supposed to be there to help them. And so I do have more than five accounts. This is actually a really interesting thing it points out. TurboTax and H&R Block actually have 1099 import features, and Plaid connects to banks. Why aren't those solving it for you? So I answered, and it said, "Well, the user's thinking bigger than a document aggregator. They're seeing this as a funnel. The hook is we'll find all your 1099-INTs for you, solving an immediate pain. But the expansion is now that you have your docs, let's actually get your taxes prepared, which is matchmaking and lead gen for tax preparers, and it's a classic wedge strategy. It's actually a more interesting business model than just the document aggregation alone," which is right. You could probably charge two or five dollars a month or two or five dollars maybe an entire year for 1099-INT aggregation. But you can actually charge, um, a percentage of the transaction that ends up happening with a tax preparer, which, uh, might be 10X more.

  7. 7:13–8:38

    This Idea Just Got Way Bigger

    1. GT

      This is actually a very cool part of Office Hours. I might just build this for myself just for fun because I do have to do this every single year. And this is actually one of the most fun parts about using Office Hours in GStack, is it's not this sort of on-rails thing. It's more like a conversation with your model. And so if you just came in and gave instructions to the model as it is, if I just type the original thing, which is I need to go and find my 1099s, it'll go do that. It will just literally do it, but it won't think about who's the user, what is this, what is the business model, who wants this, what's the pain point, how does it work, right? Like, this is the kind of stuff that we get to do every day with lots and lots of founders in office hours, and we're pretty good at it, but so is this skill. Revised. AI browser automation, GStack browser style, lets the user log in, then AI navigates to tax docs and downloads PDFs. The user watches it happen. I think that's correct. Let's see. I mean, this is a very interesting thing for me. I mean, I feel like I learn about the process of building a little bit, even though I do office hours with people all the time. This is sharp pushback on premise three. You're right, I do have GStack browser, and the approach you're describing is genuinely clever. The model would be user logs in, AI takes over, navigates to tax docs, finds the 1099-INT, downloads it. No Plaid, no stored credentials. The user watches the whole thing happen in the visible browser. I mean, what I like about it is it can happen on their actual

  8. 8:38–10:50

    The “Feels Illegal” AI Hack

    1. GT

      browser. It's not happening in the cloud. The cloud is just someone else's computer. One of the cool things that GStack does increasingly is it lets you use Codex to actually sort out all of the crazy bugs that might be in here. I'm not going to do it right now, but it is something that you can do when you're at home working on it yourself. The way to think about Claude Code is that by default it uses Claude, and I think Opus 4.6 is sort of ADHD CEO. He's the guy you want to get a beer with, and he's got a billion ideas. But when the going gets tough, you got to call in your autistic CTO, and that's Codex. All right, we're going to skip for now because we're actually pretty close I feel like. Basically, we're in plan mode, and Office Hours helps us start off with a plan that has a lot of the things thought through. So here's actually a really cool, uh, example. It actually thinks through, and here's three different approaches. The first approach is Gmail Auth, then search for tax doc notif-notification, and then output a checklist of banks which issue 1099s. There's no browser automation initially. The effort is small, and the risk is small. You know, when I look at that, and it's like, that sounds interesting, but it doesn't sound big enough for me to actually even work on this. Like, I could do that myself. Next is full stack Gmail and AI browser automation using, uh, and a CPA mar-marketplace. This sounds like what I want, actually. And then, uh, it sort of thinks out of the box. It says, "Oh, okay, what about approach C? CPA first, flip the go-to-market." You know, I would say B sounds right, and then actually I... Sometimes I like to add this extra thing, which is like when I have an idea, when I... one of the approaches speaks to me, but then I think about something else, I'm like, "Okay, well, I like B, but actually we could use the browser interaction to skip Google OAuth entirely and just have the user open Gmail, and a version of GStack browser could just use Gmail to find the 1099s au- to search for automatically."
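The inbox search described above can be sketched as a Gmail search string that a browser agent types into the Gmail search box, with no OAuth involved. This is only an illustration: the function name, the keyword list, and the sender domains are assumptions and do not come from GStack or the demo. The `after:`, `before:`, `from:`, and `OR` operators are standard Gmail search syntax.

```python
from __future__ import annotations

from datetime import date


def build_1099_query(tax_year: int, senders: list[str] | None = None) -> str:
    """Build a Gmail search string for 1099 notification emails.

    Forms for a tax year are normally issued early in the following year,
    so the search window is the calendar year after tax_year.
    """
    start = date(tax_year + 1, 1, 1)
    end = date(tax_year + 2, 1, 1)
    parts = [
        # Hypothetical keyword list; a real one would need tuning per bank.
        '("1099" OR "1099-INT" OR "tax document" OR "tax form")',
        f"after:{start:%Y/%m/%d}",
        f"before:{end:%Y/%m/%d}",
    ]
    if senders:
        # Optionally narrow the search to institutions the user named.
        parts.append("from:(" + " OR ".join(senders) + ")")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_1099_query(2025, ["chase.com", "schwab.com"]))
```

Keeping the query as plain text means the same string works whether a person or an agent pastes it into Gmail, which fits the "user watches it happen in the visible browser" idea.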

  9. 10:50–12:44

    Upgrading the Idea in Real Time

    1. GT

      Simultaneous to that, it could also ask the user what other banks they have. Also, and this is what happens for me, if they already have a CPA, you can find out from the email. And if you're me, you probably already have a bunch of emails from your CPA bugging you for the specific accounts. We're sort of at the end of Office Hours, but as you can see, we already went from sort of a half-baked, rough idea for something that we might want to do. I'm not saying this is actually a good startup idea, but you can see how this got farther along. We started with something that might start with OAuth and then CPAs' nagging emails. But in the end, we realized, well, we have a browser, and the browser could be used with browser automation to search the inbox, find all of the 1099s that you need to download. It can also, using LLMs, ask you which bank portals you need to add to, and it can go log in with your account and actually download the PDFs for you and then send an email to the CPA. So I really like this. Browser automation is a very out-of-pocket, sort of unusual way to solve this problem. And the wild thing about coding models is, you know, a year ago, two years ago, even like three months ago, it's not clear to me that anyone would even try this. I think that's the most interesting thing about, uh, our time right now. You're able to have an idea and then get farther along with it than you ever would be. Frankly, sometimes I use office hours, and maybe one in three times I get to the end of it and I say, "You know what? This isn't something that makes sense." You'll notice that there's actually a feasibility aspect of office hours,

  10. 12:44–14:25

    Breaking + Fixing the Plan

    1. GT

      and that's one thing I really pride myself on in office hours working with startups. I have a very strong opinion about how the world works and what might work, and, um, it's just very interesting to see Opus 4.6 mirror that in trying to help you figure out what your startup or product idea might be. Now what it's doing is a multi-step adversarial review. It's trying to put your idea through the paces. And as you can see, it's already found a bunch of things, and it's gonna try to auto-fix it. There's no failure handling. There's no privacy section. 2FA handoff has no proposed solution. It actually tries to auto-fill out these things, and it, if it can, it does. And so our doc survived two rounds of adversarial review, and it automatically caught and fixed sixteen issues. Um, so we're gonna approve this design doc. So as you can see, the adversarial review improved the score from six out of ten to eight out of ten with three remaining issues that we can worry about later. Now that we've locked in the adversarial review and addressed all these issues, uh, normally what I would do is run plan CEO review. But instead, I think what we're gonna do is jump directly to design shotgun, which is one of my most fun, uh, ways to use this. And this is just one of a bunch of different design tools that are in the bag. So it figured out, here's a bunch of different views. What do you want to actually design? And let's just do the main checklist dashboard. Design shotgun's, uh, my visual brainstorming tool. So it'll actually generate multiple AI versions and then ask us questions about it. These are three directions. It takes about sixty

  11. 14:25–16:59

    AI Designs the App

    1. GT

      seconds. It actually farms it out to, uh, OpenAI Codex, which, um, is able to use ImageGen. So alright. Let's... There's three versions: command center, friendly progress, and split view. Let's take a look. All right. So let's let the agents cook, and we'll be back in about five minutes. Great. The agents are done cooking, and this is what we c- we got back. We got three different options for the actual page that shows up in the command center for tracking down our tax documents. So let's look at them one by one. There's option A, B, and C. All right. Here's one command center. There's a dashboard. Here's all the specific... I mean, this looks pretty good. If you can extr- extract, here are all the banks and here are all the 1099s and where are they coming from, um, and what their status is, that's pretty good. I like that. I'm, I'm gonna give that a four out of five stars. Option B is, like, much more friendly. Um, so option A is sort of like if you're a Linux hacker, I bet you would really like this. But option B, I think it's more friendly for just normal people. So I kinda like... I, I, I might put that as a five. That might be a pick. And then let's see. Option C, this makes it way more complicated than it needs to, so I really wouldn't do that. Let's go with option B. Uh, and then the cool thing is if you don't like it, you can enter, uh, you know, any of your feedback. You can click regenerate. But in this case, we're just gonna run with option B and continue. So that comes back in, and, you know, as you can see, we're gonna go ahead and select option B. And there it is. So a friendly card-based approach with progress and the progress ring. Good instinct. Variant B is locked in. So while I have you, I mean, that is just two of twenty-eight different commands. We've got more than seventy thousand, uh, stars now, and some of the people who use it, like, they, they actually talk about how when they're using Claude Code, they spend eighty to ninety percent of their time in office hours, plan CEO review, and auto plan. Um, this is sort of a rough view of how that sprint process actually works. We already talked about office hours. But if you don't want to do a lot of back and forth, if you don't want to be in the weeds, I did create Auto Plan, which gets you through CEO, engineering, design, and developer experience review using basically my default recommendations. Like, these are sort of programmed to be what I would do if I were you. There are a bunch of design skills that you can use after the code is actually

  12. 16:59–18:00

    The Full System Explained

    1. GT

      done. Claude Code will actually build when you click approve on the plan. And then after it's done writing the code, you can run review, which does a staff-level, uh, bug-catching service that goes through, puts the work through the paces. Full code review, uh, finding bugs that might not have been in the plan mode. And then the coolest part, I think, that, um, is actually an incredible amount of code is I wrote a CLI around Playwright and Chromium. So there's actually an entire headed and headless browser in there, and that was a real magic moment for me as I was using Claude Code. As I sped up, um, there's this idea of trying to get a l- to a level eight software factory. And GStack does not get you to level eight, but I do think it gets you to level seven, and that's where I can run multiple Conductor windows on different projects and sometimes three or four all on the same project all at the

  13. 18:00–20:00

    Running Multiple AI Engineers

    1. GT

      same time. These are parallel PRs with parallel branches and parallel different features that all can land more or less simultaneously. And one of the bottlenecks I ran into was that, you know, once the agent was doing all the work of planning and design and coding it, I found myself sitting there doing QA, probably the least fun part of software development. So that made it very, very important for me to try to automate that. And when I did, Claude in Chrome MCP is one of the worst pieces of software I've ever used. You know, every time it would try to do an action, it would think and think and think. There was crazy context bloat. Often it wouldn't even do anything. But it would take two to three seconds even when it was working to be able to take an action. And I was amazed that I could use all of my other skills in GStack to create the /qa and /browse tools. I basically wrapped Playwright at the CLI level, and now your Claude Code and any agent now can actually just use the browser. And so, you know, not only could it use the browser, it could take screenshots, it can do complex interactions, it can click on things, it can fill things out. Now it can even download media, run eventually re- full regression tests and update CSS and assess real browser bug issues, whether it's JavaScript or CSS. And finally, there's a ship tool. So it's sort of the last step before to make sure that your PR is ready to land on main. And this is actually how I work. I run 10 to 15 parallel Claude Code sessions all at the same time. I might in one session be running office hours on a brand-new idea, and I actually now have multiple open source projects with tens of thousands of stars, and I, I'm probably sitting on about 400, uh, PRs to review right now.
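The idea of wrapping a browser "at the CLI level" can be sketched roughly as follows: one short text command in, one short text reply out, dispatched to a browser backend. This is a minimal sketch and not GStack's actual /browse implementation. The command names, the `run_command` helper, and the `RecordingBrowser` stand-in are all hypothetical. A real backend would forward each call to a Playwright page object (for example its `goto`, `click`, `fill`, and `screenshot` methods); the stand-in below only records calls so the sketch runs without a browser installed.

```python
from __future__ import annotations

import shlex
from dataclasses import dataclass, field
from typing import Protocol


class Browser(Protocol):
    """Minimal backend interface the CLI layer talks to (hypothetical)."""

    def goto(self, url: str) -> None: ...
    def click(self, selector: str) -> None: ...
    def fill(self, selector: str, value: str) -> None: ...
    def screenshot(self, path: str) -> None: ...


@dataclass
class RecordingBrowser:
    """Stand-in backend that records calls instead of driving a browser."""

    calls: list[tuple[str, ...]] = field(default_factory=list)

    def goto(self, url: str) -> None:
        self.calls.append(("goto", url))

    def click(self, selector: str) -> None:
        self.calls.append(("click", selector))

    def fill(self, selector: str, value: str) -> None:
        self.calls.append(("fill", selector, value))

    def screenshot(self, path: str) -> None:
        self.calls.append(("screenshot", path))


# Command name -> number of arguments it expects.
COMMANDS = {"goto": 1, "click": 1, "fill": 2, "screenshot": 1}


def run_command(browser: Browser, line: str) -> str:
    """Parse one command line, dispatch it, and return a one-line reply."""
    parts = shlex.split(line)
    if not parts:
        return "error: empty command"
    name, args = parts[0], parts[1:]
    if name not in COMMANDS:
        return f"error: unknown command {name!r}"
    if len(args) != COMMANDS[name]:
        return f"error: {name} takes {COMMANDS[name]} argument(s), got {len(args)}"
    getattr(browser, name)(*args)
    return f"ok: {name}"


if __name__ == "__main__":
    browser = RecordingBrowser()
    for line in ["goto https://example.com", 'fill "#email" "me@example.com"', "click #submit"]:
        print(run_command(browser, line))
```

One plausible reason this shape helps an agent is that both the command and the reply are a single short line, so each browser action adds very little to the model's context.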

  14. 20:00–21:20

    Shipping 10x Faster

    1. GT

      And so I almost always have one or two sessions active for each project, just evaluating and bringing in all the open source fixes that I'm getting from the community, uh, and I evaluate it in waves. Um, one of the things that's been really scary in AI coding right now is supply chain attacks. So I'm really, really paranoid about it. But the great thing is I have GStack that has my back. So I don't have a to-do list anymore. One of the things that has emerged is I actually click on... Whenever I have an idea or I get a bug report from a user or I see something on X where someone's frustrated with what GStack or GBrain does, I just click the plus icon in Conductor. It creates a new worktree, and each one of these things is a new work item. And all I have to do is run office hours, CEO review, eng review, uh, adversarial review, and then I just run my normal process. When it's ready to land, it lands. And I can do 10, 15, 20, sometimes 50 PRs in any given day, depending on the number of meetings I have in that day. So that's it. Uh, GStack is available right now. Just go to github.com/garrytan/gstack. When you run /office-hours, you're getting a version of the real product thinking we do at YC with founders. Similar pushback

  15. 21:20–21:48

    The Only Thing That Matters Now

    1. GT

      and similar reframing before you ever meet us. Give it a try and let me know what you think. This is the most incredible time in history to build software. The barrier to building just collapsed. The only question left is what are you gonna build? It's time to let it rip. Go make something people want. [upbeat music]

Episode duration: 21:49


Transcript of episode wkv2ifxPpF8
