How to Make Claude Code Your AI Engineering Team

GStack is an open-source toolkit built by YC President & CEO Garry Tan that turns Claude Code into an AI engineering team, with skills for office hours, design, code review, QA, and browser testing. Use it with Claude Code, Codex, or Cursor. It's free and open source: https://github.com/garrytan/gstack

In this video, Garry walks through how GStack works, starting with Office Hours, a skill modeled after real YC partner sessions that pressure-tests your idea before you write a line of code. He demos it live, going from idea through adversarial review, design mockups, and automated QA in a single session.

00:00 – AI Just Changed Coding Forever
00:09 – From YC to Building With AI
01:07 – Why AI Coding Feels So Different
02:45 – Turning AI Into a Real Team (GStack)
03:45 – Let’s Build an App Live
05:23 – The Question That Kills Most Ideas
07:13 – This Idea Just Got Way Bigger
08:38 – The “Feels Illegal” AI Hack
10:50 – Upgrading the Idea in Real Time
12:44 – Breaking + Fixing the Plan
14:25 – AI Designs the App
16:59 – The Full System Explained
18:00 – Running Multiple AI Engineers
20:00 – Shipping 10x Faster
21:20 – The Only Thing That Matters Now

Apply to Y Combinator: https://www.ycombinator.com/apply
Work at a startup: https://www.ycombinator.com/jobs
Conductor: https://www.conductor.build/

Garry Tan, host

Apr 23, 2026 · 21m

CHAPTERS

0:09 – 1:39
The agent era of software: teams, roles, and process for AI coding
Garry Tan sets the stage: coding has entered the “agent era,” where AI can do real work when organized like a human team. He frames the core idea as roles + process + review, not just a raw model prompt.
- AI coding works best when structured like a real engineering team
- Roles, process, and review are the difference between demos and production work
- Garry introduces GStack as his “thin harness, fat skills” approach
- Claim: we are in a new era in which building speed and leverage have fundamentally changed
1:39 – 2:10
Why raw AI coding fails: wandering models, silent breakages, and missing context
He explains why out-of-the-box models often generate plausible but brittle code: they guess when they don’t know your codebase or data. The bottleneck isn’t model intelligence—it’s the scaffolding around it.
- Models guess without sufficient project context, leading to subtle failures
- “Plausible-looking code” can silently break at scale
- The limiting factor is workflow and constraints, not raw capability
- What’s needed is minimal scaffolding that reliably directs strong model skills
2:10 – 2:40
GStack overview and the “Office Hours” skill: product thinking before building
Garry describes GStack as a bundle of specialist “skills” that turn Claude Code into an engineering team. He highlights Office Hours, modeled after YC partner sessions, to pressure-test and reframe ideas with forcing questions.
- GStack = open-source skills that behave like a specialist team
- Office Hours mirrors YC partner office hours as a repeatable workflow
- Uses forcing questions to clarify user, pain, and business model
- Goal: reframe before writing code to avoid building the wrong thing
2:40 – 5:12
Live build kickoff in Conductor: a tax-doc 1099 finder idea
He launches GStack inside Conductor and proposes a concrete app: pull 1099-INT documents from Gmail and bank portals. The setup shows how the workflow begins in a fresh repo with minimal upfront artifacts.
- Conductor provides the entry point and quick-start workflow
- Idea: scan Gmail for tax docs and optionally fetch PDFs from bank portals
- App concept includes both inbox search and URL/portal support
- Starts from a clean project state and builds iteratively
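The inbox-search half of the idea can be sketched as a Gmail search query. This is a minimal sketch, not code from GStack or the video: `build_tax_doc_query` and its search terms are hypothetical, and the date window (January 1 to May 1 of the year after the tax year) is an assumption about when 1099 forms usually arrive. The operators themselves (`has:attachment`, `filename:pdf`, `after:`, `before:`, `OR`) are standard Gmail search syntax.

```python
# Hypothetical search terms; real 1099 emails vary by institution.
TAX_FORM_TERMS = ("1099-INT", "1099", "tax document")


def build_tax_doc_query(tax_year: int) -> str:
    """Build a Gmail search string for likely tax-form emails (illustrative only)."""
    # Assumption: forms for tax year N arrive between Jan 1 and May 1 of year N + 1.
    start = f"{tax_year + 1}/01/01"
    end = f"{tax_year + 1}/05/01"
    # Quote each term and OR them together inside one group.
    terms = " OR ".join(f'"{term}"' for term in TAX_FORM_TERMS)
    return f"has:attachment filename:pdf ({terms}) after:{start} before:{end}"


if __name__ == "__main__":
    print(build_tax_doc_query(2025))
```

The resulting string works typed into the Gmail search box (which suits the browser-driven approach discussed later, since it needs no OAuth) and also as the `q` parameter of the Gmail API's `users.messages.list` method.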
5:12 – 6:44
The forcing question that kills most ideas: evidence of demand and real pain
Office Hours asks the most consequential question: what’s the strongest evidence users want this? The conversation probes whether the pain is meaningful and why existing solutions (TurboTax, H&R Block, Plaid imports) don’t already solve it.
- Demand validation is treated as the top gating factor
- Probes severity: annoyance/friction vs serious consequence
- Competitive/existing-solution check: TurboTax/H&R Block imports, Plaid
- Pushes for specificity: what happened, how often, for whom
6:44 – 7:45
The idea expands: wedge strategy from document retrieval to CPA marketplace
The skill reframes the concept as a funnel: the 1099 finder is a wedge into a larger tax-prep workflow. Garry discusses the business model upgrade—marketplace/lead-gen economics can be far bigger than charging for aggregation alone.
- Reframing: doc aggregation as hook, broader tax workflow as expansion
- Marketplace/lead-gen for tax preparers can monetize better than SaaS fees
- Classic wedge strategy: solve a sharp pain, then broaden the product
- Office Hours acts like a collaborative product strategist, not a code bot
7:45 – 8:45
The “feels illegal” hack: visible local browser automation instead of OAuth/Plaid
They pivot toward a novel implementation: have the user log in while an AI agent navigates Gmail/bank sites in a visible browser to download PDFs—no stored credentials, no cloud session. Garry emphasizes how new this approach is and why it’s now feasible.
- Browser automation can replace OAuth/Plaid for early versions
- User watches actions in their own browser (not ‘in the cloud’)
- Avoids storing credentials; leans on supervised automation
- Observation: only recently did AI make this kind of solution practical
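One way to picture the supervised pattern is a step list in which the agent drives routine navigation but pauses and hands control to the user for anything involving credentials, such as a password or 2FA prompt. The sketch below is hypothetical: `Step`, `SupervisedRun`, and the `confirm_handoff` callback are illustrative names, and real browser actions are replaced by log entries. It shows only the control flow, not how GStack or the demo app implements it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Tuple


class Actor(Enum):
    AGENT = "agent"
    USER = "user"


@dataclass
class Step:
    description: str
    needs_human: bool = False  # True for login, password, or 2FA steps


@dataclass
class SupervisedRun:
    steps: List[Step]
    log: List[Tuple[Actor, str]] = field(default_factory=list)

    def run(self, confirm_handoff: Callable[[str], bool]) -> bool:
        """Run steps in order; return False if the user declines a handoff."""
        for step in self.steps:
            if step.needs_human:
                # Automation pauses here; the user types credentials in the
                # visible browser, so the agent never sees or stores them.
                if not confirm_handoff(step.description):
                    self.log.append((Actor.USER, f"declined: {step.description}"))
                    return False
                self.log.append((Actor.USER, step.description))
            else:
                # Placeholder for a real browser action driven by the agent.
                self.log.append((Actor.AGENT, step.description))
        return True


if __name__ == "__main__":
    demo = SupervisedRun([
        Step("open the bank's statements page"),
        Step("log in and complete 2FA", needs_human=True),
        Step("download the 1099-INT PDF"),
    ])
    # Stand-in for "the user finished the step"; a real tool would wait on the user.
    print(demo.run(lambda description: True))
    for actor, description in demo.log:
        print(actor.value, description)
```

The design choice worth noting is that the handoff is a blocking callback: the agent cannot proceed past a credential step until a human explicitly signals completion.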
8:45 – 9:15
Model roles in practice: ‘ADHD CEO’ Claude vs ‘autistic CTO’ Codex for hard bugs
Garry explains how he assigns different models to different “roles.” Claude is ideation-heavy and great for high-level planning, while Codex is brought in for difficult debugging and execution reliability.
- Treat models as complementary specialists with different strengths
- Claude (Opus) for brainstorming and momentum; Codex for tough debugging
- Switching models is part of the team/process mindset
- Tooling supports calling in the right ‘role’ when needed
9:15 – 12:47
Plan mode and multi-approach design: choosing the right build path
Office Hours produces a structured plan with multiple implementation approaches (smaller Gmail-only vs full automation + marketplace vs flipped go-to-market). Garry selects the more ambitious direction and iterates: skip OAuth by automating Gmail directly and infer CPA context from emails.
- Plans present options with effort/risk tradeoffs
- Approach selection: ambitious full-stack automation + CPA angle
- Iteration: use browser interaction to bypass OAuth complexity
- Product detail: ask about banks, infer existing CPA via email threads
12:47 – 13:47
Adversarial review: stress-testing feasibility, privacy, 2FA, and failure handling
The workflow runs an adversarial review pass that finds missing sections and brittle assumptions, then attempts auto-fixes. The design doc score improves after multiple rounds, showing how review gates quality before coding begins.
- Adversarial review catches missing privacy, 2FA handoff, failure modes
- Auto-fix attempts patch gaps directly in the plan/doc
- Quality measured and improved across rounds (6/10 → 8/10)
- Locks a stronger design doc before moving to implementation/design
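The review-then-fix rounds can be sketched as a loop that alternates a reviewer and a fixer until the score reaches a target or the round budget runs out. This is a hypothetical illustration: in the video an AI review pass plays the reviewer role, whereas here `toy_review` (2 points off per missing section) and `toy_fix` (patch one gap per round) are invented stand-ins, and `target=8` and `max_rounds=3` are assumed defaults.

```python
from typing import Callable, List, Tuple

# A reviewer returns (score out of 10, list of gaps); a fixer patches the doc.
Reviewer = Callable[[str], Tuple[int, List[str]]]
Fixer = Callable[[str, List[str]], str]


def adversarial_review(
    doc: str, review: Reviewer, fix: Fixer, target: int = 8, max_rounds: int = 3
) -> Tuple[str, List[int]]:
    """Alternate review and auto-fix until the score hits target (illustrative)."""
    scores: List[int] = []
    for round_no in range(1, max_rounds + 1):
        score, gaps = review(doc)
        scores.append(score)
        # Stop when good enough, when nothing is left to fix, or out of rounds,
        # so the returned doc always matches the last recorded score.
        if score >= target or not gaps or round_no == max_rounds:
            break
        doc = fix(doc, gaps)
    return doc, scores


# Toy reviewer: each required section that is missing costs 2 points.
REQUIRED = ["privacy", "2FA handoff", "failure modes"]


def toy_review(doc: str) -> Tuple[int, List[str]]:
    missing = [section for section in REQUIRED if section not in doc]
    return 10 - 2 * len(missing), missing


def toy_fix(doc: str, gaps: List[str]) -> str:
    # Patch only the first gap per round, like a cautious auto-fix.
    return f"{doc}; {gaps[0]}"


if __name__ == "__main__":
    final_doc, scores = adversarial_review("overview; privacy", toy_review, toy_fix)
    print(scores)  # [6, 8]
    print(final_doc)
```

Reviewing at the top of each round, and never fixing after the final review, keeps the reported score honest: the returned document is always the one that was last scored.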
13:47 – 16:19
AI designs the app: ‘Design Shotgun’ generates multiple UI directions
Garry jumps to a visual design tool that generates several UI concepts (command center vs friendly progress vs split view). He reviews the options, selects the most user-friendly variant, and locks it in for the build.
- Design Shotgun = rapid visual brainstorming with multiple variants
- Three UI directions compared and rated
- Decision: choose the friendly card-based progress design (Option B)
- Supports fast iteration via feedback and regeneration
16:19 – 17:49
The full GStack system: sprint loop from Office Hours to review to shipping
He zooms out to explain the broader GStack command set and the end-to-end sprint process many users follow: Office Hours, CEO/eng/design reviews, Auto Plan, coding, and post-build reviews. The emphasis is on repeatable process that scales across projects.
- GStack includes many commands/skills beyond a single prompt flow
- Auto Plan runs CEO/eng/design/DX reviews with default recommendations
- Post-build review acts like staff-level bug catching and critique
- Workflow encodes a full ‘sprint’ from idea → plan → build → review
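The sprint loop above can be pictured as an ordered list of gated stages that share state and stop at the first gate that fails. This is a minimal sketch under stated assumptions: the stage names only mirror the loop described in this chapter and are not GStack's actual command names, and each stage function is a hypothetical stand-in for a skill run.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
Stage = Tuple[str, Callable[[State], bool]]


def run_sprint(stages: List[Stage], state: State) -> Tuple[bool, List[str]]:
    """Run stages in order; stop at the first stage whose gate fails."""
    completed: List[str] = []
    for name, stage in stages:
        if not stage(state):
            return False, completed
        completed.append(name)
    return True, completed


# Hypothetical stages; each may read and update the shared state.
def office_hours(state: State) -> bool:
    state["design_doc"] = "1099 finder: design doc"
    return True


def plan_review(state: State) -> bool:
    return "design_doc" in state


def build(state: State) -> bool:
    state["code"] = "implementation"
    return "design_doc" in state


def qa(state: State) -> bool:
    # In a real run this would come from browser tests; here it is supplied.
    return bool(state.get("tests_passed"))


def ship(state: State) -> bool:
    state["shipped"] = True
    return True


SPRINT: List[Stage] = [
    ("office-hours", office_hours),
    ("review", plan_review),
    ("build", build),
    ("qa", qa),
    ("ship", ship),
]

if __name__ == "__main__":
    print(run_sprint(SPRINT, {}))
    print(run_sprint(SPRINT, {"tests_passed": True}))
```

Returning the list of completed stages makes it easy to see where a sprint stalled, which is the point of putting review and QA gates before shipping.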
17:49 – 19:50
Running multiple AI engineers: parallel sessions, QA automation, and browser tooling
Garry describes operating at high throughput with many parallel Claude Code sessions and PRs. He explains why QA becomes the bottleneck and how he built /QA and /browse by wrapping Playwright/Chromium to enable fast, reliable agent-driven browser testing compared to slower MCP setups.
- Parallelization: multiple Conductor windows/branches/PRs simultaneously
- QA becomes the limiting step once planning/coding are automated
- Critique of slow/bloated MCP browser control; need low-latency tooling
- /browse and /QA: Playwright CLI wrapper for clicks, screenshots, downloads, regression
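The summary does not show the internals of /QA. As a hypothetical illustration of what a regression step could compute once a browser tool has produced pass/fail results and screenshots, the sketch below compares two runs: a check that passed before and fails now is a regression, and a changed screenshot hash flags a visual change for review. `CheckResult` and `compare_runs` are invented names, and the browser-driving part (Playwright in the video) is deliberately left out.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CheckResult:
    name: str
    passed: bool
    screenshot: bytes = b""  # raw image bytes captured by the browser tool


def fingerprint(image: bytes) -> str:
    """Hash a screenshot so baselines can be stored as short strings."""
    return hashlib.sha256(image).hexdigest()


def compare_runs(
    baseline: List[CheckResult], current: List[CheckResult]
) -> Tuple[List[str], List[str]]:
    """Return (regressions, visual_changes) between two QA runs."""
    before: Dict[str, CheckResult] = {r.name: r for r in baseline}
    regressions: List[str] = []
    visual_changes: List[str] = []
    for result in current:
        old = before.get(result.name)
        if old is None:
            continue  # new check: no baseline to compare against
        if old.passed and not result.passed:
            regressions.append(result.name)
        elif fingerprint(old.screenshot) != fingerprint(result.screenshot):
            visual_changes.append(result.name)
    return regressions, visual_changes
```

Exact hashes flag any pixel-level change, including harmless rendering noise such as anti-aliasing; a production tool might use a perceptual image diff with a tolerance instead.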
19:50 – 20:51
Shipping 10x faster—while staying safe: supply chain paranoia and PR throughput
He shares his day-to-day operational workflow: spawn worktrees from ideas/bug reports, run the review pipeline, and land changes quickly—often dozens of PRs per day. He also flags supply chain attacks as a growing risk, reinforcing the value of automated review and caution.
- Workflow replaces to-do lists with worktrees and repeatable review steps
- High velocity: 10–50 PRs/day depending on schedule
- Security concern: increased risk of supply chain attacks in AI-era coding
- Ship step ensures PR readiness before merging to main
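The worktree-per-task habit can be sketched as a tiny helper that turns a bug report or idea title into a `git worktree add` command, giving each task its own branch and directory. The helper names, the `task/` branch prefix, and the `../worktrees` location are assumptions for illustration; only the `git worktree add -b <branch> <path> <base>` form is standard Git.

```python
import re
from typing import List


def slugify(title: str, max_len: int = 40) -> str:
    """Turn a bug report or idea title into a branch-safe slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug[:max_len].rstrip("-") or "task"


def worktree_command(
    title: str, prefix: str = "task", base: str = "main", root: str = "../worktrees"
) -> List[str]:
    """Build a `git worktree add` command for one task (hypothetical helper)."""
    slug = slugify(title)
    branch = f"{prefix}/{slug}"
    path = f"{root}/{slug}"
    # git worktree add -b <new-branch> <path> <commit-ish>
    return ["git", "worktree", "add", "-b", branch, path, base]


if __name__ == "__main__":
    print(" ".join(worktree_command("Login button broken on Safari!")))
```

Running the returned list with `subprocess.run(cmd, check=True)` from the repository root creates the worktree; each one can then host its own agent session and PR without disturbing the others.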
20:51 – 21:49
The only thing that matters now: the barrier collapsed—go build something people want
Garry closes by pointing viewers to the GStack repo and reinforcing the meta-lesson: building has become dramatically easier. With leverage up, the key question returns to product truth—what will you build that people actually want?
- Call to action: try GStack (/office-hours) from GitHub
- AI makes it historically easy to build and iterate
- Product thinking and demand still decide outcomes
- Closing mantra: build something people want
