Y Combinator
Tokenmaxxing: How Top Builders Use AI To Do The Work Of 400 Engineers
CHAPTERS
Control vs. dependency: personal tools in the age of AI
The conversation opens with a framing question: will people control their AI tools, or will the tools control them? Garry uses the “Ferrari” metaphor to capture both the power and the brittleness of modern agentic coding systems: fast and exhilarating, but requiring skill to maintain and debug.
- •Defining question: tool autonomy and user control
- •AI coding as a “Ferrari”: incredible performance with real operational risk
- •Builders must be able to diagnose, steer, and repair systems when they fail
- •This moment feels like an early personal-computing era for AI tooling
Returning to coding after 13 years—why now
Jared sets the premise: Garry returned from a long break and suddenly shipped an extraordinary volume of code while running YC. Garry describes the surprise of reawakening his builder identity and attributes the change to the new capabilities of tools like Claude Code.
- •Garry’s multi-year hiatus from hands-on building
- •Claimed leap in output (eventually discussed as ~400x)
- •Claude Code as the catalyst for returning to shipping
- •Building while managing a demanding full-time leadership role
Rebuilding Posterous/Posthaven as Garry’s List at token-scale cost and speed
Garry explains the origin of Garry’s List: building a site to organize people around California policy issues that matter to him. He recounts rebuilding his old blogging platform a third time—this time in days and for roughly the cost of an AI subscription—while adding modern agentic research capabilities.
- •Garry’s List as a civic/political organizing and publishing platform
- •History: Posterous → acquired by Twitter → rebuilt as Posthaven → rebuilt again
- •Cost/time compression: from teams and millions of dollars to days and ~$200
- •Motivating issue example: access to algebra and education equity
Software that thinks like a journalist: agentic research, sourcing, and synthesis
The product evolves beyond a publishing tool into a system that performs investigative-journalism-like work. Garry describes agentic retrieval: crawling, cross-referencing sources, extracting quotables, and producing deeply sourced reports using multiple APIs and toolchains.
- •Not just a blog platform—software that performs research work directly
- •Agentic retrieval workflow: gather sources, annotate, cross-check disagreements
- •Using Perplexity/X/Groq APIs to “boil the ocean” on context collection
- •Emphasis on citations, completeness, and representing reality vs. headlines
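The gather, annotate, and cross-check steps above can be sketched roughly as follows. This is a toy illustration, not code from Garry’s List: the `Source` record, the `cross_check` helper, and the placeholder claims are all hypothetical, and hard-coded data stands in for the search and LLM API calls a real system would make.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Hypothetical record: one source and the value it reports for a claim."""
    name: str
    claim: str  # what is being asserted (placeholder labels here)
    value: str  # what this source says about it


def cross_check(sources):
    """Group sources by claim and flag claims on which the sources disagree."""
    by_claim = defaultdict(lambda: defaultdict(list))
    for s in sources:
        by_claim[s.claim][s.value].append(s.name)
    report = {}
    for claim, values in by_claim.items():
        report[claim] = {
            "agreed": len(values) == 1,
            "values": {v: sorted(names) for v, names in values.items()},
        }
    return report


# Placeholder data only; no real outlets or figures are implied.
sources = [
    Source("outlet_a", "figure_x", "10%"),
    Source("outlet_b", "figure_x", "10%"),
    Source("outlet_c", "figure_x", "12%"),
    Source("outlet_a", "figure_y", "yes"),
]
report = cross_check(sources)
```

Claims marked `"agreed": False` are the ones a human or a follow-up agent pass would need to reconcile before the report is published.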
The rise of “tokenmaxxing”: spending tokens to buy completeness and truth
Garry introduces tokenmaxxing as a philosophy: if more tokens yield more complete, higher-quality work, spend them. He argues this will extend to nearly all knowledge work, with humans supplying goals, values, and agency while machines handle the heavy lifting.
- •Tokenmaxxing = aggressively spending tokens to increase quality/completeness
- •“Boil the ocean” approach: go from 1 source to 20 and reconcile conflicts
- •Human agency remains central: the system needs direction and motivation
- •Tokenmaxxing as a general paradigm for knowledge work, not just coding
Accidental productization: how GStack emerged from repeated prompts
Garry explains that GStack wasn’t planned—it emerged from noticing repeated interactions and converting them into reusable ‘skills.’ He describes techniques like forcing models to produce ASCII diagrams of data flows and user flows before coding, which improves completeness and reduces confusion.
- •GStack started as a personal prompt/skill library in Apple Notes
- •ASCII diagrams as a pre-flight checklist: data flow, states, errors, dependencies
- •Structured “reviews” (architecture, quality, tests) to counter ‘vibe-code slop’
- •Learning to push toward strong test coverage (80–90% as practical target)
The 400x workflow: queued PRs, plan-first execution, and multi-agent reviews
Garry walks through his day-to-day system for shipping: using plan mode, batching features, and running a sequence of specialized reviews before execution. The workflow is designed to scale output while preserving quality through automated checks and human-in-the-loop decision points.
- •Plan mode → approve → execution loop for rapid iteration
- •Queueing many PRs/features in parallel, then validating systematically
- •Review sequence: office hours/CEO review → design → devex → eng review
- •Human-in-the-loop via “ask user question” as a critical control mechanism
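One way to picture the review sequence with a human decision point is a small pipeline like the one below. It is a sketch under assumptions: the stage names come from the summary, but `run_reviews`, the reviewer functions, and the `ask_user` callback are hypothetical stand-ins for what would be LLM-driven reviews in practice.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Stage order as listed in the chapter summary.
REVIEW_ORDER = ["office hours/CEO review", "design", "devex", "eng review"]

# A reviewer returns a question for the user, or None if it has no concerns.
Reviewer = Callable[[str], Optional[str]]


def run_reviews(
    plan: str,
    reviewers: Dict[str, Reviewer],
    ask_user: Callable[[str], bool],
) -> Tuple[bool, List[Tuple[str, str]]]:
    """Run each review in order; stop as soon as the user rejects a question."""
    log = []
    for name in REVIEW_ORDER:
        question = reviewers[name](plan)
        if question is None:
            log.append((name, "passed"))
        elif ask_user(question):
            log.append((name, "approved by user"))
        else:
            log.append((name, "rejected by user"))
            return False, log
    return True, log


def no_concerns(plan: str) -> Optional[str]:
    return None


def design_review(plan: str) -> Optional[str]:
    # Hypothetical rule: schema changes always go to the human.
    if "new table" in plan:
        return "The plan adds a new table. Approve the schema change?"
    return None


reviewers = {name: no_concerns for name in REVIEW_ORDER}
reviewers["design"] = design_review
approved, log = run_reviews(
    "Add a new table for saved drafts", reviewers, ask_user=lambda q: True
)
```

The design choice illustrated here is that the agent never decides a flagged question on its own; execution proceeds only after every raised question has been answered by the user.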
QA automation and the birth of a browser harness (Playwright → Browse/QA)
Manual testing became the bottleneck, so Garry tried to automate QA using browser tooling. Slow MCP browser control pushed him to wrap Playwright, which evolved into a long-lived daemon with a CLI (‘Browse’) and a QA mode that tests UI/data mutations based on branch context.
- •Manual QA as throughput limiter once tests pass but UX still needs checking
- •Chrome/MCP interaction was too slow for practical QA loops
- •Wrapping Playwright led to a durable browser harness with many commands
- •QA prompt instructs: detect UI/data mutations → run black-box browser tests
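The idea of deciding what to test from branch context can be approximated deterministically, as in the sketch below. The summary describes a prompt that asks the model to make this call; the path heuristics, `classify_changes`, and `plan_qa` here are illustrative assumptions, and the input is a plain list of changed paths rather than a live git diff.

```python
from pathlib import PurePosixPath

# Hypothetical heuristics; a real project would tune these to its own layout.
UI_SUFFIXES = {".html", ".css", ".jsx", ".tsx", ".erb"}
DATA_DIRS = {"migrations", "models"}


def classify_changes(paths):
    """Return the kinds of mutation ("ui", "data") implied by the changed paths."""
    kinds = set()
    for path in map(PurePosixPath, paths):
        if path.suffix in UI_SUFFIXES:
            kinds.add("ui")
        if DATA_DIRS & set(path.parts):
            kinds.add("data")
    return kinds


def plan_qa(paths):
    """Turn the detected mutation kinds into black-box browser test steps."""
    kinds = classify_changes(paths)
    steps = []
    if "ui" in kinds:
        steps.append("open the changed pages in the browser and exercise the UI")
    if "data" in kinds:
        steps.append("submit forms in the browser and verify the stored data")
    return steps


changed = ["app/views/posts/show.html", "app/models/post.rb", "README.md"]
steps = plan_qa(changed)
```

A branch that touches only documentation yields no steps, which keeps the browser loop from running when there is nothing user-facing to check.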
Thin Harness, Fat Skills: where code ends and ‘latent space’ begins
Garry crystallizes his philosophy: don’t rebuild harnesses; invest in high-leverage skills (often in markdown) that guide agents effectively. He contrasts deterministic code (brittle, exact) with LLM ‘latent space’ (contextual, adaptable), and argues the art is choosing the right boundary between them.
- •Harness = core loop + tool calling; reuse instead of rebuilding repeatedly
- •“Fat skills” as reusable plain-English playbooks that compile via LLMs
- •Deterministic code for actions (e.g., calling APIs), skills for flexible judgment
- •Testing remains non-negotiable to avoid fragile, user-facing ‘slop’
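To make the “harness = core loop + tool calling” point concrete, here is a minimal sketch of a thin harness. Everything in it is an assumption for illustration: `run_harness`, the action dictionary shape, and `stub_model` (a deterministic stand-in for an LLM) are hypothetical and do not reflect the API of Claude Code, GStack, or any real agent framework.

```python
def run_harness(model, tools, skill, task, max_steps=8):
    """Core loop: ask the model, run any tool it requests, feed the result back."""
    transcript = [("skill", skill), ("task", task)]
    for _ in range(max_steps):
        action = model(transcript)
        if action["type"] == "final":
            return action["text"]
        # Deterministic code performs the action the model asked for.
        result = tools[action["tool"]](**action["args"])
        transcript.append(("tool:" + action["tool"], result))
    raise RuntimeError("step budget exhausted")


def stub_model(transcript):
    """Stand-in for an LLM: calls `add` once, then reports the tool result."""
    last_role, last_value = transcript[-1]
    if last_role.startswith("tool:"):
        return {"type": "final", "text": f"The sum is {last_value}."}
    return {"type": "tool", "tool": "add", "args": {"a": 2, "b": 3}}


# The "fat" part lives in plain English, not in the loop above.
skill = "When asked for arithmetic, call the add tool instead of guessing."
tools = {"add": lambda a, b: a + b}
answer = run_harness(stub_model, tools, skill, "What is 2 + 3?")
```

The loop itself stays a dozen lines and rarely changes; the behavior is steered by swapping the skill text and the tool table, which is the boundary the chapter describes.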
Agents are like Ferraris: brittle systems, mechanics mindset, and self-healing loops
The group returns to the Ferrari metaphor: the tools are powerful but can fail in surprising ways. A key shift is that brittleness matters less when another agent (or Claude Code) can continuously repair and maintain the system, accelerating iteration despite instability.
- •Evolution: Stack Overflow → ChatGPT → Claude Code (execution) → agent stacks
- •Brittleness is tolerable when repair is fast and automated
- •“Have another agent fixing it” becomes a new operational model
- •Analogy to early kit computers: exciting capability with hands-on maintenance
Gbrain and OpenClaw: building a personal AI with better memory than grep
Garry describes moving beyond Claude Code into OpenClaw-based workflows and building ‘Gbrain’ to improve how agents use personal context. He notes that naive approaches (grep over markdown) waste context, and he carries the RAG lessons from Garry’s List (chunking, embeddings, and hybrid retrieval) over into a personal system.
- •Shift: significant time now spent in OpenClaw while building Gbrain
- •Karpathy-inspired idea: personal knowledge wiki + agent access
- •Problem: grep-based retrieval wastes context, which led to deeper RAG work
- •Reusing proven patterns: chunking, embeddings, hybrid retrieval with reciprocal rank fusion (RRF), pgvector/Postgres
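Reciprocal rank fusion (RRF), the hybrid step named above, merges several ranked lists by summing 1 / (k + rank) for each document across the lists. The sketch below shows the standard formula with the commonly used constant k = 60; the document IDs and the two input rankings are made up, and nothing here is taken from Gbrain itself.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs; higher fused score ranks first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score, breaking ties by ID for a stable result.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword search and a vector search.
keyword_hits = ["note_a", "note_b", "note_c"]
vector_hits = ["note_b", "note_d", "note_a"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
# fused == ["note_b", "note_a", "note_d", "note_c"]
```

Because RRF uses only rank positions, it can combine keyword and embedding results without having to put their raw scores on a common scale.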
400x output and the lines-of-code controversy: what it measures and what it doesn’t
Garry addresses backlash over citing lines of code as a productivity metric. He argues LoC is imperfect but can be normalized (logical lines), and that AI-directed coding changes incentives versus human LoC ‘padding’; the bigger point is that the tools raise the ceiling on what a capable builder can ship.
- •LoC isn’t a full productivity metric, but can be standardized/normalized
- •After normalization, Garry’s multiplier increased rather than decreased
- •Typical pro engineer output (tested/prod) cited as tens of LoC/day historically
- •Core claim: tokenmaxxing + taste/skill dramatically increases capability
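As a rough illustration of why normalizing to logical lines blunts padding, the sketch below counts statements instead of physical lines. It is one possible definition, applied to Python only; the summary does not say which tool or definition Garry used, and `logical_lines` is a hypothetical helper.

```python
import ast


def logical_lines(source):
    """Count statements in Python source, ignoring blanks, comments, and wrapping."""
    tree = ast.parse(source)
    return sum(isinstance(node, ast.stmt) for node in ast.walk(tree))


compact = "total = add(1, 2)\nprint(total)\n"
padded = "# a comment\n\ntotal = add(\n    1,\n    2,\n)\n\nprint(total)\n"

physical_compact = len(compact.splitlines())  # 2 physical lines
physical_padded = len(padded.splitlines())    # 8 physical lines
same_logic = logical_lines(compact) == logical_lines(padded)  # both count 2 statements
```

The two snippets differ fourfold in physical lines but are identical by this measure, which is the sense in which a normalized count is harder to inflate.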
The future of personal AI and buying back time with tokens
The episode closes on a vision: everyone will have a personal AI with their own data, integrations, and prompts—or else rely on corporate-controlled feeds and opaque incentives. Token spend becomes a way to ‘buy time’ by borrowing machine labor, turning builders into “time billionaires” through scalable machine work.
- •Personal AI as the next personal computer revolution; a real societal choice
- •Owning prompts/data/integrations keeps you ‘above the API line’
- •Tokenmaxxing as an investment (like paying SF rent for serendipity)
- •Tokens buy machine time: leverage “millions of years” of computation to build