The Twenty Minute VC

The $100,000 token budget EVERY engineer will need | Sierra Co-Founder

Clay Bavor is the Co-Founder of Sierra, one of the world's fastest-growing enterprise AI companies. Sierra is valued at approximately $15.8 billion, has raised more than $1.5BN from leading investors including Sequoia, Benchmark, Greenoaks, GV and Tiger Global, and today serves more than 40% of the Fortune 50. The company recently surpassed $150 ARR, making it one of the fastest-growing enterprise software businesses in history.

Timestamps:
0:00 Intro
1:37 Why Clay finally said yes to building Sierra with Bret
5:53 Why Sierra chose not to train their own foundation models
7:15 The case for unbounded demand for frontier intelligence
10:41 Why token costs are rising, not falling
14:35 Open models: the US vs China gap
18:36 Inside Pinecone, Sierra's internal AI agent
26:00 Staying close to customers as an enterprise AI company
33:22 Forward deployed teams: kickoff to live in 6 weeks
43:02 Sierra's core values: craftsmanship, intensity, family
55:41 Advice for young people entering the AI job market
1:02:35 Quickfire: Sundar, books, and parenting lessons

Subscribe on Spotify: https://open.spotify.com/show/3j2KMcZTtgTNBKwtZBMHvl?si=85bc9196860e4466
Subscribe on Apple Podcasts: https://podcasts.apple.com/us/podcast/the-twenty-minute-vc-20vc-venture-capital-startup/id958230465
Follow Harry Stebbings on X: https://twitter.com/HarryStebbings
Follow Clay Bavor on X: https://twitter.com/claybavor
Follow 20VC on Instagram: https://www.instagram.com/20vchq
Follow 20VC on TikTok: https://www.tiktok.com/@20vc_tok
Visit our Website: https://www.20vc.com
Subscribe to our Newsletter: https://www.thetwentyminutevc.com/contact

#20vc #harrystebbings #founder #ai

Clay Bavor (guest), Harry Stebbings (host)
Jul 4, 2026 · 1h 11m

At a glance

WHAT IT’S REALLY ABOUT

Sierra co-founder on token budgets, enterprise agents, and AI demand

  1. Bavor explains why he left an 18-year Google career to co-found Sierra when LLMs reshuffled the competitive landscape for startups.
  2. Sierra chose not to pre-train foundation models, instead “slipstreaming” hyperscaler/lab investment and focusing on agent frameworks, fine-tunes, and deep product engineering.
  3. He argues demand for frontier-level intelligence is effectively unbounded, and reasoning/agentic workflows plus compute scarcity will keep token costs from collapsing.
  4. Sierra runs itself with an internal agent (“Pinecone”) connected via an MCP gateway to company systems, plus a “Sierra Brain” strategy partner grounded in internal documents.
  5. Sierra’s go-to-market relies on forward-deployed teams to implement quickly in complex enterprises, reinforced by operating rituals like memo-based boards every six weeks and AI-native hiring interviews.

IDEAS WORTH REMEMBERING

5 ideas

Training frontier models is a poor default for startups.

Bavor calls foundation models a “perishable bag of floating-point numbers” with recurring capital expense that only a few companies can sustain; Sierra instead uses open-weights plus proprietary fine-tunes and invests lower in the stack where it creates differentiation (agent architecture and frameworks).

Frontier models won’t be displaced; they’ll be selectively used alongside cheaper models.

He predicts an “assembly line” where yesterday’s frontier becomes tomorrow’s cheap open-weights workhorse, but high-stakes domains (coding, science, legal) will still justify frontier intelligence, leading to task-based routing and mixing models by capability/cost.
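The task-based routing described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Sierra's implementation: the model names, capability scores, prices, and task types are all hypothetical, and the rule used here (pick the cheapest model that clears a capability bar) is one simple assumption about how such routing might work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    capability: int                 # hypothetical 1-10 capability score
    usd_per_million_tokens: float   # hypothetical price


# Hypothetical catalogue; names, scores, and prices are illustrative only.
MODELS = [
    Model("open-weights-small", 4, 0.20),
    Model("open-weights-large", 7, 1.50),
    Model("frontier", 10, 15.00),
]

# Hypothetical minimum capability each task type is assumed to need.
REQUIRED_CAPABILITY = {
    "faq_reply": 3,
    "summarize_ticket": 6,
    "code_change": 9,
}


def route(task_type: str) -> Model:
    """Return the cheapest model whose capability meets the task's requirement."""
    needed = REQUIRED_CAPABILITY[task_type]
    eligible = [m for m in MODELS if m.capability >= needed]
    return min(eligible, key=lambda m: m.usd_per_million_tokens)
```

Under these made-up numbers, routine work falls to the cheaper open-weights models and only the high-stakes task type reaches the frontier model, which is the mixing pattern the summary describes.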

Token costs can rise even as hardware improves.

Reasoning models and agents consume more inference (“thinking out loud”), while GPU/power constraints create a price floor; unbounded demand plus limited Blackwells/H100s keeps compute scarce, preventing token prices from simply trending down.

Per-employee token budgets will become standard operating practice.

He reports that top engineers using Claude Code/Codex are spending more than $100K/year on tokens on a run-rate basis; he expects CFOs to allocate “salary + token budget” and believes steady-state token spend could approach ~20% of developer compensation rather than low single digits.
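To make the “salary + token budget” arithmetic concrete, here is a small sketch. The compensation figure is a hypothetical round number chosen for easy arithmetic, not a figure from the episode; the two percentages are the 3.8% and 20% mentioned in Bavor's quote further down.

```python
def token_budget(annual_compensation_usd: float, share: float) -> float:
    """Annual token budget expressed as a fraction of compensation."""
    if not 0 <= share <= 1:
        raise ValueError("share must be between 0 and 1")
    return annual_compensation_usd * share


# Hypothetical compensation figure, chosen only to make the arithmetic easy.
compensation = 500_000

low = token_budget(compensation, 0.038)   # the 3.8% figure Bavor bets against
high = token_budget(compensation, 0.20)   # the ~20% figure he bets on

print(f"3.8% of compensation: ${low:,.0f} per year")
print(f"20% of compensation:  ${high:,.0f} per year")
```

With this assumed compensation, the 20% case works out to about $100,000 a year, in line with the run-rate spend described above, while the 3.8% case is about $19,000.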

Internal AI agents become force multipliers when connected to real systems with permissions.

Sierra’s MCP gateway aggregates Slack/docs/reviews/etc. into a single interface for multiple agents, enabling Pinecone to assist across engineering and operations (e.g., scanning interview packets) while keeping access control aligned to each employee.
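A rough sketch of the gateway idea, assuming a single entry point that fronts several internal tools and checks each caller's permissions before forwarding a request. This is not the MCP protocol or Sierra's code: the `Gateway` class, its methods, the tool names, and the employees are all hypothetical, and the tool handlers are stubs.

```python
from typing import Callable


class Gateway:
    """Hypothetical single entry point in front of several internal tools."""

    def __init__(self) -> None:
        self._tools: dict[str, Callable[[str], str]] = {}
        self._grants: dict[str, set[str]] = {}

    def register_tool(self, name: str, handler: Callable[[str], str]) -> None:
        self._tools[name] = handler

    def grant(self, employee: str, tool_name: str) -> None:
        self._grants.setdefault(employee, set()).add(tool_name)

    def call(self, employee: str, tool_name: str, query: str) -> str:
        """Forward the query only if this employee was granted this tool."""
        if tool_name not in self._tools:
            raise KeyError(f"unknown tool: {tool_name}")
        if tool_name not in self._grants.get(employee, set()):
            raise PermissionError(f"{employee} may not use {tool_name}")
        return self._tools[tool_name](query)


# Stub tools standing in for real systems (hypothetical names).
gateway = Gateway()
gateway.register_tool("chat_search", lambda q: f"chat results for {q!r}")
gateway.register_tool("interview_packets", lambda q: f"packets matching {q!r}")

gateway.grant("alice", "chat_search")
gateway.grant("alice", "interview_packets")
gateway.grant("bob", "chat_search")
```

Because every agent goes through the same `call` path on behalf of a named employee, an agent acting for `bob` cannot read interview packets even though the tool is registered, which is the "access control aligned to each employee" property the summary mentions.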

WORDS WORTH SAVING

5 quotes

I think we have not yet appreciated the unbounded demand for, call it frontier levels of intelligence.

Clay Bavor

The capital expense, uh, uh, the ongoing capital expense to create what is effectively a highly perishable bag of floating-point numbers, it just doesn't work, just doesn't work for any but a small number of companies.

Clay Bavor

I have heard and I have observed that top engineers who are really leaning in to Claude Code, Codex, and so on are spending more than $100,000 on a run rate basis on tokens per year.

Clay Bavor

I would not bet on 3.8%. I would bet on much closer to 20%.

Clay Bavor

Writing is just thinking on paper, and I, I think it's very hard to hide from writing.

Clay Bavor

Leaving Google and timing Sierra’s founding
Build vs buy in foundation models (pre-training vs fine-tuning)
Frontier intelligence demand and model mixing strategies
Token economics: reasoning models, supply constraints, and budgeting
US vs China open-weights gap and distillation
Internal agents (MCP gateway, Pinecone, Sierra Brain)
Enterprise deployment: forward-deployed teams, speed to production, vertical expertise
Operating cadence: 6-week board memos and truth-seeking culture
Company values: craftsmanship, intensity, family (plus trust, customer obsession)
AI-era hiring and interviewing for engineers

High quality AI-generated summary created from speaker-labeled transcript.
