The Twenty Minute VCThe $100,000 token budget EVERY engineer will need | Sierra Co-Founder
At a glance
WHAT IT’S REALLY ABOUT
Sierra co-founder on token budgets, enterprise agents, and AI demand
- Bavor explains why he left an 18-year Google career to co-found Sierra when LLMs reshuffled the competitive landscape for startups.
- Sierra chose not to pre-train foundation models, instead “slipstreaming” hyperscaler/lab investment and focusing on agent frameworks, fine-tunes, and deep product engineering.
- He argues demand for frontier-level intelligence is effectively unbounded, and reasoning/agentic workflows plus compute scarcity will keep token costs from collapsing.
- Sierra runs itself with an internal agent (“Pinecone”) connected via an MCP gateway to company systems, plus a “Sierra Brain” strategy partner grounded in internal documents.
- Sierra’s go-to-market relies on forward-deployed teams to implement quickly in complex enterprises, reinforced by operating rituals like memo-based boards every six weeks and AI-native hiring interviews.
IDEAS WORTH REMEMBERING
5 ideasTraining frontier models is a poor default for startups.
Bavor calls foundation models a “perishable bag of floating-point numbers” with recurring capital expense that only a few companies can sustain; Sierra instead uses open-weights plus proprietary fine-tunes and invests lower in the stack where it creates differentiation (agent architecture and frameworks).
Frontier models won’t be displaced; they’ll be selectively used alongside cheaper models.
He predicts an “assembly line” where yesterday’s frontier becomes tomorrow’s cheap open-weights workhorse, but high-stakes domains (coding, science, legal) will still justify frontier intelligence, leading to task-based routing and mixing models by capability/cost.
Token costs can rise even as hardware improves.
Reasoning models and agents consume more inference (“thinking out loud”), while GPU/power constraints create a price floor; unbounded demand plus limited Blackwells/H100s keeps compute scarce, preventing token prices from simply trending down.
Per-employee token budgets will become standard operating practice.
He’s seeing top engineers run-rate >$100K/year in token spend using Claude Code/Codex; he expects CFOs to allocate “salary + token budget” and believes steady-state token spend could approach ~20% of developer compensation rather than low single digits.
Internal AI agents become force multipliers when connected to real systems with permissions.
Sierra’s MCP gateway aggregates Slack/docs/reviews/etc. into a single interface for multiple agents, enabling Pinecone to assist across engineering and operations (e.g., scanning interview packets) while keeping access control aligned to each employee.
WORDS WORTH SAVING
5 quotesI think we have not yet appreciated the unbounded demand for, call it frontier levels of intelligence.
— Clay Bavor
The capital expense, uh, uh, the ongoing capital expense to create what is effectively a highly perishable bag of floating-point numbers, it just doesn't work, just doesn't work for any but a small number of companies.
— Clay Bavor
I have heard and I have observed that top engineers who are really leaning in to Claude Code, Codex, and so on are spending more than $100,000 on a run rate basis on tokens per year.
— Clay Bavor
I would not bet on 3.8%. I would bet on much closer to 20%.
— Clay Bavor
Writing is just thinking on paper, and I, I think it's very hard to hide from writing.
— Clay Bavor
High quality AI-generated summary created from speaker-labeled transcript.