Y Combinator

Tom Brown: How Building GPT-3 Led to Founding Anthropic

From GPT-3 scaling laws to Claude Code architecture, Tom Brown traces the path from a B-minus in linear algebra to building frontier AI infrastructure.

Tom Brown (guest) · Garry Tan (host) · Jared Friedman (host)
Aug 18, 2025 · 35m · Watch on YouTube ↗

At a glance

WHAT IT’S REALLY ABOUT

Anthropic Co-founder on Scaling AI, Claude Code, and Startup Grit

  1. Tom Brown, Anthropic co-founder and former OpenAI/GPT-3 engineer, traces his path from early YC startups through OpenAI and Google Brain to co-founding Anthropic with a mission-focused team. He explains how scaling laws and infrastructure choices around GPUs, TPUs, and Trainium shaped GPT-3 and Anthropic’s current systems, and why compute is driving humanity’s largest-ever infrastructure build-out.
  2. The conversation dives into Anthropic’s early days, the uncertainty around products, and how Claude 3.5’s unexpectedly strong coding performance and the launch of Claude Code became turning points for the company. Brown highlights their philosophy of not “teaching to the test,” focusing on internal evals, dogfooding, and building tools where Claude itself is treated as a core “user.”
  3. He also discusses competitive dynamics with startups, the importance of mission-aligned culture, and the massive constraints emerging around power, hardware, and data centers.
  4. For younger engineers, Brown emphasizes taking more risks, optimizing for work that an idealized version of themselves would be proud of, and not over-indexing on traditional credentials.

IDEAS WORTH REMEMBERING

5 ideas

Treat uncertainty in early careers as a training ground for initiative, not a deficit.

Brown contrasts big-tech roles with early startups as the place he learned to stop waiting for tasks and instead adopt a ‘wolf’ mindset—taking ownership for finding and creating work, which later enabled him to tackle ambitious AI projects.

Scaling laws plus better algorithms made massive AI progress predictable years in advance.

Seeing a near-straight scaling curve over ~12 orders of magnitude convinced Brown and colleagues to pivot fully into scaling, anticipating that more compute applied with the right recipe would reliably increase capability.
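The intuition behind that near-straight curve can be sketched numerically. A power-law relation between compute and loss, L(C) = a·C^(−b), plots as a straight line in log-log space, which is what makes extrapolation across orders of magnitude look reliable. The coefficients below are purely illustrative, not Anthropic's or OpenAI's actual fits:

```python
import numpy as np

# Illustrative sketch (hypothetical coefficients, not real scaling-law fits):
# a power law L(C) = a * C^(-b) is exactly linear in log-log coordinates,
# so a fit over many orders of magnitude recovers the exponent b.
a, b = 10.0, 0.05                   # hypothetical scale and exponent
compute = np.logspace(0, 12, 13)    # 12 orders of magnitude of compute
loss = a * compute ** (-b)

# Fitting a line to (log C, log L) recovers -b as the slope.
slope, intercept = np.polyfit(np.log10(compute), np.log10(loss), 1)
print(round(-slope, 3))  # prints the fitted scaling exponent, 0.05
```

The point of the sketch is the straight-line structure itself: once the slope is stable across twelve orders of magnitude, the return on the next order of magnitude of compute is a prediction rather than a gamble.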

Infrastructure decisions (chips, frameworks, and software stacks) are strategic leverage points.

The move from TPUs/TensorFlow to GPUs/PyTorch at OpenAI sped iteration and enabled GPT-3’s scale; Anthropic now deliberately runs on GPUs, TPUs, and Trainium to absorb capacity and match the right chip to the right job, despite engineering overhead.

Mission-first hiring and transparent communication can scale a 2,000-person lab with low politics.

Anthropic’s founding group and first ~100 employees joined largely for existential safety motives, not prestige, which Brown credits with preserving a mission-oriented culture and making it easier to call out misaligned behavior.

Not optimizing for public benchmarks can produce better real-world performance.

Anthropic avoids dedicating teams to ‘gaming’ published benchmarks, focusing instead on internal evals and dogfooding (especially for code) to improve practical usefulness—which helps explain why founders often report coding gains larger than the public benchmark deltas would suggest.

WORDS WORTH SAVING

5 quotes

Big tech just teaches you to work at a big tech company, whereas it’s much more fun to be a wolf.

Tom Brown

Seeing that line of reliably you get more intelligence if you spend more compute with the right recipe was the main thing that was like, this is happening now.

Tom Brown

We don’t teach to the test, because if you start doing that, then it has weird bad incentives.

Tom Brown

One thing that’s interesting to look at is just that humanity is on track for the largest infrastructure build-out of all time.

Tom Brown

Taking more risks is wise… work on stuff where an idealized version of yourself would be really proud of you if you succeeded.

Tom Brown

Tom Brown’s career path from early YC startups to OpenAI, Google Brain, and Anthropic
Lessons from GPT-3: scaling laws, infrastructure, and the move from TPUs to GPUs
Founding Anthropic: mission focus, early team dynamics, and culture
Claude’s evolution, particularly Claude 3.5 Sonnet and its strength in coding
Design and impact of Claude Code and model-as-user product thinking
Evaluation philosophy: internal benchmarks, not teaching to public tests, and personality/alignment
Massive AI compute build-out: hardware diversity, power constraints, and data center strategy

