Aidan Gomez: What No One Understands About Foundation Models | E1191

Aidan Gomez is the Co-founder & CEO at Cohere, the leading AI platform for enterprise, having raised over $1BN from some of the best, with their last round pricing the company at a whopping $5.5BN. Prior to Cohere, Aidan co-authored the paper “Attention is All You Need,” which introduced the groundbreaking Transformer architecture. He also collaborated with a number of AI luminaries, including Geoffrey Hinton and Jeff Dean, during his time at Google Brain, where the team focused their efforts on large-scale machine learning.

-----------------------------------------------

Timestamps:
(00:00) Intro
(00:45) Childhood & Background
(04:29) Is More Compute the Only Path to Better Performance?
(08:07) Can Anyone Afford to Stay in the AI Race Besides Tech Giants?
(13:44) Is AI Heading Toward a Race to the Bottom?
(16:55) Will Companies Keep Building Their Own Chips?
(18:30) Is Model Progression Outpacing Compute Advancement?
(19:41) Early Challenges in Accessing Compute Chips
(23:48) Are We Underestimating the Short-Term Impact of AI Advancements?
(27:06) Is It Too Late for Startups to Enter the AI Model Space?
(27:55) AI Development: The Exponential Rise in Costs
(30:40) Will Cloud Giants Continue Acquiring Smaller AI Model Providers?
(35:10) Is OpenAI Prioritizing AGI Over Practical Products?
(48:29) What's the Biggest Overlooked Factor in AI's Future?
(50:09) Concerns About a Future Where AI Replaces Human Interaction
(54:20) What Will AI Do in Three Years That It Doesn't Do Today?
(55:48) Quick-Fire Round

-----------------------------------------------

In Today’s Episode with Aidan Gomez We Discuss:

1. Compute vs Data: What is the Bottleneck:
Does Aidan believe that more compute will result in an equal increase in performance? How much longer do we have before it becomes a case of diminishing returns? What does Aidan mean when he says he has “changed his mind massively on the role of data”? What did he believe? How has it changed?

2. The Value of the Model:
Given the demand for chips and the consumer need for applications, how does Aidan think about the inherent value of models today? Will any value accrue at the model layer? How does Aidan analyze the price dumping that OpenAI are doing? Is it a race to the bottom on price? Why does Aidan believe that “there is no value in last year’s model”? Given all of this, is it possible to be an independent model provider without being owned by an incumbent who has a cloud business that acts as a cash cow for the model business?

3. Enterprise AI: It is Changing So Fast:
What are the biggest concerns for the world’s largest enterprises on adopting AI? Are we still in the experimental budget phase for enterprises? What is causing them to move from experimental budget to core budget today? Are we going to see a mass transition back from Cloud to On Prem, with the largest enterprises not willing to let independent companies train with their data in the cloud? What does AI not do today that will be a gamechanger for the enterprise in 3-5 years?

4. The Wider World: Remote Work, Downfall of Europe and Relationships:
Given humans spending more and more time talking to models, how does Aidan reflect on the idea of his children spending more time with models than people? Does he want that world? Why does Aidan believe that Europe is challenged immensely? How does the UK differ from Europe? Why does Aidan believe that remote work is just not nearly as productive as in person?

-----------------------------------------------

Subscribe on Spotify: https://open.spotify.com/show/3j2KMcZTtgTNBKwtZBMHvl?si=85bc9196860e4466
Subscribe on Apple Podcasts: https://podcasts.apple.com/us/podcast/the-twenty-minute-vc-20vc-venture-capital-startup/id958230465
Follow Harry Stebbings on Twitter: https://twitter.com/HarryStebbings
Follow Aidan Gomez on Twitter: https://twitter.com/aidangomez
Follow 20VC on Instagram: https://www.instagram.com/20vchq
Follow 20VC on TikTok: https://www.tiktok.com/@20vc_tok
Visit our Website: https://www.20vc.com
Subscribe to our Newsletter: https://www.thetwentyminutevc.com/contact

-----------------------------------------------

#20vc #harrystebbings #aidangomez #cohere #openai #venturecapital #founder #computing

Aidan Gomez (guest) · Harry Stebbings (host)

Aug 19, 2024 · 1h 3m

CHAPTERS

0:00 – 2:31
From rural Ontario to computer obsession: dial-up, gaming, and early CS curiosity
Aidan describes growing up in a remote, forested part of Ontario with limited access to technology and internet. That scarcity—paired with a love of gaming—pushed him to obsess over making computers and connectivity work better, ultimately leading him toward computer science.
- Rural upbringing with minimal tech access shaped his motivation
- Years on dial-up created a fixation on speed and systems
- Gaming as an early gateway into technology interest
- Early tinkering led naturally into learning coding and how the web works
2:31 – 4:29
What gaming teaches founders: grinding, resilience, and learning through failure
The discussion connects gaming habits to founder traits: willingness to grind, comfort with repetition, and resilience after failure. Aidan highlights the “respawn” mentality as a powerful psychological model for iteration and improvement.
- Games train persistence through repetitive difficulty
- Failure is normalized: retrying is built into the experience
- Iteration leads to measurable progress across attempts
- Mindset shift: you can recover from mistakes and improve
4:29 – 7:11
Scaling laws reality check: more compute works, but it’s the most inefficient path
Harry pushes on whether more compute is still the main driver of model improvements. Aidan agrees scaling reliably boosts performance but argues it’s an inefficient, brute-force approach—especially as smaller models rapidly catch up through better techniques.
- Scaling compute and parameters remains the most reliable improvement lever
- It’s low-risk for well-capitalized players but economically wasteful
- Recent progress: dramatically smaller models can match/beat older huge ones
- Market forces increasingly reward efficiency over raw scale
7:11 – 8:07
Horizontal vs vertical models: prototype big, then distill into focused systems
Aidan predicts a long-term world of both general-purpose foundation models and specialized vertical models. He explains a common pattern: teams prototype with a powerful general model, then distill or fine-tune into cheaper, task-specific models for production.
- Ecosystem will include both horizontal and verticalized models
- Big models are best for fast prototyping and proving feasibility
- Distillation turns expensive prototypes into efficient production models
- Specialization reduces cost and improves deployment practicality
8:07 – 8:54
Who can afford the AI race: beyond hyperscalers via data and method innovation
They discuss whether only Big Tech can remain competitive given massive training costs. Aidan argues that if you only pursue scaling, you need hyperscaler backing—but there’s still room to compete through data innovation and new training methods.
- Pure scaling favors hyperscalers or their “subsidiaries”
- Alternative paths include data, algorithms, and method breakthroughs
- Economic constraints limit how far expensive models can be pushed
- Pressure on price drives innovation toward smaller, smarter systems
8:54 – 13:45
Data and method innovations: better scraping, synthetic data, and reasoning-focused training
Aidan breaks down what “data innovation” means (higher-quality curated datasets and synthetic data) and what “method innovation” could look like (RL, search, and letting models think/try/fail). A key barrier to reasoning is that the internet rarely shows the intermediate steps of thought, so companies must generate that data.
- Open-source gains largely driven by improved data quality and curation
- Synthetic data is increasingly central to training and distillation pipelines
- Method innovation includes RL, search, and iterative problem-solving loops
- Reasoning data is scarce online because people publish conclusions, not work
13:45 – 15:53
Commoditization and price dumping: models trend to low-margin, value shifts to chips and apps
Harry raises concerns about OpenAI price cuts and Meta’s free releases driving a race to the bottom. Aidan agrees model-only businesses face tight margins in the near term, with value accruing at the chip layer (capex) and at the application layer (monetization).
- Model APIs face pricing pressure from dumping and open-source releases
- Model-only businesses risk becoming near-zero-margin in the short term
- Value concentrates at chips/infrastructure and at applications/products
- Cohere signals it will expand its product suite beyond only model access
15:53 – 18:30
Chips, platform optionality, and the next wave of training hardware competition
Aidan describes growing chip spend and why Cohere supports multiple chip providers and clouds: customers demand optionality and avoidance of lock-in. They discuss vertical integration trends and a future where training hardware becomes more competitive beyond Nvidia, including TPUs, AMD, and others.
- Chip spend has become a dominant cost center
- Multi-platform support reduces customer lock-in and meets enterprise requirements
- Verticalization into chips is attractive due to high margins and limited supply
- Training compute is becoming more heterogeneous (TPUs proven; more entrants coming)
18:30 – 20:13
Compute supply chain and infrastructure strategy: partner data centers unless economics flip
They explore whether model progress outpaces data center buildout and whether companies should build their own infrastructure. Aidan says Cohere partners today, but would build data centers if economics or access to a compelling chip demanded it; early compute access was easier because Cohere predates the GPU crunch.
- Potential misalignment between model iteration speed and compute availability
- Cohere doesn’t build data centers now, but would if it became cheaper/necessary
- Infrastructure decisions depend on chip availability and provider procurement
- Early days benefited from starting before the current compute supply constraints
20:13 – 23:48
Transformers to ChatGPT: why adoption suddenly exploded and why chat/voice interfaces matter
Aidan reflects on co-authoring the Transformer paper and not anticipating the architecture’s massive consolidation across AI. He identifies ChatGPT as the key inflection point because it put the capability directly in users’ hands, then explains why chat isn’t universal and why voice is a uniquely powerful interface.
- Transformer’s downstream impact was not obvious in 2017
- ChatGPT accelerated adoption by making the tech experiential, not theoretical
- Chat is useful but not the right interface for everything; GUIs still matter
- Voice interaction feels emotionally compelling and could drive major consumer shifts
23:48 – 27:06
Short-term vs long-term progress: gains get harder, but the “plateau” narrative is wrong
Harry asks whether we’re underestimating near-term AI progress. Aidan argues improvements are getting more expensive because training increasingly requires scarce domain experts, yet major method-based breakthroughs (reasoning, planning, long-horizon tasks) are still coming and will unlock new capability jumps.
- Incremental improvements require more specialized, expensive human expertise
- Compute cost trends still fall, but data/teaching bottlenecks rise
- Perceived progress may feel slower to non-experts despite real capability gains
- Upcoming method advances: planners/reasoners, try-fail-recover, long-horizon autonomy
27:06 – 37:46
Is it too late for startups in model-building? ‘No market for last year’s model’ + consolidation risks
They debate whether falling compute costs enable new startups to enter the model space. Aidan notes that while last-gen models get cheaper fast, demand concentrates on the newest generation—so being behind is fatal; he also predicts consolidation and warns against becoming a cloud provider’s dependent subsidiary.
- Barrier drops primarily for older generations, not the frontier
- ‘There’s no market for last year’s model’: obsolescence is rapid
- Progress is still worth funding, but value depends on who pays for it
- Consolidation is likely; dependence on a cloud investor/provider can be dangerous
37:46 – 43:52
Enterprise adoption: trust, private deployments, RAG to reduce hallucinations, and shift from POC to production
Aidan outlines why enterprises hesitate: security, IP risk, and distrust of data usage. He explains Cohere’s approach (private/VPC/on-prem deployments) and how RAG provides citations and reduces hallucinations, while enterprises increasingly move from experimentation to production urgency with workforce augmentation as the top use case.
- Top blocker: trust/security and fear of training on proprietary enterprise data
- Private deployment models reduce data exposure and alleviate adoption barriers
- RAG enables retrieval + citation, cutting hallucinations and enabling customization
- Enterprise budgets are shifting from POCs to production; employee augmentation leads demand
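To make the "retrieval + citation" idea above concrete, here is a minimal sketch of a RAG-style prompt builder. It is not Cohere's API or implementation: the keyword-overlap scoring stands in for a real embedding-based retriever, and the names `tokenize`, `retrieve`, `build_prompt`, and the sample document ids are hypothetical.

```python
def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real text processing."""
    return set(text.lower().split())

def retrieve(query, documents, top_k=2):
    """Rank documents by word overlap with the query and return the top ids."""
    query_terms = tokenize(query)
    scored = []
    for doc_id, text in documents.items():
        overlap = len(query_terms & tokenize(text))
        if overlap:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by id so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]

def build_prompt(query, documents, top_k=2):
    """Assemble a prompt that grounds the model in cited sources."""
    doc_ids = retrieve(query, documents, top_k)
    lines = ["Answer using only the sources below and cite them by id."]
    for doc_id in doc_ids:
        lines.append(f"[{doc_id}] {documents[doc_id]}")
    lines.append(f"Question: {query}")
    return "\n".join(lines), doc_ids

# Hypothetical internal documents that never leave the enterprise's environment.
docs = {
    "hr-1": "vacation policy allows 25 days of leave",
    "it-7": "reset your password through the portal",
    "hr-2": "parental leave policy is 16 weeks",
}
prompt, cited = build_prompt("what is the leave policy", docs)
print(prompt)
```

Because the model is asked to answer only from the retrieved passages and to cite their ids, a reader can check each claim against its source, which is the mechanism by which retrieval is said to reduce hallucinations.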
43:52 – 50:09
Agents and copilots: why true workforce augmentation needs tool-agnostic platforms and better reasoning
They discuss agent hype and whether the best agent products will be built by model builders or application-layer companies. Aidan argues agents are the promise of AI, but success depends on reasoning quality and model-level control; he also critiques siloed copilots and emphasizes enterprise tool diversity.
- Agent hype is justified: long-horizon autonomous work is transformative
- Agent performance hinges on the underlying model’s reasoning/planning
- Non-model-builders can be structurally disadvantaged without model-level levers
- Enterprise assistants must span many tools (Office, Salesforce, SAP, internal apps), not one ecosystem
50:09 – 55:55
Human displacement fears, social implications, and where AI could break through next: robotics
Harry worries about AI replacing human interaction and jobs; Aidan argues humans remain central, with localized displacement but net productivity growth. He then points to robotics as a likely major breakthrough area as foundation-model-based planning reduces brittleness and enables more general-purpose machines.
- AI likely augments rather than replaces humans in most high-accountability contexts
- Some roles (e.g., customer support) may see meaningful localized displacement
- Bots may handle emotionally taxing interactions, with humans focusing on higher-value cases
- Robotics could see big gains as planners/reasoners become more robust and adaptable
55:55 – 1:03:26
Quick-fire: underrated importance of data, fundraising realities, Europe vs UK tech culture, and productivity as the north star
In a rapid Q&A, Aidan says he most changed his mind about the importance and sensitivity of data quality. He discusses Cohere’s fundraising scale, remote vs in-person work, his views on UK vs broader European tech attitudes, and closes with a focus on productivity growth as the most important (and under-hyped) AI outcome.
- Data quality is highly leverageable; small amounts of bad data can matter a lot
- Raising at the scale of hundreds of millions distorts intuition about money and competition
- UK shows more tech optimism than much of Europe, where regulation-first attitudes dominate
- Desired direction: use AI to boost productivity, abundance, and economic growth
