The Twenty Minute VC | Mike Krieger, Instagram Co-Founder & Anthropic CPO: Where Will Value Be Created in an AI World? | E1265
CHAPTERS
- 0:31 – 2:42
Where AI startups create durable value: GTM, domain expertise, and proprietary data
Harry opens by asking where venture-scale value will accrue in an AI-driven decade. Mike argues durability comes less from generic model wrappers and more from differentiated go-to-market, deep industry knowledge, and unique data access—especially in complex regulated verticals.
- •Durable moats: differentiated GTM, domain expertise, and special/unique data
- •Vertical complexity (e.g., healthcare, legal, finance) rewards non-obvious legwork
- •Use foundation models as leverage; fine-tune/specialize when needed
- •Long-term advantage comes from learning loops once deployed in a vertical
- 2:42 – 4:31
Incumbents vs new entrants in vertical AI: the trust and expectation trap
They explore whether AI’s next wave favors vertical SaaS incumbents or new startups. Mike frames it as a product-design and expectation-management problem: startups can “dream louder,” while incumbents risk breaking trust if AI features underdeliver.
- •Startups can push the frontier with early adopters; incumbents face higher expectation risk
- •Incumbents must evolve without alienating existing customers/behaviors
- •Startups lack relationships/data but can win with a compelling future narrative
- •Key challenge: don’t overpromise capabilities that models can’t reliably deliver yet
- 4:31 – 6:55
Build for today’s models or tomorrow’s breakthroughs? Don’t wait—iterate into the frontier
Harry asks how founders should plan when model capability shifts can make or break products. Mike’s view: exploring early is valuable even if current systems are frustrating, because the winners are usually those who’ve already built context and workflow understanding when the “right” model arrives.
- •Many products become viable only after a step-change in model accuracy/capability
- •Early “lovingly assembled” systems build domain learning and workflow context
- •Model leaps reward teams already iterating (example: Cursor’s multiple attempts)
- •Guidance: don’t wait for perfection; aggressively test each new generation
- 6:55 – 10:20
Is the foundation model layer commoditizing? Three defensible advantages for labs
They move to whether there’s lasting value at the model layer. Mike outlines three durable advantages for frontier labs: talent density aligned to mission, differentiated model characteristics/focus areas, and enterprise-grade partnership (not just token vending).
- •Defensibility #1: talent attraction/retention and breakthrough capacity
- •Defensibility #2: models will differentiate (style, strengths) rather than converge
- •Defensibility #3: deep customer relationships and ‘AI partnership’ beyond APIs
- •Failure mode: incremental benchmark chasing + treating API as pure commodity
- 10:20 – 13:00
What actually blocks progress: real-world environments, evals, and agentic workflows
Asked about the biggest bottleneck (compute, data, algorithms), Mike emphasizes training/evaluating models in environments that resemble real work. Today’s evals measure narrow tasks; the hard part is multi-step, social, organizational, and iterative collaboration.
- •Main blocker: environments/evals that match real-world, multi-turn work
- •Software engineering is more than writing code: requirements, planning, iteration
- •Need agentic evals and broader “office professional” task evaluations
- •Goal: models that become reliable collaborators, not narrow point-solvers
- 13:00 – 15:35
Human vs synthetic data—and the missing piece: measuring ‘vibes’ and character
They discuss whether future gains come from synthetic data compounding or continued reliance on human data. Mike argues it must be a mix, and adds a less-discussed frontier: training and evaluating qualitative ‘feel’—tone, personality, and user experience—where regression testing is weak.
- •Progress needs both human ‘seed’ data and synthetic environments for exploration
- •Games illustrate controllable synthetic environments; real-world tasks are harder
- •Character/tone (‘vibes’) is hard to evaluate and easy to regress between versions
- •Better data + evals needed for soft skills, not just benchmark performance
- 15:35 – 18:02
Leaky abstractions in AI UX: model selection, memory, and prompting should disappear
Harry predicts model choice will become irrelevant; Mike agrees current UX exposes too much internal machinery. He flags three ‘leaks’ that should be abstracted away: choosing models, fragmented chat memory/context, and the skill gap between expert and novice prompting.
- •Model pickers are confusing; most users can’t rationally choose variants
- •Chat/threading lacks shared persistent memory like real coworkers have
- •Prompting should become transparent; systems should ask clarifying questions
- •Design goal: collapse the prompter/non-prompter gap across generations
- 18:02 – 20:13
Model quality vs product UX: you’re designing a scaffold around non-determinism
Mike argues model quality and product design can’t be separated anymore. Building AI products means shaping behavior through prompts, reasoning settings, tool use, and robust evaluation/regression testing—because changes can come from models, prompts, or UI decisions.
- •AI products are non-deterministic; UX includes prompts, evals, and system behavior
- •Product decisions: follow-up questions vs none; longer reasoning vs faster outputs
- •Need strong evaluation frameworks to prevent silent regressions over time
- •Hard debugging: failures may stem from model updates, prompt changes, or features
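The regression-testing point can be made concrete with a small sketch. This is not Anthropic's tooling or anything described in the episode; it only illustrates, under stated assumptions, how a team might compare a baseline and a candidate configuration on the same eval cases. Every name here (`EvalCase`, `pass_rate`, `find_regressions`, the two stand-in systems) is hypothetical, and the stand-ins are deterministic functions rather than real model calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class EvalCase:
    """One behavior we want to keep stable across versions (hypothetical)."""
    name: str
    prompt: str
    check: Callable[[str], bool]  # True when the output is acceptable


def pass_rate(system: Callable[[str], str], case: EvalCase, trials: int) -> float:
    """Run the case several times, since real systems are non-deterministic."""
    passes = sum(1 for _ in range(trials) if case.check(system(case.prompt)))
    return passes / trials


def find_regressions(
    baseline: Callable[[str], str],
    candidate: Callable[[str], str],
    cases: List[EvalCase],
    trials: int = 5,
    tolerance: float = 0.0,
) -> Dict[str, Tuple[float, float]]:
    """Return cases whose pass rate dropped by more than `tolerance`."""
    regressions = {}
    for case in cases:
        before = pass_rate(baseline, case, trials)
        after = pass_rate(candidate, case, trials)
        if after < before - tolerance:
            regressions[case.name] = (before, after)
    return regressions


# Stand-ins for "model + prompt + settings" (assumptions, not real model calls).
def baseline_system(prompt: str) -> str:
    # Asks a follow-up question when the request is ambiguous.
    if "sales" in prompt:
        return "Could you clarify which region you mean?"
    return "Done."


def candidate_system(prompt: str) -> str:
    # A prompt or model change that silently dropped the follow-up behavior.
    return "Done."


CASES = [
    EvalCase("asks_followup_when_ambiguous", "Summarize sales",
             lambda out: out.endswith("?")),
    EvalCase("answers_directly_when_clear", "Say done",
             lambda out: out == "Done."),
]

if __name__ == "__main__":
    print(find_regressions(baseline_system, candidate_system, CASES))
```

Running the same cases against both versions is what lets a failure be traced to a specific change (model update, prompt edit, or feature) instead of being discovered by users.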
- 20:13 – 28:28
Shipping and marketing in a hyper-competitive release cycle: staying nimble without breaking trust
They examine the pressure of constant launches across labs and the resulting product-marketing chaos. Mike contrasts API expectations (stability, opt-in betas) with consumer/enterprise surfaces, and describes how launch timing now feels reactive amid weekly competitive drops.
- •APIs prioritize predictability; experiments often gated behind opt-in/beta headers
- •Consumer experiences need faster iteration and less friction than opt-ins
- •Launch timing is chaotic (‘Crossy Road’); teams constantly read competitive signals
- •Internal mindset: avoid ‘we’re so back/it’s so over’ emotional whiplash
- 28:28 – 31:49
Open source and distillation: usefulness vs sustainability, security, and incentives
Harry probes whether distillation is ‘wrong’ and what open source implies about value distribution. Mike distinguishes internal distillation as a practical technique from cross-entity copying, raising national security and long-term commercialization incentives as key concerns.
- •Distillation is valuable internally to make models cheaper/faster to serve
- •Cross-nation or uncontrolled distillation raises security and policy concerns
- •Sustainable frontier progress requires viable commercialization models
- •Open source can thrive without distillation; ToS and provenance still matter
- 31:49 – 38:01
China, DeepSeek, and the breakthrough playbook: narrative, product speed, and UX novelty
They discuss underestimating China’s AI capability and what Anthropic learned from DeepSeek. Mike highlights that the surprise wasn’t frontier talent; it was the speed of productization, the geopolitical narrative, and the novelty of features like visible chain-of-thought that captured attention.
- •China’s frontier capability shouldn’t be surprising; avoid Western-centric assumptions
- •DeepSeek’s breakthrough: compelling cost/efficiency narrative matched the moment
- •Product lesson: ship ideas faster; novelty can be valuable even if imperfect
- •Chain-of-thought display may shift as distillation risks and UI patterns evolve
- 38:01 – 43:44
From model provider to application provider: what to build, and why Claude Code exists
Harry asks when a model company should build applications. Mike sets criteria: prioritize broadly generalizable products, avoid overly bespoke vertical apps, and focus on areas where first-party products accelerate learning, as illustrated by Claude Code, where internal dogfooding fed back into model improvements.
- •App-bet criteria: generalizability across users/surfaces; careful resource allocation
- •Claude Code started as internal acceleration, then shipped externally
- •Anthropic focuses on agentic loops rather than building a full IDE
- •First-party products create tighter feedback loops that improve next model versions
- 43:44 – 48:32
The future software developer: delegation, review, and automated verification loops
Mike predicts developers shift from primarily writing code to delegating work to agents and reviewing outputs. The bottleneck becomes scalable verification—security, correctness, UI testing—supported by AI-assisted analysis and multi-agent checks.
- •Skills shift: multidisciplinary product thinking + delegating effectively to agents
- •Code review changes when much code is AI-generated; idioms/patterns matter
- •Models need to learn better from codebases and review feedback
- •Future workflow: agent proposes approaches, tests in-browser, scans for vulnerabilities, escalates decisions
- 48:32 – 1:02:41
API vs consumer balance—and speeding up: org design, abstractions beyond tokens, and rebuilds
They close with how Anthropic balances API and consumer products, and how to increase iteration speed. Mike emphasizes first-party learning velocity, building higher-level API abstractions (planning, tool use, memory), and removing organizational calcification to ship faster. The quick-fire round then covers competitive comparisons, privacy/agent trust, Europe’s role, and AI for longevity.
- •First-party products teach faster; APIs provide distribution and ecosystem leverage
- •API roadmap: abstractions beyond tokens (planning, tool use, memory, knowledge graphs)
- •Speed gains: break org boundaries, form ‘right people’ squads, reduce bureaucracy
- •Quick-fire themes: OpenAI ships V1s faster; Anthropic aims for cohesive personality; key risk is privacy/discernment with agent-to-agent systems