The Twenty Minute VC
Cerebras CEO, Andrew Feldman on Why Raise $1BN and Delay the IPO & Why NVIDIA’s Worried About Growth
CHAPTERS
- 0:00 – 0:35
AI’s breakneck pace: demand uncertainty, depreciation, and power myths
Feldman opens with a high-level framing: AI compute demand is exploding so quickly that even 6–12 month forecasts are unreliable. He tees up two themes that recur throughout the episode—how to think about chip depreciation and why the US power constraint is often misunderstood.
- •AI demand is huge but highly uncertain even a year out
- •Depreciation depends on how much faster next-gen hardware becomes
- •The US has plenty of power; the mismatch is location and infrastructure
- •Market risk rises when investors underestimate it
- •Talent is expensive, but exceptional people are worth it
- 0:35 – 4:29
Why Cerebras raised $1B (and why Fidelity matters) instead of rushing the IPO
Harry presses on Cerebras’ newly announced $1B round. Feldman explains the strategic value of late-stage capital, the signaling effect of top public-market investors like Fidelity, and why a pre-IPO round can be the rational move while opportunities are abundant.
- •$1B round positioned as largest in category, high valuation, premier investor set
- •Fidelity’s leadership provides a strong Wall Street signal for eventual IPO
- •Capital enables manufacturing scale-up and new data center expansion
- •Pre-IPO round can be non-distracting if done quickly
- •Big progress requires real engineering, not superficial “bit-width” marketing gains
- 4:29 – 9:19
Compute deal hype and ‘options on the future’: reading the fine print in AI announcements
Feldman argues that many blockbuster AI infrastructure announcements are intentionally vague and spread over long time horizons. He suggests interpreting them as options—organizations paying a premium today to secure future capacity in an unknowable demand environment.
- •Large AI ‘commitments’ often include escape hatches: “up to X over Y years”
- •Little accountability exists for claimed jobs/buildouts without tracking
- •Customers themselves can’t forecast demand precisely (orders vary by multiples)
- •In uncertainty, companies buy options on future compute capacity
- •Planning becomes about adaptable rules and frequent replanning, not fixed forecasts
- 9:19 – 10:51
Is the boom sustainable—and what it implies for the economy
Harry questions sustainability; Feldman counters that while perpetual hypergrowth is mathematically impossible for any one firm, AI is likely to reorganize the economy and boost productivity. The real question is not whether growth slows, but how the economy transforms around AI.
- •Skepticism is common, but betting only on incumbents offers little upside
- •NVIDIA can’t compound at current rates indefinitely, but AI impact can still be massive
- •AI may expand the economic pie via productivity gains
- •The pace of new ideas and valuations reflects extraordinary momentum
- •Expect a different economic structure within years if AI diffusion continues
- 10:51 – 15:37
Why Feldman thinks NVIDIA is ‘using the balance sheet’ and pre-announcing the future
Feldman claims some large-company behaviors signal anxiety about sustaining growth: buying demand via investments and aggressively pre-announcing new products. He also critiques marketing narratives and highlights operational realities like failure rates and delivery timing.
- •Big firms may ‘buy business’ (strategic investments) rather than win purely on tech
- •Predatory pre-announcements can freeze buyers (B300/Rubin before B200 availability)
- •Opaque mega-deals can be structured to resist clean analysis
- •Operational issues (e.g., field failure rates) may be downplayed in messaging
- •Net effect: shape customer expectations and delay switching decisions
- 15:37 – 18:00
Chip depreciation: the real question is generational speed-up (not a fixed 18–24 month rule)
Harry asks how to amortize chips; Feldman says simplistic timelines fail because older chips still produce value. He reframes depreciation as an economic decision driven by how dramatically a new generation improves performance-per-watt and system-level throughput.
- •H100s and even A100s can remain valuable beyond 2 years depending on use case
- •Depreciation triggers when new hardware makes old hardware uneconomic to run
- •Key driver: performance gains and power efficiency at the solution level
- •If generational improvements slow, hardware lifetimes extend
- •Data center economics depend on dollars-per-megawatt, not just chip MSRP
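The retirement rule in this chapter can be sketched as a cost comparison. This is a minimal illustration, not Feldman's own model: the `Accelerator` fields, the power price, and every number are hypothetical, and it ignores the opportunity cost of scarce megawatts that the dollars-per-megawatt point raises.

```python
from dataclasses import dataclass


@dataclass
class Accelerator:
    throughput: float    # useful work per hour, arbitrary units (hypothetical)
    power_kw: float      # power draw in kilowatts (hypothetical)
    hourly_capex: float  # purchase cost amortized per hour; 0.0 once paid off


def cost_per_unit(chip: Accelerator, power_price_per_kwh: float) -> float:
    """All-in hourly cost (amortized capex plus power) divided by hourly output."""
    hourly_cost = chip.hourly_capex + chip.power_kw * power_price_per_kwh
    return hourly_cost / chip.throughput


def should_retire(old: Accelerator, new: Accelerator, power_price_per_kwh: float) -> bool:
    """True when buying and running the new chip is cheaper per unit of work."""
    return cost_per_unit(new, power_price_per_kwh) < cost_per_unit(old, power_price_per_kwh)


POWER_PRICE = 0.10  # hypothetical dollars per kWh

# The old chip is already paid for, so only its power bill counts.
old_chip = Accelerator(throughput=1.0, power_kw=1.0, hourly_capex=0.0)
modest_upgrade = Accelerator(throughput=2.0, power_kw=1.0, hourly_capex=1.0)
large_upgrade = Accelerator(throughput=20.0, power_kw=1.0, hourly_capex=1.0)

print(should_retire(old_chip, modest_upgrade, POWER_PRICE))  # False
print(should_retire(old_chip, large_upgrade, POWER_PRICE))   # True
```

Under these assumed numbers, a 2x generational gain leaves the paid-off chip worth running, while a 20x gain does not. That matches the chapter's point that slower generational improvement extends hardware lifetimes.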
- 18:00 – 19:43
Performance reality vs marketing: memory bandwidth and system bottlenecks dominate inference
Feldman argues that headline FLOPS are misleading when memory and data movement constrain real throughput. He claims meaningful gen-over-gen gains are closer to 2–2.5x when comparing apples-to-apples, and that inference on GPU architectures is often memory-limited.
- •Marketing claims can overstate progress; benchmarks need apples-to-apples comparisons
- •System performance matters more than raw chip FLOPS
- •Memory bandwidth can become the binding constraint as compute scales
- •For inference, GPU architecture is fundamentally limited by memory behavior
- •Bottlenecks shift: speeding one component just moves the constraint elsewhere
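The bottleneck argument above can be sketched with a simple roofline-style estimate, in which a step takes as long as the slower of its compute time and its memory-transfer time. This is a generic simplification, not a model from the episode, and all quantities are hypothetical and in arbitrary units.

```python
def step_time(work_flops: float, bytes_moved: float,
              peak_flops: float, mem_bandwidth: float) -> float:
    """Time for one step, assuming compute and memory traffic overlap fully,
    so the slower of the two sets the pace."""
    compute_time = work_flops / peak_flops
    memory_time = bytes_moved / mem_bandwidth
    return max(compute_time, memory_time)


# Hypothetical workload: 100 units of compute, 100 units of data to move.
baseline = step_time(100.0, 100.0, peak_flops=100.0, mem_bandwidth=10.0)
more_flops = step_time(100.0, 100.0, peak_flops=200.0, mem_bandwidth=10.0)
more_bandwidth = step_time(100.0, 100.0, peak_flops=100.0, mem_bandwidth=20.0)

print(baseline)        # 10.0 (memory-bound)
print(more_flops)      # 10.0 (doubling FLOPS changes nothing)
print(more_bandwidth)  # 5.0  (doubling bandwidth halves the step time)
```

In this toy case, raising bandwidth tenfold would make compute and memory times equal, at which point further bandwidth gains stop helping and compute becomes the constraint.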
- 19:43 – 23:59
Why Cerebras went wafer-scale: SRAM speed, capacity limits, and avoiding ‘thousands of chips’ sprawl
A technical segment on SRAM vs HBM/DRAM tradeoffs leads to Cerebras’ wafer-scale thesis: use vastly larger silicon area to pack in fast SRAM at high capacity. Feldman argues this reduces the complexity of massive multi-chip clusters needed for trillion-parameter models.
- •SRAM is extremely fast but traditionally low capacity; HBM/DRAM is higher capacity but slower
- •GPUs chose memory optimized for graphics-era assumptions
- •Normal chips with SRAM require thousands of devices for huge models, creating networking/cabling pain
- •Wafer-scale increases die area to ‘stuff’ large SRAM capacity on-chip
- •Fewer systems simplifies features like speculative decoding and improves operational manageability
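The "thousands of devices" point is capacity arithmetic: divide the bytes needed for model weights by the on-chip SRAM per device. The sketch below assumes 16-bit weights and uses made-up SRAM capacities. They are not specifications of any real chip, and the estimate ignores activations, caches, and redundancy.

```python
def devices_needed(num_params: int, bytes_per_param: int, sram_bytes_per_device: int) -> int:
    """Minimum device count to hold all weights in on-chip SRAM (ceiling division)."""
    model_bytes = num_params * bytes_per_param
    return -(-model_bytes // sram_bytes_per_device)


TRILLION_PARAMS = 10**12
BYTES_PER_PARAM = 2                        # assumes 16-bit weights
SMALL_CHIP_SRAM = 500 * 10**6              # hypothetical: 500 MB of SRAM per conventional chip
LARGE_AREA_SRAM = 100 * SMALL_CHIP_SRAM    # hypothetical: 100x more SRAM per device

print(devices_needed(TRILLION_PARAMS, BYTES_PER_PARAM, SMALL_CHIP_SRAM))  # 4000
print(devices_needed(TRILLION_PARAMS, BYTES_PER_PARAM, LARGE_AREA_SRAM))  # 40
```

Under these assumptions, raising per-device SRAM capacity cuts the device count proportionally, and with it the networking and cabling between devices.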
- 23:59 – 26:16
Training vs inference: speed claims, software friction, and why inference switches faster
Harry proposes a market split (GPUs for training, Cerebras for inference); Feldman disputes this on performance but acknowledges training has heavier software migration costs. He explains why inference adoption is easier: customers want an API rather than rebuilding training recipes tied to GPUs.
- •Feldman claims Cerebras is faster on both training and inference, but concedes that moving training workloads takes real effort
- •Training recipes are typically GPU-first; porting to new hardware is nontrivial
- •Inference is easier to switch because users care about an API, not CUDA/PyTorch details
- •Inference demand is broader than training demand, accelerating market growth
- •Demonstrating training superiority requires long, costly, apples-to-apples cluster runs
- 26:16 – 30:49
Inference demand goes exponential: three multiplying drivers and a ‘reorganization around AI’ thesis
Feldman breaks inference growth into three factors—user count, usage frequency, and compute per use—arguing all are rising quickly. He connects this to a broader historical analogy: productivity gains arrive when societies reorganize around a new general-purpose technology, not when they merely substitute old workflows.
- •Inference market size = users × frequency × compute per interaction
- •All three variables are growing rapidly, creating compounding effects
- •ChatGPT as ‘Google replacement’ yields modest gains; ‘operating system’ usage yields larger gains
- •Historical parallels: electricity and computers produced big productivity jumps only after reorganization
- •Younger cohorts adopt AI in fundamentally new ways, accelerating structural change
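Because the three drivers multiply, their growth rates compound. A minimal sketch of that arithmetic follows, with entirely hypothetical numbers that are not figures from the episode.

```python
def inference_demand(users: float, uses_per_user: float, compute_per_use: float) -> float:
    """Total inference compute as the product of the three drivers."""
    return users * uses_per_user * compute_per_use


def compounded_growth(user_growth: float, frequency_growth: float, compute_growth: float) -> float:
    """Overall growth multiple when each driver grows by its own multiple."""
    return user_growth * frequency_growth * compute_growth


# Hypothetical before/after values: each driver doubles.
before = inference_demand(users=1_000_000, uses_per_user=10, compute_per_use=2.0)
after = inference_demand(users=2_000_000, uses_per_user=20, compute_per_use=4.0)

print(after / before)                      # 8.0
print(compounded_growth(2.0, 2.0, 2.0))    # 8.0
```

If each driver merely doubles, total demand grows eightfold, which is the compounding effect the chapter describes.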
- 30:49 – 36:43
Energy for AI: feasible, but the real issue is siting, grids, fiber—and social value delivered per megawatt
Harry challenges feasibility given trillion-dollar buildouts and nation-scale power needs. Feldman says it’s feasible, but emphasizes the problem is geographic mismatch (power vs people/fiber) and argues the industry has an obligation to produce commensurate societal value if it consumes massive energy.
- •US power scarcity is often misframed; the issue is power location and transmission
- •Data centers need proximity to people, fiber, and facilities—not just generation
- •Feasibility differs from desirability; societal benefit must justify consumption
- •AI should deliver major outcomes (healthcare, drugs, aging support) to earn legitimacy
- •Controlling low-value uses of compute is difficult and unresolved
- 36:43 – 39:27
Mag 7 concentration and market risk: the danger is investors misreading ‘diversification’
Feldman addresses concerns about equity-market concentration in the largest tech firms. He argues the core risk isn’t that these companies are valuable, but that investors may underestimate exposure by assuming index investing is broadly diversified when it is increasingly sector-concentrated.
- •Concentration itself isn’t automatically bad if the future economy rewards it
- •Risk emerges when people underestimate risk and misprice it
- •The S&P can be mistaken for a proxy of the broader economy when it’s highly concentrated
- •Index holders may be inadvertently taking heavy sector bets
- •NVIDIA’s valuation may be hard to pin down, but its achievements justify a very large number
- 39:27 – 47:08
Bottlenecks: talent, fabs, and data center capacity—plus why many data center investors will lose money
The conversation turns to constraints on meeting demand: shortages of AI practitioners, fab capacity limits at TSMC/Samsung, and lagging data center buildouts. Feldman also explains why the ‘data centers are like bonds’ narrative can mislead and how cost, permitting, and tenant risk can destroy returns.
- •Top bottleneck: AI expertise (practitioners, data pipeline talent) and immigration friction
- •TSMC/Samsung can’t build $30–$50B fabs fast enough, constraining chip supply and keeping costs high
- •Data center capacity is behind announcements; gigawatt projects take 6–18+ months
- •Wall Street likes data centers because cash flows resemble real estate/bonds
- •Many projects fail by overpaying per MW, mismanaging permitting/construction, or failing to secure strong tenants
- 47:08 – 52:56
Vertical integration and custom chips: why software companies often fail at silicon
Harry asks whether OpenAI/Anthropic should build chips to control destiny. Feldman argues chip building is culturally and operationally incompatible with software cadence, noting that successes often come via acquisitions and long time horizons, and that many giants historically missed key markets despite resources.
- •OpenAI/Anthropic have succeeded without full vertical integration so far
- •Long history of software companies failing to build chips internally
- •Chip development demands ‘measure twice, cut once’—bugs cost months and tens of millions
- •Successful hyperscaler chip efforts often start with acquisitions (Apple, Amazon)
- •Even industry leaders have missed major compute shifts, suggesting deep, non-obvious execution risk
- 52:56 – 1:01:37
Margins, sovereignty narratives, and geopolitics: competing with NVIDIA while navigating China and policy
Feldman ties Cerebras’ fundraising strength to positive margins and contrasts it with competitors. He discusses NVIDIA’s extraordinary pricing power as the driver for custom silicon, touches on sovereignty positioning (e.g., Mistral), and gives a nuanced view on China—preferring peaceful engagement but advocating US improvements in infrastructure, permitting, and university compute access.
- •Strong margins matter for late-stage credibility and public-market readiness
- •NVIDIA’s gross margins are described as historically extreme, motivating customers to seek alternatives
- •Sovereignty can be a viable go-to-market advantage when paired with superior performance
- •On China: arms-race framing is harmful, but China is aggressive and capable (drones/robots/AI policy)
- •US needs faster infrastructure/permitting, sustained talent inflow, and more university compute
- 1:01:37 – 1:11:54
Quick-fire: Middle East peace, UAE revenue concentration, wafer-scale ‘almost failed,’ and where investors get burned
In rapid Q&A, Feldman predicts Middle East peace and reveals heavy revenue concentration in the UAE driven by massive early orders that consumed Cerebras’ capacity. He recounts the precarious 2017–2019 period when wafer-scale repeatedly failed before finally working, and he warns investors about backing inexperienced chip CEOs while urging more investment in unsexy data plumbing.
- •Belief: moderation and economic incentives can drive peace in the Middle East
- •Cerebras revenue was heavily UAE-weighted (per S-1 timeframe), driven by outsized orders
- •Wafer-scale success followed ~15 months of failures and a high burn rate
- •Investors may ‘lose their shirt’ backing very young CEOs in silicon due to relationship/execution complexity
- •Underinvestment: data cleaning, pipelines, and tokenization—projects often fail outside the model itself
- 1:11:54 – 1:20:12
Closing reflections: AI’s near-term impact, education disruption, and Feldman’s ‘David vs Goliath’ motivation
Feldman pushes back on claims of near-term labor-market upheaval, arguing diffusion and dislocation take time, citing AlphaFold’s lag from breakthrough to tangible outcomes. He predicts major changes in education and entry-level work, then closes with a personal philosophy: he’s motivated by competing daily against dominant incumbents and winning through invention and execution.
- •AI diffusion will ‘nibble in’ rather than instantly transform labor markets in 3–5 years
- •AlphaFold illustrates time lag between breakthrough and downstream societal outcomes
- •AI can personalize education by diagnosing error patterns and adapting instruction at scale
- •Entry-level work (spreadsheets, summaries) will be reshaped; better early-career learning is possible
- •Feldman frames his career as choosing hard fights against incumbents (Cisco, now NVIDIA)