Nebius Co-Founder on AI Infrastructure Bubbles | How Price Elastic is Demand for Compute

Roman Chernin is Co-Founder and Chief Business Officer of Nebius, one of the fastest-growing AI infrastructure companies in the world. Nebius operates some of the largest AI compute clusters globally and serves leading AI labs, enterprises, and developers. Today, Nebius has a market cap of $57BN.

Timestamps:
00:00 Intro
01:24 Why AI Infrastructure Is Not a Bubble
04:11 The Real Impact of Open Source on OpenAI & Anthropic
11:03 Jevons Paradox: Why Cheaper AI Creates More Demand
13:06 The Four Layers of AI Infrastructure Explained
18:49 If Nebius Had 10x More Capacity Tomorrow
28:51 The Shift from Training to Inference and Agents
37:18 How Token Factory Cuts AI Costs by 70%
50:34 Sovereign AI, Europe, and the Future of Model Building
53:52 Competing Against Hyperscalers with 10x More Capital
01:08:46 The Biggest Threat to Nebius Isn't Competition

Subscribe on Spotify: https://open.spotify.com/show/3j2KMcZTtgTNBKwtZBMHvl?si=85bc9196860e4466
Subscribe on Apple Podcasts: https://podcasts.apple.com/us/podcast/the-twenty-minute-vc-20vc-venture-capital-startup/id958230465
Follow Harry Stebbings on X: https://twitter.com/HarryStebbings
Follow Roman Chernin on X: https://twitter.com/romanchernin
Follow 20VC on Instagram: https://www.instagram.com/20vchq
Follow 20VC on TikTok: https://www.tiktok.com/@20vc_tok
Visit our Website: https://www.20vc.com
Subscribe to our Newsletter: https://www.thetwentyminutevc.com/contact

#20vc #harrystebbings #nebius #ai #founder #aimodels #gpu

Roman Chernin (guest), Harry Stebbings (host)

Jun 8, 2026 · 1h 14m

CHAPTERS

0:00 – 4:12
Why AI infrastructure spend isn’t a bubble: adoption is still in “first percent” mode
Roman argues the current CapEx surge isn’t a bubble because real enterprise AI adoption is still nascent, with only a handful of use cases (like coding) recently proving they work at scale. He frames the moment as the start of “useful AI,” where most companies are only beginning to apply AI across workflows.
- Defines a “bubble” as a mismatch between long-term need and short-term hype, and claims demand will be tens to hundreds of times larger than today
- Coding as an early scalable use case; most others are still unproven or early
- Enterprise AI adoption is minimal across most large companies
- Belief that practical enterprise rollout alone implies massive future compute needs
4:12 – 9:17
Open source vs frontier models: specialization, tunability, and why it’s not a zero-sum game
The conversation shifts to whether open-source models will erode frontier model providers and infrastructure demand. Roman explains that teams often start with frontier APIs, then move to open-source/specialized models once they understand the use case and need better economics or control.
- Common pattern: prototype on OpenAI/Anthropic, then specialize with open models at scale
- Key advantage is trainability/tunability (post-training) to outperform general models for specific tasks
- Frontier providers keep pushing into harder problems; open source fills downstream niches
- Belief in a growing “total pie” with room for both frontier and specialized models
9:17 – 11:15
Jevons paradox in compute: cheaper intelligence drives more consumption (DeepSeek anecdote)
Roman uses the DeepSeek episode to illustrate that efficiency improvements can increase overall demand. Even as markets worried about lower compute needs, Nebius saw one of its best sales weeks because new economics unlocked production inference for more teams.
- DeepSeek-driven fear caused stock volatility while sales surged
- Cheaper inference makes previously uneconomic use cases viable at scale
- Efficiency gains let teams attempt more complex tasks within the same budget
- Compute demand can rise even when per-token or per-task cost drops
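The Jevons argument in this chapter can be made concrete with a constant-elasticity demand curve. The sketch below is illustrative only: the baseline price, the token volume, the elasticity values, and the function names `tokens_demanded` and `total_spend` are all assumptions, not figures from the conversation. It shows that when the price of tokens halves, total spend on compute rises only if demand is elastic (elasticity above 1).

```python
def tokens_demanded(price, base_price, base_tokens, elasticity):
    """Constant-elasticity demand curve: Q = Q0 * (P / P0) ** (-elasticity)."""
    if price <= 0 or base_price <= 0:
        raise ValueError("prices must be positive")
    return base_tokens * (price / base_price) ** (-elasticity)


def total_spend(price, base_price, base_tokens, elasticity):
    """Total spend at a given price: P * Q(P)."""
    return price * tokens_demanded(price, base_price, base_tokens, elasticity)


# Hypothetical baseline: $10 per million tokens, 100 million tokens consumed.
BASE_PRICE = 10.0
BASE_TOKENS_M = 100.0
NEW_PRICE = 5.0  # assumed: the price per million tokens halves

if __name__ == "__main__":
    baseline = total_spend(BASE_PRICE, BASE_PRICE, BASE_TOKENS_M, 1.0)
    for elasticity in (0.5, 1.0, 1.5):
        spend = total_spend(NEW_PRICE, BASE_PRICE, BASE_TOKENS_M, elasticity)
        direction = "rises" if spend > baseline else "does not rise"
        print(f"elasticity={elasticity}: spend {direction} ({spend:.1f} vs {baseline:.1f})")
```

With elasticity below 1, cheaper tokens shrink total spend; the DeepSeek anecdote is a claim that real-world compute demand sits on the elastic side of that line.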
11:15 – 18:58
Nebius’s four-layer infrastructure stack: from megawatts to agents
Roman outlines Nebius’s product strategy as a layered stack evolving with customer needs: bare metal capacity, multi-tenant cloud, managed inference, and a potential future agentic optimization layer. Each layer expands the addressable customer base and changes the unit of value from megawatts to tokens to outcomes.
- Layer 1: bare metal at scale (customers talk in megawatts)
- Layer 2: multi-tenant cloud/IaaS (sold in effective GPU hours)
- Layer 3: managed inference (sold in tokens; customers avoid cluster/tooling complexity)
- Layer 4 (emerging): agentic workload optimization (optimize for outcome, reliability, and budget)
18:58 – 24:05
If Nebius had 10× capacity: demand exists, but portfolio and concentration matter
With more capacity, Roman says Nebius could sell significantly more, though not instantly; the core challenge becomes shaping a diversified demand portfolio across layers. They discuss managing customer concentration risk when serving hyperscalers and large labs versus broader inference customers.
- 10× capacity would be absorbed over time; key is balancing customer mix
- Higher layers serve more customers: dozens (bare metal) → hundreds (cloud) → thousands (inference) → tens of thousands (agents)
- Revenue concentration with mega customers is a strategic risk
- Building full stack helps avoid being “just” a capacity provider
24:05 – 27:46
Pricing and elasticity: raising prices vs preserving customer unit economics (especially inference)
Harry probes whether constrained supply enables aggressive pricing; Roman notes Nebius raised prices recently yet still sees demand pressure. He emphasizes that inference economics are sensitive because inference is a recurring cost of goods sold for customers, and total cost depends heavily on system-level optimization—not just sticker GPU price.
- Nebius raised pricing (~30%) and still sees pipeline pressure
- Training is more one-off; inference is ongoing COGS where elasticity shows up faster
- Nominal GPU price is less meaningful than effective TCO and reliability
- Optimizations can change token economics by multiples, outweighing small price deltas
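The point that optimization can outweigh sticker price can be checked with simple arithmetic. The sketch below is a rough model, not Nebius's pricing: only the roughly 30% price increase comes from the summary, while the GPU hourly price, throughput, utilization, and the helper name `cost_per_million_tokens` are hypothetical.

```python
def cost_per_million_tokens(gpu_hour_price, tokens_per_second, utilization):
    """Effective serving cost per million tokens for one GPU.

    gpu_hour_price    -- price paid per GPU hour (currency units)
    tokens_per_second -- sustained throughput while the GPU is busy
    utilization       -- fraction of each hour spent doing useful work, in (0, 1]
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    if gpu_hour_price <= 0 or tokens_per_second <= 0:
        raise ValueError("price and throughput must be positive")
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_hour_price / tokens_per_hour * 1_000_000


# Hypothetical baseline deployment.
baseline = cost_per_million_tokens(2.0, tokens_per_second=1000, utilization=0.5)

# Same deployment after a 30% increase in the GPU hourly price.
raised = cost_per_million_tokens(2.0 * 1.3, tokens_per_second=1000, utilization=0.5)

# Higher price, but an assumed 2x throughput gain from serving optimizations.
optimized = cost_per_million_tokens(2.0 * 1.3, tokens_per_second=2000, utilization=0.5)

if __name__ == "__main__":
    print(f"baseline:  {baseline:.3f} per 1M tokens")
    print(f"raised:    {raised:.3f} per 1M tokens")
    print(f"optimized: {optimized:.3f} per 1M tokens")
```

Under these assumed numbers, the optimized deployment is cheaper per token than the baseline even at the higher hourly price, which is the sense in which nominal GPU price matters less than effective cost.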
27:46 – 31:44
Product roadmap in the inference shift: new requirements beyond ‘training to inference’
Roman argues the training→inference shift is often discussed too abstractly; it brings concrete new requirements: inference platforms, fine-tuning workflows, data collection, and continuous improvement flywheels. A major theme is lowering the barrier so non-researchers can build AI products without mastering infrastructure and inference engineering.
- Inference shift requires orchestration, monitoring, and production-grade tooling
- Customers increasingly fine-tune open models and need end-to-end support
- Data flywheel: run inference → generate/observe data → improve model/app → repeat
- Platforms win by hiding infrastructure/AI complexity for product builders
31:44 – 34:25
Differentiation vs other ‘neo clouds’: full-stack down + full-stack up (and enterprise readiness)
Asked to compare with peers like CoreWeave, Roman frames Nebius’s differentiation as vertical integration both downward (data centers, racks, servers, platform control) and upward (serving higher-layer needs of enterprises and product companies). He positions enterprise adoption as the next major wave requiring deeper platform and integration capabilities.
- ‘Full stack down’: deep physical buildout enables speed and cost efficiency
- ‘Full stack up’: products that match evolving customer segments and workloads
- Goal: less customer concentration and broader reach than bare-metal-only plays
- Enterprise needs: integration, legacy constraints, security, and platform tools
34:25 – 37:18
Managed inference explained: why Token Factory exists (and what it replaces)
Roman explains managed inference through a practical journey: teams build on OpenAI, then hit margin/control limits, try open-source weights, and discover production deployment is hard. Token Factory aims to provide the missing production-grade layer—reliability, orchestration, scaling, and optimized cost—while supporting both vanilla and fine-tuned models.
- Common failure mode: open-source models look cheap until production complexity appears
- Token Factory abstracts orchestration, scaling, caching, observability, and reliability
- Supports running many open models and deploying customer-tuned weights
- Designed for product companies that don’t want to manage inference stacks
37:18 – 41:36
How token costs drop: distillation, speculative decoding, caching—and staying current as models change fast
Roman breaks down the concrete levers behind ‘70% cheaper tokens’: model/system optimization techniques and better serving stacks. He also highlights the operational burden of frequent model releases, arguing platforms add value by continuously integrating, benchmarking, and enabling fast switching for specific use cases.
- Cost reduction methods: distillation, speculative decoding, caching, and system tuning
- Serving is a system problem; token economics depend on end-to-end implementation
- Model landscape changes weekly; teams need rapid evaluation and migration paths
- Specialized/niche models across modalities increase the need for optimized inference
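To give a feel for why speculative decoding is one of the cost levers listed above, the sketch below uses a commonly cited simplified model: a small draft model proposes `draft_len` tokens, and each is accepted by the large target model independently with probability `acceptance_rate`. Under that assumption, the expected number of tokens produced per target-model pass is (1 - a^(k+1)) / (1 - a). The function name and the example values are hypothetical, and the model ignores the draft model's own cost, so it describes an upper bound on the benefit rather than any specific platform's implementation.

```python
def expected_tokens_per_target_pass(acceptance_rate, draft_len):
    """Expected tokens emitted per target-model forward pass.

    Assumes each drafted token is accepted independently with probability
    `acceptance_rate`; the target model always contributes one token itself,
    so the result is at least 1.
    """
    if not 0.0 <= acceptance_rate < 1.0:
        raise ValueError("acceptance_rate must be in [0, 1)")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    return (1 - acceptance_rate ** (draft_len + 1)) / (1 - acceptance_rate)


if __name__ == "__main__":
    # Hypothetical acceptance rates for a draft model proposing 4 tokens.
    for rate in (0.0, 0.5, 0.8):
        tokens = expected_tokens_per_target_pass(rate, draft_len=4)
        print(f"acceptance_rate={rate}: {tokens:.4f} tokens per target pass")
```

A draft model that is rarely accepted brings the figure back toward one token per pass, which is why the summary's framing of serving as a system problem matters: the gain depends on how well the draft model matches the workload.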
41:36 – 50:34
Enterprise AI cold start: Revolut example, eval foundations, and the path to exponential usage
Roman describes how enterprises often start with closed models, then move to open models when economics and control matter—but progress depends on building evaluation and deployment foundations. Once enterprises establish metrics, evals, and CI/CD for AI, their usage can ramp rapidly, resembling AI-native growth curves.
- Revolut began heavily on OpenAI; economics limited scaling in some use cases
- Critical enterprise investments: eval frameworks, metrics, and safe deployment processes
- AI CI/CD and experimentation engines enable faster model iteration without quality regressions
- After the ‘foundation’ phase, AI consumption can grow exponentially
50:34 – 53:52
Sovereign AI in Europe: focus less on megawatts, more on builders and demand flywheels
On Europe’s relative lack of leading model labs, Roman supports the importance of regional capabilities but argues the debate overemphasizes power and data centers. He believes infrastructure follows demand; therefore, Europe should prioritize fostering builders—research labs and product companies—to create the flywheel that justifies and sustains compute buildout.
- World fragmentation increases interest in regional/sovereign capability
- Sovereign AI discussions are overly centered on megawatts vs builder ecosystems
- Infrastructure buildout is pulled by demand created by strong local companies
- Need more European ‘builders’ (labs and product startups) to generate durable demand
53:52 – 1:08:46
Power dynamics with NVIDIA, capital constraints, and real-world friction in building data centers
Roman frames the NVIDIA relationship as engineering-led: earn respect through execution and strong technical collaboration. He also details the practical constraints of scale—capital matters over 12–24 months, but near-term delivery is limited by physical timelines, permitting, and phased project portfolios.
- NVIDIA partnerships improve when engineers respect and collaborate deeply
- Capital can’t fix the next 6 months; it meaningfully accelerates 12–24 month horizons
- Nebius manages a portfolio: secure land/power → build → then fill with GPUs
- Execution bottlenecks include regulation, supply chain, permitting, and community engagement
1:08:46 – 1:14:30
The biggest threat isn’t competition: consolidation, plus a closing reflection on staying relevant
Roman concludes that the primary strategic risk is over-consolidation of AI into a few empires, which would reduce the need for diversified infrastructure platforms. He reiterates a relentless execution mindset—AI infrastructure is a ‘post-sale’ business where each deal and investor vote is a credit that must be earned through delivery.
- Biggest threat: industry consolidation into a handful of dominant ‘super companies’
- Nebius benefits from a diversified ecosystem of builders and customers
- Cloud/infrastructure success depends on post-sale delivery and customer satisfaction
- Cultural emphasis on constant execution: ‘keep moving’ to remain relevant
