YC Root Access

A New Approach To AI Models

During last month’s NeurIPS 2025 conference, YC’s Ankit Gupta sat down with Karan Goel, founder and CEO of Cartesia, to explain why today’s AI architectures may be fundamentally limited. They discuss why transformers behave more like retrieval systems than learning systems, how state space models enable compression and abstraction, and why multimodal intelligence may require a whole new approach. The conversation also covers why Cartesia chose AI voice as a wedge product, and how research-driven companies can balance deep technical bets with real-world product discipline.

Apply to Y Combinator: https://www.ycombinator.com/apply
Work at a startup: https://www.ycombinator.com/jobs

Chapters:
00:11 - Introducing Cartesia
00:26 - From Architecture Research to Startup
01:20 - What “Architecture Research” Really Means
02:18 - Why Transformers Hit a Ceiling
03:33 - State Space Models Explained
04:21 - Intelligence as Compression
05:47 - Retrieval vs. Abstraction
06:41 - Hybrid Architectures and the Future
07:13 - Why Cartesia Chose Voice AI
08:25 - What Multimodality Actually Means
09:20 - Audio as a Recipe for Other Modalities
10:09 - Tokens, Representations, and Learning Signals
11:37 - Learning Representations End-to-End
12:29 - Building for the “Average Human”
13:54 - Research vs. Product Reality
15:18 - One Vision, Ruthlessly Executed
16:28 - Product as a Truth Serum for Research
17:25 - Startup Gravity Applies to Research Too

Ankit Gupta (host), Karan Goel (guest)
Jan 9, 2026 · 18m

Episode Details

EPISODE INFO

Released: January 9, 2026
Duration: 18m
Channel: YC Root Access

SPEAKERS

  • Ankit Gupta (host): YC host/interviewer.

  • Karan Goel (guest): CEO of Cartesia and former Stanford (Chris Ré lab) researcher.

EPISODE SUMMARY

In this episode of YC Root Access, "A New Approach To AI Models," Ankit Gupta and Karan Goel explore Cartesia’s bet on moving beyond transformers via compression-driven multimodal architectures, with voice as the first product. Cartesia was founded by Stanford PhD researchers to commercialize “architecture research,” not just scale existing transformer recipes.
