YC Root Access

A New Approach To AI Models

During last month’s NeurIPS 2025 conference, YC’s Ankit Gupta sat down with Karan Goel, founder and CEO of Cartesia, to explain why today’s AI architectures may be fundamentally limited. They discuss why transformers behave more like retrieval systems than learning systems, how state space models enable compression and abstraction, and why multimodal intelligence may require a whole new approach. The conversation also covers why Cartesia chose AI voice as a wedge product, and how research-driven companies can balance deep technical bets with real-world product discipline.

Apply to Y Combinator: https://www.ycombinator.com/apply
Work at a startup: https://www.ycombinator.com/jobs

Chapters:
00:11 — Introducing Cartesia
00:26 — From Architecture Research to Startup
01:20 — What “Architecture Research” Really Means
02:18 — Why Transformers Hit a Ceiling
03:33 — State Space Models Explained
04:21 — Intelligence as Compression
05:47 — Retrieval vs. Abstraction
06:41 — Hybrid Architectures and the Future
07:13 — Why Cartesia Chose Voice AI
08:25 — What Multimodality Actually Means
09:20 — Audio as a Recipe for Other Modalities
10:09 — Tokens, Representations, and Learning Signals
11:37 — Learning Representations End-to-End
12:29 — Building for the “Average Human”
13:54 — Research vs. Product Reality
15:18 — One Vision, Ruthlessly Executed
16:28 — Product as a Truth Serum for Research
17:25 — Startup Gravity Applies to Research Too

Ankit Gupta (host), Karan Goel (guest)
Jan 9, 2026 · 18m

Episode Details

EPISODE INFO

Released: January 9, 2026
Duration: 18m
Channel: YC Root Access

SPEAKERS

  • Ankit Gupta (host): YC host/interviewer.

  • Karan Goel (guest): CEO of Cartesia and former Stanford (Chris Ré lab) researcher.

EPISODE SUMMARY

In this episode of YC Root Access, “A New Approach To AI Models,” Ankit Gupta and Karan Goel explore Cartesia’s bet: moving beyond transformers via compression-driven multimodal architectures for voice. Cartesia was founded by Stanford PhD researchers to commercialize “architecture research,” not just scale existing transformer recipes.
