
No Priors Ep. 67 | With Voyage AI Co-Founder and CEO
Sarah Guo (host), Tengyu Ma (guest)
Voyage CEO Explains Why RAG Beats Long Context For Enterprise AI
Stanford professor and Voyage AI co-founder Tengyu Ma discusses his research journey from matrix completion and early sentence embeddings to modern contrastive learning, LLM optimizers, and domain-specific retrieval systems.
He explains Retrieval-Augmented Generation (RAG), why retrieval quality is now the main bottleneck for enterprise AI, and argues that RAG will remain more cost-efficient and practical than ultra-long-context models or pure agent-chaining approaches.
Ma outlines Voyage’s focus on high-quality embeddings and re-rankers, domain- and company-specific fine-tuning, and reducing latency through compact, efficient models and optimizers like Sophia.
He closes with reflections on founding a startup as an academic and what academia’s role should be in longer-horizon AI research on efficiency and reasoning, rather than competing directly on scale with industry labs.
Key Takeaways
RAG will likely remain more cost-efficient than long-context LLMs for enterprise data.
Ma argues that stuffing entire corporate knowledge bases (often 100M+ tokens) into context is orders of magnitude more expensive than selective retrieval, and even with cheaper compute, RAG’s neural components will also get cheaper, preserving the cost advantage.
Retrieval quality, not wiring up RAG, is now the main bottleneck.
Connecting an LLM, vector database, and basic RAG pipeline is easy; the hard part is ensuring the retrieved documents are highly relevant so the LLM can answer accurately and reduce hallucinations.
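To see why the wiring itself is easy, here is a minimal sketch of the whole pipeline. The `embed` and `answer_with_llm` functions are toy stand-ins (a deterministic bag-of-words hash and a placeholder string), not real model calls; the retrieval-quality problem Ma describes lives entirely inside how good `embed` is:

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy deterministic embedding: bucket words into a fixed-size vector.
    A real system would call an embedding model here."""
    v = np.zeros(dim)
    for word in text.lower().split():
        v[sum(ord(c) for c in word) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Cosine-similarity top-k over a tiny in-memory 'vector DB'."""
    doc_vecs = np.stack([embed(d) for d in docs])
    scores = doc_vecs @ embed(query)          # unit vectors, so dot = cosine
    top = np.argsort(scores)[::-1][:k]
    return [docs[i] for i in top]

def answer_with_llm(query: str, context: list[str]) -> str:
    """Placeholder for the generation step; a real system prompts an LLM
    with the retrieved context."""
    return f"Answer to {query!r} grounded in {len(context)} retrieved docs."

docs = ["Voyage trains embedding models.",
        "RAG retrieves relevant chunks.",
        "Long context is expensive."]
print(answer_with_llm("What does RAG do?", retrieve("What does RAG do?", docs)))
```

The scaffolding is a few dozen lines; everything hard is hidden in whether `retrieve` actually surfaces the documents the answer needs.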
High-quality, domain-specific embeddings materially improve retrieval performance.
Voyage pre-trains general embeddings, then fine-tunes on massive domain corpora (e.g., …)
Efficiency constraints force embedding models to be specialized and compact.
Because production systems have tight latency budgets (often 50–200 ms), embedding models can’t be arbitrarily large; Ma emphasizes using limited parameters and lower-dimensional embeddings to maximize domain performance and keep vector search fast.
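Illustrative numbers (assumed for this sketch, not quoted in the episode) show why dimension matters: both index memory and brute-force search work scale linearly with embedding dimension, so a 4x smaller embedding is a 4x cheaper index and query:

```python
# Index size and per-query work as a function of embedding dimension.
# n_docs and dims are assumed illustrative values.
n_docs = 10_000_000          # vectors in the index
bytes_per_float = 4          # float32

for dim in (1024, 256):
    index_bytes = n_docs * dim * bytes_per_float
    print(f"dim={dim:4d}: ~{index_bytes / 1e9:.0f} GB index, "
          f"{n_docs * dim:,} multiply-adds per exhaustive query")
```

Approximate-nearest-neighbor indexes avoid the exhaustive scan, but their memory and latency still grow with dimension, which is why Ma pushes compact, specialized models rather than ever-larger general ones.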
Agent-chaining is complementary to RAG, not a replacement.
Ma frames agent systems as multi-step pipelines where both LLMs and embeddings participate; even sophisticated agents will still rely on embedding-based retrieval for efficiency rather than being managed solely by large LLMs.
RAG stacks will simplify as neural components become more capable.
Ma predicts future systems will mostly comprise a strong LLM, an embedding model, a reranker, and a vector DB, with far less need for manual chunking, format normalization, or multimodal hacks as models natively handle long context and various data types.
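The simplified stack Ma describes is a two-stage retrieve-then-rerank pattern. The sketch below uses toy scorers as stand-ins: a real system would use an embedding model for the fast first stage and a cross-encoder reranker for the slower, more accurate second stage:

```python
# Two-stage search: cheap candidate retrieval, then reranking.
# Both scorers are toy stand-ins for real neural components.

def cheap_score(query: str, doc: str) -> float:
    """Stage 1 stand-in: fast lexical overlap (Jaccard), a proxy for
    embedding similarity over the whole corpus."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / (len(q | d) or 1)

def rerank_score(query: str, doc: str) -> float:
    """Stage 2 stand-in: a slower, 'more accurate' scorer applied only
    to the shortlist (here: overlap adjusted by length mismatch)."""
    gap = abs(len(doc.split()) - len(query.split()))
    return cheap_score(query, doc) * (1 + 1 / (1 + gap))

def search(query: str, docs: list[str], k1: int = 50, k2: int = 5) -> list[str]:
    # Stage 1: cheap scoring over everything, keep top-k1 candidates.
    candidates = sorted(docs, key=lambda d: cheap_score(query, d), reverse=True)[:k1]
    # Stage 2: expensive scoring only over the shortlist, keep top-k2.
    return sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)[:k2]

docs = ["RAG retrieves relevant chunks.",
        "Long context is expensive.",
        "Voyage trains embedding models."]
print(search("Which chunks does RAG retrieve?", docs, k1=3, k2=2))
```

The division of labor is the point: the expensive scorer only ever sees k1 candidates, so the stack stays fast even as the reranker gets heavier.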
Academia’s comparative advantage is long-horizon work on efficiency and reasoning.
Given capital constraints, Ma believes universities should pursue 3–5 year breakthroughs (e.g., …)
Notable Quotes
“The bottleneck seems to be the quality of the response, and the quality of the response is almost bottlenecked by the quality of the retrieval part.”
— Tengyu Ma
“Why would you go through the entire library every time to answer a single question?”
— Tengyu Ma
“My prediction is that RAG will be much cheaper than long context going forward.”
— Tengyu Ma
“You only have a limited number of parameters… there’s no way that you can use these to excel in everything, so that’s why you have to specialize in one domain.”
— Tengyu Ma
“My vision is that in the future, AI will just be a very simple software engineering layer on top of a few very strong neural network components.”
— Tengyu Ma
Questions Answered in This Episode
Under what conditions, if any, could long-context LLMs become economically competitive with RAG for large proprietary corpora?
How should an enterprise rigorously measure whether its current bottleneck is retrieval quality, LLM quality, or prompt design?
What are the trade-offs between domain-specific embeddings vs. company-specific fine-tuning when data sensitivity and privacy are concerns?
How might advances in multi-modal embeddings change how we store and retrieve code, images, audio, and video in RAG systems?
What kinds of reasoning benchmarks would meaningfully demonstrate that improvements in optimizers or embeddings are enabling qualitatively new capabilities rather than just incremental gains?
Transcript Preview
Welcome to No Priors. Today, we're talking to Tengyu Ma, assistant professor of computer science at Stanford, and the co-founder and CEO of Voyage. Voyage trains state-of-the-art components for next generation retrieval systems, including embedding models and re-rankers. We're really excited to talk about his research and the RAG debate today. Uh, welcome, Tengyu.
Yeah. Thanks so much. Thanks for having me here. Looking forward to the debate.
Yeah. W- why don't we start with, um, just a- a little bit of an overview of y- like, your research agenda to date, because I- I think uniquely, it covers, um, a broad range of fields, uh, within and around deep learning from, like, theory to RL to embeddings and optimizers. So can you talk a little bit about, um, sort of how you picked the directions you have?
Yeah. So, um, I think most of the papers I wrote, uh, have some theoretical thinking in it. I guess maybe that's the commonality. And, um, um, be- besides that, I think I worked on, uh, quite a few topics, as you mentioned, ranging from the theoretical understanding, mathematical proofs of deep learning systems, um, to, um, all the way to practical large language models, reinforcement learning, um, deep reinforcement learning. And these days, recently, I think what we are working on, um, more centralized to, um, efficiency of training the large language models and reasoning, improving the reasoning tasks for large language models. So I- my vision is that in the future, uh, the efficiency is very important because, um, we are running out of data and compute, so we have to either use the data much better and use the compute much better. Um, and also reasoning tasks seems to be a pretty, uh, important direction. And, um, um, uh, and also in some sense, kind of like a- a risky direction in the sense that we don't know exactly how, um, how fast we can, uh, solve those challenging reasoning questions yet.
Mm-hmm. Um, can you, uh, mention a few of the, like, key papers or work that, uh, you know, you or students in your lab have done, just so our listeners can look them up?
In the very, uh, early days, I think I worked on some of this matrix completion, optimization for matrix completion. That's like 10 years ago. Um, and then I move on to, uh, embedding models, like sentence embeddings, vector embeddings. One of the papers we wrote is a ve- actually simple paper where we average the word embeddings to get sentence embeddings, and then we did some, uh, of these transformations using PCA to make the performance much better. Um, that was even before transformer came out. Um, and then I think I move on to, uh, transformers, large language models, um, and contrastive learning, which is the new way of training the embedding models. Um, uh, especially the- the, um- um, the direction started with some of the papers on, uh, using contrastive learning for images, and we work on improving those and understanding why contrastive learning can work. Uh, and recently, we work on, uh, optimizers for large language models. For example, one of the papers, uh, we wrote last year was Sophia, uh, which we found, where we found that we have a new tran- optimizer which can improve the training efficiency by 2X for pre-training.