
No Priors Ep. 15 | With Kelvin Guu, Staff Research Scientist, Google Brain
Sarah Guo (host), Kelvin Guu (guest), Elad Gil (host)
Google’s Kelvin Guu on Retrieval, Modularity, Memory, and Future AI
Kelvin Guu, a staff research scientist at Google Brain, discusses the evolution from pre-trained language models like BERT to retrieval-augmented models (REALM), mixture-of-experts architectures, and instruction-tuned systems such as FLAN. He explains why modularity and retrieval are increasingly important—especially for personalization, enterprise adaptation, and up-to-date knowledge—alongside emerging techniques like prompt tuning, model surgery, and training data attribution (Simfluence). The conversation explores limitations of current agents, challenges in memory, safety, and hallucinations, and how neuroscience-inspired ideas like fear, chunking, and consolidation might inform AI design. Guu closes with thoughts on open training, “family values” alignment, and advice for future researchers in a world where technical execution is increasingly assisted by AI and creativity and problem formulation become key differentiators.
Key Takeaways
Use retrieval-augmentation when you need modular, updatable, or private knowledge.
Dense models can memorize widely repeated web facts as they scale, but retrieval (like REALM) is best for plugging in proprietary, personal, or frequently changing information without retraining the entire model.
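The idea above can be sketched in a few lines. This is a hypothetical toy, not REALM itself: the documents are invented, and plain word overlap stands in for the learned dense retriever and nearest-neighbor search a real system would use. The point is structural—knowledge lives in a swappable corpus, not in model weights.

```python
# Toy corpus standing in for a private or frequently changing knowledge base.
docs = [
    "The office wifi password is hunter2, rotated quarterly.",
    "Lunch is served in the cafeteria from 12 to 2.",
    "The Q3 roadmap prioritizes the retrieval project.",
]

def retrieve(query, docs, k=1):
    # Score each document by word overlap with the query; a real system
    # would use dense embeddings learned jointly with the language model.
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: -len(q & set(d.lower().split())))
    return scored[:k]

def build_prompt(query, docs):
    # Retrieved text is prepended as context, so facts can be added,
    # updated, or removed without retraining the model.
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

prompt = build_prompt("what is the wifi password", docs)
```

Updating the answer later means editing one string in `docs`, which is exactly the modularity argument for retrieval over baked-in weights.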
Choose between retrieval and mixture-of-experts based on granularity of adaptation.
Retrieval excels at precise, factoid lookups (e.g., “what is my wifi password”), while a mixture-of-experts module trained on a whole domain can adapt more quickly to broader, fuzzier shifts within that domain.
Instruction tuning turns one model into many tasks without bespoke fine-tuning.
By training on around a hundred labeled tasks with natural language instructions (as in FLAN), models learn to follow new, unseen instructions at inference time—enabling systems like InstructGPT/ChatGPT and drastically widening who can use LLMs productively.
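The core data trick can be shown concretely. This is a hypothetical sketch of FLAN-style formatting (the task names and examples are invented): many labeled tasks are cast into one shared instruction/input/target text format, so a single model trained on the mixture learns to follow unseen instructions.

```python
# Hypothetical task registry: each entry is (instruction, list of (input, target)).
tasks = {
    "sentiment": ("Classify the sentiment of this review.",
                  [("Great movie!", "positive")]),
    "translation": ("Translate this sentence to French.",
                    [("Hello", "Bonjour")]),
}

def to_examples(tasks):
    # Flatten every task into a uniform (prompt, target) format; the model
    # never sees task names, only natural-language instructions.
    examples = []
    for instruction, pairs in tasks.values():
        for inp, target in pairs:
            examples.append({"prompt": f"{instruction}\n\n{inp}", "target": target})
    return examples

mixture = to_examples(tasks)
```

At inference time a brand-new instruction is just another prompt in the same format, which is what lets one model cover many tasks without bespoke fine-tuning.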
Continuous learning in production is less common than expected due to validation costs.
Live weight-updating systems are hard to monitor safely; in practice, many teams prefer weekly or monthly releases and lightweight methods like prompt tuning, where only a small vectorized “prompt” is updated while the main model stays frozen.
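Prompt tuning's appeal is that the frozen model is never touched. Here is a minimal numpy sketch under strong simplifying assumptions: the "frozen model" is just a fixed linear map over concatenated prompt and input features, and only the soft prompt vector receives gradient updates.

```python
import numpy as np

rng = np.random.default_rng(0)

d_prompt, d_input = 4, 3
W_p = rng.normal(size=d_prompt)   # frozen weights acting on the soft prompt
W_x = rng.normal(size=d_input)    # frozen weights acting on the input

def model(soft_prompt, x):
    # The frozen model conditioned on a trainable soft prompt.
    return soft_prompt @ W_p + x @ W_x

# New task: same input mapping plus an offset the frozen model lacks.
X = rng.normal(size=(64, d_input))
y = X @ W_x + 3.0

soft_prompt = np.zeros(d_prompt)  # the ONLY trainable parameters
lr = 0.1
for _ in range(200):
    pred = np.array([model(soft_prompt, x) for x in X])
    err = pred - y
    # Gradient flows to the prompt alone; W_p and W_x stay frozen.
    soft_prompt -= lr * 2.0 * np.mean(err) * W_p

final_mse = float(np.mean((np.array([model(soft_prompt, x) for x in X]) - y) ** 2))
```

Validating a release then means checking a few hundred prompt parameters rather than a full set of model weights, which is why this fits weekly or monthly release cadences better than live weight updates.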
Training data attribution can guide better datasets and clarify what models really know.
Simfluence approximates the effect of removing individual training examples without retraining from scratch, helping developers see which data actually shaped behavior and whether impressive outputs reflect real generalization or hidden near-duplicates in the corpus.
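The flavor of this can be illustrated with a deliberately tiny stand-in, not Simfluence's actual method: during SGD we log which examples appeared in each batch and how the validation loss moved, then regress the loss deltas on batch membership. The per-example coefficient estimates each example's effect on validation loss, so a corrupted example surfaces with the most harmful coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 1-D linear regression y = 2x, with one corrupted label (index 7).
X = np.array([-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0])
y = 2.0 * X
y[7] += 5.0  # mislabeled training example
n = len(X)

X_val = np.linspace(-2.0, 2.0, 9)
y_val = 2.0 * X_val

def val_loss(w):
    return float(np.mean((w * X_val - y_val) ** 2))

# Run SGD, logging batch membership and the resulting validation-loss delta.
w, lr = 2.0, 0.05  # start near the clean optimum so deltas are informative
indicators, deltas = [], []
for _ in range(500):
    batch = rng.choice(n, size=2, replace=False)
    before = val_loss(w)
    grad = np.mean(2.0 * (w * X[batch] - y[batch]) * X[batch])
    w -= lr * grad
    ind = np.zeros(n)
    ind[batch] = 1.0
    indicators.append(ind)
    deltas.append(val_loss(w) - before)

# Fit delta_loss ~ sum of per-example coefficients over batch members.
beta, *_ = np.linalg.lstsq(np.stack(indicators), np.array(deltas), rcond=None)
harmful = int(np.argmax(beta))  # the corrupted example should rank worst
```

No leave-one-out retraining happens here; attribution falls out of records the training run produces anyway, which is the cost argument for this family of methods.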
Model surgery offers a new, centralized way to edit and propagate factual knowledge.
Work like ROME treats MLP weight matrices as learned key-value lookup tables, showing that editing a small, targeted set of weights can change a model’s stored facts without retraining—and do so in one central place rather than scattered across fine-tuning data.
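The lookup-table view can be sketched with a linear associative memory. This is a simplified stand-in for ROME, not its actual algorithm: a matrix W stores key→value associations, and a closed-form rank-one update overwrites one association while leaving others disturbed only in proportion to their overlap with the edited key.

```python
import numpy as np

rng = np.random.default_rng(0)

# A linear associative memory: W maps "key" vectors (e.g., a fact's subject)
# to "value" vectors (the stored attribute), mimicking how ROME views MLP weights.
d = 8
keys = rng.normal(size=(3, d))
values = rng.normal(size=(3, d))
# Solve keys @ W.T = values so that W @ keys[i] = values[i] exactly.
W = np.linalg.lstsq(keys, values, rcond=None)[0].T

# Rank-one edit: overwrite the value stored at keys[0] without any retraining.
k = keys[0]
v_new = rng.normal(size=d)
W_edit = W + np.outer(v_new - W @ k, k) / (k @ k)
# Now W_edit @ k == v_new exactly; other keys shift only by their
# projection onto k, which is why such edits stay fairly local.
```

The appeal for knowledge management is that one weight edit can correct a fact everywhere the model would have recalled it, instead of patching outputs case by case.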
Future agents need more than explicit reasoning; they need learned instincts and safeguards.
Today’s agents rely on brittle step-by-step prompting and external memories, lacking human-like mechanisms such as chunking, consolidation, and “fear” of irrecoverable states—areas Guu sees as crucial to robust, long-horizon, autonomous behavior.
Notable Quotes
“One of the things that became very apparent early on when playing with BERT was, unlike all the prior generations of models, it had a large amount of world knowledge that we didn't deliberately encode into it.”
— Kelvin Guu
“If you just want to get very precise factoid information like what is my wifi password, retrieval augmentation is going to be very good. But… an expert that's been trained on [a domain] is going to be able to adapt more quickly.”
— Kelvin Guu
“You can't just tell the model, ‘Please don't make things up,’ and expect that to just solve the problem.”
— Kelvin Guu
“Without being able to track [training data] down, we'll never know if these models are generalizing or just kind of cleverly patching together what they know.”
— Kelvin Guu
“It seems like the differentiating factor now shifts more to problem formulation or creativity in identifying problems and being able to frame them in a way that you can then reduce them to a technical problem.”
— Kelvin Guu
Questions Answered in This Episode
In practice, where is the break-even point between just scaling a dense model and adding retrieval for a real-world product?
How might we combine retrieval, mixture-of-experts, and model-surgery into a single coherent system for knowledge management and editing?
What concrete techniques could give agents human-like ‘instincts’—such as fear of irrecoverable states or automatic chunking—without hardcoding brittle rules?
How can Simfluence or similar training data attribution tools be integrated into standard ML pipelines to improve data quality and model safety by default?
As instruction-tuned models get better, what kinds of tasks will still truly require specialized fine-tuning or reinforcement learning rather than just better prompting?
Transcript Preview
(instrumental music) Kelvin, welcome to No Priors.
Hey, thanks for having me.
Let's do a little bit of background. You started out studying math, but pivoted to natural language. What encouraged that switch?
Yeah. I actually had always wanted to get into tools that would help people learn more easily and find information more quickly. My motivation for going into math was really to build a good foundation for getting into many of those deeper questions, and I would say that also motivated my venturing into statistics. So, my PhD was actually in the statistics department, but I quickly then migrated over to the NLP group at Stanford where I just learned a lot and had a great time.
You chose to go to Google after your PhD. You've been there since 2018. Like, how has your focus or work changed during that time?
Yeah. I would say at Google, this was really just the dream place for me to deepen my focus on building tools that help people find information better, and at the time that I joined, it was also just an extremely productive period where lots of new ideas were coming out. The group that I'm part of right now, which is now part of Google Brain, was looking into what were then very early ideas on pre-training language models, which eventually became BERT, and that work just opened up a lot of frontiers for me, both in terms of product work at Google and inspiring new research ideas. I think one of the things that became very apparent early on when playing with BERT was, unlike all the prior generations of models, it had a large amount of world knowledge that we didn't deliberately encode into it. It wasn't in, you know, the fine-tuning data. It was all within pre-training, and that has just many, I think, very obvious use cases now, but that also inspired me and my colleagues to start thinking about retrieval augmented models and getting even more knowledge into them, and I'm sure we'll be talking about that more here.
Yeah. Let's do that. Let's talk about REALM, which for our listeners was a really landmark paper in the field. In the paper and your talks about your model, you described a limitation of AI in domain knowledge or specialized knowledge that can be retrieved and represented more accurately. What motivated the paper?
Sure. Yeah. So there are a few different things that can bring people to this retrieval augmented modeling literature. One of them was kind of our original goal, which was to increase the memorization capacity of these models. A second goal that you might come to this from is for modularity, so you might imagine that you have different data sources and you'd like to be able to swap one in or take one out, the same way you could do with a database, and there are just so many business applications where that's helpful. A third is anytime you're dealing with a very timely application, so there's new information arriving daily about, say, a sports team or any other type of event, you really want to be able to incorporate that quickly without retraining. So, these are some of the common ways that people arrive in this space, and it's a very natural thing, I think, now to think about, "Well, I don't want to retrain, and yet I've got all this information out there and it's human interpretable. It's text. How can I bring that in?" That's kind of what brought us to that in the first place. And since then, we've encountered many interesting challenges on top of that idea that I would say even in, in the sort of systems you see today, these tool using models that issue Google searches or provide citations, they still face some challenges in terms of fulfilling those original promises.