No Priors

No Priors Ep. 15 | With Kelvin Guu, Staff Research Scientist, Google Brain

How do you personalize AI models? A popular school of thought in AI is to simply dump all the data you need into pre-training or fine-tuning. But that is costly and less controllable than using AI models as a reasoning engine over an external data source, so the intersection of retrieval with LLMs has become an increasingly interesting topic. Kelvin Guu, Staff Research Scientist at Google, wants to make machine learning cheaper, easier, and more accessible. Kelvin joins Sarah and Elad this week to talk about the newer methods his team is working on in machine learning, training, and language understanding. He did some of the earliest work on retrieval-augmented language models (REALM) and on training LLMs to follow instructions (FLAN).

00:00 - Introduction
01:44 - Kelvin’s background in math, statistics, and natural language processing at Stanford
03:24 - The questions driving the REALM paper
07:08 - Frameworks around retrieval augmentation and expert models
10:16 - Why modularity is important
11:36 - The FLAN paper and instruction following
13:28 - Updating model weights in real time and other continuous-learning methods
15:08 - The Simfluence paper and explainability for large language models
18:11 - The ROME paper, “model surgery,” and exciting research areas
19:51 - Personal opinions and thoughts on AI agents and research
24:59 - How the human brain compares to AGI regarding memory and emotions
28:08 - How models become more contextually available
30:45 - Accessibility of models
33:47 - Advice to future researchers

Sarah Guo (host) · Kelvin Guu (guest) · Elad Gil (host)
May 4, 2023 · 37m · Watch on YouTube ↗

At a glance

WHAT IT’S REALLY ABOUT

Google’s Kelvin Guu on Retrieval, Modularity, Memory, and Future AI

  1. Kelvin Guu, a staff research scientist at Google Brain, discusses the evolution from pre-trained language models like BERT to retrieval-augmented models (REALM), mixture-of-experts architectures, and instruction-tuned systems such as FLAN. He explains why modularity and retrieval are increasingly important—especially for personalization, enterprise adaptation, and up-to-date knowledge—alongside emerging techniques like prompt tuning, model surgery, and training data attribution (Simfluence). The conversation explores limitations of current agents; challenges in memory, safety, and hallucination; and how neuroscience-inspired ideas like fear, chunking, and consolidation might inform AI design. Guu closes with thoughts on open training, “family values” alignment, and advice for future researchers in a world where technical execution is increasingly assisted by AI, and creativity and problem formulation become the key differentiators.

IDEAS WORTH REMEMBERING

5 ideas

Use retrieval-augmentation when you need modular, updatable, or private knowledge.

Dense models can memorize widely repeated web facts as they scale, but retrieval (like REALM) is best for plugging in proprietary, personal, or frequently changing information without retraining the entire model.
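The mechanics of retrieval augmentation can be sketched in a few lines. This is a toy illustration (not REALM, which learns a dense neural retriever end-to-end): documents are scored against the query with a bag-of-words cosine similarity, and the top match is prepended to the prompt so the model reasons over external context instead of memorized weights. All document text, helper names, and the prompt template are invented for illustration.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; real systems use dense neural encoders.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Rank the document store by similarity to the query; keep the top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # The LLM answers from retrieved context rather than from its weights,
    # so updating knowledge means updating the doc store, not retraining.
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "The office wifi password is hunter2.",
    "Lunch is served at noon in the cafeteria.",
]
prompt = build_prompt("what is the wifi password?", docs)
```

Swapping a document in `docs` immediately changes the model's available knowledge, which is exactly the updatability-without-retraining property the idea above describes.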

Choose between retrieval and mixture-of-experts based on granularity of adaptation.

Retrieval excels at precise, factoid questions (e.g., “What’s my Wi-Fi password?”), whereas mixture-of-experts models shine when adapting to a domain’s full style or codebase where isolated documents aren’t enough to teach the domain’s ‘language.’
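The mixture-of-experts side of this trade-off can be sketched as a top-1 router: a gating function scores each expert for the input, and only the winning expert runs. The expert names and keyword-based gate below are invented stand-ins for a learned gating network.

```python
# Toy top-1 mixture-of-experts routing (illustrative; real gates are
# learned networks over hidden states, not keyword counts).
EXPERTS = {
    "legal": lambda text: f"[legal expert] {text}",
    "code": lambda text: f"[code expert] {text}",
}

# Stand-in for a learned gating network: keyword affinities per expert.
GATE_KEYWORDS = {
    "legal": ["contract", "clause", "liability"],
    "code": ["bug", "function", "compile"],
}

def gate_scores(text: str) -> dict[str, int]:
    lowered = text.lower()
    return {name: sum(lowered.count(w) for w in words)
            for name, words in GATE_KEYWORDS.items()}

def route(text: str) -> str:
    # Top-1 routing: only the chosen expert runs, keeping compute sparse
    # while each expert adapts to its whole domain's style.
    scores = gate_scores(text)
    best = max(scores, key=scores.get)
    return EXPERTS[best](text)
```

Unlike the retrieval sketch, the knowledge here lives in the expert's parameters, which is why this approach suits whole-domain adaptation rather than single-fact lookups.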

Instruction tuning turns one model into many tasks without bespoke fine-tuning.

By training on around a hundred labeled tasks with natural language instructions (as in FLAN), models learn to follow new, unseen instructions at inference time—enabling systems like InstructGPT/ChatGPT and drastically widening who can use LLMs productively.
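The data transformation at the heart of instruction tuning can be shown concretely. In the spirit of FLAN (templates and examples below are invented for illustration), labeled examples from many supervised tasks are rewritten as natural-language (instruction, answer) pairs, so a single model learns to follow instructions rather than needing one fine-tuned head per task.

```python
# Invented templates illustrating instruction formatting; FLAN used many
# templates per task across ~60+ datasets.
TEMPLATES = {
    "sentiment": "Is the sentiment of this review positive or negative?\n{text}",
    "translation_fr": "Translate the following sentence to French:\n{text}",
}

def to_instruction_pair(task: str, text: str, label: str) -> tuple[str, str]:
    # Cast a (task, input, label) triple into plain instruction/answer text.
    return TEMPLATES[task].format(text=text), label

# A mixed-task training set expressed uniformly as instruction/answer pairs.
train = [
    to_instruction_pair("sentiment", "Loved every minute of it.", "positive"),
    to_instruction_pair("translation_fr", "Good morning", "Bonjour"),
]
```

Because every task now shares the same text-in, text-out format, an unseen instruction at inference time is just another string the model has learned to interpret.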

Continuous learning in production is less common than expected due to validation costs.

Live weight-updating systems are hard to monitor safely; in practice, many teams prefer weekly or monthly releases and lightweight methods like prompt tuning, where only a small vectorized “prompt” is updated while the main model stays frozen.
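Prompt tuning's key property—updating a tiny set of parameters while the main model stays frozen—can be shown with a deliberately tiny numeric sketch. All numbers are invented: the "model" here is just a fixed linear scorer over the mean of the input embeddings, standing in for a frozen LLM.

```python
# Minimal prompt-tuning sketch: gradient descent updates only the
# soft-prompt vectors prepended to the input; FROZEN_WEIGHTS never change.
EMBED_DIM = 4
FROZEN_WEIGHTS = [0.5, -0.2, 0.1, 0.9]  # stand-in for the frozen model

def forward(soft_prompt, token_embeds):
    seq = soft_prompt + token_embeds  # prepend tunable vectors
    mean = [sum(col) / len(seq) for col in zip(*seq)]
    return sum(w * v for w, v in zip(FROZEN_WEIGHTS, mean))

def tune_step(soft_prompt, token_embeds, target, lr=0.5):
    # Gradient of (score - target)^2 taken w.r.t. the soft prompt only.
    n = len(soft_prompt) + len(token_embeds)
    err = forward(soft_prompt, token_embeds) - target
    for vec in soft_prompt:
        for j in range(EMBED_DIM):
            vec[j] -= lr * 2 * err * FROZEN_WEIGHTS[j] / n

soft_prompt = [[0.0] * EMBED_DIM for _ in range(2)]  # 2 tunable vectors
tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
before = abs(forward(soft_prompt, tokens) - 1.0)
for _ in range(200):
    tune_step(soft_prompt, tokens, target=1.0)
after = abs(forward(soft_prompt, tokens) - 1.0)
```

Only the eight soft-prompt numbers moved; validating such an update is far cheaper than validating a full weight release, which is the operational point made above.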

Training data attribution can guide better datasets and clarify what models really know.

Simfluence approximates the effect of removing individual training examples without retraining from scratch, helping developers see which data actually shaped behavior and whether impressive outputs reflect real generalization or hidden near-duplicates in the corpus.
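A heavily simplified sketch of this idea (the actual Simfluence paper fits a richer per-example multiplicative/additive simulator of the loss trajectory): estimate each training example's average per-step effect on a held-out loss from observed runs, then replay a counterfactual run that drops one example—no retraining required. The run data below is fabricated for illustration.

```python
def fit_per_example_effects(runs):
    # runs: list of (example schedule, held-out loss trajectory), where
    # losses[t] is the loss before step t consumes schedule[t].
    totals, counts = {}, {}
    for schedule, losses in runs:
        for ex, prev, cur in zip(schedule, losses, losses[1:]):
            totals[ex] = totals.get(ex, 0.0) + (cur - prev)
            counts[ex] = counts.get(ex, 0) + 1
    return {ex: totals[ex] / counts[ex] for ex in totals}

def simulate(initial_loss, schedule, effects, drop=None):
    # Counterfactual replay: skip the dropped example's estimated effect.
    loss = initial_loss
    for ex in schedule:
        if ex != drop:
            loss += effects.get(ex, 0.0)
    return loss

# One observed run in which example "a" lowers the loss far more than "b".
runs = [(["a", "b", "a"], [1.0, 0.6, 0.55, 0.2])]
effects = fit_per_example_effects(runs)
with_all = simulate(1.0, ["a", "b"], effects)
without_a = simulate(1.0, ["a", "b"], effects, drop="a")
```

Dropping the helpful example yields a higher simulated loss, which is the kind of attribution signal that tells a developer which data actually shaped the model.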

WORDS WORTH SAVING

5 quotes

“One of the things that became very apparent early on when playing with BERT was, unlike all the prior generations of models, it had a large amount of world knowledge that we didn't deliberately encode into it.”

Kelvin Guu

“If you just want to get very precise factoid information like what is my wifi password, retrieval augmentation is going to be very good. But… an expert that's been trained on [a domain] is going to be able to adapt more quickly.”

Kelvin Guu

“You can't just tell the model, ‘Please don't make things up,’ and expect that to just solve the problem.”

Kelvin Guu

“Without being able to track [training data] down, we'll never know if these models are generalizing or just kind of cleverly patching together what they know.”

Kelvin Guu

“It seems like the differentiating factor now shifts more to problem formulation or creativity in identifying problems and being able to frame them in a way that you can then reduce them to a technical problem.”

Kelvin Guu

Kelvin Guu’s background and Google Brain’s role in language model research
Retrieval-augmented language models (REALM) and their motivations
Modularity in AI: retrieval vs. mixture-of-experts and enterprise adaptation
Instruction tuning and FLAN: training models to follow natural language instructions
Continuous adaptation techniques: prompt tuning, model updates, and live learning
Training data attribution and Simfluence for understanding which examples matter
Memory, model surgery (ROME), autonomous agents, and brain-inspired design for future AI

High quality AI-generated summary created from speaker-labeled transcript.
