No Priors Ep. 15 | With Kelvin Guu, Staff Research Scientist, Google Brain
At a glance
WHAT IT’S REALLY ABOUT
Google’s Kelvin Guu on Retrieval, Modularity, Memory, and Future AI
- Kelvin Guu, a staff research scientist at Google Brain, discusses the evolution from pre-trained language models like BERT to retrieval-augmented models (REALM), mixture-of-experts architectures, and instruction-tuned systems such as FLAN. He explains why modularity and retrieval are increasingly important, especially for personalization, enterprise adaptation, and keeping knowledge up to date, and covers emerging techniques such as prompt tuning, model surgery, and training data attribution (Simfluence). The conversation explores the limitations of current agents; challenges around memory, safety, and hallucinations; and how neuroscience-inspired ideas like fear, chunking, and consolidation might inform AI design. Guu closes with thoughts on open training, “family values” alignment, and advice for future researchers in a world where AI increasingly assists with technical execution, leaving creativity and problem formulation as the key differentiators.
IDEAS WORTH REMEMBERING
Use retrieval augmentation when you need modular, updatable, or private knowledge.
Dense models can memorize widely repeated web facts as they scale, but retrieval (like REALM) is best for plugging in proprietary, personal, or frequently changing information without retraining the entire model.
Choose between retrieval and mixture-of-experts based on granularity of adaptation.
Retrieval excels at precise, factoid questions (e.g., “What’s my Wi-Fi password?”), whereas mixture-of-experts models shine when adapting to a domain’s full style or codebase where isolated documents aren’t enough to teach the domain’s ‘language.’
Instruction tuning turns one model into many tasks without bespoke fine-tuning.
By training on around a hundred labeled tasks phrased as natural language instructions (as in FLAN), models learn to follow new, unseen instructions at inference time. This capability underpins systems like InstructGPT/ChatGPT and has drastically widened who can use LLMs productively.
Continuous learning in production is less common than expected due to validation costs.
Live weight-updating systems are hard to monitor safely, so in practice many teams prefer weekly or monthly releases and lightweight methods like prompt tuning, where only a small set of learned prompt vectors (a “soft prompt”) is updated while the main model stays frozen.
Training data attribution can guide better datasets and clarify what models really know.
Simfluence approximates the effect of removing individual training examples without retraining from scratch. This helps developers see which data actually shaped a model’s behavior, and whether impressive outputs reflect real generalization or hidden near-duplicates in the corpus.
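To make the retrieval-augmentation idea concrete, here is a minimal sketch, assuming a toy setup. REALM itself uses a learned dense retriever trained jointly with the language model; this example swaps in a plain bag-of-words cosine retriever and stops at building the prompt, leaving the generator out. `DocumentStore`, `build_prompt`, and the Wi-Fi document are hypothetical illustrations, not part of any real system.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


class DocumentStore:
    """Hypothetical external memory: documents can be added at any time
    without touching the (not shown) language model's weights."""

    def __init__(self) -> None:
        self.docs: list[tuple[str, Counter]] = []

    def add(self, doc: str) -> None:
        self.docs.append((doc, Counter(tokenize(doc))))

    def top_k(self, query: str, k: int = 1) -> list[str]:
        q = Counter(tokenize(query))
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]


def build_prompt(question: str, store: DocumentStore, k: int = 1) -> str:
    """Prepend retrieved passages so a generator could condition on them."""
    context = "\n".join(store.top_k(question, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


store = DocumentStore()
store.add("The cafeteria opens at 8am on weekdays.")
store.add("The office Wi-Fi password is blue-otter-42.")  # hypothetical private fact
prompt = build_prompt("What is my Wi-Fi password?", store)
print(prompt)
```

The design point is that new or private facts enter through `store.add`, with no retraining; a production system would replace the lexical scorer with a learned retriever.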
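The prompt-tuning idea (update a small prompt, keep the model frozen) can be illustrated with a toy sketch under strong assumptions. Real prompt tuning prepends trainable embedding vectors to the input of a frozen transformer; here the “model” is a hypothetical fixed random matrix `FROZEN_M`, and the soft prompt is a single vector that conditions its output as a logit `p^T (M x)`. Only the prompt receives gradient updates. The two tasks and all names are illustrative.

```python
import copy
import math
import random

random.seed(0)
DIM = 3

# Hypothetical frozen "pretrained model": a fixed random matrix M.
# For input x and soft prompt p, the model's logit is p^T (M x).
FROZEN_M = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]


def matvec(m, x):
    return [sum(m[i][j] * x[j] for j in range(DIM)) for i in range(DIM)]


def predict(prompt, x):
    """Probability of label 1 for input x, conditioned on the soft prompt."""
    logit = sum(p * v for p, v in zip(prompt, matvec(FROZEN_M, x)))
    return 1.0 / (1.0 + math.exp(-logit))


def bce_loss(prompt, data):
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = predict(prompt, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def tune_prompt(data, lr=0.2, steps=500):
    """Gradient descent on the prompt vector only; FROZEN_M is never updated."""
    prompt = [0.0] * DIM
    for _ in range(steps):
        grad = [0.0] * DIM
        for x, y in data:
            err = predict(prompt, x) - y  # d(loss)/d(logit) for sigmoid + BCE
            for i, v in enumerate(matvec(FROZEN_M, x)):  # d(logit)/d(prompt)
                grad[i] += err * v / len(data)
        prompt = [p - lr * g for p, g in zip(prompt, grad)]
    return prompt


# One frozen model, two hypothetical tasks, two tiny prompts.
points = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(50)]
task_a = [(x, 1 if x[0] > 0 else 0) for x in points]
task_b = [(x, 1 if x[1] > 0 else 0) for x in points]
weights_before = copy.deepcopy(FROZEN_M)
prompt_a = tune_prompt(task_a)
prompt_b = tune_prompt(task_b)
print(bce_loss(prompt_a, task_a), bce_loss(prompt_b, task_b))
```

Each task costs only `DIM` stored numbers, and the shared weights are bit-for-bit unchanged, which is why this style of update is easier to validate than retraining the full model.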
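The training-data-attribution idea asks what happens to a model’s behavior if one training example is removed. Simfluence, as summarized above, approximates this without retraining from scratch. The sketch below instead computes the quantity the expensive way, by literally refitting a toy model once per left-out example. That is feasible only because the model here is a hypothetical one-parameter least-squares fit, and the dataset (with one deliberately mislabeled point) is invented for illustration.

```python
def fit_slope(data):
    """Least-squares slope for y ≈ w * x (no intercept), in closed form."""
    return sum(x * y for x, y in data) / sum(x * x for x, _ in data)


def held_out_loss(data, point):
    """Squared error of the model fit on `data`, evaluated at `point`."""
    x, y = point
    return (fit_slope(data) * x - y) ** 2


def leave_one_out_scores(train, point):
    """Score_i = loss without example i minus loss with all examples.
    Negative: removing i helps (i looks harmful). Positive: i was helping."""
    base = held_out_loss(train, point)
    return [
        held_out_loss(train[:i] + train[i + 1:], point) - base
        for i in range(len(train))
    ]


train = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, -4.0)]  # last label is wrong
scores = leave_one_out_scores(train, point=(5.0, 5.0))
most_harmful = min(range(len(train)), key=scores.__getitem__)
print(scores, most_harmful)  # the mislabeled example (index 3) scores lowest
```

Exact leave-one-out needs one retraining run per example, which is why approximation methods matter for real models and corpora.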
WORDS WORTH SAVING
“One of the things that became very apparent early on when playing with BERT was, unlike all the prior generations of models, it had a large amount of world knowledge that we didn't deliberately encode into it.”
— Kelvin Guu
“If you just want to get very precise factoid information like what is my wifi password, retrieval augmentation is going to be very good. But… an expert that's been trained on [a domain] is going to be able to adapt more quickly.”
— Kelvin Guu
“You can't just tell the model, ‘Please don't make things up,’ and expect that to just solve the problem.”
— Kelvin Guu
“Without being able to track [training data] down, we'll never know if these models are generalizing or just kind of cleverly patching together what they know.”
— Kelvin Guu
“It seems like the differentiating factor now shifts more to problem formulation or creativity in identifying problems and being able to frame them in a way that you can then reduce them to a technical problem.”
— Kelvin Guu
High quality AI-generated summary created from speaker-labeled transcript.