No Priors Ep. 7 | With Stanford Professor Dr. Percy Liang

When AI research is evolving at warp speed and takes significant capital and compute power, what is the role of academia? Dr. Percy Liang, Stanford computer science professor and director of the Stanford Center for Research on Foundation Models, talks about training costs, distributed infrastructure, model evaluation, alignment, and societal impact. Sarah Guo and Elad Gil join Percy at his office to discuss the evolution of research in NLP, why AI developers should aim for superhuman levels of performance, the goals of the Center for Research on Foundation Models, and Together, a decentralized cloud for artificial intelligence.

00:00 - Introduction
01:44 - How Percy got into machine learning research and started the Center for Research on Foundation Models at Stanford
07:23 - The role of academia and academia’s competitive advantages
13:30 - Research on natural language processing and computational semantics
27:20 - Smaller scale architectures that are competitive with transformers
35:08 - HELM (Holistic Evaluation of Language Models), a project whose goal is to evaluate language models
42:13 - Together, a decentralized cloud for artificial intelligence

Sarah Guo (host), Dr. Percy Liang (guest), Elad Gil (host)

Apr 25, 2023 · 53 min

CHAPTERS

0:00 – 1:59
Percy Liang’s path into NLP: from HMM language models to today’s LLM era
Percy recounts how his early fascination with language and theory at MIT led him into machine learning and NLP, followed by graduate work at Berkeley and a faculty role at Stanford. He contrasts early HMM-based language modeling—discovering simple latent structure in text—with the leap in capabilities seen in modern large language models.
- Early motivation: how humans learn language and world knowledge from exposure
- Academic trajectory: MIT → Berkeley → Stanford
- Early NLP work with Hidden Markov Models and learning hidden structure
- Why today’s LLM capabilities feel like a different order of magnitude
1:59 – 4:01
GPT-3 as the inflection point: in-context learning and “foundation models”
Percy describes GPT-3’s release as the decisive moment that shifted his focus toward large-scale models. The surprise was less the fluency and more the training paradigm—simple next-token prediction plus scale—enabling in-context learning and dissolving the notion of bespoke “tasks.”
- GPT-3’s training objective and why it was paradigm-shifting
- In-context learning: prompting with instructions and examples
- From task-specific systems to general-purpose substrates
- Coining “foundation models” to capture multimodal/general significance
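As a rough illustration of what "prompting with instructions and examples" means in practice, the sketch below assembles a few-shot prompt as plain text. The function name `build_few_shot_prompt` and the `Input:`/`Output:` layout are assumptions made for this example, not something described in the episode. No model is called here: the resulting string is what a language model would be asked to continue.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, worked examples, and a new query into one prompt.

    Hypothetical helper for illustration only; the Input/Output layout is an
    assumed convention, not a format prescribed by any particular model.
    """
    lines = [instruction, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")  # blank line between demonstrations
    # The final query is left open so the model supplies the output.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("cheese", "fromage"), ("house", "maison")],
    "bread",
)
print(prompt)
```

The point of the sketch is that the "task" lives entirely in the text of the prompt. Changing the instruction and examples changes the task without retraining anything.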
4:01 – 6:20
Inside Stanford’s CRFM: transparency, access, and why openness is retreating
The conversation turns to the Center for Research on Foundation Models (CRFM) and its mission under Stanford HAI. Percy argues that the field benefited from open tools and datasets, but foundation models are increasingly closed (API-only, limited disclosure), driven by both competitive and safety incentives.
- CRFM mission: increase transparency and accessibility of foundation models
- Open culture’s historical role (frameworks, datasets, papers)
- Why models are becoming closed: capital cost and competitive advantage
- Safety as an additional driver for restricting release
6:20 – 10:38
Academia’s advantage now: understanding how models work and their societal impact
Percy and Elad discuss how academia and industry roles shift as “making it work” becomes less central when scale and resources can brute-force progress in industry. Percy emphasizes academia’s comparative advantage in rigorous understanding—mechanisms, data/architecture effects, objectives—and in interdisciplinary analysis of social, legal, and economic impacts.
- Past academic role: making ML systems work at all
- Current dynamic: industry can scale; academia should focus on understanding
- Mechanistic questions: data weighting, objectives, architecture-behavior links
- Interdisciplinary impact: copyright/legal issues, bias, disinformation, medicine
10:38 – 13:00
High-stakes deployment lens: healthcare, privacy, robustness, and ‘superhuman’ standards
Elad probes near-term clinical deployment, drawing parallels to earlier expert systems that never scaled to real-world use. Percy highlights hurdles like privacy, robustness, and hallucinations, while also arguing evaluation should move beyond “human-level” toward reliability, grounding, and principled evidence standards.
- Barriers to healthcare deployment: privacy, robustness, hallucinations
- Adoption and cultural friction vs technical capability
- Why “human doctor” isn’t the right ceiling for evaluation
- Desired properties: reliability, grounding, statistical evidence, rationality
13:00 – 16:05
Computational semantics & semantic parsing: language as a formal interface to computation
Percy defines computational semantics as extracting ‘meaning’ from text and explains his prior framing of language as something that can be formalized and executed. He describes semantic parsing work such as translating natural-language questions into SQL and contrasts this with retrieval-based QA that may not compute answers rigorously.
- Computational semantics: computing meaning from language
- Language-as-programming-language viewpoint
- Semantic parsing example: natural language → SQL queries
- Tradeoff vs retrieval/heuristic QA and the limits of a ‘world as database’
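To make the natural language → SQL idea concrete, here is a deliberately tiny sketch, assuming a one-pattern grammar and a made-up `cities` table. It is not the approach from Percy's research, which used learned parsers. It only shows the shape of the pipeline: parse the question into a formal query, then execute that query to compute the answer rather than retrieve a passage. The names `parse_question`, `answer`, and `build_demo_db` are hypothetical.

```python
import re
import sqlite3


def parse_question(question):
    """Toy semantic parser: map one question pattern to a parameterized SQL query.

    Hypothetical single-rule grammar for illustration; a real semantic parser
    would be learned and would cover far more than this one pattern.
    """
    match = re.fullmatch(
        r"how many cities are in (\w[\w ]*)\?", question.strip().lower()
    )
    if match is None:
        raise ValueError("question not covered by this toy grammar")
    sql = "SELECT COUNT(*) FROM cities WHERE lower(country) = ?"
    return sql, (match.group(1),)


def answer(conn, question):
    """Execute the parsed query so the answer is computed, not looked up."""
    sql, params = parse_question(question)
    return conn.execute(sql, params).fetchone()[0]


def build_demo_db():
    """Create an in-memory database with made-up demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cities (name TEXT, country TEXT)")
    conn.executemany(
        "INSERT INTO cities VALUES (?, ?)",
        [("Paris", "France"), ("Lyon", "France"), ("Berlin", "Germany")],
    )
    return conn


conn = build_demo_db()
print(answer(conn, "How many cities are in France?"))
```

The sketch also hints at the limitation named in the last bullet: the answer is only as good as the database, and any question outside the schema or grammar simply fails.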
16:05 – 19:23
Neural vs symbolic revisited: tool use, reasoning, and the limits of today’s transformers
Percy discusses the need to combine formal tool use (calculators, external systems) with the softer reasoning LLMs perform natively. He notes open research questions around data efficiency, benchmark gaming, context length limits, and more flexible problem-solving approaches that include iteration and backtracking.
- Marrying tool use with language-model reasoning
- Neural vs symbolic AI as a recurring debate
- Data efficiency and robustness vs benchmark optimization
- Context length constraints and the need for iterative/backtracking architectures
19:23 – 23:03
Emergent capabilities: in-context learning, chain-of-thought, and compositional creativity
Percy outlines capabilities that surprised him and others, especially in-context learning and chain-of-thought prompting improving problem-solving. He also points to ‘mix and match’ composition (e.g., style transfer with technical content) as evidence these models do more than memorization, hinting at potential for scientific discovery.
- In-context learning as an emergent behavior at scale
- Chain-of-thought: explaining steps can improve accuracy
- Compositionality: fusing concepts like style + algorithm explanation
- Why this argues against pure memorization; implications for creativity/discovery
23:03 – 27:19
What might emerge next: instruction-following, reducing hallucinations, and richer world models
Asked to predict future emergent behaviors, Percy distinguishes between capabilities that may be engineered vs truly emergent. He highlights strong instruction-following as striking today, and discusses the hope that better pretraining and context could reduce hallucinations by improving the model’s implicit understanding of entities, reliability, and uncertainty.
- Instruction-following as a major capability leap (and what counts as ‘emergent’)
- Hallucination as a core, difficult failure mode
- Scale/context as a path to better grounding and uncertainty awareness
- Pretraining as building a ‘world model’ that makes control easier
27:19 – 30:57
Beyond transformers: alternative architectures, compute budgets, and why academia should challenge the default
Elad raises whether other architectures might shine if scaled like transformers. Percy hopes the field won’t still be using transformers in a decade, cites promising non-attention approaches competitive at smaller scales, and argues academia is well-positioned to explore principles and alternatives—though compute constraints still dominate what’s feasible to test.
- Transformers as ‘good enough,’ but not necessarily the end state
- Research into non-attention alternatives competitive at smaller scales
- Compute constraints: why scaling LSTMs is impractical vs transformers
- Academic role: principled exploration and challenging status quo
30:57 – 35:24
Together: decentralized compute for training foundation models and overcoming weak interconnects
Percy introduces Together, motivated by compute as the bottleneck and the existence of idle, decentralized resources. He explains the key technical challenge—foundation model training typically needs high-bandwidth datacenter interconnects—and describes scheduling/compression techniques enabling training across weakly connected machines.
- Premise: compute bottleneck + underutilized decentralized compute
- Challenge: slow interconnect vs datacenter-grade networking
- Tech focus: scheduling and compression to make decentralized training viable
- Goal: expand compute access for research and startups
35:24 – 42:04
Domain-specific models and scientific workflows: BioMed LM, literature overload, and AI-assisted research
The group discusses training and using specialized foundation models, including a PubMed-trained biomedical model that achieved strong benchmark performance. They explore using models to flag issues (e.g., potential fraud) and, more broadly, to tackle literature review and hypothesis generation—moving toward ‘AI scientist’ workflows with humans in the loop.
- Example: CRFM/Mosaic BioMed LM trained on PubMed; scale vs efficiency tradeoffs
- Potential use in screening/flagging issues, with caution for consequential decisions
- Literature review tools to separate signal from noise (e.g., Elicit)
- Longer-term vision: systems that propose hypotheses, run experiments, and iterate
42:04 – 47:09
HELM: holistic, continuously updated evaluation across models, scenarios, and risk dimensions
Percy explains HELM (Holistic Evaluation of Language Models) as CRFM’s effort to evaluate models rigorously across many real-world scenarios and multiple metrics beyond accuracy. HELM compares dozens of models (open and API-access) and publishes results with drill-down transparency, while updating frequently to track rapidly evolving capabilities and new risks.
- Why evaluating LMs is hard: broad applicability and shifting use cases
- HELM framework: scenarios + metrics (robustness, calibration, fairness, toxicity, efficiency)
- Cross-model comparisons including open models and closed APIs
- Ongoing refresh cadence; adding new capabilities, multimodal evaluation, and security risks like jailbreak cascades
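Calibration is one of the metrics listed above that goes beyond accuracy: it asks whether a model's stated confidence matches how often it is actually right. One common way to quantify it is expected calibration error (ECE), sketched below with equal-width confidence bins. This is a minimal sketch; the function name, the bin count, and the binning scheme are assumptions for the example, and it is not claimed to match how HELM itself computes calibration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected calibration error with equal-width confidence bins.

    confidences: predicted probabilities in [0, 1] for the chosen answers
    correct:     booleans, True where the chosen answer was right
    Illustrative sketch; the bin count and binning scheme are assumptions.
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("need two non-empty sequences of equal length")

    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(conf for conf, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        # Weight each bin's confidence/accuracy gap by its share of examples.
        ece += (len(bucket) / n) * abs(avg_conf - accuracy)
    return ece


# A model that says "90% sure" four times but is right only three times.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False]))
```

A model can be accurate and still poorly calibrated (or the reverse), which is one reason a holistic evaluation reports both.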
47:09 – 49:59
Policy, legitimacy, and transparency norms: ‘nutrition labels’ for foundation models
The discussion closes on how foundation models intersect with policy more than prior ML eras. Percy argues that transparency is a prerequisite for accountability—especially around alignment choices and whose values are encoded—and suggests standardized disclosures akin to nutrition labels or spec sheets for model producers.
- Foundation-model governance: openness vs closed players and black-box alignment
- Key questions: whose values, who decides, and what accountability exists
- Transparency as necessary (not sufficient) for meaningful policy discussions
- Analogy: model disclosure norms like nutrition labels/specification sheets
49:59 – 53:37
Open tooling and ‘no priors’ thinking about AGI: staying open-minded amid rapid change
Percy describes how Together aims to involve the community via clients/APIs and open models like an evolving ‘OpenChatKit’ rather than a finished, closed product. Asked about AGI, he notes his views have shifted from skepticism to open-mindedness, emphasizing near-term robustness and societal issues while acknowledging emergent behavior makes long-range forecasts difficult.
- How to participate: Together client/API vision and community-driven open models
- OpenChatKit concept: iterative improvement with feedback vs unilateral release
- AGI discourse: shifting community attitudes and Percy’s evolving view
- Focus on near-term robustness and impacts, while remaining open-minded
