- 0:00 – 2:58
Introduction
- Vishal Misra
Anthropic makes great products. Claude Code is fantastic, CoWork is fantastic, but they are grains of silicon doing matrix multiplication. They don't have consciousness. They don't have an inner monologue. You take an LLM, train it on pre-1916 or pre-1911 physics, and see if it can come up with the theory of relativity. If it does, then we have AGI.
- Erik Torenberg
Just today, by the way-
- Vishal Misra
Yeah
- Erik Torenberg
... Dario allegedly said that you can't rule out that they're conscious.
- Vishal Misra
You can rule out they're conscious. [both laughing] Come on. To get to what is called AGI, I think there are two things that need to happen. One is...
- Erik Torenberg
Vishal, it's great to have you in again.
- Vishal Misra
Great to be back.
- Erik Torenberg
This is one of my favorite topics, which is: how do LLMs actually work?
- Vishal Misra
Mm-hmm.
- Erik Torenberg
And I think that, in my opinion, you've done the best work on this, modeling it out.
- Vishal Misra
Thank you.
- Erik Torenberg
For those that did not see the original one, it's probably worth doing a quick background on what led you to this point, and then we'll go into the current work that you've been doing.
- Vishal Misra
Five years ago, when GPT-3 was first released-
- Erik Torenberg
Yeah
- Vishal Misra
... I got early access to it and started playing with it, and I was trying to solve a problem related to querying a cricket database.
- Erik Torenberg
Yeah.
- Vishal Misra
And I got GPT-3 to do in-context learning, few-shot learning, and, you know, at least to me, it was the first known implementation of RAG, Retrieval-Augmented Generation, which I used to solve this querying problem: getting GPT-3 to translate natural language into something that could be used to query a database that GPT-3 had no idea about. I had no access to GPT-3's internals, but I was still able to use it to solve that problem. It worked beautifully. We deployed this in production at ESPN in September '21, but-
- Erik Torenberg
Wow. Wow, you did the first implementation of RAG in 2021?
- Vishal Misra
No, no, no. In 2020.
- Erik Torenberg
2020.
- Vishal Misra
2020, I got it working, and by the time you talk to all the lawyers at ESPN and, you know, productionize it, it took a while.
- Erik Torenberg
Wow.
- Vishal Misra
But October 2020, we had... Well, I had this-
- Erik Torenberg
Yeah
- Vishal Misra
... architecture working. But after I got it to work, I was amazed that it worked. I wanted to understand how it worked.
- Erik Torenberg
Yeah.
- Vishal Misra
And I looked at, you know, the Attention Is All You Need paper and all the other deep learning architecture papers, and I couldn't understand why it worked.
- Erik Torenberg
Yeah.
- Vishal Misra
So then I started getting deep into building a mathematical model.
- Erik Torenberg
Yeah. And now you've published a series of papers.
- 2:58 – 8:24
LLM as Giant Matrix
- Erik Torenberg
You were trying to describe... you were trying to come up with a mathematical model-
- Vishal Misra
Mm-hmm
- Erik Torenberg
... of how an LLM works.
- Vishal Misra
Yeah.
- Erik Torenberg
And you had, which was very helpful to me... at the time you were actually trying to figure out how in-context learning was working.
- Vishal Misra
Yes. Yeah.
- Erik Torenberg
And you came up with an abstraction for LLMs, which is basically this very, very large matrix, and you used that to describe it. So maybe you can walk through that work very quickly.
- Vishal Misra
Sure, yeah. So what you do is you imagine this huge, gigantic matrix where every row of the matrix corresponds to a prompt.
- Erik Torenberg
Yeah.
- Vishal Misra
And the way these LLMs work is, given a prompt, they construct a distribution of probabilities for the next token. The next token is, roughly, the next word. Every LLM has a vocabulary; GPT and its variants have a vocabulary of about 50,000 tokens.
- Erik Torenberg
Yeah.
- Vishal Misra
So given a prompt, it'll come up with a distribution of what the next token should be, and then all these models sample from that distribution.
- Erik Torenberg
Yeah. So that's the posterior distribution.
- Vishal Misra
That's the posterior distribution.
- Erik Torenberg
Right.
- Vishal Misra
Right? That's how LLMs work. And so the idea of this matrix is: for every possible combination of tokens, which is a prompt, there's a row.
- Erik Torenberg
Yeah.
- Vishal Misra
And the columns are a distribution over the vocabulary.
- Erik Torenberg
Yeah.
- Vishal Misra
So if you have a vocabulary of 50,000 possible tokens, it's a distribution over those 50,000 tokens.
- Erik Torenberg
And by distribution, it's just the probability-
- Vishal Misra
Just the probability. Sorry, yeah.
- Erik Torenberg
Yeah.
- Vishal Misra
Just the probability that the next token should be this-
- Erik Torenberg
Yeah
- Vishal Misra
... versus that.
- Erik Torenberg
Yeah.
- Vishal Misra
So that's the idea. And when you start viewing it that way, it makes what's happening clearer, at least to people like me who want to model it. So concretely, let's take an example: let's say your prompt is just one word, protein.
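To make the giant-matrix picture concrete, here is a minimal sketch in Python. It is illustrative only, not code from the papers: each prompt indexes a row, the row is a probability distribution over a roughly 50,000-token vocabulary, and generation is just sampling from that row and appending. The next_token_distribution function below fakes a row with random logits; in a real LLM this would be a forward pass through the network.

import numpy as np

rng = np.random.default_rng(0)
VOCAB = 50_000  # roughly the GPT-3-era vocabulary size

def next_token_distribution(prompt_tokens):
    # Stand-in for one "row" of the giant matrix: a probability
    # distribution over the next token given the prompt. A real LLM
    # computes this with a forward pass; here it's faked with noise.
    logits = rng.normal(size=VOCAB)
    probs = np.exp(logits - logits.max())   # softmax
    return probs / probs.sum()

def generate(prompt_tokens, n_steps=5):
    tokens = list(prompt_tokens)
    for _ in range(n_steps):
        row = next_token_distribution(tokens)         # look up the row
        tokens.append(int(rng.choice(VOCAB, p=row)))  # sample from it
    return tokens

print(generate([1234]))  # a one-token prompt, e.g. "protein"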
- 8:24 – 13:00
What Is In-Context Learning
- Vishal Misra
... subset, right.
- Erik Torenberg
You used this approach to describe how in-context learning works, so maybe first describe what in-context learning is-
- Vishal Misra
Yeah
- Erik Torenberg
... and then the conclusion that you drew from that.
- Vishal Misra
So in-context learning is when you show the LLM something it has never seen before. You give it a few examples of what you're trying to do, then you give it a new problem, which is related to the examples that you've shown.
- Erik Torenberg
Yeah.
- Vishal Misra
And the LLM learns in real time what it's supposed to do and solves the problem.
- Erik Torenberg
By the way, the first time I saw this, it absolutely blew my mind.
- Vishal Misra
Yeah.
- Erik Torenberg
I actually used your DSL-
- Vishal Misra
Mm-hmm
- Erik Torenberg
... when I was first learning about it. So maybe, like a-
- Vishal Misra
Yeah. So I-
- Erik Torenberg
The DSL thing was just, it's just-
- Vishal Misra
It was-
- Erik Torenberg
... crazy this works at all.
- Vishal Misra
It's absolutely, you know, mind-blowing that it works. And so going back to that cricket problem-
- Erik Torenberg
Yeah
- Vishal Misra
... in the mid-'90s, I was part of a group that had created this cricket portal called Cricinfo.
- Erik Torenberg
Yeah.
- Vishal Misra
Cricket is a very stat-rich sport; think baseball multiplied by a thousand. It's got all kinds of stats. And we had created this online searchable database called StatsGuru, where you could search for anything, any stat related to cricket, and it's been available since 2000.
- Erik Torenberg
Yeah.
- Vishal Misra
But because you can query for anything, everything was made available. And how do you make something like that available to the general public?
- Erik Torenberg
Yeah.
- Vishal Misra
Well, they're not gonna write SQL queries. The next best thing at that time was to create a web form. Unfortunately, [chuckles] everything was crammed into that web form. As a result, you had something like 20 drop-downs, 15 checkboxes, and 18 different text fields. It was a very complicated, daunting interface. So even though it could answer any query, almost no one used it. A vanishingly small percentage-
- Erik Torenberg
Yeah. [chuckles]
- Vishal Misra
... of cricket fans used it, because it just looked intimidating. And then ESPN bought that site in 2007. I still know people who run the site, and I've always told them, "You know, why don't you do something with StatsGuru?" In January 2020, the editor-in-chief of Cricinfo, Sambit Bal, who's a friend, came to New York and we went out for drinks. And again, I told him, "You know, why don't you do something with StatsGuru?" So he looks at me and says, "Why don't you do something about StatsGuru?" [chuckles] He was joking, but that idea kind of stayed with me. And when GPT-3 was released, I thought maybe I could use GPT-3 to create a front end for StatsGuru.
- Erik Torenberg
Gotcha.
- Vishal Misra
And so what I did was design a DSL, a domain-specific language, and get GPT-3 to convert natural-language queries about cricket stats into this DSL. Now-
- Erik Torenberg
And to be clear, you created this. It wasn't part of any training data-
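The few-shot prompt he describes would have looked roughly like the sketch below. The DSL syntax here is invented for illustration (his actual DSL isn't shown in the conversation), and the commented-out call uses the GPT-3-era completions API.

# Hypothetical few-shot prompt: (question, DSL) pairs, then a new question.
FEW_SHOT_PROMPT = """\
Q: How many runs did Tendulkar score in ODIs?
DSL: stats(player="SR Tendulkar", format=ODI, metric=runs)

Q: Most wickets by a bowler in Tests since 2010?
DSL: stats(role=bowler, format=Test, metric=wickets, since=2010, sort=desc, limit=1)

Q: Highest individual score at Lord's?
DSL:"""

# GPT-3-era completion call (old openai SDK signature, shown for
# illustration; current SDKs differ):
#
#   completion = openai.Completion.create(
#       engine="davinci", prompt=FEW_SHOT_PROMPT,
#       max_tokens=40, temperature=0, stop="\n")
#
# The returned DSL string is then parsed and executed against StatsGuru,
# and the result rendered back to the user.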
- 13:00 – 19:13
Bayesian Updating as Evidence
- Erik Torenberg
... learning.
- Vishal Misra
Yeah. So when you think about what in-context learning is, it's that you update as you see evidence. So, you know, in the first paper, what I also did was take this cricket DSL example.
- Erik Torenberg
Okay.
- Vishal Misra
And I depicted the next-token probabilities-
- Erik Torenberg
Mm-hmm
- Vishal Misra
... of the model as it was shown more and more examples. The first time you show it this DSL, the natural language and the DSL, the probabilities of the DSL tokens were extremely low, because GPT-3 had never seen this thing. When it saw the cricket question, in its mind it was trying to continue it with an English answer. So the probabilities that were high were all English words.
- Erik Torenberg
Yeah.
- Vishal Misra
Once it saw my prompt, where I had the question and the DSL, and then the next question in the next row, the probabilities of the DSL tokens started going up. With every example they went up, and finally, when I gave the new query, it had almost 100% probability of getting the right token.
- Erik Torenberg
Yeah.
- Vishal Misra
So this is an example of the model updating its posterior probability in real time. It was updating its knowledge: okay, I've seen evidence, this is what I'm supposed to do. Now, this is a colloquial way of saying what Bayesian-
- Erik Torenberg
Yeah
- Vishal Misra
... inference is. Bayesian updating, basically, is: you start with a prior, and when you see new evidence, you update your posterior. That's the mathematical definition. But-
- Erik Torenberg
Yeah
- Vishal Misra
... in English, it's basically: you see new evidence, you update your belief about what's happening.
- Erik Torenberg
Yeah.
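As a minimal numeric sketch of the updating he's describing (the probabilities below are assumptions for illustration, not measurements): start with two hypotheses about how to continue the prompt, "in English" versus "in the DSL", and apply Bayes' rule once per few-shot example.

# Prior: GPT-3 has essentially never seen the DSL.
posterior = {"english": 0.99, "dsl": 0.01}

# Illustrative likelihood of observing a (question, DSL-line) pair
# under each hypothesis.
likelihood = {"english": 0.05, "dsl": 0.90}

for example in range(1, 5):   # each few-shot demonstration
    unnorm = {h: posterior[h] * likelihood[h] for h in posterior}
    z = sum(unnorm.values())
    posterior = {h: p / z for h, p in unnorm.items()}
    print(example, {h: round(p, 4) for h, p in posterior.items()})

# P("dsl") climbs toward 1 with each example, mirroring how the DSL
# tokens' next-token probabilities rose with every demonstration.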
- Vishal Misra
Right? So it was clear to me that LLMs are doing something which resembles Bayesian updating. So in that first paper, I had this matrix formulation, and I showed that what it's doing looks like Bayesian updating.
- Erik Torenberg
Yeah.
- Vishal Misra
Then we can come to the next series of papers.
- Erik Torenberg
That's right. So, okay. I mean, it seemed pretty conclusive to me at that time.
- Vishal Misra
Yeah.
- Erik Torenberg
And then you went quiet for a while, and then... I still remember the WhatsApp text. You said-
- Vishal Misra
Yeah
- Erik Torenberg
... "Martín, I know exactly how these things are working now." [both chuckling]
- Vishal Misra
Yeah. Well-
- Erik Torenberg
And then, listen, you dropped a series of papers that kind of broke the internet. You went super viral on Twitter.
- Vishal Misra
Yeah.
- Erik Torenberg
I mean, people really noticed. And so I want to get to that in just a second.
- Vishal Misra
Yeah.
- Erik Torenberg
But before that, I remember when your first paper came out, people would be like, "You know, these things are definitely not Bayesian." Like-
- Vishal Misra
Mm
- 19:13 – 27:22
Bayesian Wind Tunnel Tests
- Erik Torenberg
... it. Got it.
- Vishal Misra
So then, with my colleagues Naman Agarwal and Siddharth Dalal (the series of papers were written with them), we came up with this idea of a Bayesian wind tunnel.
- Erik Torenberg
Okay.
- Vishal Misra
So what's a wind tunnel? Well, a wind tunnel in the aerospace industry is where you test an aircraft in an isolated environment. You don't fly it; you test it against all sorts of aerodynamic pressure, and you see what it'll withstand, what kind of altitude, pressure, and so on. Right? You don't want to do that testing up in the air.
- Erik Torenberg
Yeah.
- Vishal Misra
So we said, okay, why don't we create an environment where we take these architectures (and we tested transformers, Mamba, LSTMs, MLPs, all the architectures), take a blank architecture, and give it a task where it's impossible for the architecture to memorize what the solution to that task should be. The space is combinatorially-
- Erik Torenberg
Yeah
- Vishal Misra
... too large, given the number of parameters, and we took very small models. So it's difficult enough that they cannot memorize it.
- Erik Torenberg
Yeah.
- Vishal Misra
But it's tractable enough that we know precisely what the Bayesian posterior should be. You can calculate it analytically. So we gave these models a bunch of tasks where, again, we show that it's impossible to memorize. We trained these models, and we found that the transformer matched the precise Bayesian posterior down to ten-to-the-power-minus-three bits of accuracy. It was matching the distribution perfectly. So it is actually doing Bayesian inference in the mathematical sense, given a task-
- Erik Torenberg
Wow
- Vishal Misra
... where it has to update its belief. Mamba also does it reasonably well. LSTMs can do some of the tasks. In the papers, we have a taxonomy of Bayesian tasks: the transformer does everything, Mamba does most of it, LSTMs do it only partially, and MLPs fail completely.
- Erik Torenberg
So is this a reflection of the data that it's trained on, or is it more a reflection of the mechanism?
- Vishal Misra
It's the mechanism; it's the architecture. The data decides what tasks it learns.
- Erik Torenberg
Right.
- Vishal Misra
So in the first paper, we had these Bayesian wind tunnels, and we show that it's doing the job at different tasks. In the second paper, we show why it does it. We look at the transformers, we look at the gradients, and we show how the gradients actually shape this geometry-
- Erik Torenberg
Ah
- Vishal Misra
... which enables this Bayesian updating to happen. Then in the third paper, we took frontier production LLMs which have open weights, so that we could look inside them, and we did our testing, and we saw that the geometries we saw in the small models persisted in models which have, you know, hundreds of millions of parameters. The same signature existed. The only thing is that, because they are trained on all sorts of data, it's a little bit dirty or messy.
- Erik Torenberg
Yeah.
- Vishal Misra
But you can see the same structure. So the whole idea behind the Bayesian wind tunnel was that, unlike these production LLMs, where you don't know what they have been trained on-
- Erik Torenberg
Right
- Vishal Misra
... so you cannot mathematically compute the posterior. So again, how do you prove it? I mean, it looks Bayesian, you know, from the first paper.
- Erik Torenberg
From the first paper, yeah.
- Vishal Misra
From the paper it looks Bayesian, but, you know.
- Erik Torenberg
Looks Bayesian to me. Yeah.
- Vishal Misra
So the wind tunnel solved that problem for us.
- Erik Torenberg
Mm-hmm.
- Vishal Misra
We said, okay, let's start with a blank architecture and give it a task where we know what the answer is and it cannot memorize. Let's see what it does. And yeah.
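A minimal sketch of the wind-tunnel logic (not the papers' actual tasks or code): pick a toy task whose exact Bayesian posterior has a closed form, here a coin whose unknown bias is drawn from a small discrete prior, then measure how far a trained model's predictive distribution sits from that exact posterior, in bits. The model_prob value is a placeholder standing in for a trained network's output.

import numpy as np

thetas = np.array([0.2, 0.5, 0.8])     # hypotheses for the coin's bias
prior = np.array([1 / 3, 1 / 3, 1 / 3])

def exact_predictive(flips):
    # Exact P(next flip = heads | flips) via Bayes' rule.
    heads, n = sum(flips), len(flips)
    like = thetas ** heads * (1 - thetas) ** (n - heads)
    post = prior * like
    post /= post.sum()
    return float((post * thetas).sum())

def kl_bits(p, q):
    # KL(p || q) in bits between two Bernoulli distributions.
    pv, qv = np.array([p, 1 - p]), np.array([q, 1 - q])
    return float((pv * np.log2(pv / qv)).sum())

flips = [1, 1, 0, 1]                   # an observed context
p_exact = exact_predictive(flips)      # ~0.668 for this context
model_prob = 0.66                      # placeholder for the network's output
print(f"exact={p_exact:.4f}  model={model_prob:.4f}  "
      f"gap={kl_bits(p_exact, model_prob):.6f} bits")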
- Erik Torenberg
So do you think this provides any sort of indication of how humans think, or do you think that these things are totally independent?
- Vishal Misra
No, it does provide one, right? You know, human beings also update our beliefs as we see new evidence.
- 27:22 – 36:34
Brains Simulate Causality
- Vishal Misra
... coming back to it, we are Bayesian-
- Erik Torenberg
Yeah
- Vishal Misra
... but we do something else. You know, when I throw this pen at you, what'll you do?
- Erik Torenberg
Dodge it or-
- Vishal Misra
Dodge it
- Erik Torenberg
... dodge it? Yeah.
- Vishal Misra
Why will you dodge it?
- Erik Torenberg
Um, to avoid being hit.
- Vishal Misra
Avoid being hit.
- Erik Torenberg
Yeah.
- Vishal Misra
But your head is not doing a Bayesian calculation of: okay, this pen is coming, here's the probability that it hits me, it'll cause this much pain, and all that.
- Erik Torenberg
Correct.
- Vishal Misra
What you're essentially doing in your head is running a simulation.
- Erik Torenberg
Ah, right.
- Vishal Misra
You see the pen coming, and you know that it'll come and hit you. Your mind simulates, and you dodge it, right? So all of deep learning is doing correlation. It's not doing causation.
- Erik Torenberg
Yeah.
- Vishal Misra
Causal models are the ones that are able to do simulations and interventions. So, you know, Judea Pearl has this whole causal hierarchy-
- Erik Torenberg
Yeah
- Vishal Misra
... where the first level is association, which is where you build these correlation models. Deep learning is beautiful; it's extremely powerful. I mean, you see it every day: all these models are amazingly good.
- Erik Torenberg
Yeah.
- Vishal Misra
They do association. The second level in the hierarchy is intervention.
- Erik Torenberg
Yeah.
- Vishal Misra
Deep learning models do not do that. The third is counterfactuals. Both intervention and counterfactuals, you can imagine, involve some sort of simulation: you build a causal model of what's happening, and then you are able to simulate. Our brains do that. The current architectures don't. Another example, which I think will make it clear, is the difference between two technical terms: Shannon entropy-
- Erik Torenberg
Mm-hmm
- Vishal Misra
... and Kolmogorov complexity.
- Erik Torenberg
Sure.
- Vishal Misra
So if you look at the Shannon entropy of the digits of pi-
- Erik Torenberg
Yeah
- Vishal Misra
... it's infinite.
- Erik Torenberg
Sure.
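The contrast is easy to demonstrate. Empirically, the digits of pi are distributed like rolls of a fair ten-sided die, so their per-digit Shannon entropy sits near the log2(10), roughly 3.32-bit, maximum; yet the Kolmogorov complexity is tiny, since a program of a few lines emits as many digits as you like. This sketch assumes the mpmath library is available rather than implementing a spigot algorithm from scratch.

from collections import Counter
from math import log2

from mpmath import mp

mp.dps = 10_000                               # working precision: 10,000 digits
digits = mp.nstr(mp.pi, 10_000).replace(".", "")[:10_000]

counts = Counter(digits)
n = sum(counts.values())
entropy = -sum(c / n * log2(c / n) for c in counts.values())
print(f"empirical entropy: {entropy:.4f} bits/digit (max {log2(10):.4f})")
# High entropy, yet the whole infinite sequence compresses to a short
# generating program: low Kolmogorov complexity.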
- 36:34 – 42:17
Manifolds and New Representations
- Vishal Misra
Yeah.
- Erik Torenberg
You know, another way that I've always thought about these, and I thought you articulated it well the last time we talked about it, is that the universe is this very, very complex space, and somehow humans map it into a manifold-
- Vishal Misra
Mm-hmm
- Erik Torenberg
... that's less complex.
- Vishal Misra
Yeah.
- Erik Torenberg
And then that gets written down, and then the LLM... So that's some distribution, some-
- Vishal Misra
Mm.
- Erik Torenberg
You know, it's still a very large space, but it's-
- Vishal Misra
Yeah
- Erik Torenberg
... a bounded space, and the LLMs learned that manifold, and then they use, you know, Bayesian inference to move up and down that manifold.
- Vishal Misra
Right.
- Erik Torenberg
But they're kind of bound to that manifold.
- Vishal Misra
Yeah.
- Erik Torenberg
And again, I don't wanna put words in your mouth, but what they can't do is generate a new manifold.
- Vishal Misra
A new manifold, yeah, yeah.
- Erik Torenberg
Which requires understanding the way that the universe works, then coming up with a new representation of the universe.
- Vishal Misra
Yeah. And this is what relativity is, right?
- Erik Torenberg
Yeah, exactly.
- Vishal Misra
Einstein had to create a new manifold.
- Erik Torenberg
Yeah, yeah, yeah.
- Vishal Misra
If you just stuck with the old manifold of Newtonian physics-
- Erik Torenberg
Right
- Vishal Misra
... then you would see these correlations, but you could not come up with a manifold that explains them. So you need to come up with a new representation.
- Erik Torenberg
Yeah.
- Vishal Misra
So to me, you know, there are lots of definitions of AGI. The Turing test: we have already passed that. Performing economically useful work: every day you see LLMs doing that.
- Erik Torenberg
Do we? I don't know.
- Vishal Misra
No, I mean, they are.
- Erik Torenberg
I mean, without human intervention?
- Vishal Misra
No, no, no. So that's different.
- Erik Torenberg
Okay.
- 42:17 – 46:48
Simulation as Short Program
- Erik Torenberg
And can you tie the two things together? Like, how does that pair with doing simulation, or is simulation totally orthogonal?
- Vishal Misra
No, simulation is related, right?
- Erik Torenberg
So you think, basically, you do simulation, and somehow that is a step towards the Kolmogorov complexity?
- Vishal Misra
The simulator is the program that we create. It may not be the perfect program.
- Erik Torenberg
Oh, I see. And you say-
- Vishal Misra
But in our heads we create this simulator, so that when I'm throwing the pen, you know that it's coming at you, right?
- Erik Torenberg
Yeah.
- Vishal Misra
And you duck. So you're not computing the probabilities as it goes, but, you know, you build an approximate-
- Erik Torenberg
That's a very physical thing, versus we are talking more conceptually.
- Vishal Misra
Conceptually, but it's a similar thing.
- Erik Torenberg
And you think those are the same mechanism?
- Vishal Misra
It's the same mechanism.
- Erik Torenberg
Really?
- Vishal Misra
Yeah. You have to build a causal model.
- Erik Torenberg
Yeah.
- Vishal Misra
Right?
- Erik Torenberg
I see. I see. Yeah.
- Vishal Misra
For most things, right?
- Erik Torenberg
Yeah.
- Vishal Misra
So you have to move from correlation to causation. I mean, we've heard this term-
- Erik Torenberg
Yeah
- Vishal Misra
... you know, ad infinitum.
- Erik Torenberg
Yeah.
- Vishal Misra
But here it's making a difference in the way we view intelligence.
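A hedged sketch of the pen example as a "short program": a few lines of projectile physics form a causal model you can run forward, intervene on (change the throw), and query counterfactually, none of which a purely correlational lookup supports. All the numbers are invented for illustration.

def will_hit(y0, vx, vy, target_x, head_y=1.7, dt=0.01, g=9.81):
    # Forward-simulate the pen's flight with simple Euler steps;
    # True if it arrives at target_x near head height.
    x, y = 0.0, y0
    while x < target_x and y > 0:
        x += vx * dt
        y += vy * dt
        vy -= g * dt
    return abs(y - head_y) < 0.15      # within ~15 cm of the head

# The throw as observed: duck!
print(will_hit(y0=1.5, vx=6.0, vy=3.0, target_x=3.0))   # True
# Intervention: same causal model, different action ("what if it
# had been thrown downward?"): no need to duck.
print(will_hit(y0=1.5, vx=6.0, vy=-1.0, target_x=3.0))  # False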
- Erik Torenberg
Yeah. How have the last three papers been received?
- Vishal Misra
No, I don't know. They're... Well-
- Erik Torenberg
I mean-
- Vishal Misra
The arXiv versions were like-
- Erik Torenberg
Let me tell you.
- Vishal Misra
Yeah.