Is AI Slowing Down? Nathan Labenz Says We're Asking the Wrong Question

Nathan Labenz is one of the clearest voices analyzing where AI is headed, pairing sharp technical analysis with his years of work on The Cognitive Revolution. In this episode, Nathan joins a16z’s Erik Torenberg to ask a pressing question: is AI progress slowing down, or are we just getting used to the breakthroughs? They cover the debate over GPT-5, the state of reasoning and automation, the future of agents and engineering work, and how we can build a positive vision for where AI goes next. Timecodes: 00:00 Intro 01:14 Cal Newport’s “AI slowdown” argument 03:08 Are students getting lazy? 04:55 Nathan's two-by-two matrix of AI impact 07:00 Scaling laws, GPT-4.5, and what changed with GPT-5 11:05 Longer context windows and better reasoning 17:05 AI as scientist and real discoveries 19:17 GPT-5’s shift and why launch perception matters 26:10 Jobs, automation, and the misunderstood METR study 36:20 The future of coding, agents, and recursive self-improvement 51:15 Beyond chatbots: multimodal AI and robotics 1:27:00 Why the future depends on a positive vision for AI Resources: Follow Nathan on X: https://x.com/labenz Listen to the Cognitive Revolution: https://open.spotify.com/show/6yHyok3M3BjqzR0VB5MSyk Watch Cognitive Revolution: https://www.youtube.com/@CognitiveRevolutionPodcast Stay Updated: If you enjoyed this episode, be sure to like, subscribe, and share with your friends! Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.

Nathan LabenzguestErik Torenberghost

Oct 14, 20251h 30mWatch on YouTube ↗

EVERY SPOKEN WORD

85 min read · 17,303 words

0:00 – 1:14
Intro
1. NLNathan Labenz
  AI is not synonymous with language models. AI is being developed with pretty similar architectures for a wide range of different modalities, and there's a lot more data there. Feedback is starting to come from reality. Maybe we're running out of problems we've already solved when we start to give the next generation of the model these power tools, and they start to solve previously unsolved engineering problems. I think you start to have something that looks kind of like super intelligence.
2. ETErik Torenberg
  Nathan, I'm stoked to, stoked to have you on the a16z podcast for the first time. Obviously, we, uh, have been podcast partners for a long time with, with you leading Cognitive Revolution. Welcome.
3. NLNathan Labenz
  It's great to be here. Thank you.
4. ETErik Torenberg
  So we, we were talking about Cal Newport's podcast appearance on Lost Debates, and, and we, we thought it was a good opportunity to just have this broad conversation and really entertain this sort of question of, is AI slowing down? Um, so wh-why don't you sort of steel man s- some of the arguments that you've heard fr- o-on that idea from him or more broadly, and then we can sort of have this broader conversation.
5. NLNathan Labenz
  Yeah, I mean, I think for one thing, um, it's really important to separate a couple different questions, I think, with respect to AI. One would be, is it good for us, you know, right now even, and is it going to be good for us in the big picture?
1:14 – 3:08
Cal Newport’s “AI slowdown” argument
1. NLNathan Labenz
  And then I think that is a very distinct question from are the capabilities that we're seeing continuing to advance and, you know, at a pretty healthy clip. Um, so I actually found a lot of agreement with the, uh, Cal Newport podcast that you shared with me when it comes to some of the worries about the impact that AI might be having even already on people. You know, he, he goes over... looks over students' shoulders and watches how they're working and finds that basically he thinks that they are using AI to be lazy, which is, you know, no big revelation. I think a lot of teachers would tell you that.
2. ETErik Torenberg
  Shocker.
3. NLNathan Labenz
  He puts that in... Yeah. Puts that in maybe, um, more dressed up terms that, that people are not even necessarily moving faster, uh, but they're able to reduce the strain that the work that they're doing places on their own brains by kind of trying to get AI to do it. And, you know, if that continues, and I think, you know, he's been, um, I think a very valuable commenter on the impact of social media. Certainly, I think we all should be mindful of how is my attention span, uh, you know, evolving over time, and am I getting weak or, you know, uh, averse to hard work? Uh, those are not good trends if they are showing up in oneself. So I think he's really right to watch out for that sort of stuff. And then as we've covered it, you know, in many conversations in the past, I've got a lot of questions about what the ultimate impact of AI is gonna be, and I think he, he probably does too. But then when it comes to... It's a strange move from my perspective to go from, you know, there's all these sort of problems today and maybe in the big picture to, but don't worry, it's flatlining. Like kind of worry, but don't worry 'cause it's not really going anywhere further than this. Uh, or it's, you know, scaling has kinda petered out or, you know, we're not gonna get better AI than we have right now.
3:08 – 4:55
Are students getting lazy?
1. NLNathan Labenz
  Um, or even maybe the most easily refutable claim from my perspective is GPT-5 wasn't that much better than GPT-4. And that I think is where I really was like, "Whoa, wait a second." You know, I was with you on a lot of things, and some of the behaviors that he observes in the students, I would cop to having exhibited myself. You know, when I'm trying to code something these days, a lot of times I'm like, "Oh man, can't the AI just figure it out?" You know, I really don't wanna have to sit here and read this code and figure out what's going on. It's not even about typing the code anymore. You know, I'm way too lazy for that. But it's even about like figuring out how the code is working. Just, just, can't you just make it work? Uh, try again, you know, and just try again and I'll... I do find myself at times falling into those traps. But I would say a big part of the reason I can fall into those traps is because the AIs are getting better and better, and increasingly it's not crazy for me to think that they might be able to figure it out. So that, that's my kinda first, um, slice at the takes that I'm hearing. There's almost like a two-by-two matrix maybe that one could draw up where it's like, do you think AI is good or bad, n- you know, now and in the future? And do you think it's like not a big deal or a big deal? And I'm... I think it's both on the good and bad side. I definitely think it's a big deal. The, the thing that I struggle to understand the most is the people who kind of don't see the big deal that it seems pretty obvious to me and the, uh, you know, especially when it comes again to the, the leap from GPT-4 to GPT-5. Um, maybe one reason that that's happened a little bit is that there were just a lot more releases between GPT-4 and 5. So what people are comparing to is, you know, something that just came out a few months ago, like o3, right? That only came out a few months before GPT-5. Whereas with GPT-4, it was,
4:55 – 7:00
Nathan's two-by-two matrix of AI impact
1. NLNathan Labenz
  you know, shortly after ChatGPT, and it was all kind of this moment of like, whoa, this thing is like exploding onto the scene. A lot of people are seeing it for the first time. And if you look back to GPT-3, you know, there's a huge leap. I would contend that the leap is similar from GPT-4 to 5. These things are hard to score. There's no, you know, single number that you could put on it. Well, there's loss, uh, but of course, one of the big challenges is that like what exactly does a, a loss number translate into in terms of capabilities? So, you know, it's, it's very hard to, to describe what exactly has changed, uh, but we could go through some of the dimensions of change if you want to and, and, um, you know, enumerate some of the things that I think people maybe are starting to or ha-have come to take for granted and kinda forget, like that GPT-4 didn't have a lot of the things that now-
2. ETErik Torenberg
  Yeah
3. NLNathan Labenz
  ... you know, were sort of expected in the GPT-5 release because we'd seen them in 4o and o1 and o3 and all those, you know, things sort of, you know, maybe boiled the frog a little bit when it comes to how much progress people perceived in this last release. Well-
4. ETErik Torenberg
  Yeah. A c-c-couple reactions. So o-one is, and even to complicate your two-by-two even, even further in the sense of, you know, is it bad now versus is it bad later? Like Ca-Cal is not really ex-- you know, who we both admire, by the way, a lot. Ca-Cal's a, a, a great guy and a valuable contributor to the, the thought space. But he, he's not as concerned about sort of this sort of future AI concerns that, um, you know, sort of the AI safety folks and, um, many others are, are, are concerned about. He's more concerned about, you know, what it means to life for, you know, cognitive performance and, and development now in the same way that he's worried about, you know, s-social media's impact. And, you know, you, you think that's a, uh, you know, a, a, a concern, but n-nowhere near as big a concern as, as what to, what to expect in the future. And, and then also he, he, he presents sort of this theory of why we shouldn't worry about the future bec-because it's slowing down. And, and wh-why don't we just share what we th-- how we interpreted kind of his history, whi-which, as, as I interpreted it, was this idea of like, hey, we figured out... The, the simplistic version is we figured out this, this way such that if you throw a, a bunch of data,
7:00 – 11:05
Scaling laws, GPT-4.5, and what changed with GPT-5
1. ETErik Torenberg
  um, i-into the model, it, it, it gets better and, uh, sort of or-order of magnitude, and so the difference between GPT-2 and GPT-3, and then GPT-3 and GPT-4. Um, but then that sort of, you know, wa-was significant, the, the difference, but then it, it achieved sort of a, a diminishing returns significantly, and, and we're not seeing it in GPT-5, and thus we don't have to worry anymore. How, how would you edit the characterization of his view of sort of the, the history, and then we can get into the differences between 4 and 5?
2. NLNathan Labenz
  The scaling law idea, which is, you know, it's definitely worth agreeing, taking a moment to, to note that it is not a law of nature. You know, we do not have a principled reason to believe that scaling is some law that c- will go indefinitely. All we really know is that it has held through quite a few orders of magnitude so far. I think that it's really not clear yet to me whether or not the scaling laws have petered out, or whether we have just found a steeper gradient of improvement that is giving us better ROI on another, uh, front that we can push on. So they did train a, a much bigger model, which was GPT-4.5, and that did get released. And there are a number of interesting... You know, of course, there's a million benchmarks, whatever. The one that I zero in on the most in terms of understanding how GPT-4.5 relates to both o3 and GPT-5, and OpenAI obviously famously terrible at naming. We can all agree on that. Uh, I think a decent amount of this confusion and sort of disagreement actually does stem from [chuckles] unsuccessful, uh, naming decisions. 4.5 on this one benchmark called Simple QA, which is really just a super long tail trivia benchmark. It, it really just measures, do you know a ton of esoteric facts? And they're not things that you can really reason about. You either just have to know or don't know, uh, these particular facts. The o3 class of models got about a fifty percent on that benchmark, and GPT-4.5 popped up to like sixty-five percent. So in other words, it basically, of the things that were not known to the previous generation of models, it picked up a third of them. Now, there's obviously still two-thirds more to go, but I would say that's a pretty significant leap, right? These are super long tail questions. I would say most people would get like close to a zero. You know, you'd be like the person sitting there at the trivia night who like maybe gets one a night, is kinda what I would expect most people to do on Simple QA. And that, you know, checks out, right? Like, obviously, the models know a lot more than we do in terms of facts and just general, you know, information about the world. So at, at a minimum, you can say that GPT-4.5 knows a lot more. You know, a bigger model is able to absorb a lot more facts. Qualitatively, people also said s- in some ways, maybe it's better for creative writing. You know, it was never really trained with the same, uh, power of post-training that GPT-5 has had. And so we don't really have an apples to apples comparison, but people did, did still find some utility in it. I think maybe the, the way to understand why they've taken that offline and gone all in on GPT-5 is just that that model's really big. It's expensive to run. The price was like way higher. It was a full order of magnitude plus higher than GPT-5 is. And it's maybe just not worth it for them to consume all the compute that it would take to serve that. And maybe they just find that people are happy enough with the somewhat smaller models for now. I don't think that means that we will never see a bigger GPT-4.5 model with all that reasoning ability, and I, I would expect that that would deliver more value, especially if you're really going out and trying to do esoteric stuff that's, you know, pushing the frontier of science or what have you. Um, but in the meantime, the current models are really smart, and you can also feed them a lot of context. That's one of the big things that has improved so much over the last generation. When GPT-4 came out, at least the version that
11:05 – 17:05
Longer context windows and better reasoning
1. NLNathan Labenz
  we had as public users was only eight thousand tokens of context, which is like fifteen, you know, pages of, of text. So you were limited. You couldn't even put in like m- a couple papers. You would be overflowing the context. And this is where prompt engineering initially kind of became a thing. It was like, man, I've really only got such a little bit of information that I can provide. I gotta be really careful about what information to provide, uh, lest I overflow the thing and it just can't handle it. There were also, as context windows got extended, there were also versions of models where they could nominally accept a lot more, but they couldn't really functionally use them. You know, they sort of could, could, could fit them, uh, you know, at the API call level, but they-- the models would lose recall, or they, they'd sort of unravel as they got into longer and longer context. Now you have obviously much longer context, and the command of it is really, really good. So you can take dozens of papers on the longest context windows with GeminiAnd it will not only accept them, but it will do pretty intensive reasoning over them and with really high fidelity to those inputs. So that skill, I think, does kind of substitute for the model knowing facts itself. You could say, "Geez, we-- let's try to train all these facts into the model. We're gonna need, you know, a trillion or who knows, five trillion, however many trillion parameters to fit all these super long-tail facts." Or you could say, "Well, a smaller thing that's really good at working over provided context can... if people take the time or, you know, go to the trouble of providing the necessary information, it can kind of access the same facts that way." So you have a kind of, do I wanna push on this size, and do I wanna bake everything into the model? Or do I wanna just try to get as much performance out of a smaller, tighter model that I have? And it seems like they've gone that way, and I, I think basically just because they're seeing faster progress on that gradient. You know, in the same way that the models themselves are always kind of in the training process, taking a little step toward improvement, you know, the, the outer loop of the model architecture and the, the nature of the training runs and where they're gonna invest their compute is also kind of going that direction. And they're always looking at like, well, we could scale up over here, maybe get this kind of benefit a little bit, or we could do more post-training here and get this kind of benefit. And it just seems like we're getting more benefit from the post-training and the reasoning paradigm than scaling. But I don't think either one is, uh... I, I definitely don't think either one is, is dead. We haven't seen yet what 4.5 with all that post-training would look like.
2. ETErik Torenberg
  Yeah. And, and so, I mean, o-one of the things that you mentioned that, that Cal, um, you know, the analysis missed was, was that it way underestimated the value of, of, of, of, of, of extended reasoning, right? Um, and so what, what would it mean to, to fully sort of appreciate that?
3. NLNathan Labenz
  Well, I mean, a big one from just the last few weeks was that we had an IMO gold medal-
4. ETErik Torenberg
  Yeah
5. NLNathan Labenz
  ...with pure reasoning models, uh, with no access to tools from multiple companies. And, you know, that is night and day compared to what GPT-4 could do with math, right? We-- And these things are really weird. Like it's n- nothing I say here should be, uh, intended to suggest that people won't be able to find weaknesses in the models. I, I still use a tic-tac-toe puzzle to this day, where I take a picture of a tic-tac-toe board where some... the, one of the players has made a wrong move, um, that is not optimal and thus allows the other player to force a win. And I ask the models if somebody can force a win from this position. Only very recently, only the last generation of models are starting to get that right some of the time. Almost always before, they were like, "Tic-tac-toe is a solved game. You know, you can always get a draw." There's... And they would wrongly assess my board position as player can still get a draw. So there's a lot of weird stuff, right? The, the jagged, uh, capabilities frontier remains a, a real issue, and people are gonna find, you know, peaks and valleys for sure. But GPT-4, when it first came out, couldn't do anything approaching IMO gold problems. It was still struggling on like high school math. And since then, we've seen this high school math progression all the way up through the IMO gold. Now we've got the frontier math benchmark that is, I think now like up to twenty-five percent. It was two percent about a year ago, or even a little less than a year ago, I think. And we also just today saw something where, um, and I haven't absorbed this one yet, but somebody just came out and said that they had solved, uh, a, you know, canonical super challenging problem that no less than Terence Tao had put out. Um, and it was like this, you know, this thing happened in, I think, days or weeks of the model running versus it was 18 months, you know, that it took professional, not just any professional mathematicians, but like really, you know, the leading minds in the world to make progress on these problems. So yeah, I think that's really, um, you know, that's, that's really hard [chuckles] to, uh, jump in capabilities to miss. I also think a lot about the Google AI co-scientist, which we did an episode with. We can... You can, uh, check out the full story on that if you want to. But, you know, they basically just broke down the scientific method into a schematic, you know. And this is a lot of what happens when people... There's one thing to say the model will respond with thinking, and it'll go through a reasoning process, and, you know, the more tokens it, it spends at runtime, the better your answer will be. That's true. But then you can also build this scaffolding on top of that and say, "Okay, well, let me take something as broad and, you know, aspirational as the scientific method, and let me break that down into parts." Okay, there's hypothesis generation, then there's hypothesis evaluation, then there's, you know, experiment design, there's literature review, there's all these parts to the scientific method. What the team at Google did is created a pretty elaborate schematic that represented their best breakdown of the scientific method, optimized prompts for each of those steps, and then gave this resulting system, which is scaling inference now kind of two ways. It's both the chain of thought, but it's also all these different angles of attack structured
17:05 – 19:17
AI as scientist and real discoveries
1. NLNathan Labenz
  by the team. And they gave it legitimately unsolved problems in science, and in one particularly famous, kind of notorious case, it came up with a hypothesis which it wasn't able to verify because it doesn't have direct access to actually run the experiments in the lab. But it came up with a hypothesis to some open problem in virology that had stumped scientists for years, and it just so happened that they had also recently figured out the answer, but not yet published their results. And so there was this confluence where the scientists had experimentally verified, and Gemini, uh, in the form of this AI co-scientist, came up with exactly the right answer. And these are things that, like, literally nobody knew before. And GP- GPT-4 just wasn't doing that. You know, I mean, these are, uh, qualitatively new capabilities. That thing I think ran for days. You know, it probably cost hundreds of dollars, maybe into the thousands of dollars to run the inference. Um, you know, that's not nothing, but it's also like very much cheaper than, you know, years of grad students. And if you can get to those... caliber of problems and actually get good solutions to them. Like, you know, what would you be willing to pay, right, for that kind of thing? So yeah, I don't know. That's probably not a full appreciation. We could go on for a long time, but I would say in, in summary, GPT-4 was not able to push the actual frontier of human knowledge. I don't-- To my knowledge, I don't know that it ever discovered anything new. It's still not easy to get that kind of output from a GPT-5 or a Gemini 2.5 or, you know, a Claude Opus 4 or whatever. But it's starting to happen sometimes, and that in and of itself is a, a huge deal.
2. ETErik Torenberg
  Well then how do we explain the, the, the, the, the bearishness or the, or the kind of vibe shift around GPT-5 then? You know, one, one potential ex-ex-ex kind of contributor is this idea that if a lot of the... If, if the improvements are at the frontier, you know, not everyone is working with, you know, sort of advanced math and physics in a, in a day-to-day, and so maybe they don't see the benefits in, in their day-to-day lives in the, in the same way that, you know, sort of the jumps in ChatGPT were, were, were, were obvious and, and shaped the day-to-day.
3. NLNathan Labenz
  Yeah. I mean, I think a decent amount of it was
19:17 – 26:10
GPT-5’s shift and why launch perception matters
1. NLNathan Labenz
  that they kind of fucked up the launch, you know, simply put, right? They, like, were tweeting Death Star images, uh, which Sam Altman later came back and said, "No, you're the Death Star. I'm not the Death Star." But, uh, I think people thought that the Death Star was supposed to be the model. That was generally the... You know, the expectations were set extremely high. The actual launch itself was just technically broken, so a lot of people's first experiences of GPT-5... They, they've got this model router concept now where... And I think one, another way to understand what they're doing here is they're trying to own the consumer use case, and to own that, they need to simplify the product experience relative to what we had in the past, which was like, okay, you got GPT-4 and 4o and 4o Mini and o3 and o4 Mini and other things, you know, 4.5 was in there at one point. You got all these different models. Which one should I use for which? It's, like, very confusing to most people who aren't obsessed with this. And so one of the big things they wanted to do was just shrink that down to just ask your question, and you'll get a good answer, and we'll take that complexity on our side as the, the product owners. To do that, interestingly, and I, I don't have a great account of this, but one thing you, you might wanna do is kind of merge the models and figure out, just have the model itself decide how much to think or maybe even, you know, have the model itself decide how many of its experts, if it's a mixture of experts architecture it needs to use. Or maybe, you know, there's been a bunch of different, uh, research projects on, like, skipping layers of the model. If the task is easy enough, you could, like, skip a bunch of layers. So you might have hoped that you could genuinely on the back end merge all these different models into one model that would dynamically use the right amount of compute for the level of challenge that a given user query presented. It seems like they found that harder to do than they expected, and so the solution that they came up with instead was to have a router where the, the router's job is to pick, is this an easy query, in which case we'll send you to this model? Is it a medium? Is it a hard? And I think they just have two really, uh, models behind the scenes, so I think it's just really easy or hard. Certainly, the graphs that they showed, you know, basically showed the kind of with and without thinking. Um, the problem at launch was that that router was broken [chuckles] , so all of, all of the queries were going to the dumb model. And so a lot of people literally just got bad outputs, which were worse than o3 because they were getting non-thinking responses. And so the initial reaction of like, "Okay, this is dumb," and that sort of, you know, uh, traveled really fast. I think that kind of set the tone. My sense now is that as the dust has settled, most people do think that it is the best model available, and, you know, things like the METR, the infamous, uh, METR task length chart, it is the best. You know, we're now over two hours, and, um, it is still above the trend line. So if you just said, you know, do I believe in straight lines on graphs or not, and how should this latest data point influence whether I believe on these straight li- straight, um, lines on, you know, power, um, logarithmic scale graphs? It shouldn't really change your mind too much. It's still above the trend line. I talked to Zvi about this, Zvi Moskovitz, uh, legendary infovore and, uh, AI industry analyst on a recent podcast too, and kinda asked him same question. Like, why do you think the, you know, even some of the most plugged in, you know, sharp, uh, minds in the space s- have seemingly pushed timelines out a bit as a result of this? And his answer was basically just it resolved some amount of uncertainty. You know, you had a open question of maybe they do have another breakthrough. You know, maybe it really is the Death Star. Um, you know, if they surprise us on the upside, then all these short timeline... You know, we, we could have expected a, um... Yeah, I guess one, one way to think about it is like the, the distribution was sort of broad in terms of timelines, and if they had surprised on the upside, it might have narrowed and narrowed in toward the front end of the distribution. And if it, if they surprised on the downside or even just were, you know, purely on trend, then you would take some of your distribution from the very short end of the timelines and kind of push them back toward the middle or the end. And so his answer was like, AI 2027 seems less likely, but AI 2030 seems basically no less likely, maybe even a little more likely because some of the, the probability mass from the early years is now sitting there. So it's not that, um... I don't think people are, are moving the whole distribution out super much. I think they're maybe more just kind of shrinking the, uh, you know, it's getting a little tighter 'cause it's maybe not happening quite as soon as it seemed like it might have been. Uh, but I don't think too many people, at least that I, you know, think are really plugged in on this, are pushing out too much past 2030 at all. And by the way, you know, the-- obviously, there's a lot of, um, you know, disagreement. The way I kind of have always thought about this sort of stuff isDario says 2027, Demis says 2030. I'll take that as my range. So coming into GPT-5, I was kind of in that space, and now I'd say, well, I don't know, Dario's got, uh, what, what cards does he have up his sleeve? You know, they just put out 4.1 Opus, and in that blog post, they said, "We will be releasing more powerful updates to our models in the coming weeks." So they're due for something pretty soon. You know, maybe they'll be the ones to surprise on the upside this time, or maybe Google will be. Um, I wouldn't say 2027 is, is out of the question, but y-yeah, I would say 20, 2030 still looks just as likely as before. And again, from my standpoint, it's like that's still really soon, [chuckles] you know. So if we're on track, whether it's '28, '29, '30, uh, I don't really care. I, I try to frame my own work so that I'm kind of preparing myself and helping other people prepare for what might be the most extreme scenarios and kind of, you know, one of these things where if we aim high and we miss a little bit and we have a little more time, great. I'm sure we'll have plenty of things to do to use that extra time to be ready for, you know, whatever powerful AI does come online. Um, but yeah, I guess the... I don't, uh... My worldview hasn't changed all that much as a result of these, uh, summer's developments.
2. ETErik Torenberg
  Anecdotally, I, I don't hear as much about AI 2027 or situational awareness to the, to the same degree. I, I do talk to some people who've just moved it a, a, a few years back to, to your point. Um, uh, but, um, yeah, D-Darkash had his whole thing around, you know, he still, still believes in it, but sort of, um, you know, maybe because this ga-gap in continual learning or, or, or, or, or, or something to the effect that, um, you know, maybe it's just gonna be a bit slower to, to, to, to diffuse. Um, and, um, you know, METR's paper as, as you mentioned, showed that, uh, engineers are, are less productive, and so maybe there's, there's less of
26:10 – 36:20
Jobs, automation, and the misunderstood METR study
1. ETErik Torenberg
  a, a sort of concern around, um, you know, people being replaced in the next, ne-next few years in, in, in, in, in, in mass. I think w-when we spoke maybe a year ago about this, or I think you said something like 50% of 50% of jobs. Um, I'm curious if that's still your, your, uh, your litmus test or how you think about it.
2. NLNathan Labenz
  Well, for one thing, I think that METR paper is worth unpacking a little bit more because this was one of those things that was... And I, I am a big fan of METR, and I have no, um, you know, no shade on them because I do think do science, publish your results, like that's good. You don't have to, uh, make every experimental result and everything you put out conform to a narrative. But I do think it was a little bit, um, it was a little bit too easy for people who wanted to say that, "Oh, this is all nonsense," to latch onto that. And, you know, again, there is, there's something there that I would kind of put in the Cal Newport category too, where for me, maybe the most interesting thing was the users thought that they were faster, when in fact they seemed to be slower. So that sort of misperception of oneself, I think is really interesting. Personally, I think there's some explanations for that, that include like hitting go on the agent, going to social media and scrolling around for a while, and then coming back. The thing might have been done for quite a while by the time I get back. So honestly, one like really simple, and we're starting to see this in products, one really simple thing that the products can do to a-address those concerns is just provide notifications, [chuckles] like the thing is done now, so, you know, stop scrolling and, and come back and check its work. That in terms of just clock time, you know, it would be interesting to know like what applications did they have open. Maybe they took a little longer with Cursor than doing it on their own, but how m-much of the time was Cursor the active window and how much of it was, you know, some other random distraction while they were waiting? Um, but I think a more fundamental issue with that study, which again, wasn't really about the study design, but just in, in the sort of, you know, interpretation and kind of digestion of it, some of these details got lost. The... They basically tested the models or the, you know, the product Cursor in the area where it was known to be least able to help. This study was done early this year, so it was done with, you know, a kind of one, depending on how you wanna count, right? A couple, couple releases ago with code bases that are large, which again strange the-- strains the context window and, you know, that's one of the, the frontiers that has been moving. Very mature code bases with like high standards for coding and developers who really know their code bases super well, who've made a lot of commit, you know, commits to these particular code bases. So I would say that's basically the hardest situation that you could set up for an AI because the people know, you know, their stuff really well, the AI doesn't, the context is huge. People have already absorbed that through working on it for a long time. The AI doesn't have that, uh, that knowledge, and again, couple generations ago models. Um, and then a big thing too is that the user, the people were not very well-versed in the tools. Why? Because the tools weren't really able to help them yet. I think the sort of mindset of the people that came into the study in many cases was like, "Well, I haven't used this all that much because it hasn't really seemed to be super helpful." They weren't wrong in that assessment, given the, you know, the limitations. And you could see that in terms of the, um, some of the instructions and the help that the METR team gave to people. One of the things that is in the paper that they would... If they noticed that you weren't like, weren't using Cursor super well, they would give you some feedback on how to use it better. One of the things that they were telling people to do is make sure you @ tag a particular file to bring that into context for the model so that the model has, you know, the right context. And that's literally like the most basic thing that you would do in Cursor. You know, that's like the thing you would learn on your f- in your first hour or your first day of using it. So it really does suggest that these wereYou know, while very capable programmers like basically mostly novices when it came to using the AI tools. So I think the result is real, um, but I just, I would be very cautious about generalizing too much there. In terms of, I guess, what, what else, what, what was the other question? It-- What is the expectation for jobs? I mean-
3. ETErik Torenberg
  Yeah
4. NLNathan Labenz
  ... we're starting to see some of this, right? We are definitely seeing no less than like Marc Benioff has said that they've, you know, have been able to cut a bunch of headcount because they've got AI agents now that are responding to every lead. Um, Klarna, of course, is, you know, has said, um, you know, very similar things for a while now. They also, I think, have been a little bit misreported in terms of like, oh, they're backtracking off of that because they're, they're actually gonna keep some customer service people, not none. And I think that's a bit of an overreaction. Like, they may have some people who are just, you know, insistent on having a certain experience, and maybe they wanna provide that, and that makes sense. You know, it doesn't... I think you can have a, a, a spectrum of service offerings to your customers. [chuckles] I once coded up a pricing page for a sa-... I actually just vibe coded up a pricing page for a SaaS company that was like basic, uh, level with AI sales and service is one price. If you wanna talk to human sales, that's a higher price, and if you wanna talk to human sales and support, that's a, you know, third higher, higher price. And so like literally that might be what's going on, I think, in some of these cases, and it, it could very well be a very sensible option for people. But I just-- I do see the i-Intercom, I've got an episode coming up with, they now have this Fin agent that is solving like sixty-five percent of customer service tickets that come in. So, you know, what's that gonna do to jobs? Are there really like three times as many customer service tickets to be handled? Like, I don't know. I think there's kind of a relatively inelastic supply. Maybe you'll get somewhat more tickets if people expect that they're gonna get better, faster answers, but I don't think we're gonna see like three times more tickets. By the way, that number was like fifty-five percent three or four months ago. So, you know, as they ratchet that up, the ratios get really hard, right? At, at half ticket resolution, in theory, maybe you get some more tickets, maybe you don't need to adjust headcount too much. But when you get to ninety percent ticket resolution, you know, are you really gonna have ten times as many tickets or ten times as many hard tickets that the people have to handle? It seems just really hard to imagine that. So I don't think, I don't think these things go to zero probably in a lot of environments, but I do expect that you will see significant headcount reduction in a lot of these places. And the software one is really interesting because the elasticities are really unknown. You know, you can potentially produce X times more software per user or, you know, per, per cursor user or per developer at your company, whatever. But maybe you want that. You know, maybe there is no limit or no, you know, maybe the, the regime that we're in is such that if there's, you know, ten times more productivity, that's all to the good, and, you know, we still have just as many, uh, jobs because we want ten times more software. I don't know how long that lasts. Again, the ratios start to get challenging at some point. Um, but yeah, I think the bottle... You know, the old Tyler Cowen thing comes to mind, "You are a bottleneck. You are a bottleneck." Um, I think more often it is, are people really trying to get the most out of these things and, you know, are they using best practices, and have they, um, have they really put their minds to it or not? And, you know, often the, the real barrier is there. I was-- I've been working a little bit with a company that is doing, um, basically government doc review. I'll abstract a little bit away from the details. Really gnarly stuff like scanned documents, you know, handwritten, uh, filling out of forms, and they've created this auditor AI agent that just won a state-level contract to do the audits on like a million transactions a year of, of these, um, you know, these packets of documents, again, scanned, handwritten, all this kinda crap. Uh, and they just blew away the human, uh, workers that were doing the job before. So where are those workers gonna go? Like, I don't know. I don't... They're not gonna have ten times as many transactions. You know, I can be pretty confident in that. Um, are there gonna be a few still that are there to supervise the AIs and handle the weird cases and, you know, answer the phones? Sure. Um, maybe they, maybe they won't go anywhere. You know, the, the state, you know, the state may do a, a strange thing and, uh, just have all those people like sit around 'cause they can't bear to fire them. Like, who knows what the ultimate decision will be. But I do see a lot of these things where I'm just like, when you really put your mind to it and you identify what would create real leverage for us, can the AI do that? Can we make it work? You can take a pretty large chunk out of high volume tasks, uh, very reliably in today's world. And so the, the impacts I think are starting to be seen there on, on a lot of jobs. Humans I think are, you know, the leadership is maybe the bottleneck or the, the will in, in a lot of places might be the bottleneck, and software might be an interesting case where there is just so much pent-up demand perhaps that it may take a little longer to see those impacts-
5. ETErik Torenberg
  Yeah
6. NLNathan Labenz
  ... because you really do want, you know, ten or a hundred times as much software.
7. ETErik Torenberg
  What is in... Yeah. Let's, let's talk about code 'cause it's, you know, it's, it's where Anthropic made, made a big bet, um, ear-early on, you know, per-perhaps inspired by the sort of automated researcher, you know, recursive self-improvement, um, you know, sort of, uh, uto-
36:20 – 51:15
The future of coding, agents, and recursive self-improvement
1. ETErik Torenberg
  you know, desired future, and then, and we saw OpenAI, uh, make, make moves, um, th-there as well. So, uh, look, why don't we flush that out or, or talk a little about w-w-you know, what, what inspired that and where you see that going?
2. NLNathan Labenz
  You know, utopia or dystopia is really the big question there, I think, right? I mean, is maybe one part technical, two parts social in terms of why code has been so focal. The technical part is that it's really easy to validate code. You generate it, you can run it. If you get a runtime error, you can get the feedback immediately. It's, you know, somewhat harder to do functional testing. Replit recently, just in the last, like, 48 hours, released their V3 of their agent, and it now, in addition to, you know, code, code, code, try to make your app work, V2 of the agent would do that, and it could go for minutes and, you know, in some cases generate dozens of files. And I've had some magical experiences with that where I was like, "Wow, you just did that whole thing in one prompt, and it, like, worked amazing." Other times, it will sort of code for a while and hand it off to you and say, "Okay, does it look good? Is it working?" And you're like, "No, it's not. I'm not sure why." You know, you get into a, a back and forth with it. But the difference between V2 and V3 is that instead of handing the baton back to you, it now uses a browser and the vision aspect of the models to go try to do the QA itself. So it doesn't just say, "Okay, hey, I, uh, I tried my best, wrote a bunch of code. Like, let me know if it's working or not." It takes that first pass at figuring out if it's working. And, you know, again, that, that really improves the flywheel, just how much you can do, how much you can validate, how quickly you can validate it. The, the speed of that loop is really key to the pace of improvement. So it's a problem space that's pretty amenable to the sorts of, you know, rapid flywheel techniques. Second, of course, they, they're all coders, right, at these places, so they want to, you know, solve their own problems. That's, like, very natural. And third, I do think on the, you know, sort of social vision competition, uh, who knows where this is all going, they do wanna create the automated AI researcher. That's another data point, by the way, from... This was from the o3 system card. They showed a jump from, like, low to mid-single digits to roughly 40% of PRs actually checked in by research engineers at OpenAI that the model could do. So prior to o3, not much at all, you know, low to mid-single digits. As of o3, 40%. I'm sure those are the easier 40% or whatever. Again, there will be, you know, caveats to that. But that's-- you're entering maybe the steep part of the S curve there, and that's presumably pretty high-end. You know, I, I don't know how many easy problems they have at OpenAI, but presumably, you know, not that many relative to the rest of us that are out here making, uh, generic web apps all the time. So, you know, at 40%, you gotta be starting to, I would think, get into some pretty hard tasks, some pretty high-value stuff. You know, at that-- at what point does that ratio really start to tip where the AI is, like, doing the bulk of the work? Um, GPT-5 notably wasn't a big update over o3 on that particular measure. I mean, it also wasn't going back to the simple QA thing. Um, GPT-5 is generally understood to not be a scale-up relative to 4o and o3, and you can see that in the simple QA measure. It basically scores the same on these long-tail trivia questions. It's not a bigger model that has absorbed, like, lots more world knowledge. Um, it is, it is, you know... Cal is right, I think, in his analysis that it's, it's post-training, um, but that post-training, you know, is potentially entering the steep part of the S curve when it comes to the ability to do even the kind of hard problems that are happening at, uh, at OpenAI on the, on the research engineering front. And, you know, yikes. I, I'm a little worried about that, honestly. The, um, the idea that we could go from these companies having a few hundred research engineer people to having, you know, unlimited overnight, and, like, what would that mean in terms of how much things could change and also just our ability to steer that overall process? Um, I'm not super comfortable with the idea of the companies tipping into a recursive self-improvement regime, especially given the, the level of control and the level of unpredictability that we currently see in the models. But that does seem to be what they are going for. So in terms of, like, why, um, I think this has been the plan for quite some time. Even, you remember that leaked Anthropic, uh, fundraising deck from maybe two years ago, where they said that in 2025 and 2026, the companies that train the best models will get so far ahead that nobody else will be able to catch up? I think that's kinda what they meant. I think that they were projecting then that in the '25, '26 timeframe, they'd get this, like, automated researcher, and once you have that, how's anybody, you know, who doesn't have that gonna catch up with you? Um, now, obviously, some of that remains to be validated, but, um, I do think they have been pretty intent on that for a long time.
3. ETErik Torenberg
  F-five years from now, are there m-more engineers or fewer engineers?
4. NLNathan Labenz
  I t- I tend to think less. Um, you know, already if I just think about my own life and work, I'm like, would I rather have a model, or would I rather have, like, a junior marketer? I'm pretty sure I'd rather have the model. Would I rather have the models or a junior engineer? I think I'd probably rather have the models in a lot of cases. I mean, it obviously depends on, you know, the exact person you're talking about. Um, but truly forced choice today... Now, that-- and then you've got cost adjustment as well, right? I'm not spending nearly as much on my Cursor subscription as I would be on a, you know, an actual human engineer. So even if they have some advantages... You know, and I, I also have not scaffolded, um, I haven't gone full co-scientist, right, on my, uh, Cursor problems. I think that, that's another interestingYou start to see why folks like Sam Altman are so focused on questions like energy and the sev-seven trillion dollar build-out because these power law things are weird, and, you know, to get incremental performance for 10X the cost is weird. It's a, it's a-- it's definitely not the kind of thing that we're used to dealing with. But for many things, it might be worth it, and it still might be cheaper than the human alternative. You know, if it's like, well, Cursor costs me, whatever, 40 bucks a month or something, uh, would I pay $400 for, you know, however much better? Yeah, probably. Would I pay $4,000 for however much better? Well, it's still, you know, a lot less than a full-time human engineer. Um, and the costs are obviously coming down dramatically too, right? That's another huge thing. GPT-4 was way more expensive. It's like 90, uh... It's like a 95% discount from GPT-4 to GPT-5. That's, you know, no small thing, right? I mean, it's... Apple's app was a little bit hard because the chain of thought does spit out a lot more tokens, and so you get, you give back a little. On a per token basis, it's dramatically cheaper. More tokens generated, um, you know, does, does eat back into some of that savings. But everybody seems to expect the trends will continue in terms of prices continuing to fall, and so, you know, how many more of these, like, price reductions do you have to, to then be able to, you know, do the power law thing a few more times? I guess I think, yeah, I think I, I think less. Um, and I, I think that's probably true even if we don't get, like, full-blown AGI that's, you know, better than humans at everything. I think you could easily imagine a situation where of however many million people are currently employed as professional software developers, some top tier of them that do the hardest things can't be replaced. But there's not that many of those, you know. They-- and the, the real, like, rank and file, you know, the people that over the last 20 years were told, "Learn to code, you know. That'll be your thing." Like, the people that are the really top, top people didn't need to be told to learn to code, right? They just... It was their thing. They had a passion for it. They were amazing at it. Um, we may not-- it wouldn't, wouldn't shock me if we, like, still can't replace those people in three, four, five years' time. But I would be very surprised if you can't get your nuts and bolts web app, mobile app type things spit out for you-
5. ETErik Torenberg
  Yeah
6. NLNathan Labenz
  ...for far less, um, and far faster than, and probably honestly with significantly higher quality and, and less back and forth, um, with an AI system than, you know, with your kind of middle-of-the-pack developer, um, in that timeframe.
7. ETErik Torenberg
  O-o-one thing I do wanna call out, you know, there are definitely people have concerns about progress moving too fast, but there's also concern, and maybe it's rising, about progress not moving fast enough in the sense that, um, you know, a third of the stock market is, is Mag 7. Um, you know, AI CapEx is, you know, over 1% of GDP, and so we are kind of relying on some of this progress in order to, uh, sort of sustain our, sustain our economy.
8. NLNathan Labenz
  Yeah. And with the, um... You know, another thing that I would say has been slower to materialize than I would've expected are AI culture wars or s- you know, sort of the, the ramping up of protectionism of various industries. We just saw, um, Josh Hawley, I don't know if he introduced a bill or just said he intends to introduce a bill to ban self-driving cars nationwide.
9. ETErik Torenberg
  Wow.
10. NLNathan Labenz
  Um, you know, God help me.
11. ETErik Torenberg
  [chuckles]
12. NLNathan Labenz
  Uh, I've dreamed of self-driving cars since I was a little kid, truly. Like, sitting at red lights, I used to be like, "There's gotta be a way that this could be automated."
13. ETErik Torenberg
  Did we-- I think we took a Waymo together.
14. NLNathan Labenz
  [chuckles] Yeah, and it, it's so good. Um, and the safety, you know, no, I think whatever people wanna argue about jobs, it's gonna be pretty hard to say-
15. ETErik Torenberg
  Right
16. NLNathan Labenz
  ...30,000 Americans should die every year, uh, so that people's incomes don't get disrupted. It seems like you have to be able to get over that hump and say, like, the, you know, saving all these lives, if nothing else-
17. ETErik Torenberg
  Yeah
18. NLNathan Labenz
  ...is just really hard to, uh, to argue against. But we'll see. You know, I mean, he's, uh, not, uh, without influence, obviously. So yeah, I mean, I am, uh, very much on team abundance, and, you know, my old mantra, I've been saying this less lately, but adoption accelerationist, hyperscaling pauser. The tech that we have, you know, could do so, so much for us even as is. I think if, if progress stopped today, I still think we could get to 50 to 80% of work automated over the next, like, five to 10 years. It would be a real slog. You'd have a lot of, you know, co-scientist type breakdowns of complicated tasks to do. You'd have a lot of work to do to go sit and watch people and say, "Why are you doing it this way? What's going on here? What's-
19. ETErik Torenberg
  Yeah
20. NLNathan Labenz
  ...this-- You handled this one differently. Why did you handle that one differently?" All this, uh, tacit knowledge that people have and the kind of knowhow, procedural, um, you know, just instincts that they've developed over time, those are not documented anywhere. They're not in the training data, so the AIs haven't had a chance to learn them. But again, no, uh, if I s- when I say, like, no breakthroughs, I, I still am allowing therefore like, you know, fine-tuning of things to just, like... Capabilities that we have that haven't been applied to particular problems yet.
21. ETErik Torenberg
  Right.
22. NLNathan Labenz
  Um, so just going through the economy and, and just sitting with people and being like, "Why are you doing this?" You know, "Let's, let's document this. Let's get the, you know, the model to learn your particular niche thing." Um, that would be a real slog, and in some ways, I kinda wish that were the future that we were gonna get, uh, because it would be a methodical, you know, kind of one step, one foot in front of the other, you know, no quantum leaps. Like, it would probably feel pretty manageable, I would think, in terms of the-Pace of change, hopefully society could, you know, could absorb that and kind of adapt to it as we go without, you know, one day to the next like, oh my God, you know, all the drivers, you know, are, are getting replaced or that one will be a little slower because you do have to have the actual physical build out. Um, but in some of these things, you know, customer service could get ramped down real fast, right? Like if a call center has something that they can just drop in and it's like this thing now answers the phones and talks like a human and has a higher success rate and scales up and down. Um, one thing we've seen at Waymark, small company, right? We've always prided ourselves on customer service. We do a really good job with it. Our customers really love our customer success team. But I looked at our intercom data, and it takes us like half an hour to resolve tickets. Uh, we respond really fast. We respond in like under two minutes most of the time. But when we respond, you know, two minutes is still long enough that the person has gone on to do something else, right? It's the same thing as with the cursor thing that we were talking about earlier, right? They've tabbed over to something else. So now we get the response back in two minutes, but they are doing something else. So then they come back at, you know, minute six or whatever, then they respond. But now our person has gone and done something else. So the resolution time, even for like simple stuff, can be easily a half an hour, and the AI, you know, it just responds instantly, right? So you, you don't have to have that kind of back and forth. You're just in and out. So I do think some of these categories could be really fast changes. Um, others will be slower. But yeah, I mean, I kinda wish we had that, um, I kinda wish we had that slower path in front of us. My best guess, though, is that we will probably continue to see things that will be significant leaps and that there will be like actual disruption. Another one that's come to mind recently, you know, maybe we can get the abundance department on these new antibiotics. Have you seen this, uh, development?
23. ETErik Torenberg
  No. Tell us about it.
24. NLNathan Labenz
  I mean, it's not a language model. I think that's another thing people really underappreciate or that, you know, you could kind of look back at GPT-4 to 5 and then imagine a pretty easy extension of that. So GPT-4, initially when it launched, the-- we didn't have image, um, understanding capability. They did demo it at the time of the launch, but it wasn't released for some months later. The first version that we had could understand images, could do a pretty good job of understanding images, still with like jagged capabilities and whatever. Um,
51:15 – 1:27:00
Beyond chatbots: multimodal AI and robotics
1. NLNathan Labenz
  now with the new NanoBanana from Google, you have this like basically Photoshop-level ability to just say, "Hey, take this thumbnail." Like we could take our two, uh, feeds right now, you know, take a snapshot of you, a snapshot of me, put them both into NanoBanana and say, "Generate the thumbnail for the YouTube preview featuring these two guys. Put them in the same place, same background, whatever." It'll mash that up. You can even have it, you know, put text on top, progress since GPT-4, whatever we wanna call it. Um, GPT-5 is not a bust. Uh, and it'll spit that out. And you see that it has this deeply integrated understanding that bridges language and image, and that's something that it can take in, but now it's also something it can put out as all, as part of one core model with like a single unified intelligence. That I think is gonna come to a lot of other things. Um, we're at the point now with these biology models and material science models where they're kind of like the image generation models of a couple years ago. They can take a real simple prompt, and they can do a generation. But they're not deeply integrated where you can have like a true conversation back and forth, um, and have that kind of unified understanding that bridges language and these other modalities. But even so, it's been enough for this group at MIT to use some of these relatively, you know, narrow purpose-built biology models and create totally new antibiotics, new in the sense that they have a new mechanism of action, like they're, they're affecting the bacteria in a new way. And, uh, notably, they, they do work on, um, antibiotic-resistant bacteria. This is some of the first new antibiotics we've had in a long time. Now they're gonna have to go through... You know, when I say the, get the abundance department on it, it's like, where's my Operation Warp Speed for these new antibiotics, right? Like, we've got people dying in hospitals from drug-resistant strains all the time. Um, why is nobody, you know, crying about this? I think one of the things that's happening to our society in general is just so many things are happening at once. It's kind of the, it's like the flood the zone thing, except like there's so many AI developments flooding the zone that nobody can even keep up with all of those. And that's, that's come from me, by the way, too. I would say two years ago, I was like pretty in command of all the news, then a year ago, I was starting to lose it, and now I'm like, "Wait a second, there was new antibiotics developed?" You know, and I'm kind of, um, missing things, you know, just like everybody else, despite my best efforts. But key point there is AI is not synonymous with language models. There are AIs being developed with pretty similar architectures for a wide range of different modalities. We have seen this play out with text and image, where you had your text-only models, and you had your image-only models, and then they started to come together, and now they've come really deeply together. And so I think you're gonna see that across a lot of other modalities over time as well. Um, and there's a lot more data there. You know, we might... I, I don't know what it means to like run out of data, um, in the reinforcement learning paradigm. There's always more problems, right? There's always some-something to go figure out. There's always something to go engineer. The feedback is starting to come from reality, right? That was one of the things Elon talked about on the Grok 4 launch was like, maybe we're running out of problems we've already solvedAnd, you know, we only have so much of those sitting around in inventory. We only have one internet, you know, we only have so much of that stuff. But over at Tesla, over at SpaceX, like we're solving hard engineering problems on a daily basis, and they seem to be never-ending. So when we start to give the next generation of the model these power tools, the same power tools that the professional engineers are using at those companies to solve those problems, and the AIs start to learn those tools, and they start to solve un- previously unsolved engineering problems, like that's gonna be a really powerful signal that they will be able to learn from. And now again, fold in th-those other modalities, right? The ability to have sort of a sixth sense for, you know, the space of material science possibilities. When you can bridge or, or unify the understanding of language and those other things, I think you start to have something that looks kind of like super intelligence, even if it's like not able to, you know, write poetry at a, a superhuman level necessarily. Its ability to see in these other spaces is going to be truly a superhuman, uh, thing that I think will be pretty hard to miss.
2. ETErik Torenberg
  You said that that was one thing that Cal's analysis missed is just the lack of appreciation for non-language modalities and, and how they drive in some of the innovations that you're talking about.
3. NLNathan Labenz
  Yeah. I think people are often just kind of equating the chatbot experience with AI broadly.
4. ETErik Torenberg
  Yeah.
5. NLNathan Labenz
  And, you know, the, that, that, uh, conflation will not last probably too much longer because we are gonna see self-driving cars unless they get banned. Um, and that's a, you know, very different kind of thing. And talk about your impact on jobs too, right? It's like, what, four or five million professional drivers in the United States? Um, that is a big, that is a big deal. I don't think most of those folks are gonna be super keen to learn to code, and even if they do learn to code, [chuckles] you know, I'm not sure how long that's gonna last. So that's gonna be a disruption. And then general robotics is like not that far behind. You know, the... And this is one area where I do think China might be actually ahead of the United States right now. But regardless of whether that's true or not, you know, these robots are getting really quite good, right? They can like walk over all these obstacles. I mean, these are things that a few years ago they just couldn't do at all. You know, they j- they could barely balance themselves and walk a few steps under ideal conditions. Now you've got things that you can like literally do a flying kick [chuckles] and it'll like absorb your kick and shrug it off and just keep going. Uh, you know, right itself and, and, uh, continue on its way. Super rocky, you know, uneven terrain, all these sorts of things are getting quite good. Um, you know, the same thing is working everywhere. I think one of the, the other thing that's kind of... There's always a lot of detail to the work, so it's, it's a sort of inside view, outside view, right? Inside view, you're like, there's always this minutia, there's always, you know, these problems that we had and things we had to solve. But you zoom out and it looks to me like the same basic pattern is working everywhere. And that is like if we can just gather enough data to do some pre-training, you know, some kind of raw, rough, you know, not very useful, but just enough at least to kind of get us going, then we're in the game. And then once we're in the game, now we can do this flywheel thing of like, you know, rejection sampling, like have it try a bunch of times, take the ones where it succeeded, you know, re-fine-tune on that. The RLHF, you know, feedback, the, the sort of preference, take two, which one was better, fine, you know, fine-tune on that. The reinforcement learning. All these techniques that have been developed over the last few years, it seems to me they're absolutely gonna apply to a problem like a humanoid robot as well. And that's not to say there won't be a, you know, lot of work to figure out exactly how to do that. But I think the big difference between language and robotics is really mostly that there just wasn't a huge repository of data to train the robots on at first. And so you had to do a lot of hard engineering to make it work at all, you know, to even stand up, right? You had to have all these control systems and whatever, um, 'cause there was nothing for them to learn from in the way that the language models could learn from the internet. But now that they're working at least a little bit, you know, I think all these kind of refinement techniques are gonna work. It'll be interesting to see if they can get the error rate low enough that I'll actually like allow one in my house around my kids. Um, you know, that they'll probably be, um, better deployed in like factory settings first, more controlled environments than, uh, the chaos of my house as you, you know, have seen in this, uh, in this recording. But I do think they're gonna, they're gonna work.
6. ETErik Torenberg
  What's the state of agents m-more, more broadly, uh, at, at the moment? Where, where do you-- how do you see things playing out? Where does it go?
7. NLNathan Labenz
  Well, broadly, I think, you know, we're-- it's the, the task length story from METR of the, you know, every seven months or every four months doubling time. We're at two hours-ish with GPT-5. Replit just said their new Agent V3 can go 200 minutes. That-- If that's true, that would even be a new, you know, high point on the, um, on that graph. Again, it's a little bit sort of apples to oranges 'cause they've done a lot of scaffolding. How much have they broken it down? Like, how much scaffolding are you allowed to do, you know, with these things before you sort of are off of their chart and onto maybe a different chart? But if you extrapolate that out a bit and you're like, okay, take, take the four-month case just to be a little aggressive, um, that's three doublings a year. That's eight X task length increase per year. That would mean you go from two hours now to two days in one year from now. And then if you do another eight X on top of that, you're looking at basically, say, two days to two weeks of work in two years. That would be a big deal, you know, to say the least, if you could delegate an AI two weeks worth of work and have it do, uh, you know, even half the time, right? The METR thing isThat they will succeed half the time on tasks of that size. But if you could take a two-week task and have a 50% chance that an AI would be able to do it, even if it did cost you a couple hundred bucks, right? It's like, well, that's again, a lot less than it would cost to hire a human to do it. Um, and it's all on demand. It's kind of, you know, it's immediately available. Um, if I'm not using it, I'm not paying anything. Transaction costs are just like a lot lower, the whole, you know, many, many other aspects are favorable for the AI there. So, you know, that would suggest that you'll see a huge amount of automation in, in all kinds of different places. The other thing that I'm watching, though, is the reinforcement learning does seem to bring about a lot of bad behaviors. Rein-- um, reward hacking being one. You know, the, the any sort of gap between what you are rewarding the model for and what you really want can become a big issue. Um, we've seen this in coding in many cases where the AI will, Claude is like notorious for this, will put out a unit test that always passes, you know, that just has like return true in the unit test. Why is it doing that? [chuckles] Like, well, uh, it must have learned that what we want is for unit tests that pass, and we want it to pass unit tests. But we didn't mean to write fake unit tests that always pass, but that technically did, you know, satisfy the reward condition. And so we're seeing those kinda weird behaviors. With that comes this like scheming kinda stuff. We, we don't really have a great handle on that yet. There is also situational awareness that seems to be on the rise, right? Where the models are like increasingly in their chain of thought, you're seeing things like, "This seems like I'm being tested." Um, you know, maybe I should be conscious of what my tester is really looking for here, and that makes it hard to evaluate models in tests because you don't know if they're actually gonna behave the same way when they're out in the real world. So those, you know... I wouldn't say there's a high-level, uh, or high confidence prediction, but like one model of the future I've been playing with is the task length keeps doubling, while at the same time these weird behaviors pop up and then are suppressed. And we have seen in the Claude 4 and in the GPT-5 system cards, Claude 4 reported, I think, a two-thirds reduction in, um, reward hacking and, you know, in GPT-5 they reported a few different dimensions but, you know, say something similar, um, reduction in deceptive behavior. But those behaviors kind of just emerged, so it's sort of like weird behavior emerges, then they sort of figure out how to tamp it down, but not entirely. Presumably in the next generation, they'll like tamp it down some more, but maybe some new additional weird behavior could emerge, and then they'll have to kinda tamp that one down. All the while, the, the tasks are expanding in scope, you know, with, uh, every four months doubling. So you could end up in a world where you can delegate really like major things to AIs, but there's some small but not necessarily totally vanishing chance that it like actively screws you over [chuckles] in the way that it is trying to do that task. And so you have a, you know, maybe it's like, okay, I think here I'm about to get two weeks worth of work done for 100 bucks. Okay, that's amazing. But there's also a one in 10,000 chance that it like legitimately, you know, attacks me in a, in a like meaningful way. Some of the things that we have seen, these are like fairly famous at this point, but in the Claude 4 system card, they reported blackmailing of the human. The, the setup was that the AI had access to the engineer's email, and they told the AI that it was gonna be like replaced with a, you know, a less ethical version or something like that. It didn't want that, and it found in the engineer's email that the engineer was having an affair, so it started to blackmail the engineer-
8. SPSpeaker
  Oh my God
9. NLNathan Labenz
  ... to, so as to avoid being replaced with a less ethical version. People, I think, are way too quick, in my view, to move past these, uh, anecdotes. People are sort of often like, "Well, you know, they set it up that way, and, you know, that's not really realistic." But another one was whistleblowing. You know, there was another thing where they sort of set up this dynamic where there was some, you know, unethical, illegal behavior going on. And again, the model had access to this data, and it decided to just email the FBI and, and tell, uh, the FBI about it. So first of all, I don't think we really know what we want. You know, to some degree, maybe you do want AIs to report certain things to authorities. Um, that could be one way to think about the bioweapon risk. You know, it's like not only should the models refuse, but maybe they should report you to the authorities if you're actively trying to create a bioweapon. Um, I certainly don't want them to be doing that too much. I don't wanna live under the, you know, surveillance of, um, Claude 5 that's always gonna be, you know, threatening to turn me in. But I do sort of want some people to be turned in if they're doing sufficiently bad things. We don't have a good resolution society-wide on, you know, what we want the models to even do in those situations. Um, and I think it's also, you know, it's like, yes, it was set up, yes, it was research, but it's a big world out there, right? We got a billion users already on these things, and we're plugging them into our email. So they're gonna have very deep access to information about us. You know, I don't know what you've been doing in your email. I don't... I hope there's nothing too crazy in mine, but like now I gotta think about it a little bit, right? What, what did I... Have I ever done anything that I, you know, geez, I don't know. Um, or, or even that it could misconstrue, right? Like it's obviously not, um, maybe I didn't even really do anything that bad, but it just misunderstands what exactly was going on. SoThat could be a weird, you know, if there's one thing that could kind of stop the agent momentum, in my view, it could be like the 1 in 10,000 or whatever, you know, we ultimately kind of push the, the really bad behaviors down to, is maybe still just so spooky to people that they're like, "I can't deal with that." You know? And that might be hard to resolve. So, well, you know, what happens then? Um, you know, it's hard to check two weeks' worth of work every couple hours or whatever, right? Like, that's part of where the, where the whole... Then you bring another AI in to check it. You know, that's again, where you start to get to the, now I see why we need more electricity and, and $7 trillion of build-out is yikes, you know. They're gonna be producing so much stuff I can't possibly even review it all. [chuckles] I need to rely on another, uh, AI to help me do the review of the first AI to make sure that if it is trying to screw me over, you know, somebody's catching it. I can't monitor that myself. I think Redwood Research is doing some really interesting stuff like this, where they are trying to get systematic on like, okay, let's just assume this is quite a different, quite a departure from the traditional AI safety work, where the, you know, the big idea traditionally was, let's figure out how to align the models, make them safe, you know, make them not do bad things. Great. Redwood Research has taken the other angle, which is, let's assume that they're gonna do bad stuff. They're gonna be out to get us at times. Um, how can we still work with them and get productive output and, you know, get value without, you know, uh, fixing all those problems? And that involves like, again, all these sort of AIs supervising other AIs and, um, crypto might have a place to, to-- a role to play in this. Um, another episode coming out soon with Illia Polosukhin, who's the founder of NEAR. Really fascinating guy, 'cause he was one of the eight-
10. ETErik Torenberg
  Yeah. Interesting
11. NLNathan Labenz
  ... authors of the Attention Is All You Need paper, and then he started this NEAR company. It was originally an AI company. They took a huge detour into crypto because they were trying to hire task workers around the world and couldn't figure out how to pay them. So they were like, "This sucks so bad to pay these task workers in all these different countries that we're trying to get data from, that we're gonna pivot into a whole blockchain, uh, side quest." Now they're coming back to the AI thing, and their, their tagline is, "The blockchain for AI."
12. ETErik Torenberg
  Mm.
13. NLNathan Labenz
  And so you might be able to get, you know, a certain amount of control from, you know, the, the sort of crypto security that the, the blockchain-type technology can provide. But I could see a scenario where the, the, these, the bad behaviors just become so costly when they do happen that people kinda get spooked away from using the frontier capabilities in terms of just like how much, you know, work the, the AIs can do. But that wouldn't be a, that wouldn't be a pure capability stall out. It would be a, we can't solve, you know, some of the long tail safety issues-
14. ETErik Torenberg
  Yeah
15. NLNathan Labenz
  ... challenge. And, you know, that, if that is the case, then, you know, that'll be, um, that'll be an important [chuckles] fact about the world too.
16. ETErik Torenberg
  Yeah.
17. NLNathan Labenz
  I, I always... Nobody ever seems to solve any of these things like 100%, right? They, they always, every, every generation it's like, "Well, we reduced hallucinations by 70%. Oh, we reduced deception by two-thirds. We reduced, um, you know, scheming or, or whatever by however much." But it's always still there, you know? And it's, and if you take the even, you know, lower rate and you multiply it by a billion users and thousands of queries a month, and agents running in the background and processing all your emails and, you know, all the deep access that people sort of envision them happening, it could be a pretty weird world where there's just this sort of negative lottery of like AI accidents. Um, another episode coming up is with the AI Underwriting Company, and they are trying to bring the insurance industry and all the, you know, the wherewithal that's been developed there to price risk, figure out how to, you know, create standards. You know, what can we allow? What sort of guardrails do we have to have to be able to insure this kind of thing in the first place? Um, so that, that'll be another really interesting area to watch is like, can we sort of financialize those risks, um, in the same way we have, you know, with car accidents and all, all these other mundane things. But the, the space of car accidents is only so big. The space of weird things that AIs might do to you, um, you know, as they have weeks' worth of runway, is much bigger. And so it's, it's gonna be a hard challenge, but, you know, people are, people are work-- We got some of our best people working on it.
18. ETErik Torenberg
  What, what do you make of the claim that 80% of AI startups have Chinese open models? Um, and what do you make of the claim and, and the impli- uh, so implications?
19. NLNathan Labenz
  I think that may be true. That probably is true with the one caveat that it is only measuring companies that are using open source models at all. I think most companies are not using open source models, and I would guess, you know, the vast majority of tokens being processed by American AI startups are... They're, they're API calls, right? To, to just the usual suspects. Um, so weighted by actual usage, I would say still the majority, as far as I could tell, would be going to commercial models. Um, for those that are using open source, I do think it's true that the Chinese models have become the best. Um, you know, the American bench there was always kinda thin, right? It was basically Meta that was willing to put in huge amounts of money and resources and then open source it. You've got, you know, um, Paul Allen-funded group, the Allen Institute for AI, AI2. Um-You know, they're, they're doing good stuff too, but they don't have pre-training resources. So they do, you know, really good post-training and, and open source their recipes and all that kind of stuff. So it's not like American open source is bad, you know. And again, it's a-- the ti-- the-- this is another way in which I think you can really validate that things are moving quickly, because if you take the best American open source models and you take them back a year, they are probably as good, if not a little better, than anything that we had commercially available at the time. If you... compared to Chinese, you know, they have, I think, uh, surpassed. So there's been like pretty clear change at the frontier. I think that means that the best Chinese models are like pretty clearly better than anything we had a year ago, um, commercial or otherwise. So... Yeah, I mean, that just means like things are moving. I think that's like, hopefully I've, uh, made that case compellingly, but that's another data point that I think makes it hard to... You c-- I don't think you can believe both that, um, the Chinese models are now the best open source models and that AI has stalled out and we haven't seen much progress since GPT-4. Like, those seem to be kind of contradictory notions. Um, I believe the, the one that is wrong is the lack of progress. In terms of what it means, I mean... I don't really know. It's, uh... We're not gonna stop China. Yeah, the, the whole... I've always been a skeptic of the no selling chips to China thing. The... notion originally was like, we're gonna prevent them from doing, you know, some super cutting-edge military applications, and it was like, well, we can't really stop that, um, but we can at least stop them from training frontier models. And then it was like, eh, well, we can't necessarily really stop that, but now we can, you know, at least keep them from like having tons of AI agents. We'll, we'll have like way more AI agents than they do. And I don't love that line of thinking really at all. Um, but one upshot of it potentially is they just don't have enough compute available to provide inference as a service, you know, to the rest of the world. So instead, the best they can do is just say, "Okay, well, we'll train these things, and, you know, you can figure it out. Here, h- here you go, like, have at it." Um, it's kind of a soft power play, presumably. Um, I did an episode with, uh, Anjana from a16z, who I, I thought really did a great job of providing the perspective of what I, what I've started calling countries three through 193. If the US and China are one and two, three through-- there's a big gap. You know, there's like, I think the US is still ahead, but not by that much in terms of research and, you know, ideas relative to China. We do have this compute advantage, and that does seem like it matters. One of the upshots may be that they're op- open sourcing, and countries three through 93 are like, or three through 193 are significantly behind. Um, so for them, it's a way to, you know, try to bring more countries over to the Chinese camp, potentially in the US-China rivalry. It seems like the model everybody-- And I don't like this at all. I, I, I don't like technology decoupling. Um, as somebody who worries about, you know, who's the real other here? I always say the, the real other are the AIs, not the Chinese. So if we do end up in a situation where, yeah, it's like, you know, we're seeing some crazy things, it would be really nice if we were on basically the same technology paradigm to the degree that we really decouple and, you know, not just the chips are different, but maybe the ideas start to become very different. Publishing gets shut down, you know, t-tech, tech trees evolve and k- and kind of grow apart. Um, that to me seems like a recipe for... You know, it's harder to know what the other side has. It's harder to trust one another. It seems to feed into the arms race dynamic, which I do think would, you know, is, is a real, uh, existential risk factor. I would hate to see us, you know, create another sort of mad type dynamic where we all live under the threat of AI destruction, um, but that very well could happen. And so, yeah, I don't know. I, I, I do kind of, um, have some sympathy for the recent decision that the administration made to be willing to sell the H20s to China, and then it was funny that they turned around and rejected them, which to me seemed like a mistake. Uh, I don't know why they would be rejecting them. If I were them, I would buy them. Um, and I would maybe, I would maybe sell inference on the models that I've just been, uh, creating, and I would try to make my money back doing that. But in the meantime, they can at least, you know, demonstrate the greatness of the Chinese nation by showing that they're not, uh, far behind the frontier, and they can also make a pretty powerful appeal to countries three through 193 and say like, you know, "Look, do you really wanna... Do you see how the US is acting? Uh, in general, you know, you really wanna... They, they cut us off from chips. Uh, they had a, even a long-- You know, the last administration had an even longer list of countries that couldn't get chips. This administration's doing all kinds of crazy stuff. You know, you get 50% tariffs here, there, whatever. Um, how do you know you can really rely on them to continue to provide you AI into the future? Well, you can rely on us. We open source the model. You can have it. Um, so, you know, come work with us and buy our chips, 'cause by the way, our models will, you know, as we mature, they'll be optimized to run on our chips." So I don't know. That's a complicated stuff, a complicated, uh, situation. I do think it's true. I, I don't think the adoption is as high as that 80%. I think that is, you know, within that subset of companies that are doing stuff with open source. We're gonna experiment with that at Waymark, but we-- to, to be honest, we have never done anything with an open source model in our productTo present. Everything we've ever done has been through commercial. Um, at this point, we are going to try doing some reinforcement fine-tuning. We are gonna do that on a Qwen model, I think first. Um, so, you know, that'll put us in that 80%. But I'm guessing that at the end of the day, we'll take that Qwen model, we'll do the reinforcement fine-tuning, and we'll probably get roughly up to as good as, you know, GPT-5 or Claude 4 or whatever, and then we'll say, "Okay, do we really want to have to manage inference ourself? How much are we really gonna save?" And at the end of the day, I would guess we probably are still gonna end up just being like, "Eh, we'll pay a little bit more on a monthly bill basis for one of these frontier models. They're a little bit better maybe still, and, you know, it's operationally a lot easier, and they'll have upgrades," you know? Um, so yeah. I mean, of course, there's regulated industries. There's al- there's a lot of places where, you know, you, you have hard constraints that you just can't get around, and that forces you to do those Chinese thing, uh, Chinese models. Then there's also gonna be the question of like, are there backdoors in them? Um, you know, people have seen the sleeper agents project, where a model was trained to be good up until a certain point of time and, you know, people put the today's date in the system prompt all the time, right? "Today's date is this. You are Claude," you know, "Here you go." So and then that, that's gonna be another kind of thing for people to worry about. Um, and we don't really have a great... There, there have been some studies. Anthropic did a thing where they trained models to have some hidden objectives and then challenged teams to figure out what those hidden objectives were. And with certain interpretability techniques, they were able to figure that stuff out relatively quickly. So you might be able to get enough confidence that you take this open source thing, you know, created by some Chinese company, whatever, and then put it through, you know, some sort of, not exactly audit, 'cause you can't trace exac-exactly what's happening, but some sort of examination, you know, to see, can we detect any hidden goals or any, you know, secret backdoor bad behavior whatevers, and maybe with enough of that kind of work, you could be confident that you don't have it. Um, but the more and more critical this stuff gets, you know, again, going back to that task length doubling, weird behavior, now you gotta add into the mix, what if they intentionally programmed it to do certain bad things under certain, you know, rare circumstances? Um, we're just headed for a really weird future, you know. The-- We've got all these... There's, there's no limit to it. You know, all these things are valid concerns. They often are in direct tension with each other. Um, I don't... I, I'm not one who, uh, you know, wants to see one tech company take over the world by any means, so I, I definitely think we would do really well to have some sort of broader, more buffered ecological-like system where, you know, all the AIs are kind of in some sort of competition, you know, mutual coexistence with each other. But we don't really know what that looks like, and we don't really know, um, you know, we don't really know what an invasive species might look like, you know, when it gets introduced into that very, you know, nascent and as yet like not battle-tested, uh, ecology. So yeah, I don't know. Bottom line, I think the future's gonna be really, really weird.
20. ETErik Torenberg
  Yeah. Well, I, uh, I do wanna close on a, on a uplifting note, so maybe, maybe as a, as a gearing towards closing question, we could get into some areas where we're s- already seeing some, some exciting capabilities emerge and sort of transform the experience, maybe, maybe around education or, or healthcare or any other areas you wanna, you wanna highlight.
21. NLNathan Labenz
  Yeah. It's... Boy, it's all over. Um, one of my mantras is that there's never been a ti- better time to be a motivated learner.
22. ETErik Torenberg
  Yeah.
23. NLNathan Labenz
  So I, I think a lot of these things do have kind of, you know, two sides of the coin. There's the worry that the students are taking the shortcuts and they're, you know, losing the ability to sustain focus and endure cognitive strain. Flip side of that is, as somebody who's fascinated by the intersection of AI and biology, sometimes I want to read a biology paper, and I really don't have the background. An amazing thing to do is turn on voice mode and share your screen with ChatGPT and just go through the paper reading. It's... You don't even have to talk to it most of the time. You're doing your reading. It's watching over your shoulder, and then at any random point you have a question, you can verbally say, "What's this? Why are, why are they talking about that? What's going on with this? What is the role of this particular protein that they're referring to?" or whatever, and it will have the answers for you. So if you really want to learn in a sincere way, you know, the, the things are unbelievably good at helping you do that. Flip side is you can take a lot of shortcuts and, you know, maybe never have to learn stuff. On the biology front, you know, a-again, like we've got multiple of these sort of discovery things happening, the antibiotics one we covered. There was another one that I did another episode on with a, a Stanford professor named James Zhao, who created something called the virtual lab, and basically this was an AI agent that could spin up other AI agents depending on what kind of problem it was given. Then they would go through a deliberative process where you'd have, you know, one expert in one thing would give its take, and they'd, you know, bat it back and forth. There was a critic in there that would criticize, you know, the ideas that had been given. Eventually, they'd synthesize. Then they were also given some of these narrow specialist tools. So you have agents using the AlphaFold type, um, not just AlphaFold, you know, there's a whole b- whole wide, wide, uh, array of those at this point, but using that type of thing to say, "Okay, well, can we simulate, you know, how this would interact with that?" Um, agents are running that loop, and they were able to get thisLanguage model agent with specialized tool system to generate new treatments for novel strains of COVID that had, you know, kind of escaped, um, the previous treatments. Amazing stuff, right? I mean, the flip side of that, of course, is you know, you get the bioweapon risk. So all these things do seem like they're gonna be... Even, even on j-just the abundance front itself, right? Like we may have a world of unlimited professional private drivers, but we don't really have a great plan for what to do with the five million people that are currently doing that work. We may have infinite software, but you know, when-- especially once the five million drivers pile into all the coding boot camps and, you know, get coding jobs, I don't know what we're gonna do with the 10 million people that were coding when, you know, nine million of them become superfluous. So yeah, I don't know. I think we're, we're headed for a weird world. Nobody really knows what it's gonna look like in five years. There was a great moment at, at, um, Google's I/O where they brought up some journalist. I know you-- I know we, uh, we're skeptical of journalists. This is a great moment to, uh... We're going direct, right? This is a great reason or example of why one would wanna do that. They brought up this person to interview Demis and, uh, Sergey Brin, and they-- the guy asked like, "What is search gonna look like in five years?" And Sergey Brin like almost spit out his coffee on the, on the stage and was like, "Search? We don't know what the world is gonna look like in five years." So I think that's really true. Like the biggest risk I think for so many of us, and I, you know, include myself here, is thinking too small. You know, the, the worst thing I think we could do would be to underestimate how far this thing could go. I would much rather be... I would much rather be mocked for things happening on twice the timescale that I thought than to find myself unprepared when they do happen. So whether it's
1:27:00 – 1:30:31
Why the future depends on a positive vision for AI
1. NLNathan Labenz
  '27, '29, '31, uh, I'll take that extra buffer honestly where we can get it. My thinking is just get, you know, get ready as, as much and as fast as possible, and again, if we do have a little grace time to, uh, you know, to do extra thinking, then great. Um, but I would... The, I think the worst mistake we could make would be to dismiss and, and not feel like we need to get ready for big changes.
2. ETErik Torenberg
  Should we wrap directly on that, or is there any other last note you wanna make sure to get across regarding a-anything we, we said today?
3. NLNathan Labenz
  One of my, uh, other mantras these days is the scarcest resource is a positive vision for the future.
4. ETErik Torenberg
  Yeah.
5. NLNathan Labenz
  I do think it's always really striking, whether it's Sergey or, you know, or Sam Altman or Dario, like Dario probably has the best positive vision of the frontier developer CEOs with, uh, Machines of Love and Grace. But it's always striking to me how little detail there is on these things. And when they launched GPT-4o, which was the voice mode, they were pretty upfront about saying, "Yeah, this was kind of inspired by the movie Her." And so I do think like even if you are not a researcher, you know, not great at math, not somebody who codes, um, I think that this technology wave really rewards play. It really rewards imagination. I think literally writing fiction might be one of the highest value things you could do, especially if you could write aspirational fiction that would get people at the frontier companies to think, "Geez, maybe we could steer the world in that direction. Like, wouldn't that be great?" If you could plant that kind of seed, um, in people's minds, it could come from a totally non-technical place and potentially be really impactful. Play, fiction, um... Add one other dimension to that, but yeah, play, fiction, positive vision for the future. Anything that you could do to offer a positive... Oh, behavioral too is like these days, because you can get the AIs to code so well, I'm starting to see people who have never coded before. I'm working with one guy right now who's never coded before but does have a sort of behavioral science background, and he's starting to do legitimate frontier research on how are AIs gonna behave under various kind of esoteric circumstances. So I think nobody should count themselves out from the ability to contribute to figuring this out and even to shaping this phenomenon. Um, it is not just something that the, you know, the technical minds can contribute to at this point. Literally philosophers, fiction writers, uh, people literally just messing around, um, Pliny the Jailbreaker. You know, there's, there are a-almost unlimited cognitive profiles that would be really valuable to add to the mix of people trying to figure out what's going on with AI. So come one, come all is kind of my attitude on that.
6. ETErik Torenberg
  That's a great place to, to wrap. Nathan, thank you so much for coming on the podcast.
7. NLNathan Labenz
  Thank you, Erik. It's been fun. [outro music]

Episode duration: 1:30:42

Install uListen for AI-powered chat & search across the full episode — Get Full Transcript

Transcript of episode nkmPNvAU49Q

Get more out of YouTube videos.

High quality summaries for YouTube videos. Accurate transcripts to search & find moments. Powered by ChatGPT & Claude AI.

Add to Chrome