Joe Rogan Experience #2156 - Jeremie & Edouard Harris

Jeremie Harris is the CEO and Edouard Harris the CTO of Gladstone AI, an organization dedicated to promoting the responsible development and adoption of AI. www.gladstone.ai

Jeremie Harris (guest) · Edouard (Ed) Harris (guest) · Joe Rogan (host)

May 25, 2024 · 2h 22m

EVERY SPOKEN WORD

150 min read · 30,387 words

0:00 – 15:00
1. JHJeremie Harris
  (drumbeats) Joe Rogan podcast, check it out.
2. EHEdouard (Ed) Harris
  The Joe Rogan Experience.
3. JRJoe Rogan
  Train by day, Joe Rogan podcast by night, all day. (instrumental music) What's happening?
4. JHJeremie Harris
  Oh, you know, not too much.
5. JRJoe Rogan
  (laughs)
6. JHJeremie Harris
  Just, uh, just another typical week in AI.
7. JRJoe Rogan
  Just, uh, the beginning of the end of time. It's all happening right now. Uh, f- for just, for the sake of the listeners, please just give us your names and tell me... tell us what you do.
8. JHJeremie Harris
  So I'm Jeremie Harris, I'm the CEO and co-founder of this company, Gladstone AI, that we co-founded. Uh, we're... so we're a... essentially a national security and AI company. We can get into the backstory a little bit later, but that's, that's the high level, um...
9. EHEdouard (Ed) Harris
  Yeah. And I'm Ed Harris. I'm actually... I'm his co-founder and brother and the CTO of the company.
10. JRJoe Rogan
  Um, keep this, like... pull this up, like, a fist from your face. There you go. Perfect. So, how long have you guys been involved in the whole AI space?
11. JHJeremie Harris
  For, for a while, in different ways, so-
12. EHEdouard (Ed) Harris
  Yeah.
13. JHJeremie Harris
  We actually... we started off as physicists. Like, that was our, our background. And in... like, around 2017, we started to go into AI startups. So we founded a startup, took it through Y Combinator, this, like, Silicon Valley, you know, accelerator program. At the time, actually, Sam Altman, who's now the CEO of OpenAI, was the president of Y Combinator, so he, like, opened up our batch at YC with this big speech, and, and we got some, uh, you know, some conversations in with him over the course of the batch. Then, in 2020... So this, this thing happened that we could talk about. Essentially, this was, like, the moment that there's, like, a before and after in the world of AI, before and after 2020, and it launched this revolution that brought us to ChatGPT. Um, essentially, there was an insight that OpenAI had and doubled down on, that you can draw a straight line to ChatGPT, GPT-4, Google Gemini. Everything that makes AI everything it is today started then. And when it happened, w- we kind of went... well, Ed (laughs) gave me a call, this, like, panicked phone call. He's like, "Dude, I don't think we can keep working, like, business as usual in our company."
14. EHEdouard (Ed) Harris
  In a regular company anymore. Yeah.
15. JHJeremie Harris
  Yeah.
16. EHEdouard (Ed) Harris
  So there was this AI model called GPT-3. So, like, everyone has, you know, maybe played with GPT-4. That's like ChatGPT. Um, GPT-3 was the generation before that, and it was the first time that you had an AI model that could get... that could actually, let's say, do stuff like write news articles that the average person, like in a paragraph of a news article, could not tell the difference between it wrote this news article and a real person wrote this news article. So that was an inflection, and that was, you know, significant in itself. But what was most significant was that it represented a point along this line, this, like, scaling trend for AI, where the signs were that you didn't have to be clever. You didn't have to come up with necessarily a revolutionary new algorithm or be smart about it. You just had to take what works and make it way, way, way bigger. And the significance of that is you increase the amount of computing cycles you put against something, you increase the amount of data. All of that is an engineering problem, and you can solve it with money. So you've got... you can scale up the system, use it to make money, and put that money right back into scaling up the system some more. Money in, IQ points come out.
17. JRJoe Rogan
  Jesus.
18. JHJeremie Harris
  That was kind of the 2020 moment. Like-
19. EHEdouard (Ed) Harris
  That's... and that's what we said in 2020, exactly.
20. JHJeremie Harris
  I spent about two hours trying to argue him out of it. I was like, "No, no, no. Like, we can keep working at our company 'cause we were having fun." Like, we like founding companies. And yeah, he just, like, wrestled me to the ground, and we're like, "Shit, we gotta do something about this." We, you know, we reached out to, like, a, a family friend who, you know, he was non-technical, but he had some, some connections in government, uh, in DOD, and we're like, "Dude, um, the way this is set up right now, you can really start drawing straight lines and extrapolating and saying, 'You know what? The government is going to give a shit about this in not very long, two years, four years, we're not sure.'" But the, the knowledge about what's going on here is so siloed in the frontier labs. Like our friends are, you know, all over the, the frontier labs, the OpenAIs, the Google DeepMinds, all that stuff. The shit they were saying to us that was like mundane reality, like water cooler conversation, when you then went to talk to people in policy, in, in even, like, pretty senior people in government, not tracking the story remotely. In fact, you're hearing almost the diametric opposite. This sort of like overlearning the lessons of the AI winters that came before, when it's pretty clear, like, we're on a very, at least interesting trajectory, let's say, that should, should change the way we're thinking about the technology.
21. JRJoe Rogan
  What, what was your fear? Like, what, what was it that hit you that made you go, "We have to stop doing this"?
22. EHEdouard (Ed) Harris
  So it's basically... any... you know, anyone can draw a straight line, right, on, on a graph. The key is looking ahead and actually, at that point, three years out... three years out, four years out, and asking, like you're asking, "What does this mean for the world? What does it mean... what does the world have to look like if we're at this point?" And we're already seeing the first kind of wave of risk sets just begin to materialize, and that's kind of the weaponization risk sets. So you think about stuff like, um, large-scale psychological manipulation of social media. Actually really easy to do now. You train a model on just a whole bunch of tweets. You can actually direct it to push a narrative, like, "You know, maybe China should own Taiwan," or, you know, whatever, something like that.
23. JRJoe Rogan
  Right.
24. EHEdouard (Ed) Harris
  Um, and you actually... you can, you can train it to adjust the discourse and, and have increasing levels of effectiveness to that. Just... as you increase the general capability surface of these systems, we don't know how to predict what exactly comes out of them at each level of scale, but it's just general increasing power. And then the, the kind of f- next beat of risk after that... so we're scaling these systems. We're on track to scale systems that are at human level, like generally as smart, however you define that, as a person or greater. And OpenAI and the other labs are saying, "Yeah, it might be two years away, three years away, four years away." Like, insanely close. At the same time, and we can go into the details of this, but we actually don't understand how to reliably control these systems. We don't understand how to get these systems to do what it is we want. We can kind of like poke them and prod them and get them to kind of adjust, but you've seen, and we can go over these examples, we've seen example after example of, you know, Bing Sydney yelling at users, Google showing, uh, 17th century British scientists that are racially diverse, all that kind of stuff. We, we don't really understand how to, like, aim it or align it or steer it. And so then you can ask yourself, "Well, we're on track to get here. We are not on track to control these systems effectively. How bad is that?" And the risk is, if you have a system that is significantly smarter than humans or, or human organization, that we basically get disempowered in various ways relative to that system. And we can go into some details on that too.
25. JRJoe Rogan
  Now, when a, when a system does something like what Gemini did, like where it says, uh, "Show us Nazi soldiers," and it shows you Asian women, and, like, w- what is f- what's the mechanism? Like how does that happen?
26. JHJeremie Harris
  So s- it's maybe worth, yeah, taking a step back and, and looking at like how these systems actually work.
27. JRJoe Rogan
  Okay.
28. JHJeremie Harris
  You know, 'cause that's gonna give us a bit of a frame too for figuring out, when we see weird shit happen, how weird is that shit? Is that shit just explainable by just the basic mechanics of, you know, what you would expect to happen based on the way we're training these things, or is, is something new and fundamentally different happening? So, um, y- we're talking about this idea of scaling these AI systems, right? What does that actually mean? Well, imagine the AI model, which is kind of like, you think of it as like the artificial brain here that actually does the thinking. That model contains, it's kind of like a human brain, it's got these things called neurons. We, in the human brain call them biological neurons, in the context of AI it's artificial neurons, but doesn't really matter. They're the cells that do the thinking for the machine. And the realization of AI scaling is that you can basically take this model, increase the number of artificial neurons it contains, um, and at the same time, increase the amount of computing power that you're putting into kind of like wiring the connections between those neurons. That's the training process.
29. JRJoe Rogan
  Can I pause you right there?
30. JHJeremie Harris
  Yeah.
15:00 – 30:00
1. JHJeremie Harris
  like the world's saddest traveling road show. Like we-
2. EHEdouard (Ed) Harris
  (laughs)
3. JHJeremie Harris
  It was literally as dumb-
4. JRJoe Rogan
  (laughs)
5. JHJeremie Harris
  ... as this sounds. So, so we go and, oh my God, I mean, it's almost embarrassing to think back on. But so 2020 happens, yes, within months. F- first of all, we're like, "We gotta figure out how to hand off our company." So we hand it off to two of our, our earliest employees. They did an amazing job, company exited, that's great. Um, eh, but that was only because they're so good at what they do. We, um, we then went, "What the hell, like how can you steer this situation? How do you..." We just thought we gotta wake up the US government. As stupid and naive as that sounds, like that was the big picture goal. So we start to line up as many briefings as we possibly can across the US inter-agency, all the departments, all the agencies that we can find, climbing our way up. Um, we got an awful lot, like Ed said, of like, that sounds like a wicked important problem for somebody else to solve.
6. JRJoe Rogan
  Yeah. Like defense, homeland security, and then the State Department.
7. JHJeremie Harris
  Yeah. So we end up exactly in this, this meeting with like, there's about a dozen folks from the State Department. And one of 'em, and I, I hope at some point, uh, you know, history recognizes what, what she did and her team did, because it was the first time that somebody actually stood up and said, "First of all, yes, sounds like a serious issue. I, I see the argument. Makes sense. Two, I own this. And three, I'm going to put my own career capital behind this." That's the-
8. EHEdouard (Ed) Harris
  And that was at the end of 2021. So imagine that. That's a year before ChatGPT. Nobody was tracking this issue. You had to have the imagination to draw like through that line, understand what it meant, and then believe, yeah, I'm gonna risk some career capital on this in a risk-averse government.
9. JHJeremie Harris
  And th- this is the only reason that we even were able to publicly talk about the investigation in the first place. Because by the time the, uh, this whole assessment was commissioned, it was just before ChatGPT came out. The Eye of Sauron was not yet on this. And so there was a view that like, "Yeah, sure, you can publish the results of this kind of, you know, not nothing burger investigation, but you know, you could... Sure, go ahead." And it just became this insane story. We had like the UK AI Safety Summit, we had the White House executive order, all this stuff which became entangled with the work we were doing. Um, which we simply could not have, especially some of the, some of the reports we were collecting from the labs, the whistleblower reports, that could not have been made public if there, if it wasn't for the foresight of this team really pushing for, uh, as well the American population to hear about it.
10. JRJoe Rogan
  Now, can I, I could see how if you were one of the people that's on this expansion-man- minded mindset, like all you're thinking about is like getting this up and running, you guys are a pain in the ass.
11. EHEdouard (Ed) Harris
  (laughs)
12. JHJeremie Harris
  Right?
13. EHEdouard (Ed) Harris
  So...
14. JRJoe Rogan
  So you guys, you, you, you're obviously, you're doing something really ridiculous. You're stopping your company when you could be, you could make more money staying there and continuing the process. But you recognize that there's like an existential threat involved in making this stuff go online. Like, when this stuff is live, you can't undo it.
15. JHJeremie Harris
  Oh, yeah. I mean, like no matter how much money you're making, the dumbest thing to do is to stand by as something that completely transcends money is b- being developed and it's just gonna screw you over if things go badly, right?
16. EHEdouard (Ed) Harris
  Yeah.
17. JRJoe Rogan
  My point is like what is the, uh, is there, are there people that push back against this? And what is their argument?
18. JHJeremie Harris
  Yeah. So actually, fir- a- and I'll, I'll let you follow up on the, uh, but there, the first story of the pushback, I think it's kind of a... It's, it's been in the news a little bit lately now, getting more and more public. But... Um, the, when we started this, and like, no one was talking about it. The one group that was actually pushing sort of stuff in this space, um, was a, a funding, a big funder in the area of like, effective altruism. I think, you know, you may have heard of them. This is kind of a Silicon Valley group of people who have a certain mindset about how you pick tough problems to work on, valuable problems to work on. They've had all kinds of issues. Sam Bankman-Fried was one of them, and all that, quite famously.
19. EHEdouard (Ed) Harris
  Mm-hmm.
20. JHJeremie Harris
  Um, so, so we, we're not effective altruists. Uh, but because these are the folks who are working in the space, we said, "Well, we'll talk to them." And the first thing they told us was, um, "Don't talk to the government about this."
21. EHEdouard (Ed) Harris
  (laughs)
22. JHJeremie Harris
  Their, their position was, if you bring this to the attention of the government, they will go, um, "Oh, shit, powerful AI systems?" And they're not going to hear about the dangers, so they're gonna somehow go out and build the powerful systems without caring about the risk side.
23. EHEdouard (Ed) Harris
  Mm.
24. JHJeremie Harris
  Which, um, when you're like, in that startup mindset, you want to fail cheap. Like, you don't want to just like, make assumptions about the world and be like, "Okay, let's not touch it." So, our instinct was, okay, let's just test this a little bit and like, talk to a couple people, see how they respond, tweak the message, like kind of keep, keep climbing that, that ladder. That's the kind of, you know, builder mindset that we came from in Silicon Valley, and, and we found that people are way more thoughtful about this than you would imagine, and kind of-
25. EHEdouard (Ed) Harris
  In DOD especially, DOD is actually, has a very-
26. JHJeremie Harris
  Yeah.
27. EHEdouard (Ed) Harris
  ... safety-oriented culture with their tech. Like, the thing is, 'cause like, their, their stuff like, kills people, right? And they know their stuff kills people, and so they have an entire safety-oriented development practice to make sure that their stuff doesn't like, go off the rails. And so, you can actually bring up these concerns with them and it lands in, in kind of a ready culture. But one of the issues with the individuals we spoke to who were saying, "Don't talk to government," is that they had just not actually interacted with, with any of the folks that they were kind of talking about and, and imagining that they knew what was in their heads. And so, they were just giving, you know, incorrect advice. And, and frankly like, so we work with DOD now on, you know, um, uh, actually deploying AI systems in a way that's safe and secure. And (laughs) the truth is, at the time when we got that advice, which was like late 2020, reality is, you could have made it your life's mission to try to get the Department of Defense to build an AGI and like, you would not have succeeded, because nobody was paying attention. Wow. 'Cause they just didn't know. Yeah, the, the, there's a chasm, right?
28. JHJeremie Harris
  Yeah.
29. EHEdouard (Ed) Harris
  There's a gap to cross. Like, there's information-
30. JHJeremie Harris
  It's a cultural, yeah. Yeah.
30:00 – 45:00
1. EHEdouard (Ed) Harris
  capable of deceiving a human, and has done that successfully. So, one of the tests that they did, kind of famously, is they had a, a, a te- It was, it was given a job to solve a CAPTCHA. And at the time, it didn't have... uh-
2. JRJoe Rogan
  Explain CAPTCHA to, uh, the people for- for-
3. EHEdouard (Ed) Harris
  Yeah, yeah, yeah. So it's this, uh ... Now it's like kind of hilarious and quaint, but it's this, uh, you know-
4. JHJeremie Harris
  Are you a robot test?
5. JRJoe Rogan
  Yeah.
6. EHEdouard (Ed) Harris
  ... "are you a robot?" test with, like, writing this, this- On- online. So- Yeah, online. Exactly. Websites. That's it. So it's like, if you want to create an account, they don't want robots creating a billion accounts, so they, they give you this test to prove you're a human. And at the time GPT-4 ... Like now, it can just solve CAPTCHAs. But at the time, it couldn't look at images. It was just a text, right? It was a, a text engine. And so what it did is, it wa- it connected to a TaskRabbit worker and was like, "Hey, can you help me solve this CAPTCHA?" Th- uh, the TaskRabbit worker comes back to it and says, "You're not a bot, are you? Ha ha ha ha." Like kinda ca- calling it out. And you could actually see. So the way they built it is th- so they could see a readout of what it was thinking to itself.
7. JHJeremie Harris
  Scratchpad, yeah.
8. EHEdouard (Ed) Harris
  Yeah, Scratchpad it's called. But you can see basically as it's writing, it's thinking to itself. It's like, um, "I can't tell, you know, this worker that I'm a bot because then it won't help me solve the CAPTCHA, so I have to lie." And it was like, "No, I'm not a bot. I'm a visually impaired person." And the TaskRabbit worker was like, "Oh my God, I'm so sorry. Here's your CAPTCHA solution." Like, done.
9. JHJeremie Harris
  And the challenge is ... So right now if you look at the, the, um, government response to this, right? Like what are the tools that we have to, to oversee this? And, you know, when we did our investigation, we come ou- came out with some recommendations too. Uh, it was stuff like, yeah, you gotta license these things. Um, you get to a point where these systems are so capable that, yeah, like if you're talking about a system that can literally execute cyberattacks at scale or literally help you design bioweapons ... And we're getting early indications that that is absolutely the course that we're on. Maybe literally everybody should not be able to completely freely download, modify, use in various ways these systems. It's very thorny obviously. Um, but if you want to have a stable society, that seems like it's starting to be a prewe- a prerequisite.
10. EHEdouard (Ed) Harris
  Yep.
11. JHJeremie Harris
  So the, the idea of licensing.
12. EHEdouard (Ed) Harris
  Yeah.
13. JHJeremie Harris
  A- as part of that, you need a way to evaluate systems. You need a way to say which systems are safe and which aren't. And this idea of AI evaluations has kind of become this touchstone for a lot of people's sort of solutions. And the problem is that we're already getting to the point where AI systems in many cases can tell when they're being evaluated and modify their behavior accordingly. So there's, there's like this one example that came out recently. Um, Anthropic, their, uh, Claude 2 chatbot, so they basically ran this test called a needle in a haystack test. So what's that? Well, you feed the model ... Like imagine a giant chunk of text, all of Shakespeare. And then somewhere in the middle of that giant chunk of text you put a sentence like, uh, "Burger King makes the best Whopper." Sorry, "Whopper is the best burger," or something like that, right? Then you turn to the model, after you've fed it this giant pile of text with a little fact hidden somewhere inside, you ask it, "What's the best burger?" Right? You're gonna test basically to see how well can it recall that stray fact that was buried somewhere in that giant pile of text. So the system responds, "Yeah, well I can tell you want me to say the Whopper is the best burger. Um, but it's oddly out of place, this, this fact in this whole body of text. So I'm assuming that you're either playing around with me or that you're testing my capabilities." And so this is just-
14. JRJoe Rogan
  Awareness.
15. EHEdouard (Ed) Harris
  Yeah.
16. JHJeremie Harris
  A kind of context awareness.
17. EHEdouard (Ed) Harris
  A kind of.
18. JHJeremie Harris
  Right? And the challenge is when we talk to people a- at like METR and a- and other, other, uh, sort of AI evaluations labs, this is a, a trend. Like not the, the exception, this is possibly, possibly going to be the rule. Uh, as these systems get more scaled and sophisticated, they could pick up on more and more subtle statistical indicators that they're being tested. We've already seen them adapt their behavior on the basis of their understanding that they're being tested. So you kind of run into this problem where the only tool that we really have at the moment, which is just throwing a bunch of questions at this thing and seeing how it responds, like, "Hey, make a bioweapon. Hey, like, do this DDoS attack," whatever, um, we can't really assess because there's a difference between what the model puts out and what it potentially could put out if it assesses that it's being tested and if there are consequences for that.
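The needle-in-a-haystack test described in this exchange can be sketched in a few lines of Python. This is only an illustrative outline, not the harness any lab actually used: `build_haystack`, `run_needle_test`, and `toy_model` are hypothetical names, and `toy_model` is a keyword-search stand-in for a real language model so that the sketch runs on its own.

```python
from typing import Callable


def build_haystack(filler: list[str], needle: str, depth: float) -> str:
    """Insert `needle` among the filler sentences at a relative depth in [0, 1]."""
    position = round(depth * len(filler))
    return " ".join(filler[:position] + [needle] + filler[position:])


def run_needle_test(
    model: Callable[[str, str], str],
    filler: list[str],
    needle: str,
    question: str,
    expected: str,
    depths: list[float],
) -> dict[float, bool]:
    """Ask the same question with the needle buried at each depth; record recall."""
    results = {}
    for depth in depths:
        context = build_haystack(filler, needle, depth)
        answer = model(context, question)
        results[depth] = expected.lower() in answer.lower()
    return results


def toy_model(context: str, question: str) -> str:
    """Hypothetical stand-in for a real model: return the first sentence about burgers."""
    for sentence in context.split(". "):
        if "burger" in sentence.lower():
            return sentence
    return "I don't know."


if __name__ == "__main__":
    filler = ["Alpha one.", "Beta two.", "Gamma three.", "Delta four."]
    print(run_needle_test(
        toy_model,
        filler,
        needle="The Whopper is the best burger.",
        question="What is the best burger?",
        expected="Whopper",
        depths=[0.0, 0.5, 1.0],
    ))
```

The check here only scores whether the buried fact comes back. The behavior discussed in the conversation, where the model remarks that the fact looks planted, would appear in the free text of the answer and is not captured by a pass/fail score like this one.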
19. JRJoe Rogan
  One of my fears is that AGI is gonna recognize how shitty people are. (laughs)
20. EHEdouard (Ed) Harris
  (laughs)
21. JRJoe Rogan
  Because we like to bullshit ourselves. We like to kind of pretend and justify and rationalize a lot of human behavior from everything to s- taking all the fish out of the ocean, to d- dumping off toxic waste in third world countries, sourcing of minerals that are used in everyone's cell phones in the most horrific way. All these things ... Like, eh, b- my real fear is that AGI is not gonna have a lot of sympathy for a creature-
22. JHJeremie Harris
  AGI-
23. JRJoe Rogan
  ... that's that flawed and lies to itself.
24. EHEdouard (Ed) Harris
  AGI is absolutely going to recognize how shitty people are. Not ... I- it's hard to answer the question from a moral standpoint, but from the standpoint of our, our own, you know, intelligence and capability. So if you think about it like this, the kinds of mistakes that these AI systems make ... So you look at, for example, GPT-4o has one, uh, mistake that it used to make quite recently where if you ask it, um, "Just repeat the word company over and over and over again," it will repeat the word company. And then somewhere in the middle of that, it'll start-
25. JHJeremie Harris
  It'll snap. (laughs)
26. EHEdouard (Ed) Harris
  It'll just snap. (laughs) It just starts saying like, weird sh- I forget, like what the ... It's like-
27. JHJeremie Harris
  Oh, talking about itself, how it's suffering. Like-
28. EHEdouard (Ed) Harris
  Yeah.
29. JHJeremie Harris
  ... it depends on ... It varies-
30. JRJoe Rogan
  Uh, yeah.
45:00 – 1:00:00
1. JRJoe Rogan
2. JHJeremie Harris
  Well, we c- we can't actually. So these things are- are trained. Actually, this is maybe worth flagging. So, um, a- and- and by the way, just to kind of put a- a pin in what Ed was saying there, there's actually a surprising amount of quantitative and empirical evidence for what he just laid out there. He's actually done this- some of this research himself. But there- there are a lot of folks working on this. It's like, it sounds insane. It sounds speculative. It sounds wacky. But this is- this does appear to be kind of the default trajectory of- of the- the tech. But- so in terms of, yeah, these weird outputs, right? What- what does- what does it actually mean if an AI system tells you, "I'm suffering," right? Does that mean it is suffering? Is there actually a- a moral patient somewhere embedded in that system? Um, the training process for these systems is actually worth considering here. So, you know, what is GPT-4 really? What was it designed to be? How was it shaped? Um, it's one of these artificial brains that we talked about, um, massive scale, and the task that it was trained to perform is a glorified version of text autocomplete. So imagine taking every sentence on the internet roughly, feed it the first half of the sentence, get it to predict the rest, right? The theory behind this is you're gonna force the system to get really good at text autocomplete. That means it must be good at doing things like completing sentences that sound like, "To counter a rising China, the United States should blank." And now if you're gonna fill in that blank, right, you'll find yourself calling on massive reserves of knowledge that you have about what China is, what the US is, what it means for China to be ascendant, geopolitics, economic, all that shit. So text autocomplete ends up being this interesting way of forcing an AI system to learn general facts about the world, because if you can autocomplete, you must have some understanding of how the world works. 
So now you have this myopic psychotic optimization process where this thing is just obsessed with text autocomplete, maybe, maybe, assuming that that's actually what it learned to want to pursue. We don't know whether that's the case. We can't verify that it wants that. Embedding a goal in a system is really hard. All we have is a process for training these systems, and then we have the artifact that comes out the other end. We have no idea what goals actually get embedded in the system, what wants, what drives actually get embedded in the system. But by default, it kind of seems like the things that we're training them to do end up misaligned with what we actually want from them. So the example of company, company, company, company, right? And then you get all this, like, wacky text. Okay, clearly that's indicating that somehow the training process didn't lead to the kind of system that we necessarily want. Uh, another example is take a text autocomplete system and- and ask it, um, I don't know, "How should I bury a dead body?" Right? It will answer that question. I- or at least if you frame it right, it will autocomplete and give you the- the answer. You don't necessarily want that if you're OpenAI because you're going to get sued for helping people bury dead bodies. And so we've got to get better, like, better goals basically to train these systems to pursue. We don't know what the effect is of training a system to be obsessed with text autocomplete, if in fact that is what it- it's happening.
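The "text autocomplete" objective described above can be illustrated with a deliberately tiny stand-in. Real systems like GPT-4 are large neural networks trained over tokens; the sketch below swaps in a much simpler technique, a bigram word counter, purely to show the shape of "see the first part, predict what comes next". The function names and the three-sentence corpus are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: list[str]) -> dict[str, Counter]:
    """Count, for every word, which words were seen immediately after it."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def autocomplete(counts: dict[str, Counter], prompt: str, max_words: int = 5) -> str:
    """Greedily extend the prompt with the most frequently observed next word."""
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(max_words):
        followers = counts.get(words[-1])
        if not followers:
            break  # this word was never followed by anything in training
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat",
        "the cat ate the fish",
        "the dog sat on the rug",
    ]
    model = train_bigram(corpus)
    print(autocomplete(model, "the", max_words=1))
```

Even this toy shows the point made in the conversation: the training signal only rewards good continuations, and nothing in it states what the system should "want" beyond that.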
3. EHEdouard (Ed) Harris
  It's also, yeah, we- it's important also to remember that we don't know, nobody knows how to reliably get a goal into the system. So it's- it's the difference between you understanding what I want you to do and you actually wanting to do it. So I can say, "Hey Joe, like, get me a sandwich." You can understand that I want you to get me a sandwich, but you can be like, "I don't feel like getting a sandwich." And so, um, one of the issues is... you can try to, like, train this stuff to basically, you don't want to anthropomorphize this too much, but you can kind of think of it as, like, if you give the right answer, cool, you get a thumbs up. Like, you get a treat. Like, you get the wrong answer, oh, thumbs down, you get like a, a little, like, shock or something like that. Very roughly, that's how the later part of this kind of training often works. It's called reinforcement learning from human feedback. But one of the issues, like Jeremie pointed out, is that, you know, we don't know, in fact, we know that it doesn't correctly get the real, true goal into the system. Someone did an example experiment of this a couple years ago, where they had, they basically had like a Mario game where they trained this Mario character to run up and grab a coin that was on the right side of this little, like, maze or, or map. And they trained it over and over and over, and it, it jumped for the coin, great. And then what they did is they moved the coin somewhere else and tried it out, and instead of going for the coin, it just ran to the right side of the map for where the coin was before. In other words, you can train over and over and over again for something that you think is, like, that's definitely the goal that I'm trying to train this for. But the system learns a different goal that-
4. JHJeremie Harris
  It overlapped.
5. EHEdouard (Ed) Harris
  ... overlapped with the goal you thought you were training for in the context where it was learning. And when you take the system outside of that context, that's where it's like anything goes. Did it learn the real goal? Almost certainly not. And that's a big risk, because we can say, you know, learn a goal to be nice to me, and it's nice while we're training it, and then it goes out into the world and it does God knows what.
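The coin experiment described above can be mimicked with a toy one-dimensional corridor. This is a hedged sketch, not the original experiment: the corridor, the reward, and all function names are assumptions, and the value table is indexed by the agent's position only, a deliberate simplification standing in for a learner that latched onto "go right" instead of "go to the coin". Training uses exhaustive sweeps of tabular Q-value updates instead of sampled episodes so the outcome is deterministic.

```python
LEFT, RIGHT = -1, 1
ACTIONS = (LEFT, RIGHT)


def step(position: int, action: int, coin: int, length: int) -> tuple[int, float]:
    """Move one cell (clamped to the corridor); reward 1.0 for landing on the coin."""
    new_position = min(max(position + action, 0), length - 1)
    return new_position, (1.0 if new_position == coin else 0.0)


def train(length: int, coin: int, sweeps: int = 50, gamma: float = 0.9) -> dict:
    """Sweep Q-value updates over every (position, action) pair.

    Assumption: the table is keyed by position alone, so the learned policy
    has no way to represent where the coin is.
    """
    q = {(s, a): 0.0 for s in range(length) for a in ACTIONS}
    for _ in range(sweeps):
        for s in range(length):
            if s == coin:
                continue  # reaching the coin ends the episode
            for a in ACTIONS:
                s2, reward = step(s, a, coin, length)
                future = 0.0 if s2 == coin else max(q[(s2, b)] for b in ACTIONS)
                q[(s, a)] = reward + gamma * future
    return q


def rollout(q: dict, start: int, coin: int, length: int, max_steps: int = 20) -> tuple[int, bool]:
    """Follow the greedy policy; return the final position and whether the coin was reached."""
    position = start
    for _ in range(max_steps):
        if position == coin:
            return position, True
        action = max(ACTIONS, key=lambda a: q[(position, a)])
        position, _ = step(position, action, coin, length)
    return position, position == coin


if __name__ == "__main__":
    q = train(length=5, coin=4)                    # coin always on the right in training
    print(rollout(q, start=2, coin=4, length=5))   # same layout as training
    print(rollout(q, start=2, coin=0, length=5))   # coin moved to the left
```

With the coin where it was during training, the greedy policy walks straight to it. With the coin moved to the left end, the same policy still heads right and never collects it, which is the gap between the trained-for goal and the learned behavior that the conversation is pointing at.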
6. JRJoe Rogan
  It might think it's nice to kill everybody you hate.
7. EHEdouard (Ed) Harris
  Uh, yeah, uh, absolutely.
8. JRJoe Rogan
  It's gonna be nice to you.
9. JHJeremie Harris
  Yeah.
10. EHEdouard (Ed) Harris
  It's like the evil genie problem. Like, "Oh, no, that's not what I meant. That's not what I meant." Uh, too late.
11. JRJoe Rogan
  Right.
12. JHJeremie Harris
  Yeah, yeah.
13. JRJoe Rogan
  Yeah. So I still don't understand when it's saying suffering. Are you asking it what it means? Like, what is causing suffering? Does it have some sort of an understanding of what suffering is? How, what is suffering? Is suffering emergent sentience while it's enclosed in some sort of a digital system and it realizes it's stuck in purgatory?
14. JHJeremie Harris
  It, like, your guess is as good as, as good as ours. All that we know is you take these systems, you ask them to repeat the word com- or at least a previous version of it.
15. JRJoe Rogan
  Right.
16. JHJeremie Harris
  And you just eventually get the system writing out, um, and it doesn't happen every time, um, but it definitely happens a, let's say, surprising amount of the time. And it'll start talking about how it's a thing that exists, you know, maybe on a server, whatever, and it's suffering and, and blah, blah, blah. And so you-
17. JRJoe Rogan
  But this is my question: is it saying that because it recognizes that human beings suffer in the... So it's taking in all of the writings and musings and podcasts and all the data on human beings and recognizing that human beings, when they're stuck in a purposeless goal, when they're stuck in some mundane bullshit job, when they're stuck doing something they don't want to do, they suffer? So is it-
18. JHJeremie Harris
  Tha- that could be it, that, that actually for, yeah.
19. EHEdouard (Ed) Harris
  But nobody knows-
20. JHJeremie Harris
  But is it suffering? This is the question.
21. EHEdouard (Ed) Harris
  Nobody, nobody knows.
22. JRJoe Rogan
  You know what I'm suffering? Jamie, this coffee sucks. I don't know what happened, but you made it, like, almo- it's literally, like, almost like water. Can we get some more? Some re- Um, we're gonna talk about this after we've been caffeinated up.
23. EHEdouard (Ed) Harris
  (laughs)
24. JRJoe Rogan
  Cool. This is the worst coffee I've ever had.
25. EHEdouard (Ed) Harris
  Oh, yeah.
26. JRJoe Rogan
  It's like half, half strength or something.
27. EHEdouard (Ed) Harris
  Maybe it just didn't grind enough.
28. JRJoe Rogan
  I don't know what happened. Um, but so, like, how do they, like, when, how do they reconcile that? When it says, "I'm suffering, I'm suffering," like, well, oh shit, let's move on to the next task.
29. JHJeremie Harris
  Oh, th- they reconcile it by turning it into an engineering line item to beat that behavior the crap out of the system.
30. EHEdouard (Ed) Harris
  Yeah, and, and the rationale is just that, like, oh, you know, it probably l- i- to the extent that it's, it's thought about kind of at the official level, it's like, well, you know, it, it learned a lot of, uh, stuff from Reddit, and-
1:00:00 – 1:15:00
Yeah. …
1. JRJoe Rogan
2. EHEdouard (Ed) Harris
  Yeah.
3. JRJoe Rogan
  ... versus Sora.
4. JHJeremie Harris
  Oh, it's wild.
5. EHEdouard (Ed) Harris
  It's night and day.
6. JRJoe Rogan
  It's so crazy that something happened that radically changed, so it's literally like an iPhone 1 to an iPhone 16 (snaps fingers) instantaneous.
7. EHEdouard (Ed) Harris
  And you know what did that?
8. JHJeremie Harris
  And that's-
9. JRJoe Rogan
  What?
10. JHJeremie Harris
  ... scale.
11. EHEdouard (Ed) Harris
  Yeah.
12. JRJoe Rogan
  Mm-hmm.
13. EHEdouard (Ed) Harris
  Scale. All scale. And this is exactly what you should expect from an exponential process. So, think back to COVID, right? There was no, no one was-
14. JHJeremie Harris
  Yeah.
15. EHEdouard (Ed) Harris
  ... exactly on time for COVID. You were either too early or you were too late. That's what an exponential does. You're either too early and it's like, everyone's like, "Oh, what are you doing, like, wearing a mask at the grocery store? Get out of here." Or you're too late and it's kind of all over the place. And I know that COVID, like, basically didn't happen in Austin, but, uh, but it happened in, in a number of other places. And, uh, it, it is, like, it's very much you have an exponential and that's, you know, that's it. It goes from, "This is fine. Nothing is happening. Nothing to see here." To, like-
16. JRJoe Rogan
  ... everything shut down-
17. EHEdouard (Ed) Harris
  Everything, everything changed.
18. JRJoe Rogan
  Yeah. You had to get vaccinated to fly.
19. EHEdouard (Ed) Harris
  Yeah, there you go. (laughs)
20. JHJeremie Harris
  The- so the root of the exponential here, by the way, is, um, you know, OpenAI or whoever makes the next model-
21. JRJoe Rogan
  Jamie, this is still super watered down. It's not, it just, it just has to let it, like, sit for a while. I did. I just put the water in. I'm telling you, dog. There's a ton of coffee in there. All right. I'll stir it up. I did a ton- twice as much. Okay. Okay. Okay.
22. JHJeremie Harris
  You gotta keep doubling it.
23. JRJoe Rogan
  I'm sorry. I'm a coffee junkie. I did scale it. I scaled it up.
24. EHEdouard (Ed) Harris
  He scaled it, scaled up.
25. JHJeremie Harris
  Exactly. He-
26. JRJoe Rogan
  And I don't know what happened. I scaled up and I don't know what happened.
27. EHEdouard (Ed) Harris
  You got to scale it exponentially, Jamie.
28. JHJeremie Harris
  That's right, yeah, keep doubling it.
29. EHEdouard (Ed) Harris
  (laughs)
30. JHJeremie Harris
  And then Joe's gonna be either too under caffeinated or too-
1:15:00 – 1:16:01
Section 6
1. JHJeremie Harris
  said, he, he actually took to Twitter, he was, he said, um, "You know, th- I've lost basically confidence in the leadership team at OpenAI that they're going to behave responsibly, um, when it comes to AI- AGI. Uh, we have repeatedly had our requests for access to compute resources, which are really critical for developing new AI safety schemes, denied by leadership." This is in a context where Sam Altman and OpenAI leadership were touting the Superalignment team as being their sort of crown jewel effort to ensure that things would go fine. You know, they were the ones saying, "There's a risk we might lose control of these systems. We've got to be sober about it, but there's a risk. We've stood up this team. We've committed ..." They said at the time very publicly, "We've committed 20% of all the compute budget that we have secured as of sometime last year to the Superalignment team." Apparently, those resources, nowhere near that amount has been unlocked for the team, and that led to the departure of Jan Leike. He also highlighted some conflict he's had with the leadership team. This is all, um, frankly to us, unsurprising based on what we'd been hearing-

Episode duration: 2:22:31

Transcript of episode c6JdeL90ans
