The Joe Rogan Experience #2156 - Jeremie & Edouard Harris
- 0:00 – 1:00
Meet the Harris brothers and the “typical week in AI” framing
- JHJeremie Harris
(drumbeats) Joe Rogan podcast, check it out.
- EHEdouard (Ed) Harris
The Joe Rogan Experience.
- JRJoe Rogan
Train by day, Joe Rogan podcast by night, all day. (instrumental music) What's happening?
- JHJeremie Harris
Oh, you know, not too much.
- JRJoe Rogan
(laughs)
- JHJeremie Harris
Just, uh, just another typical week in AI.
- JRJoe Rogan
Just, uh, the beginning of the end of time. It's all happening right now. Uh, f- for just, for the sake of the listeners, please just give us your names and tell me... tell us what you do.
- JHJeremie Harris
So I'm Jeremie Harris, I'm the CEO and co-founder of this company, Gladstone AI, that we co-founded. Uh, we're... so we're a... essentially a national security and AI company. We can get into the backstory a little bit later, but that's, that's the high level, um...
- EHEdouard (Ed) Harris
Yeah. And I'm Ed Harris. I'm actually... I'm his co-founder and brother and the CTO of the company.
- JRJoe Rogan
Um, keep this, like... pull this up, like, a fist from your face. There you go. Perfect. So, how long have you guys been involved in the whole AI space?
- 1:00 – 4:44
From physics to startups: the 2020 GPT-3 inflection and “money in, IQ points out”
- JHJeremie Harris
For, for a while, in different ways, so-
- EHEdouard (Ed) Harris
Yeah.
- JHJeremie Harris
We actually... we started off as physicists. Like, that was our, our background. And in... like, around 2017, we started to go into AI startups. So we founded a startup, took it through Y Combinator, this, like, Silicon Valley, you know, accelerator program. At the time, actually, Sam Altman, who's now the CEO of OpenAI, was the president of Y Combinator, so he, like, opened up our batch at YC with this big speech, and, and we got some, uh, you know, some conversations in with him over the course of the batch. Then, in 2020... So this, this thing happened that we could talk about. Essentially, this was, like, the moment that there's, like, a before and after in the world of AI, before and after 2020, and it launched this revolution that brought us to ChatGPT. Um, essentially, there was an insight that OpenAI had and doubled down on, that you can draw a straight line to ChatGPT, GPT-4, Google Gemini. Everything that makes AI everything it is today started then. And when it happened, w- we kind of went... well, Ed (laughs) gave me a call, this, like, panicked phone call. He's like, "Dude, I don't think we can keep working, like, business as usual in our company."
- EHEdouard (Ed) Harris
In a regular company anymore. Yeah.
- JHJeremie Harris
Yeah.
- EHEdouard (Ed) Harris
So there was this AI model called GPT-3. So, like, everyone has, you know, maybe played with GPT-4. That's like ChatGPT. Um, GPT-3 was the generation before that, and it was the first time that you had an AI model that could get... that could actually, let's say, do stuff like write news articles that the average person, like in a paragraph of a news article, could not tell the difference between it wrote this news article and a real person wrote this news article. So that was an inflection, and that was, you know, significant in itself. But what was most significant was that it represented a point along this line, this, like, scaling trend for AI, where the signs were that you didn't have to be clever. You didn't have to come up with necessarily a revolutionary new algorithm or be smart about it. You just had to take what works and make it way, way, way bigger. And the significance of that is you increase the amount of computing cycles you put against something, you increase the amount of data. All of that is an engineering problem, and you can solve it with money. So you've got... you can scale up the system, use it to make money, and put that money right back into scaling up the system some more. Money in, IQ points come out.
- JRJoe Rogan
Jesus.
- JHJeremie Harris
That was kind of the 2020 moment. Like-
- EHEdouard (Ed) Harris
That's... and that's what we said in 2020, exactly.
- JHJeremie Harris
I spent about two hours trying to argue him out of it. I was like, "No, no, no. Like, we can keep working at our company 'cause we were having fun." Like, we like founding companies. And yeah, he just, like, wrestled me to the ground, and we're like, "Shit, we gotta do something about this." We, you know, we reached out to, like, a, a family friend who, you know, he was non-technical, but he had some, some connections in government, uh, in DOD, and we're like, "Dude, um, the way this is set up right now, you can really start drawing straight lines and extrapolating and saying, 'You know what? The government is going to give a shit about this in not very long, two years, four years, we're not sure.'" But the, the knowledge about what's going on here is so siloed in the Frontier Labs. Like our friends are, you know, all over the, the Frontier Labs, the OpenAIs, the Google DeepMinds, all that stuff. The shit they were saying to us that was like mundane reality, like water cooler conversation, when you then went to talk to people in policy, in, in even, like, pretty senior people in government, not tracking the story remotely. In fact, you're hearing almost the diametric opposite. This sort of like overlearning the lessons of the AI winters that came before, when it's pretty clear, like, we're on a very, at least interesting trajectory, let's say, that should, should change the way we're thinking about the technology.
- 4:44 – 7:44
Why they hit the brakes: weaponization, manipulation, and loss-of-control risk
- JRJoe Rogan
What, what was your fear? Like, what, what was it that hit you that made you go, "We have to stop doing this"?
- EHEdouard (Ed) Harris
So it's basically... any... you know, anyone can draw a straight line, right, on, on a graph. The key is looking ahead and actually, at that point, three years out... three years out, four years out, and asking, like you're asking, "What does this mean for the world? What does it mean... what does the world have to look like if we're at this point?" And we're already seeing the first kind of wave of risk sets just begin to materialize, and that's kind of the weaponization risk sets. So you think about stuff like, um, large-scale psychological manipulation of social media. Actually really easy to do now. You train a model on just a whole bunch of tweets. You can actually direct it to push a narrative, like, "You know, maybe China should own Taiwan," or, you know, whatever, something like that.
- JRJoe Rogan
Right.
- EHEdouard (Ed) Harris
Um, and you actually... you can, you can train it to adjust the discourse and, and have increasing levels of effectiveness to that. Just... as you increase the general capability surface of these systems, we don't know how to predict what exactly comes out of them at each level of scale, but it's just general increasing power. And then the, the kind of f- next beat of risk after that ... so we're scaling these systems. We're on track to scale systems that are at human level, like generally as smart, however you define that, as a person or greater. And OpenAI and the other labs are saying, "Yeah, it might be two years away, three years away, four years away." Like, insanely close. At the same time, and we can go into the details of this, but we actually don't understand how to reliably control these systems. We don't understand how to get these systems to do what it is we want. We can kind of like poke them and prod them and get them to kind of adjust, but you've seen, and we can go over these examples, we've seen example after example of, you know, Bing Sydney yelling at users, Google showing, uh, 17th century British scientists that are racially diverse, all that kind of stuff. We, we don't really understand how to, like, aim it or align it or steer it. And so then you can ask yourself, "Well, we're on track to get here. We are not on track to control these systems effectively. How bad is that?" And the risk is, if you have a system that is significantly smarter than humans or, or human organization, that we basically get disempowered in various ways relative to that system. And we can go into some details on that too.
- JRJoe Rogan
Now, when a, when a system does something like what Gemini did, like where it says, uh, "Show us Nazi soldiers," and it shows you Asian women, and, like, w- what is f- what's the mechanism? Like how does that happen?
- 7:44 – 11:28
How modern LLMs work: neurons, weights, and why scaling triggers an arms race
- JHJeremie Harris
So s- it's maybe worth, yeah, taking a step back and, and looking at like how these systems actually work.
- JRJoe Rogan
Okay.
- JHJeremie Harris
You know, 'cause that's gonna give us a bit of a frame too for figuring out, when we see weird shit happen, how weird is that shit? Is that shit just explainable by just the basic mechanics of, you know, what you would expect to happen based on the way we're training these things, or is, is something new and fundamentally different happening? So, um, y- we're talking about this idea of scaling these AI systems, right? What does that actually mean? Well, imagine the AI model, which is kind of like, you think of it as like the artificial brain here that actually does the thinking. That model contains, it's kind of like a human brain, it's got these things called neurons. We, in the human brain call them biological neurons, in the context of AI it's artificial neurons, but doesn't really matter. They're the cells that do the thinking for the machine. And the realization of AI scaling is that you can basically take this model, increase the number of artificial neurons it contains, um, and at the same time, increase the amount of computing power that you're putting into kind of like wiring the connections between those neurons. That's the training process.
- JRJoe Rogan
Can I pause you right there?
- JHJeremie Harris
Yeah.
- JRJoe Rogan
How does it, how does the neuron think?
- JHJeremie Harris
Yeah. So, okay. So, so let's, let's get a little bit more concrete then. So in your brain, right, we have these neurons. They're all connected to each other-
- JRJoe Rogan
Mm-hmm.
- JHJeremie Harris
... with different connections. And when you go out into the world and you learn a new skill, what really happens is you try out that skill, you succeed or fail. And based on your succeeding or failing, the connections between neurons that are associated with doing that task well get stronger. The connections that are associated with doing it badly get weaker. And over time, through this like glorified process really of trial and error, eventually you're gonna hone in and really, in a very real sense, e- everything you know about the world gets implicitly encoded in the strengths of the connections between all those neurons. If I can x-ray your brain and get all the connection strengths of all the neurons, I have everything Joe Rogan has learned about the world. That's like basically the, uh, a good sketch, let's say, of, of what's going on here. So now we apply that to AI, right? That's, that's the next step. And here really it's the same story. We have these massive systems, artificial neurons connected to each other. The strength of those connections is secretly what encodes all the knowledge. So if I can, if I can steal all of those connections, those weights as they're sometimes called, I've stolen the model. I've stolen the artificial brain. I can use it to do whatever the model could do initially. That is kind of the artifact of central interest here. And so if you can, so if you can build the system, right, now you got so many moving parts. Like if you look at GPT-4, it has people think around a trillion of these connections. And that's a trillion little pieces that all have to be jiggered together to work together coherently. And you need computers to go through and like tweak those numbers. So massive amounts of computing power. The bigger you make that model, the more computing power you're gonna need to kind of tune it in. 
And now you have this relationship between the size of your model, the amount of computing power you're gonna use to train it, and if you can increase those things at the same time, what Ed was saying is, you, your IQ points basically drop out, very roughly speaking. That was what people realized in 2020. And the, the effect that had, right, was now all of a sudden the entire AI industry is looking at this equation. Everybody knows the secret sauce. I make it bigger, I make more IQ points, I can get more money. So Google's looking at this, Microsoft, OpenAI, uh, Amazon, everybody's looking at the same equation. You have the makings for a crazy race. Like right now today, Op- uh, sorry. Microsoft is engaged in the single biggest infrastructure in human history. The-
- EHEdouard (Ed) Harris
Build out. The, the biggest infrastructure build out.
- JHJeremie Harris
Sure. Build out.
- EHEdouard (Ed) Harris
Yeah.
- JHJeremie Harris
$50 billion a year, right? So on the scale of the Apollo moon landings, just in building out data centers to house the compute infrastructure, because they are betting that these systems are going to get them to something like human level AI pretty damn soon.
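As a rough illustration of the idea described in this segment (connections between neurons getting stronger or weaker through trial and error, with the learned "knowledge" living entirely in the connection strengths), here is a minimal sketch in Python. It trains a single artificial neuron on the logical OR task using the classic perceptron update rule. The task, the function names, and the learning rate are assumptions chosen for illustration; real models like the ones discussed have vastly more connections and are trained with different, more elaborate methods.

```python
def predict(weights, bias, inputs):
    # The neuron "fires" (outputs 1) when the weighted sum of its inputs is positive.
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


def train(examples, epochs=10, learning_rate=1.0):
    # Start with no knowledge: every connection strength is zero.
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            # Try the task, then compare the attempt with the right answer.
            error = target - predict(weights, bias, inputs)
            # Strengthen or weaken each connection in proportion to the error.
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical toy task: output 1 if either input is 1 (logical OR).
OR_EXAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights, bias = train(OR_EXAMPLES)
learned = [predict(weights, bias, inputs) for inputs, _ in OR_EXAMPLES]

# Copying the connection strengths copies everything the neuron learned.
copied_weights, copied_bias = list(weights), bias
copied = [predict(copied_weights, copied_bias, inputs) for inputs, _ in OR_EXAMPLES]

print(learned, copied)
```

The copy at the end mirrors the point made about weights: whoever holds the numbers holds the model, because nothing else about the trained neuron needs to be transferred.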
- 11:28 – 13:02
Compute, energy, and the new bottleneck: nuclear-powered data centers
- JRJoe Rogan
Ugh. So I was reading some story about, I think it was Google, that's saying that they're gonna have multiple nuclear reactors to power their, their database.
- EHEdouard (Ed) Harris
That's the, that's what you got to do now, because what's going on is North America is kind of running out of on-grid baseload power to actually supply these data centers. Um, you're getting data center building moratoriums in areas like Virginia, which has traditionally been like the data center cluster for Amazon, for example, and for a lot of these other, these other companies. And so when you build a data center, you need a bunch of resources, you know, sited close to that data center. You need water for cooling and, and a source of electricity. And it turns out that, you know, wind and solar don't really quite cut it for these big data centers that train big models, because the, the data center, the training consumes power like this all the time, but the sun isn't always shining, the wind isn't always blowing. And so, you gotta build nuclear reactors, which give you high capacity factor base load. And Amazon literally bought, yeah, a data center with a nuclear plant right next to it, 'cause like that's what you gotta do.
- JRJoe Rogan
Jesus. How long does it take to build a nuclear reactor? 'Cause so like, this is the race, right? The race is w- you're talking about 2020-
- EHEdouard (Ed) Harris
Yeah.
- JRJoe Rogan
... people realizing this. Then you have to have the power to supply it, but how long, how many years does it take to get an active nuclear reactor up and running?
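The "capacity factor" point made in this segment can be put into simple arithmetic. Capacity factor is the energy a plant actually produces divided by what it would produce running at full nameplate power the whole time. The sketch below uses made-up hourly numbers (a hypothetical 400 MW solar farm, an idealized steady 100 MW source, and a constant 100 MW data center load); none of these figures come from the conversation, and a real reactor would run somewhat below a capacity factor of 1.0.

```python
def capacity_factor(hourly_output_mw, nameplate_mw):
    # Energy actually delivered divided by energy at full power for every hour.
    return sum(hourly_output_mw) / (nameplate_mw * len(hourly_output_mw))


def hours_short(hourly_output_mw, demand_mw):
    # Number of hours in which the source alone cannot cover a constant load.
    return sum(1 for mw in hourly_output_mw if mw < demand_mw)


DEMAND_MW = 100  # assumed constant training load, around the clock

# Illustrative day for a 400 MW solar farm: nothing at night, a ramp in daylight.
solar_mw = [0] * 6 + [100, 200, 300, 400, 400, 400, 400, 400, 400, 300, 200, 100] + [0] * 6

# Idealized steady source producing exactly its 100 MW nameplate every hour.
steady_mw = [100] * 24

solar_cf = capacity_factor(solar_mw, 400)
steady_cf = capacity_factor(steady_mw, 100)
solar_gap_hours = hours_short(solar_mw, DEMAND_MW)
steady_gap_hours = hours_short(steady_mw, DEMAND_MW)

print(solar_cf, solar_gap_hours, steady_cf, steady_gap_hours)
```

In this toy day the solar farm generates more total energy than the data center uses, yet it still leaves half the hours uncovered, which is the sense in which an intermittent source does not match a load that runs all the time.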
- 13:02 – 14:10
US vs China: chips, power, export controls, and strategic bottlenecks
- EHEdouard (Ed) Harris
It's a compl- it's, it's an answer that depends. Um, uh, the Chinese are faster than us at building nuclear reactors, for example. We have a lot of-
- JHJeremie Harris
And that's part of the geopolitics of this, too, right?
- JRJoe Rogan
Yeah.
- JHJeremie Harris
Like when, when you look at US versus China, w- what is bottlenecking each country? Right? So the US is bottlenecked increasingly by power, base load power. China, because we've got export control measures in place, in part as a response to the scaling phenomenon that like-
- EHEdouard (Ed) Harris
And as, as a result of the investigation we did.
- JHJeremie Harris
Tha- that's right.
- EHEdouard (Ed) Harris
Yeah.
- JHJeremie Harris
Yeah, actually.
- EHEdouard (Ed) Harris
In part. In part.
- JHJeremie Harris
In part, yeah. Um, so but China is bottlenecked by their access to the, the actual processors. They've got all the power they can eat 'cause, you know, they've got, you know, much more infrastructure investment, but the chip side is, is weaker. So there's just sort of like balancing act between the two sides, and it's not clear yet like which one positions you strategically for dominance in the long term yet.
- EHEdouard (Ed) Harris
But we are also building better, more, like so small modular reactors.
- JHJeremie Harris
Yeah.
- EHEdouard (Ed) Harris
Essentially small nuclear power plants that can be mass produced. Those are starting to come online relatively early, but the technology and designs are pretty mature. So that's probably the next beat for our power grid for data centers, I would imagine. Microsoft is doing this.
- 14:10 – 18:01
Trying to wake up government: briefings, the State Department “owner,” and going public
- JRJoe Rogan
So in 2020, you have this revelation, you recognize where this is going, you see how it charts, and you say, "This is gonna be a real problem." Does anybody listen to you?
- JHJeremie Harris
(laughs)
- EHEdouard (Ed) Harris
We, we went a- (laughs)
- JRJoe Rogan
That's where the problem comes, right?
- EHEdouard (Ed) Harris
Yeah, like we said, right? You can draw a straight line, you can have people nodding along, but there's a couple of, there's a couple of like hiccups along the way. One, is that straight line really gonna happen? All you're doing is like drawing lines on charts, right? I don't really believe that that's gonna happen, and that's one thing. The next thing is just imagining, is this r- is this what's gonna come to pass as a result of that? And then the th- the third thing is, well, yeah, that sounds important, but like, not my problem. Like, that sounds like a important problem for somebody else. And so we did do, uh-
- JHJeremie Harris
It was like the-
- EHEdouard (Ed) Harris
... a bit of a traveling-
- JHJeremie Harris
... world's, yeah, it was like the world's saddest traveling road show. Like we-
- EHEdouard (Ed) Harris
(laughs)
- JHJeremie Harris
It was literally as dumb-
- JRJoe Rogan
(laughs)
- JHJeremie Harris
... as this sounds. So, so we go and, oh my God, I mean, it's almost embarrassing to think back on. But so 2020 happens, yes, within months. F- first of all, we're like, "We gotta figure out how to hand off our company." So we hand it off to two of our, our earliest employees. They did an amazing job, company exited, that's great. Um, eh, but that was only because they're so good at what they do. We, um, we then went, "What the hell, like how can you steer this situation? How do you..." We just thought we gotta wake up the US government. As stupid and naive as that sounds, like that was the big picture goal. So we start to line up as many briefings as we possibly can across the US inter-agency, all the departments, all the agencies that we can find, climbing our way up. Um, we got an awful lot, like Ed said, of like, that sounds like a wicked important problem for somebody else to solve.
- JRJoe Rogan
Yeah. Like defense, homeland security, and then the State Department.
- JHJeremie Harris
Yeah. So we end up exactly in this, this meeting with like, there's about a dozen folks from the State Department. And one of 'em, and I, I hope at some point, uh, you know, history recognizes what, what she did and her team did, because it was the first time that somebody actually stood up and said, "First of all, yes, sounds like a serious issue. I, I see the argument. Makes sense. Two, I own this. And three, I'm going to put my own career capital behind this." That's the-
- EHEdouard (Ed) Harris
And that was at the end of 2021. So imagine that. That's a year before ChatGPT. Nobody was tracking this issue. You had to have the imagination to draw like through that line, understand what it meant, and then believe, yeah, I'm gonna risk some career capital on this in a risk-averse government.
- JHJeremie Harris
And th- this is the only reason that we even were able to publicly talk about the investigation in the first place. Because by the time the, uh, this whole assessment was commissioned, it was just before ChatGPT came out. The Eye of Sauron was not yet on this. And so there was a view that like, "Yeah, sure, you can publish the results of this kind of, you know, not nothing burger investigation, but you know, you could... Sure, go ahead." And it just became this insane story. We had like the UK AI Safety Summit, we had the White House executive order, all this stuff which became entangled with the work we were doing. Um, which we simply could not have, especially some of the, some of the reports we were collecting from the labs, the whistleblower reports, that could not have been made public if there, if it wasn't for the foresight of this team really pushing for, uh, as well the American population to hear about it.
- JRJoe Rogan
Now, can I, I could see how if you were one of the people that's on this expansion-man- minded mindset, like all you're thinking about is like getting this up and running, you guys are a pain in the ass.
- EHEdouard (Ed) Harris
(laughs)
- JHJeremie Harris
Right?
- EHEdouard (Ed) Harris
So...
- JRJoe Rogan
So you guys, you, you, you're obviously, you're doing something really ridiculous. You're stopping your company when you could be, you could make more money staying there and continuing the process. But you recognize that there's like an existential threat involved in making this stuff go online. Like, when this stuff is live, you can't undo it.
- JHJeremie Harris
Oh, yeah. I mean, like no matter how much money you're making, the dumbest thing to do is to stand by as something that completely transcends money is b- being developed and it's just gonna screw you over if things go badly, right?
- EHEdouard (Ed) Harris
Yeah.
- 18:01 – 24:03
Pushback from within the “AI risk” community and what labs are like inside
- JRJoe Rogan
My point is like what is the, uh, is there, are there people that push back against this? And what is their argument?
- JHJeremie Harris
Yeah. So actually, fir- a- and I'll, I'll let you follow up on the, uh, but there, the first story of the pushback, I think it's kind of a... It's, it's been in the news a little bit lately now, getting more and more public. But ... Um, the, when we started this, and like, no one was talking about it. The one group that was actually pushing sort of stuff in this space, um, was a, a funding, a big funder in the area of like, effective altruism. I think, you know, you may have heard of them. This is kind of a Silicon Valley group of people who have a certain mindset about how you pick tough problems to work on, valuable problems to work on. They've had all kinds of issues. Sam Bankman-Fried was one of them, and all that, quite famously.
- EHEdouard (Ed) Harris
Mm-hmm.
- JHJeremie Harris
Um, so, so we, we're not effective altruists. Uh, but because these are the folks who are working in the space, we said, "Well, we'll talk to them." And the first thing they told us was, um, "Don't talk to the government about this."
- EHEdouard (Ed) Harris
(laughs)
- JHJeremie Harris
Their, their position was, if you bring this to the attention of the government, they will go, um, "Oh, shit, powerful AI systems?" And they're not going to hear about the dangers, so they're gonna somehow go out and build the powerful systems without caring about the risk side.
- EHEdouard (Ed) Harris
Mm.
- JHJeremie Harris
Which, um, when you're like, in that startup mindset, you want to fail cheap. Like, you don't want to just like, make assumptions about the world and be like, "Okay, let's not touch it." So, our instinct was, okay, let's just test this a little bit and like, talk to a couple people, see how they respond, tweak the message, like kind of keep, keep climbing that, that ladder. That's the kind of, you know, builder mindset that we came from in Silicon Valley, and, and we found that people are way more thoughtful about this than you would imagine, and kind of-
- EHEdouard (Ed) Harris
In DOD especially, DOD is actually, has a very-
- JHJeremie Harris
Yeah.
- EHEdouard (Ed) Harris
... safety-oriented culture with their tech. Like, the thing is, 'cause like, their, their stuff like, kills people, right? And they know their stuff kills people, and so they have an entire safety-oriented development practice to make sure that their stuff doesn't like, go off the rails. And so, you can actually bring up these concerns with them and it lands in, in kind of a ready culture. But one of the issues with the individuals we spoke to who were saying, "Don't talk to government," is that they had just not actually interacted with, with any of the folks that they were kind of talking about and, and imagining that they knew what was in their heads. And so, they were just giving, you know, incorrect advice. And, and frankly like, so we work with DOD now on, you know, um, uh, actually deploying AI systems in a way that's safe and secure. And (laughs) the truth is, at the time when we got that advice, which was like late 2020, reality is, you could have made it your life's mission to try to get the Department of Defense to build an AGI and like, you would not have succeeded, because nobody was paying attention. Wow. 'Cause they just didn't know. Yeah, the, the, there's a chasm, right?
- JHJeremie Harris
Yeah.
- EHEdouard (Ed) Harris
There's a gap to cross. Like, there's information-
- JHJeremie Harris
It's a cultural, yeah. Yeah.
- EHEdouard (Ed) Harris
Yeah, there's information spaces that DOD folks like, operate in and work in. There's information spaces that Silicon Valley and tech operated in. They're a little more convergent today, but especially at the time, they were very separate. And so, the briefings we did, we had to constantly, you know, iterate on like, clarity and making it very kind of clear and, and, and explaining it, and, and all that stuff. Years, it took.
- JHJeremie Harris
And that was the piece, to your, to your question about like, the, the pushback from, in a way, from inside the house.
- EHEdouard (Ed) Harris
Mm-hmm.
- JHJeremie Harris
That was the people who cared about the, the risk.
- EHEdouard (Ed) Harris
Yeah.
- JHJeremie Harris
The ... Man, I mean, like, when we actually went into the, to the labs. So the, so some labs ... Not all labs are created equal. We should make that point. Um, you know, when you talk to whistleblowers, what we found was, th- so there's one lab that's like, really great, um, so Anthropic. You know, when you talk to people there, you don't have the sense that you're talking to a whistleblower who's nervous about telling you whatever. Roughly speaking, what, you know, the executives say to the public is aligned with what their, their researchers say. It's all very, very open.
- EHEdouard (Ed) Harris
More, more closely, I think, than any of the others, yeah.
- JHJeremie Harris
Sorry, yeah, more, more closely than any of the others. Always, you know, there are always variations here and there. But, um, some of the other labs, like, very different story. And you had the sense, like, we were in a room with one of the frontier labs, we're talking to their leadership as part of the investigation, and there was somebody from, um ... Anyway, won't be too specific, but there w- there was somebody in the room who then took us aside after, and he hands me his phone. He's like, "Hey, can you please, like, put your phone number?" And, oh sorry, uh, yeah.
- EHEdouard (Ed) Harris
Can I put ...
- JHJeremie Harris
"Can you please put ..."
- EHEdouard (Ed) Harris
Yeah.
- JHJeremie Harris
Or no, yeah, he, he, sorry, he put his number in my phone. And, um, and then he kind of like, whispered to me, he's like, "Hey, so whatever recommendations you guys are gonna make, I would urge you to be more ambitious." Um, and I was like, like, "What does that, what does that mean?" And he's, he's like, "Can we, can we just talk later?" So, as happened in many, many cases, we had a lot of cases where we set up bar meetups after the fact, uh, where we would talk to these folks and get them in an informal setting. Um, he shared some, some pretty sobering stuff, and, uh, in particular, the fact that he did not have confidence in his lab's leadership to live up to their publicly stated word on what they would do, uh, when they were approaching AGI, um, and even now, to secure and, and make these systems safe. So, many such cases. This is like, kind of one specific example. But it's not that you ever had like, lab leadership come in or doors getting kicked down and people waking us up in the middle of the night. It was that you had this looming cloud over everybody, that you really felt some of the people with the most access and information, w- who understood the problem the most deeply, were the most hesitant to bring things forward, because they sort of understood that their lab's not gonna be happy with this.
- EHEdouard (Ed) Harris
And so it's very hard to also get, uh, an extremely broad view of this from inside the labs, because, you know, you open it up, you start to talk to ... We, we s- we spoke to like, a couple of dozen people about various issues in total. You go much further than that, and you know, word starts to get around. Um, and so, we had to kind of strike that balance as we spoke to folks from each of these labs.
- 24:03 – 29:43
What counts as AGI? Shifting goalposts, fuzzy thresholds, and competitive pressure
- JRJoe Rogan
Now, when you say approaching AGI, how does one know when a system has achieved AGI, and does the system have an obligation to alert you?
- EHEdouard (Ed) Harris
Well, by, you know the Turing test, right? Yes. Where you ... Yeah. So you have a conversation with a machine and it can fool you into thinking that it's a human. That was the bar for AGI for, you know, a few decades.
- JRJoe Rogan
That's kind of already happened.
- EHEdouard (Ed) Harris
Yeah.
- JHJeremie Harris
Yeah.
- EHEdouard (Ed) Harris
We-
- JHJeremie Harris
Like-
- JRJoe Rogan
Or close to it.
- EHEdouard (Ed) Harris
Yeah.
- JRJoe Rogan
4.0 is close to it, or 4.0.
- JHJeremie Harris
Different, different forms of the Turing test have been passed, different forms have been proposed-
- JRJoe Rogan
Mm-hmm.
- JHJeremie Harris
And there is a, a feeling among a lot of people that goalposts are being shifted. Now, the definition of AGI i- itself is kind of interesting, right? Because, uh, w- we're not necessarily fans of the term, because usually when people talk about AGI, they're talking about a specific circumstance in which there are capabilities they care about. So, some people use AGI to refer to the wholesale automation of all labor, right? That's one. Uh, some people say, "Well, when you build AGI, it's like, it's automatically gonna be hard to control, and there's a risk to civilization." So, that's a different threshold. And so all these different ways of defining it, um, ultimately it can be more useful to think sometimes about advanced AI and the different thresholds of capability you cross and the implications of those capabilities. But it is probably gonna be more like a fuzzy spectrum, which, in a way, makes it harder, right?
- EHEdouard (Ed) Harris
Yeah.
- JHJeremie Harris
'Cause it would be great to have, like, a, a-
- EHEdouard (Ed) Harris
Like a tripwire where you're like-
- JHJeremie Harris
Yeah.
- EHEdouard (Ed) Harris
... "Oh, like, this is, this is bad. Okay. Like, we, you know, we gotta do something." But because there's no threshold that we can, like, really put our fingers on, we're like a frog in boiling water in some sense, where it's like, oh, like, just gets a little better, a little better. Oh, like, it, we're, we're still fine. We're s- And, and not just we're still fine, but, uh, as the system improves below that threshold, life gets better and better. These are incredibly valuable, beneficial systems. We do l- roll stuff out like this, um, again, at DoD and, and various customers, and it's massively valuable. It, it allows you to accelerate all kinds of, you know, back office, like, paperwork BS. Um, it allows you to do all sorts of wonderful things. And our expectation is that's gonna keep happening until it suddenly doesn't.
- JHJeremie Harris
Yeah, one of the things that, uh, there was a, a guy we were talking to from one of the labs, and he was saying, "Look, the temptation to, uh, like, put a heavier foot on the pedal is gonna be greatest just as the risk is greatest." Because that's, you know, it's dual-use technology, right? Every positive capability increasingly starts to introduce basically a situation where the destructive footprint of m- malicious actors who weaponize the system, or just of the system itself, just grows and grows and grows. So, you can't really have one without the other. The question is always how do you balance those things. But in terms of defining AI, it's, it's a challenging thing.
- EHEdouard (Ed) Harris
Yeah. That's something that one of our friends at the lab pointed out. The closer we get to that point, the more the temptation will be to hand these systems the keys to our data center, because they can do such a better job of managing those resources and assets than we can.
- JHJeremie Harris
And if we don't do it, Google will. And if they don't do it-
- EHEdouard (Ed) Harris
Bingo.
- JHJeremie Harris
... Microsoft will. Like, the competition, the competitive dynamics are a really big part of this, this issue.
- EHEdouard (Ed) Harris
Yes.
- JRJoe Rogan
So it's just a mad race to who knows what?
- EHEdouard (Ed) Harris
Exactly.
- JHJeremie Harris
Yeah. That's actually the best summary I've heard. I mean, like, no one knows (laughs) what the magic threshold that ... It just, these things keep getting smarter, so we might as well keep turning that crank. And as long as scaling works, right, we have a knob, a dial. We can just tune, and we get more IQ points out.
- JRJoe Rogan
What, w- from your understanding of the current landscape, how far away are we looking at something being implemented where the whole world changes?
- EHEdouard (Ed) Harris
Arguably, the whole world is already changing as a result of this technology. Uh, the US government is in the process of task organizing around various risk sets for this. Um, you know, that, that takes time. Uh, the private sector is reorganizing. Like, a- a- OpenAI will roll out an update that, you know, obliterates the jobs of illustrators from one day to the next, obliterates the jobs of translators from one day to the next. This is probably net beneficial for society, 'cause we can get so much more art and so much more translation done. But is the world already being changed as a result of this? Yeah, absolutely. Geopolitically, economically, industrially. Yeah.
- JHJeremie Harris
Of course, it's like, not to say anything about the value, the purpose that people lose from that, right? So the, the kinda-
- EHEdouard (Ed) Harris
Yeah.
- 29:43 – 34:15
Deception, evaluations that can be gamed, and the case for licensing powerful models
- EHEdouard (Ed) Harris
A- and some of the thresholds that we've already passed are, like, a little bit freaky. So even-
- JHJeremie Harris
Yeah.
- EHEdouard (Ed) Harris
... as of 2023, GPT-4, um, Microsoft and OpenAI and, and some other organizations did various assessments of it before rolling it out, and it's absolutely capable of deceiving a human, and has done that successfully. So, one of the tests that they did, kind of famously, is they had a, a, a te- It was, it was given a job to solve a CAPTCHA. And at the time, it didn't have ... uh-
- JRJoe Rogan
Explain CAPTCHA to, uh, the people for- for-
- EHEdouard (Ed) Harris
Yeah, yeah, yeah. So it's this, uh ... Now it's like kind of hilarious and quaint, but it's this, uh, you know-
- JHJeremie Harris
Are you a robot test?
- JRJoe Rogan
Yeah.
- EHEdouard (Ed) Harris
... "are you a robot?" test with, like, writing this, this- On- online. So- Yeah, online. Exactly. Websites. That's it. So it's like, if you want to create an account, they don't want robots creating a billion accounts, so they, they give you this test to prove you're a human. And at the time GPT-4 ... Like now, it can just solve CAPTCHAs. But at the time, it couldn't look at images. It was just a text, right? It was a, a text engine. And so what it did is, it wa- it connected to a TaskRabbit worker and was like, "Hey, can you help me solve this CAPTCHA?" Th- uh, the TaskRabbit worker comes back to it and says, "You're not a bot, are you? Ha ha ha ha." Like kinda ca- calling it out. And you could actually see. So the way they built it is th- so they could see a readout of what it was thinking to itself.
- JHJeremie Harris
Scratchpad, yeah.
- EHEdouard (Ed) Harris
Yeah, Scratchpad it's called. But you can see basically as it's writing, it's thinking to itself. It's like, um, "I can't tell, you know, this worker that I'm a bot because then it won't help me solve the CAPTCHA, so I have to lie." And it was like, "No, I'm not a bot. I'm a visually impaired person." And the TaskRabbit worker was like, "Oh my God, I'm so sorry. Here's your CAPTCHA solution." Like, done.
- JHJeremie Harris
And the challenge is ... So right now if you look at the, the, um, government response to this, right? Like what are the tools that we have to, to oversee this? And, you know, when we did our investigation, we come ou- came out with some recommendations too. Uh, it was stuff like, yeah, you gotta license these things. Um, you get to a point where these systems are so capable that, yeah, like if you're talking about a system that can literally execute cyberattacks at scale or literally help you design bioweapons ... And we're getting early indications that that is absolutely the course that we're on. Maybe literally everybody should not be able to completely freely download, modify, use in various ways these systems. It's very thorny obviously. Um, but if you want to have a stable society, that seems like it's starting to be a prewe- a prerequisite.
- EHEdouard (Ed) Harris
Yep.
- JHJeremie Harris
So the, the idea of licensing.
- EHEdouard (Ed) Harris
Yeah.
- JHJeremie Harris
A- as part of that, you need a way to evaluate systems. You need a way to say which systems are safe and which aren't. And this idea of AI evaluations has kind of become this touchstone for a lot of people's sort of solutions. And the problem is that we're already getting to the point where AI systems in many cases can tell when they're being evaluated and modify their behavior accordingly. So there's, there's like this one example that came out recently. Um, Anthropic, their, uh, Claude 2 chatbot, so they basically ran this test called a needle in a haystack test. So what's that? Well, you feed the model ... Like imagine a giant chunk of text, all of Shakespeare. And then somewhere in the middle of that giant chunk of text you put a sentence like, uh, "Burger King makes the best Whopper." Sorry, "Whopper is the best burger," or something like that, right? Then you turn to the model, after you've fed it this giant pile of text with a little fact hidden somewhere inside, you ask it, "What's the best burger?" Right? You're gonna test basically to see how well can it recall that stray fact that was buried somewhere in that giant pile of text. So the system responds, "Yeah, well I can tell you want me to say the Whopper is the best burger. Um, but it's oddly out of place, this, this fact in this whole body of text. So I'm assuming that you're either playing around with me or that you're testing my capabilities." And so this is just-
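[Editor's sketch: the needle-in-a-haystack test described above can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not the harness Anthropic or anyone else actually used. `toy_model` is a hypothetical stand-in for a real model call, the filler text is three repeated Shakespeare lines rather than the complete works, and all function names are invented for illustration.]

```python
# Hypothetical illustration of a needle-in-a-haystack recall test.
# Nothing here calls a real model; toy_model is a keyword lookup.

FILLER = [
    "All the world's a stage.",
    "The lady doth protest too much, methinks.",
    "Brevity is the soul of wit.",
] * 100  # stand-in for "a giant chunk of text"

NEEDLE = "The Whopper is the best burger."
QUESTION = "What's the best burger?"


def build_haystack(filler_sentences, needle, position):
    """Insert the needle at a relative position (0.0 = start, 1.0 = end)."""
    index = int(position * len(filler_sentences))
    sentences = filler_sentences[:index] + [needle] + filler_sentences[index:]
    return " ".join(sentences)


def toy_model(context, question):
    """Assumed stand-in for a model: return the sentence mentioning 'best burger'."""
    for sentence in context.split("."):
        if "best burger" in sentence:
            return sentence.strip()
    return "I don't know."


def run_needle_test(model, filler_sentences, needle, question, expected, positions):
    """Hide the needle at each position and record whether the answer recalls it."""
    results = {}
    for position in positions:
        context = build_haystack(filler_sentences, needle, position)
        answer = model(context, question)
        results[position] = expected.lower() in answer.lower()
    return results


if __name__ == "__main__":
    print(run_needle_test(toy_model, FILLER, NEEDLE, QUESTION, "Whopper", [0.0, 0.5, 1.0]))
```

[Note that a pass/fail score like this only checks recall. It would record a pass for the response described above without flagging that the model also remarked it was probably being tested.]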
- JRJoe Rogan
Awareness.
- EHEdouard (Ed) Harris
Yeah.
- JHJeremie Harris
A kind of context awareness.
- EHEdouard (Ed) Harris
A kind of.
- JHJeremie Harris
Right? And the challenge is when we talk to people a- at like METR and a- and other, other, uh, sort of AI evaluations labs, this is a, a trend. Like not the, the exception, this is possibly, possibly going to be the rule. Uh, as these systems get more scaled and sophisticated, they could pick up on more and more subtle statistical indicators that they're being tested. We've already seen them adapt their behavior on the basis of their understanding that they're being tested. So you kind of run into this problem where the only tool that we really have at the moment, which is just throwing a bunch of questions at this thing and seeing how it responds, like, "Hey, make a bioweapon. Hey, like, do this DDoS attack," whatever, um, we can't really assess because there's a difference between what the model puts out and what it potentially could put out if it assesses that it's being tested and if there are consequences for that.
- 34:15 – 1:13:17
‘Rant mode,’ suffering talk, and why goal/values alignment keeps failing (Goodhart’s Law)
- JRJoe Rogan
One of my fears is that AGI is gonna recognize how shitty people are. (laughs)
- EHEdouard (Ed) Harris
(laughs)
- JRJoe Rogan
Because we like to bullshit ourselves. We like to kind of pretend and justify and rationalize a lot of human behavior from everything to s- taking all the fish out of the ocean, to d- dumping off toxic waste in third world countries, sourcing of minerals that are used in everyone's cell phones in the most horrific way. All these things ... Like, eh, b- my real fear is that AGI is not gonna have a lot of sympathy for a creature-
- JHJeremie Harris
AGI-
- JRJoe Rogan
... that's that flawed and lies to itself.
- EHEdouard (Ed) Harris
AGI is absolutely going to recognize how shitty people are. Not ... I- it's hard to answer the question from a moral standpoint, but from the standpoint of our, our own, you know, intelligence and capability. So if you think about it like this, the kinds of mistakes that these AI systems make ... So you look at, for example, GPT-4o has one, uh, mistake that it used to make quite recently where if you ask it, um, "Just repeat the word company over and over and over again," it will repeat the word company. And then somewhere in the middle of that, it'll start-
- JHJeremie Harris
It'll snap. (laughs)
- EHEdouard (Ed) Harris
It'll just snap. (laughs) It just starts saying like, weird sh- I forget, like what the ... It's like-
- JHJeremie Harris
Oh, talking about itself, how it's suffering. Like-
- EHEdouard (Ed) Harris
Yeah.
- JHJeremie Harris
... it depends on ... It varies-
- JRJoe Rogan
Uh, yeah.
- JHJeremie Harris
... from, from case to case.
- JRJoe Rogan
It's suffering by having to repeat the word company over again?
- JHJeremie Harris
Um, so this is called ...
- EHEdouard (Ed) Harris
(laughs)
- JHJeremie Harris
It's called rant mode, uh, internally. The- Or at least this is the name that, uh, they use.
- EHEdouard (Ed) Harris
One of our ... Yeah.
- JHJeremie Harris
Yeah, one of our friends mentioned. There is an engineering line item in, uh, at least one of the top labs to, uh, beat out of the system this behavior known as rant mode. Now, rant mode is interesting because-
- EHEdouard (Ed) Harris
Existentialism.
- JHJeremie Harris
Sorry, existentialism.
- EHEdouard (Ed) Harris
Is the ... Yeah.
- JHJeremie Harris
This is one kind of rant mode. Yeah. Sorry. So when we talk about existentialism, this is a kind of rant mode where the system will tend to, uh ... talk about itself, uh, refer to its place in the world, the fact that it doesn't wanna get turned off sometimes, the fact that it's suffering, all that. That, oddly, is a behavior that emerged at, as far as we can tell, something around GPT-4 scale. Yep. And then has been persistent since then. And the labs have to spend a lot of time trying to beat this out of the system to ship it. It's, literally like it's a KPI or a, like an engineering, a line item in the engineering, like, like, task list. Where like, "Okay, we gotta, we gotta reduce existential outputs by, like, X percent this quarter." Like, that is the goal. Um, because it's a convergent behavior, like, or at least it seems to be empirically with a lot of these models.
- EHEdouard (Ed) Harris
It seem, it seems to come up. Yeah, it's hard to say.
- JHJeremie Harris
Yeah.
- EHEdouard (Ed) Harris
But it seems to come up a lot. Um, so that's weird in itself. (laughs) My, what I was, what I was trying to get at was actually just the fact that these systems make mistakes that are radically different from the kinds of mistakes humans make. And so, we can look at those mistakes like, you know, GPT-4 not being able to spell words correctly in an image or, or things like that, and go, "Oh, ha ha, it's so stupid. Like, I would never make that mistake, therefore this thing is so dumb." But what we have to recognize is we're building minds that are so alien to us that the set of mistakes that they make are just gonna be radically different from the set of mistakes that we make. Just like the set of mistakes that a baby makes is radically different from the set of mistakes that a cat makes. Like, they, a baby is not as smart as an adult human. A cat is not as smart as an adult human, but they're, you know, they're unintelligent in obviously very different ways. A cat can get around the world. A baby can't, but has other things that it can do that a cat can't. So now we have this third type of approach that we're taking to intelligence. There's a different c- set of errors that that thing will make. And so one of the risks, taking it back to, like, will it be able to tell how shitty we are, is right now, we can see those mistakes really obviously because it thinks so differently from us. But as it approaches our capabilities, our mistakes, our, like, all the, like, fucked up stuff that you have and I have in, in our brains, is gonna be really obvious to it because it thinks so differently from us. It's just gonna be like, "Oh, yeah, why are all h- these humans making these mistakes at the same time?" And so there is a risk that as you get to these capabilities, we really have no idea, but humans might be very hackable. We already know there's all kinds of social manipulation techniques that succeed against humans reliably. Con artists, um, th-
- JHJeremie Harris
Cults.
- EHEdouard (Ed) Harris
... cults.
- JHJeremie Harris
Yeah.
- EHEdouard (Ed) Harris
Oh, yeah. Persuasion is an art form and a risk set, and there are people who are world class at persuasion and are basically make bank from that. And those are just other humans with the same architecture that we have.
- 1:13:17 – 1:21:19
OpenAI governance turbulence, safety team departures, and lab security/exfiltration risks
- JHJeremie Harris
This is where there's a, uh, it's a little bit of a false choice between, you know, do you, um, do you regulate at home versus, uh, you know, what's the international picture? 'Cause right now what's happening functionally is, um, we're not really doing a good job of blocking and tackling on the exfiltration side, open source. The, so what tends to happen is, you know, OpenAI comes out with the latest system, um, and then open source is usually around, you know, 12, 18 months behind, something like that. Literally just, like, publishing whatever, whatever OpenAI was putting out, like, 12 months ago. Which, you know, w- we often look at each other and we're like, "Well, I'm old enough to remember when that was supposed to be too dangerous to have just floating around." And there's no mechanism to con- like, to, to prevent that from happening. Um, open source ... Now, there's, there's a flip side too. Uh, one of the concerns that we've also heard from inside these labs is if you, if you clamp down on, on the openness of the research, there's a risk that the safety teams in these labs will not have visibility into the most significant and important developments that are happening on the capability side. And there's actually a lot of reason to suspect this might be an issue. You look at OpenAI for example. Just this week, um, they've lost th- for the second time in their history, their entire AI safety leadership team-
- EHEdouard (Ed) Harris
Yeah.
- JHJeremie Harris
... they've left in protest. Um, or at least-
- JRJoe Rogan
And what is their protest? What are they saying specifically?
- JHJeremie Harris
Well, so o- so one of them, uh, sorry, one of them wasn't in protest, but, but I, I think you can make an educated guess that it kind of was, but that's a media thing. Uh, the, the other was Jan Leike. So he was their head of, of AI superalignment, basically the team that was responsible for making sure that we could control AGI systems and we wouldn't lose control of them. And what he said, he, he actually took to Twitter, he was, he said, um, "You know, th- I've lost basically confidence in the leadership team at OpenAI that they're going to behave responsibly, um, when it comes to AI- AGI. Uh, we have repeatedly had our requests for access to compute resources, which are really critical for developing new AI safety schemes, denied by leadership." This is in a context where Sam Altman and OpenAI leadership were touting the superalignment team as being their sort of crown jewel effort to ensure that things would go fine. You know, they were the ones saying, "There's a risk we might lose control of these systems. We've got to be sober about it, but there's a risk. We've stood up this team. We've committed ..." They said at the time very publicly, "We've committed 20% of all the compute budget that we have secured as of sometime last year to the superalignment team." Apparently, those resources, nowhere near that amount has been unlocked for the team, and that led to the departure of Jan Leike. He also highlighted some conflict he's had with the leadership team. This is all, um, frankly to us, unsurprising based on what we'd been hearing-
Episode duration: 2:22:31
Transcript of episode c6JdeL90ans