No Priors

No Priors Ep. 105 | With Director of the Center for AI Safety Dan Hendrycks

This week on No Priors, Sarah is joined by Dan Hendrycks, director of the Center for AI Safety. Dan serves as an advisor to xAI and Scale AI. He is a longtime AI researcher, publisher of widely used AI evals such as "Humanity's Last Exam," and co-author of a new national security paper, "Superintelligence Strategy," along with Scale founder and CEO Alexandr Wang and former Google CEO Eric Schmidt. They explore AI safety, geopolitical implications, the potential weaponization of AI, and policy recommendations. Sign up for new podcasts every week. Email feedback to show@no-priors.com. Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @DanHendrycks

Show Notes:
0:00 Introduction
0:36 Dan’s path to focusing on AI Safety
1:25 Safety efforts in large labs
3:12 Distinguishing alignment and safety
4:48 AI’s impact on national security
9:59 How might AI be weaponized?
14:43 Immigration policies for AI talent
17:50 Mutually assured AI malfunction
22:54 Policy suggestions for current administration
25:34 Compute security
30:37 Current state of evals

Sarah Guo (host) · Dan Hendrycks (guest)
Mar 5, 2025 · 36m · Watch on YouTube ↗

EVERY SPOKEN WORD

  1. 0:00–0:36

    Introduction

    1. SG

      (instrumental music plays) Hi, listeners, and welcome back to No Priors. Today, I'm with Dan Hendrycks, AI researcher and director of the Center for AI Safety. He's published papers and widely used evals, such as MMLU and most recently, Humanity's Last Exam. He's also published Superintelligence Strategy, alongside authors including former Google CEO Eric Schmidt and Scale founder Alex Wang. We talk about AI safety and geopolitical implications, analogies to nuclear, compute security, and the state of evals. Dan, thanks for doing this.

    2. DH

      Glad to be

  2. 0:36–1:25

    Dan’s path to focusing on AI Safety

    1. DH

      here.

    2. SG

      How'd you end up working on AI safety?

    3. DH

      AI was pretty clearly going to be a big deal if one would just think through its conclusions, so early on, um, it- it seemed like other people were ignoring it because it was, um, weirder or not that pleasant to think about, it's- it's hard to wrap your head around, but it seemed like the most important thing during this century, so I- I thought that that would be a good place to develop my career toward, um, and so that's why I started on it early on. And then since it'd be such a big deal, um, we need to make sure that we can think about it properly, um, channel it in a- a- a productive direction and take care of some sort of, um, tail risks which are generally systematically under-addressed, so that- that's why I got into it. It's a big deal and people weren't really doing much about it, um, at the time.

    4. SG

      And- and what

  3. 1:25–3:12

    Safety efforts in large labs

    1. SG

      do you think of as the- the center's role versus safety efforts within the large labs?

    2. DH

      Well, there aren't that many safety efforts in the (laughs) labs even now. I mean, I think the labs can just focus on doing some very basic measures to refuse, um, queries related to, like, help me make a virus and things like that. But, um, I- I don't think labs have an extremely large role in safety overall or making this go well. They're kind of predetermined to race. They can't really choose not to unless they would no longer be a relevant company in the arena. I- I think they can reduce, like, terrorism risks or, um, some, like, accidents, but beyond that, I don't think they can dramatically change the- the outcomes in too substantial of a way. They could... or b- because a lot of this is geopolitically determined. If companies decide to act very differently, um, there's the prospect of competing with China, um, or maybe- maybe Russia will become relevant later. And as that happens, this constrains their behavior substantially, so, um, I- I- I've been interested in tackling, um, AI at multiple levels. There's things companies can do to have some very basic anti-terrorism safeguards which are pretty easy to implement. There's also the economic effects, uh, that will need to be managed well, and companies can't really change, um, how that goes either. Um, it's going to cause mass disruptions to, uh, labor, um, and automate a lot of digital labor. If they, you know, tinker with the design choices or add some different refusal data, it doesn't change that fact. Safety or making AI go well and the risk management is just much more of a broader problem. It's got some technical aspects, but I- I- I think that's, uh, a small part of it.

  4. 3:12–4:48

    Distinguishing alignment and safety

    1. SG

      I don't know that the leaders of the labs would say, like, "We can do nothing about this," but maybe it's also a question of, you know, everybody also has, like, equity in this equation, right? Um, maybe it's also a question of semantics. Like, can you describe how you think of the difference between, like, alignment and safety as you think about it?

    2. DH

      I'm just using safety as a sort of catchall for, like, dealing with- with risks. There are other risks, like if you never get, um, uh, really intelligent AI systems, that poses some risks in itself. There's- there's other sorts of risks that don't run... that are not as necessarily technical, like- like concentration of power. So, I- I view the distinction between alignment and safety, um, with alignment being a sort of subset of safety. Obviously, you want the value systems of the AIs to be in keeping with or compatible with, um, say, the US public for US AIs or for you as an individual, but that doesn't make it necessarily safe. If you ha- if you have an AI that's reliably obedient or aligned to you, this doesn't make everything work totally well. China can have AIs that are totally aligned with them. The US can have AIs that are totally aligned with them. You still are going to have, uh, a strategic competition between the two. This is going to, um... Th- they're gonna need to integrate it in their militaries, they're probably gonna need to integrate it really quickly, this competition's gonna force them to have a higher risk tolerance in the process, so even if the AIs are doing their principals' bidding, um, reliably, this doesn't necessarily make, uh, the overall situation perfectly fine. I- I think it's, um, not just a question of reliability or whether they do what you want, there are other structural pressures that cause this to be riskier, like the

  5. 4:48–9:59

    AI’s impact on national security

    1. DH

      geopolitics.

    2. SG

      At the highest level, like bundle of weights, increasingly capable, like why do we care about AI from a national security perspective? Like, what's the most practical way, uh, it matters in geopolitics or gets used as a weapon?

    3. DH

      I- I think that AI isn't that powerful currently in many respects, so i- in many ways it's not actually that relevant for national security currently. This could well change within a year's time. I- I think generally I've been focused on the- the trajectory that it's on as opposed to saying right now it is extremely concerning. That said, there are some specifics. For instance, for cyber, I don't think AIs are that relevant for being able to pull off a devastating cyberattack on the grid by a malicious actor currently. That said, we should look at cyber and be prepared and think about what its strategic implications are. There are other capabilities like virology. The AIs are getting very good at STEM, PhD level types of topics, and that includes virology, so I think that they are sort of rounding the corner on, um, being able to provide expert level capabilities in terms of, uh, their knowledge of the literature or even helping in prac- practical wet lab situations. So I- I do think on the virology aspect, um, they do have already national security implications, but that's only very recently with the rea- reasoning models, uh, but, uh, in many other respects, they're not as relevant. It's more prospective, that it could well become the way in which a nation might, um, try and dominate another nation, um, and the- the backbone for not just war, but also just economic security. Uh, the amount of ships that the US has versus China might be the determiner or the determinant of, um, which country, um, is the most prosperous, uh, and which one falls behind, so. But this is all prospective. I- I don't think it's just speculative, it's speculative in the same way that, like, NVIDIA's valuation is speculative or the valuations behind AI companies are speculative. It's something that I think a lot of people are expecting, um, and expecting fairly soon.

    4. SG

      Yeah, it's quite hard to think about time horizons in AI. We invest in things that I think of it as, like, medium term speculative, but they get pulled in quite quickly. You know, and just because you mentioned both cyber and bio, uh, w- we're investors in companies like, uh, Culminate or Cybil on the defensive cybersecurity side or, um, Chai and Somite on the biotech discovery side or, you know, modeling different, um, systems in biology that will help us with treatments. How do you think about the balance of, like, competition and benefits and safety? Because some of these things I think are... you know, w- we think they're working effectively in the near term on the positive side as well.

    5. DH

      Yeah, I mean, I- I- I don't get this, um, this big trade-off between safety and... Uh, I mean, you're just taking care of a few tail risks. For bio, what... if you want to expose those capabilities, just, like, talk to sales, get the enterprise account. Here you can have the little refusal thing for virology, but if you just created an account a second ago and you're asking it how to, um, culture this virus, and here's your picture of your Petri dish and what's the next step that you should do... Uh, yeah, if you, if you want the access to those capabilities, you can speak to sales. So that's basically, um, in xAI's risk management framework. It's just we're not exposing those expert level capabilities to people who we don't know who they are. But if we do, then sure, have them. So I think you can... And likewise with cyber. I think you can just, uh, very easily capture the- the benefits while taking care of some of these, um, pretty avoidable tail risks. But then once you have that, you've basically taken care of malicious use for the- the models behind your, uh, your API. And that's about the best that you can do as a company. You could, you know, try and influence policy by using your- your voice or something. Uh, but, um, I don't see a substantial amount that they could do. They could, they could do some research for trying to make the models more, um, controllable or try and make policymakers be more aware of the situation, uh, more broadly in terms of where we're going. 'Cause I, I don't think policymakers have internalized what's happening (laughs) in AI at all. Uh, they, they still think it's, like, um... that the companies are just selling hype, and that the employees don't actually believe that, uh, we, you know, we could get AGI, so to speak, in the, in the next few years. So I don't know. I- I- I don't see like really substantial trade-offs there.
I see much more subs- I- I- I think that the complications really come about when we're dealing with, um, like what's the right stringency in export controls, for instance. That's, that's complicated. Um, uh, if you turn the pain dial all the way up for China in export controls, um, and if AI chips are the currency of economic power in the future, then this increases the probability that they wanna invade Taiwan. They already want to. This would give them all the more reason if, like, AI chips are the main thing and they're not getting any of it and they're not even getting the latest semiconductor manufacturing tools for even making, um, cutting edge CPUs, let alone GPUs. So those are some other types of complicated, um, uh, problems that we, we have to address and think about and calibrate appropriately. But in terms of just mitigating virology stuff, just, just speak to sales if you're Genentech or- or a- a bio startup, and then, um, you have access to those capabilities.

  6. 9:59–14:43

    How might AI be weaponized?

    1. DH

      Problem solved.

    2. SG

      What is a way you actually expect that AI gets used as a weapon beyond virology and, and security, yeah?

    3. DH

      I wouldn't expect, um, uh, a bioweapon from a state actor. From a non-state actor, um, that, that would make a lot more sense. The... I- I- I think cyber makes sense from both state actors and non-state actors. Uh, then there's drone applications. These could disrupt, um, other things. These could help with other types of weapons research, like help explore exotic EMPs, um, could help, um, uh, create better types of drones, could substantially help with situational awareness, uh, uh, so that one might know where, you know, all the nuclear submarines are. Um, some advancement in AI might be able to help with that, and that could disrupt, uh, uh, our second strike capabilities, um, and mutually assured destruction. So, uh, those are some geopolitical implications. It could potentially bear on nuclear deterrence, and that's not even a weapon. The example of just heightened situational awareness and being able to pinpoint where, um, hardened, um, uh, land, uh, nuclear launchers are or where nuclear submarines are, um, is- is just informational, uh, but could nonetheless be extremely disruptive and destabilizing. Outside of that, the- the default conventional AI weapon would be drones. Um, which is, um, I don't know if that makes sense that comp- or that- that countries would compete on that. And, uh, I think that would be a mistake if the US weren't (laughs) , um, trying to-

    4. SG

      Yeah.

    5. DH

      ... do more in manufacturing drones, so.

    6. SG

      Yeah. I, um, started working recently with an electronic warfare company. I think there's a massive under- uh, lack of understanding of just like the basic concept of, you know, we have autonomous systems. They all have communication systems. Our missile systems have targeting and communication systems. And, um, from a battlefield awareness and control perspective, like, a lot of that, um, fight will be won with, uh, radio and radar and related systems, right? And- and so-

    7. DH

      Mm-hmm.

    8. SG

      ... I- I think there's an area where AI is going to be very relevant and is already very relevant in Ukraine.

    9. DH

      Speaking about, um, AIs assisting with like command and control, I mean, um, I- I- I remember hearing some story about, uh, um, how on Wall Street humans used to not be able to... You always had a human in the loop for each decision. So at a later stage before they removed that requirement on Wall Street, you just had, um, (laughs) rows of people just clicking the accept, accept, accept button (laughs) . And, uh, we're kind of getting to a similar state, um, in- in some contexts with, um, uh, with- with AI. It wouldn't surprise me if we'd end up automating some more of that decision-making. But, so th- this just turns into questions of reliability, and doing some, doing some reliability research seems, seems useful. To return to that- that larger question of, um, what, where, where are the- the sort of safety trade-offs? I think it's people are largely thinking that this... the push for- for risk management is to do some sort of pausing or something like that. Um, an issue is you need teeth behind an agreement. If you do it voluntarily, you just make yourself less powerful and you let the worst actors get ahead of you. Um, you could say, "Well, we'll sign, sign a treaty." Um, well, you can't assume that the treaty will be followed. Like, that- that would be very imprudent. You would actually need some sort of threat of force or something to back it up, some verification mechanism. But absent that, if it's entirely voluntary, then this doesn't seem like a useful thing at all. So I think people's conflation of- of safety with "what we must do is voluntarily slow it down"... It just doesn't make, um, as much, um, geopolitical sense unless you have, um, some- some threat of force, um, to- to back it up or some very strong verification mechanism. Um, but in... absent that-

    10. SG

      As a proxy, there's clearly been very little, um, compliance to either treaties or norms around cyberattacks and around corporate espionage, right?

    11. DH

      Mm-hmm. Yeah. I mean, corporate espionage, for instance, that was one strategy, this sort of voluntary pause strategy. People thinking that equals safety. And then, and then maybe last year, there was that paper, Situational Awareness, written by Leopold Aschenbrenner, and he's a sort of a, a safety person. So his idea was let's instead try and beat China to superintelligence as much as possible. But that has some sort of weaknesses because, like, it assumes that corporate espionage will not be a thing, uh, at all, (laughs) um, which is very difficult to do. I mean, we have, you know, some places, you know, 30% plus of the employees at these top AI companies are, like, Chinese nationals. I mean, this is, um, not feasible. If you're gonna get rid of them, they're gonna go to China and then they're probably gonna beat you because they're extremely important, (laughs) for, for the, the US's success. Um, so you're gonna want to keep them here, but that's gonna expose you to some, uh, information security types of issues, but that's just too bad.

  7. 14:43–17:50

    Immigration policies for AI talent

    1. SG

      Do you have a point of view on how we should change immigration policy, if at all, given these risks?

    2. DH

      So I would, of course, claim that the policy on this should be totally separate from southern border policy and, and broader policy. But if we're talking about AI researchers, if they're very talented, then I think you'd want to make it easier, and I think that it's probably too difficult for many of them to, to stay currently. And I think that that discussion should be kept totally separate from southern border policy.

    4. SG

      Just in terms of broad strokes, like things that you think won't work, uh, voluntary compliance, and assuming that'll happen, or, uh, just straight race?

    5. DH

      So we want to be competitive, and I think it's, I think racing in other sorts of spheres, say drones or AI chips, seems fine. Um, uh, if you're saying let's race to superintelligence to try and get... and turn that into a weapon to crush them, and they're not going to do the same or they're not going to have access to it or they're not going to prevent that from happening, that seems like quite a tall claim. I mean, if, if, um, w- we did have a substantially better AI, they could just co-opt it. They could just steal it. Um, unless you had really, really strong in- information security, like you, you, you move the AI researchers out to the desert, but then you're reducing your probability of actually beating them because a lot of your best scientists ended up going, um, back to China. Even then, if there were signs that they were really pulling ahead and going to be able to get some powerful AI that will crush, that will enable China... or that would enable the US to crush China, they would then try to deter them from doing something like that. They're not gonna sit idly by and say, "You know what? Yeah, go ahead. Develop your, develop your superintelligence or whatever, and then you can boss us around and we'll just accept your dictates, uh, till the end of time." So that, that... I, I think that there is kind of a, a failure of some sort of second-order reasoning going on there, which is, well, how would China respond to this sort of maneuver if we're building a trillion-dollar compute cluster in, in the desert, (laughs) totally visible from space?

    6. SG

      (laughs)

    7. DH

      Um, and, (laughs) it's basically the only pl- plausible read on this is that this is a bid for, uh, for dominance or a sort of monopoly on superintelligence. It, uh, uh... So, so I think it's, um... Uh, it reminds me of, um, in, in the nuclear era, there's a brief period where some people were saying, "You know what? We got to just, like, preemptively destroy or preventively destroy the USSR. We gotta nuke 'em." Um, even people, even pacifists or people who are normally pacifists, like Bertrand Russell, were advocating for this. The opportunity window for that, like, maybe didn't ever exist, um, but there was, there was a, a, a prospect of it for some time. Uh, but I, I don't think that the opportunity window really exists here because of the complex, um, um, dependence and the multinational talent, um, dependence in, in the United States, that I don't think you can have China be totally severed from, um, any awareness, um, or any ability to, um, uh, gain insight, um, or imitate what we're doing here.

    8. SG

      We're clearly nowhere close to that as a-

    9. DH

      No, I would-

    10. SG

      ... real environment right now, right? So, so-

    11. DH

      No, it would take years.

    12. SG

      Yeah.

    13. DH

      It would take years to do well, and, like, I don't even think the timelines for some very powerful AI systems, they... there might not even be enough time to do that securitization

  8. 17:50–22:54

    Mutually assured AI malfunction

    1. DH

      anyway.

    2. SG

      So, okay. In reaction, uh, you propose, along with some, you know, other esteemed authors and friends, Eric Schmidt and Alex Wang, a new deterrence regime, uh, mutually assured AI malfunction. I think that's the right name. MAIM, bit of a scary acronym, and also a nod to mutually assured destruction. Can you explain MAIM in, uh, plain li- language?

    3. DH

      Let's think of what happened in, in nuclear strategy. Basically, a lot of, a lot of states deterred each other from doing a first strike because they could then retaliate. They had a shared vulnerability. So they're, they were, "We're not gonna do this really aggressive action of trying to make a bid to wipe you out because that will end up causing us to be damaged." And we have a somewhat similar situation later on, um, when AI is more salient, when it is viewed as pivotal to the future of, of a nation. When people are on the verge of making a superintelligence, when, when they can, say, automate, you know, pretty much all AI research, I, I think states would try to deter each other from trying to leverage that to, um, develop it into something like a super weapon that would allow the o- other countries to be crushed or use those AIs to do, um, uh, some really rapid automated AI research and development loop that could, um, have it bootstrap from its current levels to something that's, um, superintelligent, vastly more capable than, than a- any other system out there. I think that later on, it becomes so destabilizing that China just says, "We're going to do something preemptive, like do a cyberattack on your data center," and the US might do that to China. Um, and Russia, coming out of Ukraine, will, you know, reassess the situation, uh, get, get situational awareness, where it's thinking, "Oh, what's going on with the US and China? Oh my goodness, they're so ahead on AI. AI is looking like a big deal." Let's say it's later in the year when, you know, a big chunk of software engineering is, is starting to be impacted by AI. Uh, "Oh wow, this is looking pretty relevant. Hey, if you try and use this to crush us, we will prevent that by doing a cyberattack on you and we will keep tabs on your projects." Because it's pretty easy for them to do that espionage.
All they need to do is do a zero day on Slack, and then they can know what DeepMind is up to in very high fidelity, and OpenAI and xAI and others. Um, so it's, it's, it's pretty easy for them to do espionage and sabotage. Right now, they don't need to be, uh, they wouldn't be threatening that because it's not at the level of severity, it's not actually that potentially destabilizing. It's still too distant, the capabilities. Um, a lot of decision-makers still aren't taking this AI stuff s- that seriously, relatively speaking. But I think that'll change as it gets more powerful. Uh, and then I think that this is how they would end up responding. And this makes us not wind up in a situation where we are doing something extremely destabilizing, like trying to create some weapon that enables, uh, one country to, like, totally wipe out the other and, uh, as was proposed by, um, uh, people like Leo. (laughs)

    4. SG

      What are the parallels here that you think make sense to nuclear and don't?

    5. DH

      I think that more broadly, AI is a dual-use technology: it has civilian applications and it has military applications. Um, you know, its economic applications are still, you know, in some ways limited, and likewise its military applications are still, uh, um, limited. But I, I think that will, uh, keep changing rapidly. Like chemical, it was important for the economy. Um, it had some, uh, military use, but, um, uh, they kind of coordinated not to, uh, go down the chemical route, and bio as well, um, can be used as a weapon and, uh, has enormous, um, economic applications, and likewise with nuclear too. Um, so I, I think it has some of those, those properties. For each of those technologies, countries did eventually coordinate to, um, to make sure that it didn't wind up in the hands of rogue actors like terrorists. There have been a lot of efforts taken to make sure that rogue actors don't get access to it and use it against them because it's in neither of their interests. Basically, like bioweapons, for instance, and chemical weapons are a poor man's atom bomb, and this is why we have the Chemical Weapons Convention and Bio-Weapons Convention. That's where there's some shared interest. So they might be rivals in other senses, in the way that the US and the Soviet Union were rivals. But there's still, um, uh, coordination on that because it was incentive-compatible. There's b- there's... It doesn't benefit them in any way if the, if, if terrorists have access to, to these sorts of things. Uh, it, it's, it's just inherently destabilizing. So, um, I think that's an opportunity for, for, um, uh, coordination. That isn't to say that they have an incentive to both, um, pause all forms of AI development. But it may mean that they would be deterred from some particular forms of, of AI development, in particular ones that have a very plausible prospect of, uh, enabling one country to get a decisive edge over another and crush them.
Um, so no, like, super weapon type of stuff, but more conventional types of warfare, like drones and things like that, I expect that they'll continue to race and, um, probably not, may- maybe not even coordinate, um, on anything like that, and that's just how things will go. That's just, you know, bows and arrows and nuclear. It just made sense for them to develop those sorts of weapons and, um, thr- threaten each other with them.

    6. SG

      If, uh,

  9. 22:54–25:34

    Policy suggestions for current administration

    1. SG

      you all could propose a magical adoption tactically of some policy or action to the current administration, what is the first step here? It is the-

    2. DH

      Yeah.

    3. SG

      ... you know, "We will not build a super weapon and we're gonna be watching for other people building them too."

    4. DH

      As I've sort of been alluding to throughout this whole conversation, like, what would the companies do? Like, mm-hmm, not that much. I mean, add some basic anti-terrorism safeguards, but I think this is, like, pretty technically easy. This is unlike refusal for other things. Refusal robustness for other things is harder. Like, if you're trying to get at, like, crimes and torts, that, that, that's harder because it, it overlaps a lot more messily with typical everyday interaction. I think likewise here, the, the asks for states are not that challenging either. I, I, I, I... It's just a matter of them doing it. So one would be the CIA has a cell that's doing more espionage of other states' AI programs, so that way they have a better sense of what's going on and aren't caught by surprise. And then secondly, maybe some part of government, like let's say CYBERCOM, which has a, a lot of cyber offensive capabilities, um, gets some cyberattacks ready to, um, disable, um, other data centers in other countries if they are looking like they are running or creating a destabilizing AI project. That's it for the deterrents. For non-proliferation of, of AI chips to rogue actors in particular, I think there'd be, um, some adjustments to export controls, in particular just knowing where the AI chips are at re- reliably. We wanna know where the AI chips are at for the same reason we want to know where our fissile material is at, um, for the same reason that we want Russia to know where its fissile material is at. Like, it's just, that's just generally a good, um, bit of information to collect, and that can be done with some very basic statecraft of having a licensing regime, and for allies, they just notify you whenever it's being, um, shipped to a different location and they get a license exemption, um, uh, on that basis. And then you have enforcement officers prioritize doing some basic, um, uh, inspections for AI chips for, um, end-use checks.
So I think, like, all of these are, um... a few texts away, um, or a, uh, basic, um, document away. And I think that, that kind of, like, 80/20 is a lot of it. Of course, this is, this is, uh, always a changing situation. Um, safety is, uh, as I've been, um, trying to reinforce, not really that much of a technical problem. This is more of a complex, um, uh, geopolitical problem with, with technical aspects. Later on, maybe we'll need to do more. Maybe we will, um... there might be some new risk sources that we need to, to take care of and adjust. But I think, like, right now, I think that espionage through CIA, um, uh, sabotage with CYBERCOM, um, building up those capabilities, buying those options seems like it, that takes care of,

  10. 25:34–30:37

    Compute security

    1. DH

      uh, a, a lot of the risk.

    2. SG

      Let, let's talk about compute security.

    3. DH

      Mm-hmm.

    4. SG

      Um, if we're talking about 100,000 networked state-of-the-art chips, you can tell where that is. How do DeepSeek and the recent releases they've had factor into your view of compute security, given export controls have clearly led to innovation toward highly compute efficient pre-training that works on chips that China can import at what one might consider, like, an irrelevant scale, a much smaller scale today? Um, it's hard for me to see, directionally, training becoming less efficient, even if we, even if people want to scale it up. And so, uh, like, does that change your view at all?

    5. DH

      No, I think it, it just sort of undermines other types of strategies, like the, this, um, uh, Manhattan Project type of strategy of let's, you know, move people out to the desert and do a big cluster there and, uh... What it shows is that you can't rely as much on restricting another superpower's capabilities, their ability to make models. So you can restrict their intent, which is what deterrence does, but I don't think you can reliably or robustly restrict their capabilities. You can restrict the capabilities of rogue actors, and that's what I would want things like compute security and export controls to facilitate, and make sure it doesn't, you know, wind up in the hands of Iran or something. China will probably keep getting some fraction of these chips, but we should basically just try and know where they're at more, and we can tighten things up. Um, but I would primarily... You could even coordinate with China to, um, make sure that the chips aren't winding up in rogue actors' hands. I, I should also say that the export controls on AI chips weren't actually, to my understanding, a substantial priority among leadership at BIS. Um, for some people, but for the enforcement officers, like, did they, did any of them go to Singapore to see where (laughs) these 10% of NVIDIA's chips were going? Um, uh, I think that would have... they would have very quickly found, oh, they were going to, to China. So some basic, um, end-use check would have, would have taken care of that. I don't think this shows that export controls don't work. We- we've had... We've done non-proliferation of lots of other, um, things like chemical agents and, and, and fissile material, um, so it, it, it can be done if, if people care. Um, but even so, I, I, I still think if you really tighten the export controls, I mean so that China can't get any of those chips at all and this was your...
one of your biggest priorities, they're just gonna steal the waits anyway. I think it'll be too difficult to s- totally restrict their capabilities, but I think you can restrict their intent through deterrence.

    6. SG

      It also seems like either stuff is powerful or it's not. It seems infeasible to me, given the economic opportunity, that China will say, "We don't need the capability."

    7. DH

      Yeah. Yeah.

    8. SG

      I, uh, fail to see a version of the world where gre- like, leadership in another great power that believes that there is value here, says, "We don't need that from an economic value perspective."

    9. DH

      Yeah, that's right. Yeah. I... Uh, just, just, um, for, for a lot of these, it would be... maybe it would be nicer if everything went, you know, 3X slower, and maybe there'd be fewer, like, mess-ups if, if there's, like, some magic button that would do that. And I, I don't know whether that's true or not, actually. I don't have a position on that. Given the structural constraints and the competitive pressures between these, between these companies, between these, these states, it just makes a lot of these things infeasible. Um, uh, a lot of these other gestures that could be useful for risk mitigation, when you consider them, uh, or when you, uh, think about the, the structural realities of it, it just, it just becomes a lot less tractable. That said, there still could be, in some ways, some pausing or halting of development of particular projects that you could potentially lose control of, or that, um, uh, or that, if controlled, would be very destabilizing 'cause it would enable one country to rush the other. I think people's, um, conceptions about what risk management looks like are, um... uh, that people think it's a peacenik thing or something like that, like i- it's, it's all kumbaya, um, and, um, we, we just have to, um, ignore structural realities i- in, um, in operating in this space. I think instead, the, the right approach, um, toward this is that it's sort of like nuclear strategy, like, it is an evolving situation. It depends. There's some basic things you can do, like you're probably going to need a stockpile of nuclear weapons. You're gonna need a secure second strike. You're gonna need to keep an eye on what they're doing. You're gonna need to make sure that there isn't proliferation to rogue actors, um, when the capabilities are extremely hazardous. And this is a continual battle, but it's not, you know... it's not gonna be clearly an extremely positive thing no matter what. 
It's not gonna be doomsday no matter what for nuclear strategy. Um, it was obviously risky business. The Cuban Missile Crisis came pretty close to (laughs) an all-out, uh, nuclear war. It depends on what we do, um, uh, and, um, I think there's some basic interventions and some very basic statecraft can take care of... uh, can take care of a lot of these, these, uh, uh, sorts of risks and, uh, make it, uh, manageable. I, I, I imagine then we're left with more domestic types of problems, like what to do about automation and, and things like that. But I think maybe we'll be able to get a handle on some of the geopolitics

  11. 30:37–36:23

    Current state of evals

    1. DH

      here.

    2. SG

      I wanna change tack for our last couple of minutes and talk about evals, um, and it's obviously very related to, uh, safety and understanding where we are in terms of capability. Can you just contextualize where, w- where you think we are? Uh, you came out with the triggeringly named Humanity's Last Exam eval, and then also Enigma. Um, like, why are these relevant and where are we in evals?

    3. DH

      Yeah, yeah. So for context, I've been making evaluations trying to understand where we're at in this, um, in, in AI for, uh, I don't know, about as long as I've been doing AI research. Uh, so previously, I've done some, um, datasets like MMLU and the MATH dataset. Before that, before ChatGPT, there's things like ImageNet-C and, and other sorts of things. So Humanity's Last Exam was basically an attempt at... getting at what's the- what would be the, um, end of the road for the evaluations and benchmarks that are based on exam-like questions, ones that test some sort of academic type of knowledge. So, for this, we asked professors and researchers around the world to submit a really challenging question, and then we would add that to the, the dataset. So it's a big collection of what professors, for instance, would encounter as challenging problems in their, in their research, uh, that have a definitive closed-ended objective answer. With that, I, I think the genre of here's a closed-ended answer, where it's just, you know, multiple choice or a simple short answer, I think that genre will roughly be expired when performance on this dataset is, uh, near the ceiling. So, and when performance is near the ceiling, I think that'd basically be an indication that, like, you have something like a superhuman mathematician, um, or a superhuman STEM scientist, i- in many ways, for domains where closed-ended questions, um, are, are very useful, uh, such as in math. But (clears throat) it doesn't get at other things to measure, such as what's its ability to perform open-ended tasks? So that's more agent type of evaluations, and I think that will take, um, more time. So we'll, you know, try and measure just directly what's its ability to automate various digital tasks, like collect various digital tasks, see, you know, have it work on them for a few hours, see if they successfully completed them. Something like that is coming out soon. 
We, we have a test for closed-ended questions, things that test knowledge in the academy and, like, things like mathematics, but, um, they still are very bad at agent stuff. This could possibly change overnight, but, um, it's, it's still near the floor. I think they're still extremely defective as agents. So, uh, there'll need to be more evaluations for that. But, but the overall approach is just to try and understand what's going on, um, what's, what's the rate of- what's the rate of development, um, uh, so that the public can at least, like, understand what's happening, uh, 'cause if all the evaluations are saturated it's- i- i- it's difficult to even have a conversation about the, the, the state of AI. Nobody really knows exactly where it's at or where it's going or what the rate of improvement is, so.

    4. SG

      Is there anything that qualitatively changes, um, when, uh, let's say, these models are- and model systems are just better than humans, right? Like, exceeding human capability in how we do evals, does it change our ability to evaluate them?

    5. DH

      So I think the intelligence frontier is just so jagged, what things they can do and can't do is often surprising. They still can't fold clothes. They can answer a lot of tough physics problems, though. Why that is, it's, you know, they're, they're complicated reasons. So it's not, um, all uniform. And so in some ways, they'll be better than humans. It seems totally plausible that they'll be better than humans at mathematics not too long from now, um, but still not able to book a flight. Y- the implications of that are wh- when you have them being better, they just might be better in some limited ways, and that just might have kind of limited, uh, influence in its domain, but not necessarily generalize to, to other sorts of things. But I, I do think it's possible that they'll be better at reasoning skills than us. We still could have humans checking 'cause they can still verify. If, if a- if an AI mathematician is better than a human, humans can still run the proof through a proof checker, and then confirm that it was correct. So in that way, humans can still, um, understand what's going on i- i- in some ways. But in other ways, like, if they're getting better taste in things, if there's- if that makes any sense, maybe it doesn't make any philosophical sense, that would be pretty difficult for, um, uh, people to, to confirm. I think we're, we're on track overall to have AIs that, like, have really good oracle-like skills, like you can ask them things and just, wow, uh, it j- it just totally said something in- insightful or very non-trivial or pushed the bounds of knowledge in some particular way, but, um, not necessarily able to carry out tasks on behalf of people for some while. Uh, so I, I think this is why we don't take the AIs that seriously, 'cause they still can't do, like, a lot of- a lot of very trivial stuff. 
But when they get some of the agent skills, then I don't think that there are many barriers for their economic impacts, um, or to going from people thinking that this is kind of an interesting thing to this being the most important thing. I think that's an emergent property with (laughs) agent skills, that the vibes really shift and it's pretty clear that this is, um, the, uh, much bigger than, you know, some prior t- technology like the ap- uh, the App Store or social media. (laughs) It's, it's, uh, it's in a category of its own. (laughs)

    6. SG

      Well, Dan, thanks for doing this. It was a great conversation.

    7. DH

      Yeah. Glad- thank you for having me. Yeah.

    8. SG

      Find us on Twitter @nopriorspod. Subscribe to our YouTube channel if you wanna see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way, you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.

Episode duration: 36:24
