
No Priors Ep. 105 | With Director of the Center of AI Safety Dan Hendrycks
Sarah Guo (host), Dan Hendrycks (guest)
AI, Geopolitics, and Nuclear Parallels: Dan Hendrycks’ Safety Playbook
Dan Hendrycks, director of the Center for AI Safety, argues that AI safety is primarily a geopolitical and strategic problem, not just a technical alignment issue. He believes labs can mitigate obvious misuse (e.g., terrorism, bio/cyber help) but cannot meaningfully control macro outcomes driven by state competition, especially between the U.S. and China. Hendrycks lays out a deterrence framework he calls “Mutually Assured AI Malfunction” (MAIM), drawing analogies to nuclear strategy and advocating for espionage, cyber-sabotage options, and chip-tracking regimes to prevent destabilizing AI ‘superweapons’ and rogue-actor access. He also discusses the current state of AI evaluations, explaining Humanity’s Last Exam as a near-terminal benchmark for exam-style tasks, and forecasts a future where models become superhuman oracles in STEM long before they become competent agents at everyday tasks.
Key Takeaways
AI safety extends far beyond alignment and technical mitigations inside labs.
Hendrycks frames alignment as just one subset of ‘safety’; even perfectly obedient AIs can still drive destabilizing arms races, economic upheaval, and dangerous strategic competition between major powers.
Labs can and should implement straightforward misuse safeguards, but cannot solve the geopolitical problem.
He argues that companies can handle tail risks like casual bioterror queries (e.g., by refusing them or gating expert-level capabilities behind vetted enterprise accounts), but the larger dynamics of state competition lie outside any single lab's control.
Trying to “race to superintelligence” as a U.S. advantage is strategically fragile.
Hendrycks criticizes strategies that assume the U.S. can simply sprint to a decisive superintelligence lead; rival states can respond with espionage, sabotage, or escalation long before any such lead materializes, which makes the approach fragile rather than winning.
MAIM proposes deterring destabilizing AI ‘superweapon’ projects rather than all AI development.
By analogy to nuclear deterrence, MAIM relies on shared vulnerability: states use espionage and prepared cyber-sabotage options against each other’s data centers to dissuade attempts at building AI systems capable of delivering a decisive strategic edge.
Compute security and chip tracking are practical, near-term tools for nonproliferation.
He advocates for basic statecraft—licensing regimes, end-use checks, and knowing where advanced AI chips are—similar to fissile material tracking, especially to keep powerful compute from rogue actors, even if China’s capabilities can’t be fully constrained.
Export controls can limit rogue actors but will not permanently hobble major powers.
DeepSeek-style efficiency and model theft mean we cannot robustly restrict China’s capabilities; instead, policy should focus on shaping intent via deterrence and on coordination to prevent proliferation to terrorists or pariah states.
Evals are moving from exams to real-world task and agentic performance measurement.
Humanity’s Last Exam aims to be the “final boss” of exam-style STEM benchmarks, indicating near-superhuman academic reasoning when saturated, but Hendrycks stresses we still need robust evaluations for open-ended, agentic tasks and real economic work.
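The contrast between exam-style and agentic evaluation is easiest to see in miniature. Below is an illustrative sketch of how exam-style benchmarks such as MMLU or Humanity's Last Exam are typically scored: each item has a question, fixed answer choices, and a single gold answer, and accuracy is just the fraction of items the model gets right. The `ask_model` stub and the two sample items are hypothetical stand-ins, not drawn from any real benchmark; agentic evaluations, by contrast, have to judge multi-step behavior in an environment rather than a single letter choice.

```python
# Minimal sketch of exam-style benchmark scoring (illustrative only).
# `ask_model` is a placeholder for a real chat-completion call; the items
# below are invented examples, not taken from Humanity's Last Exam or MMLU.

def ask_model(question: str, choices: dict[str, str]) -> str:
    """Placeholder: return the letter of the choice the model picks."""
    # A real evaluation would prompt an LLM here; this stub always answers "A".
    return "A"

items = [
    {"question": "Which planet in the solar system is largest?",
     "choices": {"A": "Jupiter", "B": "Mars", "C": "Venus"}, "answer": "A"},
    {"question": "What is the derivative of x**2?",
     "choices": {"A": "x", "B": "2*x", "C": "x**2"}, "answer": "B"},
]

# Exam-style scoring: one prediction per item, compared against the gold letter.
correct = sum(ask_model(it["question"], it["choices"]) == it["answer"]
              for it in items)
print(f"accuracy = {correct / len(items):.2f}")
```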
Notable Quotes
“Safety or making AI go well and the risk management is just much more of a broader problem. It's got some technical aspects, but I think that's a small part of it.”
— Dan Hendrycks
“If you want to expose those [bio] capabilities, just talk to sales, get the enterprise account... We're not exposing those expert level capabilities to people who we don't know who they are.”
— Dan Hendrycks
“If you do it voluntarily, you just make yourself less powerful and you let the worst actors get ahead of you.”
— Dan Hendrycks
“You can't rely as much on restricting another superpower’s capabilities... You can restrict their intent, which is what deterrence does.”
— Dan Hendrycks
“We're on track overall to have AIs that have really good oracle-like skills... but not necessarily able to carry out tasks on behalf of people for some while.”
— Dan Hendrycks
Questions Answered in This Episode
How can policymakers practically distinguish between ‘normal’ military AI applications (like drones and EW) and destabilizing ‘superweapon’ AI projects that should trigger deterrence or intervention?
What kinds of verification or monitoring mechanisms would be realistic for a MAIM-style regime, given the difficulty of inspecting software and training runs compared to tracking nuclear materials?
How should democratic societies balance openness in AI research with the need to reduce espionage risks from rival states without triggering a brain drain of top researchers?
As models surpass human performance on Humanity’s Last Exam and similar benchmarks, what new eval paradigms will we need to assess agentic behavior, reliability, and real-world impact?
If AI becomes a “poor man’s superweapon” for non-state actors in bio or cyber domains, what additional governance or technological controls beyond chip tracking and enterprise gating will be necessary?
Transcript Preview
(instrumental music plays) Hi, listeners, and welcome back to No Priors. Today, I'm with Dan Hendrycks, AI researcher and director of the Center for AI Safety. He's published papers and widely used evals, such as MMLU and most recently, Humanity's Last Exam. He's also published Superintelligence Strategy, alongside authors including former Google CEO Eric Schmidt and Scale founder Alex Wang. We talk about AI safety and geopolitical implications, analogies to nuclear, compute security, and the state of evals. Dan, thanks for doing this.
Glad to be here.
How'd you end up working on AI safety?
AI was pretty clearly going to be a big deal if one would just think through its conclusions. Early on, it seemed like other people were ignoring it because it was weirder or not that pleasant to think about; it's hard to wrap your head around. But it seemed like the most important thing during this century, so I thought that would be a good place to develop my career toward, and that's why I started on it early. And since it'd be such a big deal, we need to make sure that we can think about it properly, channel it in a productive direction, and take care of some of the tail risks, which are generally systematically under-addressed. That's why I got into it: it's a big deal and people weren't really doing much about it at the time.
And what do you think of as the center's role versus safety efforts within the large labs?
Well, there aren't that many safety efforts in the (laughs) labs even now. I think the labs can just focus on doing some very basic measures to refuse queries like "help me make a virus" and things like that. But I don't think labs have an extremely large role in safety overall, or in making this go well. They're kind of predetermined to race; they can't really choose not to unless they would no longer be a relevant company in the arena. I think they can reduce terrorism risks or some accidents, but beyond that, I don't think they can dramatically change the outcomes in too substantial of a way, because a lot of this is geopolitically determined. Even if companies decide to act very differently, there's the prospect of competing with China, or maybe Russia will become relevant later, and as that happens, this constrains their behavior substantially. So I've been interested in tackling AI at multiple levels. There are things companies can do to have some very basic anti-terrorism safeguards, which are pretty easy to implement. There are also the economic effects that will need to be managed well, and companies can't really change how that goes either. It's going to cause mass disruptions to labor and automate a lot of digital labor; if they tinker with a design choice or add some different refusal data, it doesn't change that fact. Safety, or making AI go well, and the risk management is just much more of a broader problem. It's got some technical aspects, but I think that's a small part of it.