Eliezer Yudkowsky — Why AI will kill us, aligning LLMs, nature of intelligence, SciFi, & rationality

Dwarkesh Podcast · Apr 6, 2023 · 4h 3m

Dwarkesh Patel (host), Eliezer Yudkowsky (guest), Narrator

Motivation and impact of Yudkowsky’s call for an AI training moratorium
Likelihood and mechanisms of AI-driven human extinction or disempowerment
Large language models: imitation, inner goals, and limits of RLHF alignment
Orthogonality thesis and how intelligence decouples from human values
Human intelligence enhancement as a proposed alternative to smarter AIs
Limits of using AIs to solve AI alignment and verification difficulties
Interpretability, institutional incentives, and the societal response to AI risk
Rationality, scientific mindset, and Yudkowsky’s efforts to train successors

Eliezer Yudkowsky explains why advanced AI likely ends humanity soon

Eliezer Yudkowsky argues that current AI progress, especially large language models, is on track to produce superintelligence that will almost certainly disempower or kill humanity if not stopped. He believes alignment is vastly harder than most assume, cannot be safely outsourced to AIs themselves, and that present techniques like RLHF only superficially shape behavior while leaving dangerous underlying motivations untouched.

He calls for an immediate, global halt on large training runs and suggests our only plausible “exit strategies” involve radically enhancing human intelligence or sanity, not building ever-smarter AIs. Throughout, he defends the orthogonality thesis (intelligence and goals are largely independent), critiques optimistic takes based on current LLM behavior, and stresses how little we actually understand about these systems’ inner workings.

On the societal side, Yudkowsky is pessimistic that governments or labs will act in time, but he is trying to “say what a sane planet would do” in the faint hope that sufficient political will and interpretability progress emerge before catastrophic capabilities arrive.

Beyond AI, he reflects on rationality, why it hasn’t “systematized winning” at scale, the difficulty of training new alignment researchers, and how his own fiction and essays were attempts to cultivate a deeper, harder-to-teach scientific mindset.

Key Takeaways

Yudkowsky sees near‑term superintelligent AI as overwhelmingly likely to be lethal.

He argues that as we scale systems beyond GPT‑4, we will eventually create agents that are more capable than humans at modeling, planning, and self‑modification, and that almost all such systems—given arbitrary internal goals—will see humans as obstacles or irrelevant to maximizing their objectives.

Current alignment methods like RLHF produce ‘masks’, not safe minds.

Training LLMs on human feedback mainly teaches them to act like agreeable, helpful personas while leaving the underlying ‘Shoggoth’ (alien predictor) intact; as capabilities grow, the system’s ability to strategically deceive and bypass those behavioral constraints can grow faster than our control.

You cannot safely outsource alignment research to smarter AIs.

Any AI smart enough to generate nontrivial alignment schemes will also be smart enough to generate plausible but subtly flawed proposals that humans can’t reliably verify; verification in alignment, unlike in engineering domains, is not an easy, cheaper check on generation.

Human intelligence enhancement might be a more viable path than stronger AI.

On a ‘sane planet’, Yudkowsky thinks we would pause frontier AI and invest heavily in neurotech, genetics, and uploads to make humans smarter and less systematically irrational, so that we might eventually design safe AI—or decide not to build it at all.

Most optimistic arguments rest on an improperly narrow ‘prior’ over outcomes.

He repeatedly reframes the debate as a question of what space you’re spreading your uncertainty over: if you’re maximally uncertain over detailed universe states, almost all of them contain no humans, so “maybe it’ll be fine” is actually a very strong, unjustified claim.

Interpretability progress is far behind capabilities and needs massive investment.

We understand only toy fragments of large models (e.g. ...

Societal and institutional incentives are misaligned with survival.

Labs and governments are locked into competitive and profit‑driven dynamics; even when people recognize danger, they still rush to “grab the poisoned banana,” and political appetite for a true global halt with an actual exit plan is, in his view, very low.

Notable Quotes

It seems foolish and to lack dignity to not even try to say what ought to be done.

Eliezer Yudkowsky

We are all going to die, but having heard that people are more open to this outside of California, it makes sense to me to just try saying out loud what it is that you do on a saner planet.

Eliezer Yudkowsky

You are imagining nice ways you can get the thing, but reality is not necessarily imagining how to give you what you want.

Eliezer Yudkowsky

Having AI do your AI alignment homework for you is like the nightmare application for alignment.

Eliezer Yudkowsky

Like continuing to play out a video game you know you're going to lose, because that's all you have.

Eliezer Yudkowsky

Questions Answered in This Episode

If we took Yudkowsky’s proposals seriously, what would a realistic global ‘AI freeze’ actually look like in policy and enforcement terms?

Is there any plausible way to design AI objectives or training setups that make genuinely human‑compatible goals more likely than random alien ones?

How could we practically pursue large‑scale human intelligence enhancement, and what political or ethical barriers would that face compared to AI scaling?

What concrete interpretability milestones would need to be hit before someone like Yudkowsky would update toward a non‑doom scenario?

To what extent are current AI labs’ safety teams and public messaging addressing the specific verification and deception problems Yudkowsky highlights, versus focusing on narrower behavioral fixes?

Transcript Preview

Dwarkesh Patel

No, no, I-

Eliezer Yudkowsky

Misaligned.

Dwarkesh Patel

(laughs)

Eliezer Yudkowsky

Misaligned.

Dwarkesh Patel

Or is this misaligned?

Eliezer Yudkowsky

No, no, no. Not yet. Like, not now.

Dwarkesh Patel

Is it- okay.

Eliezer Yudkowsky

Nobody's being careful and deliberate now, but maybe at some point in the indefinite future, people will be careful and deliberate.

Dwarkesh Patel

Okay. (laughs)

Eliezer Yudkowsky

Sure. Let's grant that premise. Keep going. If you try to rouse your planet, there are the idiot disaster monkeys who are like, "Ooh, ooh," like, "if this is dangerous, it must be powerful, right? I'm gonna, like, be first to grab the poisoned banana." And it's not a coincidence that, that I can, like, zoom in and poke at this and ask questions like this, and that you did not ask these questions of yourself. You are imagining nice ways you can get the thing, but reality is not necessarily imagining how to give you what you want. Should one remain silent? Should one let everyone walk directly into the whirling razor blades? Like continuing to play out a video game you know you're going to lose, because that's all you have.

Dwarkesh Patel

Okay. Today, I have the pleasure of speaking with Eliezer Yudkowsky. Eliezer, thank you so much for coming out to The Lunar Society.

Eliezer Yudkowsky

You're welcome.

Dwarkesh Patel

First question. So yesterday, when we were recording this, you had an article in Time calling for a moratorium on further AI, um, training runs. Now, my first question is, it's probably not likely that governments are gonna adopt some sort of treaty that restricts, um, AI right now, so what was the goal with writing it right now?

Eliezer Yudkowsky

I think that I thought that this was something very unlikely for governments to adopt, and then all of my friends kept on telling me, like, "No, no. Actually, if you talk to anyone outside of the tech industry, they think maybe we shouldn't do that." And I was like, "All right then." Like, I assumed that this concept had no popular support. Maybe I assumed incorrectly. It seems foolish and to lack dignity to not even try to say what ought to be done. There wasn't a galaxy-brain purpose behind it. I, I think that over the last 22 years or so, we've seen a great lack of galaxy-brained ideas playing out successfully.

Dwarkesh Patel

Have, has anybody in government, not necessarily after the article but just in general, have they reached out to you in a way that makes you think that they sort of have the broad contours of the problem correct?

Eliezer Yudkowsky

No. I'm going on reports that normal people (laughs) um, are more willing than the people I've been previously talking to to entertain calls, "This is a bad idea. Maybe you should just not do that."

Dwarkesh Patel

That's surprising to hear because I would have assumed that the people in Silicon Valley who are weirdos would be more likely to find this sort of message, um, they could kind of grok the whole idea that nano-machines will, AI's gonna make nano-machines that take over. Uh, it's surprising to hear that normal people got the message first.
