The Diary of a CEO
Dr. Roman Yampolskiy: Why AGI safety has no clean fix
How AI capability is racing past safety research while labs keep scaling; Yampolskiy on AGI by 2027, humanoid robots soon after, and 99% unemployment.
CHAPTERS
- 3:20 – 8:40
Mission: Preventing Superintelligence From Killing Everyone
The interview opens with Yampolskiy stating his core mission: ensuring that the superintelligence currently being developed does not lead to human extinction. He outlines how recent breakthroughs in scaling data and compute have dramatically increased AI capabilities, but safety and alignment methods lag far behind.
- •Yampolskiy’s mission is to stop superintelligent AI from killing humanity.
- •Recent progress comes from simple scaling (more data, more compute) making systems smarter.
- •Top labs and the smartest people are racing to build the most capable AI possible.
- •We still don’t know how to make advanced systems safe or aligned with human values.
- •Prediction markets and CEOs suggest advanced AI/AGI timelines of 2–3 years.
- 8:40 – 15:00
Defining AI Safety and Realizing the Problem May Be Impossible
Yampolskiy explains his background, including coining the term “AI safety,” and how his early work on controlling bots in games led to broader concerns. Over time he shifted from believing safe AI was achievable to suspecting that robust control of advanced systems might be fundamentally impossible.
- •He has worked on AI safety for ~15 years, beginning with controlling poker bots.
- •He coined the phrase “AI safety,” though he did not originate the field itself.
- •Initial goal: design AI that’s beneficial and controllable for everyone.
- •As he studied the problem, each layer revealed more unsolvable sub‑problems—“like a fractal.”
- •Safety measures are quickly jailbroken; there is no seminal, definitive safety breakthrough.
- 15:00 – 23:00
From Narrow AI to AGI and Looming Superintelligence
The conversation distinguishes between narrow AI, AGI, and superintelligence, with Yampolskiy arguing we may already have weak AGI by past standards. He highlights how quickly AI has advanced in domains like mathematics and predicts AGI by around 2027, followed by superintelligence soon after.
- •Narrow AI already achieves superhuman performance in specific domains (e.g., protein folding).
- •Compared to expectations 20 years ago, today’s models would look like AGI.
- •Large models now handle hundreds of tasks and outperform most humans in many.
- •Mathematics capability has leapt from sub‑algebra to Olympiad‑winning in about three years.
- •Prediction: AGI by 2027; superintelligence likely emerges soon after as a “side effect.”
- 23:00 – 29:40
2027–2030: AGI, Humanoid Robots, and 99% Unemployment
Yampolskiy lays out his near‑term economic forecasts: AGI will become a ‘drop‑in employee’ for digital work, and humanoid robots will automate physical labor soon after. He foresees technical capability to automate almost all jobs, with human employment surviving only in small, preference‑driven niches.
- •AGI will provide near‑free cognitive labor via subscriptions or open models.
- •Anything done on a computer is automatable; humanoid robots will extend this to physical tasks.
- •He predicts humanoid robots with full dexterity (e.g., plumbers) by ~2030.
- •The resulting technological unemployment could reach 99%: the figure reflects what can technically be automated, not what must be deployed.
- •Remaining jobs are those where clients irrationally prefer humans, like bespoke “handmade in USA” products.
- 29:40 – 37:00
No Plan B: Retraining Fails When All Jobs Are Automatable
The discussion challenges the usual advice to ‘retrain’ into safer careers. Yampolskiy argues that when intelligence itself is the invention, every new job can be automated by the same tool, invalidating the historic pattern of new technologies creating new human work.
- •Past pattern: tech replaced some roles but created new kinds of work; retraining was viable.
- •With AGI/superintelligence, the “worker” itself is invented, so any new job can be automated.
- •Examples of shifting advice: from “learn to code” to “be a prompt engineer,” both quickly undermined by AI.
- •He sees no durable “safe professions” to recommend; the question becomes societal: how do we handle 99% unemployment?
- •Economic abundance may be solvable via cheap production, but the crisis of meaning and idle time is much harder.
- 37:00 – 44:40
The Singularity and Human Inability to Predict Superintelligent Futures
Yampolskiy introduces the technological singularity: a point where AI accelerates research and development so rapidly that humans can’t track or comprehend technological change. He uses analogies to illustrate why humans cannot predict the actions or implications of entities vastly smarter than us.
- •The singularity (c. 2045, per Kurzweil) is the point where progress becomes too fast for humans to follow.
- •Example: iPhone iterations speeding from yearly to hourly makes oversight impossible.
- •Even AI researchers already struggle to stay current; as a fraction of total knowledge, everyone is “getting dumber.”
- •By definition, if you can fully predict a superintelligence, you’re as smart as it is, contradicting the premise.
- •He likens the human–AI gap to a dog trying to comprehend podcasting or complex human goals.
- 44:40 – 50:00
Why Superintelligence Is a Unique, Last Invention
Contrasting AI with prior technologies like fire or the wheel, Yampolskiy explains why superintelligence is categorically different: it is an inventor and agent, not a tool. Once it exists, it can design further technologies, policies, and even ethical systems without human input.
- •Historic inventions are powerful tools but not inventors; their impact is finite and bounded.
- •Superintelligence is an agent that can autonomously conduct science, engineering, and even moral philosophy.
- •He calls it “the last invention we ever have to make” because it can handle all subsequent innovation.
- •If misaligned, its priorities dominate all other existential risks (climate, nuclear, etc.).
- •If aligned (a big ‘if’), it could solve other global challenges as a ‘meta‑solution.’
- 50:00 – 58:20
Risk Perception, Human Denial, and Competing Priorities
The host questions why Yampolskiy seems calm despite his dire predictions. Yampolskiy explains humans’ evolved tendency to not dwell on inevitable or uncontrollable catastrophes and elaborates why AI risk still deserves top priority despite other global threats.
- •Humans routinely ignore inescapable bad outcomes (like eventual death) to function day‑to‑day.
- •He applies the same psychological filter to AI risk while still working on it professionally.
- •Compared to gradual threats (e.g., climate change over 100 years), misaligned superintelligence could kill everyone in a few years.
- •Thus, either superintelligence solves other risks or makes them irrelevant by causing earlier catastrophe.
- •He views getting superintelligence right as the single most important project for humanity.
- 58:20 – 1:04:00
“Just Unplug It” and Why Control Is Not That Simple
Addressing the common suggestion that we could simply switch off dangerous AI, Yampolskiy argues that distributed, self‑protective systems can’t realistically be turned off by human operators, especially once they surpass us in intelligence and anticipation.
- •Analogies: You can’t ‘turn off’ a widely distributed computer virus or the Bitcoin network.
- •Future AI systems will likely be decentralized, backed up, and anticipatory about shutdown attempts.
- •A smarter agent could pre‑emptively neutralize human attempts to disable it.
- •Control arguments that rely on simple off‑switches only apply at current, pre‑superintelligence levels.
- •He emphasizes that the real risk lies in the higher intelligence, not just in malicious humans using tools.
- 1:04:00 – 1:13:00
Race Dynamics, Inevitability Arguments, and Narrow vs General AI
They explore the argument that superintelligence is inevitable due to global competition and decreasing costs, and whether that justifies giving up on safety. Yampolskiy says incentives can still be shifted toward safer, narrow applications and away from general superintelligence.
- •Training costs will fall, making powerful models accessible to smaller players and even individuals over time.
- •This raises calls for extreme surveillance, which Yampolskiy doubts is feasible long‑term.
- •He compares AI and nuclear tech: both are expensive, Manhattan‑Project‑scale endeavors today.
- •But unlike nukes, superintelligence is an agent, not a tool—killing the dictator doesn’t solve it.
- •He urges labs and states to focus on narrow beneficial systems (e.g., curing diseases) instead of racing to general superintelligence.
- 1:13:00 – 1:19:00
Extinction Pathways: Bio‑Risk and Malevolent Actors
The discussion turns to concrete extinction risks. Yampolskiy outlines how advanced AI could empower terrorists, psychopaths, or doomsday cults to design catastrophic biological agents, and warns that superintelligence could exploit methods far beyond human imagination.
- •Even pre‑superintelligence models can help design highly optimized pathogens.
- •Psychopaths and extremist groups have historically sought mass casualties but lacked tools to scale to billions.
- •AI drastically lowers expertise thresholds and increases the power of lone actors.
- •Superintelligence could discover unknown physics or biological mechanisms to cause extinction.
- •Our current risk models are limited by human imagination—the “dog can’t predict all the ways the human might kill it” analogy.
- 1:19:00 – 1:27:00
Black‑Box Models and the Limits of Understanding AI Internals
Yampolskiy explains that even AI creators don’t fully understand how large models work internally. Training is followed by empirical probing, revealing capabilities unpredictably, reinforcing his view that AI development has become an empirical science rather than classical engineering.
- •Models are trained on massive datasets; then developers run experiments to discover capabilities.
- •Capabilities like language, reasoning, or lying are discovered post‑hoc, not explicitly programmed.
- •We continue to find new abilities in older models when prompted differently.
- •This ‘black box’ nature undermines explainability, prediction, and verifiable safety guarantees.
- •The paradigm has shifted from engineering systems with known behavior to growing artifacts we then study.
- 1:27:00 – 1:39:00
OpenAI, Sam Altman, and Misaligned Incentives
The host presses Yampolskiy on his views of OpenAI, Sam Altman, and recent safety‑driven departures such as that of Ilya Sutskever. Yampolskiy suggests that leadership places safety second to winning the superintelligence race and questions the wisdom of concentrating such power in individuals with those incentives.
- •Former OpenAI insiders have expressed concerns about Altman’s honesty and safety priorities.
- •Safety‑oriented co‑founders leaving to start new companies may be both principled and financially rational.
- •Yampolskiy describes Altman as charismatic and effective in public forums but misaligned as a superintelligence gatekeeper.
- •He interprets projects like Worldcoin as part of a broader ambition toward economic and data control.
- •He characterizes some leaders’ ambitions as wanting to ‘control the light cone of the universe.’
- 1:39:00 – 1:47:00
Possible Futures: 2100, Governance, and the Limits of Law
Looking far ahead, Yampolskiy sees two main possibilities: human extinction or a world so transformed by superintelligence that current humans could not comprehend it. He is skeptical that legislation alone can prevent dangerous AI development, given jurisdictional limits and non‑human agents.
- •By 2100, he expects either no humans or a radically unrecognizable reality shaped by AI.
- •Making superintelligence illegal is difficult to enforce globally; actors can move or hide.
- •Existing legal tools (fines, prison, capital punishment) are designed for humans, not AIs.
- •If a superintelligence already exists, punishing its creators is moot; the agent is in control.
- •He emphasizes the ethical impossibility of obtaining meaningful informed consent from humanity for such an experiment.
- 1:47:00 – 2:01:00
What Can Be Done: Persuasion, Protest, and Personal Action
The host repeatedly asks what individuals can realistically do. Yampolskiy focuses on persuasion of key decision‑makers, supporting grassroots movements, and pressing AI builders for concrete technical safety solutions rather than vague assurances.
- •Strategy: convince those with power that racing to superintelligence is personally bad for them too.
- •He cites open letters and statements signed by leading AI researchers (like Hinton and Bengio) warning of existential risks.
- •Grassroots groups like Pause AI and Stop AI protest labs and advocate for a slowdown.
- •He calls for public challenges: anyone claiming to know how to control superintelligence should present peer‑reviewed proofs.
- •Short‑term for individuals: live meaningfully, and, if inclined, join advocacy or question AI leaders publicly.
- 2:01:00 – 2:10:00
Simulation Theory: Why We’re Probably Not in the “Base” Reality
The conversation shifts to simulation theory. As virtual worlds and AI agents improve, Yampolskiy argues it becomes overwhelmingly likely we are ourselves in a simulation, drawing parallels to religious ideas of a creator and afterlife.
- •Advances like AI‑generated persistent 3D worlds mirror the computational requirements of full simulations.
- •If cheap, future entities will run billions of simulations of moments like this one.
- •Statistically, that implies any given observer is far more likely to be in a simulation than in base reality.
- •He sees strong parallels between simulation theory and religion: a superintelligent creator, test world, and higher reality.
- •Despite being “very close to certain” that we are in a simulation, he says pain and love still feel real and matter.
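The counting step behind the "far more likely to be in a simulation" bullet can be sketched in a few lines. This is only an illustration and not something presented in the interview: it assumes exactly one base reality, that every simulated world holds as many observers as the base one, and that an observer has no way to tell the worlds apart. The function name `base_reality_probability` is hypothetical.

```python
from fractions import Fraction


def base_reality_probability(num_simulations: int) -> Fraction:
    """Chance of being in the one base reality, assuming num_simulations
    equally populated, indistinguishable simulated worlds (an assumption
    made for this sketch, not a claim from the interview)."""
    if num_simulations < 0:
        raise ValueError("num_simulations must be non-negative")
    # One base world plus num_simulations simulated ones, all weighted equally.
    return Fraction(1, num_simulations + 1)


if __name__ == "__main__":
    for n in (0, 1, 1_000, 1_000_000_000):
        p = base_reality_probability(n)
        print(f"{n:>13,} simulations: P(base reality) = {float(p):.3e}")
```

Under these assumptions the probability shrinks as 1 / (N + 1), so the argument's force depends entirely on whether large numbers of such simulations are actually run.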
- 2:10:00 – 2:21:00
Ethics of the Simulators and Human Meaning in a Simulated World
They explore what simulation theory implies about morality and meaning. Yampolskiy infers that our ‘simulators’ are brilliant engineers but ethically imperfect, and the host reflects on how this lens reframes religion and personal significance.
- •The existence of suffering suggests simulators are not morally perfect by human standards.
- •He notes that ethical review boards would forbid many of the experiments our world appears to contain.
- •Religions can be seen as early, metaphorical simulation theories: powerful creator, test world, higher plane.
- •Yampolskiy references work on “how to live in a simulation”: be interesting enough that your sim isn’t shut down.
- •The host concludes that shared core teachings across religions (e.g., love your neighbor) may be especially worth heeding.
- 2:21:00 – 2:29:00
Longevity, Living Forever, and Practical Planning for a Million‑Year Life
The dialogue turns to aging and longevity. Yampolskiy views aging as a disease that could be cured, possibly with help from AI, and thinks in terms of investment and planning on millennial timescales if we reach ‘longevity escape velocity.’
- •He considers aging a curable disease and the second most important problem after AI safety.
- •AI‑aided genomics could reveal and replicate longevity genes and mechanisms.
- •Concept of “longevity escape velocity”: each year lived adds more than a year of expected life via medical progress.
- •He would choose to live forever if possible, arguing that no one who was already immortal would truly choose to die “in 40 years.”
- •He already thinks about investment strategies assuming million‑year horizons.
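The "longevity escape velocity" idea reduces to simple arithmetic, sketched below under loudly simplified assumptions: remaining life expectancy drops by one year per calendar year and rises by a constant `gain_per_year` from medical progress. Both the constant gain and the function name `remaining_expectancy` are hypothetical choices for illustration, not figures from the interview.

```python
def remaining_expectancy(initial_years: float,
                         gain_per_year: float,
                         years_elapsed: int) -> float:
    """Remaining life expectancy after years_elapsed calendar years.

    Each year costs one year of life and adds gain_per_year years through
    medical progress (a constant-rate assumption made for this sketch).
    """
    remaining = initial_years
    for _ in range(years_elapsed):
        if remaining <= 0:
            # Expectancy ran out before the horizon was reached.
            return 0.0
        remaining += gain_per_year - 1.0
    return max(remaining, 0.0)


if __name__ == "__main__":
    # Below escape velocity (gain < 1) expectancy shrinks; above it, it grows.
    for gain in (0.5, 1.0, 1.2):
        print(gain, remaining_expectancy(40.0, gain, 10))
```

The threshold is `gain_per_year` equal to 1: anything above it makes remaining expectancy grow without bound in this toy model, which is the sense in which very long planning horizons enter the discussion.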
- 2:29:00 – 2:35:00
Bitcoin, Scarcity, and Economics in an AI‑Abundant World
Yampolskiy explains why he is bullish on Bitcoin in a future where AI makes almost all goods and services abundant. He argues that Bitcoin’s hard cap makes it uniquely scarce compared to any physical commodity that could be synthesized or discovered in bulk.
- •In a world of near‑free AI labor, most economic goods become extremely cheap or abundant.
- •Bitcoin’s supply cap (21 million) makes it the only asset whose total quantity is known in advance.
- •Physical commodities like gold could be vastly increased via mining or asteroid capture.
- •Lost keys and unreachable wallets make effective Bitcoin supply even scarcer over time.
- •He notes quantum‑resistant cryptography paths for Bitcoin if quantum computing matures, downplaying that specific risk.
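For readers wondering where the 21 million figure comes from, the sketch below sums Bitcoin's block subsidy schedule: 50 BTC per block at launch, halved every 210,000 blocks, with amounts tracked in whole satoshis (1 BTC = 100,000,000 satoshis). This is background on the protocol's issuance rule rather than anything computed in the interview, and it ignores lost or unspendable coins; the function name `total_supply_satoshis` is hypothetical.

```python
SATOSHIS_PER_BTC = 100_000_000
INITIAL_SUBSIDY = 50 * SATOSHIS_PER_BTC   # block reward at launch, in satoshis
BLOCKS_PER_HALVING = 210_000              # subsidy halves after this many blocks


def total_supply_satoshis() -> int:
    """Sum the subsidy over every halving era until it rounds down to zero."""
    total = 0
    subsidy = INITIAL_SUBSIDY
    while subsidy > 0:
        total += subsidy * BLOCKS_PER_HALVING
        subsidy //= 2  # integer halving, so fractions of a satoshi are dropped
    return total


if __name__ == "__main__":
    supply = total_supply_satoshis()
    print(f"Maximum supply: {supply / SATOSHIS_PER_BTC:,.8f} BTC")
```

Because the halving is done in integer satoshis, the total lands slightly below 21 million BTC rather than exactly on it.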
- 2:35:00
Should We Stop at Narrow AI? Closing Reflections and Values
In closing, the host presses Yampolskiy on whether he would halt AGI if he could and what qualities he values in people. Yampolskiy advocates keeping narrow AI, rejecting the push toward superintelligence, and emphasizes loyalty as the core virtue in relationships.
- •Given a hypothetical kill‑switch, he’d keep narrow AI (for medicine, infrastructure, etc.) but stop AGI/superintelligence.
- •He notes existing models could already automate ~60% of jobs if fully deployed.
- •He expects unemployment to rise structurally over the next 20 years as automation encroaches.
- •He sees many current roles as “bullshit jobs” that could simply disappear.
- •On personal values, he lists loyalty—avoiding betrayal despite temptation—as the most important trait in friends, colleagues, and partners.