Dwarkesh Podcast

Gwern — Anonymous writer who predicted AI trajectory on $12K/year salary

Gwern's blog: https://gwern.net/. Gwern is a pseudonymous researcher and writer. After the episode, I convinced Gwern to create a donation page where people can help sustain what he's up to. Please go here to contribute: https://donate.stripe.com/6oE9DTgaf6oD0M03cc. Thank you to my friend Chris Painter for doing an amazing job voice acting Gwern.

𝐒𝐏𝐎𝐍𝐒𝐎𝐑𝐒
* Jane Street is looking to hire their next generation of leaders. Their deep learning team is looking for ML researchers, FPGA programmers, and CUDA programmers. Summer internships are open - if you want to stand out, take a crack at their new Kaggle competition. To learn more, go here: https://jane-st.co/dwarkesh
* Turing provides complete post-training services for leading AI labs like OpenAI, Anthropic, Meta, and Gemini. They specialize in model evaluation, SFT, RLHF, and DPO to enhance models’ reasoning, coding, and multimodal capabilities. Learn more at https://turing.com/dwarkesh.
* This episode is brought to you by Stripe, financial infrastructure for the internet. Millions of companies from Anthropic to Amazon use Stripe to accept payments, automate financial processes and grow their revenue. Learn more here: https://stripe.com/

𝐄𝐏𝐈𝐒𝐎𝐃𝐄 𝐋𝐈𝐍𝐊𝐒
* Transcript: https://www.dwarkeshpatel.com/p/gwern-branwen
* Me on Twitter: https://twitter.com/dwarkesh_sp
* Spotify: https://open.spotify.com/episode/46H5dTtYaj1L55UAy9XXaY?si=xVoj6euwQdmZYnyvaQ46lA

𝐓𝐈𝐌𝐄𝐒𝐓𝐀𝐌𝐏𝐒
00:00:00 - Anonymity
00:01:09 - Automating Steve Jobs
00:04:38 - Isaac Newton's theory of progress
00:06:36 - Grand theory of intelligence
00:10:39 - Seeing scaling early
00:21:04 - AGI Timelines
00:22:54 - What to do in remaining 3 years until AGI
00:26:29 - Influencing the shoggoth with writing
00:30:50 - Human vs artificial intelligence
00:33:52 - Rabbit holes
00:38:48 - Hearing impairment
00:43:00 - Wikipedia editing
00:47:43 - Gwern.net
00:50:20 - Counterfactual careers
00:54:30 - Borges & literature
01:01:32 - Gwern's intelligence and process
01:11:03 - A day in the life of Gwern
01:19:16 - Gwern's finances
01:25:05 - The diversity of AI minds
01:27:24 - GLP drugs and obesity
01:31:08 - Drug experimentation
01:33:40 - Parasocial relationships
01:35:23 - Open rabbit holes

Dwarkesh Patel (host), Gwern Branwen (guest)
Nov 13, 2024 · 1h 36m

CHAPTERS

  1. 0:00 – 1:09

    Why Gwern stays anonymous: fewer projections, fewer risks

    Dwarkesh introduces Gwern as an influential anonymous writer and researcher, and explains the use of an avatar to protect his identity. Gwern argues anonymity reduces identity-based dismissal and limits real-world retaliation, giving his ideas a fairer hearing.

    • Anonymity prevents people from “slotting” the author into stereotypes
    • Reduces social/political retaliation risks (swatting, harassment)
    • Forces readers to engage with arguments before judging
    • Framing: the interview is in-person but identity is protected
  2. 1:09 – 3:15

    How AI automates companies: bottom-up replacement, human ‘taste’ at the top

    Gwern predicts automation will begin with lower-level roles and climb upward rather than replacing CEOs first. He sketches a model where humans may retain comparative advantage in long-horizon vision and taste, directing a pyramid of AI workers.

    • Automation pressures favor replacing lower-level labor first
    • Humans may keep edge in long-term strategy/vision (less myopic)
    • CEO-as-curator: accepting/rejecting AI-generated proposals
    • Human-led firms could outperform fully autonomous AI firms via taste
  3. 3:15 – 4:39

    Selection and organization in AI firms: ‘packages of minds’ as the unit of evolution

    Asked what natural selection looks like for replicable AIs, Gwern suggests selection can occur at higher organizational levels than individual models. He expects optimized bundles—department-like packages of specialized models—will be copied, varied, and retained based on performance.

    • Perfectly replicable models enable selection at larger group levels
    • Hard to train interactions directly; choose teams that ‘click’
    • Departments as reusable bundles (programmer/manager/legal/finance)
    • Variation-and-selection over whole packages could optimize org design
  4. 4:39 – 6:37

    Early singularity ideas and skepticism about ‘progress’: Butler, Newton, Lucretius

    Gwern traces proto-singularity thinking back to Samuel Butler’s writings of the 1860s and 1870s about machine life threatening humans. He contrasts that with historical disbelief in true acceleration, citing Newton and Lucretius, who both argued that apparent progress must reflect rediscovery after civilizational resets.

    • Samuel Butler (1863/1872) as an early explicit machine-threat scenario
    • Technological progress as a historically “new” concept
    • Newton’s explanation: periodic civilization collapse causes rediscovery
    • Lucretius made similar arguments centuries earlier
  5. 6:37 – 9:21

    A parsimonious theory of intelligence: search over Turing machines

    Gwern offers a ‘10,000-foot’ model: intelligence is compute-driven search over programs (Turing machines) rather than a single master algorithm or ‘fluid’ essence. Differences in intelligence come from more compute enabling more extensive search and recombination of learned special cases.

    • Intelligence as search across many programs/special cases
    • Scaling = searching over more/longer programs with more compute
    • Variation in IQ: more compute allows longer/broader search
    • No ‘IQ gland’; fluid intelligence emerges from recombining learned modules
  6. 9:21 – 10:39

    Why human-level intelligence is rare in evolution: hardcoding beats expensive learning

    Dwarkesh asks why intelligence took so long to evolve if it’s so valuable; Gwern argues general intelligence is costly and often dominated by gene-encoded solutions. In static niches or short lifespans, specialized hardwired behaviors can be more adaptive than flexible learning.

    • General learning is expensive, slow, and unreliable compared to hardcoding
    • Genes can encode small ‘solutions’ directly given enough evolution
    • Static environments reduce value of general intelligence
    • Short-lived organisms can’t amortize the cost of learning
  7. 10:39 – 16:19

    Seeing scaling early: from Kurzweil skepticism to GPT-2/3 as decisive evidence

    Gwern recounts moving from skepticism about ‘compute summons algorithms’ to gradual belief as deep learning kept scaling in data, parameters, and breadth of tasks. GPT-2 felt like a shock; GPT-3’s few-shot results convinced him scaling was the correct worldview, prompting him to write publicly against prevailing takes.

    • Early exposure: Moravec/Kurzweil; initial skepticism of ‘build it and they will come’
    • Evidence accumulation: bigger datasets/models, more GPUs, expanding applications
    • GPT-1’s unsupervised ‘sentiment neuron’ as a notable sign
    • GPT-2 as a ‘holy shit’ prompting moment; GPT-3 as the crucial scaling test
    • Motivation to write: widespread misinterpretation of GPT-3 results
  8. 16:19 – 21:04

    Why many AI commentators missed the crux: overlooked scaling laws and compute-driven discovery

    Gwern argues mainstream 2020 AI discourse underweighted prior scaling evidence and overemphasized algorithmic insight. He highlights compute-enabled trial-and-error as the real driver, with research papers often rewriting messy discovery histories into misleading narratives.

    • Key precursors neglected: scaling laws papers, compute-driven advances
    • AlphaZero insight: removing tree search via hyperparameter optimization
    • 2017 scaling laws seen as non-salient and forgotten
    • Compute enables broad trial-and-error; small choices can matter hugely
    • Research literature often falsifies how ideas were discovered
  9. 21:04 – 26:29

    AGI timelines and planning for a near horizon: what’s worth doing before 2028?

    Gwern describes his timelines compressing rapidly after AlexNet, with progress feeling like ‘two years per year.’ He suggests planning around an aggressive AGI date (e.g., 2028) and prioritizing work that is intrinsically enjoyable, captures uniquely human preferences, or records what otherwise won’t be written down.

    • Timeline drift: far (post-2050) in 2005–2010, then rapid compression post-AlexNet
    • Over-updating episodes: AlphaGo hype vs later RL fizzles, then GPT resets
    • Strategy: do projects you enjoy, or write the ‘human part’ (specify goals)
    • Focus on recording preferences/judgments that AIs can’t ‘do for you’
    • Planning anchor: 2028 as a pragmatic personal baseline
  10. 26:29 – 30:54

    Writing to influence the ‘shoggoth’: tokens as civic power and latent-space legacy

    Dwarkesh asks about Gwern’s claim that this is a hinge moment for writing; Gwern says writing both steers future models and preserves a persona that future AIs will respond to. If values aren’t expressed online, they effectively don’t exist to models, and may not exist in the future that models help shape.

    • Writing as one of few available levers on future model behavior
    • Tokens as the ‘currency’ models must predict; text becomes influence
    • Unwritten values/preferences may be absent from AI-mediated futures
    • Writing also shapes how future systems treat you personally
    • Example: Kevin Roose’s reputation embedded in model discourse
  11. 30:54 – 33:53

    Human vs neural-network intelligence: the unresolved tension in Gwern’s worldview

    Gwern says he oscillates on whether humans and neural nets are equivalent, complementary, or mutually flawed. He cites puzzles like LLM memorization vs human creativity, and humans’ poor memory despite apparent general intelligence.

    • Daily swings: ‘humans are awesome’ vs ‘neural nets are awesome’
    • Comparisons: sample efficiency, memorization, and creativity tradeoffs
    • Humans: low explicit recall yet high competence
    • Belief: the two forms can’t be totally unrelated—scaling must connect to biology
  12. 33:53 – 38:48

    Rabbit holes as life’s objective: obsession, quitting criteria, and lifelong phases

    Gwern claims he maximizes rabbit holes—deep, obsessive inquiry triggered by small observations (like catnip immunity). He describes limits on simultaneous deep pursuits and the typical endpoint: missing data or unanswered questions, with anecdotes like Evangelion research and nootropic analog dead ends.

    • Rabbit holes are defined by obsession, not casual interest
    • Triggers can be trivial; outcomes include new taxonomies and questions
    • Concurrency cap: ~2–3 real rabbit holes at once
    • Exit when data doesn’t exist or the next step is blocked
    • Examples: Evangelion burnout; nicotine-analogue dead end
  13. 38:48 – 43:01

    Hearing impairment and its effects: isolation, reading, and the ‘Deaf accent’

    Gwern discusses being hearing impaired since birth and how it shaped school and social life, from humiliating classroom equipment to conversational lag. He links it to becoming a bookworm and to speech patterns that strangers misread as a foreign accent.

    • Special-ed schooling early; burdensome classroom hearing-aid setup
    • Social costs: always ‘a second behind’ in fast conversation
    • Long-term anxieties and practical constraints (water, swimming)
    • Reading as an accessible refuge; book-learned pronunciations
    • Speech differences tied to auditory development from early childhood
  14. 43:01 – 1:04:46

    From Wikipedia to Gwern.net: synthesis training, community ‘deletionism,’ and first viral post

    Gwern frames Wikipedia editing as the precursor to his later work: rabbit-holing then synthesizing into durable articles. He left as the community grew more hostile to detailed contributions, then found early blogging success with a heavily documented Silk Road post that went viral.

    • Wikipedia as ‘Gwern.net before Gwern.net’: writing and synthesis training
    • Old Wikipedia enabled obsessive deep dives; modern norms more hostile
    • Early editing began in middle school (Neopets + watchlists)
    • Transition to independent blogging after graduation and policy shifts
    • First big hit: Silk Road ordering walkthrough with screenshots and context
  15. 1:04:46 – 1:17:50

    Process and productivity: gardening vs harvest, arguing online, and the daily routine

    Dwarkesh probes Gwern’s slow-burn writing style: years-long diffusion-like iteration, punctuated by sudden thesis eruptions. Gwern describes a day of site cleanup, heavy reading via RSS/Twitter, bursts of real project work, and gym time as an ‘opposite activity’ to prevent burnout; he also credits online argument as fuel for the tedious ‘harvest.’

    • Long-form synthesis: years of accumulating notes and patterns
    • Two modes: spiral-around idea (many examples) vs sudden eureka essays
    • Daily routine: site maintenance, reading streams, evening focused work, gym
    • Burnout theory: do the most opposite activity possible to reset rewards
    • Arguing online provides motivation to ‘harvest’ and crystallize writing
    • Risk of isolation: bitterness/crankdom if spite isn’t released afterward
  16. 1:17:50 – 1:24:53

    Money and independence: aesthetics obsession, Patreon runway, and living on ~$12k/year

    They discuss criticism that Gwern over-invests in website aesthetics; he admits it’s partly selfish but says it supports rereading and maintenance-driven spaced repetition. Gwern explains his funding (Patreon + Bitcoin savings), extremely low expenses, and why moving to San Francisco is mostly constrained by money.

    • Website design as hobby; helps reread, tolerate repetition, notice improvements
    • Patreon (~$900–$1,000/month) plus savings from early Bitcoin
    • Lifestyle: rural living, minimal travel/eating out, no health insurance
    • Not a replicable model for aspiring writers; each must find their own path
    • Would relocate to SF with ~$50k–$100k/year support; contact via email
  17. 1:24:53 – 1:27:37

    Diversity of AI minds: architectural differences and converging LLM behavior

    Asked when AIs become more diverse than humans, Gwern argues they already are—especially across model families—though recent LLM tuning reduces visible differences. He gives intuitive examples of how objectives shape ‘cognitive style,’ like GANs’ adversarial incentives versus diffusion models’ artifacts.

    • Across deep learning, model ‘minds’ differ more than human cognition
    • Within modern LLMs, RLHF/tuning and imitation compress diversity
    • Different architectures imply different failure modes and representations
    • GANs as ‘adversarial/scared’ vs diffusion models’ different artifact patterns
  18. 1:27:37 – 1:31:08

    Overlooked trend outside AI: GLP weight-loss drugs, willpower, and environmental ‘lead’ analogues

    Gwern points to GLP drugs as a surprisingly underappreciated development with implications for addiction, health, and theories of willpower. He revisits his ‘Algernon’ evolutionary argument, notes obesity’s recent explosion, and suggests it’s near-certain we’ll discover major harmful environmental exposures in hindsight—though obesity trends look more salient than IQ changes.

    • GLP drugs as a major, unexpected lever on behavior and health
    • Unclear mechanism; too early to update strongly on evolutionary arguments
    • Obesity crisis is recent (post-1990) and visually dramatic in population photos
    • High credence that modern environments contain serious hidden harms
    • Intelligence seems stable over time; obesity better candidate for impact signal
  19. 1:31:08 – 1:33:41

    Drug experimentation boundaries: why psychedelics feel riskier than nootropics

    Dwarkesh asks why Gwern is skeptical of Bay Area psychedelic experimentation despite his history with nootropics. Gwern distinguishes them by permanence and self-reinforcing dynamics: psychedelics can reshape judgment and psychiatric state, making repeated use harder to evaluate objectively.

    • Psychedelics can have acute and permanent effects; nootropics are more bounded
    • Harder to quantify psychedelic impact; risk of lasting preference shifts
    • ‘Self-recommending’ spiral: use increases desire to use more
    • Nootropic failure mode is mostly wasted money/time, not personality change
    • Some limited use can be beneficial, but repeated use warrants scrutiny
  20. 1:33:41 – 1:35:23

    Parasocial roles and the ‘Gwern’ archetype: mentor-wizard vs villain-guru

    They close on how anonymity shapes parasocial relationships: Gwern wants to be a mentor figure encouraging agency and better internet culture. He worries readers instead treat him as either infallible authority or a malicious ideological caricature, missing the intended call to independent thinking.

    • Aspirational role: mentor/‘wizard’ prompting readers toward agency and creation
    • Observed roles: guru-like ‘can do no wrong’ vs demonized ideological villain
    • Anonymity amplifies projection and extremes of interpretation
    • Goal: readers leave not just informed, but motivated to improve the web
  21. 1:35:23 – 1:36:43

    Open rabbit holes to 2050: big unanswered questions about humans, history, and AI

    In the final exchange, Gwern lists enduring mysteries he hopes are answered by mid-century, even if AGI makes them less practically important. The questions span biology (sleep, aging, sex), civilization (development speed, Industrial Revolution counterfactuals), and the deep learning revolution’s predictability and relation to brain efficiency.

    • Biology puzzles: sleep/dreaming, aging, sexual reproduction
    • Human variation: why people differ across individuals and day-to-day
    • Civilization questions: why tech took so long; where are the aliens?
    • History counterfactuals: why not China’s Industrial Revolution?
    • AI puzzles: how to predict deep learning; why brains seem oversized vs ANNs
