Andrej Karpathy: Software Is Changing (Again)

Andrej Karpathy's keynote on June 17, 2025 at AI Startup School in San Francisco. Slides provided by Andrej: https://drive.google.com/file/d/1a0h1mkwfmV2PlekxDN8isMrDA5evc4wW/view?usp=sharing Chapters: 00:00 - Intro 01:25 - Software evolution: From 1.0 to 3.0 04:40 - Programming in English: Rise of Software 3.0 06:10 - LLMs as utilities, fabs, and operating systems 11:04 - The new LLM OS and historical computing analogies 14:39 - Psychology of LLMs: People spirits and cognitive quirks 18:22 - Designing LLM apps with partial autonomy 23:40 - The importance of human-AI collaboration loops 26:00 - Lessons from Tesla Autopilot & autonomy sliders 27:52 - The Iron Man analogy: Augmentation vs. agents 29:06 - Vibe Coding: Everyone is now a programmer 33:39 - Building for agents: Future-ready digital infrastructure 38:14 - Summary: We’re in the 1960s of LLMs — time to build Drawing on his work at Stanford, OpenAI, and Tesla, Andrej sees a shift underway. Software is changing, again. We’ve entered the era of “Software 3.0,” where natural language becomes the new programming interface and models do the rest. He explores what this shift means for developers, users, and the design of software itself— that we're not just using new tools, but building a new kind of computer. More content from Andrej: https://www.youtube.com/@AndrejKarpathy Thoughts (From Andrej Karpathy!) 0:49 - Imo fair to say that software is changing quite fundamentally again. LLMs are a new kind of computer, and you program them *in English*. Hence I think they are well deserving of a major version upgrade in terms of software. 6:06 - LLMs have properties of utilities, of fabs, and of operating systems → New LLM OS, fabbed by labs, and distributed like utilities (for now). Many historical analogies apply - imo we are computing circa ~1960s. 14:39 - LLM psychology: LLMs = "people spirits", stochastic simulations of people, where the simulator is an autoregressive Transformer. Since they are trained on human data, they have a kind of emergent psychology, and are simultaneously superhuman in some ways, but also fallible in many others. Given this, how do we productively work with them hand in hand? Switching gears to opportunities... 18:16 - LLMs are "people spirits" → can build partially autonomous products. 29:05 - LLMs are programmed in English → make software highly accessible! (yes, vibe coding) 33:36 - LLMs are new primary consumer/manipulator of digital information (adding to GUIs/humans and APIs/programs) → Build for agents! Some of the links: - Software 2.0 blog post from 2017 https://karpathy.medium.com/software-2-0-a64152b37c35 - How LLMs flip the script on technology diffusion https://karpathy.bearblog.dev/power-to-the-people/ - Vibe coding MenuGen (retrospective) https://karpathy.bearblog.dev/vibe-coding-menugen/ Apply to Y Combinator: https://ycombinator.com/apply Work at a startup: https://workatastartup.com

Jun 19, 202539mWatch on YouTube ↗

CHAPTERS

0:01 – 1:31
Why software is changing (again): entering industry at a unique moment
Karpathy frames the talk around a rare, fundamental shift in how software is built, arguing that the industry has seen multiple paradigm changes in just a few years. He emphasizes the opportunity for students and new engineers: huge amounts of software will be written and rewritten as these paradigms collide.
- •Software is undergoing another foundational shift after decades of relative stability
- •Rapid recent changes create an unusually large “rewrite” and “new build” surface area
- •New entrants should expect their careers to be shaped by these transitions
1:31 – 3:02
Software 1.0 vs 2.0: code vs neural-network weights
He defines Software 1.0 as explicit code and Software 2.0 as neural network weights produced by training. He uses the “map of GitHub” and the rise of model hubs to illustrate how Software 2.0 resembles a parallel ecosystem to traditional code repositories.
- •Software 1.0 = human-written instructions (code)
- •Software 2.0 = trained weights; developers shape behavior via data/optimization
- •Hugging Face and model ecosystems resemble a “GitHub for weights”
- •Model fine-tuning (e.g., LoRAs) is analogous to commits in weight space
3:02 – 4:34
Software 3.0: LLMs are programmable—and the language is English
Karpathy argues that LLMs represent a new kind of programmable computer, where prompts act as programs. This deserves a major version bump to “Software 3.0,” because the same task can now be expressed as code, training, or prompting—with prompting accessible in natural language.
- •LLMs make neural nets broadly programmable, not just fixed-function models
- •Prompts are programs; English becomes a practical programming language
- •Same task (e.g., sentiment classification) can be done via 1.0, 2.0, or 3.0
- •Modern repos increasingly mix code with English instructions and context
4:34 – 6:06
How new paradigms “eat the stack”: Tesla Autopilot as a case study
Using Autopilot, he describes how functionality initially implemented in C++ migrated into neural networks as capabilities improved. He predicts a similar “stack consumption” dynamic as LLM-centric software expands, and advises engineers to be fluent across 1.0/2.0/3.0 and move between them deliberately.
- •At Tesla, neural networks replaced increasing portions of the traditional stack
- •Capabilities like multi-camera/time stitching moved from code into nets
- •Software 2.0 ‘ate’ Software 1.0 in practical production systems
- •Engineers must choose the right paradigm per component and transition fluidly
6:06 – 10:09
LLMs as utilities, fabs, and operating systems: a framework for the ecosystem
Karpathy offers three analogies to understand LLM providers and usage: utility (metered, uptime/latency expectations), fab (high CapEx, deep R&D tech tree), and most strongly, operating system (a platform ecosystem). He highlights switching layers, reliability expectations, and how outages feel like “intelligence brownouts.”
- •Utility analogy: pay-per-token, uptime/latency, switching across providers
- •Outages create an “intelligence brownout” as workflows depend on models
- •Fab analogy: huge CapEx and centralized know-how, with imperfect defensibility
- •OS analogy: a platform ecosystem with closed-source and open-source counterparts
10:09 – 12:42
The ‘LLM OS’ and historical computing analogies (time-sharing, 1960s era)
He sketches the LLM as a computer with a context window as memory and the model as an orchestrator. Because inference is expensive, LLMs are centralized and accessed as thin clients via time-sharing—similar to early computing before the personal computing revolution.
- •LLM as CPU-like core; context window as memory; orchestration resembles an OS
- •Apps can run on different underlying models like software on different OSes
- •Centralized cloud inference mirrors time-sharing mainframe computing
- •We may be in a ‘1960s’ stage before a true personal-computing shift for LLMs
12:42 – 14:43
What’s unprecedented: technology diffusion flipped to consumers first
Unlike many foundational technologies that start with governments/corporations and later reach consumers, LLMs arrived directly to billions via software distribution. He argues this inversion shapes what early killer apps look like and why adoption is racing ahead of many institutions.
- •Historical diffusion: expensive tech usually starts in government/enterprise
- •LLMs diffused to consumers immediately (e.g., ChatGPT ‘beamed’ to billions)
- •Early usage patterns are everyday and consumer-driven, not military-first
- •This changes incentives and the near-term application landscape
14:43 – 16:46
LLM psychology: ‘people spirits’ with superpowers and cognitive deficits
Karpathy proposes treating LLMs as stochastic simulations of people trained on human text, yielding emergent psychology. They can be superhuman (memory/knowledge) yet unreliable (hallucinations, jagged intelligence), and their limitations must shape product design and collaboration patterns.
- •LLMs as stochastic ‘people spirit’ simulators built on autoregressive Transformers
- •Superpower: encyclopedic recall and broad textual knowledge
- •Deficits: hallucinations, poor self-knowledge, jagged intelligence and odd errors
- •Security issues: gullibility, prompt injection risks, potential data leakage
16:46 – 18:16
The missing capability: lasting learning and the ‘amnesia’ problem
He contrasts human coworkers—who accumulate organizational context over time—with LLMs that do not naturally consolidate experience between sessions. Context windows function like working memory, so developers must explicitly manage context and artifacts to compensate for the lack of durable learning.
- •Humans learn over time; LLMs don’t natively consolidate new knowledge
- •Context windows are working memory that must be actively managed
- •Popular-culture analogies: Memento / 50 First Dates (weights fixed; memory reset)
- •Product designs need mechanisms to preserve, audit, and re-inject context
18:16 – 20:17
Partial-autonomy apps: why ‘LLM + product’ beats chatting with the OS
He argues the best experiences won’t be raw chat with an LLM, but purpose-built applications that manage context, orchestrate multiple models, and present results in an auditable interface. Cursor (coding) and Perplexity (search/research) illustrate common design patterns for partially autonomous products.
- •Dedicated apps beat direct chat by managing context and workflow integration
- •Orchestration: embeddings, chat models, diff application models working together
- •GUI is essential for auditing and controlling fallible outputs
- •Examples: Cursor and Perplexity as early templates for LLM-native products
20:17 – 21:48
The autonomy slider: calibrating control from assistive to agentic
Karpathy introduces an ‘autonomy slider’ that lets users choose how much authority the model has per task. He stresses that autonomy should be adjustable—from small suggestions to repo-wide actions—because verification costs and risk vary with scope.
- •Autonomy should be user-controlled and task-dependent
- •Cursor modes: completion → edit selection → edit file → agent across repo
- •Perplexity modes: quick search → research → deep research
- •Design goal: enable larger jumps while keeping oversight practical
21:48 – 25:51
Human-AI collaboration loops: generation vs verification, and keeping AI on a leash
He frames interaction as AI generating and humans verifying, making verification speed the key bottleneck. GUIs and constrained workflows help humans audit quickly, while limiting agent overreach prevents massive diffs that humans can’t safely review.
- •Core loop: AI generates; humans verify—verification is the bottleneck
- •GUIs leverage human visual processing (diffs, previews, accept/reject controls)
- •Overly large diffs slow verification and increase risk
- •Best practice: incremental steps and more precise prompting to avoid ‘spin’
25:51 – 27:52
Lessons from autonomy in the real world: Waymo/Tesla and the ‘decade of agents’
Drawing from self-driving, he warns against hype cycles like ‘year of agents,’ noting that perfect demos can precede many years of hardening and human-in-the-loop reality. He advocates seriousness about reliability, supervision, and long timelines for true autonomy.
- •2013 self-driving demo felt solved—yet autonomy still took many years
- •Even today, deployment often includes teleoperation and human oversight
- •Agent progress is likely a long arc: ‘decade of agents,’ not a single year
- •Software reliability and safety demand careful, supervised rollout
27:52 – 28:52
Iron Man suits vs robots: augmentation first, agents over time
He uses Iron Man to describe a continuum between augmentation and full autonomy, arguing today’s sweet spot is ‘suits’—tools that amplify humans with strong UI/UX—rather than flashy fully autonomous ‘robots.’ The autonomy slider should still exist so products can gradually move rightward as models improve.
- •Augmentation and agency are endpoints on the same product continuum
- •Today’s practical focus: ‘Iron Man suits’ (human-in-control, fast verify loops)
- •Still design for future autonomy with an explicit slider
- •UI/UX becomes a core competency for safe acceleration
28:52 – 33:31
Vibe coding and democratized programming: everyone can build (but shipping is hard)
Because Software 3.0 is programmed in natural language, Karpathy argues many more people can create software. He shares his own vibe-coding projects and highlights the real friction: turning a demo into a deployed product often involves non-code DevOps and web-console workflows that are painful and manual.
- •Natural-language programming expands who can create software
- •Vibe coding as a ‘gateway drug’ to deeper software development
- •Personal examples: simple iOS app; MenuGen (menu photo → generated images)
- •Biggest pain: auth, payments, deployment, domains—manual browser workflows
33:31 – 38:05
Building for agents: LLM-friendly docs, protocols, and infrastructure
He argues LLMs are becoming a primary consumer/manipulator of digital information alongside humans (GUIs) and programs (APIs). To support agents, companies should provide machine-usable documentation and interfaces (Markdown docs, fewer ‘click’ instructions, cURL equivalents, MCP), and tools that repackage existing assets for LLM ingestion.
- •Agents are a new third interface layer: beyond GUIs and APIs
- •Provide ‘llms.txt’/agent-readable domain summaries (analogy to robots.txt)
- •Shift docs to Markdown; rewrite ‘click’ steps into executable commands
- •Examples/tools: Vercel/Stripe LLM docs, Anthropic MCP, GitIngest, DeepWiki
38:05 – 39:31
Closing synthesis: we’re in the 1960s of LLMs—time to rebuild and ship
Karpathy concludes that LLMs resemble early operating systems: powerful but immature, with massive opportunity for builders. The path forward is to rewrite software for 1.0/2.0/3.0 coexistence, build partial-autonomy products with strong human-in-loop design, and progressively move the autonomy slider rightward over the next decade.
- •Massive rewrite cycle: new software paradigms will reshape the stack
- •LLMs are OS-like platforms delivered as utilities today
- •Product success hinges on human+AI loops, auditing UX, and constrained autonomy
- •Over time, autonomy will increase—builders should start now

Get more out of YouTube videos.

High quality summaries for YouTube videos. Accurate transcripts to search & find moments. Powered by ChatGPT & Claude AI.

iOS

Android

Claude

Chrome

Why software is changing (again): entering industry at a unique moment

Software 1.0 vs 2.0: code vs neural-network weights

Software 3.0: LLMs are programmable—and the language is English

How new paradigms “eat the stack”: Tesla Autopilot as a case study

LLMs as utilities, fabs, and operating systems: a framework for the ecosystem

The ‘LLM OS’ and historical computing analogies (time-sharing, 1960s era)

What’s unprecedented: technology diffusion flipped to consumers first

LLM psychology: ‘people spirits’ with superpowers and cognitive deficits

The missing capability: lasting learning and the ‘amnesia’ problem

Partial-autonomy apps: why ‘LLM + product’ beats chatting with the OS

The autonomy slider: calibrating control from assistive to agentic

Human-AI collaboration loops: generation vs verification, and keeping AI on a leash

Lessons from autonomy in the real world: Waymo/Tesla and the ‘decade of agents’

Iron Man suits vs robots: augmentation first, agents over time

Vibe coding and democratized programming: everyone can build (but shipping is hard)

Building for agents: LLM-friendly docs, protocols, and infrastructure

Closing synthesis: we’re in the 1960s of LLMs—time to rebuild and ship

Get more out of YouTube videos.