Lex Fridman Podcast

Raschka & Lambert on Lex Fridman: Why Post-Training Won 2025

RLVR and inference-time scaling, not architecture, drove 2025 AI gains. DeepSeek's open-weight releases showed that frontier performance need not be closed-source.

Lex Fridman (host) · Sebastian Raschka (guest) · Nathan Lambert (guest)

Jan 30, 2026 · 4h 25m · Watch on YouTube ↗

At a glance

WHAT IT’S REALLY ABOUT

AI in 2026: scaling, post-training, open models, agents, geopolitics, compute

  1. Lex Fridman hosts Sebastian Raschka and Nathan Lambert to map the “state of AI” entering 2026, using the “DeepSeek moment” as a turning point for open-weight Chinese models and intensified global competition.
  2. They argue architectures remain largely transformer-based, while major progress now comes from post-training (RLVR, RLHF, inference-time scaling), better data pipelines, and systems/compute optimizations rather than radical architectural change.
  3. The conversation contrasts US product dominance (ChatGPT/Gemini/Claude/Grok) with China’s surge in open-weight releases and friendlier licenses, discussing how this could reshape adoption, policy, and business models.
  4. They explore emerging directions (tool use, agents, long context, continual learning, diffusion text models), plus societal issues: work culture, hype bubbles, safety, education/learning, developer jobs, and the long-run trajectory toward (or away from) AGI narratives.

IDEAS WORTH REMEMBERING

5 ideas

No one “owns” unique ideas anymore; budget and compute are the moat.

Raschka argues researcher mobility makes technical ideas diffuse quickly; the differentiator becomes hardware, capital, and operational excellence rather than secret breakthroughs.

2026 progress is driven more by post-training and inference-time scaling than new architectures.

Both guests emphasize transformers are still the core; large capability jumps come from RLVR-style training, better post-training pipelines, and letting models “think” longer at inference.

Open-weight Chinese models may reshape global adoption through licenses and distribution, not just quality.

Lambert notes that security concerns keep US companies from paying for Chinese APIs, so open weights let Chinese labs gain mindshare and usage via US hosting, especially with fewer licensing "strings" than some Western releases.

Serving costs dominate training costs at scale—business incentives shape model design.

They highlight that training may cost millions of dollars while serving hundreds of millions of users can cost billions; this pushes routing, smaller models, speed/intelligence tradeoffs, and product-level monetization experiments.
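The "millions to train, billions to serve" claim can be made concrete with back-of-the-envelope arithmetic. All figures below are illustrative assumptions for the sake of the calculation, not numbers cited in the episode:

```python
# Back-of-the-envelope: one-time training cost vs. ongoing serving cost.
# Every figure here is a hypothetical assumption, not data from the episode.

training_cost = 50e6  # one-time training run, in dollars (assumed)

# Serving-side assumptions (all hypothetical):
users = 300e6                   # active users
queries_per_user_per_day = 10
tokens_per_query = 2_000
cost_per_million_tokens = 0.50  # dollars, assumed blended inference cost

daily_tokens = users * queries_per_user_per_day * tokens_per_query
annual_serving_cost = daily_tokens / 1e6 * cost_per_million_tokens * 365

print(f"Training (one-time): ${training_cost / 1e6:,.0f}M")
print(f"Serving (per year):  ${annual_serving_cost / 1e9:,.1f}B")
```

Under these assumptions, serving costs exceed the training run within weeks, which is why routing queries to smaller, cheaper models becomes an economic necessity rather than a research curiosity.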

RLVR is the 2025–2026 post-training workhorse, but evaluation contamination remains a major scientific problem.

RLVR works best on verifiable domains (math/code) and scales well, yet both warn benchmark leakage (e.g., Qwen math contamination) can mislead conclusions and distort research claims.
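The distinguishing feature of RLVR is that the reward is programmatically checkable rather than produced by a learned preference model. A minimal sketch of such a reward function (the function name and the `Answer: <value>` extraction convention are assumptions for illustration, not details from the episode):

```python
import re

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """RLVR-style binary reward: 1.0 if the model's final answer verifies
    against ground truth, else 0.0. Assumes the model was prompted to emit
    its result as 'Answer: <value>' (a hypothetical convention)."""
    match = re.search(r"Answer:\s*(\S+)", completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(1) == ground_truth else 0.0

# Unlike RLHF's learned reward model, this signal cannot be flattered by
# plausible-sounding prose: the answer either checks out or it does not.
print(verifiable_reward("Let x = 7, so 6x = 42. Answer: 42", "42"))  # 1.0
print(verifiable_reward("It is probably somewhere around 42.", "42"))  # 0.0
```

This also makes the contamination worry concrete: if benchmark answers leak into training data, a model can score 1.0 by recall rather than reasoning, and the reward cannot tell the difference.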

WORDS WORTH SAVING

5 quotes

I don't think nowadays, 2026, that there will be any company who is... having access to a technology that no other company has access to.

Sebastian Raschka

Extended thinking and inference time scaling is just a way to make the models marginally smarter, and I will always edge on that side.

Nathan Lambert

One of the best ways to solve hallucinations is to not try to always remember information or make things up... why not use a calculator app or Python?

Sebastian Raschka

Our GPUs are hurting... we're releasing this because we can use your GPUs.

Nathan Lambert (paraphrasing Sam Altman’s rationale for open model distribution)

I'm hoping that we, society drowns in slop enough to snap out of it.

Nathan Lambert

TOPICS COVERED

  - China vs US AI race and the "DeepSeek moment"
  - Model "winners": ChatGPT vs Gemini vs Claude vs Grok
  - Best AI for coding: IDEs vs terminal agents
  - Open-weight explosion: China, US/EU labs, licensing dynamics
  - Transformer lineage: MoE, attention tweaks, KV cache efficiency
  - Scaling laws: pre-training vs RL scaling vs inference-time scaling
  - Training stack: pre-, mid-, post-training; synthetic data; data quality
  - Post-training: RLVR, RLHF, DPO; value functions/process rewards
  - Tool use and agents; UI/UX as a differentiator
  - Long context, memory, continual learning
  - Text diffusion models as non-autoregressive alternatives
  - Economics: serving costs, subscriptions, ads, acquisitions, IPOs
  - Work culture (996), Silicon Valley bubble, and human impacts
  - AGI timelines, definitions, and skepticism about singularity
  - Compute geopolitics: NVIDIA, TPUs, data centers, power constraints
  - Open models policy (the "ATOM Project") and national strategy

High quality AI-generated summary created from speaker-labeled transcript.
