Lex Fridman Podcast

Raschka & Lambert on Lex Fridman: Why Post-Training Won 2025

RLVR and inference-time scaling, not architecture, drove 2025 AI gains. DeepSeek's open-weight releases showed frontier performance need not be closed-source.

Lex Fridman (host) · Sebastian Raschka (guest) · Nathan Lambert (guest)
Jan 30, 2026 · 4h 25m · Watch on YouTube ↗

FREQUENTLY ASKED QUESTIONS

Direct answers grounded in the episode transcript. Tap any timestamp to verify against the source.

  1. Why are Chinese open-weight AI models so popular?

Chinese open-weight models are popular because they combine practical customization with fewer licensing strings. Sebastian Raschka says local users can run them much like ChatGPT, while companies can customize them, train them further, add post-training, or specialize them with more data for fields like law or medicine. The licensing matters too. He contrasts the Chinese releases with Llama or Gemma, where he says there can be strings attached, such as user-count limits or reporting obligations once usage gets large. For Chinese open-weight models, he says the licenses feel more like unrestricted open-source licenses, so people can use them without the same catches. Nathan Lambert adds nearby that US-hosted versions, like Kimi K2 Thinking on Perplexity, are an example of people being sensitive to where and how these models are served.

    35:43 in transcript
  2. How is Claude Code different from Cursor for programming?

    Claude Code pushes the programmer toward high-level English guidance, while Cursor keeps them closer to the code. Lex Fridman says he uses Cursor and Claude Code about half-and-half because the experiences are fundamentally different. In Cursor, he is more likely to micromanage generation, inspect the diff, alter code, read it, and understand the code deeply as it changes. With Claude Code, he is practicing 'programming with English,' thinking in a broader design space and guiding the system at a macro level. He also says Claude Code seems to make better use of Claude Opus 4.5. Nathan Lambert recommends trying Claude Code, Cursor, and VS Code side by side with the same models, and says Claude Code is much better in that programming domain.

    22:29 in transcript
  3. What is RLVR in AI post-training?

    RLVR is reinforcement learning where the reward comes from checking verifiable answers, not from broad human preference scores. Nathan Lambert explains that DeepSeek's breakthrough was scaling a setup where a model generates answers, the completion is graded for correctness, and that accuracy becomes the reward for reinforcement learning. Classic reinforcement learning has an agent act in an environment and receive a state and reward; in language models, the reward is usually accuracy on tasks that can be verified. Math and coding are the famous examples, but Nathan also mentions factual domains and instruction constraints, like responding only with words that start with A, as partly verifiable. He contrasts this with RLHF, where the optimized score is a learned reward model built from aggregate human preferences. Changing the problem domain lets the optimization scale much further.

    1:38:44 in transcript
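The RLVR loop described above, sample completions, grade them with a programmatic verifier, and feed the accuracy back as reward, can be sketched in a few lines. This is a toy illustration on an arithmetic task, not DeepSeek's actual training code: the "policy" here is a stub sampler rather than a real language model, and the policy-gradient update itself is omitted.

```python
import random

def verify(prompt: str, completion: str) -> bool:
    """Programmatic verifier: for 'a + b = ?' prompts, correctness
    is just arithmetic, not a learned preference model."""
    a, b = [int(x) for x in prompt.replace("+", " ").replace("= ?", "").split()]
    return completion.strip() == str(a + b)

def rlvr_step(prompt: str, sample_completion, num_samples: int = 8):
    """Sample several completions, score each with the verifier
    (reward 1.0 or 0.0), and return the (completion, reward) pairs
    a policy-gradient update would then consume."""
    scored = []
    for _ in range(num_samples):
        completion = sample_completion(prompt)
        reward = 1.0 if verify(prompt, completion) else 0.0
        scored.append((completion, reward))
    return scored

# Toy stand-in for a model: usually right, sometimes off by one.
def toy_sampler(prompt: str) -> str:
    a, b = [int(x) for x in prompt.replace("+", " ").replace("= ?", "").split()]
    return str(a + b + random.choice([0, 0, 0, 1]))

results = rlvr_step("23 + 5 = ?", toy_sampler)
```

The contrast with RLHF is visible in `verify`: the reward is a hard check against ground truth rather than a learned scalar from aggregated human preferences, which is what lets the optimization scale to many more samples without reward-model drift.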
  4. How do tools reduce hallucinations in LLMs?

    Tools reduce hallucinations by moving some work out of memorization, but they do not make LLMs automatically reliable. Sebastian Raschka says tool use can reduce hallucinations, not solve them, because the model still has to know when to call a tool. A calculator is the simple example: instead of memorizing or guessing 'twenty-three plus five,' the model can outsource the computation. Web search is more complicated. Even if the model searches the Internet for who won the World Cup in 1998, it still has to find the right website and extract the right information. In the following explanation, Sebastian points to recursive language-model setups where a long task is broken into subtasks, each potentially using web search or other tools, then stitched back together. The improvement comes from how the LLM is used and what it can use.

    2:35:01 in transcript
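Raschka's calculator example can be made concrete with a small sketch. This is an illustrative routing pattern, not any production system: a question is scanned for arithmetic, and if some is found, the computation is outsourced to an exact tool instead of being "recalled" from memory. The hard part he highlights, knowing when to call the tool, is reduced here to a regex match.

```python
import re
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def calculator(expr: str) -> str:
    """A verifiable tool: exact arithmetic instead of a memorized guess."""
    m = re.fullmatch(r"\s*(-?\d+)\s*([+\-*])\s*(-?\d+)\s*", expr)
    if not m:
        raise ValueError(f"unsupported expression: {expr!r}")
    a, op, b = m.groups()
    return str(OPS[op](int(a), int(b)))

def answer(question: str) -> str:
    """Route arithmetic to the tool; deciding that a tool call is
    appropriate is the part the model itself must still get right."""
    m = re.search(r"(-?\d+\s*[+\-*]\s*-?\d+)", question)
    if m:  # arithmetic detected -> outsource rather than guess
        return calculator(m.group(1))
    return "[model answers from parametric memory]"
```

Web search fits the same shape but with a fuzzier verifier: the tool returns candidate pages, and the model still has to select and extract the right fact, which is why tool use reduces rather than eliminates hallucination.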
  5. When will AI agents start replacing programmers?

    AI agents are framed less as an instant programmer replacement and more as a near-term role shift. Nathan Lambert says that within 'low years,' many programmers may be pushed toward designer or product-manager roles, supervising multiple agents that try fixes or features. In his example, agents might take one to two days to implement a feature or attempt to fix a bug, then report through dashboards, with Slack serving as a plausible dashboard where the agents talk to the human and receive feedback. He also says AI could soon implement end-to-end features in systems like Slack or Microsoft Word when organizations allow it. The caveat is that cohesive design, style, and deciding what to add next remain hard. Nearby, he notes that Claude had a test where it could almost rebuild Slack from scratch in a sandbox.

    3:09:15 in transcript

Answers are AI-generated from the transcript and may contain errors. Tap a question to verify against the source.
