Y Combinator
Self-Play for LLMs, AI for Biology, Formal Verification, and More | YC Paper Club
At a glance
WHAT IT’S REALLY ABOUT
YC Paper Club explores AI for biology, LLM self-play, streaming RAG, formal verification, and agentic coding workflows
- Protein language modeling is shown to follow scaling-law-like behavior when training data is massively expanded (e.g., metagenomics), enabling strong structure/function signals from sequence-only pretraining and even interpretable biological features.
- LLM self-play for formal math can plateau because “make it hard” rewards incentivize adversarially messy tasks; adding self-guidance and a learned/judged relatedness signal yields better gains than vanilla self-play or standard RL alone.
- Streaming RAG targets voice-agent latency by running retrieval during a user’s ongoing utterance, raising open research questions around when to trigger retrieval and how to decide partial-query sufficiency without wasting compute.
- Lean is presented as a core substrate for “verified intelligence,” spanning formal math, program verification, and even verified ML/compute kernels, with growing ecosystems (Mathlib) and practical tooling for correctness guarantees.
- Agentic coding workflows are reframed as real-time strategy: maximize parallelism via worktrees/agents, minimize human keystrokes/approval overhead, and build strong internal knowledge bases to compound speed gains despite higher token use.
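The streaming RAG point above can be made concrete with a small sketch. It assumes the speech recognizer emits growing partial transcripts and uses a deliberately naive trigger rule (re-query after a few new words). The names `streaming_retrieve`, `keyword_retrieve`, `min_new_words`, and the toy `DOCS` corpus are hypothetical and not from the talk; when to trigger retrieval is exactly the open question the talk raises.

```python
from typing import Callable, List

# Toy corpus, invented purely for illustration.
DOCS = [
    "refund policy refunds within 30 days",
    "shipping times 3 to 5 business days",
]


def keyword_retrieve(query: str) -> List[str]:
    """Stand-in retriever: return documents sharing at least one word with the query."""
    terms = set(query.lower().split())
    return [doc for doc in DOCS if terms & set(doc.split())]


def streaming_retrieve(
    partial_transcripts: List[str],
    retrieve: Callable[[str], List[str]],
    min_new_words: int = 3,
) -> List[str]:
    """Run retrieval on partial transcripts while the user is still speaking.

    Trigger rule (an assumption, not the talk's method): re-query only after
    at least `min_new_words` new words have arrived since the last query.
    """
    results: List[str] = []
    words_at_last_query = 0
    for partial in partial_transcripts:
        n_words = len(partial.split())
        if n_words - words_at_last_query >= min_new_words:
            results = retrieve(partial)  # overlaps with the ongoing utterance
            words_at_last_query = n_words
    return results


partials = [
    "what",
    "what is your",
    "what is your refund",
    "what is your refund policy please",
]
latest = streaming_retrieve(partials, keyword_retrieve)
```

A larger `min_new_words` wastes less compute on queries that are still too short to be useful, but leaves less time to hide retrieval latency behind the user's speech.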
IDEAS WORTH REMEMBERING
5 ideas
Protein models can exhibit clean scaling behavior, if data scale is sufficient.
The talk highlights that earlier protein LMs plateaued, but ESM Cambrian/ESMC regained smooth improvements by expanding training data dramatically (e.g., from tens of millions to billions of sequences), suggesting “data walls” are often domain-specific rather than fundamental.
Sequence-only pretraining can nearly match structure-heavy pipelines in key settings.
ESMFold-style approaches that discard MSAs can approach AlphaFold-level performance on some complex prediction tasks, and can be especially competitive where MSAs are sparse or unhelpful (e.g., certain antibody contexts), while also improving throughput by avoiding alignment costs.
Inference-time compute is becoming a lever in biology too.
Looped/refinement architectures allow repeated passes at inference to improve structure predictions, paralleling test-time compute/sampling ideas in LLMs and reinforcing the general “scale compute” theme beyond pure training.
Interpretability tools from LLMs can transfer to protein models with meaningful biological features.
Sparse-coding/SAE analyses reportedly yield monosemantic-like directions spanning amino acids → motifs → domains → functions, enabling an “atlas” view of protein space and suggesting shared representation phenomena across modalities.
Vanilla self-play reward design can create ‘adversarial curricula’ that don’t teach useful skills.
If the conjecturer is rewarded for producing tasks that are merely hard, it can generate pathological, overly complex formal statements with little training value; the solver's learning then stagnates much as it does under standard RL, despite the large volume of synthetic tasks.
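A toy comparison of the two reward designs makes the failure mode easier to see. This is a sketch under stated assumptions, not the paper's actual reward: the `TaskStats` fields, the solve-rate band (`low`, `high`), and the multiplicative use of a relatedness score are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaskStats:
    """Hypothetical statistics for one conjectured task in a self-play round."""
    solve_rate: float   # fraction of solver attempts that yielded a verified proof
    relatedness: float  # judged relevance to the target problems, in [0, 1]


def hardness_only_reward(stats: TaskStats) -> float:
    """Vanilla scheme: the harder the task is for the solver, the higher the reward."""
    return 1.0 - stats.solve_rate


def guided_reward(stats: TaskStats, low: float = 0.1, high: float = 0.9) -> float:
    """Assumed variant: reward hard-but-solvable tasks, scaled by relatedness."""
    if not (low <= stats.solve_rate <= high):
        return 0.0  # never-solved or trivial tasks give the solver little to learn
    return (1.0 - stats.solve_rate) * stats.relatedness


# A convoluted statement nobody can prove vs. a relevant, moderately hard one.
pathological = TaskStats(solve_rate=0.0, relatedness=0.05)
useful = TaskStats(solve_rate=0.4, relatedness=0.9)

vanilla_prefers_pathological = (
    hardness_only_reward(pathological) > hardness_only_reward(useful)
)
guided_prefers_useful = guided_reward(useful) > guided_reward(pathological)
```

Under the hardness-only reward the pathological task scores highest, which is the incentive problem described above; the guided variant ranks the relevant, solvable task first.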
WORDS WORTH SAVING
5 quotes
If the full solution space F is F, training on known human solutions will limit you to some typical set H despite any feasible amount of test time compute or recursive self, um, improvement.
— Francois Chaubard
The actual paper title is right below that. But I mean, just a quick refresher, I'm sure everyone in this specific audience probably read Richard Sutton's famous article.
— Yasa Baig
So the promise for LLMs is I can take some-- I can train on a bunch of human data, I get to, like, human level, and then I can run loads of self-play and go far beyond that and hopefully solve really interesting problems with, with our models. But unfortunately, this is not how it works.
— Luke Bailey
Especially in voice, we care about this even more because from a human perspective, it's difficult to kind of actively catch hallucinations when you're listening to it compared to like when you're reading it over text.
— Arnab Maiti
You cannot fool this theorem prover.
— Robert George
High-quality AI-generated summary created from a speaker-labeled transcript.