No Priors

No Priors Ep. 29 | With Inceptive CEO Jakob Uszkoreit

"Biological Software" is the future of medicine. Jakob Uszkoreit, CEO and Co-founder of Inceptive, joins Sarah Guo and Elad Gil this week on No Priors, to discuss how deep learning is expanding the horizons of RNA and mRNA therapeutics. Jakob co-authored the revolutionary paper Attention is All You Need while at Google, and led early Google Translate and Google Assistant teams. Now at Inceptive, he's applying these same architectures and ideas to biological design, optimizing vaccine production, and magnitude-more efficient drug discovery. We also discuss Jakob's perspective on promising research directions, and his point of view that model architectures will actually get simpler from here, and be driven by hardware. 00:00 - Creating Biological Software 06:54 - The Hardware Drivers of Large-Scale Transformers 14:32 - Challenges in Optimizing Compute Allocation 23:25 - Deep Learning in Biology and RNA 32:49 - The Future of Drug Discovery 41:41 - Collaboration and Innovation at Inceptive

Elad Gil (host) · Jakob Uszkoreit (guest) · Sarah Guo (host)
Aug 24, 2023 · 35m

CHAPTERS

  1. 0:05 – 0:41

    Creating biological software: from Transformers to “compiling” RNA

    Elad introduces Jakob Uszkoreit’s background (Google, Transformers) and frames the episode’s central idea: treating biology as programmable software. The premise is that deep learning could help design RNA as a kind of executable substrate for medicine.

    • Jakob’s role in foundational Transformer-era research at Google
    • Core question: what if we could compile RNA like software?
    • Motivation to make medicines/biotech widely accessible
    • Positioning Inceptive at the intersection of AI and biology
  2. 0:41 – 4:16

    Why attention fit the hardware—and why that mattered for the Transformer breakthrough

    Jakob explains that the success of Transformers wasn’t just a theoretical idea; it depended on engineering for efficiency on parallel accelerators. He connects hierarchical structure in language to tree-like composition, motivating attention as a scalable parallel operation.

    • Deep learning progress often comes from faster/more efficient implementations
    • Language has hierarchical/statistical structure (tree-like) that models can exploit
    • Attention approximates evaluating combinations (quadratic interaction) repeatedly
    • Transformers won partly because they mapped cleanly onto accelerator parallelism
    • Engineering execution (implementation details) was as critical as the concept
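
The "quadratic interaction" mentioned above can be made concrete with a minimal sketch of scaled dot-product self-attention in plain Python. This is an illustration only, not code from the episode or from any production Transformer: the function names and the toy `tokens` input are assumptions chosen for the example. Every position scores itself against every other position, so `n` tokens produce `n * n` weights.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    outputs, all_weights = [], []
    for q in queries:
        # One score per (query, key) pair: this is the quadratic part.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy input (hypothetical): three 2-dimensional token vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
num_pairs = sum(len(row) for row in weights)  # n * n pairwise weights
```

Each query row in the loop is independent of the others, which is why the same computation can be written as a few large matrix multiplications and run in parallel on an accelerator.
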
  3. 4:16 – 6:01

    Are Transformers ‘locked in’ by GPUs? Co-designing architectures and accelerators

    Elad raises the concern that other architectures might outperform Transformers at scale but aren’t explored because hardware and funding are optimized for Transformers. Jakob argues GPUs aren’t necessarily optimal and that better hardware–model pairings may exist if pursued intentionally.

    • Emergent behaviors appear mainly at scale; architecture comparisons are compute-limited
    • Accelerator fit can determine what research gets tried
    • GPUs weren’t originally built for deep learning; may be suboptimal for current workloads
    • Key trade-offs: memory bandwidth, parallelism vs latency
    • Potential upside in exploring new hardware-model co-design spaces
  4. 6:01 – 6:51

    Chicken-and-egg progress: how hardware adapts to models (and vice versa)

    The discussion turns to whether future accelerators will be designed around current Transformer patterns or enable new model designs. Jakob notes newer accelerator designs increasingly account for Transformer-like workloads and highlights that simplified architectures can be surprisingly competitive.

    • Hardware/software co-evolution is inherently chicken-and-egg
    • Example: MLP-Mixer as a simplified vision model with competitive performance
    • Model simplification can open up alternative implementation paths
    • Efficiency isn’t the only driver; momentum and ecosystem matter too
  5. 6:51 – 7:57

    The hidden force behind adoption: optimism, momentum, and human cycles

    Jakob argues that community belief and energy (“optimism and hope”) helped Transformers win because they unlocked massive experimentation. Once an approach is perceived to ‘work,’ more people invest effort, which compounds progress even if early results required heavy tuning.

    • Research progress depends on human iteration cycles as much as compute
    • Community optimism fuels experimentation across tasks and domains
    • Perception of reliability changes the prior: people try more things and push harder
    • Many approaches can work—but only after sustained engineering effort
  6. 7:57 – 9:15

    Compute allocation is crude: why model effort should depend on problem difficulty

    Jakob identifies a core inefficiency: inference compute scales with prompt/response length rather than true task difficulty. He argues models need mechanisms to dynamically allocate more compute to hard problems and less to easy ones, instead of wasting resources based on formatting.

    • Today’s inference cost is driven by token length, not difficulty
    • Hard problems can be short (e.g., succinctly stated tasks) yet deserve more compute
    • Current systems can waste compute on easy tasks via long prompts/verbose answers
    • Need for ‘anytime’ or adaptive-compute behaviors during runtime
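
A back-of-the-envelope sketch of the point above, assuming the common rule of thumb that a dense decoder's forward pass costs roughly 2 FLOPs per parameter per token (this ignores the attention term that grows with context length). The model size and token counts are hypothetical numbers picked for illustration, not figures from the episode.

```python
def inference_flops(num_params, prompt_tokens, response_tokens):
    # Rule-of-thumb estimate (an assumption): ~2 FLOPs per parameter per token.
    # Nothing here depends on how hard the task is.
    return 2 * num_params * (prompt_tokens + response_tokens)

NUM_PARAMS = 7_000_000_000  # hypothetical 7B-parameter model

# A hard problem stated in one short sentence, answered briefly.
hard_cost = inference_flops(NUM_PARAMS, prompt_tokens=20, response_tokens=50)

# An easy task wrapped in a long prompt with a verbose answer.
easy_cost = inference_flops(NUM_PARAMS, prompt_tokens=2000, response_tokens=500)

ratio = easy_cost / hard_cost
```

Under this estimate the easy, verbose request receives far more compute than the hard, succinct one, purely because of length.
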
  7. 9:15 – 11:33

    Generated data, amortized compute, and why test-time search feels ‘clunky’

    The conversation connects adaptive compute to debates about training on generated data. Jakob reframes it as amortizing expensive compute over time, but calls it inelegant; ideally the system would allocate compute directly at inference rather than through repeated retraining or bolt-on search.

    • Information theory arguments often ignore compute/energy costs
    • Generated data can be viewed as amortizing prior compute across iterations
    • Iterative retraining to “spend more compute” over time is inefficient
    • Anytime algorithms should allocate effort where needed at runtime
  8. 11:33 – 13:09

    Elasticity beyond text: adapting compute to video resolution, length, and sampling rate

    Jakob generalizes the adaptive-compute problem to multimodal inputs, especially vision/video. He argues models should not automatically spend more compute just because input resolution or frame rate is artificially increased without adding new information.

    • Models struggle to gracefully handle variable resolutions, durations, and densities
    • Artificial upscaling/interpolation shouldn’t force proportionally higher compute
    • Current approaches can be wasteful for equivalent underlying tasks
    • Elasticity is a foundational efficiency frontier for future systems
  9. 13:09 – 15:12

    Depth-adaptive Transformers and test-time search: practical wins vs end-to-end learnability

    Elad references depth-adaptive approaches and test-time search loops (especially in code) as two active directions. Jakob sees test-time search as effective but hard to optimize end-to-end, and notes that earlier adaptive-depth ideas (e.g., Universal Transformers) haven’t worked well enough to be widely adopted.

    • Depth-adaptive compute: promise, but limited practical adoption so far
    • Test-time search is powerful, especially with verifiable feedback (e.g., compilation)
    • Search-based methods can be hard to optimize end-to-end
    • If adaptive compute worked well enough, scarcity of compute would drive broad usage
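
A minimal sketch of test-time search with verifiable feedback, using Python's built-in `compile` as the verifier. The fixed `candidates` list is a hypothetical stand-in for samples drawn from a code model, and `first_verified` is an illustrative helper name, not an API from any library.

```python
def compiles(source):
    # Verifier: does the candidate parse as valid Python?
    # IndentationError is a subclass of SyntaxError, so it is caught too.
    try:
        compile(source, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False

def first_verified(candidates, verifier):
    """Return the first candidate that passes, plus the number of attempts."""
    for attempts, candidate in enumerate(candidates, start=1):
        if verifier(candidate):
            return candidate, attempts
    return None, len(candidates)

# Hypothetical model samples for the same request.
candidates = [
    "def add(a, b) return a + b",        # missing colon
    "def add(a, b):\nreturn a + b",      # body not indented
    "def add(a, b):\n    return a + b",  # valid
]

solution, attempts = first_verified(candidates, compiles)
```

The number of attempts, and therefore the compute spent, grows with how hard the request is for the sampler. The loop itself is discrete, though, so gradients cannot flow through it, which is one way to read the end-to-end optimization concern above.
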
  10. 15:12 – 17:54

    Why biology needs a new approach: deep learning as an alternative to full mechanistic understanding

    Jakob explains his long-standing interest in biology and the challenge of learning it outside formal training. He argues that deep learning at scale can help bypass incomplete inventories and weak predictive theories in biology, similar to how it transformed language tasks.

    • Biology remains partially unmapped: incomplete inventory of mechanisms
    • Even known mechanisms often lack practical predictive theory (protein folding example)
    • Deep learning can treat systems as black boxes given sufficient scalable IO
    • Goal is practical intervention, not necessarily complete conceptual understanding
  11. 17:54 – 20:57

    Inceptive’s focus: designing better mRNA as a programmable medicine platform

    Jakob shares the origin story (personal motivation, AlphaFold’s impact) and Inceptive’s initial focus on RNA—especially mRNA. He frames RNA as “biological software,” where models could compile high-level therapeutic intent into RNA sequences with better stability, efficacy, and manufacturability.

    • Motivation influenced by becoming a parent and seeing life’s fragility
    • AlphaFold 2 as proof that deep learning can crack major bio problems
    • RNA/mRNA as a “neglected step-child” with huge therapeutic potential
    • Vision: compile program-like specs into RNA molecules that execute in cells
    • mRNA platform scale and manufacturing advantages over proteins/small molecules
  12. 20:57 – 23:49

    From ‘print a protein’ to full programs: riboswitches, conditionals, and combinatorial explosion

    The discussion explores how RNA therapies could become more expressive than today’s ‘print statement’ vaccines. Jakob describes how features like self-amplifying RNA and riboswitches hint at conditional logic, making brute-force screening impossible and pushing toward generative design.

    • Today’s mRNA vaccines approximate simple “print this protein” behavior
    • Self-amplifying RNA and riboswitches enable richer, conditional behavior
    • Expressive therapeutic programs imply astronomical design spaces (screening won’t work)
    • Personalized cancer vaccines require many antigens per patient over time
    • Generative models (not explicit search) are needed to navigate the space
  13. 23:49 – 28:28

    Is understanding a hindrance? Empiricism, regulatory expectations, and black-box acceptability

    Elad and Jakob debate whether insisting on mechanistic explanations can slow progress when empirical efficacy is what matters most. They draw analogies to linguistics and historical pharmacology, noting many effective drugs worked long before mechanisms were understood, and suggesting deep learning revives functional/phenotypic thinking at scale.

    • Mechanistic discovery can be slow and may not be necessary for progress
    • Historical drugs (e.g., aspirin, metformin) were used before mechanisms were clear
    • Regulatory emphasis on mechanism can add hurdles without improving outcomes
    • Black-box models may outperform human-comprehensible theories for complex systems
    • Functional screens in biology mirror end-to-end empirical optimization
  14. 28:28 – 31:54

    Human augmentation and ‘general intelligence’: IO limits, evolution as pretraining, and skepticism about ‘general’

    The conversation takes a detour into human augmentation and what ‘general’ means in AGI. Jakob argues evolution acts like pretraining (amortized compute), questions whether boosting human IO would help given brain constraints, and challenges the coherence of ‘general’ as a concept.

    • Bullish on long-term augmentation but skeptical of near-term intuitions
    • Brains may be optimized around IO constraints; extra bandwidth may not translate to capability
    • Evolution as pretraining; individual learning as fine-tuning with less data
    • Neuroplasticity suggests redundancy, but not necessarily true general-purpose compute
    • AGI terminology is problematic because ‘general’ is underspecified
  15. 31:54 – 35:23

    How Inceptive generates data: anti-disciplinary teams and many small wet–dry loops

    Sarah asks how Inceptive approaches wet-lab data generation to train models. Jakob describes building a new ‘anti-disciplinary’ discipline where assay design is central, and where computation and experimentation interleave through many tightly coupled feedback loops, with the work happening ‘on the beach’ where wet and dry meet.

    • Data generation via assays is a core competency, not a downstream step
    • Inceptive operates as an ‘anti-disciplinary’ team creating a new field
    • Instead of one big loop, the system contains many small interlocking loops
    • Models inform experiments; experimental readouts feed new models and instrument parameters
    • Wet lab + in silico boundaries intentionally blur to accelerate iteration
