No Priors Ep. 29 | With Inceptive CEO Jakob Uszkoreit

No Priors · Aug 24, 2023 · 35m

Elad Gil (host), Sarah Guo (host), Jakob Uszkoreit (guest)

Origins and design principles of the Transformer and attention mechanism
Hardware–software co-evolution and the limits of current accelerators for deep learning
Need for elastic, input-adaptive compute in modern AI models
Test-time search, depth-adaptive transformers, and amortizing compute
Deep learning as an alternative to mechanistic understanding in biology
Inceptive’s vision of RNA as biological software and bytecode
Data generation, assays, and the ‘wet–dry’ integration at Inceptive

Transformer pioneer builds 'biological software' to reprogram life with RNA

Jakob Uszkoreit, co-author of the Transformer paper and CEO of Inceptive, discusses the origins of the attention-based architecture and why its success is tightly coupled to modern accelerator hardware and community optimism. He argues that future AI progress must tackle elastic compute—models that dynamically adjust computation to problem difficulty and input complexity. Uszkoreit then outlines Inceptive’s vision of treating RNA as biological bytecode and medicines as compilable programs, using large-scale deep learning and custom assays instead of full mechanistic biological understanding. He suggests this black-box, end‑to‑end approach could dramatically expand the reach, scalability, and sophistication of medicines, especially mRNA-based therapeutics and vaccines.

Key Takeaways

Architectures must be tightly matched to hardware to unlock breakthroughs.

The Transformer’s success came not only from the attention idea but from implementations that perfectly fit GPU accelerators, enabling massive parallelism and practical scaling compared to more sequential architectures.

Current models waste compute by not adapting effort to problem difficulty.

Today’s LLMs use computation roughly proportional to prompt and output length, not task hardness, leading to over-spending on trivial queries and under-spending on succinct but computationally hard problems.
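
As a rough illustration of that point (not from the episode), the common back-of-envelope rule that a dense decoder-only model spends on the order of 2 FLOPs per parameter per token makes the length-not-difficulty coupling explicit:

```python
# Back-of-envelope sketch: inference compute for a dense decoder-only LLM
# scales with parameter count and token count, with no term for task hardness.
# The ~2 FLOPs/parameter/token rule is a standard approximation, not an exact count.

def approx_inference_flops(n_params: float, prompt_tokens: int, output_tokens: int) -> float:
    return 2.0 * n_params * (prompt_tokens + output_tokens)

# A trivial prompt and a terse-but-hard prompt of the same length cost the same:
trivial = approx_inference_flops(7e9, prompt_tokens=12, output_tokens=4)  # "What is 2 + 2?"
hard = approx_inference_flops(7e9, prompt_tokens=12, output_tokens=4)     # short but computationally hard question
print(trivial == hard)  # True: compute tracks length, not difficulty
```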

Training on generated data can be valuable by amortizing prior compute.

Although synthetic data doesn’t add Shannon information, it can reuse past computational work; retraining on generated outputs effectively concentrates more compute on similar problems over time.
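
A minimal sketch of that amortization idea (hypothetical helper functions, not a system described in the episode): spend expensive search compute once at generation time, then fold the results back into the weights so similar problems become cheap later.

```python
# Hypothetical sketch of amortizing test-time compute via training on generated data.
# `expensive_search`, `verify`, and `train_step` are stand-ins, not real library calls.

def amortize_search(model, problems, expensive_search, verify, train_step, rounds=3):
    for _ in range(rounds):
        # Pay a large, one-time compute cost to produce candidate solutions.
        candidates = [(p, expensive_search(model, p)) for p in problems]
        # Keep only outputs that pass some check; no new information is created,
        # but prior compute gets distilled into training targets.
        kept = [(p, y) for p, y in candidates if verify(p, y)]
        # Retraining concentrates that compute into the weights,
        # so similar problems need less search next time.
        for p, y in kept:
            model = train_step(model, p, y)
    return model
```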

Elasticity in multimodal models is an underexploited efficiency frontier.

Models currently scale compute with input size (e.g., ...)

Deep learning enables powerful biology without full mechanistic understanding.

Uszkoreit argues that, as with language and many historical drugs, we can design effective biological interventions using data-driven, black-box models rather than waiting for complete, predictive theories of all underlying mechanisms.

RNA can be treated as a programmable substrate for medicines.

Inceptive views RNA, particularly mRNA, as biological bytecode that can be ‘compiled’ from high-level specifications (like code), enabling programmable vaccines and therapeutics with conditionals, amplification, and complex behaviors.

Intertwined experimental design and modeling is a new discipline.

Inceptive integrates assay design, high-throughput RNA synthesis, and deep learning into many small feedback loops, blurring the line between wet lab and computation and creating an ‘anti-disciplinary’ approach to biological software.
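
One way to picture such a loop (purely illustrative structure; the function names are assumptions, not Inceptive's actual pipeline):

```python
# Illustrative wet-dry feedback loop: the model proposes RNA sequences,
# the lab synthesizes and assays them, and the readouts become training data.
# All helpers here are hypothetical placeholders.

def wet_dry_loop(model, propose, synthesize_and_assay, retrain, rounds=5, batch=96):
    dataset = []
    for _ in range(rounds):
        candidates = propose(model, n=batch)          # dry: model designs candidate sequences
        readouts = synthesize_and_assay(candidates)   # wet: high-throughput synthesis + assay
        dataset.extend(zip(candidates, readouts))
        model = retrain(model, dataset)               # dry: fold new measurements into the model
    return model, dataset
```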

Notable Quotes

At the end of the day, the one thing we know really works in deep learning is making it faster and more efficient on given hardware.

Jakob Uszkoreit

The big question is, does it matter that we may never test architectures that don’t fit today’s accelerators?

Jakob Uszkoreit

Right now there’s no knob for a model to say, ‘This problem is hard, I should use more compute,’ versus ‘This is two plus two.’

Jakob Uszkoreit

We think of RNA as the equivalent of bytecode and what we’re doing is compiling biological programs into RNA molecules.

Jakob Uszkoreit

Maybe the hope to fully understand biology is actually holding us back; the ground truth is simply whether a treatment does more good than harm.

Jakob Uszkoreit

Questions Answered in This Episode

How might AI architectures change if we designed new accelerators from scratch specifically for elastic, task-adaptive computation rather than for matrix multiplications alone?

What would a practical ‘programming language’ for medicines look like, and how would regulators evaluate safety for compiled RNA programs they don’t mechanistically understand?

Where is the line between acceptable black-box efficacy and required mechanistic insight in medicine, especially when side effects are rare but poorly understood?

How can the wet–dry integration model Inceptive uses be generalized to other scientific fields that depend on expensive or slow experiments?

What are the ethical and societal implications of treating life processes as software that can be compiled, updated, and distributed at global scale?

Transcript Preview

Elad Gil

What would the world look like if we could create biological software that allows us to compile RNA? That's the big question this week on the podcast. Sarah and I are sitting down with Jakob Uszkoreit, co-founder and CEO of Inceptive. Jakob spent more than a decade at Google, where he co-authored the Attention Is All You Need paper and several other papers that set the foundation for today's AI revolution. He also started and led the research teams that transformed Google Search, Google Translate, and Google Assistant. Now at Inceptive, he builds biological software with the aim of making widely accessible medicines and biotechnologies. Jakob, welcome to No Priors.

Jakob Uszkoreit

Thank you. Thank you for having me.

Elad Gil

Um, you worked at Google for more than a decade on many leading research teams. You were really seminal in the original Transformer paper, and when I talk to the other authors of that paper, people sort of in the know at Google, you're widely credited with really coming up with the idea of focusing on attention, which was the basis for the Attention Is All You Need paper. Could you talk a little bit more about how you came up with that, how the team started working on it, and the origins of that pretty foundational breakthrough?

Jakob Uszkoreit

It's really not that simple, right? It's also really important to keep in mind that, in deep learning, you can't make something, in quotes, "really work" that is maybe pretty far on the theoretical or formal end without really going deep on the engineering and implementation side, and it just has to be efficient. At the end of the day, in my mind, that's the one and only thing we know really works if you want to push deep learning forward: make it faster and more efficient on a given piece of hardware.

There's a lot of evidence that the way we actually understand language, and that's something that then shapes language in terms of its statistical properties, is actually somewhat hierarchical. And the best piece of circumstantial or anecdotal evidence for that is just looking at what the linguists do, right? They draw these trees. And while I don't think that they're ever really true, they're also definitely not always false. So they do capture some of the statistics that are inherent in language, and probably language actually evolved this way in order to exploit our cognitive capacities in a fairly optimal way. And so you can safely assume that it is not necessary to go through the entirety of a sequential signal beginning to end, and maybe also end to beginning simultaneously, in order to understand it; you can actually gain a lot of the understanding, in air quotes, by looking at individual groups of pieces of your signal.

And ultimately, if you are given a piece of hardware whose key strength is doing lots and lots of simple computations in parallel, as opposed to complicated structured computations sequentially, then that is exactly the kind of statistical property you want to exploit. You want to, in parallel, understand pieces of an image first; maybe that's not possible in its entirety, but you can actually get a lot of it. And only once you've done some of that do you put these incomplete understandings or representations together, and as you put them together more and more, that's when you get rid of the last remaining ambiguity at the end of the day.

When you think about what that process looks like, it's a tree. And when you think about how you would actually run something that evaluates all possible trees, a reasonable approximation is that you repeat an operation where you look at all combinations of things, that's this quadratic step, right, that ultimately is at the core of this attention step, and then you effectively pull information in, for a given representation of a given piece, from all the other representations of all the other pieces, and rinse and repeat. It seems intuitive, and it also seems intuitively clear that that's a really good fit for the kind of accelerators that we had at the time and still have today.

So that's really where the idea came from. And if you want to look at the biggest differences between the Transformer as it was described in the Attention Is All You Need paper and some of its ancestors, like the decomposable attention model, the big difference is just that the Transformer was implemented by folks like Noam and Ashish, et cetera, in a way that's such an excellent fit for the accelerators we had at the time.
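
For readers who want to see the quadratic "look at all combinations" step described above in concrete form, here is a minimal sketch (plain NumPy, not the actual Transformer implementation; shapes and names are illustrative):

```python
# Minimal sketch of the quadratic all-pairs attention step described above.
# Not production Transformer code; shapes and names are illustrative only.
import numpy as np

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (num_pieces, dim) -- one representation per piece of the input
    scores = q @ k.T / np.sqrt(q.shape[-1])         # (num_pieces, num_pieces): every pair, the quadratic step
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: how much each piece attends to every other piece
    return weights @ v                              # each piece pulls in information from all the others

x = np.random.randn(6, 16)                          # 6 pieces, 16-dimensional representations
out = scaled_dot_product_attention(x, x, x)         # self-attention; "rinse and repeat" by stacking layers
print(out.shape)                                    # (6, 16)
```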
