No Priors

No Priors Ep. 29 | With Inceptive CEO Jakob Uszkoreit

"Biological Software" is the future of medicine. Jakob Uszkoreit, CEO and Co-founder of Inceptive, joins Sarah Guo and Elad Gil this week on No Priors, to discuss how deep learning is expanding the horizons of RNA and mRNA therapeutics. Jakob co-authored the revolutionary paper Attention is All You Need while at Google, and led early Google Translate and Google Assistant teams. Now at Inceptive, he's applying these same architectures and ideas to biological design, optimizing vaccine production, and magnitude-more efficient drug discovery. We also discuss Jakob's perspective on promising research directions, and his point of view that model architectures will actually get simpler from here, and be driven by hardware. 00:00 - Creating Biological Software 06:54 - The Hardware Drivers of Large-Scale Transformers 14:32 - Challenges in Optimizing Compute Allocation 23:25 - Deep Learning in Biology and RNA 32:49 - The Future of Drug Discovery 41:41 - Collaboration and Innovation at Inceptive

Elad Gil (host) · Jakob Uszkoreit (guest) · Sarah Guo (host)
Aug 24, 2023 · 35m · Watch on YouTube ↗

EVERY SPOKEN WORD

  1. 0:00 - 6:54

    Creating Biological Software

    1. EG

      What would the world look like if we could create biological software that allows us to compile RNA? That's the big question this week on the podcast. Sarah and I are sitting down with Jakob Uszkoreit, co-founder and CEO of Inceptive. Jakob spent more than a decade at Google, where he co-authored the Attention Is All You Need paper and several other papers that set the foundation for today's AI revolution. He also started and led the research teams that transformed Google Search, Google Translate, and Google Assistant. Now at Inceptive, he builds biological software with the aim of making widely accessible medicines and biotechnologies. Jakob, welcome to No Priors.

    2. JU

      Thank you. Thank you for having me.

    3. EG

      You worked at Google for more than a decade, working on many leading research teams. You were really seminal in the original Transformer paper, and when I talk to the other authors of that paper and people in the know at Google, you're widely credited with coming up with the idea of focusing on attention, which was the basis for the Attention Is All You Need paper. Could you talk a little more about how you came up with that, how the team started working on it, and the origins of that pretty foundational breakthrough?

    4. JU

      It's really not that simple, right? It's also really important to keep in mind that in deep learning, you can't make something, in quotes, "really work" from, I would say, the theoretical or formal end alone, without really going deep on the engineering and implementation side; it just has to be efficient. At the end of the day, in my mind, that's the one and only thing we know really works if you want to push deep learning forward: make it faster, more effective, and more efficient on a given piece of hardware.

      There's a lot of evidence that the way we actually understand language, and that's something that then shapes language in terms of its statistical properties, is somewhat hierarchical. The best piece of circumstantial or anecdotal evidence for that is just looking at what the linguists do, right? They draw these trees. And while I don't think they're ever really true, they're also definitely not always false. So they do capture some of the statistics inherent in language, and probably language actually evolved this way in order to exploit our cognitive capacities in a fairly optimal way. And so you can safely assume that it is not necessary to go through the entirety of a sequential signal, beginning to end, and maybe also end to beginning simultaneously, in order to understand it; you can gain a lot of the understanding, in air quotes, by looking at individual pieces of your signal.

      And ultimately, if you are given a piece of hardware whose key strength is doing lots and lots of simple computations in parallel, as opposed to complicated, structured computations sequentially, then that's exactly the kind of statistical property you want to exploit. You want to, in parallel, understand pieces of an image first. Maybe that's not possible in its entirety, but you can actually get a lot of it. Only once you've done some of that do you put these incomplete understandings or representations together, and as you put them together more and more, that's when you get rid of the last remaining ambiguity. When you think about what that process looks like, it's a tree. And when you think about how you would actually run something that evaluates all possible trees, a reasonable approximation is to repeat an operation where you look at all combinations of things, that's the quadratic step, right, that is ultimately at the core of this attention step, and then you effectively pull information, for a given representation of a given piece, from all the other representations of all the other pieces, and rinse and repeat.

      It seems intuitive, and it also seems intuitively clear that that's a really good fit for the kind of accelerators we had at the time and still have today. That's really where the idea came from. And if you want to look at the biggest differences between the Transformer as it was described in the Attention Is All You Need paper and some of its ancestors, like the decomposable attention model, the big difference is just that the Transformer was implemented, by folks like Noam and Ashish, et cetera, in a way that was such an excellent fit for the accelerators we had at the time.
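      To make that quadratic, all-pairs step concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention, the operation Jakob is describing; the names and shapes are illustrative only, not Google's implementation:

      ```python
      import numpy as np

      def scaled_dot_product_attention(Q, K, V):
          """Single-head attention over a sequence of n d-dimensional vectors.

          The n x n score matrix compares every position with every other one
          (the quadratic step); the softmax-weighted sum then pulls information
          from all other representations into each position, in parallel.
          """
          d = Q.shape[-1]
          scores = Q @ K.T / np.sqrt(d)                     # (n, n): all pairs, one matmul
          weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
          weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
          return weights @ V                                # (n, d): pull information in

      # "Rinse and repeat": stacking this operation approximates evaluating
      # many possible grouping trees at once.
      n, d = 8, 16
      x = np.random.randn(n, d)
      out = scaled_dot_product_attention(x, x, x)           # self-attention
      ```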

    5. EG

      So one question I've heard people bring up is that a lot of the behavior we've seen in Transformers is, to some extent, most interesting at scale, right? You get interesting emergent properties.

    6. JU

      Yeah.

    7. EG

      And there may be other architectures that have equally interesting or perhaps more interesting properties at scale, but there are sort of two impediments. Number one is that people just aren't throwing a lot of money and compute at them, and two is that the underlying accelerator architecture fits the Transformer so well...

    8. JU

      Yeah.

    9. EG

      That it's dramatically less performant to run other architectures, and therefore we may never actually test them. Do you think that's a true statement?

    10. JU

      I think the big question is: does it matter? It would be really interesting to evaluate, especially if we can make them simpler, combinations of different hardware and models or architectures that fit those like gloves. And I feel that at the moment, given where GPUs came from, they weren't built for this, right? Why would they be anywhere near optimal? It would be one thing if they had at least been engineered for this purpose, with lots of people banging their heads against walls until they had something somewhat optimized, but that's not how the basic architecture came to be. You can reason a lot about the generality of really fast, scalable matrix multipliers and how that does everything in scientific computing really well, and I think some of that is true, sure, but there are still lots of bells and whistles, and there are lots of specific trade-offs. Say, for example, things like memory bandwidth, and ultimately inherent parallelism versus latency. I don't think GPUs are at the sweet spot when it comes to large-scale deep learning with respect to exactly those trade-offs. And so it may very well be that if we actually tried these combinations, we might quickly find something that's better.
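      As a back-of-the-envelope illustration of the bandwidth-versus-parallelism trade-off (the accelerator numbers below are hypothetical, not from the episode), a simple roofline-style calculation shows when a matrix multiplication is limited by memory bandwidth rather than by the multipliers:

      ```python
      # Roofline sketch: arithmetic intensity of a matmul vs. a chip's
      # compute-to-bandwidth ratio (illustrative numbers only).
      def matmul_intensity(m, n, k, bytes_per_elem=2):
          flops = 2 * m * n * k                                    # multiply-accumulates
          bytes_moved = bytes_per_elem * (m * k + k * n + m * n)   # read A, B; write C
          return flops / bytes_moved                               # FLOPs per byte

      # Hypothetical accelerator: 300 TFLOP/s of fp16 compute, 2 TB/s of HBM.
      ridge = 300e12 / 2e12    # 150 FLOPs/byte needed to stay compute-bound

      for size in (128, 1024, 8192):
          ai = matmul_intensity(size, size, size)
          bound = "compute-bound" if ai > ridge else "bandwidth-bound"
          print(f"{size}x{size} matmul: {ai:.0f} FLOPs/byte -> {bound}")
      ```

      Small matmuls sit below the ridge point and starve on bandwidth; only large, highly parallel ones keep the multipliers busy, which is part of why architectures that reduce everything to big matmuls fit these chips so well.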

    11. SG

      When you think about how we get progress from here, usually people think of software as driving the hardware.

    12. EG

      ... right? Do you think we get accelerators designed for the large-scale transformer architectures we already have, or new hardware designs? Like, so it's chicken or egg a little bit here.

    13. JU

      It's chicken and egg. And if you look at the newest accelerator designs, they are increasingly taking this into account, to a significant extent, actually. There are a couple of interesting examples. We had a computer vision architecture that really was just an MLP, called Mixer. And while it wasn't significantly better, it also wasn't significantly worse than the vision transformers, right? I think that already goes to show it's not that difficult, and especially if you simplify along the way, it might really be a possibility. I will say one other thing: aside from efficiency, just really

  2. 6:54 - 14:32

    The Hardware Drivers of Large-Scale Transformers

    1. JU

      raw efficiency in terms of the architecture's fit to the accelerator hardware, the other main contributor, I think, to the success of this architecture is optimism and hope. Suddenly, you were in a situation where, for whatever reason, a bunch of things that people tried with this started to work, and then more started to work. And that's not coincidence. It's really just that the human cycles invested in getting all these diverse things to work are ultimately fueled by suspension of disbelief, aka hope, or whatever you want to call it. The community became so energized so quickly, and then just tried everything under the sun, because the prior was just a different one. The prior now was, "Oh, look, we have this thing where it just works," which is just not true. The reality is that you try something the first time, and you really have to work hard for a long period of time. Then, lo and behold, sometimes it works. And if you do that many more times, then it will work many more times. And I think that's really what we're seeing, ultimately.

    2. EG

      Where do you think people should invest that sort of optimism going forward? What are the big areas people need to work on to increase the performance of these systems, or add memory, or do other things? If you were to paint the roadmap ahead in terms of making these really valuable, performant systems, what would you focus on?

    3. JU

      I mean, there's one thing that still boggles my mind, just from first principles, because it can't be optimal. If you think about it, the way you today scale the compute invested in a given problem, let's say the problem is the response to a prompt in some large language model, ultimately depends on the prompt and how long it is. The longer the prompt, the more compute you get. And it depends, and of course there are many different screws to tweak here, on the length of the response. There are many very hard problems where the response is incredibly short, and you can in many cases actually formulate those problems very, very succinctly. So you're not going to be using a lot of compute, even though we know the problem is really, really difficult. Say, I don't know, prime factorization. A problem like that: simply stated, big potential impact. And right now there's no knob that you can easily tweak as a user, but also really no knob that the architecture can tweak itself, when it comes to deciding: oh, this is hard, I actually need to use more compute for this.

      And ironically, this comes back to a question that many people ask, around whether it makes any sense to train on generated data. Because information theory, sound information theory, very clearly says: nope, you're not going to get more information out of it, you can do it all you want. But there is an artifact, or maybe even an omission, in that flavor of information theory, which is that it doesn't take compute into account. It doesn't take into account the energy expenditure necessary to generate that data. So if you now think back to these problems: if you were to just let LLMs run, generate stuff, and then train new LLMs, or even the same LLM, on that output, what you do is amortize compute that was expended at some point in time. And so now, suddenly, you have models that, if you retrain them over and over again, start to spend more compute on the same problems, but amortized over all of these iterations of the system.

      And that seems clunky. That just seems so clunky. Ultimately it should be something where, at inference time, at runtime, the model can effectively decide, or maybe even query, right? There's this notion of anytime algorithms, where it might just depend on your resources: if you have more time or more money, you let it run longer. But you don't want that to happen in cases where the problem in question is simple. You only want to do that in cases where it's actually hard. And that right now doesn't work, because if you pose a very, very simple problem, like two plus two, to GPT-4 right now, and you write it in a very long-winded way in a prompt, and you ask GPT-4 to generate a very complicated answer, then it will actually expend a ton of compute to add two to two, which makes no sense. And so, out of all the different problems that I currently see at a high level, because it's not clear how exactly you would address it, that is maybe the one that boggles my mind the most.
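      A rough cost model makes the mismatch explicit. Using the standard approximation that a decoder-only transformer spends roughly 2 × (parameter count) FLOPs per token processed or generated (the parameter and token counts below are invented for illustration), a long-winded trivial prompt buys far more compute than a succinctly stated hard problem:

      ```python
      # Forward-pass cost scales with token counts, not with problem difficulty.
      def forward_flops(n_params, prompt_tokens, response_tokens):
          # ~2 FLOPs per parameter per token (common rule of thumb,
          # ignoring the attention term for short sequences)
          return 2 * n_params * (prompt_tokens + response_tokens)

      N = 175e9  # hypothetical parameter count

      easy = forward_flops(N, prompt_tokens=500, response_tokens=200)  # verbose "2+2"
      hard = forward_flops(N, prompt_tokens=20, response_tokens=5)     # "factor this integer"

      print(f"verbose trivial prompt: {easy:.1e} FLOPs")
      print(f"succinct hard problem:  {hard:.1e} FLOPs")
      # Nothing in the architecture lets the model decide that the second
      # problem deserves vastly more compute than the first.
      ```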

    4. EG

      Yeah. Are there other big research areas that you're excited about right now, or areas where you see enormous progress being made?

    5. JU

      So in terms of foundations, I think different flavors of elasticity are really interesting. You could actually claim that a lot of these questions boil down to the problem I just described: compute is, in a certain sense, very crudely allocated. But you can look at different incarnations of this problem. Another one would be: why don't we have models that, in an elegant way, manage to consume, say, visual sensor output of different resolutions, different sampling rates, different durations? Right now it's actually quite tricky to have a model, other than maybe recurrent architectures, that takes videos of different lengths, different image resolutions, ultimately different densities if you wish, and different sizes, and really elegantly adjusts compute to what you actually want to know, to how difficult it really has to be to generate the representations you need in order to do whatever you want to do. And here, again, an example that makes this pretty clear: you can take a video, scale it up, frame-interpolate with trivial algorithms, and then run it again. If the problem you're trying to solve conditioned on that video is the same, then I wouldn't want more compute to be used. But right now, that's what's going to happen: you're going to use a ton more compute. And so these types of elasticity, or flexibility, of these models: I believe our lack of techniques addressing those is ultimately incredibly wasteful.
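      The video example can be put in numbers. Assuming a ViT-style encoder that cuts frames into 16x16 patches (the resolutions below are illustrative), trivially upscaling and frame-interpolating the same video multiplies the token count, and the quadratic attention cost, without adding any information:

      ```python
      # Token-count arithmetic for a hypothetical ViT-style video encoder.
      def video_tokens(frames, height, width, patch=16):
          return frames * (height // patch) * (width // patch)

      orig = video_tokens(frames=64, height=224, width=224)
      # Same content, 2x frame-interpolated and 2x upscaled:
      upscaled = video_tokens(frames=128, height=448, width=448)

      print(f"tokens: {orig} -> {upscaled} ({upscaled / orig:.0f}x)")
      print(f"attention cost grows ~{(upscaled / orig) ** 2:.0f}x")
      # The underlying problem is unchanged, but compute balloons anyway.
      ```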

    6. EG

      I've seen increasing attention around two different concepts in these general directions. One is, I think it was some people at Meta that did depth-adaptive transformers.

    7. JU

      Mm-hmm.

    8. EG

      Right? So just adjusting the amount of computation for each input, with a prediction of how much is needed, right? And I don't know how much more work has gone in that direction. And then I think a number of people are more excited about doing test-time search, especially for problems like code generation, where you can evaluate with compilation or something, to sort of get a loop of success in the model itself.
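      For reference, the control flow behind depth-adaptive computation can be sketched in a few lines. This shows only the general early-exit idea, with toy stand-ins for trained components, not the specific method from the Meta paper:

      ```python
      import numpy as np

      def adaptive_depth_forward(x, layers, halt_score, threshold=0.95):
          """Run layers until a halting head is confident enough to stop.

          `layers` and `halt_score` stand in for trained components; this
          only sketches the control flow of depth-adaptive computation.
          """
          depth = 0
          for depth, layer in enumerate(layers, start=1):
              x = layer(x)
              if halt_score(x) > threshold:   # easy inputs exit early
                  break
          return x, depth                     # compute used now depends on the input

      # Toy stand-ins: near-identity layers and a random halting score.
      layers = [lambda x: x + 0.01 * np.random.randn(*x.shape) for _ in range(12)]
      halt = lambda x: np.random.rand()
      out, depth_used = adaptive_depth_forward(np.zeros((4, 8)), layers, halt)
      print(f"exited after {depth_used} layers")
      ```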

    9. JU

      I think it's super effective, the test-time search. I do think it's clunky, because it's not something that you can easily end-to-end optimize.

    10. EG

      Mm.

    11. JU

      Right, this is basically also what I was trying to get at with saying that some of these efficiency improvements we're not yet really harnessing would, I believe, dramatically affect training time. And if you look at how test-time search actually affects training, it's just a bit clunky, and I don't think we'll be able to optimize it as well. Although calling it an engineering hack could sound negative, and that's not what I mean. I think it's an awesome engineering hack around this problem. It's really, really effective. And it basically comes back to this whole idea of amortizing compute, in a certain sense, right? With the stuff you already have lying around and memoized, even though it was the humans

  3. 14:32 - 23:25

    Challenges in Optimizing Compute Allocation

    1. JU

      that actually put it there, in many cases. In terms of adaptive-depth transformers, et cetera, we actually tried this Universal Transformer thing a long time ago. It just hasn't caught on, and that's because it just doesn't work, right? At this point, it doesn't work well enough. It's not that it doesn't work at all, but if it worked really well, then, because compute right now is this incredibly scarce resource, we would see it everywhere. And I don't think it's really just for a lack of trying here, though probably there's too little experimentation, but at least the known or proposed methods just don't work well enough yet.

    2. EG

      So one thing that you've been working on for the last few years is Inceptive, which is really starting to focus on how can you apply machine learning and different aspects of software to biology. Could you share a little bit about the company, how you got interested in bio, and what you view are some of the interesting problems there?

    3. JU

      Yeah, so basically, I've always been interested in bio, and I know nothing about it. And that's a conundrum, because it's difficult to learn a lot about biology when you're not in school, and I didn't want to go back to school. But at the same time, it always felt like an area with a lot of headroom in terms of efficiency, and maybe even a dire need for alternative approaches, at least if what you're interested in is really solving acute problems. Alternative, that is, to biology the science, which is trying to develop a complete conceptual understanding of how life works. I don't have very high hopes for humanity developing that complete conceptual understanding to the level we would need in order to do all the interventions we want to do. We didn't really have great tools in our toolbox, until somewhat recently, as alternatives to understanding how it works and then, based on that understanding, fixing it if it needs fixing.

      And I think now we have an alternative that's an extremely good match, and that's deep learning at scale, where we can potentially, to a pretty large extent, if not entirely, whatever that even means, work around the following two problems. Number one, we don't know all the stuff that's going on in life; we still don't even have a complete inventory, let alone really understand all the mechanisms. And number two, even for the stuff that we do know, we so far haven't, in many cases, been able to come up with sufficiently predictive theories to really make that understanding useful. A concrete example here is protein folding, where even if you act as if there are no chaperones and no other stuff in the environment in which that process, in which the earliest kinetics during translation happen, takes place, even if you make that massively simplifying assumption, the theory just wasn't practical. And it seems like deep learning is at least potentially a really good answer to both of those aspects, because you can basically treat everything as a black box, and as long as you are able to observe that black box, in terms of whatever input and output, fast enough and at sufficient scale, you might go somewhere with that.

    4. EG

      So Inceptive is pretty stealthy. Is there anything you can share in terms of how you're applying deep learning or other techniques to biology in the context of the company?

    5. JU

      Yeah. My daughter was born, my first child, and that entire process gave me a really fundamentally different appreciation for the fragility of life, a really wonderful one, but also a pretty fundamentally different one. And so here we are: we have this new tool named AlphaFold 2 that solves one of these fundamental problems in structural biology; we have instances of a macromolecule family that's basically about to save the world; and I basically want to fix life, because I now have this wonderful daughter. It became clear that using the exact tools we had been working on at Google before, and applying those to this neglected stepchild, namely RNA, or more specifically at first mRNA, could have massive impact on the world.

      And ultimately, what we're trying to do is design better RNA, and at first mRNA, molecules for a pretty broad variety of different medicines. Infectious disease vaccines are maybe the obvious first example, given the COVID vaccines. But if you look at the pipelines of Moderna and BioNTech and all those companies, the potential applicability of RNA, mRNA more specifically, is near limitless. There are already hundreds of programs underway in different stages of development, and that number is expected to climb, hitting high triple digits before the end of the decade. And now we're talking about a modality that might end up, before the end of the decade, being the second or third biggest modality in terms of revenue, and potentially also in terms of impact. And if you take that trajectory and look at how suboptimal, in a certain sense, the mRNA vaccines were compared to what's possible using RNA, just looking around in nature; at how severe the side effects were for what fraction of the patients that received the vaccines; at how few people, comparatively, really had access to any of those vaccines when they were necessary and needed. It seems like, if we look around in our toolkit, the only tool we have to potentially change that quickly is deep learning.

      So at Inceptive we think of this now as something you could call biological software, where mRNA, and RNA in general, is maybe the equivalent of bytecode that then forms the substrate, the actual stuff the software is made of. And what you do is learn models that allow you to translate biological programs, programs that might look like some bit of Python code and specify what you want a certain medicine to do inside yourself, inside your cells, and compile them into descriptions of RNA molecules that then hopefully actually do what you programmed them to do. And ultimately, right now, if you look at mRNA vaccines, our programming language is just a print statement: just print this protein. But you can easily imagine that with self-amplifying RNA, as one example, and with so-called riboswitches, basically RNAs that change dramatically in structure or self-destruct in the presence of, say, a different small molecule, you can effectively have conditionals, you can have recursion. And as a computer scientist, you squint and you're like, "Oh, wow, okay, this is basically Turing complete."
      And you kind of have all sorts of tools now at your disposal to really build very, very complex medicines that might then also be produced, manufactured, and distributed in a way that is much more scalable than anything we've been able to do so far. Protein-based biologics oftentimes don't make it to market because it's just not possible to manufacture them at scale. If we wanted to medicate everybody in the world with all the protein-based biologics they should actually receive, the real estate on the planet wouldn't be enough to make all the stuff. But if you look at RNA manufacturing and distribution infrastructure, we're going to have six to eight billion doses, two years from now, manufacturable and distributable across the globe. And that number is going to go up really, really quickly. At Inceptive right now, in our lab, we can actually print pretty much any given RNA. That's just something you can't do with small molecules, and can't easily do with proteins, certainly not at scale. And that's not something that only matters when you have a product in your hand: if you want to treat this as a machine learning problem, you need to generate training data that doesn't already exist. So you also really want scalable synthesis and manufacturing, which is unprecedented as a constellation.
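      The print-statement analogy can be made concrete with a deliberately toy "compiler": translate a program of the form "print this protein" into one naive mRNA coding sequence. Everything here (the codon-table subset, the function name) is hypothetical illustration, not Inceptive's system; the actual design problem is learning which of the astronomically many synonymous and regulatory variants behave best in cells:

      ```python
      # Toy "biological software" compiler for the print-statement analogy.
      # One of 4 alternatives per amino acid is hard-coded here; real design
      # means choosing among the synonymous/regulatory variants by learned models.
      CODONS = {"M": "AUG", "S": "UCU", "P": "CCU", "K": "AAA", "*": "UAA"}

      def compile_print_protein(amino_acids: str) -> str:
          """'print <protein>' -> one naive mRNA coding sequence (5'->3')."""
          return "".join(CODONS[aa] for aa in amino_acids + "*")

      print(compile_print_protein("MSPK"))  # AUGUCUCCUAAAUAA
      ```

      Riboswitches and self-amplifying RNA would then extend this "language" beyond a bare print statement, toward conditionals and something like recursion.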

    6. EG

      So your view is that you can actually search for the program that codes for, let's say, the COVID spike protein at a certain amount, with different stability characteristics, with different immune reaction characteristics, that doesn't need cold-chain logistics, that's conditional on whatever cell type... I'm saying in the future, right? Not Inceptive today, but that's the goal.

    7. JU

      That is exactly right.

    8. EG

      Of all of the 10 to the 630th variants.

    9. JU

      That's right. Yeah. And I mean, ultimately it's not going to be a search, right? Just like today, the output of an LLM isn't coming out of a proper search procedure, right? It has to be a generation procedure-

    10. EG

      Mm-hmm.

    11. JU

      ... exactly in the same way and for the same reason as you basically see in large language models or image generation models. But yeah, that's exactly the goal. Because

  4. 23:25 - 32:49

    Deep Learning in Biology and RNA

    1. JU

      screening is just not gonna cut it. 10 to the 630th, and that's really just one antigen that we're coding for there when we actually want to code for many and update those for any given-

    2. EG

      For any given, yep.

    3. JU

      Exactly. When you do personalized cancer vaccines, it is going to be many antigens for each patient over time, right? And there's just no hope of tackling this with screening approaches at all.
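      The arithmetic behind "screening is just not gonna cut it" (the screening throughput below is invented for illustration): a sequence over 4 bases has 4^n variants, so the 10^630 figure corresponds to an RNA only about a thousand nucleotides long, and even absurd screening throughput covers a vanishing fraction of that space:

      ```python
      import math

      # Design space of an RNA of length n over 4 bases is 4**n.
      # Smallest n with 4**n >= 10**630:
      n = math.ceil(630 / math.log10(4))
      print(n)  # ~1047 nucleotides

      # Even screening a billion candidates per second for the age of the
      # universe explores a vanishing fraction of the space:
      tested = 1e9 * 13.8e9 * 365 * 24 * 3600   # ~4e26 candidates
      print(f"fraction explored: 10^{math.log10(tested) - 630:.0f}")
      ```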

    4. EG

      Yeah. I'm excited to just get to the right answer without having to understand or discover every single mechanism and do the massive, expensive screens we have today.

    5. JU

      I mean, that's really the big question. Are we maybe at a crossroads where discovery and understanding is actually a hindrance? The hope to discover and really get how this works might actually be holding us back. And there is a pretty direct analogy to language understanding: computational linguistics, and linguistics in general, tried for a while to develop a sufficiently accurate and complete theory of language to make it really actionable.

    6. EG

      Yeah. When you talked about how the Transformer model works, for example, I was actually thinking about genomic sequencing, where you used to do sequential sequencing, contig by contig, and you'd have these big chunks of chromosomes that you'd sequence through sequentially. Then eventually you moved to an era where you just broke it up into tons and tons of tiny little sequences that were randomly generated, and then you'd reassemble them with a machine, right?

    7. JU

      Yeah.

    8. EG

      And that felt like a very interesting parallel or analog to what you were talking about from a language perspective. It's effectively the same thing.

    9. JU

      It is, exactly. And the parallels are so striking, and they don't end there. So yeah, it's really, really interesting to see. The invariant that I feel just holds true across the board is that these formalisms we make up in order to communicate our conceptual or intuitive understanding, conceptualizing it explicitly, are great for education. They're also great for many other types of reasoning about these systems. But because of our limited cognitive capabilities, they might really not be the right tool to actually predict what's going to happen with a given intervention.

    10. EG

      Yeah. And I think the other point that really resonated in terms of what you mentioned is that if you look at drugs, especially traditionally, we actually didn't understand how most drugs worked until very recently. And so aspirin, we had no idea how it worked when it was taken out of the bark of a willow tree, or whatever, in the 1800s, and it was fine. People were fine taking these things; they had minimal side effects.

    11. JU

      Yeah.

    12. EG

      There are very popular drugs on the market, like metformin, that bind to multiple targets, and we still aren't sure exactly how they work. And so a lot of the emphasis right now in the regulatory pathway for drugs is: oh, you need a mechanism of action, or you need a proven pathway, all these things that create hurdles that don't necessarily help with drug efficacy.

    13. JU

      And some of them might actually also be, in a certain sense, kind of, how should I say?

    14. EG

      It's a waste of time and money. If the thing works, it works.

    15. (laughs)

    16. JU

      Yes. It's a waste of time and money, and it might not even be true. And we have no way of telling.

    17. EG

      Yeah.

    18. JU

      Because in the end, the ground truth is: does it work, and does it actually do more good than harm? And it's empirical. And yeah, maybe that should really be the focus.

    19. EG

      Yeah.

    20. JU

      And everything else should be treated as something that we at least do after we take that first step.

    21. EG

      And in that historical framing, where we don't actually understand many of the things that have been most important in medicine, or we've discovered their mechanisms after the fact, the end-to-end, black-box, deep-learning-pipeline approach seems a little more rational, a little less heretical, though I think at first blush it certainly is controversial.

    22. JU

      Yeah, I mean, the part that one can look at as blasphemous is that now, suddenly, you don't know the theory anymore that you're testing, right? And you might never, because it's not clear to us today, as far as I can tell, that if there is a theory in that black box, we could get it out. There are people trying, and I think it's worth trying; I'm just not super optimistic about it. I think it'll work for some cases, where it's simple enough that we can get it. And I think there are many cases where it just isn't, right? Say climate and weather forecasting: I just don't think we're going to get it. We're going to get it in the sense that we understand the Schrödinger equation and how that could, in theory though intractably, be used to just solve all these things. But that's not practical. And to develop a theory that is both predictive and practical here might just not be something we can put in our heads.

    23. EG

      Yeah. It's kind of interesting, because I actually feel like this, again, is the basis of a lot of traditional drug discovery from way back when, as well as the basis for how you think about genetic screens, right? You do functional screens: you'd mutagenize a bunch of organisms, look for output, and then say, okay, I've identified genes that are part of this pathway or output, and I can map in some ways how they're interacting with each other. But before molecular biology, we actually didn't understand anything from a function perspective; we just understood sequence and output, right? And so it feels like deep learning is really just a throwback to other forms of biology that have been incredibly fruitful, but with a new sort of technology and modality to interrogate these systems.

    24. JU

      Exactly.

    25. EG

      So how do you think about human augmentation in the context of all this stuff? You know, how bullish are you on human augmentation, and what forms do you think it'll take in the near term?

    26. JU

      I'm very bullish on human augmentation in the very long term, but it's not one that I see intuitively... Looking at our brains, even just physically, they seem to be very focused, and this is not surprising, on our IO. And why would there be, somewhere in there, some kind of computational capacity that could still cope if we just boosted our IO by a few orders of magnitude? Why would evolution put that there? I don't know why. And so yes, you could argue, maybe, to do long-term planning tasks and so on and so forth. But sure, let's bound it by a lifetime. So it's just not so clear whether there would've been any evolutionary pressure to really make our capacity there much bigger than, say, some multiplier times our IO capacity.

    27. EG

      If you look at the number of tokens used to train an LLM, and then you look at the number of tokens or words used to train a kid, a human baby or toddler, I mean, a human toddler is probably exposed to what? Hundreds of thousands, maybe millions of words before they can speak fluently.

    28. JU

      But I think that's because we confuse fine-tuning and pre-training. Pre-training is all about evolution.

    29. EG

      Sure.

    30. JU

      And then basically you arrive at this thing that is maybe doing something that's, in a certain sense, completely irrelevant at first, but it has all the capacity in there to then, with a comparatively small amount of data, and maybe it's something in between, be fine-tuned towards something that we would regard as, oh, so cognitively advanced.

  5. 32:49 - 41:41

    The Future of Drug Discovery

    1. JU

      that we're using to generate the data that we need to then train the models is, in a certain sense, at the core of this discipline, if you wish. Because the experiments, or the assays, that we're running use the models that we're training on the data their predecessors actually produced. And so really, if you squint, there was always this dream, and I think it's a pipe dream, of having this cycle between experimentation and something in silico, something running on computers, which then informs the experiments, and then you iterate that cycle. It would be beautiful and simple and nice; I just don't think it's really that easy. And so what you see at Inceptive is actually not that one cycle, although maybe now, somewhere hazily, that cycle exists too, but by design there are tons of little cycles. You start an assay, and the first thing you do is query a neural network; then you do some stuff and get certain readouts. Those, together with some other stuff, feed into yet another model, and that gives you parameters for some instruments. And then you run that instrument on the stuff you've created. And so it's really this kind of giant mess, where the boundary is increasingly blurry. And so we actually think that our work happens on the beach, because that's where the wet and the dry meet in harmony.

    2. SG

      Ah. Huh.

    3. JU

      And so initially, folks join Inceptive, and most of them come from, say, either "side," right? They've spent most of their careers working on deep learning, or maybe robotics, or biology. But ultimately, it doesn't take them that long to start speaking some weird kind of creole of all of these languages, and thinking in these ways too. And what then happens is magic. It's really amazing, because you suddenly find solutions to problems that, say, the biologist they were two years ago just wouldn't even think about. And they work together with folks they would otherwise maybe never even have met, and the results sometimes don't work at all, but sometimes they really are magical.

    4. SG

      That's a really inspiring note to end on. Thanks, Jakob.

    5. JU

      Thank you.

    6. SG

      Find us on Twitter @nopriorspod. Subscribe to our YouTube channel if you wanna see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way, you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.

Episode duration: 35:23
