No Priors

No Priors Ep. 3 | With Stability AI’s Emad Mostaque

AI-generated images have been everywhere over the past year, but one company has fueled an explosive developer ecosystem around large image models: Stability AI. Stability builds open AI tools with a mission to improve humanity. Stability AI is most known for Stable Diffusion, the AI model where a user puts in a natural language prompt and the AI generates images. But they're also engaged in progressing models in natural language, voice, video, and biology. This week on the podcast, Emad Mostaque joins Sarah Guo and Elad Gil to talk about how this barely one-year-old, London-based company has changed the AI landscape, scaling laws, progress in different modalities, frameworks for AI safety and why the future of AI is open.

00:00 - Introduction
02:00 - Emad’s background as one of the largest investors in video games and artificial intelligence
07:24 - Open-source efforts in AI
13:09 - Stability.AI as the only independent multimodal AI company in the world
15:28 - Computational biology, medical information and medical models
23:29 - Pace of Adoption
26:31 - AGI versus intelligence augmentation
31:38 - Stability.AI’s business model
37:44 - AI Safety

Sarah Guo (host), Emad Mostaque (guest), Elad Gil (host)
May 3, 2023 · 45m · Watch on YouTube ↗

EVERY SPOKEN WORD

  1. 0:00-2:00

    Introduction

    1. SG

      (music plays) Emad, welcome to No Priors.

    2. EM

      Thank you for having me on, Sarah. You're loud.

    3. SG

      Let's start with a personal story. You have a background in computer science, and you were working in the hedge fund world. Uh, that's a, a hard left turn, or it looks like it, from, um, that world to being a driving force in the AI state of the art. How did you end up working in this field?

    4. EM

      Uh, yeah, I've always been interested in kind of AI and technology. Um, so on the hedge fund, I was one of the largest investors in video games and artificial intelligence. But then my real interest came when my son was diagnosed with autism, and, uh, I was told there was no cure or treatment. And I was like, "Uh, well, let's try and see what we can do." So I built up a team and did AI-based literature review, this was about 12 years ago, of the existing, um, treatments and papers to try and figure out commonalities, and then did some, uh, kind of, uh, biomolecular pathway analysis of neurotransmitters for drug repurposing, and came down to a few different things that could be causing it. You know, worked with doctors to treat him, and he went to mainstream school, and that was fantastic. Went back to running a hedge fund, won some awards, and then I was like, "Let's try and make the world better." And so the first one was, uh, non-AI enhanced education tablets for refugees and others. Um, that's Imagine Worldwide, my co-founder's charity. And then in 2020, COVID came, and I saw something like autism, a multi-systemic condition, the existing mechanisms that extrapolated the future from the past wouldn't be able to keep up with, and thought, "Could we use AI to make this understandable?" And so I set up an AI initiative with the World Bank, UNESCO and others, to try and understand what caused COVID, um, and try and make that available to everyone. Then I hit the institutional wall (laughs) in a variety of places and realized that, uh, the models and technologies that had evolved were far beyond anything that happened before, and there were some interesting arbitrage

  2. 2:00-7:24

    Emad’s background as one of the largest investors in video games and artificial intelligence

    1. EM

      opportunities, uh, from a business perspective, and more than that, a bit of a moral imperative to make this technology available to everyone, 'cause we're now going to very narrow superhuman performance, and, uh, everyone should have access to that.

    2. SG

      Uh, it's an amazing journey and, uh, congratulations on all the impact you've already had. So as you say, um, or as you imply, the AI field in recent years has been increasingly driven by labs and private companies, and one of the most obvious paths to performance progress is to just make models bigger, right? Scaling data parameters, GPUs, which is very expensive. Um, and then in, in reaction, just to set the stage a little bit, there's been some efforts over the, uh, previous years to be more community-driven and open and build alternatives like Eleuther. How did you start engaging in that, and how did Stability change the game here?

    3. EM

      Yeah. So when I was doing the COVID work, um, you know, we tried to get access to various models. In some cases, the companies blew up. Other cases, we weren't given access, despite it being a high-profile project. And so I started supporting EleutherAI as part of the second wave. Um, so, you know, Stella and Connor and others kind of led it, um, on the language model side. But really, one of my main interests was the image model side. Uh, I have aphantasia, so I can't visualize anything in my brain, which is more common than people would think. In fact, a lot of the developers in this space have that. Like, we've got nothing in our brain.

    4. SG

      You just see words? What do you... What, what's in there?

    5. EM

      Just feelings.

    6. SG

      Okay.

    7. EM

      So like, again, I thought it was a metaphor. Imagine yourself on a beach. I was like, "Okay, I feel a beach." No-

    8. SG

      (laughs)

    9. EM

      ... apparently, you guys have pictures in your heads. It must be like just disconcerting.

    10. SG

      (laughs)

    11. EM

      Um, but then with the arrival of CLIP, released by OpenAI a couple of years ago, um, you could suddenly take generative models and guide them with text prompts. So it was VQ-GAN, which is kind of the slightly mushy, more abstract version first, but I built a model for my daughter while I was recovering, ironically, from COVID, and then she, uh, took the output and sold it as an NFT for three and a half thousand dollars and donated it to India COVID relief, and I was like, "Wow, that's crazy." Uh, so I started supporting the whole space at Eleuther and beyond, giving jobs to the developers, compute for the model creators, funding the various notebooks from Disco Diffusion to these other things. You know, giving grants to people like Midjourney that were kind of kicking this off.

    12. SG

      Just personally?

    13. EM

      Just personally. They were doing all the hard work. And I was like, "Can I capitalize this? 'Cause this is good for society." Uh, then about 15 months ago, I was like, "While these communities are growing, it'd be great if we could create this as a common good." And originally I thought, uh, you got communities, you gotta make them kind of coordinated. Could a DAO work or a DAO of DAOs? And that's how Stability started. After about a week, I realized that was not gonna work. (laughs)

    14. SG

      (laughs)

    15. EM

      And it was incredibly difficult. So then I figured out commercial open source software could be the way, um, to create aligned technology, not just in images, but beyond, that would potentially change the game by making this stuff accessible. Because, as you said, one of the key things, this is in the State of AI Report, um, this is in the AI Index as well, is that most research has been subject to scaling laws and other things. Uh, transformers seem to work for everything. And so it was moving more and more towards private companies. But the power of this technology is double-edged. One is that there are fears about what could go wrong, so it's not released. And the other one is, why not keep it for excess returns, right? Um, so you've had this massive brain drain occurring and no real option. You work in an academic lab, you have a couple of GPUs, or you go and work at big tech slash OpenAI, or you set up your own startup, which, uh, is very, very difficult, as you guys know. So I wanted to create another option, and that's what we did with Eleuther, uh, and Stability and the other communities that we have grown and incubated.

    16. EG

      Could you talk more broadly about, um, why you think it's important for... there to be open source efforts in AI and h- what your view of the world is bes- because I think, um, Stability has really helped create this alternative to a lot of the closed ecosystems, particularly around image gen, protein folding, a variety of different areas, and those are incredibly important efforts. So I'd just love to hear more about your thoughts on, you know, why is this important, how y'all view the participation of the industry over time, and also what you think the world looks like in, you know, five years, 10 years, et cetera, in terms of closed versus open systems.

    17. EM

      So I think there's a fundamental misunderstanding about this technology because it's a very new thing, right? Classical open source is lots of people working together with a bit of direction that's a bit chaotic, but then you've seen Red Hat and other things emerge from this. There aren't many people that train these models, right? Like, we don't invite the whole community and you have 100 people training a model. It's usually 5 to 10, plus a supercomputer and a data team and things like that. And the models when they come out are a new type of programming primitive infrastructure, because you can have a Stable Diffusion that's two gigabytes that deterministically converts a string into an image. That's a bit insane, and that's what's led to the adoption here. You know, on GitHub stars, we've overtaken Ethereum and Bitcoin cumulatively. It took them 10 years. We got there in, like, three, four months. If you look at the whole ecosystem, it's the most popular open source software ever, not just AI. Why? Because again, it is this new

  3. 7:24-13:09

    Open-source efforts in AI

    1. EM

      translation file, and you do the pre-compute, as it were, on these big supercomputers, which means the inference required to create an image is very low, and that's not what people would have expected five years ago, or to create a ChatGPT output. So as infrastructure, I think that's how it should be viewed. And so my take was that what would happen is everyone would be closed because you needed talent, data, and supercompute, and those would be lacking, (laughs) as it were, so it would be the big companies only. They would go four or five years, and then someone would defect and go open source, and it would collapse the market as they would commoditize everyone else's complement. So similar to Google offering free Gmail and all sorts of stuff around their core business. But more than that, I realized that governments and others would need this infrastructure, because if a company has it privately, they will sell to business-to-business and maybe a bit of B2C, but we've seen the Cambrian explosion of people building around this technology. But who's building the Japan model or the India model or others? Well, we are. And then that means that you can tap into infrastructure spending, which is very important because it needs billions, but the reality is, that's actually a small drop in the ocean. Self-driving cars got $100 billion of investment, Web3, hundreds of billions, 5G, trillions. And for me, this is 5G level. So from an ethical, moral perspective, I was like, we've got to make this as equitably available as possible. From a business model perspective, I thought it was a good idea as well, but I thought we were headed here inevitably, so I decided to create Stability to help coordinate and drive this forward in what's hopefully a moral and reasonable way.
Like, you know, the decisions that we make have a lot of input and they're not easy, but we are trying to be kind of Switzerland in the middle of all of this and provide infrastructure that will uplift everyone here.

    2. EG

      What do you think this world looks like in five years or 10 years? Do you think that there's a mix of closed and open source? Do you think the most cutting-edge models, the- the giant language models are gonna be both? Or do you think, like, capital will eventually become such a large obstacle that it'll make, um, the private world more likely to drive progress forward? And I know you have plans in terms of how to offset that, but I'd just love to hear about those.

    3. EM

      The reality is, we have more compute available to us than Microsoft or Google. So I have access to national supercomputers, and I'm helping multiple nations build exascale computers. So to give you an example, we just got a seven million hour grant on Summit, one of the fastest supercomputers in the US. And like I said, we're building exascale computers that are literally the fastest in the world. Private companies don't have access to that infrastructure, uh, because governments, thanks to us, are realizing that this is infrastructure of the future. So we have more compute access. We have more cooperation from the whole of academia than all of them do because their agreements tend to be commercial. There's no way that private enterprise can keep up with us, and our costs are zero as well when you actually consider that. Whereas they have to ramp up tens of billions of dollars of compute. So my take is that foundation models will all be open source for the deep learning phase, because we've actually got multiple phases now. The first stage is deep learning. That's creating of these large models. And we will be the coordinator of the open source. The next stage is the reinforcement learning, the instruct models, FLAN-PaLM or InstructGPT or others. That requires very specified annotation, and that's something that private companies can excel in. The next stage beyond that is fine-tuning. So actually, let's give a practical example. PaLM is a 540 billion parameter model. It achieves about 50% on medical answers, right? FLAN-PaLM is the instructed version of that, and that achieved 70%. Med-PaLM, they took medical information, they fed it in, this is a recent paper from a few weeks ago, achieved 92%, which is human level on the answers. And then the final stage for that is you take this Med-PaLM and you put it into clinical practice with human-in-the-loop. 
For me, the private sector will be focused on the instruct-to-human-in-the-loop area, and the base models will be infrastructure available to everyone on an international generalized and national basis, particularly because when you combine models together, I think that's superior to creating multilingual models. So that's quite a bit there, and I'm sure you want to unpack that.

    4. EG

      Yeah. That's very exciting. Yeah. Could you actually talk about the range of things or efforts that are going on at Stability right now? I know that you've done everything from these foundation models on the lang- on the language side, protein folding, image gen, et cetera. If you- if you could just kind of explain what is the spectrum of stuff that Stability, um, does and supports and works with, and then what are the areas that you're putting the most emphasis behind going forward?

    5. EM

      Yeah. So I think we are the only independent multimodal AI company in the world. So you have amazing research labs like FAIR, uh, Meta and others, and DeepMind doing everything from protein folding to language to image. And there are cross-learnings from all of these. Um, basically we do... yeah, everything from audio, to language, coding, models, any kind of almost private model, we are looking at what the open equivalent looks like. And that's not always a replication, right? So with Stable Diffusion, for example, we optimized it for a 24-gigabyte-VRAM GPU. Now, as of the release of Distilled Stable Diffusion, it will run in a couple of seconds on an iPhone and we have Neural Engine access. Because our view of the future is creating models that aren't necessarily bigger, but that are customizable and editable. So this is a bit of a different emphasis, and we think that's a superior thing for scale- than scaling. I think things like the Chinchilla paper, that's the 70 billion parameter model that's as performant as GPT-3 at 175 billion, are important in that 'cause it said that training on more data is important. And actually when you dig into it, it

  4. 13:09-15:28

    Stability.AI as the only independent multimodal AI company in the world

    1. EM

      actually said data quality is important 'cause now we're seeing that the first stage, the DL stage as it were, the deep learning stage is (laughs) let's use all the tokens on the internet, (laughs) you know? But maybe we can use better tokens. And that's what we see when we instruct and use reinforcement learning with human feedback. And we've also been releasing technology around that. So from our CarperAI lab, we released our instruct framework, trlX, that allows you to instruct these big models to be more human. The way I kind of put it is that the, our focus is thinking what are the foundation models that will advance humanity, be it commercial or not? What needs to be there and what's very susceptible to this transformer-based architecture that takes about 80% of all research in the space? Making that compute and knowledge and understanding of how to build these models available to academia, independent research, and our own researchers. And then from a business perspective, really focusing on where are our edges, and our edges are in two areas. One is media, and so this is why image models, video models, and audio models have been a focus, 3D soon as well. And the other area is private and regulated data, because what's the probability that a GPT-3 model weight or a PaLM model weight will be put on prem? It's very low, versus an open model, it's very high and there's a lot more valuable private data than there is public data. So it is a bit of everything, but like I said, there are certain focuses on the business side, on media, and then I think on a breakthrough side, computational biology will be the biggest one.

    2. EG

      That's really cool. And on the computational biology side, I guess there's a few different areas. There's things like protein folding, and then to your point, there's things like Med-PaLM. Are- are you thinking of playing a role in both of those types of models in terms of both the medical information?

    3. EM

      Yes. We will release an open Med-PaLM model, well, Med stable GPT. Um, and then protein folding, we are the, one of the key drivers of OpenFold right now. So we just released a paper on that, um, much faster ablations than AlphaFold. We're doing as well, um, DNA diffusion, uh, for predicting the outcome of DNA sequences. We have BioLM around c- taking language models for chemical reactions, and that's an area that we will aggressively build because there's a lot of demand from the computational biology side for some level of standardization there. There have been initiatives like MELLODDY and others looking at

  5. 15:28-23:29

    Computational biology, medical information and medical models

    1. EM

      federated learning, but there is a misalignment of incentives in that space that I think we could come in and fix. And I think that's where we really view ourselves. How can you really align incentive structures and create a foundational element that brings people together? And I think that's where we are most valuable, 'cause private sector can't do it that well, public sector can't do it that well. A mission-oriented private company that has this broad base and all these areas could potentially.

    2. EG

      Yeah, I think also the- the global nature of your focus is really exciting because when I look at things like medical information or medical models, um, you know, ultimately the big vision there, which a number of people have talked about for decades (laughs) at this point is that you'd have a machine that would allow you to have very high access to care and medical information no matter where you are in the world. And especially since you can take images with your phone and then interpret them with different types of models and then have like an output, you know, you should, if you have a cardiac issue, you should have care equivalent to the world's best cardiologist from Stanford or you know, you name the center of excellence available to anybody in the world, whether they're, um, rich, poor, developing country or not, et cetera. And so, you know, it's very compelling to see this big wave of technology and sort of the things that may be able to enable including some of the things that you mentioned around AI and medicine, so...

    3. EM

      I think it's very interesting as well because this technology is being adopted so fast. I mean, let's face it, Microsoft and Google, $2 trillion companies have made it core of their strategy, which is crazy insane for a technology that's basically five years old, let's say two years old really breaking through because it can adapt to existing infrastructure, (laughs) you know? Like it sits there and it absorbs knowledge when you fine tune it. But then my thing is I look to the future and I'm like that best doctor, which bits of that should be infrastructure for everyone and which bits of that should be private? And so that's how I kind of orient my business. I look to the future, I come back and I think what should be public infrastructure and how can I help build that and coordinate that? And that's valuable. And then everything else other people can build around.

    4. EG

      How do you think about the traditional pushback that's existed in the medical world around some of these technologies? So for example, you know, the first time an expert system or a computer could actually outperform, um, Stanford University physicians at predicting infectious disease was in the 1970s with this MYCIN project where they literally trained an expert system or designed an expert system to be able to predict an infectious disease. But here we are almost 50 years later with none of that technology adopted. And so do you think it's just we have to do a lot of human in the loop things and it's a doctor's assistant and that'll be good enough? Do you think it's just a sea change, there aren't enough physicians? Like what's the, what do you think is the driver for the technological adoption in something so important today?

    5. EM

      So I think the infrastructural barriers are huge for adoption of technologies, particularly when it's the private sector. Um, I think there is a new space of open source technology adoption that could be very interesting and a willingness now that people kind of understand this, which wasn't there even 10 years ago... you know, the nature of open source. Now it runs the world's servers and databases, and I think there's another level of open source which is open source complex systems as it were. Um, previously in other discussions, I've talked about our education work. So right now, we're deploying four million tablets to every child in Malawi. By next year, we'll have hundreds of millions of kids hopefully that we deploy to. It's not just education, it's healthcare. And it's working with the governments, it's working with multilaterals to say, "Can we build a healthcare system from the bottom up that can do all of these things without an existing infrastructure?" 'Cause they don't have an existing infrastructure, it's one doctor per 1,000 kids, 10,000 kids, one teacher per 400 kids. I am certain that system will outperform anything in the West within five years, which is crazy to say. But then our Western systems can then take bits of that and adapt to it because I think this competitive pressure is required, because Western systems are very hard to change. And in the UK, we've done that with HDR UK, the genomic banks and others, and that was a massive uphill (laughs) battle, as you know, to get these technologies adopted because I mean, it should. There should be barriers to adoption of this technology when it comes to things as important as healthcare. But at the same time, I think now is the time to open it up.

    6. SG

      Yeah, I think there is an interesting loose analogy to different pace of adoption of different technologies in, in different geos in the past, right? So one that comes to mind is, um, today, I think it's very commonplace amongst, uh, consumer internet investors to look at what's happened with mobile in East Asia, um, as a precursor to interactions that might happen here. And, you know, mobile technology advanced much more rapidly in China, Korea, many other places. One, because of private partner, uh, private-public partnership and, and two, because, you know, there were, um, there was more, I guess, greenfield in terms of access to information and different infrastructure that supported mobile as the primary communication medium, and I, I, I could certainly see that happening with some AI, uh, native products.

    7. EM

      I think that's an excellent point. I agree 100%. I think just as they leapfrogged to mobile, a lot of the emerging markets, Asia in particular, will leapfrog to generative AI or personalized AI. And I can see this because I'm having discussions with the governments right now. (laughs) Um, like, what is the reaction? Over the Christmas holiday, I was getting a few hours of sleep finally. I got, like, six calls from headmasters of UK schools saying, "Emad, what is our generative AI strategy?" And I was like, "Your what?" And they were like, "All our kids are using ChatGPT to do their homework." (laughs) And so it's kind of one of the first little moments, an amazing interface that OpenAI built, it's going mainstream, and I was like, "Well, get good, you know, stop assigning essays."

    8. SG

      (laughs)

    9. EM

      So now in some of the top private schools in the UK, they actually have to write the essays during the lessons without computers, which I think is wrong 'cause my discussions in an Asian context, for example, with certain leading governments that are about to put tens of billions into this space, they're embracing the technology and they're like, "How can we have our own versions of this? And how can we implement this to help our students get even better," right? Because also it's very... Even though there might be bureaucracy in some of these nations, if they wanna get something done, they get it done. And this technology is very different in that the costs are not continuous like a 5G network, uh, like the CapEx profile and other things are very different. Like, you know, you can say it costs $10 million to train a GPT, it doesn't cost that much anymore. That's really valuable if you can have a ChatGPT for everyone, like the ROIs are huge. So yeah, I do think that a lot of these nations, like the African context is one that we're driving forward with education as a core piece, and right now we're teaching kids with the most basic AI in the world literacy and numeracy in 13 months on one hour a day in refugee camps. That's insane. That's already better, it's gonna get even better. But I think Asia in particular, they're going to go directly to this technology and embrace it fully. And then we have to have a question, if you're not embracing this in the West, in America and the UK, you're gonna fall behind because ultimately this can translate between structured and unstructured data quicker than anything.

    10. SG

      I'd like to see, uh, what, uh, you know, pace of adoption we can have in the United States of this technology, uh, as well, but, um, but I, I can see, uh, the, the prediction coming true. If we just go back to the most advanced, like, mature use case, uh, within Stability and s- and as you said, media as an advantage, what does the future of media look like? And actually, if- even if we go back be- before that, you know, you're involved in, um, sort of early ecosystem efforts, uh, with Eleuther and such, how did you even identify that this was an area of interest for you versus everything else going on across modalities?

    11. EM

      So, you know, I've always been interested in meaning, like, uh, semantic is even part of my email address and, uh, that's my religious studies as well around epistemology and ethics ironically. Um, the way that I viewed it is that the easiest way for us to communicate is what we're doing right now, via words, right? And that's

  6. 23:29-26:31

    Pace of Adoption

    1. EM

      held constant, but now we can communicate via phones and podcasts or whatever, and it's nice. Writing was more difficult and the Gutenberg Press made it easier, but visual communication is incredibly difficult, be it a PowerPoint, which is visual communication, or art, which is visual communication, and then you have video and things like that which are just impossible. Now you have TikToks and others making it easier. And I saw this technology and I was like, "If the pace of acceleration continues, visual communication becomes easy for everyone." Like my mum sending me memes every day telling me to call more or kind of whatever.

    2. SG

      (laughs)

    3. EM

      And I'm like, "That's amazing because that creation will make humanity happier." Like you see art therapy, that's visual communication and it's the most effective form of therapy. What if you could give that to everyone? So there was that aspect to it, but then I saw movie creation and things like that. So my first job was actually organizing the British Independent Film Awards and being a reviewer for the Raindance Film Festival. So, uh, you know, every year I put a movie on for my birthday and we give the proceeds to charity, get to see my favorite movie with my friends, it's really cool. And, uh, then I was the biggest video game investor in the world at one point. So, these types of communication and interaction are really interesting. And I thought that people really misunderstood the metaverse, UGC, and the nature of what could happen if anyone could create anything instantly. It's not gonna be a world for everyone, or a world that everyone visits. It's gonna be everyone sharing their own worlds and seeing the richness of humanity. And again, I thought that was an amazing ethical/moral imperative for making humanity better, but also an amazing business opportunity. Because the nature and way that we create media will transform as a result of this technology, and we're seeing it right now. We have amazing apps like Descript, right, where you could take this podcast and you can edit it with your words live. You know? You have amazing kind of gaming things come out where you create assets and instances, or you know, some of this new 3D NeRF technology where you can reshoot stuff. We are working with multiple movie studios at the moment who are saving millions of dollars just implementing Stable Diffusion by itself, let alone these other technologies.
And that was, for me, tremendously exciting to allow anyone, not to be creative, 'cause people are creative, but to access creativity and then allow the creatives to be even more creative and tell even better stories.

    4. SG

      So, uh, I believe Sam from OpenAI said they don't think image generation is kind of like core on the path to AGI. Um, it's obviously really important to, uh, you personally and to Stability. H- uh, tell us about your stance on AGI and if that's part of the Stability mission.

    5. EM

      Yeah, I don't care about AGI except for it not killing us. I mean, like, they can care about it. Um, my thing, what I care about is intelligence augmentation. You know, this is the classic kind of, uh, Memex type of thing. How can we make humans better? Like, our mission is to build the foundation to activate humanity's potential. Um, so look, AGI is fine. Um, again, we have to have some things around that. I do believe that they are incorrect around multimodality being (laughs) or images being a core component of

  7. 26:31–31:38

    AGI versus intelligence augmentation

    1. EM

      that. Um, but like, I think there are two paradigms here. One is stack more layers, and I'm sure GPT-4 and PaLM and all these things will be amazing, stacking more layers and having better data as well. But like, one of the things we saw, for example, with Stable Diffusion, we were kind of... We put it out together, and then people trained hundreds of different models. When you combine those models, it learns all sorts of features like perfect faces and perfect fingers and other things. And this kind of is related to the work that DeepMind did with Gato and others that show that autoregression of these models and their latent spaces becomes really, really interesting. So, what if the route to AGI is not one big model to rule them all, trained on the whole internet and then narrowed down to human preferences, but instead millions of models that reflect the diversity of humanity that are then brought together? I think that is an interesting way to kind of look at it, 'cause that will also be more likely to be a human-aligned AGI rather than trying to make this massive elder god of weirdness (laughs) bow to your will, you know? Uh, which is what it feels like at the moment.

    2. SG

      So we're gonna have a hive elder god instead. You've, you've mentioned, uh, that Stability is still working on language. Uh, the application of diffusion models to images is a really unique breakthrough, and it's not as computationally intensive as, like, the known approaches to language so far. I think you've said that the core training run for the original Stable Diffusion was 150,000 A100 hours, which is like not that huge in the grander scheme of things. What can you tell us about your approach to language?

    3. EM

      Um, so yeah, so we had the kind of EleutherAI, um, side of things and our team there. You know, we released GPT-Neo, GPT-J and GPT-NeoX, which have been downloaded 20 million times. They're the most popular language models in the world. Um, you kind of basically either use GPT-3 or use those. They go up to about 20 billion parameters. And like I said, we've released our trlX from the CarperAI lab, which is the instruct framework. We're training, you know, multiple models in the up to 100 billion parameter range now. I don't think you need more, uh, Chinchilla-optimal, to enable an open ChatGPT equivalent, you know, enable an open Claude equivalent. I think that will be an amazing foundation from which to train sector-specific and other models that then again can be autoregressed. And there will be very interesting things around that. Language, it requires more, um, not necessarily because of the approach and diffusion breakthroughs. Like, uh, recently Google had their new paper where they showed a transformer actually can replace the VAE. (laughs) Um, so you don't necessarily need diffusion for great images. Um, it's more because language is semantically dense, I think, versus images. And there's a lot more accuracy that's required for these things. Um, that, I think there are various breakthroughs that can occur. Like, we have an attention-free transformer model basis in our RWKV that we've been funding. We've got a 14 billion parameter version of that coming out that is showing amazing kind of progress. But I think that the way to kind of look at this is we haven't gone through the optimization cycle of language yet. So OpenAI, again, amazing work they do. They announced InstructGPT; their 1.3 billion parameter version outperformed the 175 billion parameter GPT-3. You look at, um, kind of Flan-T5, the instruct version of, uh, the T5-XXL model from Google; the three billion parameter version outperforms GPT at 175 billion parameters in certain cases, you know?
These are very interesting results, and it's one of those things that as these things get released, it gets optimized. So like with Stable Diffusion, leave aside the architecture. Day one, 5.6 seconds for an image generation using an A100, now 0.9 seconds. With the additional breakthroughs that are coming through, it'll be 25 images a second. That's a speed-up of over 100 times just from getting it out there and people interested in doing that. I think language models will be similar, and I don't think that you need to have ridiculous scale when you can understand how humans interact with the models and when you can learn from the collective of humanity. So like I said, a very different approach of... small language models (laughs) or medium ones, versus let's train a trillion parameters or whatever. And I think there will be room for both. I think it will be, use these amazingly packaged services from Microsoft and Google if you just want something out of the box, or if you need something trained on your own data with privacy and things like that, that may not be as good but may be better for you, use an open source base and work with our partners at SageMaker or whoever else, you know?
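The "Chinchilla-optimal" budget Emad invokes can be sketched as a back-of-envelope calculation. The 20-tokens-per-parameter ratio below is the rough heuristic popularized by DeepMind's Chinchilla work, assumed here purely as an editor's illustration:

```python
# Rough Chinchilla heuristic: compute-optimal training pairs roughly
# 20 training tokens with each model parameter. The exact ratio is an
# approximation, not an exact figure from this conversation.
TOKENS_PER_PARAM = 20

def chinchilla_optimal_tokens(n_params: float) -> float:
    """Approximate compute-optimal training-token count for a model size."""
    return n_params * TOKENS_PER_PARAM

# The 100-billion-parameter upper end Emad mentions:
tokens = chinchilla_optimal_tokens(100e9)
print(f"{tokens / 1e12:.0f} trillion tokens")  # → 2 trillion tokens
```

On this heuristic, a 20-billion-parameter model like GPT-NeoX would want roughly 400 billion tokens, which is why "more parameters" alone is not the lever.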

    4. EG

      Can you talk more about that in the context of your business model and, and your approach? You mentioned that you think some of the areas that Stability, uh, will be focused on are media and then proprietary and regulated data sets. If there's things you can share right now in that area, if not, no worries, but it'd be interesting to learn more about, you know, how you view the business evolving.

    5. EM

      Sure. So, like, now we're training on hundreds and soon thousands of Bollywood movies to create Bollywood video models (laughs) with our partnership with Eros, um, you know, and that is exclusively licensed. We'll have audio models coming as well, uh, so you have

  8. 31:38–37:44

    Stability.AI’s business model

    1. EM

      an A.R. Rahman model or whatever. Um, you know, we're talking to various other entities as well, and this is why we have the partnership with, um, Amazon and SageMaker, so there'll be additional services that can train models on behalf of most people. Our focus is on the big models for kind of nations, the big models for the largest corporates who will need to train their own models one day, and that's really difficult. There's only like 100 people who can train models in the world. Like, it's not really a science, it's more an art. Like, uh, losses explode all over the place when you try to do something. And so we're going to make it easy for them, and we're going to be inside the data centers training their own models that they control, and our open source models then become the benchmark models for everyone. Like, again, we have access to the neural engine, dedicated teams at Intel and others kind of working on optimizing these. That is the model. The framework and the open model is optimized, and then we take and create private models, and, again, I think that's complementary to the APIs and other things you will see from Microsoft, Google, et cetera, because, yeah, you would want both.

    2. EG

      Yeah. Some of the other areas that you've talked about, I think, in interesting ways is about how AI can be used to make our democracy more direct and digital, a little bit more about, um, you know, broader global impact. Could you, could you extrapolate a bit more there?

    3. EM

      Yeah. So I think, you know, if y- you look- you have to look at intelligence augmentation, right? Like, information theory in the classical Shannon sense, information is valuable inasmuch as it changes the state, and we've obviously seen political information become more and more influenced by, like, manipulation of stories and things like that, so the divide is being grown. What if we could create an AI that could translate between various things, make things easier to understand, and make people more informed? I think that would be ideal with some of these national and public models and interfaces being provided to people, and then that can be very positive for democracy and allowing people to really understand the needs. Like, you can already with ChatGPT, when you train it on the nature of yourself, it can summarize for you a perspective. You know? That's an amazing thing, right? You can tell it to talk like a five-year-old or a six-year-old or an eight-year-old or a 10-year-old. Once it starts understanding Sarah and Elad, that will be even better. And again, you don't need open source to do that. The OpenAI embeddings API is fantastic. But I think there'll be more and more of these services that allow there to be that filter layer between us and this mass of information on the internet. That will be amazing. I think if we build the education systems and other things correctly as well, this Young Lady's Illustrated Primer that we're gonna give to all the kids in Africa and beyond, like, again, let's really blue sky think. How can we get people engaged with their communities and societies? Because it will be a full open source stack, not only education and healthcare but beyond. That's super exciting. I think, again, that's the future of how we come together, because you want to come together to form a human colossus, like in the Wait But Why style, where you get shit done. Pardon my language.
And I think this is one of the best ways for us to do that, leveraging these technologies.

    4. SG

      It's okay. We don't have commercial sponsors.

    5. EM

      (laughs)

    6. EG

      (laughs) There's actually a book called Lady of Mazes, an AGI-centric book from, like, 10 years ago, and basically the idea is sort of what you mentioned, where, uh, different AGIs gain models of how a subset of the population thinks about certain issues and substantiate into a virtual person who's basically representing them in some House of Representatives equivalent, so you don't actually have to vote. The AGI just kind of synthesizes group opinions and then turns it into representatives.

    7. EM

      Yeah, and you have to think about, you know, with the advances like, uh, Meta's amazing work on Cicero, for example, you know, beating humans at Diplomacy. They used eight different language models combined. And again, I think this is the future, not just zero shot. Multiple models interacting with each other is the way, full stop. Like, any type of... The, the issue, from a mechanism-design perspective, with kind of the game theory of our current economy is that there is no central organizing factor that we trust. Like, what is the trust in Congress? Like, I think they trust Congress less than cockroaches. No offense to Congress. Please don't bring me up. Like, it's just a poll, right? Um, people will err towards trusting machines, as it were, and machines are capable of load balancing. Now they're capable of load balancing facts and things, and so we have to be super careful as we integrate these things, what that looks like, because they will make more and more decisions for us. Um, that could be for our benefit. You know, like I said, as you said, having AI that speaks on our behalf and amalgamates. But then we need to make sure that these aren't too frail and fragile as we cede more and more of our own personal authority to them, because they are optimizing. This is also one of the dangers on the alignment side. Like, you know, as we introduce RLHF into some of these large models, there are very weird instances of mode collapse and out-of-sample things. Um, I do say these large models as well should be viewed as fiction, creative models, not fact models, because otherwise we've created the most efficient compression in the world. Does it make sense you can take terabytes of data and compress it down to a few gigabytes with no loss? No, of course you lose something. You lose the factualness of them, but you keep the principle-based analysis of them.
So we have to be very clear about what these models can and can't do, because I think we will cede more and more of our authority individually and as a society to the coordinators of these models.

    8. EG

      Could you talk more about that in the context of safety? Because ultimately one of the concerns that's sort of increased in the AI community is AI safety and there's sort of three or four components of that. There's alignment, you know, will bots kill humans or h- whatever form you wanna put it in. There's um-

    9. SG

      They'll farm us, not kill us, but-

    10. EM

      Yes.

    11. EG

      Farm us, yes, good point.

    12. EM

      Yes.

    13. EG

      They'll- they'll just build a giant RLHF farm on top of us or something. Um, there's the concern around certain types of content, pedophilia, et cetera, that, you know, um, people don't wanna have exist in society for all sorts of positive reasons. (laughs) Uh, there's politics. You know, there, there's concerns, for example, that, um, AI may become the next big battleground after social media in terms of political viewpoints being represented in these models with the claims that they're not political viewpoints. And so I'm sort of curious how you think about AI safety more broadly, particularly, eh, when you talk about trust of models, to your point part of it is fact versus fiction, but part of it may also be well, it looks like it's political and so therefore maybe I can't trust it at all.

    14. EM

      Yeah, I don't think

  9. 37:44–45:02

    AI Safety

    1. EM

      technology is neutral, so I'm not one of the people that adhere to that, especially with the way we build it. It does reflect the biases and other things that we have in there. I did kind of follow the open source thing 'cause I think we can adjust that. On the alignment side, you know, it was interesting, EleutherAI basically split into two. Part of it is Stability and the people who work here on capabilities. The other part is Conjecture, that does specific work on alignment, um, and they're also based here in London. And I think it's not easy, right? (laughs) I think that everyone is ramping up at the same time and we don't really understand how this technology works, but we're doing our best, you know? You have people like kind of Riley Goodside and other prompt whisperers who are like, "Wait, like, what on earth? It can do all these kind of things." Um, I think that there needs to be more formalized work and I actually think there needs to be regulation around this, um, because we are dealing with an unknown unknown. And I don't think we're doing good enough, kind of, tying things together, particularly as we stack more layers and we get bigger and bigger and bigger. I think small models are less dangerous, but then the combination of them may not be safe.

    2. SG

      You've mentioned before, like, this, um, support for the idea of regulation of large models. What would be a productive outcome of that regulation, that you can imagine?

    3. EM

      I think that a productive outcome of that regulation is anything above a certain level of FLOPs needs to be registered, similar to, well, bioweapons and things that have the potential for dual use. And I think there needs to be a dedicated international team of people who can actually understand and put in place the regulations on how do we test these models for things. Like, you know, the amazing work Anthropic recently did with constitutional models and other things like that. We need to start pooling this knowledge as opposed to keeping it secret, but there is this game theoretic thing of, one of the optimal ways to stop AGI happening is to build your own AGI first. (laughs) And so I'm not sure if that'll ever happen, but we're in a bit of a bind right now, which means that everyone's having their own arms race. When governments decide, and I don't believe they've decided yet, that having an AGI is the number one thing, tens of billions, hundreds of billions will go into building bigger models. And again, this is very dangerous, I think, from a variety of different perspectives. So I prefer multilateral action right now as opposed to in the future. So I've- I've put that out there but I can't really drive it. I'm already dying from all the other work as it is, but I do believe that should be the case. Um, I think going onto kind of the next one, as you said the political biases and things like that, we can use this as filters in various ways. Um, and I think one of the interesting things, and the other thing I've called for regulation of, uh, maybe I should do it a bit more loudly, is you have a lot of companies that have ads as one of their key elements, right? And ads are basically manipulation, and these models are really convincing. They can write really great prose. Uh, my sister-in-law created a company, Sonantic; they can do human, realistic, emotional voices.
She did, like, Val Kilmer's voice for his documentary and stuff like that before selling to Spotify. It's gonna be crazy the types of ads that you see and we need to have regulation about those soon because you're gonna see Meta and Google and others trying to optimize for engagement and, like, manipulation, fundamentally. And I think that those can then be co-opted by various other parties as well on the political spectrum so we need to start building some sort of protections around that. Um, what was the final one, sorry, Elad?

    4. EG

      Oh, uh, I was just asking about, uh... And I think Sarah asked the question around, um, you know, where do you think regulations should be applied and what would be positive outcomes of that versus negative outcomes?

    5. EM

      Yeah, so I think, you know, there should be, um, these elements around identification of AI, especially in advertising. I think that there should be regulation on very large models, in particular. Um, the European Union introducing a CE mark, uh, generative AI restrictions where the creators are responsible (laughs) for the outputs, I think, is the wrong way, but there are other ones as well. Like, I would call for opt-out mechanisms, and I think we're the only ones building those for data sets, um, 'cause we're also building some of these data sets and trying to figure out attribution mechanisms for opt-in as well on the other side. Like, right now, the only thing that is really kind of checked is robots.txt, which is a kind of thing on scraping, but I think, again, it's evolving so fast that people might be okay with scraping but they might not be okay with this. Legally it's fine, but then I think we should make this more and more inclusive as things go forward.
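The robots.txt convention Emad describes as the only opt-out signal scrapers commonly honor can be checked with Python's standard library. The rules and URLs below are illustrative only, not from the conversation:

```python
# Minimal sketch of a robots.txt opt-out check using the standard
# library's urllib.robotparser. A crawler respecting this convention
# would skip paths the site's rules disallow.
from urllib import robotparser

rp = robotparser.RobotFileParser()
# Parse an example rule set (normally fetched from a site's /robots.txt).
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

print(rp.can_fetch("*", "https://example.com/images/cat.png"))   # → True
print(rp.can_fetch("*", "https://example.com/private/cat.png"))  # → False
```

As he notes, this is a per-site, per-path signal: it cannot express "don't train on my image wherever it appears," which is why richer opt-out and attribution mechanisms are being discussed.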

    6. EG

      So that's, for example, if an artist doesn't want their work represented in a corpus that a machine is trained on, for example?

    7. EM

      Yes, and it's difficult. It isn't just a case of, you know, don't look at DeviantArt on my website. Like, what if your picture is on the BBC or CNN with a label? It will pick that up, you know? So it's a lot harder. This is why, like, we trained our own open CLIP model. We have the new CLIP-G landing this week. That's even better on zero-shot, I think 80%. Um, because we needed to know what data was on the generative and the guidance side so that we could start offering opt-out and opt-in and these other things.

    8. EG

      Yeah. And then, I guess, uh, one other area that people often talk about safety is more around defense applications and the ethics of using some of these models in the context of, uh, defense or offense from a national perspective. What's your view on that?

    9. EM

      I think the bad guys, you know, in quotes, uh, have access to these models already, and thousands and thousands of A100s. I think you have to start building defense, but it's a very difficult one. Like, uh, we were gonna do a $200,000 deepfake detector prize, but then it was pointed out quite reasonably that if you create a prize for a detector, that creates, well, a balancing (laughs) effect, where you have a generator and a detector and they bounce off each other and you just get better and better and better. So now we're trying to rethink that. Maybe we'll offer a prize for the best suggestion of how to kind of do this. Similar to, you know, ChatGPT is detectable, but not really. Um, so I think the defense implications of this are largely around kind of misinformation, disinformation. This is an area that I have advised multiple governments on with my work on counter-extremism and others. It's a very difficult one to unpick, but I think one of the key things here is having attribution-based mechanisms and other things for curation. 'Cause our networks are curated. And so this is where we've teamed up with, like, Adobe on content authenticity, contentauthenticity.org, and others. And I think that metadata element is probably the winning one here. But we have to standardize as quickly as possible around trusted sources. And I think people already don't believe what they see though, (laughs) um, which is a good thing and a bad thing. We wanna have those trusted coordinators around this. Uh, beyond that, in some of the more severe kind of things, around drones and slaughterbots and things like that, I, uh, I don't know how to stop that, unfortunately. Um, and I think that's a very complicated thing, but we need an international compact on that, because again, this technology is incredibly dangerous when used in those areas. And I don't think there's been enough discussion at the highest levels on this, given the pace of adoption right now.

    10. SG

      I think that's all we have time for today, so one last important question for you. What controversial prediction, uh, you seem like an optimist, but, uh, good or bad about AI do you have over the next five years?

    11. EM

      Um, I think that small models will outperform large models massively, like I said, the hive model aspect. And you will see ChatGPT-level models running on the edge on smartphones in five years, which'll be crazy. (laughs)

    12. EG

      Great. Thanks so much for joining us. Amazing conversation as usual.

    13. EM

      My pleasure. (instrumental music plays)

Episode duration: 45:02


Transcript of episode C6Zpjij5AF4
