No Priors

No Priors Ep. 7 | With Stanford Professor Dr. Percy Liang

When AI research is evolving at warp speed and takes significant capital and compute power, what is the role of academia? Dr. Percy Liang, Stanford computer science professor and director of the Stanford Center for Research on Foundation Models, talks about training costs, distributed infrastructure, model evaluation, alignment, and societal impact. Sarah Guo and Elad Gil join Percy at his office to discuss the evolution of research in NLP, why AI developers should aim for superhuman levels of performance, the goals of the Center for Research on Foundation Models, and Together, a decentralized cloud for artificial intelligence.

00:00 - Introduction
01:44 - How Percy got into machine learning research and started the Center for Research on Foundation Models at Stanford
07:23 - The role of academia and academia's competitive advantages
13:30 - Research on natural language processing and computational semantics
27:20 - Smaller scale architectures that are competitive with transformers
35:08 - HELM (Holistic Evaluation of Language Models), a project whose goal is to evaluate language models
42:13 - Together, a decentralized cloud for artificial intelligence

Sarah Guo (host), Dr. Percy Liang (guest), Elad Gil (host)
Apr 25, 2023 · 53m · Watch on YouTube ↗

EVERY SPOKEN WORD

  1. 0:00–1:44

    Introduction

    1. SG

      (music plays) Thanks, Percy.

    2. PL

      Great, welcome.

    3. SG

      Um, so I think just to start, can you tell us a little bit about how you got into, uh, the machine learning research field and your personal background?

    4. PL

      Yeah. So I've been in the field of machine learning and natural language processing for over 20 years. I started getting into it in undergrad. I was an undergrad at MIT. I liked theory. I had a fascination with languages. I was fascinated by how humans could just, um, be exposed to just strings of, uh, text, uh, I mean, uh, speech, and somehow acquire very sophisticated understanding of the world and also syntax, and learn that in a fairly, uh, unsupervised way. And my dream was to get computers to do the, the same, so then I went to grad school, uh, at Berkeley, and then after that started at, uh, Stanford. And ever since, I've been in pursuit of, uh, you know, developing, uh, systems that can really truly understand natural language. Um, and of course, in the last four years, um, this once upon a time kind of dream has really kind of taken off in a s- in a sense. Um, maybe in a, not a way that I would necessarily ex- expect, uh, but with the coming out of, uh, large language models such as GPT-3, it's truly kind of astonishing how much of the structure of language and the world that these models can, can capture. Um, in some ways, it kind of hearkens back when I actually first started in NLP. I was, uh, training language models, but of a very different type. It was based on, uh, Hidden Markov Models. And there the goal was to discover

  2. 1:44–7:23

    How Percy got into machine learning research and started the Center for Research on Foundation Models at Stanford

    1. PL

      hidden structure in, in text, and we were... I was very excited by the fact that it could learn about, um, tease apart what words were like city names versus days of the week and so on. Um, but now it's kind of on a d- completely different l- level.
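The HMM language modeling described above can be made concrete with a toy. This sketch is generic, not Percy's actual setup: two hidden states stand in for unobserved word classes ("CITY" vs. "DAY"), every probability is invented, and the forward algorithm scores a word sequence by summing over hidden paths.

```python
# Toy hidden Markov model: hidden states are unobserved word classes,
# and the forward algorithm computes the total probability of a word
# sequence by summing over all hidden state paths. All numbers invented.

states = ["CITY", "DAY"]
start = {"CITY": 0.5, "DAY": 0.5}
trans = {"CITY": {"CITY": 0.3, "DAY": 0.7},
         "DAY":  {"CITY": 0.7, "DAY": 0.3}}
emit  = {"CITY": {"paris": 0.5, "london": 0.5},
         "DAY":  {"monday": 0.5, "friday": 0.5}}

def forward(words):
    """P(words) summed over all hidden state sequences."""
    alpha = {s: start[s] * emit[s].get(words[0], 0.0) for s in states}
    for w in words[1:]:
        alpha = {s: sum(alpha[p] * trans[p][s] for p in states)
                    * emit[s].get(w, 0.0)
                 for s in states}
    return sum(alpha.values())

# With these hand-picked transitions, city-then-day is likelier than
# city-then-city, i.e. the hidden classes capture word co-occurrence.
print(forward(["paris", "monday"]))  # 0.0875
print(forward(["paris", "london"]))  # 0.0375
```

Training such a model from raw text with Baum-Welch, rather than hand-picking the tables, is what lets word classes like city names and days of the week emerge unsupervised.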

    2. SG

      Was there a moment, since y- you know, you've worked on multiple generations of NLP at this point, you know, pushing the forefront of semantic parsing, was there a moment at which you, um, decided that, you know, you were gonna focus on foundation models and large language models?

    3. PL

      Yeah. There was a very decisive moment, and that moment was when GPT-3 came out.

    4. SG

      Okay.

    5. PL

      That was in the middle of the pandemic. Um, and it wasn't so much the capabilities of the model that, um, shocked me, but it was the way that the model was trained, which was basically taking a massive amount of text and asking the model to predict the next word over and over again, you know, billions of times. And just that simple, uh, objective and a very simple principle, what r- rose from it was not only a model that could generate fluent text, but also a model that could do in-context learning, which means that you can prompt a language model with instructions, uh, for example, summarize this document, give it some examples, and have the model on the fly in context figure out what the task was. And this was a paradigm shift in, in my opinion because it changed the way that we conceptualize machine learning and NLP systems from these bespoke systems where you're, it's trained to do question answering, to train to do this, to just a general, um, substrate where you can ask the model to do various things. And then the idea of a task which is so central to AI I think s- begins to dissolve, and I find that extremely exciting. Um, and that's the reason later, um, in 2021, uh, we founded the Center for Research on Foundation Models. We coined the term foundation models because we thought it, there was something that was happening in the world that was, that somehow large language models didn't really capture the significance. And it was not just about language, it was about images and multimodality, it was a more general phenomenon, and we coined the term foundation models and then, um, then the center started and it's been sort of, you know, a kind of a roller coaster ride, uh, ever since.
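The in-context learning paradigm described here boils down to putting labeled examples in the prompt rather than in a training set. A minimal sketch of the prompt shape (the instruction, examples, and labels are illustrative, and no model is actually called):

```python
# Sketch of in-context learning as a prompt format: the "training
# examples" live in the prompt itself, and the model (not called here)
# is expected to infer the task on the fly from a handful of examples.

def build_prompt(instruction, examples, query):
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Input: {text}\nOutput: {label}")
    lines.append(f"Input: {query}\nOutput:")
    return "\n".join(lines)

prompt = build_prompt(
    "Classify the sentiment of each input as positive or negative.",
    [("I loved this movie!", "positive"),
     ("Utterly boring.", "negative")],
    "What a fantastic episode.",
)
print(prompt)
```

The prompt ends at `Output:` so that a language model's next-word prediction completes the label, which is exactly the "task dissolving into the substrate" shift Percy describes.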

    6. EG

      We're gonna be talking, I think, about both, um, your experiences in research and academia, and then we'll also separately be talking about Together, which is a company you're involved with now. Um, could you tell us a little bit more about what the center does and what you're focused on?

    7. PL

      Yeah. So the Center for Research on Foundation Models, uh, started two years ago, is under the Human-Centered AI Institute at, uh, Stanford, and the main mission of the center is, I would say, to increase transparency and accessibility to foundation models. So foundation models are f- um, becoming more and more ubiquitous, but at the same time one thing we have noticed is the lack of transparency and accessibility of these models. So if you think about the last decade of deep learning, it has profited a lot from having a culture of openness with tools like PyTorch or TensorFlow, datasets that are open, people publishing openly about, uh, paper, uh, about the research, and this has led to a lot of community and, uh, and progress, uh, not just in academia but also in industry with different startups and hobbyists and whoever just getting involved. And what we're seeing now is sort of a retreat of that, uh, open culture where models are now, uh, being only accessible via APIs. We don't really know all the secret sauce that's going behind them and there's sort of limited access.

    8. SG

      What's your diagnosis of why that's happening?

    9. PL

      I think that, uh, this is very natural, um, because these models take a lot of, um, you know, capital to, to train. They're a enormous amount of, um, you know, can generate a lot of value and it's a competitive advantage, so, you know, incentives are to, to keep these under control. Um, there's also another factor, uh...... which is safety reasons. I think these models are extremely powerful. And, um, maybe the models right now, I think, are, well, if they were out and open, it would be maybe okay. But in the future, these models could be extremely good, and having them, you know, anyone, anything goes-

    10. SG

      Running amok.

    11. PL

      ... might, uh, we might have to think about that a little bit more carefully.

    12. EG

      How do you think all this evolves in terms of... If you look at the history of M- ML or NLP or AI, we've had these waves of innovation in academia, and then we've had waves of innovation and implementation in industry. And it's, in some cases, we've had both happening simultaneously, but it feels a little bit like it's ping-ponged over time in different ways. Um, now that people are starting to be more closed in terms of some of these models on the industry side and publishing less and being less open, how do you view the role of academia and industry diverging, if at all? Like, do you think it'll be different types of research that-

    13. PL

      Yeah.

    14. EG

      ... each type of institution tackles? Do you think there'll be overlap? I'm sort of curious how you, how you view all that-

    15. PL

      Yeah.

    16. EG

      ... evolving.

    17. PL

      I mean, I think industry and academia have very distinctive and important functions. And I always, when I tell my students, "Well, we should be working on things that are, lean on academia's competitive advantage." And historically, I think this has meant different things. So before ML was that big, I think a lot of academic research was really about developing the tools to make these models work at all.

    18. SG

      Mm-hmm.

    19. PL

      I remember working on systems and it being, uh, ML,

  3. 7:23–13:30

    The role of academia and academia’s competitive advantages

    1. PL

      uh, models back in grad school, and basically it wasn't working.

    2. SG

      (laughs)

    3. EG

      Yeah.

    4. PL

      I mean, the computer wasn't working-

    5. EG

      Yeah, yeah, yeah.

    6. PL

      ... vision wasn't working, question answering wasn't working. And, and I think the goal of academia there was to make things work. And, and a lot of the advances, uh, that, uh, were born out of academia then influenced other ideas and influenced other ideas before it started clicking, and now we're seeing this, uh, a lot of the fruits of both academia and industry, uh, as research fueling this kind of industry drive that you, you see, uh, today. And now today, I think, uh, it's, the dynamic is, is quite different because it's no longer academia's job, isn't just to get things to work because, um, you can do that in other ways. There's a lot of resources going into, um, big, uh, tech companies where there's, uh, if you have data and compute, you can just sort of-

    7. SG

      Scale. (laughs)

    8. PL

... (laughs) Scale and blast through a lot of barriers. And I think a lot of the role of academia is understanding because these models, for all their, um, impressive feats, we just don't understand why they work, how they work, um, what the principles are, um, you know, what's, how does this training data, how does the model architecture affect the different behaviors? Um, what is, uh, the best way to weigh data? How do you, what's the training objective? Many of these questions benefit from a more rigorous, you know, analysis. Um, the other piece, which is a different type of understanding, is understanding social impact. And this is going back to the question about what CRFM's role is. Uh, CRFM is, uh, is a center with over 30 different faculty across 10 different departments at Stanford, so it's quite interdisciplinary. So we're looking at foundation models, not just from a technical perspective of how do you get these models to work, but also thinking about the, their economic impact, the challenges when it comes to, uh, uh, copyright and legality. We're working on a paper that explores some of those questions. We're looking at, you know, different questions of, you know, s-, uh, social biases and thinking through carefully the impact these models have on issues of, you know, homogenization, where you have a central model that's making perhaps, uh, decisions, uh, for, um, a single user across all the different aspects. And this, so some of these are the types of questions. There are also people at the center looking at, uh, risks of disinformation, monitoring to what extent these, uh, these tools are so persuasive, which they are getting increasingly so, and what are the actual risks, uh, when it comes to, let's say, foreign, um, uh, state actors leveraging this technology?
And there's also people at the, um, the center who are in medicine, and we're exploring ways of leveraging foundation models and, uh, deployment in actual, um, you know, clinical practice. Um, and that's very exciting because that's, again, um, something that's we benefit from having a hospital attached to, to, to Stanford. Uh-

    9. EG

How near-term do you think some of those deployments are? Because I, if you go back into the '70s, there was like the MYCIN project here at Stanford, which was an expert system that outperformed Stanford Medical School staff at, um, predicting what infectious disease somebody had, for example. And that was 50 years ago, or almost 50 years ago, and it never really got implemented in the real world. And so one, one of my concerns sometimes in terms of the impact of some of these things is, are there industries that are resistant to adoption or resistant to change? And it's exciting to hear that, you know, at Stanford, they're, they're actually starting to look at, how do you actually integrate these things into real clinical care? Do you view those things as very far out on the healthcare side? Do you view them as sort of nearer? I know that isn't the main topic we're gonna cover, but I'm just a little bit curious given how close you are to all this.

    10. PL

      Yeah, I think it's a, it's a good question. Um, I think there are a bunch of different issues that need to be resolved. Uh, for example, foundation models, uh, are trained on a lot of data. How do you deal with privacy? How do you deal with robustness? Um, because once you're talking about, you know, in the healthcare space es- uh, especially, there are cases where we know that these models can still hallucinate, uh, facts and sound very confident in doing so. And how do you-

    11. EG

      I know some doctors like that too.

    12. PL

      (laughs)

    13. SG

      (laughs)

    14. PL

      So, uh... (laughs) Yeah, there you go. So, um-

    15. SG

      But you've also taken a point of view that we should, you know, expect superhuman, if we, if we see superhuman performance from these models, like, holding them to the standard of a human doctor is actually insufficient as well, right?

    16. PL

      Yeah, I, I think that's a, that's a great point, is that...... for ages, human level has been the target for, for AI. And that has really been kind of a North Star that has, uh, fueled many dreams and efforts and so on over the decades. But I think we're getting to a point where along ma- many axes, it's, it's, uh, superhuman or should be superhuman. And, and I think we should maybe define more of an objective measure of, like, what we will actually want. We want something that's very reliable, is grounded. Uh, you know, I often want more statistical evidence when I, you know (laughs) , speak to doctors and sometimes fail to get that, and have something that would be sort of a lot more principled and, and rational. And, and so this is more of a general statement about h- how we should think about technology, not just chasing after mimicking a human, because we already have a lot of humans and we don't really need-

    17. EG

      Yeah, no, that's an interesting point because, um, if you're pushing a lot of metrics around what actually works from an adoption perspective, that's an area that certain aspects of healthcare work extremely well at and then certain areas are still deficient in. And so it'll be interesting to see how you have to change certain aspects of culture in order to be able to measure, when you adopt a new technology, its impact in, in that specific area. So I think it's, it's really fascinating to watch all this evolve right now. Now, you've done extensive research on natural language processing and computational semantics. Can you explain what those terms mean and,

  4. 13:30–27:20

    Research on natural language processing and computational semantics

    1. EG

      um, how they're relevant to the development of AI?

    2. PL

      So computational semantics, um, is the process where you take language, text, and compute, quote unquote, "meaning" from it, and that is something I'm not gonna (laughs) , you know, maybe try to, uh, attempt to, um, define. There's a huge, uh, literature of linguistics and, you know, philosophy about what, what meaning is. I will say that a lot of my research in, in the past, maybe 10... uh, five to 10 years ago, was adopting this view that, um, language is a programming language. It computes. You can give orders. You can instruct. You can do things with language, and therefore it was natural to model natural language as a formal language. So a lot of semantic parsing is about mapping natural language into a, uh, you know, formal space so that machines could execute this. And so one concrete application of this that I worked on for a while is mapping natural language questions into essentially SQL queries, which obviously has many different applications as well. Um, and what was, uh, nice about this framework is that you will really sorta understand... To really do this, you had to understand how the words contributed to different parts of the SQL query, and then you could get something that was a program that you could execute and you deliver the results, as opposed to many question answering systems which you ask a question, maybe retrieve some document. You're retrieving the answer or either that or making something up rather than computing it rigorously. So, so that was a paradigm I, uh, was working in, um, maybe five or 10 years ago. But the main problem is that the world isn't a database. A small part of the world is a database, but most of the world is unstructured. And then, um, I started thinking about question answering in general, and we developed the, the SQuAD question answering benchmark to fuel, uh, progress in open domain question answering. 
And that, in turn, and many other data sets that were developed, uh, eith- both at Stanford and elsewhere, I think led to the, the development of these powerful language models that then... like BERT and RoBERTa, um, BERT and ELMo, uh, back in, uh, well, 2018 to then- Many years ago. ... many years ago- Yeah. (laughs) ... ancient history now, um, uh, to more, like, 2020 generation of, you know, uh, these large, uh, foundation models. There are cases where you want to just map natural language into, say, people call it tool use. Like, you ask some, some question that requires calculation, you should just use a calculator rather than try to sorta, quote unquote, "do it in the transformer's head." Um, but there's also a lot of aspects of reasoning which are not quite formal. We do this all the time, and a lot of that happens kind of natively in the, in the language model. And I think it's still an interesting question how to kinda marry the two. I feel like the two are still jammed together in a way, where, um... and maybe it's natural because there's certain things you can do in your head, so certain things you can, uh, invoke a tool to use. But this has been also one of the, the classic, uh, debates in AI is neural versus symbolic. For a while, symbolic AI was dominant, now neural AI has come, uh, really taken off and become dominant. But some of those central problems of how do you do planning, how do you do reasoning, which was the s- focus in study of, um, symbolic AI, are now again really relevant because now we've moved past just simple classification and just entity extraction, but now more to, to more ambitious tasks.
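The natural-language-to-SQL direction Percy mentions can be illustrated with a deliberately tiny example. A real semantic parser learns how words map to query fragments; this sketch hard-codes a single pattern and executes the result against an in-memory SQLite table (the table and its values are invented):

```python
# Toy illustration of semantic parsing: map a natural language question
# to an executable SQL query, then run it to compute the answer rather
# than retrieving or generating it. One hand-written pattern only.
import re
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cities (name TEXT, population INTEGER)")
db.executemany("INSERT INTO cities VALUES (?, ?)",
               [("Paris", 2100000), ("London", 8800000), ("Lyon", 520000)])

def parse_to_sql(question):
    """'What is the population of London?' -> (SQL, params), else None."""
    m = re.match(r"what is the population of (\w+)\??", question.lower())
    if m:
        return "SELECT population FROM cities WHERE lower(name) = ?", (m.group(1),)
    return None

sql, params = parse_to_sql("What is the population of London?")
(answer,) = db.execute(sql, params).fetchone()
print(answer)  # 8800000
```

The appeal of the paradigm is visible even at this scale: the answer is computed by executing a program, not looked up in a document. The limitation Percy names is equally visible: it only works for the small part of the world that is a database.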

    3. EG

      Mm-hmm. What do you... what do you think are some of the more interesting research programs right now in that area?

    4. PL

      I think that it's, it's interesting to, uh, remark on what's happening because to a first order of approximation, larger models trained on the relevant data seem to do well on various benchmarks. I think that maybe there isn't enough emphasis on data efficiency and how quickly you can get and how robustly you can get to these points. Because we know, uh... it has been well-documented that benchmarks can be gameable, so even though you do well on a benchmark doesn't mean you've necessarily solved, um, the problem. So I think one has to be a little bit cautious about that.... so obviously, scale and more data is just one clear direction. But in terms of orthogonal directions, what are the, the methods? Several things have to happen. One is, uh, we have to have ability to handle greater context lengths. If you think about a long reasoning chain, you know, transformers are fixed, um, and there's ways to extend it, but fundamentally, it's sort of a fixed model. Um, there's... Let's say advanced problem-solving. For example, if you wanna solve to, um, uh, a math problem, you wanna prove something. The language model generates sort of, thinks out, uh, does chain of thought and generates token by token, and then it generates something. But we know that humans, when they solve a problem, you try different things, you backtrack. There's... It's much more-

    5. SG

      Iterative, yeah.

    6. PL

      ... flexible, iterative, um, and it can last a lot longer-

    7. SG

      Mm-hmm.

    8. PL

      ... than if you're going for a, a few iterations. And h- what is the architecture that can handle that level of complexity I think is still, um, an outstanding question.

    9. EG

      Mm-hmm. Is there any aspects of, um, foundation or large language models that are emergent that you didn't anticipate or that really surprised you?

    10. PL

      I think g- going back to GPT-3, I think in-context learning is something that, um, surprised many people, incl- including me. So, here you're prompting a language model with an instruction and input-output pairs. Um, you know, here's, uh, a sentence, it's classified positive. Here's a sentence, classified negative. Um, and the model's somehow able to latch onto these examples and sort of figure out what you're trying to do, um, and solve the task. And this is really intriguing because it's, it's emergent. It wasn't hand-coded by the designers to, "Oh, I want to do in-context learning this way." Now, of course, you could have done that, but I think the, the real sort of magic is you didn't have to do that and yet it still does something. It's not completely reliable, but, but it's, it sort of can get better with, um, with better models and, you know, better data. Then there's chain of thought that sort of emerges from, at a certain scale, um-

    11. SG

      Do you wanna explain what that is?

    12. PL

      So, the idea is if I have a question that's, um, present to a language model, the language model could just answer, and it'll maybe get it right or wrong. But if you ask a language model to generate an explanation, um, of how it would solve the problems, kind of thinking out loud, then it's much more likely to get the answer right. And this is very natural that, um, you know, it would be the case for h- humans as well, but the fact that, again, the chain-of-thought, uh, capability is something that, you know, emerges. The other thing I think is really wild is this... And I think it's maybe a general principle, which is the ability to mix and match. So, you can ask the model to explain the quicksort algorithm in the style of, uh, Shakespeare.

    13. SG

      Mm-hmm.

    14. PL

      And it'll actually construct something that is semantically pretty on point, but also, uh, stylistically, you know, much, much better than what I, m- many people could come up with. Which means that it has learned different concepts of what Shakespeare and what quicksort are, and is able to fuse them. So, if you think about creativity, I think this is sort of an example of creative use. Um, you know, people say that sometimes all language models just memorize because they're so big and trained on clearly a lot of text. But these examples, I think really indicate that there's no way that these language models are just memorizing, because this text just doesn't exist, and you have to have, uh, some creative juice and invent something new. Um, and I, I think just, just to kind of go on, riff on that a little bit, I think the creative aspects of these language models with the potential for scientific discovery or doing research or pushing the boundaries of what we, uh, beyond what humans can do I think is really, really fascinating. Because up until now, again, remember, the, the AI dream tops out at humans, but, but now we can actually go beyond in many, many ways, and I think that unlocks a lot of possibilities.
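The chain-of-thought prompting explained above amounts to a small change in the prompt. A sketch of the two prompt shapes (the question and wording are illustrative; the model call itself is omitted):

```python
# Sketch of chain-of-thought prompting: the same question, with and
# without asking the model to think out loud. Only the prompt shapes
# are shown; no model is called.

question = "A farmer has 17 sheep and buys 5 more. How many sheep are there now?"

direct_prompt = f"Q: {question}\nA:"
cot_prompt = f"Q: {question}\nA: Let's think step by step."

# Empirically, the second prompt tends to elicit an intermediate
# reasoning trace before the final answer, which improves reliability
# at sufficient model scale.
print(direct_prompt)
print(cot_prompt)
```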

    15. SG

      Yeah, there are a lot of really interesting examples. I mean, you could actually argue that, uh, like, connecting new concepts in any novel way is creativity, but I love the one that is just discovering, like, new tactics in Go that humans haven't discovered after thousands of years of play, right?

    16. PL

      Yeah, yeah.

    17. SG

      Actually, well, I'll ask if you'll risk making a prediction that is impossible. Um, emergent behaviors of models at the next level of scale, anything you might predict? Emergent capabilities, if we wouldn't have thought chain of, uh, chain of thought or in-context learning would work?

    18. PL

      I, I can give you an example of something I think is emerging, and I can give you an example of a hope, but I don't know if I would call it prediction. So, what we're seeing today is the ability to instruct, uh, a model, um, using natural language to do certain things. You see a lot of this online with ChatGPT and Bing Chat, where you can just... And, and some of Anthropic's work as well. You can instruct a model, uh, to be succinct, um, generate three paragraphs in the style of, and so on. You can lay out these guidelines and have the model actually follow. So, this instruction following ability is getting extremely good. Now, I, I will say that how much is emergent and how much is, uh, not? It's hard to tell, because, um, a lot of these models, it's not just a language model that's trying to predict the next word. Um, there's a lot of, you know, secret sauce that goes under the hood, so...And if you define emergence of, you know, it was not intended by the designers, I don't know how much of that is emergent, but at least it's a capability that I think is very striking. Language models currently mix stuff up. Um, they hallucinate. Um, and this is clearly a, a big problem. Um, and almost a, in some ways, a very difficult problem to crack. The hope is that as models get better, that some of this will actually go away.

    19. EG

      Mm-hmm.

    20. PL

      I don't know if, if that will happen. Um, but, but I think that would be extremely nice because I- I guess the way I think about these, these models is that, um, they're, they're doing some sort of... If you think about predicting the next word, it's, it seems very simple, but you have to really internalize a lot of what is going on in this context. What are the previous words? What's the syntax? What's... who's saying them? And all of that information and context has to get compressed, and then that allows you to predict the next word. And if you're able to do this extremely well, then you sort of have a model of what's happening in the world, at least, uh, s- the world that you've captured in, in text. And so while the notion of truth might be, you know, ambiguous in, in many cases, I think the model can get an idea of what certain, you know, parts of internet are maybe reliable and what parts of the internet are not and what kind of... you know, the idea of having, you know, entities and, you know, dates and locations and w- what activities there are. I- I think that will maybe, uh, become more salient in the model. Like if you think of model, uh, language model that's just fr- um, predicting the next word and it's only trying to do that, and you say, "Elad travel to blank." Of course, it's gonna mix (laughs) you know-

    21. EG

      Yeah, yeah, yeah.

    22. PL

      ... something up without further context, but if it has a better understanding of what's happening and of, of course with more context, then, um, maybe it can use that context to actually know that, well, okay, well, I don't know, maybe I should ask where to begin.

    23. EG

      Yeah, yeah, yeah. So scale is basically increasing the, uh, statistical accuracy of the prediction on the next word because you have more context and more data by which to infer what's coming, and therefore it will reduce hallucinations 'cause you're increasing accuracy.

    24. PL

      Yeah. So I- I think there's pre-training which is, uh, predicting the, the next word and developing a world, um, model so to speak. And with those capabilities, then you can... you still have to say, "Don't hallucinate."

    25. EG

      Yeah.

    26. PL

      But it will be much easier to control that model if it has a notion of what hallucination even is.

    27. EG

      Mm-hmm.
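The "predict the next word over and over" objective discussed above can be shown at the smallest possible scale with a bigram model, which conditions on only the single previous word (the corpus is invented). It also illustrates Percy's point about context: with so little of it, the model can only emit the majority continuation.

```python
# A bigram language model: count next-word frequencies, then predict
# the most frequent continuation. Real foundation models condition on
# long contexts; this toy sees only the previous word.
from collections import Counter, defaultdict

corpus = ("elad traveled to paris . sarah traveled to tokyo . "
          "elad traveled to paris").split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict(prev):
    """Most frequent next word after `prev` in the training corpus."""
    return counts[prev].most_common(1)[0][0]

print(predict("to"))  # 'paris' -- the majority continuation
```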

  5. 27:20–35:08

    Smaller scale architectures that are competitive with transformers

    1. EG

      There was... um, I was talking to somebody who was close to the development of the, uh, transformer model, and his claim was that one of the reasons it's done so well is to your point around scale, right? Eventually you hit enough scale that you see that it's cl- it's clearly has these really interesting emergent properties, you keep scaling it up and you keep sort of growing it. And so therefore, it's like a self-reinforcing loop to keep using these types of models. And his claim was that, um, it's expensive to do that sort of scale, and so therefore, there may be other architectures or approaches that we've just never scaled up sufficiently in order to actually see if they have the same emergent properties or certain characteristics that may be superior. How do you think about that from the perspective of, you know, just going down the path of the trans- transformer side versus other architectures that may be really interesting-

    2. PL

      Yeah.

    3. EG

      ... and maybe neglected because we just-

    4. PL

      Yeah.

    5. EG

      ... we just haven't thrown enough compute at them 'cause it's expensive.

    6. PL

Yeah. I really hope that in 10 years we won't be reusing that transformer because I think the transformer is, is... uh, I mean, it's a very good architecture. People have tried to improve it, but it's sort of like kind of good enough for, for people to press ahead. But scientifically there's no reason to believe that this is the one, and there have been some efforts. So one of my colleagues, Chris Ré, and his students have developed, uh, other architectures which are actually at smaller scales competitive with, with transformers, and actually don't require the central operation of attention. And I would love to see m- much more research exploring other alternatives to the transformer. This is something, again, that academia I think is very, uh, well-suited to do because it involves kind of challenging the, the status quo, you're not truly trying to just get it to work and get it out there, but you're trying to reflect on what are the principles, what can we learn from transformers, what is it trying to do and how can we incorporate them in a much more, um, you know, principled way? At some level it's still going to be about compute, right? So scaling laws for LSTMs show that if you're able to scale up LSTMs, um, maybe they would work, you know, pretty well as well, but the amount of compute is, you know, many times more and given a fixed compute budget, we're always in a con- compute constrained environment.

    7. EG

      It's an efficient enough architecture to keep trying. Yeah.

    8. PL

      Um, yeah, you would, you would not use an LSTM.

    9. EG

      Yeah.

    10. PL

      A transformer strictly dominates an LSTM from the perspective of, uh, a given fixed compute budget. So this question of, like, what if I could scale the LSTM, becomes a little bit sort of irrelevant.
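Percy's point about fixed compute budgets can be made concrete with a toy power-law fit of the kind scaling-law papers use, loss(C) = a * C^(-b). The exponent and the per-architecture constants below are invented purely for illustration, not real measurements of transformers or LSTMs:

```python
# Toy illustration: under a power-law loss curve L(C) = a * C**(-b),
# an architecture with a slightly worse constant `a` needs far more
# compute to reach the same loss. All coefficients here are made up.

def loss(compute, a, b=0.05):
    """Hypothetical compute-optimal loss at a given FLOP budget."""
    return a * compute ** (-b)

def compute_to_match(target_loss, a, b=0.05):
    """Invert the power law: FLOPs needed to reach target_loss."""
    return (a / target_loss) ** (1 / b)

transformer_a, lstm_a = 1.0, 1.2   # assumed: LSTM has a 20% worse constant

budget = 1e21                       # fixed FLOP budget
target = loss(budget, transformer_a)
lstm_budget = compute_to_match(target, lstm_a)
ratio = lstm_budget / budget        # = (1.2 / 1.0) ** (1 / 0.05)

print(f"LSTM needs about {ratio:.0f}x the compute for the same loss")
```

With a small exponent, even a modest constant-factor disadvantage compounds into a roughly 38x compute gap here, which is why "just scale the LSTM" is moot under a fixed budget.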

    11. EG

      Mm-hmm. So for the things where you see transformer like performance, what sort of compute budget would you need in order to be able to test them out? Is it the scale of a million dollars, $10 million, $100 million of compute? And I know it changes based on compute pricing. I mean-

    12. PL

      Yeah.

    13. EG

      ... just trying to get a rough sense of, you know, how expensive is it to try today and, and then if we extrapolate down a compute curve three years from now, maybe it's tractable again or something. So...

    14. PL

      Yeah, I- it really depends on the, the gaps that you're seeing. Um, right now in academia you can train one billion parameter, you know, models. I mean, it, it's not, it's not cheap by academia standards, but you can, you can do it. And you know, here at CRFM we're training like s- you know, six or seven billion-

    15. EG

      Mm-hmm.

    16. PL

      ... uh, parameter models, and I think it's... enough to, um, be able to try out some ideas. But ultimately, because of emergent properties and the importance of scale, um, you do need to go out farther along the curve. You can only make a hypothesis. You can find something like, "Oh, this seems promising at smaller scales," but you still have to go out and test whether it really pans out or the gap just closes. And maybe this is a good segue to talk about compute and, uh, Together. Um, so we founded Together on the premise that compute is a central bottleneck in foundation models. On the other hand, there's a lot of compute that's decentralized, that's, uh, maybe underutilized, um, or idle, and if we could harness that compute and bring it to bear for both, you know, research and also commercial, you know, purposes, then we could actually do a lot more. There are some, you know, pretty hefty technical challenges around doing that, because foundation models are typically trained in, um, very high-end data center environments where the interconnect between devices is extremely good. Um, whereas if you just grab your average desktop, um, or home interconnect, it's, you know, 100 times or more, uh, slower. But, uh, you know, with, uh, Chris Ré and Sajan and others, really, they deserve most of the credit for this, um, we've developed some techniques that allow you to leverage this weakly connected compute and actually get, um, you know, pretty interesting training, uh, going. So hopefully with that type of infrastructure we can begin to unlock a bit more compute, both for academic research, um, but also for, you know, other, um, you know, startups and so on.
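Together's actual scheduling and compression techniques aren't spelled out in the conversation, but one standard way to cut communication over slow links, which the description gestures at, is top-k gradient sparsification with error feedback. The sketch below is a generic illustration with made-up sizes, not Together's implementation:

```python
import numpy as np

# Generic gradient-compression sketch for slow interconnects: send only
# the k largest-magnitude entries, and carry the unsent remainder forward
# as a residual ("error feedback"). Not a description of Together's system.

def topk_compress(grad, k):
    """Keep only the k largest-magnitude entries (indices + values)."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

def decompress(idx, vals, size):
    """Rebuild a dense gradient from the sparse (indices, values) pair."""
    out = np.zeros(size)
    out[idx] = vals
    return out

rng = np.random.default_rng(0)
grad = rng.normal(size=10_000)      # pretend this is one worker's gradient
residual = np.zeros_like(grad)

# One communication round: compress gradient + carried-over error,
# then keep whatever was not sent as the new residual.
to_send = grad + residual
idx, vals = topk_compress(to_send, k=100)   # ~100x fewer values on the wire
residual = to_send - decompress(idx, vals, grad.size)

print(f"sent {len(vals)} of {grad.size} values")
```

The residual means nothing is permanently dropped: entries too small to send this round accumulate and eventually cross the top-k threshold in a later round.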

    17. EG

      That's really cool. So it sounds a little bit like, um, earlier predecessors of this, maybe things like Folding@home where people did protein folding collectively on their computers, or SETI@home where there was search through different astronomical data, and now you can actually do this for training, uh, an AI system, um, on your desktop or, you know, excess compute that exists at data centers or in other places.

    18. PL

      Yeah. So, so Folding@home is, I think, a great, uh, inspiration for a lot of this work. At some point during the middle of the pandemic, they actually had the world's, uh, largest supercomputer in terms of FLOP count, um, because it was used to discover-

    19. EG

      Mm-hmm.

    20. PL

      ... uh, um, do molecular d- dynamics simulations for COVID. Um, the main challenge with foundation models is that there are big models and big data that need to be shuffled around, so the task decomposition is much, much harder, um, so that's why, uh, many of the, the technical things that, uh, we're doing around scheduling and compression, um, enable, uh, us to, uh, overcome these, uh, hurdles. Um, and then there's the question of incentives. So I think there are two aspects of what Together is building. One is sort of a, what I'll call a research computer, which is for academic, you know, research purposes, where, um, people can contribute compute, um, and by contributing their compute when they're not using it, they're able to use much more of the, sort of the, decentralized, uh, cloud for doing training when they are using it. So the hope is that it provides a much more efficient use of the, the compute, because you're spreading it across a larger set of people. Um, and then, you know, on the commercial side, um, the hope is that the open models that are, uh, de- developed through this, um, open source ecosystem and the Together platform can allow people to fine-tune and adapt these models to various different, uh, use cases. Um, one thing I think is noteworthy is that, you know, we think of foundation models today as, you know, maybe there's a few foundation models that are, you know, very good and exist, but I think in the future there's

  6. 35:0842:13

    HELM, holistic evaluation of language models, a project whose goal is to evaluate language models

    1. PL

      gonna be many different ones for different kind of, uh, use cases as this space, uh, takes off. Many of them will be derived from maybe existing foundation models, um, but many of them will also be perhaps trained on... from, from scratch as well.

    2. SG

      I think this is actually a pretty uncommon viewpoint right now. Can you talk a little bit about, like, where you, um, or y- you know, research efforts you're associated with choose to train models. Like, and maybe, um, via PubMed or, or w- whatever else you think is relevant here.

    3. PL

      Oh, okay. So foundation models is a pretty broad, um, category, and, uh, sort of the, the core of it is, you know, large language models that are trained on lots of, you know, internet data. Um, we've trained a model here at CRFM, um, in collaboration with, uh, Mosaic, called BioMed, um, LM. Um, it's not a huge model, but it's trained on PubMed articles and it exhibits, um, you know, pretty good, you know, performance, um, on various benchmarks. For a while, uh, you know, we were able to, um, be state-of-the-art on the US medical licensing exam. You know, Google did come out with a model that was, I think, 200 times larger, and they, they, they beat that model, so, you know, scale does matter. But, but I think there are many cases where, uh, for efficiency reasons, maybe you do want a, a smaller model, since cost, I think, is, uh, you know, a big concern.

    4. SG

      And latency.

    5. PL

      Yeah... yeah, the latency one.

    6. EG

      Have you ever looked into, um, scientific fraud using this? I'm just wondering if effectively you could screen all the papers and see which ones appear to be-

    7. SG

      (laughs)

    8. EG

      ... off relative to the literature or reuse of images, or it just seems like there's some interesting (laughs) things that you could potentially surface-

    9. PL

      Yeah.

    10. EG

      ... through the use of-

    11. PL

      I think, yeah-

    12. EG

      ... a model which understands a corpus of this information.

    13. PL

      So, so I think, uh, you know, stepping back, I, I, um, alluded to how these models can be misused, you know, for fraud, spam, disinformation, but also plagiarism.

    14. EG

      Mm-hmm.

    15. PL

      You know, a lot of, uh, students are using ChatGPT to basically do their homework. Um, and, you know, I- I think there are, you know, several things that one can do. So we have-

    16. EG

      Yeah. I was... Sorry to interrupt.

    17. PL

      Yeah.

    18. EG

      I was actually thinking of it the other way. Can you use the model to detect fraud? Um-

    19. PL

      I see.

    20. EG

      Given that you understand the corpus of biomedical information, you should be able to say, "Well, this is inconsistent."

    21. PL

      I see.

    22. EG

      Or, "This is a result that is somehow, um, duplicative or plagiaristic or..."

    23. PL

      Yeah. So, definitely, I think you, you can. Um, you can-

    24. SG

      Well, Elad's gonna try this tonight.

    25. EG

      Yeah, no, no, I'm actually thinking about it.

    26. PL

      Yeah.

    27. EG

      It sounds really interesting.

    28. PL

      Well, you can, uh, you can review papers then.

    29. EG

      (laughing)

    30. PL

      I mean, I- I- I think that one has to be a little bit careful, um, when, uh, you know, doing these things, um, especially for more consequential, uh-
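One crude version of the idea Elad floats, flagging suspiciously similar passages across papers, can be sketched with stdlib string similarity. A real system would use a learned model over a full corpus; `difflib` is just a stand-in here, the example sentences are invented, and the 0.8 threshold is arbitrary:

```python
import difflib

# Toy near-duplicate detector: flag pairs of passages whose character-level
# similarity ratio exceeds a threshold. Illustrative only; real plagiarism
# or fraud screening would use learned embeddings and domain knowledge.

def near_duplicates(passages, threshold=0.8):
    """Return (i, j, ratio) for each pair above the similarity threshold."""
    flagged = []
    for i in range(len(passages)):
        for j in range(i + 1, len(passages)):
            ratio = difflib.SequenceMatcher(None, passages[i], passages[j]).ratio()
            if ratio >= threshold:
                flagged.append((i, j, round(ratio, 2)))
    return flagged

papers = [
    "The compound inhibited viral replication in 80% of trials.",
    "The compound inhibited viral replication in 80 percent of trials.",
    "We observed no significant effect on replication rates.",
]
print(near_duplicates(papers))
```

As Percy cautions, a high similarity score is a prompt for human review, not a verdict; consequential accusations need far more than a threshold.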

  7. 42:1353:37

    Together, a decentralized cloud for artificial intelligence

    1. SG

      um, HELM is and what the goal has been?

    2. PL

      Yeah. So HELM stands for Holistic Evaluation of Language Models, which is this project that, uh, happened over the last year. And the goal is to evaluate language models. So, the trouble is that language models is, um, a very generic, um, thing. It's like saying, "Evaluate the internet." Um, what does that, what does that mean? The language model takes text in and text out, and one of the features of a language model is that it can be used for a myriad of, uh, different, um, applications. Um, and so what we did in that paper is to be as systematic and as rigorous as we could in laying out the different scenarios in which language models could be used, and also measure aspects of these uses, um, which include not just accuracy, which a lot of benchmarks focus on, but also issues of how robust it is, how well it's calibrated, meaning whether the model knows what it doesn't know. Um, whether, um, the models are, um, you know, fair, according to, uh, you know, uh, some definition of- of fairness, whether they're- they're biased, whether they, uh, spew out toxic content, how efficient they are. And then we go and we basically grab every prominent language model that we could access, which includes open source models like OPT and BLOOM, but also getting access to APIs from Cohere, AI21, OpenAI, and, uh, also Anthropic, and, um, you know, Microsoft. So overall, there were 30 different models, 42 scenarios, and seven metrics, and we ran, um, the same evaluations on- on all of- of that. We've put all the results on the HELM website, um, so that you could see the top-level statistics and accuracies. But also you can drill down into, hm, on this particular benchmark, what are the instances, what are the predictions that these models are making? Um, all the way down to what prompts are you using for the language models. So the idea here is that we're trying to provide transparency to this space, right? 
We know that these models are powerful. They have some deficiencies, um, and we're trying to lay that all out in a kind of a scientific, uh, manner. So I'm pretty excited about this project. The challenging thing about this project is, since we put out the paper maybe three months ago, um, a bunch of different models have come out, including ChatGPT, LLaMA, and, you know, uh, Cohere and AI21 have updated their models. Um, GPT-4 might come out at some point. Um, so what this project has evolved into is this dynamically updating benchmark where every two weeks we refresh it with new, um, models that are coming out as well as new, um, scenarios. Because one thing we also realized, which was made clear by, uh, ChatGPT, is that the type of things that we ask of a language model is changing. We don't ask it just to do question answering or just to do sentiment.

    3. EG

      As they increase in capability, yeah.

    4. PL

      I- increasing capabilities. Now they can do a lot more. They can, you know, write an email, or, uh, give you, you know, life advice on XYZ if you put in a scenario, and/or write, you know, an essay about XYZ. And I think what we need to do with the benchmark is also add the scenarios that capture these capabilities as well as kind of new, uh, risks. So we are definitely interested in, um, benchmarking how persuasive these language models are, which governs, you know, what are the risks that someone is going to be using them to, um... and how- and also how secure they are. One thing I'm actually also worried about is, given all the jailbreaking that is, uh, extremely common with these models, where you can get the models to basically bypass safety controls. Um, if these models start interacting with the world and accepting external inputs, now you can not only just sort of jailbreak your own model, but you can jailbreak other people's models and get them to do various things, and so that could lead to sort of a cascade of errors. So, um, some of these are the concerns that we hope to also capture with, uh, the benchmark. Um, I should also mention we're also trying to look at multimodal models, which I think is gonna be, um, pretty pertinent. So lots to do, yeah.
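The models x scenarios x metrics grid Percy describes can be sketched as a simple nested loop. The model call and the two metrics below are stubs standing in for real API calls and HELM's actual scorers; only the shape of the evaluation is taken from the conversation:

```python
# Sketch of a HELM-style evaluation grid: every model runs on every
# scenario, and every output is scored under every metric. The model
# and metrics here are stubs, not HELM's real implementations.

def run_model(model, prompt):
    """Stub standing in for a real model API call."""
    return f"{model} answer to: {prompt}"

METRICS = {
    "accuracy":  lambda out, ref: float(ref in out),        # exact-match stand-in
    "verbosity": lambda out, ref: float(len(out.split())),  # stand-in for efficiency-style metrics
}

models = ["model-a", "model-b"]
scenarios = {
    "qa": ("What is the capital of France?", "Paris"),
    "sentiment": ("I loved this movie. Positive or negative?", "positive"),
}

results = {}
for m in models:
    for scenario, (prompt, ref) in scenarios.items():
        out = run_model(m, prompt)
        for metric, score in METRICS.items():
            results[(m, scenario, metric)] = score(out, ref)

print(len(results), "result cells")  # one per model x scenario x metric
```

At HELM's scale the same grid is 30 x 42 x 7, which is why the project stores every instance, prediction, and prompt for drill-down rather than just a leaderboard number.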

    5. EG

      A bunch of the things, um, that you've described as, uh, sort of the role you see for the center, or even, like, academia in the age of foundation models broadly, like, they have more of an intersection with policy than, traditionally, like, machine learning research. Like, how do you think about that?

    6. PL

      Yeah. Actually, we've- I'm glad you asked that, because we've been thinking a lot about this- the social implications of these models, and sort of the- not the models themselves, which we focus a lot on talking about, but the environment in which these models are- are- are built. Um, so I think it's interesting to think about: there are few players in the space, um, with different opinions about how models should be built. Um, some are more closed, some are more open. Um, and there's also again this sort of lack of transparency, where we have, um, a model that's produced and it's aligned, um, apparently to human values, but then once you start kind of questioning, you can ask the question, "Okay, well, you know, which- which values, which humans are we talking about? Who determines these values? What legitimacy does that have? Um, and what's the sort of accountability?" Then you start noticing that, well, a lot of this is just kind of, uh, completely, uh, a black box. So one thing that we've been working on at the center is developing norms, um, starting with transparency. I think transparency is necessary but not sufficient. You need some level of transparency to even have a conversation about any of the- the policy issues. Um, so... making sure that, uh, the public can understand how these models are, are built. Um, what's, at least some notion of, like, what the data is, what are the instructions that are given to, um, to align the models. Um, we're trying to advocate for greater, you know, transparency there. Um, and I think this will be really important as these models really get deployed at scale and start impacting, um, you know, our lives. Um, you know, a kind of an analogy I like to think about is, you know, nutrition labels, or any sort of specification sheets on electronic devices. 
There's some sort of, uh, obligation, I think, that, um, you know, producers of some products should have to make sure that their product is used properly and has, um, you know, some, uh, balance on it.
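The "nutrition label" analogy could translate into a structured, machine-readable card shipped with a model. Every field and value below is an invented placeholder to show the shape of the idea, not a real standard or a real model's disclosure:

```python
# Sketch of a "nutrition label" for a model: a structured summary of how
# it was built. All fields and values are hypothetical placeholders.

model_card = {
    "name": "example-lm-7b",                       # hypothetical model
    "training_data": ["filtered web crawl", "PubMed abstracts"],
    "alignment_instructions": "summary of guidelines given to human raters",
    "known_limitations": ["hallucination", "English-centric"],
    "evaluation": {"suite": "HELM", "results_url": None},  # to be filled on release
}

# A norm like this is only useful if some fields are mandatory.
required = {"name", "training_data", "known_limitations"}
missing = required - model_card.keys()
print("missing fields:", sorted(missing))  # empty when the label is complete
```

The point of the transparency norm is the required-fields check: a producer can be vague inside a field, but an absent field is immediately visible, which is what makes a public conversation possible.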

    7. EG

      I, I, I guess I'll ask two questions. Um, one is, if people wanted to participate in Together, is there a client they can download and install or use or... How, how can people help support the Together efforts?

    8. PL

      Yeah. So we are developing a client, um, that, uh, will be made available, both from the perspective of joining the Together cloud so that you can contribute your compute, but also where we have an API that we're developing so that people can use, um, the, the Together infrastructure to do inference and fine-tuning on models. Um, we are also training some open models, so we have this, um, something called, uh, OpenChatKit that, um, we're releasing soon. And this is built on top of EleutherAI's, uh, NeoX model, but, um, you know, improved to include various different types of capabilities. Um, it's still... you should think about it as really a work in progress. What we've tried to do is open it up so that people can, um, play with it, give feedback, and have the community improve this, um, together, um, rather than us trying to produce some finished product and putting it out there. Um, this goes back to the point about, you know, the spirit of open source and involving the community to build, um, these, uh, foundation models together, as opposed to someone unilaterally, uh, building them.

    9. SG

      While we're talking, uh, timelines and predictions that you don't, uh, quite feel comfortable making, how do you think as a rigorous scientist about AGI?

    10. PL

      (inhales deeply) I must say that my opinions about AGI have changed over time. I think that for a while, um, it was, you know, perceived by most of the community as, um, you know-

    11. SG

      Laughable.

    12. PL

      ... just as laughable.

    13. SG

      Yeah.

    14. PL

      I will say that, uh, in the last 10 years, I have been aware of, you know, a kind of certain community of, uh, um, people who think about AGI and also existential risk and things like that. You know, so I've been in touch with people who think about these things. I think I see the world maybe differently. I think, perhaps, um, certainly these are powerful technologies and could have, uh, extreme social consequences... But there's a lot of more near-term issues. I focus a lot on kind of robustness of ML systems, um, in the last, uh, you know, five years. But, you know, one thing I've learned about foundation models, because of their emergent qualities, is I've learned to be very kind of, um, uh... open-minded, I would say. I was asking a lot earlier about No Priors, where that comes from, and I think it's a fitting way to think about, um, you know, the world, because I think everyone, including scientists, often gets sort of, uh, drawn into a particular worldview and paradigm. And, and I think that, you know, the world is, is changing, both on the technical side but also how we conceive of AI and, you know, maybe even humans at some level. And I think we have to be open-minded to, you know, how that's gonna evolve over the next, uh, few years.

    15. SG

      (instrumental music plays) Awesome. Thanks for doing this conversation with us, Percy.

    16. EG

      Yeah. Thanks for having us.

    17. PL

      Yeah.

    18. SG

      It was great.

    19. PL

      Thank you very much.

Episode duration: 53:37


Transcript of episode DzIPBGRhOMQ
