Lex Fridman Podcast

Aravind Srinivas: Perplexity CEO on Future of AI, Search & the Internet | Lex Fridman Podcast #434

Aravind Srinivas is CEO of Perplexity, a company that aims to revolutionize how we humans find answers to questions on the Internet.

Please support this podcast by checking out our sponsors:
- Cloaked: https://cloaked.com/lex and use code LexPod to get 25% off
- ShipStation: https://shipstation.com/lex and use code LEX to get a 60-day free trial
- NetSuite: http://netsuite.com/lex to get a free product tour
- LMNT: https://drinkLMNT.com/lex to get a free sample pack
- Shopify: https://shopify.com/lex to get a $1 per month trial
- BetterHelp: https://betterhelp.com/lex to get 10% off

TRANSCRIPT: https://lexfridman.com/aravind-srinivas-transcript

EPISODE LINKS:
- Aravind's X: https://x.com/AravSrinivas
- Perplexity: https://perplexity.ai/
- Perplexity's X: https://x.com/perplexity_ai

PODCAST INFO:
- Podcast website: https://lexfridman.com/podcast
- Apple Podcasts: https://apple.co/2lwqZIr
- Spotify: https://spoti.fi/2nEwCF8
- RSS: https://lexfridman.com/feed/podcast/
- Full episodes playlist: https://www.youtube.com/playlist?list=PLrAXtmErZgOdP_8GztsuKi9nrraNbKKp4
- Clips playlist: https://www.youtube.com/playlist?list=PLrAXtmErZgOeciFP3CBCIEElOJeitOr41

OUTLINE:
0:00 - Introduction
1:53 - How Perplexity works
9:50 - How Google works
32:17 - Larry Page and Sergey Brin
46:52 - Jeff Bezos
50:20 - Elon Musk
52:38 - Jensen Huang
55:55 - Mark Zuckerberg
57:23 - Yann LeCun
1:04:09 - Breakthroughs in AI
1:20:07 - Curiosity
1:26:24 - $1 trillion dollar question
1:41:14 - Perplexity origin story
1:56:27 - RAG
2:18:45 - 1 million H100 GPUs
2:21:17 - Advice for startups
2:33:54 - Future of search
2:51:31 - Future of AI

SOCIAL:
- Twitter: https://twitter.com/lexfridman
- LinkedIn: https://www.linkedin.com/in/lexfridman
- Facebook: https://www.facebook.com/lexfridman
- Instagram: https://www.instagram.com/lexfridman
- Medium: https://medium.com/@lexfridman
- Reddit: https://reddit.com/r/lexfridman
- Support on Patreon: https://www.patreon.com/lexfridman

Aravind Srinivas (guest) · Lex Fridman (host)
Jun 19, 2024 · 3h 2m

  1. 0:00–1:53

    Introduction

    1. AS

      Can you have a conversation with an AI where it feels like you talked to Einstein-

    2. LF

      Mm-hmm.

    3. AS

      ... or Feynman? Where you ask them a hard question, they're like, "I don't know," and then after a week, they did a lot of research-

    4. LF

      They disappear and come back, yeah. (laughs)

    5. AS

      And they come back and just blow your mind. If we can achieve that, that amount of inference compute, where it leads to a dramatically better answer as you apply more inference compute, I think that would be the beginning of, like, real reasoning breakthroughs.

    6. LF

      The following is a conversation with Aravind Srinivas, CEO of Perplexity, a company that aims to revolutionize how we humans get answers to questions on the internet. It combines search and large language models, LLMs, in a way that produces answers where every part of the answer has a citation to human-created sources on the web. This significantly reduces LLM hallucinations and makes it much easier and more reliable to use for research, and general curiosity-driven, late-night rabbit hole explorations that I often engage in. I highly recommend you try it out. Aravind was previously a PhD student at Berkeley, where we long ago first met, and an AI researcher at DeepMind, Google, and finally OpenAI as a research scientist. This conversation has a lot of fascinating technical details on state of the art in machine learning, and general innovation in retrieval augmented generation, AKA RAG, chain of thought reasoning, indexing the web, UX design, and much more. This is a Lex Fridman podcast. To support it, please check out our sponsors in the description. And now, dear friends, here's Aravind Srinivas.

  2. 1:53–9:50

    How Perplexity works

    1. LF

      Perplexity is part search engine, part LLM. So how does it work, and what role does each part of that, the search and the LLM, play in, uh, serving the final result?

    2. AS

      Perplexity is best described as an answer engine. So you ask it a question, you get an answer. Except the difference is, all the answers are backed by sources. This is like how an academic writes a paper. Now, that referencing part, the sourcing part is where the search engine part comes in. So you combine traditional search, extract results relevant to the query the user asked. You read those links, extract the relevant paragraphs, feed it into an LLM, LLM means large language model. And that LLM takes the relevant paragraphs, looks at the query, and comes up with a well-formatted answer with appropriate footnotes to every sentence it says, because it's been instructed to do so. It's been instructed with that one particular instruction of given a bunch of links and paragraphs, write a concise answer for the user with the appropriate citation. So the magic is all of this working together in one single orchestrated product. And that's what we built Perplexity for.
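
The orchestration described here can be sketched in a few lines. This is a minimal illustration, not Perplexity's actual implementation: the `web_search` and `call_llm` functions below are hypothetical stand-ins that return canned data in place of a real search API and LLM endpoint.

```python
# Minimal sketch of the search-then-cite pipeline described above.
# `web_search` and `call_llm` are hypothetical stand-ins, not real APIs.

def web_search(query, k=5):
    """Stand-in for a real search API: returns top-k (url, text) pairs."""
    return [("https://example.com/a", "A paragraph relevant to: " + query)][:k]

def call_llm(prompt):
    """Stand-in for a real LLM API: returns a canned cited answer."""
    return "According to the sources, it is best described as an answer engine [1]."

def answer(query):
    # 1. Traditional search: extract results relevant to the user's query.
    results = web_search(query)
    # 2. Pull out the relevant paragraphs and number them as sources.
    context = "\n\n".join(
        f"[{i + 1}] {url}\n{text}" for i, (url, text) in enumerate(results)
    )
    # 3. The one particular instruction: given links and paragraphs, write a
    #    concise answer with an appropriate citation for every sentence.
    prompt = (
        "Given the numbered sources below, write a concise answer to the "
        "question, with a footnote like [1] for every sentence. Say nothing "
        f"the sources do not support.\n\nSources:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
    return call_llm(prompt)
```

The citation behavior comes entirely from the instruction in the prompt; in a real system the stand-ins would be replaced by a live search index and a model call.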

    3. LF

      So it was explicitly instructed to, uh, write like an academic essentially. You, you found a bunch of stuff on the internet, and now you generate something coherent, and, uh, something that humans will appreciate and cite the things you found on the internet in the narrative you create for the human.

    4. AS

      Correct. When I wrote my first paper, uh, the senior people who were working with me on the paper told me this one profound thing, which is that every sentence you write in a paper should be backed with a citation, with a, with a citation from another peer-reviewed paper or an experimental result in your own paper. Anything else that you say in the paper is more like an opinion. Uh, that's, it's a very simple statement, but pretty profound in how much it forces you to say things that are only right. And we took this principle and asked ourselves, "What is the best way to make chatbots accurate?" It's to force it to only say things that it can find on the internet, right? And find from multiple sources. So this kind of came out of a need, rather than, "Oh, let's try this idea." When we started the startup, there were, like, so many questions all of us had, 'cause we were complete noobs. Never built a product before, never built, like, a startup before. Of course, we had worked on, like, a lot of cool engineering and research problems. But doing something from scratch is the ultimate test. And there were, like, lots of questions. You know, what is health insur- Like, the first employee we hired, he came and asked us for health insurance. Normal need. I didn't care. I was like, "Why do I need health insurance if this company dies?" Like, who cares? Um, my other two co-founders were married, so they had health insurance through their spouses. But this guy was, like, looking for health insurance. And I didn't even know anything, who are the providers? What is co-insurance or deductible or, like, none of these made any sense to me. And you go to Google, insurance is, like, a major ad-spend category. So even if you ask for something, Google has no incentive to give you clear answers. They want you to click on all these links and read for yourself, because all these insurance providers are bidding to get your attention. 
So we integrated a Slack bot that just pings GPT-3.5 and answered a question. Now, sounds like problem solved, except we didn't even know whether what it said was correct or not. And in fact, it was saying incorrect things. We were like, "Okay, how do we address this problem?" And we remembered our academic roots. Uh, you know, Denis and myself are both academics. Denis is my co-founder. And we said, "Okay, what is one way we stop ourselves from saying nonsense in a peer-reviewed paper?"... We're always making sure we can cite what we write, every sentence. Now, what if we ask the chatbot to do that? And then we realized, that's literally how Wikipedia works. In Wikipedia, if you do a random edit, people expect you to actually have a source for that. And not just any random source, they expect you to make sure that the source is notable. You know, there are so many standards for, like, what counts as notable and not. So, we decided this was worth working on, and it's not just a problem that'll be solved by a smarter model, 'cause there's so many other things to do on the search layer and the sources layer, and making sure, like, how well the answer is formatted and presented to the user. So that's why the product exists.

    5. LF

      Well, there's a lot of questions to ask there, but first, zoom out once again. So fundamentally, it's about search. So you said first, there's a search element.

    6. AS

      Mm-hmm.

    7. LF

      And then there's a storytelling element via the LLM, and the, the citation element. But it's about search first.

    8. AS

      Mm-hmm.

    9. LF

      So you think of Perplexity as a search engine.

    10. AS

      Mm-hmm. I think of Perplexity as a knowledge discovery engine, neither a search engine... I mean, of course we call it an answer engine. But everything matters here. Um, the journey doesn't end once you get an answer. In my opinion, the journey begins after you get an answer. You see related questions at the bottom, suggested questions to ask. Why? Because maybe the answer was not good enough, or the answer was good enough, but you probably wanna dig deeper and ask more. And that's why in, in, in the search bar we say, "Where knowledge begins," 'cause there's no end to knowledge. It can only expand and grow. Like, that's the whole concept of The Beginning of Infinity book by David Deutsch. You always seek new knowledge. So I see this as sort of a discovery process. You start, you, you know, let's say you literally, whatever you ask me to right now, you could have asked Perplexity too.

    11. LF

      Mm-hmm.

    12. AS

      "Hey, Perplexity, is it a search engine or is it an answer engine, or what is it?" And then, like, you see some questions at the bottom, right?

    13. LF

      We're gonna straight up a- ask this right now.

    14. AS

      I don't know how, I don't know how it's gonna work.

    15. LF

      (laughs) Is, uh, Perplexity a search engine or an answer engine? That's a poorly phrased question. But one of the things I love about Perplexity, the poorly phrased questions will nevertheless lead to interesting directions. Perplexity is primarily described as an answer engine rather than a traditional search engine. Key points showing the difference between answer engine versus search engine. Uh, this is so nice, and it compares Perplexity versus a traditional search engine like Google. So Google provides a l- a list of links to websites. Perplexity focuses on providing direct answers and thes- synthesizing information from various sources. User experience, technological approach, uh, so there's AI integration with Wikipedia-like responses. This is really well done. And then-

    16. AS

      And if you look at the bottom right here.

    17. LF

      Right.

    18. AS

      So you, you were not intending to ask those questions. But they're relevant. Like, can Perplexity replace Google?

    19. LF

      For everyday searches. All right. Let's click on that. By the way, really interesting generation. That task, that step of generating related searches, so the next step of the-

    20. AS

      Yeah.

    21. LF

      ... curiosity journey of expanding your knowledge is really interesting.

    22. AS

      Exactly. So that's what David Deutsch says in his book, which is creation of new knowledge starts from the spark of curiosity to seek explanations, and then you find new phenomenon, or you get more depth in whatever knowledge

  3. 9:50–32:17

    How Google works

    1. AS

      you already have.

    2. LF

      I really love the steps that the pro search is doing. Compare Perplexity and Google for everyday searches. Step two, evaluate strengths and weaknesses of Perplexity. Evaluate strengths and weaknesses of Google. (laughs) It's like a procedure.

    3. AS

      Yeah.

    4. LF

      Complete. Okay. Answer, Perplexity AI, while impressive, is not yet a full replacement for Google for everyday searches.

    5. AS

      Yes.

    6. LF

      Here are the key points based on the provided sources. Strengths of Perplexity AI, direct answers, AI-powered summaries, focused search, user experience. We can dig into the details of a lot of these. Weaknesses of Perplexity AI, accuracy and speed. Interesting. I don't know if that's accurate.

    7. AS

      Well, Google, Google is faster than Perplexity because you instantly render the links.

    8. LF

      The latency is faster.

    9. AS

      Yeah, it's like you get 200, I think, 300 to 400 milliseconds results.

    10. LF

      Interesting.

    11. AS

      Here it's like, you know, still not, about 1,000 milliseconds here, right?

    12. LF

      For simple navigational queries such as finding a specific website, Google is more efficient and reliable. So if you actually want to get straight-

    13. AS

      Right.

    14. LF

      ... to the source.

    15. AS

      Yeah. You just wanna go to Kayak.

    16. LF

      Yeah.

    17. AS

      Uh, just wanna go fill up a form. Like, you wanna go, like, pay your credit card dues.

    18. LF

      Real time information. Google excels in providing real time information like sports sco- So like, while I think Perplexity is trying to integrate-

    19. AS

      Yeah.

    20. LF

      ... real time, like recent information, put priority on recent information-

    21. AS

      Yeah.

    22. LF

      ... that requires, that's, like, a lot of work to integrate.

    23. AS

      Exactly.

    24. LF

      Yeah.

    25. AS

      'Cause that's not just about throwing an LLM. Uh, you, like when you're asking, oh, like what, what dress should I wear out today in Austin, um, you do, you do wanna get the weather across the time of the day even though you didn't ask for it. And then Google presents this information in, like, cool widgets. Um, and I think that is where this is a very different problem from just building another chatbot. And, and, and the information needs to be presented well. And, and the user intent, like for example, if you ask for the stock price, uh, you might even be interested in looking at the historic stock price even though you never asked for it. You might be interested in today's price. These are the kind of things that, like, you have to build as custom UIs for every query, and why I think this is a hard problem. It's not just like the next generation model will solve the previous generation model's problems here. The next generation model will be smarter. You can do these amazing things like planning, like query, breaking it down to pieces, collecting information, aggregating from sources, using different tools. Those kind of things you can do. You can keep answering harder and harder queries. But there is still a lot of work to do on the product layer.... in terms of how the information is best presented to the user, and how you think backwards from what the user really wanted and might want as a next step, and give it to them before they even ask for it.
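
The planning, collecting, and aggregating steps mentioned here can be sketched as a simple loop. This is a hypothetical illustration of the general pattern, not Perplexity's actual Pro Search; `call_llm` and `web_search` below are canned stand-ins for real model and search calls.

```python
# Hypothetical sketch of the multi-step pattern described above: break the
# query into pieces, collect information per piece, aggregate from sources.
# `call_llm` and `web_search` are canned stand-ins, not real APIs.

def call_llm(prompt):
    """Stand-in for a real LLM call, returning a canned plan or summary."""
    if prompt.startswith("Plan:"):
        return "1. Compare features\n2. Evaluate strengths\n3. Summarize findings"
    return "Aggregated answer built from all collected findings."

def web_search(step):
    """Stand-in for a real search call, returning canned snippets."""
    return [f"snippet about {step}"]

def pro_search(query):
    # 1. Planning: ask the model to break the query down into sub-steps.
    plan = call_llm(f"Plan: list numbered steps to answer: {query}")
    steps = [line.split(". ", 1)[1] for line in plan.splitlines()]
    # 2. Collecting: run a separate search for each sub-step.
    findings = {step: web_search(step) for step in steps}
    # 3. Aggregating: combine everything into one final answer.
    return call_llm(f"Combine these findings to answer '{query}': {findings}")
```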

    26. LF

      But I don't know how much of that is a UI problem of designing custom UIs for a specific set of questions.

    27. AS

      Mm-hmm.

    28. LF

      I think at the end of the day, Wikipedia-looking, uh, UI is good enough-

    29. AS

      Mm-hmm.

    30. LF

      ... if the raw content that's provided-

  4. 32:17–46:52

    Larry Page and Sergey Brin

    1. LF

      interesting game. I read that you looked up to Larry Page and Sergey Brin, and that you can recite passages from In The Plex and, like, uh, that book was very influential to you and How Google Works was influential. So what do you find inspiring about Google, about, uh, those two guys, uh, Larry Page and Sergey Brin, and just all the things they were able to do in the early days of the internet?

    2. AS

      First of all, the number one thing I took away, which not a lot of people talk about this, is, um, they didn't compete with the other search engines by doing the same thing.

    3. LF

      Mm-hmm.

    4. AS

      They flipped it. Like, they said, "Hey, everyone's just focusing on text-based similarity, traditional information extraction and information retrieval, which was not working that great. What if we instead ignore the text? We use the text at a basic level, but we actually look at the link structure and try to extract ranking signal from that instead." I think that was a key insight.

    5. LF

      PageRank was just a genius-

    6. AS

      PageRank, yeah.

    7. LF

      ... flipping of the table. Yeah.

    8. AS

      Exactly. And the fact, I mean, Sergey's magic came, like, he just reduced it to power iteration, right?
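
Power iteration, as referenced here, computes PageRank by repeatedly redistributing rank along the link graph until the rank vector stops changing. A toy sketch under standard assumptions (damping factor 0.85; every page is assumed to have at least one out-link); the three-page web below is made up for illustration:

```python
# Toy sketch of PageRank via power iteration.
def pagerank(links, damping=0.85, tol=1e-12):
    """Ranks pages by repeatedly redistributing rank along out-links.

    `links[j]` lists the pages that page j links to.
    """
    n = len(links)
    rank = [1.0 / n] * n  # start from the uniform distribution
    while True:
        nxt = [(1 - damping) / n] * n
        for j, outlinks in enumerate(links):
            share = damping * rank[j] / len(outlinks)
            for i in outlinks:
                nxt[i] += share  # page j passes an equal share to each target
        if sum(abs(a - b) for a, b in zip(nxt, rank)) < tol:
            return nxt
        rank = nxt

# Made-up three-page web: page 0 links to 1 and 2; pages 1 and 2 link to 0.
ranks = pagerank([[1, 2], [0], [0]])
# Page 0 is linked to by both others, so it ends up with the highest rank.
```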

    9. LF

      (laughs)

    10. AS

      And Larry's idea was like, the link structure has some valuable signal.

    11. LF

      Mm-hmm.

    12. AS

      So, look, after that, like, they hired a lot of great engineers who came and kind of, like, built more ranking signals from traditional information extraction, that, that made PageRank less important. But the way they got their differentiation from other search engines at the time was through a different ranking signal, um, and the fact that it was inspired from academic citation graphs, which coincidentally was also the inspiration for us in Perplexity. Citations, you know, if you're an academic, you've written papers. We all have Google Scholars. We all, like, at least, you know, first few papers we wrote, we'd go and look at Google Scholar every single day and see if the citations are increasing. That was some dopamine hit from that, right? So papers that got highly cited was, like, usually a good thing, a good signal. And, like, in Perplexity, that's the same thing, too. Like, we, uh, we said, like, "The citation thing is pretty cool, and, like, domains that get cited a lot, there's some ranking signal there, and that can be used to build a new kind of ranking model for the internet, and that is different from the click-based ranking model that Google's building." So, uh, I, I think, like, that's why I admire those guys. They had, like, deep academic grounding, very different from the other founders who are more like undergraduate dropouts trying to do a company. Steve Jobs, Bill Gates, Zuckerberg, th- they all fit in that sort of a mold. Larry and Sergey were the ones who were, like, Stanford PhDs, uh, trying to, like, have this academic roots and yet trying to build a product that people use. Um, and Larry Page just inspired me in many other ways, too. Like, um, when the product started getting users, uh, I, I think instead of focusing on going and building a business team, marketing team, a traditional, how internet businesses worked at the time, he had the contrarian insight to say, "Hey, search is actually gonna be important. 
So I'm gonna go and hire as many PhDs as possible." And there was this arbitrage that internet bust was happening at the time, and so a lot of PhDs who went and worked at other internet companies were available at, at, at not a great market rate. So, uh, you could spend less, get great talent like Jeff Dean, uh, and, and, like, you know, really focus on building core infrastructure and, like, like, deeply grounded research. And the obsession about latency, that was-

    13. LF

      Mm-hmm.

    14. AS

      ... you take it for granted today, but I don't think that was obvious. I even read that, uh, at the time of launch of Chrome, uh, Larry would test Chrome intentionally on very old versions of Windows on very old laptops and, and complain that the latency is bad. Obviously, you know, the engineers could say, "Yeah, you're testing on some crappy laptop, that's why it's happening." But Larry would say, "Hey, look, it has to work on a crappy laptop, so that on a good laptop, it would work even with the worst internet." So that sort of an insight, I, I, I apply it, like, whenever I'm on a flight, I always tet- test Perplexity on the flight wifi.

    15. LF

      Mm-hmm.

    16. AS

      Because flight wifi usually sucks.... and I wanna make sure the app is fast even on that. And I benchmark it against ChatGPT or, uh, Gemini or any of the other apps and try to make sure that, like, the latency is pretty good.

    17. LF

      It's funny. Uh, I do think it's a gigantic part of a success of a software product is the latency.

    18. AS

      Yeah.

    19. LF

      That story is part of a lot of the great products, like Spotify. That's the story of Spotify. In the early days, figuring out how to stream-

    20. AS

      Yeah.

    21. LF

      ... music-

    22. AS

      Yeah.

    23. LF

      ... with very low latency.

    24. AS

      Exactly.

    25. LF

      That's, uh, that's an engineering challenge. But when it's done right, like obsessively-

    26. AS

      Yeah.

    27. LF

      ... reducing latency, you actually have... There's like a phase shift in the user experience where you're like, "Holy shit." This becomes addicting, and the amount of times you're frustrated goes quickly to zero.

    28. AS

      And e- every detail matters. Like, on the search bar, you could make the user go to the search bar and click to start typing a query, or you could already have the cursor ready and so that they can just start typing.

    29. LF

      Mm-hmm.

    30. AS

      Every minute detail matters. And auto-scroll to the bottom of the answer, instead of them, f- forcing them to scroll.

  5. 46:52–50:20

    Jeff Bezos

    2. LF

      So you talked about Larry Page and Sergey Brin. What other entrepreneurs inspire you on your journey in starting the company?

    3. AS

      One thing I've done is, like, take parts from every person. And so it'll almost be like an ensemble algorithm over them.

    4. LF

      Mm-hmm. (laughs)

    5. AS

      Uh, (laughs) so I'd probably keep the answer short and say, like, each person what I took. Um, like with Bezos, I think it's the forcing yourself to have real clarity of thought. Uh, and, uh, I don't really try to write a lot of docs. There's, you know, when y- when you're in a startup, you, you, you have to do more in actions than there's in docs. But at least try to write, like, some strategy doc once in a while just for the purpose of you gaining clarity, not to, like, have the doc shared around and feel like you did some work.

    6. LF

      You're talking about, like, big picture vision, like, in five years kinda, kinda vision? Or even just for smaller things.

    7. AS

      Just even, like, next six months, what, what are we, what are we doing? Why are we doing what we're doing? What is the positioning? And, um, I think also the fact that meetings can be more efficient if you really know what you wanna... What you want out of it. What is the decision to be made, the one, one way door, two way door things? Example, you're trying to hire somebody. Everyone's debating, like, "Compensation's too high. Should we really pay this person this much?" And you're like, "Okay, what's the worst thing that's gonna happen? If this person comes and knocks it out of the door for us," uh, "you wouldn't regret paying them this much. And if it wasn't the case, then it wouldn't have been a good fit and we would part, part ways."

    8. LF

      Mm-hmm.

    9. AS

      It's not that complicated. Don't put all your brain power into, like, trying to optimize for that, like, 20, 30K in cash just because, like, you're not sure. Instead go and put that energy into, like, figuring out harder problems that we need to solve.

    10. LF

      Mm-hmm.

    11. AS

      So I, I... That, that framework of thinking, that clarity of thought, and the, uh... operational excellence that he had, and, and, you know, the whole "your margin is my opportunity," the obsession about the customer. Do you know that relentless.com redirects to amazon.com? You want to try it out?

    12. LF

      (laughs) This is the real thing?

    13. AS

      Relentless.com.

    14. LF

      (laughs)

    15. AS

      He owns the domain. Apparently, that was the first name, or like among the first names he had for the company.

    16. LF

      Registered 1994. Wow.

    17. AS

      It shows, right?

    18. LF

      Yeah.

    19. AS

      Uh, one common trait across every successful founder is they were relentless. So that's why I really like this. And obsession about the user, like, you know, there's this whole video on YouTube where like, uh, "Are you an internet company?" And he says, "Internet, shminternet, doesn't matter. What matters is the customer."

    20. LF

      Yeah.

    21. AS

      Like, that's what I say when people ask, "Are you a wrapper or do you build your own model?" Yeah. We do both, but it doesn't matter. What matters is the answer works, the answer is fast, accurate, readable, nice, the product works. And nobody, like if you really want AI to be widespread, where every, uh, person's mom and dad are using it, I think that would only happen when people don't even care what models are running under the hood. So, um,

  6. 50:20–52:38

    Elon Musk

    1. AS

      Elon, I've, like taken inspiration a lot for the raw grit. Like, you know, when everyone says it's just so hard to do something, and this guy just ignores them and just still does it, I think that's like extremely hard. Like, like it basically requires doing things through sheer force of will and nothing else. He's like the prime example of it. Uh, distribution, right? Like, hardest thing in any business is distribution. And I read this Walter Isaacson biography of him. He learned from the mistakes, that like if you rely on others a lot for your distribution, his first company, uh, Zip2, where he tried to build something like a Google Maps, he ended up ma- like, like as in the company ended up making deals with, you know, putting their technology on other people's sites and losing direct relationship with the users. Because that's, uh, good for your business. You get to make some revenue and like, you know, people pay you. But then, uh, in Tesla, he didn't do that. Like, he actually didn't go to dealers or anything. He ha- handled the relationship with the users directly. It's hard. Uh, you know, you might never get the critical mass. But amazingly, he managed to make it happen. So I think that sheer force of will and like real first principles thinking, like no- no work is beneath you. I think, I think that is like very important. Like, I've heard that, um, in Autopilot, he has done data annotation himself just to understand how it works. Like, like every detail could be relevant to you to make a good business decision. And, um, he's phenomenal at that.

    2. LF

      And one of the things you do by understanding every detail is you can figure out how to break through difficult bottlenecks and also how to simplify the system.

    3. AS

      Exactly.

    4. LF

      Like when you, when you see, when you see what everybody's actually doing, you're, there's a natural question, if you could see to the first principles of the matter is like, "Why are we doing it this way?"

    5. AS

      Yeah.

    6. LF

      It seems like a lot of bullshit, like annotation.

    7. AS

      Yeah.

    8. LF

      Why are we doing annotation this way? Maybe the user interface is inefficient. Or why are we doing annotation at all?

    9. AS

      Yeah.

    10. LF

      Why, why can't it be s- self-supervised?

    11. AS

      Yeah.

    12. LF

      And you can just a- keep asking that-

    13. AS

      Correct.

    14. LF

      ... why question.

    15. AS

      Yeah.

    16. LF

      Do we have to do it in the way we've always done? Can we do it much simpler?

    17. AS

      Yeah.

  7. 52:38–55:55

    Jensen Huang

    1. AS

      And the same trait is also visible in like, uh, Jensen.

    2. LF

      Mm-hmm.

    3. AS

      Um, like, like the sort of real obsession in like constantly improving the system, understanding the details. It's common across all of them. And like, you know, I think he has... Jensen's pretty famous for like saying, "I, I just don't even do one-on-ones 'cause I want to know simultaneously from all parts of the system, like all... Uh, like I, I just do one-to-n, and I have 60 direct reports and I meet all of them together."

    4. LF

      Yeah.

    5. AS

      "And that gets me all the knowledge at once and I can make the dots connect and like it's a lot more efficient." Like, questioning like the conventional wisdom and like trying to do things a different way is very important.

    6. LF

      I think you tweeted a picture of him and said, uh, "This is what winning looks like."

    7. AS

      Yeah.

    8. LF

      Him in that sexy leather jacket.

    9. AS

      This guy just keeps on delivering the next generation that's like, you know, the B100s are gonna be, uh, 30X more efficient on inference-

    10. LF

      Mm-hmm.

    11. AS

      ... compared to the H100s.

    12. LF

      Yeah.

    13. AS

      I can imagine that. Like 30X is not something that you would easily get. Maybe it's not 30X in performance. It doesn't matter. It's still gonna be pretty good. And by the time you match that, that'll be like Rubin.

    14. LF

      Mm-hmm.

    15. AS

      There's always like innovation happening.

    16. LF

      The fascinating thing about him, like all the people that work with him say that he doesn't just have that like two-year plan or whatever. He has like a 10, 20, 30-year plan.

    17. AS

      Oh, really?

    18. LF

      So he's like, he's constantly thinking really far ahead.

    19. AS

      Uh-huh.

    20. LF

      So (laughs) there's probably gonna be that picture of him that you posted every year for the next 30 plus years. Once the singularity happens and AGI is here and, uh, humanity's fundamentally transformed, (laughs) he'll still be there in that leather jacket announcing the next, (laughs) the, the compute that envelops the sun and, and is now running the entirety of, uh, intelligent civilization.

    21. AS

      NVIDIA GPUs are the substrate for intelligence.

    22. LF

      Yeah. They're so low-key about dominating. I mean, they're not low-key, but...

    23. AS

      I met him once and I asked him like, "Uh, how do you, how do you, like handle the success and yet go and, you know, work hard?" And he just said, "'Cause I- I'm actually paranoid about going out of business. Like, e- every day I wake up like, like in sweat thinking about like how, how things are gonna go wrong." Because one thing you gotta understand about hardware is you gotta actually... I don't know about the 10, 20-year thing, but you actually do need to plan two years in advance... because it does take time to fabricate and get the chips back.

    24. LF

      Mm-hmm.

    25. AS

      And, like, you need to have the architecture ready. You- you might make mistakes in the one generation of architecture, and that could set you back by two years. Your competitor might, like, get it right. So there's, like, that sort of drive, the paranoia, obsession about details. You need that.

    26. LF

      Mm-hmm.

    27. AS

      And he's a great example.

    28. LF

      Yeah. Screw up one generation of GPUs and you're fucked.

    29. AS

      Yeah.

    30. LF

      Which is, that's terrifying to me. Just everything about hardware is terrifying to me 'cause you have to get everything right. The, all the- the mass production, all the different components-

  8. 55:55–57:23

    Mark Zuckerberg

    1. AS

    2. LF

      Uh, so who else? You mentioned Bezos. You mentioned Elon.

    3. AS

      Yeah, like Larry and Sergey we've already talked about. Uh, I- I mean, Zuckerberg's obsession about, like, moving fast is, like, you know, very famous. Move fast and break things.

    4. LF

      What do you think about his leading the way in open source?

    5. AS

      It's amazing. Honestly, like, as- as a startup building in this space, I think- I'm- I'm very grateful that, uh, Meta and Zuckerberg are doing what they're doing. Uh, I- I- I think there's a lot... He's controversial for, like, whatever's happened in social media in general, but, uh, I think his positioning of Meta and, like, himself leading from the front in AI, uh, open sourcing great models, not just random models-

    6. LF

      Mm-hmm.

    7. AS

      Really, like Llama-3-70B is a pretty good model. I would say it's pretty close to GPT-4, not... a bit worse in like long tail, but 90/10 is there.

    8. LF

      Mm-hmm.

    9. AS

      And the 405B that's not released yet will likely surpass it or be as good, maybe less efficient. Doesn't matter. This is already a dramatic change from-

    10. LF

      Closest state-of-the-art. Yeah.

    11. AS

      Yeah.

    12. LF

      Yeah.

    13. AS

      And it gives hope for a world where we can have more players instead of, like, two- two or three companies controlling the- the most capable models. And that's why I think it's very important that he succeeds and, like, that- that his success also enables the success of many others.

  9. 57:23–1:04:09

    Yann LeCun

    1. AS

    2. LF

      So speaking of Meta, uh, Yann LeCun is somebody who funded, uh, Perplexity. What do you think about Yann? He gets... He's been fi-

    3. AS

      (laughs)

    4. LF

      He's been feisty his whole life, but he's been especially on fire recently on Twitter, on X.

    5. AS

      I have a lot of respect for him. I think he went through many years where people just ridiculed or, um, didn't respect his work as much as they should have, and he still stuck with it. And, like, not just his contributions to ConvNets and self-supervised learning and energy-based models and things like that. Uh, he also educated, like, a good generation of next scientists, like Koray, who's now the CTO of DeepMind, was a student. The- the guy who invented DALL-E-

    6. LF

      Mm-hmm.

    7. AS

      ... at OpenAI, un- and Sora, was Yann- Yann's, Yann LeCun's student, Aditya Ramesh. And, uh, ma- many others, like, who've done great work in this field, uh, come from LeCun's lab. Um, and like Wojciech Zaremba, one of OpenAI co-founders. So there's, like, a lot of people he's just given to the next generation that have gone on to do great work. And, um, I would say that his- his- his positioning on like, you know... He was right about one thing very early on, uh, in- in- in 2016. Uh, you know, you probably remember RL was the real hot shit at the time.

    8. LF

      Mm-hmm.

    9. AS

      Like everyone wanted to do RL, and it was not an easy skill to gain. You have to actually go and, like, read MDPs, understand like-

    10. LF

      Mm-hmm.

    11. AS

      ... you know, read some math, Bellman equations, dynamic programming, model-based, model-free. There's just, like, a lot of terms, policy gradients. It- it goes over your head at some point. It's not that easily accessible, but everyone thought that was the future and- and that would lead us to AGI in, like, the next few years.

    12. LF

      Mm-hmm.

    13. AS

      And this guy went on the stage at NeurIPS, the premier AI conference, and said, "RL is just the cherry on the cake."

    14. LF

      Yeah. (laughs) Yeah.

    15. AS

      The... A- and bulk of the intelligence is in the cake, and supervised learning is the icing on the cake. And- and the bulk of the cake is unsupervised learning.

    16. LF

      Unsupervised he called at the time, which turned out to be, I guess, self-supervised, whatever.

    17. AS

      Yeah. That is literally the recipe for ChatGPT.

    18. LF

      Yeah.

    19. AS

      Like you're spending bulk of the compute in pre-training, predicting the next token, which is un- un- or self-supervised, whatever you wanna call it. The- the icing is the supervised fine-tuning step, instruction following, and the cherry on the cake, RLHF, which is what gives the conversational abilities.

    20. LF

      That's fascinating. Did he at that time... I'm trying to remember. Did he have inklings about what unsupervised learning...

    21. AS

      I think he was more into energy-based models at the time. Um, and- and, you know, there's... You can sc- say some amount of energy-based model reasoning is there in like RLHF but-

    22. LF

      But the basic intuition he had right.

    23. AS

      Yeah. I mean, he was wrong on the betting on GANs as the go-to idea-

    24. LF

      Mm-hmm.

    25. AS

      ... uh, which turned out to be wrong. And like, you know, autoregressive models and diffusion models ended up winning. But the core insight that RL is, like, not the real deal-

    26. LF

      Mm-hmm.

    27. AS

      ... most of the compute should be spent on learning just from raw data, was super right and controversial at the time.

    28. LF

      Yeah. And he- (laughs) he wasn't apo- apologetic about it.

    29. AS

      Yeah. And- and now he's saying something else, which is he's saying autoregressive models might be a dead end.

    30. LF

      Yeah. Which is also super controversial.

  10. 1:04:09–1:20:07

    Breakthroughs in AI

    1. AS

    2. LF

      How surprising was it to you, because you were in the middle of it-

    3. AS

      Mm-hmm.

    4. LF

      ... how effective attention was? How, how-

    5. AS

      Self-attention?

    6. LF

      S- self-attention.

    7. AS

      Yeah.

    8. LF

      The thing that led to the transformer and everything else, like this explosion of-

    9. AS

      Yeah.

    10. LF

      ... intelligence that came from this-

    11. AS

      Yeah.

    12. LF

      ... idea. Maybe you can kinda try to describe which ideas are important here-

    13. AS

      Yeah.

    14. LF

      ... or is it just as simple as self-attention?

    15. AS

      So, uh, I think, I think first of all, attention... Like, Yoshua Bengio wrote this paper with Dzmitry Bahdanau on soft attention, which was first applied in this paper called Align and Translate. Ilya Sutskever wrote the first paper that said you can just train a simple RNN model, uh, scale it up, and it'll beat all the phrase-based machine translation systems. Uh, but that was brute force. There's no attention in it. And it spent a lot of Google compute, like I think probably like a 400 million parameter model or something even back in those days. And then this grad student, Bahdanau, uh, in Bengio's lab, identifies attention and beats his numbers with way less compute.

    16. LF

      Mm-hmm.

    17. AS

      So clearly a great idea. And then people at DeepMind figured that, like, like this paper called PixelRNNs-

    18. LF

      Mm-hmm.

    19. AS

      ... uh, figured that, uh, you don't even need RNNs. Even though the title is called PixelRNN, uh, I guess the actual architecture that became popular was WaveNet.

    20. LF

      Hmm.

    21. AS

      And, and they figured out that a completely convolutional model can do autoregressive modeling as long as you do masked convolutions. The masking was the key idea. So you can train in parallel. Instead of backpropagating through time, you can backpropagate through every input token in parallel. So that way, you can utilize the GPU compute a lot more ef- efficiently 'cause you're just doing matmuls.
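The masking trick described above can be sketched in a few lines. This is a toy NumPy illustration of the idea, not the actual WaveNet masked convolutions: a lower-triangular mask lets one matrix multiply produce every position's output in parallel, while position t still only depends on inputs 0..t.

```python
import numpy as np

# Toy sketch of causal masking (not WaveNet itself): one parallel matmul
# computes all T outputs, yet each position only "sees" earlier inputs.
rng = np.random.default_rng(0)
T, d = 5, 4
x = rng.standard_normal((T, d))     # a toy sequence of T token vectors
W = rng.standard_normal((T, T))     # dense mixing weights across positions
mask = np.tril(np.ones((T, T)))     # lower-triangular: zero out the future
y = (W * mask) @ x                  # all T outputs in ONE parallel matmul

# Causality check: perturbing the last token must not change earlier outputs.
x2 = x.copy()
x2[-1] += 10.0
y2 = (W * mask) @ x2
print(np.allclose(y[:-1], y2[:-1]))  # True
```

This is exactly why training parallelizes: the whole sequence is one batched matmul instead of a step-by-step recurrence.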

    22. LF

      Mm-hmm.

    23. AS

      Uh, and so they c- just said throw away the RNN, and that was powerful. Um, and so then Google Brain, like Vaswani et al, the, the transformer paper, identified that, okay, let's, let's take the good elements of both. Let's take attention. It's more powerful than convs. It learns more higher-order dependencies 'cause it applies more multiplicative compute. And, uh, let's take the insight in WaveNet that you can just have an all-convolutional model that fully parallel matrix multiplies, and combine the two together, and they built a transformer. And that is the... I would say it's almost like the last answer, that like nothing has changed since 2017 except maybe a few changes on what the non-linearities are and, like, how the square-root-of-d scaling should be done. Like some of that has changed, but... And then people have tried mixture of experts, having more parameters, uh, for the same flop and things like that, but the core transformer architecture has not changed.

    24. LF

      Isn't it crazy to you that masking as, as simple as something like that works so damn well?

    25. AS

      ... yeah, it's a very clever insight that, look, you want to learn causal dependencies, but you don't wanna waste your hardware, your compute, and keep doing the backpropagation sequentially. You want to do as much parallel compute as possible during training. That way whatever job was earlier running in eight days would run, like, in a single day. I think that was the most important insight. And, like, whether it's convs or attention... I guess attention and, and transformers make even better use of hardware than convs, uh, because they apply more, uh, flops per parameter. Because in a transformer, the self-attention operator doesn't even have parameters. The QK-transpose softmax times V has no parameters, but it's doing a lot of flops. And that's powerful. It learns higher-order dependencies. I think the insight then OpenAI took from that is, hey, like, Ilya Sutskever has been saying, like, unsupervised learning is important, right? Like, they, they wrote this paper called Sentiment Neuron, and then Alec Radford and him worked on this paper called GPT-1. It's not, it wasn't even called GPT-1. It was just called GPT. Little did they know that it would go on to be this big. But just said, hey, like, let's revisit the idea that you can just train a giant language model and it will learn natural language common sense. That was not scalable earlier because you were scaling up RNNs. But now you got this new transformer model that's 100X more efficient at getting to the same performance, which means if you run the same job, you would get something that's way better if you apply the same amount of compute. And so they just trained a transformer on, like, all the books, like storybooks, children's storybooks. And then that got, like, really good. And then Google took that insight and did BERT, except they did bidirectional, but they trained on Wikipedia and books, and that got a lot better. And then OpenAI followed up and said, "Okay, great. So it looks like the secret sauce that we were missing was data and throwing more parameters." So we get GPT-2, which is like a billion-parameter model and, like, trained on, like, a lot of links from Reddit. And then that became amazing, like, you know, produce all these stories about a unicorn and things like that, if you remember.
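The point that the self-attention operator itself is parameter-free can be made concrete. Here is a minimal NumPy sketch of softmax(QK^T/sqrt(d))V; note the `attention` function below has no learned weights of its own. In a real transformer the parameters live in the linear projections that produce Q, K, and V, which this toy example just fills with random values.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V -- lots of flops, zero parameters."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)   # (T, T) pairwise interactions
    return softmax(scores) @ V      # attention-weighted mix of the values

T, d = 6, 8
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))
out = attention(Q, K, V)
print(out.shape)  # (6, 8)
```

Every entry of the T-by-T score matrix is computed by multiplies and adds, yet nothing in that mixing step is trained, which is the "more flops per parameter" observation.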

    26. LF

      (laughs) Yeah, yeah.

    27. AS

      Um, and then, like, the GPT-3 happened, which is like you just scale up even more data. You take Common Crawl and instead of one billion, go all the way to 175 billion. But that was done through analysis called the scaling laws, which is, for a bigger model, you need to keep scaling the amount of tokens, and you train on 300 billion tokens. Now it feels small. These models are being trained on, like, tens of trillions of tokens and, like, trillions of parameters. But, like, this is literally the evolution. It's not... Like, then the focus went more into, like, pieces outside the architecture, on, like, data, what data you're training on, what are the tokens, how de-duped they are. Uh, and then there's the Chinchilla insight. It's not just about making the model bigger, but you want to also make the dataset bigger. You wanna make sure the tokens are also big enough in quantity and high quality and do the right evals on, like, a lot of reasoning benchmarks. So I think that, that ended up being the breakthrough, right? Like, this... It's not like attention alone was important. Attention, parallel computation, transformer, uh, scaling it up to do unsupervised pre-training, right data, and then constant improvements.
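The Chinchilla insight mentioned here has a simple back-of-the-envelope form. As a hedged rule of thumb (roughly 20 training tokens per parameter, and training compute of about 6ND flops; the exact constants come from the Chinchilla paper's empirical fits, not from this conversation):

```python
# Rule-of-thumb Chinchilla arithmetic (approximate constants, for intuition).
def chinchilla_optimal_tokens(n_params, tokens_per_param=20):
    """Compute-optimal token count: roughly 20 tokens per parameter."""
    return n_params * tokens_per_param

def train_flops(n_params, n_tokens):
    """Standard estimate: training compute is about 6 * N * D flops."""
    return 6 * n_params * n_tokens

# GPT-3: 175B parameters trained on 300B tokens -- far short of the
# ~3.5T tokens this rule of thumb suggests, which is why later models
# moved to training on many more tokens rather than just more parameters.
print(chinchilla_optimal_tokens(175e9) / 1e12)   # 3.5 (trillion tokens)
```

So "300 billion tokens now feels small" is exactly what the rule predicts: at GPT-3's size, the compute-optimal dataset is an order of magnitude larger.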

    28. LF

      Well, let's take it to the end because you just gave an epic history (laughs) of LLMs and the breakthroughs of the past 10 years plus. Uh, so you mentioned GPT-3. So 3.5, how important to you, uh, is RLHF? That aspect of it?

    29. AS

      It's really important. It's... Even though you, you call it a cherry on the cake.

    30. LF

      (laughs) This, this cake has a lot of cherries, by the way.

  11. 1:20:07–1:26:24

    Curiosity

    1. AS

    2. LF

      This kind of work hints a little bit of a similar kind of approach as self-play.

    3. AS

      Mm-hmm.

    4. LF

      Do you think it's possible we live in a world where we get, like, an intelligence explosion from self-supervised, uh, post-training? Meaning, like-

    5. AS

      Yeah.

    6. LF

      ... there's some kind of insane world where AI, AI systems are just talking to each other and learning from each other? That's, that's what this kind of, at least to me, seems like it's pushing towards that direction.

    7. AS

      Yeah.

    8. LF

      And it's not obvious to me that that's not possible.

    9. AS

      It's not possible to say... Like, like, unless mathematically you can say it's not possible-

    10. LF

      Right (laughs) .

    11. AS

      ... uh, it's hard to say it's not possible. Of course, there are some simple arguments you can make, like where is the new signal to this-

    12. LF

      Right.

    13. AS

      ... is, to the AI coming from? Like, how are you creating new signal from nothing?

    14. LF

      There has to be some human annotation.

    15. AS

      Like, for self-play Go or chess, you know, the, who won the game, that was signal, and that's according to the rules of the game.

    16. LF

      Yeah.

    17. AS

      In, in these AI tasks, like, of course, for math and coding, you can always verify if something was correct through traditional verifiers.

    18. LF

      Mm-hmm.

    19. AS

      But for more open-ended things, like say, uh, predict the stock market for Q3.

    20. LF

      Mm-hmm.

    21. AS

      Like, what, what is correct? You don't even know. Okay, maybe you can use historic data. I, I only give you data until Q1, and see if you predicted well for Q2, and you train on that signal. Maybe that, that's useful, uh, and you, then you still have to collect a bunch of tasks like that and create an RL suite for that, or, like, give agents, like, tasks, like a browser, and ask them to do things-

    22. LF

      Mm-hmm.

    23. AS

      ... and sandbox it, and where, like, completion is based on whether the task was achieved, which will be verified by humans. So you, you don't need to set up, uh, like, an RL sandbox for these agents to, like, play and test and verify, but-

    24. LF

      And get signal from humans at some point.

    25. AS

      Yeah.

    26. LF

      But I guess the, the, the idea is that the amount of signal you need relative to how much new intelligence you gain is much smaller.

    27. AS

      Correct.

    28. LF

      So you just need to interact with humans every once in a while.

    29. AS

      Bootstrap, interact and improve.

    30. LF

      Mm-hmm.

Episode duration: 3:02:15

Transcript of episode e-gwvmhyU7A
