Acquired
Google: The AI Company. Google is amazingly well-positioned... will they win in AI? (audio)
EVERY SPOKEN WORD
150 min read · 30,001 words
- 0:00 – 4:36
The Innovator's Dilemma: Google's AI Challenge
- BGBen Gilbert
I went and looked at a studio, well, a little office that I was gonna turn into a studio nearby, but it was not good at all. It had drop ceilings, so I could hear the guy in the office next to me. You would be able to hear him talking on episodes. [laughing]
- DRDavid Rosenthal
Third co-host!
- BGBen Gilbert
Third co-host.
- DRDavid Rosenthal
Is it Howard?
- BGBen Gilbert
No, it was, like, a lawyer. He seemed to be, like, talking through some horrible problem that I didn't want to listen to-
- DRDavid Rosenthal
[laughing]
- BGBen Gilbert
- but I could hear every word.
- DRDavid Rosenthal
Does he want millions of people listening to his conversations? [laughing]
- BGBen Gilbert
[laughing] You're right. Right.
- DRDavid Rosenthal
All right.
- BGBen Gilbert
All right. Let's do a podcast.
- DRDavid Rosenthal
Let's do a podcast.
- BGBen Gilbert
[laughing]
- SPSpeaker
Who got the truth? Is it you? Is it you? Is it you? Who got the truth now? Hmm. Is it you? Is it you? Is it you? Sit me down. Say it straight. Another story on the way. Who got the truth?
- BGBen Gilbert
Welcome to the Fall 2025 season of Acquired, the podcast about great companies and the stories and playbooks behind them. I'm Ben Gilbert.
- DRDavid Rosenthal
I'm David Rosenthal.
- BGBen Gilbert
And we are your hosts. Here is a dilemma: imagine you have a profitable business. You make giant margins on every single unit you sell, and the market you compete in is also giant, one of the largest in the world, you might say. But then on top of that, lucky for you, you also are a monopoly in that giant market, with ninety percent share and a lot of lock-in.
- DRDavid Rosenthal
And when you say monopoly, monopoly as defined by the US government.
- BGBen Gilbert
That is correct. But then imagine this: in your research lab, your brilliant scientists come up with an invention. This particular invention, when combined with a whole bunch of your old inventions by all your other brilliant scientists, turns out to create the product that is much better for most purposes than your current product. So you launch the new product based on this new invention, right?
- DRDavid Rosenthal
Right.
- BGBen Gilbert
I mean, especially because out of pure benevolence, your scientists had published research papers about how awesome the new invention is, and lots of the inventions before also, so now there's new startup competitors quickly commercializing that invention. So of course, David, you change your whole product to be based on the new thing, right?
- DRDavid Rosenthal
Uh, this sounds like a movie.
- BGBen Gilbert
Yes, but here is the problem: You haven't figured out how to make this new, incredible product anywhere near as profitable as your old giant cash-printing business, so maybe you shouldn't launch that new product. David, this sounds like quite the, uh, dilemma to me. [laughing]
- DRDavid Rosenthal
[laughing]
- BGBen Gilbert
Of course, listeners, this is Google today, in perhaps the most classic textbook case of the innovator's dilemma ever. The entire AI revolution that we are in right now is predicated on the invention of the Transformer out of the Google Brain team in 2017. So think OpenAI and ChatGPT, Anthropic, NVIDIA hitting all-time highs. All the craziness right now depends on that one research paper published by Google in 2017. And consider this: Not only did Google have the densest concentration of AI talent in the world ten years ago that led to this breakthrough, but today, they have just about the best collection of assets that you could possibly ask for. They've got a top-tier AI model with Gemini. They don't rely on some public cloud to host their model. They have their own in Google Cloud that now does fifty billion dollars in revenue. That is real scale. They're a chip company with their Tensor Processing Units, or TPUs, which is the only real at-scale deployment of AI chips in the world besides NVIDIA GPUs. Maybe AMD, maybe, but these are definitely the top two.
- DRDavid Rosenthal
Somebody put it to me in research that if you don't have a foundational frontier model, or you don't have an AI chip, you might just be a commodity in the AI market, and Google is the only company that has both.
- BGBen Gilbert
Google still has a crazy bench of talent, and despite ChatGPT becoming kind of the Kleenex of the era, Google does still own the text box, the single one that is the front door to the internet for the vast majority of people anytime anyone has intent to do anything online. But the question remains: What should Google do strategically? Should they risk it all and lean into their birthright to win in artificial intelligence, or will protecting their gobs of profits from Search hamstring them as the AI wave passes them by? But perhaps first, we must answer the question: How did Google
- 4:36 – 17:21
From PageRank to language models: Google's Early AI Foundations
- BGBen Gilbert
get here, David Rosenthal?
- DRDavid Rosenthal
[laughing]
- BGBen Gilbert
So, listeners, today, we tell the story of Google, the AI company.
- DRDavid Rosenthal
Woo!
- BGBen Gilbert
You like that, David? Was that good?
- DRDavid Rosenthal
I love it. I love it. [laughing]
- BGBen Gilbert
[laughing] All right.
- DRDavid Rosenthal
Did you hire, like, a Hollywood scriptwriting consultant without telling me?
- BGBen Gilbert
I wrote that a hundred percent myself with no AI, thank you very much.
- DRDavid Rosenthal
No AI. [laughing]
- BGBen Gilbert
Well, listeners, if you wanna know every time an episode drops, vote on future episode topics, or get access to corrections from past episodes, check out our email list. That's acquired.fm/email. Come talk about this episode with the entire Acquired community in Slack after you listen. That's acquired.fm/slack.
- DRDavid Rosenthal
Speaking of the Acquired community, we have an anniversary celebration coming up.
- BGBen Gilbert
We do.
- DRDavid Rosenthal
Ten years of the show. We're gonna do an open Zoom call with everyone to celebrate, kinda like how we used to do our LP calls back in the day-
- BGBen Gilbert
Yes
- DRDavid Rosenthal
... with LPs, and we are gonna do that on October 20th, 2025, at 4:00 PM Pacific Time. Check out the show notes for more details.
- BGBen Gilbert
If you want more Acquired, check out our interview show, ACQ2. Our last interview was super fun. We, uh, sat down with Tobi Lutke, the founder and CEO of Shopify, about how AI has changed his life and where he thinks it will go from here. So search ACQ2 in any podcast player. And before we dive in, we wanna briefly thank our presenting partner, J.P. Morgan Payments.
- DRDavid Rosenthal
Yes, just like how we say every company has a story, every company's story is powered by payments, and J.P. Morgan Payments is a part of so many of their journeys, from seed to IPO and beyond.
- BGBen Gilbert
So with that, this show is not investment advice. David and I may have investments in the companies we discuss, and this show is for informational and entertainment purposes only.... David, Google, the AI company.
- DRDavid Rosenthal
So Ben, as you were alluding to in that fantastic intro, really, you're really upping the game again. [laughing] If we rewind ten years ago from today, before the Transformer paper comes out, all of the following people, as we've talked about before, were Google employees: Ilya Sutskever, founding chief scientist of OpenAI, who along with Geoff Hinton and Alex Krizhevsky, had done the seminal AI work on AlexNet and just published that a few years before. All three of them were Google employees, as was Dario Amodei, the founder of Anthropic, [chuckles] Andrej Karpathy, chief scientist at Tesla until recently, Andrew Ng, Sebastian Thrun, Noam Shazeer, all the DeepMind folks, Demis Hassabis, Shane Legg, Mustafa Suleyman. Mustafa, now, in addition to in the past having been a founder of DeepMind, runs AI at Microsoft. Basically, every single person of note in AI worked at Google, with the one exception of Yann LeCun, who worked at Facebook.
- BGBen Gilbert
Yeah. It's pretty difficult to trace a big AI lab now back and not find Google in its origin story.
- DRDavid Rosenthal
Yeah, I mean, the analogy here is it's almost as if at the dawn of the computer era itself, a single company, like, say, IBM, had hired every single person who knows how to code. So it'd be like, "You know, if anybody else wants to write a computer program, oh, sorry, you can't do that. Anybody who knows how to program works at IBM." This is how it was with AI and Google in the mid-2010s. But learning how to program a computer wasn't so hard that people out there couldn't learn how to do it. Learning how to be an [chuckles] AI researcher, significantly more difficult.
- BGBen Gilbert
Right, it was the stuff of very specific PhD programs with a very limited set of advisors, and a lot of infighting in the field of where the direction of the field was going, what was legitimate versus what was crazy, heretical, religious stuff.
- DRDavid Rosenthal
Yep. So Ben, yes, the question is: how did we get to this point? Well, it goes back to the start of the company. I mean, Larry Page always thought of Google as an artificial intelligence company, and in fact, Larry Page's dad was a computer science professor and had done his PhD at the University of Michigan in machine learning and artificial intelligence, which was not a popular field in computer science back then.
- BGBen Gilbert
Yeah, in fact, a lot of people thought specializing in AI was a waste of time because so many of the big theories from thirty years prior to that had been kind of disproven at that point, or at least people thought they were disproven. And so it was frankly contrarian for Larry's dad to spend his life and career and research work in AI.
- DRDavid Rosenthal
And that rubbed off on Larry. I mean, if you squint, PageRank, the PageRank algorithm that Google was founded upon, is a statistical method. You could classify it as part of AI within computer science. And Larry, of course, was always dreaming much, much bigger. I mean, there's the quote that we've said before on this show, in the year 2000, two years after Google's founding, when Larry says, "Artificial intelligence would be the ultimate version of Google. If we had the ultimate search engine, it would understand everything on the web, it would understand exactly what you wanted, and it would give you the right thing. That's obviously artificial intelligence. We're nowhere near doing that now. However, we can get incrementally closer, and that is basically what we work on here." [chuckles] It's always been an AI company.
- BGBen Gilbert
Yep, and that was in 2000.
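The PageRank computation mentioned above can be sketched as a simple power iteration. This is a minimal illustration on a made-up four-page link graph, not Google's actual implementation:

```python
# Minimal PageRank power iteration on a hypothetical 4-page link graph,
# just to illustrate the kind of statistical computation being described.
links = {               # page -> pages it links to (invented graph)
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
pages = list(links)
d = 0.85                               # damping factor from the PageRank paper
rank = {p: 1 / len(pages) for p in pages}

for _ in range(50):                    # iterate until the ranks settle
    new = {p: (1 - d) / len(pages) for p in pages}
    for p, outs in links.items():
        for q in outs:
            new[q] += d * rank[p] / len(outs)
    rank = new

print(max(rank, key=rank.get))         # "C" -- the most linked-to page wins
```

Each round redistributes every page's rank across its outbound links; pages that attract links from important pages end up important themselves, which is the statistical insight Google was founded on.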
- DRDavid Rosenthal
Well, one day, in either late 2000 or early 2001, the timelines are a bit hazy here, a Google engineer named Georges Harik is talking over lunch with Ben Gomes, famous Google engineer, who I think would go on to lead search, and a relatively new engineering hire named Noam Shazeer. Now, Georges was one of Google's first ten employees, incredible engineer, and just like Larry Page's dad, he had a PhD in machine learning from the University of Michigan. And even when Georges went there, it was still a relatively rare, contrarian subfield within computer science. So the three of them are having lunch, and Georges says offhandedly to the group that he has a theory from his time as a PhD student that compressing data is actually technically equivalent to understanding it. And the thought process is, if you can take a given piece of information and make it smaller, store it away, and then later reinstantiate it in its original form, the only way that you could possibly do that is if whatever force is acting on the data actually understands what it means, because you're losing information, going down to something smaller, and then recreating the original thing. It's like you're a kid in school. You learn something in school, you read a long textbook, you store the information in your memory, then you take a test to see if you really understood the material, and if you can recreate the concepts, then you really understand it.
- BGBen Gilbert
Which kind of foreshadows big LLMs today are, like, compressing the entire world's knowledge into some number of terabytes that's just, like, this smashed down little vector set. Little, at least compared to all the information in the world, but it's kind of that idea, right? You can store all the world's information in an AI model, in something that is, like, kind of incomprehensible and hard to understand, but then if you uncompress it, you can kind of bring knowledge back to its original form.
- DRDavid Rosenthal
Yep, and these models demonstrate understanding, right?
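The compress-then-reconstruct test described above is easy to demonstrate with ordinary lossless compression. zlib here is just a stand-in, since it exploits statistical redundancy rather than anything like understanding, but the round-trip property is the same one Georges was pointing at:

```python
import zlib

# Lossless round trip: shrink the data, then perfectly reconstruct it.
text = ("Google's mission is to organize the world's information "
        "and make it universally accessible and useful. ") * 20

compressed = zlib.compress(text.encode())
restored = zlib.decompress(compressed).decode()

assert restored == text                      # perfect reconstruction
print(len(compressed) / len(text.encode()))  # well below 1.0 on redundant text
```

The better a compressor models the regularities in its input, the smaller the compressed form it can get away with, which is the intuition behind "compression is equivalent to understanding."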
- 17:21 – 24:05
Parallelizing Language: Jeff Dean's Google Translate Revolution
- DRDavid Rosenthal
I think that one's my favorite.
- BGBen Gilbert
Yes.
- DRDavid Rosenthal
Oh, man, so, so good. Also, a wonderful human being who we spoke to in research and was very, very helpful. Thank you, Jeff.
- BGBen Gilbert
Yes.
- DRDavid Rosenthal
So language models definitely work, definitely gonna drive a lot of value for Google, and they also fit pretty beautifully into Google's mission to organize the world's information and make it universally accessible and useful. If you can understand the world's information and compress it and then recreate it, yeah, that fits the mission, I think. I think that checks the box.
- BGBen Gilbert
Absolutely.
- DRDavid Rosenthal
So PHIL gets so big that apparently by the mid-2000s, PHIL is using fifteen percent of Google's entire data center infrastructure, and I assume a lot of that is AdSense ad serving, but also "did you mean," and all the other stuff that they start using it for within Google.
- BGBen Gilbert
So, uh, early natural language systems, computationally expensive.
- DRDavid Rosenthal
Yes. So okay, now mid-2000s, fast-forward to 2007, which is a very, very big year for the purposes of our story. Google had just recently launched the Google Translate product. This is the era of all the great, great products coming out of Google that we've talked about, you know, Maps and Gmail and Docs, and all the wonderful things. Chrome and Android are gonna come later.
- BGBen Gilbert
They had, like, a ten-year run where they basically launched everything you know of at Google, except for Search... truly a ten-year run. And then there were about ten years after that, from 2013 on, where they basically didn't launch any new products that you've heard about until we get to Gemini, which is this fascinating thing. But this '03 to 2013 era was just so rich with hit after hit after hit.
- DRDavid Rosenthal
Magical. And so one of those products was Google Translate. You know, not the same level of user base or perhaps impact on the world as Gmail or Maps or whatnot, but still a magical, magical product. And the chief architect for Google Translate was another incredible machine learning PhD named Franz Och. So Franz had a background in natural language processing and machine learning, and that was his PhD. He was German, uh, he got his PhD in Germany. At the time, DARPA-
- BGBen Gilbert
The Defense Advanced Research Projects Agency, a division of the US government.
- DRDavid Rosenthal
... had one of their famous challenges going for machine translation. So Google and Franz, of course, enter this, and Franz builds an even larger language model that blows away the competition in that year's version of the DARPA Challenge. This is either 2006 or 2007. It gets an astronomically high BLEU score for the time. BLEU, the Bilingual Evaluation Understudy, is the sort of algorithmic benchmark for judging the quality of machine translations, and at the time, this score was higher than anything else out there. So Jeff Dean hears about this and the work- [chuckles] ... that Franz and the Translate team have done, and he's like, "This is great. This is amazing. Uh, when are you guys gonna ship this in production?"
- BGBen Gilbert
Oh, I heard this story.
- DRDavid Rosenthal
So Jeff and Noam talk about this on the Dwarkesh podcast.
- BGBen Gilbert
Yes.
- DRDavid Rosenthal
That episode is so, so good. And Franz is like, "No, no, no, no, Jeff, you, you don't understand. This is research. This isn't for the product. We can't ship this model that we built. This is an n-gram language model." N-grams are just runs of n consecutive words. "And we've trained it on a corpus of two trillion words from the Google search index. This thing is so large, it takes it twelve hours to translate a sentence." [laughing] So the way the DARPA Challenge worked in this case was you got a set of sentences on Monday, and then you had to submit your machine translation of that set of sentences by Friday.
- BGBen Gilbert
Plenty of time for the servers to run. [chuckles]
- DRDavid Rosenthal
Yeah, they were like, "Okay, so we have whatever number of hours it is from Monday to Friday, let's use as much compute as we can to translate these couple sentences." [laughing]
- BGBen Gilbert
[laughing] Hey, learn the rules of the game and use them to your advantage.
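The n-gram model Franz describes can be sketched in a few lines. This is a toy bigram (2-gram) counter on an invented corpus, nothing like the two-trillion-word production system:

```python
from collections import Counter, defaultdict

# Toy bigram language model: count which word follows which,
# then predict the most likely continuation. Corpus is made up.
corpus = "the cat sat on the mat the cat ate the fish".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict(word):
    # Most frequent continuation observed after `word`
    return bigrams[word].most_common(1)[0][0]

print(predict("the"))   # "cat" -- seen twice after "the", once each for the rest
```

Scaling the same counting idea from one sentence to the entire web index is exactly what made the model astronomically expensive to query.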
- DRDavid Rosenthal
Exactly. So Jeff Dean, being the engineering equivalent of Chuck Norris, he's like, "Hmm, let me see your code." So Jeff goes and parachutes in and works with the Translate team for a few months, and he re-architects the algorithm to run on the words and the sentences in parallel instead of sequentially, 'cause when you're translating a set of sentences or a set of words in a sentence, you don't necessarily need to do it in order. You can break up the problem into different pieces, work on it independently. You can parallelize it.
- BGBen Gilbert
And you won't get a perfect translation, but, you know, imagine you just translate every single word, you can at least go translate those all at the same time in parallel, reassemble the sentence, and, like, mostly understand what the initial meaning was.
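That word-parallel idea can be sketched like this. The tiny English-to-French dictionary is invented, and real systems translate phrases with context, but it shows why the workload shards so cleanly across machines:

```python
from concurrent.futures import ThreadPoolExecutor

# Crude word-by-word translation, done in parallel and then reassembled.
# The dictionary is a made-up toy; real translation uses phrase context.
EN_TO_FR = {"the": "le", "cat": "chat", "sleeps": "dort"}

def translate_word(word):
    return EN_TO_FR.get(word, word)   # no context, word-for-word only

sentence = "the cat sleeps".split()
with ThreadPoolExecutor(max_workers=3) as pool:
    translated = list(pool.map(translate_word, sentence))  # order preserved

print(" ".join(translated))  # "le chat dort"
```

Because each word's lookup is independent, every piece can run on a different core, or a different data center, and `map` stitches the results back in order.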
- DRDavid Rosenthal
Yep, and as Jeff knows very well, because he and Sanjay basically built it with Urs Holze, Google's infrastructure is extremely parallelizable. Distributed, you can break up workloads into little chunks, send them all over the various data centers that Google has, reassemble the projects, return that to the user.
- BGBen Gilbert
They are the single best company in the world at parallelizing workloads across CPUs, across multiple data centers.
- DRDavid Rosenthal
CPUs, we're still talking CPUs here.
- BGBen Gilbert
Yep.
- DRDavid Rosenthal
And Jeff's work with the team gets that average sentence translation time down from twelve hours to one hundred milliseconds. [chuckles] And so then they ship it in Google Translate, and it's amazing.
- BGBen Gilbert
This sounds like a Jeff Dean fact. "Well, you know, it used to take twelve hours, and then Jeff Dean took a few months with it. Now it's a hundred milliseconds." [chuckles]
- DRDavid Rosenthal
[chuckles] Right. Right, right, right, right. So this is the first large, I'm using large in quotes here, language model used in production in a product at Google. They see how well this works. Like, hmm, maybe we could use this for other things, like predicting search queries as you type.
- BGBen Gilbert
[chuckles]
- 24:05 – 35:10
Sebastian Thrun and Geoff Hinton: Laying Google X Foundations
- DRDavid Rosenthal
Also in 2007 begins the sort of momentous intersection of several computer science professors on the Google campus. So in April of 2007, Larry Page hires Sebastian Thrun from Stanford to come to Google and work first part-time and then full-time on machine learning applications. Sebastian was the head of SAIL at Stanford, the Stanford Artificial Intelligence Laboratory. Legendary AI laboratory that was big in the sort of first wave of AI back in the '60s, '70s, when Larry's dad was active in the field.... then actually shut down for a while, and then had been restarted and re-energized here in the early two thousands, and Sebastian was the leader, the head of SAIL.
- BGBen Gilbert
Funny story about Sebastian, the way that he actually comes to Google, Sebastian was kind enough to speak with us to prep for this episode. I didn't realize it was basically an acqui-hire. He and some, I think it was grad students, were in the process of starting a company, had term sheets from Benchmark and Sequoia-
- DRDavid Rosenthal
Yes.
- BGBen Gilbert
And Larry came over and said, "What if we just acquire your company before it's even started in the form of signing bonuses?"
- DRDavid Rosenthal
Yes, probably a very good decision on their part. So SAIL, this group within the CS department at Stanford, not only had some of the most incredible, most accomplished professors and PhD AI researchers in the world, they also had this stream of Stanford undergrads that would come through and work there as researchers while they were working on their CS degrees or symbolic system degrees or, you know, whatever it was that they were doing as Stanford undergrads. One of those people was Chris Cox, who's the-
- BGBen Gilbert
No way!
- DRDavid Rosenthal
... chief product officer at Meta. Yeah, that was kinda how he-
- BGBen Gilbert
Ah
- DRDavid Rosenthal
... got his start in all of this, in AI, and, you know, obviously, Facebook and Meta are gonna come back into the story here in a little bit.
- BGBen Gilbert
Wow.
- DRDavid Rosenthal
You really can't make this up. Another undergrad who passed through SAIL while Sebastian was there, was a young freshman and sophomore who would later drop out of Stanford to start a company that went through Y Combinator's very first batch in summer 2005.
- BGBen Gilbert
I'm on the edge of my seat. Who is this?
- DRDavid Rosenthal
Any guesses?
- BGBen Gilbert
Uh, Dropbox, Reddit. I'm trying to think who else was in the first batch.
- DRDavid Rosenthal
Oh, no, no, but way more on the nose for this episode. The company was a failed local mobile social network.
- BGBen Gilbert
Oh! Sam Altman, Loopt.
- DRDavid Rosenthal
Sam Altman. [laughing]
- BGBen Gilbert
[laughing] That's amazing. He was at SAIL at the same time?
- DRDavid Rosenthal
He was at SAIL, yep, as an undergrad researcher.
- BGBen Gilbert
Wow.
- DRDavid Rosenthal
Wild, right? We told you that it's a very small set of people [chuckles] that are all doing all of this.
- BGBen Gilbert
Man, I miss those days, Sam presenting at the WWDC with Steve Jobs on stage with the double pop collar.
- DRDavid Rosenthal
Right?
- BGBen Gilbert
Different time in tech.
- DRDavid Rosenthal
The, [chuckles] the double popped collar-
- BGBen Gilbert
[laughing]
- DRDavid Rosenthal
... that was amazing. That was a vibe. That was a moment. Oh, man. All right, so April 2007, Sebastian comes over from SAIL into Google, Sebastian Thrun, and one of the first things he does over the next set of months is a project called Ground Truth for Google Maps.
- BGBen Gilbert
Which is essentially Google Maps.
- DRDavid Rosenthal
It is essentially Google Maps. So before Ground Truth, Google Maps existed as a product, but they had to get all the mapping data from a company called TeleAtlas.
- BGBen Gilbert
I think there were two, they were sort of a duopoly. Navteq was the other one.
- 35:10 – 47:05
Unsupervised Learning: Google Brain's Cat Paper Breakthrough
- DRDavid Rosenthal
But before we tell the Google Brain story, now is a great time to thank our friends at J.P. Morgan Payments.
- BGBen Gilbert
Yes. So today, we are going to talk about one of the core components of J.P. Morgan Payments, their Treasury Solutions. Now, treasury is something that most listeners probably do not spend a lot of time thinking about, but it's fundamental to every company.
- DRDavid Rosenthal
Yep. Treasury used to be just a back-office function, but now great companies are using it as a strategic lever. With J.P. Morgan Payments Treasury Solutions, you can view and manage all your cash positions in real time and all of your financial activities across a hundred and twenty currencies in two hundred countries.
- BGBen Gilbert
And the other thing that they acknowledge, really in their whole strategy, is that every business has its own quirks, so it's not a cookie-cutter approach. They work with you to figure out what matters most for you and your business and then help you gain clarity, control, and confidence.
- DRDavid Rosenthal
So whether you need advanced automation or just want to cut down on manual processes and approvals, their real-time Treasury Solutions are designed to keep things running smoothly, whether your treasury is in the millions or billions, or perhaps, like the company we're talking about this episode, in the hundreds of billions of dollars.
- BGBen Gilbert
And they have some great strategic offerings, like Pay by Bank, which lets customers pay you directly from their bank account. It's simple, secure, tokenized, and you get faster access to funds and enhanced data to optimize revenue and reduce fees. This lets you send and receive real-time payments instantly, just with a single API connection to J.P. Morgan.
- DRDavid Rosenthal
And because J.P. Morgan's platform is global, that one integration lets you access forty-five countries and counting and lets you scale basically infinitely as you expand. As we've said before, J.P. Morgan Payments moves ten trillion dollars a day, so scale is not an issue for your business.
- BGBen Gilbert
Not at all. If you're wondering how to actually manage all that global cash, J.P. Morgan again has you covered with their liquidity and account solutions that make sure you have the right amount of cash in the right currencies in the right places for what you need.
- DRDavid Rosenthal
... So whether you're expanding into new markets or just want more control over your funds, J.P. Morgan Payments is the partner you want to optimize liquidity, streamline operations, and transform your treasury. To learn more about how J.P. Morgan can help you and your company, just go to jpmorgan.com/acquired and tell them that Ben and David sent you.
- BGBen Gilbert
All right, David, so Google Brain.
- DRDavid Rosenthal
So when Sebastian left Stanford and joined Google full-time, of course, somebody else had to take over SAIL, and the person who did is another computer science professor, brilliant guy named Andrew Ng.
- BGBen Gilbert
This is, like, all the hits.
- DRDavid Rosenthal
All the hits. This is all the AI hits in this episode. [chuckles]
- BGBen Gilbert
Yes.
- DRDavid Rosenthal
So what does Sebastian do? He recruits Andrew to- [laughing] ... come part-time, start spending a day a week on the Google campus. And this coincides right with the start of X and Sebastian formalizing this division. So one day in the twenty ten, twenty eleven timeframe, Andrew's spending his day a week on the Google campus, and he bumps into, who else? Jeff Dean. And Jeff Dean is telling Andrew about what he and Franz have done with language models and what Geoff Hinton is doing in deep learning. Of course, Andrew knows all this, and Andrew's talking about what he and SAIL are doing at Stanford, and they decide, "You know, the time might finally be right to try and take a real big swing on this within Google and build a massive, really large, deep learning model in the vein of what Geoff Hinton has been talking about on highly parallelizable Google infrastructure."
- BGBen Gilbert
And when you say the time might be right, Google had tried twice before, and neither project really worked. They tried this thing called Brains on Borg. Borg is sort of an internal system that they use to run all of their infrastructure. They tried the Cortex project, and neither of these really worked. So there's a little bit of scar tissue in the sort of research group at Google of, "Are large-scale neural networks actually going to work for us on Google infrastructure?"
- DRDavid Rosenthal
So the two of them, Andrew Ng and Jeff Dean, pull in Greg Corrado, who is a neuroscience PhD and amazing researcher who was already working at Google. And in two thousand eleven, the three of them launch the second official project within X, appropriately enough, called Google Brain. And the three of them get to work building a really, really big, deep neural network model.
- BGBen Gilbert
And if they're going to do this, they need a system to run it on. You know, Google is all about taking this sort of frontier research and then doing the architectural and engineering system to make it actually run.
- DRDavid Rosenthal
Yes. So Jeff Dean is working on this system, on the infrastructure, and he decides to name the infrastructure DistBelief, which of course, is a pun both on the distributed nature of the system and also on, of course, the word disbelief, because-
- BGBen Gilbert
No one thought it was going to work. [laughing]
- DRDavid Rosenthal
Most people in the field thought this was not going to work, and most people in Google thought this was not going to work.
- BGBen Gilbert
And here's a little bit on why, and it's a little technical, but follow me for a second. All of the research from that period of time pointed to the idea that you needed to be synchronous, so all the compute needed to be sort of really dense, happening on a single machine with really high parallelism, kind of like what GPUs do, that you really would want it all sort of happening in one place so it's really easy to kind of go look up and see, hey, what are the computed values for everything else in the system before I take my next move? What Jeff Dean wrote with DistBelief was the opposite. It was distributed across a whole bunch of CPU cores and potentially all over a data center or maybe even in different data centers. So in theory, this is really bad because it means you would need to be constantly waiting around on any given machine for the other machines to sync their updated parameters before you could proceed. But instead, the system actually worked asynchronously without bothering to go and get the latest parameters from other cores. So you were sort of updating parameters on stale data. You would think that wouldn't work. The crazy thing is it did.
- DRDavid Rosenthal
Yes.
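The asynchronous scheme Ben describes can be sketched as a "Hogwild"-style toy: several workers hammer one shared weight vector with no locking, each reading possibly stale values, and it still converges. DistBelief's real parameter-server design was far more elaborate, and the data and model here are invented:

```python
import random
import threading

# Asynchronous SGD sketch: 4 workers update shared weights with no locks,
# reading stale values -- and the fit still lands near the true weights.
random.seed(0)
TRUE_W = [2.0, -3.0]
data = []
for _ in range(2000):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    data.append((x, TRUE_W[0] * x[0] + TRUE_W[1] * x[1]))

w = [0.0, 0.0]  # shared parameters, deliberately unlocked

def worker(examples, lr=0.05):
    for x, y in examples:
        err = w[0] * x[0] + w[1] * x[1] - y  # read possibly stale weights
        w[0] -= lr * err * x[0]              # racy in-place updates
        w[1] -= lr * err * x[1]

threads = [threading.Thread(target=worker, args=(data[i::4],)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(w)  # close to [2.0, -3.0] despite the stale reads
```

The counterintuitive result, demonstrated at tiny scale here and at massive scale by DistBelief, is that gradient updates are noisy anyway, so a little extra staleness barely hurts convergence.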
- BGBen Gilbert
Okay, so you've got DistBelief. What do they do with it now? They want to do some research. So they try out, "Can we do cool neural network stuff?" And what they do in a paper that they submitted in twenty eleven, right at the end of the year, is... I'll give you the name of the paper first, "Building High-Level Features Using Large-Scale Unsupervised Learning," but everyone just calls it the cat paper. [laughing]
- DRDavid Rosenthal
The cat paper. [laughing]
- BGBen Gilbert
You talk to anyone at Google, you talk to anyone in AI, they're like, "Oh, yeah, the cat paper." What they did was they trained a large nine-layer neural network to recognize cats from unlabeled frames of YouTube videos using sixteen thousand CPU cores on a thousand different machines. And listeners, just to, like, underscore how seminal this is, we actually talked with Sundar in prep for the episode, and he cited seeing the cat paper come across his desk as one of the key moments that sticks in his brain in Google's story.
- DRDavid Rosenthal
Yeah. A little later on, they would do a TGIF, where they would present the results of the cat paper, and you talk to people at Google, they're like, "That TGIF, oh, my God, that's when it all changed."
- BGBen Gilbert
Yeah. It proved that large neural networks could actually learn meaningful patterns without supervision and without labeled data. And not only that, it could run on a distributed system that Google built to actually make it work on their infrastructure. And that is a huge unlock of the whole thing. Google's got this big infrastructure asset. Can we take this theoretical computer science idea that the researchers have come up with and use DistBelief to actually run it on our system?
- DRDavid Rosenthal
Yep, that is the amazing technical achievement here. But that is almost secondary to the business impact of the cat paper. I think it's not that much of a leap to say that the cat paper led to probably hundreds of billions of dollars of revenue generated by Google and Facebook and ByteDance over the next decade.
- BGBen Gilbert
Definitely, pattern recognizers in data.
- 47:05 – 1:00:02
From AlexNet to DeepMind: The GPU AI Revolution
- DRDavid Rosenthal
Jensen at NVIDIA always calls AlexNet the Big Bang moment for AI.
- BGBen Gilbert
Yes.
- DRDavid Rosenthal
So we talked about Geoff Hinton. Back at the University of Toronto, he's got two grad students who he's working with in this era, Alex Krizhevsky and Ilya Sutskever.
- BGBen Gilbert
Of course.
- DRDavid Rosenthal
Future-
- BGBen Gilbert
Coauthor
- DRDavid Rosenthal
... cofounder and chief scientist of OpenAI. And the three of them are working with Geoff's deep neural network ideas and algorithms to create an entry for the famous ImageNet competition in computer science.
- BGBen Gilbert
This is Fei-Fei Li's thing from Stanford.
- DRDavid Rosenthal
It is an annual machine vision algorithm competition, and what it was, was Fei-Fei had assembled a database of fourteen million images that were hand-labeled. Famously, she used Mechanical Turk on Amazon, I think, to get them all hand-labeled.
- BGBen Gilbert
Yes, I think that's right.
- DRDavid Rosenthal
And so then the competition was: which team can write the algorithm that, without looking at the labels, so just seeing the images, could correctly identify the largest percentage of them? The best algorithms that would win the competitions year over year were still getting more than a quarter of the images wrong. So, like, a seventy-five percent success rate, great, way worse than a human.
- BGBen Gilbert
Can't use it for much in a production setting when a quarter of the time you're wrong.
- DRDavid Rosenthal
So then, the two thousand twelve competition, along comes AlexNet. Its error rate was fifteen percent, still high, but a ten-percentage-point drop from the previous best, a twenty-five percent error rate, all the way down to fifteen in one year. A leap like that had never happened before.
- BGBen Gilbert
It's forty percent better than the next best-
- DRDavid Rosenthal
Yes
- BGBen Gilbert
... on a relative basis.
- DRDavid Rosenthal
Yes.
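The arithmetic behind the "forty percent better" framing is worth making explicit: a ten-point absolute drop in error rate, measured against the previous twenty-five percent error rate, is a forty percent relative reduction in errors. A quick back-of-envelope:

```python
# Rough numbers from the 2012 ImageNet results as discussed above.
prev_error = 0.25      # best pre-2012 error rate (~25%)
alexnet_error = 0.15   # AlexNet's error rate (~15%)

absolute_drop = prev_error - alexnet_error         # percentage points removed
relative_improvement = absolute_drop / prev_error  # fraction of errors eliminated

print(f"{absolute_drop:.0%} absolute, {relative_improvement:.0%} relative")
# → 10% absolute, 40% relative
```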
- BGBen Gilbert
And why is it so much better, [laughing] David? What did they figure out that would create a four trillion dollar company in the future?
- DRDavid Rosenthal
[laughing] So what Geoff and Alex and Ilya did is they knew, like we've been talking about all episode, that deep neural networks had all this potential, and Moore's Law had advanced enough that you could use CPUs to create a few layers. And they had the aha moment: what if we re-architected this stuff not to run on CPUs, but to run on a whole different class of computer chips that were, by their very nature, highly, highly, highly parallelizable, video game graphics cards, made by the leading company in the space at the time, NVIDIA. [laughing] Not obvious at the time, and especially not obvious that this highly advanced, cutting-edge, academic computer science research-
- BGBen Gilbert
That was being done on supercomputers, usually.
- DRDavid Rosenthal
That was being done on supercomputers with incredible CPUs, would use these toy video game cards.
- BGBen Gilbert
That retail for $1,000.
- DRDavid Rosenthal
Yeah, well, less at that point in time, a couple hundred bucks. So the team in Toronto, they go out to, like, the local Best Buy or something. They buy two NVIDIA GeForce GTX 580s, which were NVIDIA's top-of-the-line gaming cards at the time. The Toronto team rewrites their neural network algorithms in CUDA, NVIDIA's programming language. They train it on these two off-the-shelf GTX 580s, and this is how they achieve their deep neural network and do forty percent better than any other entry in the ImageNet competition. So when Jensen says that this was the Big Bang moment of artificial intelligence, A, he's right. This shows everybody that, holy crap, if you can do this with two off-the-shelf GTX 580s, imagine what you could do with more of them or with specialized chips. And B, this event is what sets NVIDIA on the path from a somewhat struggling PC gaming accessory maker to the leader of the AI wave and the most valuable company in the world today.
- BGBen Gilbert
And this is how AI research tends to work: there's some breakthrough that gets you this big step change, and then there's actually a multi-year process of optimizing from there, where you get these kind of diminishing returns curves on breakthroughs, where the first half of the advancement happens all at once, and then the second half takes many years after that to figure out. But it's rare and amazing, and it must be so cool when you have an idea, you do it, and then you realize, "Oh, my God, I just found the next giant leap in the field."
- DRDavid Rosenthal
It's like I unlocked the next level, to use the video game analogy. [laughing]
- BGBen Gilbert
Yes.
- DRDavid Rosenthal
I leveled up. So after AlexNet, the whole computer science world is abuzz.
- BGBen Gilbert
People are starting to stop doubting neural networks at this point.
- DRDavid Rosenthal
Yes. So after AlexNet, the three of them from Toronto, Geoff Hinton, Alex Krizhevsky, and Ilya Sutskever, do the natural thing. They start a company called DNNResearch, Deep Neural Network Research. This company does not have any products. This company has AI researchers.
- BGBen Gilbert
Who just won a big competition.
- 1:00:02 – 1:40:22
The DeepMind Acquisition: AlphaGo and AI's New Frontier
- DRDavid Rosenthal
Google.
- BGBen Gilbert
They bought this thing for... We'll talk about the purchase price, but it's worth, what, five hundred billion dollars today? I mean, this is as good as Instagram or YouTube in terms of greatest acquisitions of all time.
- DRDavid Rosenthal
Hundred percent. So I remember when this deal happened, just like I remember when the Instagram deal happened.
- BGBen Gilbert
'Cause the number was big at the time.
- DRDavid Rosenthal
It was big, but I remember it for a different reason. When Facebook bought Instagram, it was like: Oh, my God, this is... Wow, what a tectonic shift in the landscape of tech. But in January 2014, I remember reading on TechCrunch this random news-
- BGBen Gilbert
Right. You're like, "Deep what?"
- DRDavid Rosenthal
-that Google is spending a lot of money to buy something in London that I've never heard of, [chuckles]
- BGBen Gilbert
[chuckles]
- DRDavid Rosenthal
... that's working on artificial intelligence, question mark?
- BGBen Gilbert
Right. This really illustrates how outside of mainstream tech AI was at the time.
- DRDavid Rosenthal
Yeah, and then you dig in a little further, and you're like, "This company doesn't seem to have any products," and it also doesn't even really say anything on its website about what DeepMind is. It says it is a, quote, unquote, "cutting-edge artificial intelligence company."
- BGBen Gilbert
Wait, did you look this up on the Wayback Machine?
- DRDavid Rosenthal
I did. I did.
- BGBen Gilbert
Oh, nice!
- DRDavid Rosenthal
To build general-purpose learning algorithms for simulations, e-commerce, and games.... This is 2014. This does not compute, does not register.
- BGBen Gilbert
Simulations, e-commerce, and games. It's kind of a random smattering of-
- DRDavid Rosenthal
Exactly. It turns out, though, not only was that description of what DeepMind was fairly accurate, this company and this purchase of it by Google was the butterfly-flapping-its-wings moment that directly leads to OpenAI, ChatGPT, Anthropic, and basically everything-
- BGBen Gilbert
Certainly Gemini
- DRDavid Rosenthal
- that we know. Yeah, Gemini directly in the world of AI today.
- BGBen Gilbert
And probably xAI, given Elon's involvement.
- DRDavid Rosenthal
Yeah, of course, xAI.
- BGBen Gilbert
In a weird way, it sort of leads to Tesla self-driving, too, with Karpathy.
- DRDavid Rosenthal
Yeah, definitely. Okay, so what is the story here? DeepMind was founded in 2010 by a neuroscience PhD named Demis Hassabis-
- BGBen Gilbert
Who previously started a video game company?
- DRDavid Rosenthal
Oh, yeah, and a post-doc named Shane Legg at University College London, and a third co-founder, who was one of Demis's friends from growing up, Mustafa Suleyman. This was unlikely, [laughing] to say the least.
- BGBen Gilbert
This would go on to produce a knight and a Nobel Prize winner.
- DRDavid Rosenthal
Yes. So Demis, the CEO, was a childhood chess prodigy-turned-video game developer, who, when he was age seventeen in 1994, he had gotten accepted to the University of Cambridge, but he was too young, and the university told him, "Hey, take a, you know, gap year. Come back." He decided that he was gonna go work at a video game developer, at a video game studio called Bullfrog Productions for the year, and while he's there, he created the game Theme Park, if you remember that. It was like a Theme Park version of SimCity. This was a big game. This was very commercially successful. Rollercoaster Tycoon would be sort of a clone of this that would have many, many sequels over the years.
- BGBen Gilbert
Oh, I played a ton of that.
- DRDavid Rosenthal
Yeah. It sells fifteen million copies in the mid-nineties.
- BGBen Gilbert
Wow!
- 1:40:22 – 1:51:25
Building TPUs: Google's Hardware Bet on AI
- BGBen Gilbert
Let's go back to Google, 'cause last we sort of checked in on them, yeah, they bought DeepMind, but they had their talent raided, and I don't want you to get the wrong impression about where Google is sitting just because some people left to go to OpenAI. So back in 2013, when Alex Krizhevsky arrives at Google with Geoff Hinton and Ilya Sutskever, he was shocked to discover that all their existing machine learning models were running on CPUs. People had asked in the past for GPUs, since machine learning workloads were well-suited to run in parallel, but Google's infrastructure team had pushed back and said, "The added complexity of expanding and diversifying the fleet, let's keep things simple. That doesn't seem important for us." [chuckles]
- DRDavid Rosenthal
"We're a CPU shop here."
- BGBen Gilbert
Yes, and so to quote from Genius Makers, "In his first days at the company, he went out and bought a GPU machine," this is Alex, "from a local electronics store, stuck it in the closet down the hall from his desk, plugged it into the network, and started training his neural networks on this lone piece of hardware," just like he did in academia, except this time, Google's paying for the electricity. Obviously, one GPU was not sufficient, especially as more Googlers wanted to start using it, too, and Jeff Dean and Alan Eustace had also come to the conclusion that DistBelief, while amazing, had to be rearchitected to run on GPUs and not CPUs. So spring of 2014 rolls around, Jeff Dean and John Giannandrea-
- DRDavid Rosenthal
Mm.
- BGBen Gilbert
- who we haven't talked about this episode-
- DRDavid Rosenthal
Yeah, JG.
- BGBen Gilbert
Yes, you might be wondering, "Wait, isn't that the Apple guy?" Yes, he went on to be Apple's head of AI, who, at this point in time, was at Google and oversaw Google Brain, 2014. They sit down to make a plan for how to actually formally put GPUs into the fleet of Google's data centers, which is a big deal. It's a big change. But they're seeing enough traction with neural networks that they know to do this.
- DRDavid Rosenthal
Yeah, after AlexNet, it's just a matter of time.
- BGBen Gilbert
Yeah. So they settle on a plan to order forty thousand GPUs-
- DRDavid Rosenthal
[laughing]
- BGBen Gilbert
- from NVIDIA-
- DRDavid Rosenthal
Yeah, of course. Who else are you gonna order 'em from?
- BGBen Gilbert
... for a cost of a hundred and thirty million dollars. That's a big enough price tag that the request gets elevated to Larry Page, who personally approves it, even though finance wanted to kill it, because he goes, "Look, the future of Google is deep learning." As an aside, let's look at NVIDIA at the time. This is a giant, giant order. Their total revenue was four billion dollars. This is one order for a hundred and thirty million.
- DRDavid Rosenthal
I mean, NVIDIA's primarily a consumer graphics card company at this point.
- BGBen Gilbert
Yes, and their market cap is ten billion dollars.
- DRDavid Rosenthal
[chuckles]
- BGBen Gilbert
It's almost like Google gave NVIDIA a secret, that, hey, not only does this work in research, like the ImageNet competition, but neural networks are valuable enough to us as a business to make a hundred-plus million-dollar investment in right now, no questions asked. We gotta ask Jensen about this at some point. This had to be a tell.
- DRDavid Rosenthal
Mm-hmm.
- BGBen Gilbert
This had to really give NVIDIA the confidence, "Oh, we should invest way ahead of this being a giant thing in the future." So all of Google wakes up to this idea. They start really putting it into their products. Google Photos happened, Gmail starts offering typing suggestions. David, as you pointed out earlier, Google's giant AdWords business started finding more ways to make more money with deep learning. In particular, when they integrated it, they could start predicting what ads people would click in the future, and so Google started spending hundreds of millions more on GPUs on top of that hundred and thirty million, but very quickly paying it back from their ad system. So it became more and more of a no-brainer to just buy as many GPUs as they possibly could. But once neural nets started to work, anyone using them, especially at Google scale, kind of had this problem: well, now we need to do giant amounts of matrix multiplications anytime anybody wants to use one. The matrix multiplications are effectively how you do the propagation through the layers of the neural network. So you sort of have this problem.
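The point that serving a neural net boils down to matrix multiplications can be sketched directly: a forward pass through a toy two-layer network is essentially two matrix-vector multiplies with a nonlinearity in between. The layer sizes, weights, and input here are arbitrary stand-ins, not any real model:

```python
import random

random.seed(1)

def matmul(W, x):
    """Multiply a matrix (list of rows) by a vector. This is the hot loop
    that GPUs and TPUs exist to accelerate."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def relu(v):
    """Elementwise nonlinearity applied between layers."""
    return [max(0.0, a) for a in v]

# Two tiny layers with random weights (stand-ins for trained parameters):
# layer 1 maps 4 inputs -> 8 hidden units, layer 2 maps 8 -> 3 outputs.
W1 = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(8)]
W2 = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(3)]

x = [0.5, -0.2, 0.1, 0.9]       # one input example
hidden = relu(matmul(W1, x))     # layer 1: one matrix multiply + nonlinearity
output = matmul(W2, hidden)      # layer 2: another matrix multiply

print(len(output))  # → 3
```

Every prediction the model makes repeats this pattern, which is why usage at Google scale translates directly into enormous matmul demand.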
- DRDavid Rosenthal
Yes, totally. There's the inefficiency of it, but then there's also the business problem of, "Wait a minute, it looks like we're just gonna be shipping hundreds of millions, soon to be billions of dollars over to NVIDIA every year for the foreseeable future." [chuckles]
- BGBen Gilbert
Right. So there's this amazing moment right after Google rolls out speech recognition, their latest use case for neural nets, just on Nexus phones, 'cause, again, they don't have the infrastructure to support it on all Android phones. It becomes a super popular feature, and Jeff Dean does the math and figures out if people use this for, I don't know, call it three minutes a day, and we roll it out to all billion Android phones, we're gonna need twice the number of data centers that we currently have across all of Google just to handle it.
- DRDavid Rosenthal
Just for this feature, yeah.
- BGBen Gilbert
There's a great quote where Jeff goes to Urs Hölzle and goes, "We need another Google." [laughing] Or David, as you were hinting at, the other option is we build a new type of chip customized for just our particular use case.
- DRDavid Rosenthal
Yep. Matrix multiplication, tensor multiplication, a tensor processing unit, you might say.
- BGBen Gilbert
Ah, yes, wouldn't that be nice? So conveniently, Jonathan Ross, who's an engineer at Google, has been spending his twenty percent time at this point in history working on an effort involving FPGAs. These are essentially expensive but programmable chips that yield really fantastic results. So they decide to create a formal project to take that work, combine it with some other existing work, and build a custom ASIC, or an application-specific integrated circuit. So enter, David, as you said, the tensor processing unit made just for neural networks that is far more efficient than GPUs at the time, with the trade-off that you can't really use it for anything else. It's not good for graphics processing, it's not good for lots of other GPU workloads, just matrix multiplication and just neural networks. But it would enable Google to scale their data centers without having to double their entire footprint. So the big idea behind the TPU, if you're trying to figure out, like, what was the core insight: they use reduced computational precision. So it would take numbers like four thousand five hundred and eighty-six point eight two seven two and round it just to four thousand five hundred and eighty-six point eight, or maybe even just four thousand five hundred and eighty-six with nothing after the decimal point. And this sounds kinda counterintuitive at first. Why would you want less precise rounded numbers for this complicated math? The answer is efficiency. If you can do the heavy lifting in your software architecture, what's called quantization, to account for it, you can store information as less precise numbers. Then you can use the same amount of power and the same amount of memory and the same amount of transistors on a chip to do far more calculations per second. So you can either spit out answers faster or use bigger models. The whole thing behind the TPU is quite clever.
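The reduced-precision idea can be made concrete with the example number above, plus a minimal linear quantizer that maps floats onto 8-bit integers. This is an illustrative sketch, not the TPU's actual number format; the [0, 5000] range and 8-bit width are assumptions chosen to fit the example:

```python
x = 4586.8272

coarse = round(x, 1)   # keep one decimal place: 4586.8
coarser = int(x)       # drop everything after the decimal point: 4586

# A simple linear quantizer: map floats in [lo, hi] onto integers 0..255,
# so each value fits in one byte instead of four.
def quantize(v, lo, hi, bits=8):
    levels = 2 ** bits - 1
    q = round((v - lo) / (hi - lo) * levels)
    return max(0, min(levels, q))

def dequantize(q, lo, hi, bits=8):
    levels = 2 ** bits - 1
    return lo + q / levels * (hi - lo)

q = quantize(x, 0.0, 5000.0)        # the byte actually stored
approx = dequantize(q, 0.0, 5000.0) # what the math effectively sees

print(coarse, coarser, q, round(approx, 1))
# → 4586.8 4586 234 4588.2
```

The recovered value is off by a point or two, which is the trade the TPU makes: tolerate small rounding error, and in exchange pack four times as many numbers into the same memory and transistors.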
- DRDavid Rosenthal
Hmm.
- BGBen Gilbert
The other thing that has to happen with the TPU is it needs to happen now, 'cause it's very clear speech-to-text is a thing, it's very clear some of these other use cases at Google-
- DRDavid Rosenthal
Yeah, demand for all of this stuff that's coming out of Google Brain is through the roof immediately.
- BGBen Gilbert
Right, and we're not even to LLMs yet. It's just like everyone sort of expects some of this, whether it's computer vision in photos or speech recognition, like, it's just becoming a thing that we expect, and it's gonna flip Google's economics upside down if they don't have it. So the TPU was designed, verified, built, and deployed into data centers in fifteen months.
- DRDavid Rosenthal
Wow!
- 1:51:25 – 2:04:01
"Attention Is All You Need": The Transformer's Impact
- DRDavid Rosenthal
so many areas. And then in twenty seventeen, a paper gets published from eight researchers on the Google Brain team.
- BGBen Gilbert
Kinda quietly.
- DRDavid Rosenthal
These eight folks were obviously very excited about the paper and what it described and the implications of it, and they thought it would be very big. Google itself, "Uh, cool, this is, like, the next iteration of our language model work. Great!"
- BGBen Gilbert
Which is important to us, but are we sure this is the next Google? No.
- DRDavid Rosenthal
No. There are a whole bunch of other things we're working on that seem more likely to be the next Google. But this paper and its publication would actually be what gave OpenAI the opportunity-
- BGBen Gilbert
To build the next Google.
- DRDavid Rosenthal
- to grab the ball [chuckles] and run with it and build the next Google, because this is the Transformer paper.
- BGBen Gilbert
Okay, so where did the Transformer come from? Like, what was the latest thing that language models had been doing at Google?
- DRDavid Rosenthal
So coming out of the success of Franz Och's work on Google Translate and the improvements that happened there-
- BGBen Gilbert
In, like, the late two thousands-ish, two thousand and seven?
- DRDavid Rosenthal
Yeah, mid to late two thousands. They keep iterating on Translate, and then once Geoff Hinton comes on board and AlexNet happens, they switch over to a neural network-based language model for Translate.
- BGBen Gilbert
Which was dramatically better and, like, a big, crazy cultural thing, 'cause you've got these researchers parachuting in, again, led by Jeff Dean, saying, "I'm pretty sure our neural networks can do this way better than the classic methods that we've been using for the last ten years. What if we take the next several months and do a proof of concept?" They end up throwing away the entire old code base and just completely wholesale switching to this neural net. There's actually this great New York Times Magazine story that ran in twenty sixteen about it, and I remember reading the whole thing with my jaw on the floor, like, "Wow, neural networks are a big effing deal." And this was the year before the Transformer paper would come out.
- DRDavid Rosenthal
Before the Transformer paper, yes. So they do the rewrite of Google Translate, make it based on recurrent neural networks, which were state-of-the-art at that point in time, and it's a big improvement. But as teams within Google Brain and Google Translate keep working on it, there's some limitations, and in particular, a big problem was that they, quote-unquote, "forgot things too quickly." I don't know if it's exactly the right analogy, but you might say in sort of like today's Transformer world speak, you might say that their context window was pretty short.
- BGBen Gilbert
As these language models progressed through text, they needed to sort of remember everything they had read so that when they need to change a word later or come up with the next word, they could have a whole memory of the body of text to do that.
- DRDavid Rosenthal
So one of the ways that Google tries to improve this is to use something called long short-term memory networks, or LSTMs is the acronym that people use for this. And basically, what LSTMs do is they create a persistent or long short-term memory- [laughing]
- BGBen Gilbert
[chuckles]
- DRDavid Rosenthal
... You gotta use your brain a little bit here, for the model, so that it can keep context as it's going through a whole bunch of steps.
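The "persistent memory" David is describing can be sketched as a scalar toy LSTM cell: a cell state carries context forward, and gates decide what to forget, what to write, and what to expose at each step. The fixed gate weights below are made up for illustration; a real LSTM learns matrices of them. Note that the loop is inherently step-by-step, which is exactly the parallelization problem they run into:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c):
    """One step of a scalar toy LSTM. x: input, h: previous output,
    c: cell state (the persistent memory). Gate weights are illustrative."""
    f = sigmoid(1.5 * x + 0.5 * h)   # forget gate: how much old memory to keep
    i = sigmoid(1.0 * x - 0.5 * h)   # input gate: how much new info to write
    o = sigmoid(0.8 * x + 0.2 * h)   # output gate: how much memory to expose
    candidate = math.tanh(x + h)     # proposed new memory content
    c = f * c + i * candidate        # cell state carries context across steps
    h = o * math.tanh(c)             # this step's output
    return h, c

h, c = 0.0, 0.0
# Even when later inputs are zero, the cell state c persists context forward.
for x in [1.0, 0.5, -0.2, 0.0, 0.0]:
    h, c = lstm_step(x, h, c)
print(round(c, 3))
```

Each step depends on the previous step's `h` and `c`, so the sequence cannot be processed in parallel, in contrast to the attention approach that comes next in the story.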
- BGBen Gilbert
And people were pretty excited about LSTMs at first.
- DRDavid Rosenthal
People are thinking like, "Oh, LSTMs are what are gonna take language models and large language models mainstream."
- BGBen Gilbert
Right.
- DRDavid Rosenthal
And indeed, in twenty sixteen, they incorporated into Google Translate these LSTMs. It reduces the error rate by sixty percent. Huge jump.
- BGBen Gilbert
Yep.
- DRDavid Rosenthal
The problem with LSTMs, though: they were effective, but they were very computationally intensive, and they didn't parallelize that well. All the efforts coming out of AlexNet and then the TPU project were about parallelization: this is the future. This is how we're gonna make AI really work. LSTMs are a bit of a roadblock here.
- BGBen Gilbert
Yes.
- DRDavid Rosenthal
So a team within Google Brain starts searching for a better architecture that also has the attractive properties of LSTMs, that it doesn't forget context too quickly, but can parallelize and scale better.
- BGBen Gilbert
To take advantage of all these new architectures.
- DRDavid Rosenthal
Yes, and a researcher named Jakob Uszkoreit had been toying around with the idea of broadening the scope of, quote-unquote, "attention" in language processing. What if, rather than focusing on the immediate words, you told the model, "Hey, pay attention to the entire corpus of text, not just the next few words. Look at the whole thing, and then based on that entire context and giving your attention to the entire context, give me a prediction of what the next translated word should be"? Now, by the way, this is actually how professional human translators translate text. You don't just go word by word. I actually took a translation class in college, which was really fun. You read the whole thing of the original in the original language, you get and understand the context of what the original work is, and then you go back, and you start to translate it with the entire context of the passage in mind.
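The "pay attention to the entire context" mechanism can be sketched as single-query scaled dot-product attention, the core operation of the Transformer: score a query against every position's key, softmax the scores into weights, and take a weighted average of the values. The vectors below are toy made-up numbers, and this omits the multi-head and learned-projection machinery of the real architecture:

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    # Score the query against every position's key, across the whole sequence.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)  # how much attention each position receives
    # Weighted average of the value vectors: every position contributes.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]      # one key per token
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]  # one value per token
out = attention([1.0, 0.0], keys, values)
print([round(v, 2) for v in out])  # → [6.02, 3.98]
```

Crucially, every position's score can be computed at the same time, which is what makes this extremely parallelizable compared to stepping through an LSTM.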
- BGBen Gilbert
Hmm.
- DRDavid Rosenthal
So it would take a lot of computing power for the model to do this, but it is extremely parallelizable. So Jakob starts collaborating with a few other people on the Brain team. They get excited about this. They decide that they're gonna call this new technique the Transformer, because, one, that is literally what it's doing, it's taking in a whole chunk of information, processing it, understanding it, and then transforming it, and, two, they also love Transformers as kids. [laughing] That's not not why they named it the Transformer. [laughing]
- BGBen Gilbert
And it's taking in the giant corpus of text and storing it in a compressed format, right?
- 2:04:01 – 2:15:43
Elon's Exit, Microsoft's Entry: OpenAI's New Path
- DRDavid Rosenthal
Right after Google publishes the Transformer paper, in September of twenty seventeen, Elon gets really, really fed up with what's going on at OpenAI.
- BGBen Gilbert
There's, like, seven different strategies. Are we doing video games? Are we doing competitions? What's the plan?
- DRDavid Rosenthal
What is happening here, as best as I can tell, all you're doing is just trying to copy DeepMind. Meanwhile, I'm here building SpaceX and Tesla. Self-driving is becoming more and more clear as critical to the future of Tesla. I need AI researchers here, and I need great AI advancements to come out to help what we're doing at Tesla. OpenAI isn't cutting it, so he makes an ultimatum to Sam and the rest of the OpenAI board. He says, "I'm happy to take full control of OpenAI, and we can merge this into Tesla." [chuckles] I don't even know how that would be possible, to merge a nonprofit into Tesla.
- BGBen Gilbert
But in Elon land, if he takes over as CEO of OpenAI, it almost doesn't matter. We're just treating it as if it's the same company anyway, just like we do with the deals with all of my companies.
- DRDavid Rosenthal
Right. Or he's out completely, along with all of his funding, and Sam and the rest of the board are like, "No." [chuckles]
- BGBen Gilbert
And as we know now, they're sort of calling capital into the business. It's not like they actually got all the cash up front.
- DRDavid Rosenthal
Right. So they're only a hundred and thirty million-ish into the billion dollars of commitment. They don't reach a resolution, and by early twenty eighteen, Elon is out, along with him, the main source of OpenAI's funding. So either this is just a really, really, really bad misjudgment by Elon, or the sort of panic that this throws OpenAI into is the catalyst that makes them reach for the Transformer [chuckles] and say, "All right, we got to figure things out. Necessity is the mother of invention. Let's go for it."
- BGBen Gilbert
It's true. I don't know if during this personal tension between Elon and Sam, if they had already decided to go all in on Transformers or not. Because the thing you very quickly get to if you decide, Transformers, language models, we're going all in on that, you do quickly realize you need a bunch of data, you need a bunch of compute, you need a bunch of energy, and you need a bunch of capital. And so if your biggest backer is walking away, the three-D chess move is, "Oh, we got to keep him because we're about to pivot the company, and we need his capital for this big pivot we're doing." The four-D chess is, "If he walks away, maybe I can turn it into a for-profit company and then raise money into it and eventually generate enough profits to fund this extremely expensive new direction we're going in." I don't know which of those it was.
- DRDavid Rosenthal
Yeah, I don't know either. I suspect the truth is it's some of both.
- BGBen Gilbert
Yes, but either way, how nuts is it that, A, these things happened at the same time, and B, the company wasn't burning that much cash, and then they decided to go all in on, "We need to do something so expensive that we need to be a for-profit company in order to actually achieve this mission, 'cause it's just gonna require hundreds of billions of dollars for the far foreseeable future"?
- DRDavid Rosenthal
Yep. So in June of twenty eighteen, OpenAI releases a paper describing how they have taken the Transformer and developed a new approach of pre-training it on very large amounts of general text on the Internet and then fine-tuning that general pre-training to specific use cases. And they also announced that they have trained and run the first proof-of-concept model of this approach, which they are calling GPT-1, Generative Pre-trained Transformer version one.
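The two-stage recipe described here, generic pre-training followed by task-specific fine-tuning of the same model, can be sketched with a toy stand-in. A bigram counter plays the role of the Transformer, and both corpora are made up; the point is only that the second stage updates the same parameters the first stage produced:

```python
from collections import Counter, defaultdict

def train(model, corpus):
    """Accumulate next-word statistics; a real model updates weights instead."""
    for sentence in corpus:
        toks = sentence.split()
        for a, b in zip(toks, toks[1:]):
            model[a][b] += 1

model = defaultdict(Counter)

# Stage 1: pre-train on broad, generic text.
generic = ["the cat sat on the mat", "the dog sat on the rug"]
train(model, generic)

# Stage 2: fine-tune the *same* model on a small task-specific dataset.
task = ["the cat purred softly", "the cat purred again"]
train(model, task)

# After fine-tuning, the prediction following "cat" shifts toward the task data.
prediction = model["cat"].most_common(1)[0][0]
print(prediction)  # → purred
```

Before fine-tuning, "cat" would have predicted "sat" from the generic corpus; the small task dataset overrides it, which is the whole economic appeal of the approach: one expensive pre-training run, many cheap specializations.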
- BGBen Gilbert
Which we should say is right around the same time as BERT and right around the same time as another large language model based on the Transformer out of, here in Seattle, the Allen Institute.
- DRDavid Rosenthal
Yes, indeed.
- BGBen Gilbert
... So it's not as if this is heretical and a secret. Other AI labs, including Google's own, are doing it, but from the very beginning, OpenAI seemed to be taking this more seriously, given the cost of it would require betting the company if they continued down this path.
- DRDavid Rosenthal
Yep, or betting the nonprofit, [chuckles] betting the entity.
- BGBen Gilbert
Yes.
- DRDavid Rosenthal
We're gonna need some new terminology here.
- BGBen Gilbert
Yes.
- DRDavid Rosenthal
So Elon's just walked out the door. Where are they gonna get the money for this? Sam turns to one of the other board members of OpenAI, Reid Hoffman. Reid, just a year or so earlier, had sold LinkedIn to Microsoft, and Reid is now on the board of Microsoft. So Reid says, "Hey, why don't you come talk to Satya about this?"
- BGBen Gilbert
Do you know where he actually talks to Satya?
- DRDavid Rosenthal
Oh, I do. Oh, I do. [laughing]
- BGBen Gilbert
[laughing]
- DRDavid Rosenthal
In July of twenty eighteen, they set a meeting for Sam Altman and Satya Nadella to sit down while they're both at the Allen & Company Sun Valley- [chuckles]
- BGBen Gilbert
[chuckles]
- DRDavid Rosenthal
... Conference in Sun Valley, Idaho.
- BGBen Gilbert
It's perfect.
- DRDavid Rosenthal
And while they're there, they hash out a deal for Microsoft to invest one billion dollars into OpenAI in a combination of both cash and Azure cloud credits, and in return, Microsoft will get access to OpenAI's technology, get an exclusive license to OpenAI's technology for use in Microsoft's products. And the way that they will do this is OpenAI, the nonprofit, will create a captive for-profit entity called OpenAI LP, controlled by the nonprofit OpenAI Inc., and Microsoft will invest into the captive for-profit entity. Reid Hoffman joins the board of this new structure, along with Sam, Ilya, Greg Brockman, Adam D'Angelo, and Tasha McCauley. And thus, the modern OpenAI for-profit/nonprofit, question mark, is created.
- BGBen Gilbert
The thing that's still being figured out, even today, here in twenty twenty-five, is created. This is like the complete history of AI. This is not just the Google AI episode.
- DRDavid Rosenthal
Well, these things are totally inextricable, and I was just gonna say, this is the Google Part Three episode. Microsoft, they're back! Microsoft is Google's mortal enemy.
- BGBen Gilbert
Yes.
- 2:15:43 – 2:37:32
The Chat GPT Shockwave: Google's Code Red Response
- DRDavid Rosenthal
Yes, GPT-2.
- BGBen Gilbert
This was the first time I heard about it. Data scientists around Seattle were talking about this cool-
- DRDavid Rosenthal
Right. So after the first Microsoft partnership, the first billion-dollar investment, in 2019, OpenAI releases GPT-2, which is still early, but very promising, that can do a lot of things.
- BGBen Gilbert
A lot of things, but it required an enormous amount of creativity on your part. You kind of had to be a developer to use it, and if you were a consumer, there was a very heavy load put on you. You had to go write a few paragraphs and then paste those few paragraphs into the language model, and then it would suggest a way to finish what you were writing based on the source paragraphs, but it wasn't interactive.
- DRDavid Rosenthal
Yes, it was not a chat interface. [chuckles]
- BGBen Gilbert
Yes.
- DRDavid Rosenthal
There was no interface, essentially, for it.
- BGBen Gilbert
It was an API.
- DRDavid Rosenthal
But it can do things like, obviously, translate text. I mean, Google's been doing that for a long time, but GPT-2, you could do stuff like make up a fake news headline and give it to GPT-2, and it would write a whole article. [chuckles]
- BGBen Gilbert
[chuckles]
- DRDavid Rosenthal
You would read it, and you'd be like, "Uh, sounds like it was written by a bot."
- BGBen Gilbert
Yeah.
- DRDavid Rosenthal
But again, there was no front door to it for normal people. [chuckles] You had to really be willing to wade in the muck to use this thing. So then, the next year, in June of 2020, GPT-3 comes out. Still no front door, you know, user interface to the model, but it's very good. GPT-2 showed the promise of what was possible. GPT-3, it's starting to be in the conversation of, "Can this thing pass the Turing test?"
- BGBen Gilbert
Oh, yeah.
- DRDavid Rosenthal
You have a hard time distinguishing between articles that GPT wrote and articles that humans wrote. It's very good, and there starts to be a lot of hype around this thing.
- BGBen Gilbert
And so even though consumers aren't really using it, the broader awareness is that there's something interesting on the horizon. I think the number of AI pitch decks that VCs are seeing is starting to tick up around this time.
- DRDavid Rosenthal
As is the NVIDIA stock price.
- BGBen Gilbert
Yes.
- DRDavid Rosenthal
So then in the next year, in the summer of 2021, Microsoft releases GitHub Copilot using GPT-3. This is the first, not just Microsoft product that comes out with GPT baked into it, but first-
- BGBen Gilbert
Productization
- DRDavid Rosenthal
... product anywhere. [chuckles] Yeah, first productization of GPT.
- BGBen Gilbert
Yes, of any OpenAI technology.
- DRDavid Rosenthal
Yeah. It's big. This starts a massive change in how software gets written in the world.
- BGBen Gilbert
Slowly, then all at once. It's one of these things where at first it was just a few software engineers, and there were a lot of whispers of, "How cool is this? It makes me a little bit more efficient." And now you get all these comments like, "Seventy-five percent of all companies' code is written with AI."
- DRDavid Rosenthal
Yep. So after that, Microsoft invests another two billion dollars in OpenAI, which seemed like a lot of money at the time. So that takes us to the end of 2021. There's an interesting kind of context shift that happens around here.
- BGBen Gilbert
Yeah, the bottom falls out on tech stocks. Crypto, the broader markets, really, everyone suddenly goes from risk on to risk off, and part of it was war in Ukraine, but a lot of it was interest rates going up, and Google gets hit really hard. The high-water mark was November 19th of 2021. Google was right at two trillion dollars of market cap. About a year after that slide began, they were worth a trillion dollars, nearly a fifty percent drawdown.
- DRDavid Rosenthal
Wow! So towards the end of 2022, leading up to the launch of ChatGPT-
- BGBen Gilbert
People, I think, are starting to realize Google's slow. They're slow to react to things. It feels like they're an old, crusty company. Are they like the Microsoft of the 2000s, where they haven't had a breakthrough product in a while? People are not bullish on the future of Google, and then ChatGPT comes out.
- DRDavid Rosenthal
Yeah. Wow. Which means if you were bullish on Google back then- [chuckles] ... and contrarian, you could have invested at a trillion-dollar market cap.
- BGBen Gilbert
Which is interesting, like in October of '22, the market was saying that the forthcoming AI wave will not be a strength for Google. Or maybe what it was saying is, "We don't even know anything about a forthcoming AI wave, 'cause people are talking about AI, but they've been talking about VR, and they've been talking about crypto, and they've been talking about all this frontier tech, and, like, that's not the future at all. This company just feels slow and unadaptive." And slow and unadaptive at that point in history, I think, would've been a fair characterization. They had an internal chatbot, right?
- 2:37:32 – 2:46:58
Unifying AI: Gemini and the Google DeepMind Merger
- DRDavid Rosenthal
better. [laughing] Code Red goes out December 2022.
- BGBen Gilbert
Bard, baby. Launch Bard!
- DRDavid Rosenthal
Oh, boy. Well, even before that, January '23, when OpenAI hits a hundred million registered users for ChatGPT, Microsoft announces they are investing another ten billion dollars in OpenAI and says that they now own forty-nine percent of the for-profit entity. Incredible in and of itself, but now think about this through the Google lens: Microsoft, our enemy, arguably now owns... Obviously, in retrospect here, they don't own OpenAI, but it seems at the time, like, "Oh, my God, Microsoft might now own OpenAI, which is our first true existential threat in our history as a company." [chuckles]
- BGBen Gilbert
Not great, Bob.
- DRDavid Rosenthal
So then February 2023, the Bing integration launches. Satya has the quote about wanting to make Google dance. Meanwhile, Google is scrambling internally to launch AI products as fast as possible. So the first thing they do is they take the LaMDA model and the chatbot interface to it. They rebrand it as Bard.
- BGBen Gilbert
They ship that publicly.
- DRDavid Rosenthal
And they release it immediately. February 2023, ship it publicly. [chuckles] Available GA to anyone.
- BGBen Gilbert
Which maybe was the right move, but, God, it was a bad product.
- DRDavid Rosenthal
It was really bad.
- BGBen Gilbert
I didn't know the term at the time, RLHF, but it was clear it was missing a component of... some magic that ChatGPT had, this reinforcement learning from human feedback, where you could really tune the appropriateness, the tone, the voice, the sort of correctness of the responses. It just wasn't there.
- DRDavid Rosenthal
Yep. So to make matters worse, in the launch video for Bard, a choreographed, pre-recorded video where they're showing conversations with Bard, Bard gives an inaccurate factual response [chuckles] to one of the queries that they include in the video.
- BGBen Gilbert
This is one of the worst keynotes in history.
- DRDavid Rosenthal
After the Bard launch and this keynote, Google's stock drops eight percent on that day, and then, like we were saying, once the actual product comes out, it becomes clear it's just not good.
- BGBen Gilbert
Yep.
- DRDavid Rosenthal
And it pretty quickly becomes clear it's not just that the chatbot isn't good, it's the model isn't good. So in May, they replace LaMDA with a new model from the Brain team called PaLM. It's a little bit better, but it's still clearly behind not only GPT-3.5 but also GPT-4, which OpenAI had come out with in March of twenty twenty-three and which is even better.
- BGBen Gilbert
Oof!
- DRDavid Rosenthal
You can access that now through ChatGPT. And here is where Sundar makes two really, really big decisions. Number one, he says, "We cannot have two AI teams within Google anymore. We're merging Brain and DeepMind into one entity called Google DeepMind."
- BGBen Gilbert
Which is a giant deal. This is in full violation of the original deal terms of bringing DeepMind in.
- DRDavid Rosenthal
Yep, and the way he makes it work is he says, "Demis, you are now CEO- [laughing]
- BGBen Gilbert
[laughing]
- DRDavid Rosenthal
-of the AI division of Google, Google DeepMind. This is all hands on deck, and you and DeepMind are gonna lead the charge. You're gonna integrate with Google Brain, and we need to change all of the past ten years of culture around building and shipping AI products within Google."
- BGBen Gilbert
To further illustrate this, when Alphabet became Alphabet, they had all these separate companies, but things that were really core to Google, like YouTube, actually stayed a part of Google. DeepMind was its own company. That's how separate this was. They're working on their own models. In fact, those models are predicated on reinforcement learning. That was the big thing that DeepMind had been working on the whole time, and so reading in between the lines, it's Sundar looking at his two AI labs and going, "Look, I know you two don't actually get along that well, but look, I don't care that you had different charters before. I am taking the responsibility of Google Brain and giving it to DeepMind, and DeepMind is absorbing the Google Brain team." I think that's what you should sort of read into it, because as you look at where the models went from here, they kinda came from DeepMind.
- DRDavid Rosenthal
Yep. There's a little bit of interesting backstory to this, too. So Mustafa Suleyman, the third co-founder of DeepMind, at some point before this-
- BGBen Gilbert
He became, like, the head of Google AI policy or something.
- DRDavid Rosenthal
He had already shifted over to Brain and to Google.
- BGBen Gilbert
Hmm.
- DRDavid Rosenthal
He stayed there for a little while, and then he ended up getting close with, who else? Reid Hoffman. [laughing] Remember, Reid is on the ethics board for DeepMind, and Mustafa and Reid leave and go found Inflection AI. Fast-forward now to twenty twenty-four, after the absolute insanity that goes down at OpenAI around Thanksgiving twenty twenty-three, when Sam Altman gets fired over the weekend during [chuckles] Thanksgiving, and then brought back by Monday when the whole team threatened to quit and go to Microsoft.
- BGBen Gilbert
OpenAI loves Thanksgiving. Can't wait for this year.
- DRDavid Rosenthal
[chuckles] They love Thanksgiving! Yeah, gosh. After all that, which certainly strains the Microsoft relationship... Remember, again, Reid is on the board at Microsoft. Microsoft does one of these acqui-hire-type deals with Inflection AI and brings Mustafa in as the head of AI for Microsoft. [laughing]
- BGBen Gilbert
Crazy.
- 2:46:58 – 3:03:44
From DARPA Challenge to Robotaxis: Waymo's Long Road
- DRDavid Rosenthal
tell us the Waymo story.
- BGBen Gilbert
Awesome. So we got to rewind all the way back to two thousand and four, the DARPA Grand Challenge, which was created as a way to spur research into autonomous ground robots for military use. And actually, what it did for our purposes here today is create the seed talent for the entire self-driving car revolution twenty years later. So the competition itself is really cool. There is a one hundred and thirty-two-mile race course in the Mojave Desert that the cars have to race on. Now, mind you, this is two thousand and four; it is a dirt road. No humans are allowed to be in or interact with the cars. They are monitored a hundred percent remotely, and the winner gets one million dollars.
- DRDavid Rosenthal
One million dollars! [chuckles]
- BGBen Gilbert
Which was a break from policy. Normally, these are grants, not prize money, so this needs to be authorized by an act of Congress. The one million dollars eventually felt comical, so the second year, they raised the pot to two million dollars. It's crazy thinking about what these researchers are worth today, that that was the prize for the whole thing. So the first year in two thousand and four went fine. There were some amazing tech demonstrations on these really tight budgets, but ultimately, zero of the one hundred registered teams finished the race. But the next year, in two thousand and five, was the real special year. The progress that the entire industry made in those first twelve months from what they learned is totally insane. Of the twenty-three finalists that were entering the competition, twenty-two of them made it past the spot where the furthest team the year before had made it. The amount that the field advanced in that one year is insane. Not only that, five of those teams actually finished all hundred and thirty-two miles. Two of them were from Carnegie Mellon, and one was from Stanford, led by a name that all of you will now recognize, Sebastian Thrun.
- DRDavid Rosenthal
Mm-hmm. Indeed.
- BGBen Gilbert
This is Sebastian's origin story before Google. Now, as we said, Sebastian was kind enough to help us with prep for this episode, but I actually learned most of this from watching a twenty-year-old Nova documentary that is available on Amazon Prime Video. Thanks to Bret Taylor for giving us the tip on where to find this documentary.
- DRDavid Rosenthal
Yes, the hot research tip. [chuckles]
- BGBen Gilbert
[chuckles] So what was special about this Stanford team? Well, one, there is a huge problem with noisy data that comes out of all of these sensors. You know, it's in a car in the desert, getting rocked around. It's in the heat. It's in the sun. So common wisdom, and what Carnegie Mellon did, was to do as much as you possibly can on the hardware to mitigate that. So things like custom rigging and gimbals and giant springs to stabilize the sensors. Carnegie Mellon would essentially buy a Hummer and rip it apart and rebuild it from the wheels up. We're talking, like, welding and real construction on a car. The Stanford team did the exact opposite. They viewed any new piece of hardware as something that could fail, and so in order to mitigate risks on race day, they used all commodity cameras and sensors that they just mounted on a nearly unmodified Volkswagen. So they only innovated in software, and they figured they would just kind of come up with clever algorithms to help them clean up the messy data later. Very Googly, right?
- DRDavid Rosenthal
Very Googly.
- BGBen Gilbert
The second thing they did was an early use of machine learning to combine multiple sensors. They mounted laser hardware on the roof, just like what other teams were doing, and this is the way that you can measure texture and depth of what is right in front of you. And the data, it's super precise, but you can't drive very fast because you don't really know much about what's far away, since it's this fixed, very narrow field of view. Essentially, you can't answer that question of, "How fast can I drive?" or, "Is there a turn coming up?" So the way they solved it was they also mounted a regular video camera on top. That camera can see a pretty wide field of view, just like the human eye, and it can see all the way to the horizon, just like the human eye, and crucially, it could see color. So what it would do... This is, like, really clever. They would use a machine learning algorithm in real time, in two thousand and five. This computer is, like, sitting in the middle of the car. They would overlay the data from the lasers onto the camera feed, and from the lasers, you would know if the area right in front of the car was okay to drive or not. Then the algorithm would look up, in the frames coming off the camera, what color that safe area was, and then extrapolate by looking further ahead at other parts of the video frame to see where that safe area extended to.
- DRDavid Rosenthal
Mm.
- BGBen Gilbert
So you could figure out your safe path through the desert.
- DRDavid Rosenthal
That's awesome.
- BGBen Gilbert
It's so awesome.
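[Editor's note: the fusion trick Ben describes can be sketched in a few lines. This is a toy illustration under stated assumptions, using a single mean-color model and a fixed distance threshold; Stanford's actual system, Stanley, used a more sophisticated probabilistic color model. It is not the team's real code.]

```python
import numpy as np

def extend_drivable_region(frame, laser_safe_mask, threshold=30.0):
    """Extend the laser-confirmed 'safe to drive' area across a camera frame.

    frame:           H x W x 3 color image from the roof-mounted camera
    laser_safe_mask: H x W boolean mask, True where the short-range laser
                     says the ground directly ahead is drivable
    Returns an H x W boolean mask marking every pixel whose color is close
    to the color of the laser-confirmed safe area.
    """
    frame = frame.astype(np.float64)
    # 1. Learn what "safe ground" looks like from the laser-confirmed pixels.
    mean_color = frame[laser_safe_mask].mean(axis=0)
    # 2. Extrapolate: any pixel, however far ahead in the frame, whose color
    #    is within `threshold` of that model is assumed to be the same
    #    drivable surface.
    distance = np.linalg.norm(frame - mean_color, axis=2)
    return distance < threshold
```

On a desert course this works because the dirt road has a fairly uniform color that differs from the surrounding scrub, so the car can trust terrain far beyond the lasers' short range and drive faster.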
- DRDavid Rosenthal
I'm imagining, like, a Dell PC sitting in the middle of this [chuckles] car in two thousand and five. [laughing]
- BGBen Gilbert
It's not far off. In the email that we send out, we'll share some photos of it. It could then drive faster with more confidence, and it knew when turns were coming up. Again, this is real time, onboard the car; two thousand and five is wild for that tech. So ultimately, both of these bets worked, and the Stanford team won in super dramatic fashion. They actually passed one of the Carnegie Mellon teams autonomously through the desert. It's like this big, dramatic moment in the documentary. So you would kind of think, "So then Sebastian goes to Google and builds Waymo." No! As we talked about earlier, he does join Google through that crazy, "Please don't raise money from Benchmark and Sequoia, and we'll just hire you instead." But he goes and works on Street View and Project Ground Truth and co-founds Google X. David, as you were alluding to earlier, this Project Chauffeur, that would become Waymo, is the first project inside Google X.
- DRDavid Rosenthal
And I think the story, right, is that Larry came to Sebastian and was like-
- BGBen Gilbert
Yes.
- DRDavid Rosenthal
"Yo, that self-driving car stuff, like, do it." [chuckles]
- BGBen Gilbert
[chuckles]
- DRDavid Rosenthal
And Sebastian was like: "No, come on, that was a DARPA challenge." And Larry was like: "No, no, you should do it."
- BGBen Gilbert
He's like: "No, no, no, that won't be safe. There's people running around cities. I'm not just gonna put multi-ton killer robots on roads and go and potentially harm people." And Larry finally comes to him and says: "Why? What is the technical reason that this is impossible?" And Sebastian goes home, has a sleep on it, and he comes in the next morning, and he goes, "I realize what it was. I'm just afraid."
- DRDavid Rosenthal
[laughing] Such a good moment.
- BGBen Gilbert
So they start. He's like: "There's not a technical reason. As long as we can take all the right precautions and hold a very high bar on safety, let's get to work." So Larry then goes, "Great, I'll give you a benchmark, so that way, you know if you're succeeding." He comes up with these ten stretches of road in California that he thinks will be very difficult to drive. It's about a thousand miles, and the team starts calling it the Larry One Thousand, and it includes driving to Tahoe, Lombard Street in San Francisco, Highway One to Los Angeles, the Bay Bridge. This is the bogey.
- DRDavid Rosenthal
Yep. If you can autonomously drive these stretches of road, pretty good indication that you can probably do anything.
- BGBen Gilbert
Yep. So they start the project in two thousand and nine. Within eighteen months, this tiny team, I think they hired, I don't know, it's like a dozen people or something, they've driven thousands of miles autonomously, and they managed to succeed in the full Larry One Thousand within eighteen months.
- DRDavid Rosenthal
Totally unreal how fast they did it, and then also totally unreal how long-
- BGBen Gilbert
How slow [chuckles]
- DRDavid Rosenthal
... it takes after that to productize and create the Waymo that we know today.
- BGBen Gilbert
Right. It's like the first ninety-nine percent and then the second ninety-nine percent that takes ten years. [chuckles]
Episode duration: 4:06:37