The Twenty Minute VC

Alex Wang: Why Data Not Compute is the Bottleneck to Foundation Model Performance | E1164

Alexandr Wang is the Founder and CEO @ Scale.ai, the company that allows you to make the best models with the best data. To date, Alex has raised $1.6BN for the company, with a last reported valuation of $14BN earlier this year. Scale tripled their ARR in 2023 and is expected to hit $1.4BN in ARR by the end of 2024. Their investors include Accel, Index, Thrive, Founders Fund, Meta and Nvidia, to name a few.

-----------------------------------------------

Timestamps:
(00:00) Intro
(01:05) Diminishing Returns in AI Compute
(09:08) Solving Reasoning to Overcome Limits
(10:56) From Data Scarcity to Abundance
(14:37) Challenges in Structuring Massive Enterprise Data
(18:59) Fair Access to Proprietary Data for Models
(22:02) Model Commoditization
(26:51) Value Extraction Challenges in AI Commoditization
(32:55) Navigating Data Regulatory Challenges for Innovation
(36:53) A Military Asset in Global Conflict: China & Russia
(42:49) The Future Landscape of Foundation Models
(44:52) About Founder Brand & PR & Media
(52:11) Hiring
(01:00:41) Quick-Fire Round

-----------------------------------------------

In Today's Show with Alex Wang We Discuss:

1. Foundation Models: Diminishing Returns
What are the three core pillars that can meaningfully improve foundation model performance? Why is data the single largest bottleneck to the performance of models today? What data that we do not currently capture would have the biggest impact on model performance going forward? Will we see the largest companies in the world revert back to on-prem, given the security challenges of migrating all customer data to foundation models?

2. AI: A Military Asset in Global Conflict: China + Russia
Why does Alex believe that AI has the potential to be an even more powerful military asset than nuclear weapons? If that is the case, should we have open systems, or must we have closed systems? Why does Alex believe that the CCP's approach to industrial policy is better than anyone else's? How does Alex evaluate the rise of Chinese EV manufacturers over the last few years? Does Alex really believe that China is two years behind the US in the AI race?

3. "I Get Fairer Treatment in Congress than in the Press"
Why does Alex believe that the best PR is no PR? Why does Alex believe that he got fairer treatment in Congress than he does in the media? Why does Alex believe that all founders should look to own their own distribution channels today?

4. Alex Wang: AMA
What are some of Alex's biggest lessons from Patrick Collison on the impact a hot company brand has on that company's ability to hire the best? Does Alex think Trump is going to win, and what would be the impact if he were to? Why does Alex believe that enterprise software will be changed forever in the next few years? What question is Alex never asked that he thinks he should be asked?

-----------------------------------------------

Subscribe on Spotify: https://open.spotify.com/show/3j2KMcZTtgTNBKwtZBMHvl?si=85bc9196860e4466
Subscribe on Apple Podcasts: https://podcasts.apple.com/us/podcast/the-twenty-minute-vc-20vc-venture-capital-startup/id958230465
Follow Harry Stebbings on Twitter: https://twitter.com/HarryStebbings
Follow Alexandr Wang on Twitter: https://twitter.com/alexandr_wang
Follow 20VC on Instagram: https://www.instagram.com/20vchq
Follow 20VC on TikTok: https://www.tiktok.com/@20vc_tok
Visit our Website: https://www.20vc.com
Subscribe to our Newsletter: https://www.thetwentyminutevc.com/contact

Alexandr (Alex) Wang, guest
Harry Stebbings, host
Jun 12, 2024 · 1h 6m

EVERY SPOKEN WORD

  1. 0:00–1:05

    Intro

    1. AW

      At its core, this AI technology has the potential to be one of the greatest military assets that humanity has ever seen. Potentially even more of a military asset than nukes. Let's say China or Russia had AGI today and the United States didn't, I would imagine they would use that to conquer. The CCP system is incredibly good at taking very aggressive centralized action and centralized industrial policy to drive forward critical industries. They have a clear shot at racing forward.

    2. HS

      Ready to go? Alex, I am thrilled that we could do this in person. Thank you so much for joining me today.

    3. AW

      Yeah, great to be here.

    4. HS

      Now, listen, it's funny. As I told you, I tweeted before that we should skip the founding stories, because you've told them brilliantly many times before. But I want to dive straight in, and I want to ask you: when we look at model performance today, let's just start high

  2. 1:05–9:08

    Diminishing Returns in AI Compute

    1. HS

      level, do you think we're seeing a case of diminishing returns where more compute doesn't lead to better performance?

    2. AW

      Yeah, I think it's pretty fascinating. This has especially come up now because OpenAI has had GPT-4 since the fall of 2022, and since then we haven't seen a new base model that's jaw-droppingly better than GPT-4. We haven't seen a GPT-4.5 or a GPT-5, and the other labs haven't come out with models that are leagues and leagues better than GPT-4, despite way, way more compute expenditure. Since ChatGPT came out, you can look at the graph of NVIDIA's revenue and it just inflects: it goes straight up after GPT-4 came out. NVIDIA's data center revenue was roughly five billion a quarter, and then it shoots up to north of $20 billion a quarter. So there's been tens of billions, going on more than a hundred billion, of spend on high-end NVIDIA GPUs, all in the same timeframe. And we haven't yet seen the big breakthrough since GPT-4, which actually came out before this huge inflection in NVIDIA expenditure. So overall it's this interesting thing where investment in compute is going up dramatically, exponentially, right now, but as a community, as an industry, we're still waiting for the next great model.

    3. HS

      Do you think we've reached an asymptote where we'll actually see performance plateau while we wait for that? And is that a matter of months, or is it more like self-driving? Remember with self-driving we saw a plateau in performance for several years, and it was only recently that it inflected again.

    4. AW

      It's kind of this interesting thing. There are three ingredients, or three pillars, that go into these AI models: compute, data, and algorithms. And the history of AI is that progress comes from all three of these pillars being built up together. You certainly need a lot of computational capability, but you also need the algorithmic advances, like the transformer originally, or RLHF, or whatever future algorithmic advances come, and then you need the data pillar to support it as well. A lot of the plateau we've recently seen can almost be explained, at a very high level, by hitting a data wall. GPT-4 was a model trained on nearly all of the internet, using a huge amount of computational capability. And a lot of what the industry has been doing over the past few years is scaling the computation dramatically, but not necessarily building up the other two pillars in tandem. So there needs to be a combination of more algorithmic improvement, but in particular we need to ensure that there's more data to support it.

    5. HS

      When you say data wall, what is it, and what can we do to overcome it?

    6. AW

      Yeah. At a super high level, I think we've used up all the easy data. We've used up all of the internet data: the Common Crawl, and the newer versions of the Common Crawl.

    7. HS

      Just so we understand, easy data is stuff on social media, anything not behind paywalls, anything that's easy and free to crawl.

    8. AW

      Anything that's easy and free to crawl, or stuff that can be torrented. There are a lot of reports that there's a lot of torrented data in some of these models. Basically, anything that is already written down and easy to get from the open internet.

    9. HS

      Mm-hmm.

    10. AW

      And so the first stage of a lot of this AI improvement has been these advances in pre-training, which is basically training these models to be really, really good at emulating the internet. And right now these models are exceptionally good at emulating the internet; they're better than any human (laughs) at emulating the internet. But the problem is, when we think of AGI, or when we think of powerful AI systems, we want much more than just emulating the internet. You want AI systems that can do tasks. You want AI systems that can solve difficult problems. You want AI systems that humans can collaborate with to solve all their daily problems. And this process of building agents and AI models that are capable of all these things, we're not gonna get there from internet data. And we've already used up all the internet data. So what we need is-

    11. HS

      Why are we not gonna get there from internet data? When we think about effective agents, and about software doing the work, not just selling the tools, as I think Sarah Tavel put it quite well before, why is existing data not equipped to make that transition from tools to work?

    12. AW

      Yeah. The simple answer is that a lot of the thought process, a lot of the thinking that humans go through when they are doing more complex tasks, doesn't get written down on the internet. For example, if I'm a fraud analyst inside a large bank, my job is understanding, based on a set of transactions that seem suspicious, whether or not a transaction is fraudulent, and I need to analyze all sorts of different pieces of data and use my deduction, all my human intelligence, to make that decision. That process I go through, I'm not writing it down step by step: "Oh, I looked at this piece of data, and I looked at this piece of data, and then based on that I deduced this." I'm not writing all that down on the internet to later be crawled by these models. So one way to think about it is that all of the reasoning and thinking that is powering the economy today, none of it gets written down on the internet. And so if you just train on the internet, the model has no ability to learn from any of that.

    13. HS

      So how do we codify and capture the data that's not codified already? As you said there with the fraud analyst, the thought process, the analysis, the discussion that goes on in internal meetings that's not codified in data sets, how do we capture that to enable us to do the work?

    14. AW

      Our thesis, or what I really believe, is that what we need from now forward is frontier data. We need data abundance of frontier data, where right now we're in a data-scarcity mindset, hitting a data wall. And this frontier data is exactly what we're talking about. Frontier data, in my mind, is complex reasoning chains, complex discussion, agent chains of models going and looking up a piece of data, doing some reasoning, looking up another piece of data, maybe correcting if there's an error, tool use. All of the key components we would think of an agent being able to do, all of that needs to be encapsulated in the frontier data to power the forward capabilities of these models.
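As a concrete sketch of the "frontier data" Wang describes, a single record might look like the following. The schema, field names, and fraud scenario are invented here for illustration; they are not Scale's or any lab's actual format:

```python
# Hypothetical "frontier data" record: a reasoning chain with tool use and a
# self-correction step, of the kind described above. Schema is illustrative.
frontier_record = {
    "task": "Decide whether transaction T-4821 is fraudulent",
    "steps": [
        {"type": "tool_call", "tool": "lookup_transaction", "args": {"id": "T-4821"}},
        {"type": "reasoning", "text": "Amount is 40x the account's median; check location."},
        {"type": "tool_call", "tool": "lookup_account_history", "args": {"account": "A-77"}},
        {"type": "reasoning", "text": "Looks fraudulent, but the account has a travel notice on file."},
        {"type": "correction", "text": "Revising: the location anomaly is explained by the travel notice."},
    ],
    "final_answer": "not_fraud",
}

def to_training_example(record):
    """Pair the task with the full trajectory, not just the answer:
    the intermediate reasoning is exactly what never appears on the internet."""
    trace = " ".join(s.get("text", s.get("tool", "")) for s in record["steps"])
    return {"prompt": record["task"], "target": trace + " => " + record["final_answer"]}

example = to_training_example(frontier_record)
```

The point of the sketch is that the training target contains the whole chain (lookups, deductions, the correction), which is the part of the analyst's work that is otherwise lost.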

    15. HS

      How do we capture that data?

    16. AW

      Yeah. I think it needs to be built through a combination of basically three pillars. First, there's a lot of this data that's locked up in the world's enterprises today.

    17. HS

      Mm-hmm.

    18. AW

      And none of that gets on the internet, for very good reasons. But just to give a sense of scale: JP Morgan's proprietary internal dataset is 150 petabytes, while GPT-4 was trained on an internet dataset of less than one petabyte. The amount of data that exists inside large enterprises is just absolutely astronomical. So there's one process of simply mining all this existing enterprise data for all the goodness that exists within it.
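The ratio implied by the two figures Wang quotes (both taken at face value from this conversation) can be checked with quick arithmetic:

```python
# Scale comparison using the figures quoted in the conversation:
# ~150 PB of internal JP Morgan data vs. under ~1 PB of internet
# training data for GPT-4. Both numbers are Wang's claims.
jp_morgan_pb = 150
gpt4_training_pb = 1  # "less than one petabyte"; treated as an upper bound

# Since 1 PB is an upper bound, the true ratio is at least this large.
ratio = jp_morgan_pb / gpt4_training_pb
print(f"One enterprise holds at least {ratio:.0f}x the data GPT-4 was trained on")
```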

    19. HS

      But you would never get that open source, would you? So this is all proprietary and then delivered custom to that customer.

    20. AW

      Exactly. This has to be a process where every enterprise says: I have a set of very important problems for my enterprise-

    21. HS

      Yeah.

    22. AW

      ...and I need to go through the process of basically mining all of my existing data and refining it for AI systems to use to solve my own problems.

  3. 9:08–10:56

    Solving Reasoning to Overcome Limits

    1. HS

      When we think about breakthroughs, we talked about diminishing returns at the beginning. I spoke to one of the most powerful CTOs in the world the other day, and they said the real breakthrough in this question of diminishing returns is whether we can really solve reasoning. How do you think about our ability to solve reasoning, and the impact of the data you mentioned in helping us navigate that?

    2. AW

      If you look at what these models can do, they're very good at reasoning in situations where they've seen a lot of data before. I think we like to think about these AIs as if they're little human intelligences, but human intelligence and machine intelligence are very different. Humans have a very general form of intelligence. If a kid is raised in a very small neighborhood, they can live their whole life in that small neighborhood, then go to an entirely different part of the world and navigate it and understand what's going on. No AI system today would be able to do that kind of drag-and-drop from one situation to another and figure out what's going on.

    3. HS

      Yeah.

    4. AW

      So I think we have to be cognizant that that's a limitation. But what it means is that for any situation where we want these models to perform well, if we have data of that situation or scenario, the model will actually perform really well. So there are two ways to think about resolving the reasoning gap in these current models. One is that you build some sort of general reasoning capability, which would definitely be a big breakthrough, for sure. The other is that it's just a data problem: you need data for every scenario where you want these models to reason well. You just need to overwhelm them with data in all those scenarios, and you're gonna get models that can reason

  4. 10:56–14:37

    From Data Scarcity to Abundance

    1. AW

      really well.

    2. HS

      How do we move from an environment of data scarcity to data abundance, when we appreciate the immense amounts of data that, say, JP Morgan or Goldman Sachs or any large enterprise has, but also the proprietary nature of it, which means it won't actually go into the generalized models that would help the world, or humanity, or let these breakthroughs reach everyone else? Is it synthetic data that we're creating? How do we think about that?

    3. AW

      Yeah, so I think the second part is, to your point, new data that has to be produced.

    4. HS

      Yeah.

    5. AW

      And we need the means of production of new frontier data to get us from GPT-4 to GPT-10. With chips, this is very natural: oh yeah, we need to build more and more fabs, we need to build bigger fabs, we need to increase the resolution and get to lower-nanometer processes. For compute, it's very natural for us to think about increasing the means of production.

    6. HS

      Yeah.

    7. AW

      But I think we don't think about this with data, and we need to do something very similar. This process of producing data is a hybrid human-synthetic process, and that's really how we think about it. You need algorithms that can do a lot of the heavy lifting in producing synthetic data, but you need human experts who can guide the AI systems and provide input when the system gets stuck, or when it has a factuality issue, or when it's in a situation it hasn't encountered before. Another way to think about it: a lot of autonomous-vehicle scale-up has been through safety drivers. You have safety drivers inside the car, and when the car starts screwing up, the safety driver disengages it and takes over. You need that kind of setup for these AI systems. You need AI models generating large amounts of data, and humans who can take over and nudge the models when necessary, to make sure you get really high-quality data.
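The "safety driver" setup Wang describes can be sketched as a simple generate-then-review loop. The generator, the suspicion check, and the expert fix below are hypothetical stand-ins, not a real pipeline:

```python
# Minimal sketch of hybrid human-synthetic data production: a model drafts
# examples, and a human expert "takes over" only when an automatic check
# flags a problem. All three functions are illustrative placeholders.
def model_generate(prompt):
    # Stand-in for an LLM call producing a candidate example.
    return f"draft answer for: {prompt}"

def looks_suspect(draft):
    # Stand-in for a stuck-detection / factuality heuristic.
    return "unknown" in draft

def human_expert_fix(draft):
    # Stand-in for the human "safety driver" correcting the model.
    return draft.replace("unknown", "verified")

def produce_dataset(prompts):
    dataset, interventions = [], 0
    for p in prompts:
        draft = model_generate(p)
        if looks_suspect(draft):
            draft = human_expert_fix(draft)
            interventions += 1
        dataset.append({"prompt": p, "completion": draft})
    return dataset, interventions

data, n_fixed = produce_dataset(["prove lemma 3", "resolve unknown symbol"])
```

The design point is the same as the safety-driver analogy: the model does the volume, and expensive human expertise is spent only on the disengagements.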

    8. HS

      What does that look like in the structure of an organization's day? Do we create new roles for these AI... not prompters, but kind of savers?

    9. AW

      Yeah, trainers is one term, AI trainers; contributors is another. I think this process of contributing data to AI is actually one of the highest-leverage jobs a human can have. The reason is, let's say I'm a mathematician. I can go into my hole and try to do pure math research, and that's one trajectory for my life. The other trajectory is that I use all my skills, talents, and intelligence to help make these AI models smarter. Even if I make GPT-4 just a little bit smarter at math, if I take that little bit of improvement and sum it up across all the times GPT-4 is going to be called and used, every math student who's going to use GPT-4, every company, every developer, that's a huge amount of impact. So as a human expert, you have the ability to have society-wide impact by producing data that helps improve these models. And what we see is that for scientists, mathematicians, doctors, the human experts of the world, it's an incredibly exciting proposition: I can transmit my capabilities, intelligence, training, all of that, into a model that's going to have society-wide impact. I mean, it's an incredibly exciting

  5. 14:37–18:59

    Challenges in Structuring Massive Enterprise Data

    1. AW

      proposition.

    2. HS

      How do we think about the structure of data? Often people say the biggest challenge in data governance is actually just the structure and cleanliness of it. When we look at the 150 petabytes of JP Morgan data, I have no idea, but I presume it's not structured perfectly for models to ingest efficiently. How do we think about structuring the huge datasets that I'm sure all large enterprises have, and the challenge that poses?

    3. AW

      There are two parallel efforts. One is mining existing data, which by all means is going to be a one-time hit. There's a one-time benefit you get from mining all your existing data, and it could be really meaningful.

    4. HS

      Do you think in five years' time everyone will have mined their largest data sources internally?

    5. AW

      I don't think everyone will, but certainly the most sophisticated companies will.

    6. HS

      Okay.

    7. AW

      And then, and then we'll be at a point where we still need to make the models better.

    8. HS

      Yeah.

    9. AW

      And so, at the end of the day, it will all boil down to data production, effectively the means of forward production, in the same way that you need the means of forward production for chips and all the other things you care about.

    10. HS

      Okay, so we have that in terms of mining existing. You said there was another form.

    11. AW

      So there's data mining, and then there's forward data production.

    12. HS

      Okay.

    13. AW

      These are the two core directions for where we need this data to come from. And taking a broader step back, I think a lot of AI progress at this point is fundamentally data-bottlenecked, such that if we were able to produce compute and data in lockstep with one another, so as NVIDIA continues to manufacture hundreds of billions of dollars' worth more of chips, if we were able to produce a proportional amount of data as we got more and more chips, then we would get astronomically more capable models.

    14. HS

      But just so I understand, when we think about increasing the supply side of data, what are the literal ways we can do that? What comes to my mind is actually Dan Siroker at Limitless, I don't know if you know Dan, he's the founder of Optimizely, but he basically has this new hardware device which records every single thing that you say and do, and it produces your own personal AI because it has everything you've ever said in the day. That is a new form of data creation, in my mind. How do we increase the supply side of data?

    15. AW

      There are probably two main pieces. One is this effort from Limitless, or other efforts like it, which is basically much more longitudinal data collection: collecting more of what's naturally happening in the world.

    16. HS

      Mm-hmm.

    17. AW

      There are a bunch of forms of this. One is in a workplace: I think you're gonna want, as creepy as it sounds, some kind of constant data collection of what apps you're using, in what order, where you copy-paste one thing to another.

    18. HS

      You have a lot of this with RPA and a lot of UiPath's flows, so-

    19. AW

      Yeah, exactly.

    20. HS

      ... people are quite used to that, I think.

    21. AW

      Yeah, yeah. So, process mining, which is one of the terms in SaaS, but basically the continued collection of existing enterprise processes.
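A hedged sketch of what one such process-mining signal might look like as log records; the schema and event names are invented for illustration:

```python
# Illustrative process-mining event records: the kind of "what app, in what
# order" workplace telemetry described above. Schema and values are invented.
events = [
    {"ts": 2, "app": "browser", "action": "paste", "detail": "CRM form"},
    {"ts": 1, "app": "excel",   "action": "copy",  "detail": "range B2:B40"},
    {"ts": 3, "app": "email",   "action": "send",  "detail": "weekly report"},
]

def app_sequence(events):
    """Recover the ordered app-to-app workflow from raw, unordered events."""
    return [e["app"] for e in sorted(events, key=lambda e: e["ts"])]

workflow = app_sequence(events)  # ['excel', 'browser', 'email']
```

Even this toy example shows the shape of the output: an ordered trajectory of human work steps, rather than a static document, which is closer to the "frontier data" discussed earlier.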

    22. HS

      Mm-hmm.

    23. AW

      Then there's the consumer version of that, which maybe looks like what you're referencing, or the Meta Ray-Ban collaboration, or whatever device ultimately does it: something that collects the longitudinal view of your own life. And then there has to be a real investment towards human experts collaborating with models to produce frontier data. Both of the things I referred to before, enterprise process mining and, for lack of a better term, consumer data collection, are going to produce valuable datasets, but they're not going to produce the data that actually pushes the models forward. To push the models forward, you need really highly complex data that can push the frontiers of what the models can do. This is where you need the agentic behavior, the complex reasoning chains, advanced code data, or maybe advanced physics or biology or chemistry data. These are the things that are really needed to push the boundaries of the models. And I think this is a global, infrastructure-level effort that needs to happen. We need to think about how we get the world's experts to collaborate with the models to help produce AI systems that are going to be the world's best scientists or the world's best coders

  6. 18:59–22:02

    Fair Access to Proprietary Data for Models

    1. AW

      or mathematicians?

    2. HS

      When we think about the commoditization of the models, as everyone says we have, how do we think about proprietary access to these data sources? People have said to me before, and I don't mean to throw shade, that OpenAI's models are not necessarily better; they've just had better access to data, they've bought more data, with data being the central element of why they had better performance in the past. Will we see one model get access that others don't? How do we think about fair and equitable access to data from the model side?

    3. AW

      Well, I actually think, to your point, if you think about the competitive playing field of these different model providers against one another, there are three pillars: algorithms, compute, and data. And data, I think, is the primary pillar where you can imagine a real, durable competitive advantage emerging. If you think about where there are moats in this LLM race, in this foundation model game, data is one of the few areas where you can produce a sustainable moat.

    4. HS

      Yeah.

    5. AW

      Because the issue is: algorithms are IP that, at some point, the rest of the industry will learn about. You can have more compute than other people, but other people can just spend more money and buy that compute. Data is one of the few areas where you can actually produce a long-term, sustainable competitive advantage.

    6. HS

      I agree. When you look at some of OpenAI's agreements, they obviously partnered with the FT to get access to all of the FT's historical library, and they've done quite a few others, with Axel Springer, I think. That is access a lot of other models do not have, which will make their content superior for whatever queries touch that material.

    7. AW

      Yeah, exactly. And I think this is the start of this way of thinking about data as a moat. The FT, Axel Springer, these are the first indications of it. But in the future, these labs are gonna be thinking a lot about: "What's the data I'm going to use to differentiate from my competitors? How am I going to produce that data? And what is the long-term durable advantage created by it?" So despite everything we've been talking about around model commoditization, I actually expect we're gonna see companies start building data strategies that drive more differentiation in the market over time, which is pretty exciting. Another way to think about this: right now in San Francisco, the big researchers and the big CEOs brag about how many GPUs they have. The biggest indicator of how serious they are about AI is how many GPUs they have. But in the future, I think they're gonna brag about what data they have access to, how much data they're producing, and what their unique rights to different data sources are. That's actually going to be the primary plane of competition in the future, versus just, "Okay, Jensen's giving me however many hundreds of thousands

  7. 22:02–26:51

    Model Commoditization

    1. AW

      of GPUs."

    2. HS

      Given that data strategy is a potential element one could win and compete on in different ways, do you think we will not see the commoditization of these models over time?

    3. AW

      There are two futures. One is that even data strategy becomes something that very quickly commoditizes, and different labs copy one another, or they all end up converging in the same direction.

    4. HS

      100%, especially because a lot of the content producers are not gonna do exclusive agreements with one model and not other models.

    5. AW

      Yeah. So then I think it comes down to different labs needing strategies to produce their own unique datasets. Anthropic, for example, has focused a lot on enterprise use cases, and maybe they need to develop a data strategy that gives them very differentiated access to new data to support those enterprise use cases. Or maybe OpenAI, with ChatGPT, needs to develop a unique data strategy that lets them leverage the fact that they have all these users and all this reach. But the various labs are gonna need to lean into where they can get proprietary and differentiated data going into the future.

    6. HS

      Do you think we're gonna see a reversion back to on-prem? I'm jumping around so much, but I'm loving this conversation (laughs). Sorry for that. But when we think about the, like, 150 petabytes of J.P. Morgan data, I don't know if they're gonna be like, "Yeah, I'll throw it all in the cloud, my most sensitive data." Will we see a reversion back to on-prem, and models that work on-prem, for these large enterprises?

    7. AW

      I think when we talk to these large enterprises and the leaders within them, they are very quickly realizing the fact that you stumbled upon, which is that their enterprise data might be their only competitive differentiator-

    8. HS

      Mm-hmm.

    9. AW

      ... in an AI world. And so they're extremely, extremely cautious. If they do a deal where somehow a model developer gets access to all their data, or they share it in some way, then they could be mortgaging away their entire future.

    10. HS

      Mm-hmm.

    11. AW

      And so I think they're very, very cautious about that. And this is actually why I think there's a very big opportunity for open source models, whether it's the Llama models or the Mistral models or whatnot: basically, models that can go on-prem, that enterprises can take and customize on top of their own data, where the data never has to go back to a model developer or a cloud or anything like that. There's a huge unmet need there, and I think that's where most serious enterprises are gonna go: "I need really, really strong guarantees that my data is not going to be used in any way to improve my competitors'."

    12. HS

      I think AI services will actually create more revenue over the next five years than AI models. We saw Accenture come out with, I think it was $2.4 billion in revenue from generative AI, and OpenAI was obviously at $2 billion. I'm intrigued, with Scale AI working with some of the largest enterprises, because the learning and adoption curve is challenging for them. Do you see a services component as a core part of your business in the next few years as we scale that education curve?

    13. AW

      Yeah. I mean, first of all, I think you're right. There's so much value to be generated from AI, for sure... but then there's this very natural question of where the value capture is going to be, right? And there's this fascinating thing. If you go back and read High Output Management by Andy Grove-

    14. HS

      Yeah.

    15. AW

      ... there are these chapters where, for Intel, it's like: hey, first we thought this is where the value capture was going to be, but then we realized it was going to be in this other part of the stack, so we had to migrate to that part of the stack, and then we had to migrate again. It's this incredible case study. I remember reading it maybe a decade ago, in a different era of tech, and thinking, "This is weird. This doesn't feel very relevant." And now, in AI, you're seeing it once again. It's so new and so nascent that exactly where in the stack value will accrue feels like it's constantly moving. And I agree with you: there's so much competition in the models themselves that I don't know how much value accrues at literally the model itself. But everything above the model and everything below the model, I feel very confident there will be value accruing there. For the infrastructure: NVIDIA is the biggest company built on AI today. (laughs) They're the third most valuable company in the world. NVIDIA's market cap is higher than Meta's and Google's and Amazon's, and Saudi Aramco's. NVIDIA is an incredible, incredible company. And that's kind of below the model. And then above

  8. 26:51–32:55

    Value Extraction Challenges in AI Commoditization

    1. AW

      the model, you have all these apps and these services that are going to be built on top of it.

    2. HS

      So I was arguing with someone this morning, actually, on the way here, and I was saying: yes, okay, we have Notion AI, and we have Box, the storage company, who are going to implement AI solutions into their existing storage products so you can extract information better. But have you seen the numbers? Salesforce are now growing at single digits; Mongo are now growing at single digits. Point being, the commoditization of these features means better products for us, but I don't know if you'll get value extraction from that in the form of increased pricing. How do you feel about that?

    3. AW

      Yeah, so our thesis on this... there was this article that flew around, The End of Software, right? Uh-

    4. HS

      Mm, I saw this. Chris-

    5. AW

      From Paik. From Chris-

    6. HS

      Yeah.

    7. AW

      Paik, yeah. And it was an intentionally provocative point of view, I think.

    8. HS

      Mm.

    9. AW

      But I do think there is a version of this which is true, which is-

    10. HS

      Sorry, for those that haven't read it, what was the core premise? Just so they understand it.

    11. AW

      Yeah, so he basically... I thought it was a brilliant comparison. He drew this comparison of software companies today to media companies pre-social media. The rough comparison: in the older days of media, you had all these incredible media companies, these high-end shops with experts producing very differentiated content. Then they got disrupted by social media and the internet broadly, because as content creation and content distribution costs came down dramatically, media consumption turned into this very broad constellation, where you would consume whatever media was produced by anybody that was interesting to you, much more on demand, versus the walled garden of large media producers. And the comparison is that that's what's about to happen to software. Right now, enterprises live within this walled garden of some small number of software providers, and what's gonna happen with generative AI and all these other trends is they're going to have a constellation of different apps and point solutions, and a portal to that constellation of various software providers. We're gonna move from this current world of a small number of walled-garden SaaS apps to a much more decentralized universe.

    12. HS

      Do you agree with that?

    13. AW

      It's intentionally provocative, right? But I think one thing that is true is that enterprises, and the world writ large, are going to demand greater levels of customization and personalization, stuff that is really purpose-built for their business, like a glove. The first tech company that ever did something in this direction was Palantir. They got a bad rap for a long time, because everyone thought Palantir was just a consulting company. But Palantir's point of view, which was provocative as well, was: no, we're going to go into enterprises, understand exactly what their problems are, and help them build the perfect application for them, one that connects all their data and all that stuff. And if we can do that, we're going to build something far more valuable for them than what any other software provider can produce. They did this, obviously, before generative AI, before all these tools that are gonna make this motion a lot more feasible. But I do think there's an element where this is the way the world is moving: now that software production and creation costs are going down so dramatically, we're going to end up in a world where more and more of the software that enterprises consume is customized, custom-built, and purpose-built for exactly their problems.

    14. HS

      What does that mean for the makeup of engineering teams at large enterprises? Do they shrink? Do they focus on different things? Do we just have teams of the world's best prompters? What does that mean for the changing structure of engineering teams?

    15. AW

      Yeah, well, I think software engineering in general is going to change dramatically. A lot of what developers spend their time on today, they will not need to spend time on going into the future, as the models get better and better at coding. But there are certainly big parts of what they do which are irreplaceable. And over time, I think the part that's really valuable is this general process of going from "What are my customer problems? What are the problems I need to solve?" and translating those into engineering problems: scoped tickets, almost, that can be solved by an AI engineer.

    16. HS

      Everyone says that we're gonna see the end of per-seat pricing. Like you said, Chris had that provocative article, but everyone talks about the end of per-seat pricing. To what extent do you think we will see the end of per-seat pricing in this next wave of software? And especially with the data lens, where you could see a more consumption-based pricing model aligned, do you think that truly takes over?

    17. AW

      The reason that per-seat pricing doesn't make sense going into the future is that at an enterprise today, certainly, most of the productive work is done by their employees, by people. But in a future where you imagine more and more of the work being done by AI agents or AI models, per-seat pricing doesn't really make sense, because as a provider of software, of solutions, you wanna make sure you're capturing the value you're providing to the people, but also the value your agents, your AI systems, are producing. And so I think that shifts a lot of the world towards consumption-based pricing

  9. 32:55–36:53

    Navigating Data Regulatory Challenges for Innovation

    1. AW

      versus per seat.

    2. HS

      One of my biggest worries... Obviously we're in London; we specialize in many things here, long lunch breaks and regulation. (laughs) Not to diminish London. But my question to you is: I really worry that we're gonna see regulatory provisions which stifle innovation, because of consumer data protection acts and just unnecessary regulation around data access. Do you think I am justified? And how do you navigate the regulatory access-to-data question?

    3. AW

      It's a really important point, and certainly what we've seen in the EU is a very restrictive approach to data. My personal belief is that more permissive regulations around data are not at odds with being a liberal democracy. More liberal data access provisions are in fact very compatible with being a liberal democracy. We as a society need to figure out what the right balance there is and how we square the circle. But I think this is a very important question, because in the United States there's been a huge amount of effort, real regulatory effort, on how we ensure that we do not slow down chip production-

    4. HS

      Yeah.

    5. AW

      And how do we make sure that we can keep manufacturing huge amounts of chips, and that the US won't be disadvantaged from that perspective? We need to take a similar lens to data. From a policy standpoint, both in the US and in the UK, frankly, how do we ensure that, as countries, we are not tying one hand behind our backs for future data production for these models?

    6. HS

      Do you think the US is currently tying one hand behind its back in terms of that?

    7. AW

      We're definitely not taking a pro-data regulatory-

    8. HS

      What would a pro-data regulatory stance look like?

    9. AW

      I think there are a few things. First, there are large datasets that do not lend proprietary advantages to specific players, and those need to be centralized and made accessible to whole industries. Simple example: safety data in, let's say, aerospace, which is a hot topic, obviously. Safety data in aerospace should be collectively pooled for the purpose of advancing the entire industry. Or the example I mentioned before: fraud and compliance data in financial services should be pooled together to build forward capabilities. So there are entire industrial sectors where there should be some degree of data pooling, just to push forward the overall industry. And then, in a lot of consumer-facing areas, we need to work through the existing restrictions to make sure they don't prevent AI progress. One great example here is HIPAA in healthcare, and all the PII limitations. Right now, HIPAA and the PII regulations will more or less prevent patient data from being used to train AI models. But I think we can agree, as a civilization, as a human race, that we really wanna learn from all of the existing medical data how to cure human diseases going forward. And so we need to figure out how to make it so that there are very clear anonymization provisions, a very clear and obvious way in which you can use existing patient data to improve future health outcomes.

  10. 36:53–42:49

    A Military Asset in Global Conflict: China & Russia

    2. HS

      Totally agree. I heard actually, I can't remember who said this on the show, but they said China is, like, two years behind the US in terms of AI progress. I heard that and I thought: that is absolute shit.

    3. AW

      (laughs)

    4. HS

      And I think when you look at data provisions (laughs) and what the Chinese government will be willing to do in terms of data access and regulation, if they are two years behind, they will very quickly catch up. How do you see China being two years behind? Do you agree with that?

    5. AW

      Two years ago, they were probably more than two years behind. When OpenAI first produced GPT-4 in the lab, China was nowhere near that. But just in the past few months, a Chinese company, 01.AI, produced a model, Yi-Large, that is now one of the best models in the world. It's just behind GPT-4o, Gemini, and Claude 3 Opus, the next model right behind those on the leaderboards. So we've already seen them meaningfully catch up. Chinese LLM and AI capabilities are, I would say, right now pretty close to neck and neck with US capabilities, and if you plot the path ahead, based on everything we've talked about with data, I think they have a clear shot at racing forward and racing ahead of us. At its core, it comes down to the fact that the CCP system is incredibly good at taking very aggressive centralized action and centralized industrial policy to drive forward critical industries. We've seen it over the past few decades in solar, where the CCP used industrial policy to become, by and large, the world's leader in solar. And most recently in EVs, where the CCP system and approach has been able to create very, very cheap EVs. You're seeing this pattern play out over and over again: the CCP approach to industrial policy is not the most innovative, but once an industry's been established and it's about turning the crank, they are better at turning the crank than any other economy in the world.

    6. HS

      I totally agree with you. I saw a chart, I can't remember who tweeted it yesterday, I think it was either Elon or Bill Ackman, and it basically showed different countries' creation of EV providers. The US, without Tesla, would have been in the dumps, because it would have only had, you know, General Motors. But China was, like, up and to the right, for sure. Does that worry you?

    7. AW

      It worries me a lot. Yeah. One of the elephant-in-the-room topics which, as an AI community, we rarely discuss is that, at its core, this AI technology has the potential to be one of the greatest military assets that humanity has ever seen. Let's say you had AGI: one country with AGI and another country without it. Which one will win in a war? Well, (laughs) probably the one with AGI. It's gonna figure out how to produce all the weapons, or it'll figure out a brilliant military strategy, or it'll be able to hack the other country's systems. It is potentially one of the greatest military assets the world's ever seen, potentially even more of a military asset than nukes, right? And we're in a geopolitical environment that is increasingly tense. The amount of conflict in the world has been increasing over the past few decades. You're seeing multiple wars being fought, some of them without very clear paths to resolution. And there are totalitarian leaders in the world right now, many of them, for whom... Let's say China or Russia had AGI today and the United States didn't. I would imagine they would use it to conquer. That's a really scary outcome for the world at large, and I think it's one that we, the Western world, need to spend a lot of our thought and effort towards preventing.

    8. HS

      Given that concern, should we not have closed systems? Obviously open systems have a lot of benefits, but the challenge with open systems is that anyone can use them, which means Russia can use them, China can use them, and everyone has the same level of access, or supposedly so. Should we not have closed systems, given what you just said?

    9. AW

      I think there's a bit of a dichotomy that must emerge. The most cutting-edge and most advanced systems, those we will want to ensure are closed. For geopolitical reasons, for military reasons, for whatever reasons: as we develop systems that are genuinely so, so powerful, we'll want to keep those closed. That doesn't preclude us from making open, less advanced versions of the technology that frankly just have the ability to produce a lot of economic value, and I think that's where we are with Llama right now. I don't think the Llama 3 models are so advanced that you would worry; Llama 3 in and of itself is not a military asset yet. There's clearly a line underneath which it's totally fine to have open models. So what we need to be thoughtful about is where that line is, and when

  11. 42:49–44:52

    The Future Landscape of Foundation Models

    1. AW

      are we getting close to it?

    2. HS

      Before we discuss some company-building principles, which I do want to touch on: in 10 years' time, what does that foundation model layer look like? Who's independent? Who's been acquired? What does it look like?

    3. AW

      I think at its core, what we've seen about the foundation model race is that it is incredibly, incredibly expensive. These models have gone from costing hundreds of millions, to a billion dollars, to maybe multiple billions. In 10 years' time, maybe they'll cost tens or hundreds of billions, and there just aren't very many entities with that much discretionary capital to invest (laughs) into these AI models. So naturally, over time, the foundational AI efforts will coalesce around nations or the large tech companies. Only entities with hyper-profitable business models, whether that's a nation state or one of the hyperscalers, will be able to subsidize or underwrite these massive AI programs. And so in the future, (clears throat) it looks like a battle of giants. It already looks like a battle of giants, but at that point it's even more so.

    4. HS

      So do you agree with me in saying that you'll see all of the smaller players acquired by the large incumbents, your Google, your Amazon, your NVIDIA, but especially the large cloud providers, and integrated into their existing solutions?

    5. AW

      Yes, with maybe an asterisk: there are some of these partnerships where I think it'll be interesting to see how they play out. The OpenAI and Microsoft partnership, or the Anthropic and Amazon partnership. One of the most interesting questions of this technology era is how these partnerships actually end up playing out long term.

    6. HS

      Listen, I do want to touch on some company building principles. Let's start with,

  12. 44:52–52:11

    About Founder Brand & PR & Media

    1. HS

      uh... I can't remember the exact phrase. You said one on PR, it was a brilliant statement, this was it: "The best PR is no PR."

    2. AW

      (laughs)

    3. HS

      What did you mean, Alex (laughs) ?

    4. AW

      The traditional press industry is not particularly conducive to great companies being built. And let me be more specific about that. A lot of traditional press is oriented around generating clicks, and so the traditional press engine will build you up and generate clicks on the way up, and then it'll tear you down and generate clicks on the way down. And this is in contrast to, I think, 20VC and other direct outlets, so to speak, where founders and companies have a direct channel to get their message out and explain what they're working on.

    5. HS

      Well, I think the other thing, and I actually think it's a little bit unfair on traditional media... I don't care about clicks. Yeah, we have sponsors; respectfully, if we didn't have them, we'd still be doing the show. I don't do sensationalist headlines. I'm not gonna put some glossy thing, "Scale.ai predicts military devastation" (laughs), on this episode, 'cause I'm not there to just optimize for clicks.

    6. AW

      Exactly. Yeah, you're there to (clears throat) genuinely educate-

    7. HS

      Yeah.

    8. AW

      ... and explain what's going on to your audience. And I think there's, you know-

    9. HS

      It's almost unfair though. Can you imagine if someone said, "Hey, I'm gonna do Scale.ai, but I don't care if we lose money."

    10. AW

      (laughs)

    11. HS

      You'd be like, "Oh fuck, how do I compete with that?"

    12. AW

      Yeah. But I think it's pretty stark. I've received fairer treatment testifying in front of Congress than I have from various media outlets over the years. It feels like a totally ridiculous statement, but I think we're in this perverse state in a lot of traditional media where, because of this very click-oriented approach versus a genuine educational approach, the system itself almost has no way of being fully fair to companies. And so the imperative is on the companies themselves to properly tell their story through direct channels, through podcasts, through avenues where their message won't be altered.

    13. HS

      I completely... I think this is why founder brand today is more important than ever, 'cause if you don't own your means of distribution, it will be contorted.

    14. AW

      Exactly. And I think that's the... Kind of a shocking state of the world, but I, I think it's-

    15. HS

      Has that changed your strategy then?

    16. AW

      Yeah, I think for us, we think a lot about how we get the direct message out there and, to your point, what the purest ways are that we can transmit and explain what we're doing. And this is a great example: you'll ask me a question, I will answer exactly how I believe, exactly the answer that I think, and this will go out to your listeners and your viewers. It's one of the purest forms of getting the message out there.

    17. HS

      I think one thing people make a big mistake on, though, is they then try to build the direct channels for the companies. And respectfully, people don't follow Scale, people follow Alex. It's much easier to build followings around personalities than around companies.

    18. AW

      I think there are so few companies that can... OpenAI is one where I think OpenAI as an entity has a lot of meaning as a brand, but very f-

    19. HS

      It does, but if you look at the number of times Sam Altman trends versus the number of times OpenAI trends, it is disproportionately higher for Sam Altman. People, now more than ever, love the cult of personality.

    20. AW

      Yeah, that's a fascinating thing. I mean, that definitely should be, uh-

    21. HS

      And that transcends tech, actually. When you look at Lionel Messi at Miami, when you look at Margot Robbie with Barbie, the celebritization of individuals in organizations or movements drives everything.

    22. AW

      That's fascinating. It probably speaks to a deep human need to relate. We as people have a lot of circuitry to understand individuals. It's very hard to understand what an organization means. There's no intuitive, um-

    23. HS

      So should founders give a shit about traditional PR? Should they care about getting in the traditional press?

    24. AW

      I would argue we're in an era now where they shouldn't, and they should think instead about: what is an interesting point of view they can have, and what is the purest way to get that point of view across?

    25. HS

      When do you feel the press tried to tear you down unfairly?

    26. AW

      So I would say, almost precisely, we've had this story where we had an incredible rise up, an incredible come-up, when we initially became a unicorn back in 2019. And for the few years after that, it felt like smooth sailing. Then, starting at about 2022, the entire media narrative, let's say, was tearing down tech companies. And in some ways it was very fair: many, many tech companies received very high valuations, there was an incredible amount of excitement in tech, and then the markets all crashed. Starting in 2022 was when I noticed, for us specifically, the tone entirely shifted, where the media engine pointed itself towards pointing out all the missteps from companies like us and a lot of our peers, versus trying to take a balanced perspective. Another example of this: we began working with the US military and the US DOD. This was obviously long before the current defense tech hype wave, and it was driven by an intrinsic belief that I had, and that we had as a company, that it's important for the United States DOD to have access to incredible AI technology. That was a fundamentally important thing for the future of the world. And in the years after that, by and large, the traditional media engine actually tore us down for supporting the US government and the military, versus taking a broader view that maybe this was a positive thing, actually, to (laughs) support the US military. And this goes to what I was saying about the dichotomy in treatment, testifying in Congress versus the media. I testified before Congress about AI's use in the military, and the treatment I got there, I think, was a properly broad one. It was, "Hey, obviously this is powerful technology that we need to be thoughtful about, but it is so important that America leads in this, and thank you for everything that you're doing." That, I felt, was the response. Whereas in the media, it's this incredibly scornful perspective: "Oh, is this a good thing? Do we trust this company? What does this mean?" It's just shocking.

    27. HS

      But I think it goes back to incentives driving outcomes, and what are the incentives of media versus the incentives of Congress? Congress is not there to sell you clicks. (laughs) They're there to hopefully get to an informed decision on the best outcome.

    28. AW

      Exactly.

  13. 52:11-1:00:41

    Hiring

    1. AW

      Yeah.

    2. HS

      On incentives driving outcomes, I loved something you also said: "Hiring people who give a shit is harder than it sounds." What do you mean, and how do you think about that when hiring?

    3. AW

      You know, it sounds so simple when you really boil it down, but you hire people who, as we say internally, give a shit: who really, really care. They really care about their work product, they really care about the quality of their work, they really care about the organization, they care about making sure the company has an impact. They just really care. And how that manifests is that they're willing to sweat every single detail, and if they get roadblocked or there's something in their way, they'll go the extra mile to make sure they get through those things. That's how startups work, fundamentally: you have these small teams of people who each care 10 or 100 times more than the average employee inside a big company, and so you end up solving so many more problems than the big companies can.

    4. HS

      How many people do you have in Scale today?

    5. AW

      We are about 800 people.

    6. HS

      800 people. You are now getting to the bigger company size, where it is harder. You know, they say only hire A+ players, or A players. A players by definition are rarer. Can you have 800 A players?

    7. AW

      I think the answer is yes, and what we say a lot internally is, "How do we hire the Navy SEALs, not the Navy?" Not that there's anything wrong with the Navy-

    8. HS

      (laughs)

    9. AW

      ... but how do you have a really small, elite group, where you're really hiring the cream of the crop? And this comes down to process. For us, at this point of the company, I still approve every hire. So I will either indirectly interview or look at the interview feedback, and understand every single person who we hire, to ensure that we're keeping an exceptionally high bar. And that way if there's-

    10. HS

      What percent of the time will you go against the recommendation of the team on a new hire?

    11. AW

      I would say maybe on average 25 to 30%. Like a lot.

    12. HS

      Wow.

    13. AW

      Like a lot. And I think usually it's because maybe there's a new hiring manager who needs to get calibrated, or it's an edge case of various forms. But the way I think about this is: as the founder of the company, I have seen everybody who's come in, and I've seen who succeeds and who fails. I have, almost as an algorithm, developed the most fine-grained dataset of understanding what it looks like for people to be successful at Scale, and what it looks like to have the Navy SEALs versus the Navy. And it's my job as a founder to help ensure that we as an organization are actually utilizing all this knowledge and all this learning that's been happening over the past eight years, and carrying that forward.

    14. HS

      Final one. What was your biggest management or leadership fuckup? As an example, for mine: people act out of fear or freedom. When you bring someone in, some people act out of, "You have to perform. You have to perform," and other people act out of, "Hey, I trust you. I respect you. Do your best work." And you just have to identify which camp someone's in, and then hopefully, if their skills are there, they should operate at their best. I wish I'd known that when I started, and I didn't; I just tried to act out of fear (laughs) with everyone there. What do you know now that you wish you'd known, and where did you fuck up?

    15. AW

      The biggest one was actually in the same era, like 2020, 2021: thinking that hypergrowth as a company meant that you had to hyper-grow your team. And so in those few years we did what a lot of tech companies did: we doubled, tripled the team year on year. In 2020 we were about 150 people. By (clears throat) the end of 2022 we were over 700. It was this insane amount of hiring and this incredible amount of hyper-growth as a team. And what I found out is that when you hire that quickly, it is impossible to do what we've just been talking about, which is maintaining-

    16. HS

      Right.

    17. AW

      ... this high bar, maintaining this, this sort of feeling of excellence within the team.

    18. HS

      Did you see the reduction of that bar in real time?

    19. AW

      It was kind of subtle. You would hire all these people in, and then you'd notice it maybe the next year, or six months later. You would notice it slowly: there were challenges that the organization used to be able to deal with and solve that slowly calcified, and that we weren't able to get around. And so you'll notice, from the end of '22, where I said we were 700 people, to now, where we're 800 people, the team has mostly kept the same size. But the revenue of the company has grown dramatically.

    20. HS

      It's funny, companies have like brand inflection points. They go hot, they go cold, they go hot again. Do you know what I mean?

    21. AW

      Yeah.

    22. HS

      And it feels like from the outside, Scale's hot again. Do you know what I mean?

    23. AW

      Well, that's, uh... (laughs)

    24. HS

      (laughs) I don't mean that like to be super nice or, or, or not nice. I didn't mean it when saying you're cold, but it's just weird how brands have moments of heat and not heat.

    25. AW

      This is a fascinating thing, actually. I asked Patrick Collison this question as well, and Stripe obviously is an incredible company that for a lot of its lifetime has been one of the iconic Silicon Valley companies. I asked him whether he thought that being such an iconic company was beneficial in all the hiring they did, and he made an interesting point, which (hopefully it's okay with me sharing this) was that actually the best people they hired, he thinks, were people who would have joined whether or not Stripe was the hottest company in Silicon Valley. It was these sort of off-the-beaten-path people who were actually the best hires they could have gotten. And a lot of the people who joined because they were the hottest company in Silicon Valley, for one reason or another, weren't necessarily the most valuable employees. So there's this element where the common belief and the common narrative is that you want to be the hottest company so you can attract the best talent, so you can hyper-grow, so you can keep growing, and I think that's often so difficult. It's much more about: how do you develop an ecosystem of talent that is very self-preserving, that keeps a very high bar, and that always seeks out and searches for the best people? Because, to your point, you'll have moments where you're hot and moments where you're not, and you need that talent ecosystem to be self-preserving independent of that, to drive the best outcomes.

    26. HS

      I also think it depends on function. When you look at a lot of go-to-market functions, traditionally for sales, they do concentrate toward hotter brands, and actually you can get a concentration of incredible salespeople, especially as you expand geographies. I'm thinking about OpenAI's go-to-market team in London. Unbelievably good, one of the best in London, and it's because they have an amazing brand. Do you see what I mean? So it depends on how close you are to the nucleus and what function you're in.

    27. AW

      Yeah. I think that's, I think that's right. Yeah, yeah. Uh, but then if you look at, if you look at a lot of the core technical development at OpenAI, a lot of that is still driven-

    28. HS

      100% are in the Valley and are as close as could be.

    29. AW

      Well, not even that. It's these people who've been at OpenAI since before they became the hottest company ever. It's one of these things where I think there's... You know, another company that went through this is Airbnb, with Brian Chesky, right? He's talked about this publicly. After the pandemic he all of a sudden realized, "Hey, I have to rebuild the entire company." He massively shrunk the team, invested a lot more into talent density, and then built the team to remain small. And even now they're one of the most profitable companies per head in all of tech, and that's because of this realization he had that he didn't need to keep growing the team to see the financial gains

  14. 1:00:41-1:06:03

    Quick-Fire Round

    1. AW

      or the financial output.

    2. HS

      Listen, I wanna do a quick fire, so I'm gonna say a short statement and you give me your immediate thoughts. Does that sound okay?

    3. AW

      Yes. Let's do it.

    4. HS

      Okay. So what have you changed your mind on most in the last 12 months?

    5. AW

      I think it's actually everything about this hypergrowth stuff that we've been talking about. It's really around divorcing team hypergrowth from company hypergrowth, and extra-investing into quality and excellence.

    6. HS

      What's the biggest misconception you hear most often about AI?

    7. AW

      I think the biggest one today is that all that stands between us and AGI is compute. I think we need data to get there too.

    8. HS

      Tell me, you can have any board member in the world who you don't currently have. You have an amazing board, but who would you choose as your next board member?

    9. AW

      Obviously I don't think this is practical, but I do think Satya Nadella has been one of the most brilliant business strategists of the modern era. What he has accomplished at Microsoft is just staggering, and any board would be very lucky to have him.

    10. HS

      An unfair one for me to ask, but I actually like it: what question are you never asked that you feel you should be?

    11. AW

      The interesting one is how my perspective on AI has changed across the successive eras, and I mention this because I started the company in 2016.

    12. HS

      Mm-hmm.

    13. AW

      The first three years of the company were fully focused on autonomous driving and autonomous vehicles. So we started working on AI, and then in 2019 we actually started working on generative AI; we started working with OpenAI on GPT-2. So we are one of the few AI companies that has seen multiple eras of the technology, and has seen at least the first boom-and-bust cycle, with autonomous vehicles. I think it's an interesting one: what's the same in these successive eras, and what's different?

    14. HS

      How's your view changed? Are you most excited now?

    15. AW

      I'm quite excited, but I think there are also reasons to be cautious. One of the things that happened in the autonomous vehicle craze is that a lot of promises were being made that were divorced from the technical reality. A lot of the prominent autonomous vehicle companies, a lot of the prominent organizations, were making bolder and bolder promises to be able to raise money. At first, they weren't super divorced, but over time they became more and more divorced from the technical realities, and that resulted in this very painful trough where the promises weren't met, and it felt like the entire industry was falling apart. And actually, at the end of the day, now we have Waymos driving around San Francisco, we have perfectly proper L4 autonomous vehicles driving around, and Tesla Autopilot has gotten really good. So if we had made more measured promises along the way, I think now we would feel amazing about autonomous vehicles, whereas instead we went through this huge up, this big down, and maybe now it's on the upswing again. This is one of the big concerns I have about generative AI: I hope not, but the same thing might happen again, where we have these really big promises being made about the technology that are divorced from technical reality, and that creates a gap that is bound to cause a hangover.

    16. HS

      Will Trump win? Penultimate one.

    17. AW

      I still think it's a tossup, actually. US elections are so strange to think about because they always get decided by the swing states. And frankly, I don't trust anyone on the coasts, and I'm one of these people on the coast, I live in San Francisco, to have any fine-grained understanding of how the swing states will play out. So I have no idea. I don't think anybody should listen to anybody who lives on the coast to understand what's going to happen; it always boils down to the swing states.

    18. HS

      Final one for you, my friend. Where's Scale in 10 years time?

    19. AW

      You know, hopefully doing something very similar to what we're doing now: continuing to be the data foundry for AI and serving as the data pillar for AI progress. One thing I think a lot about is-

    20. HS

      Would you like to go public?

    21. AW

      Uh, for sure. Yeah. Well, one thing I think a lot about is how do you solve problems that will never go out of style?

    22. HS

      But, like, would you like to be the CEO of a public company?

    23. AW

      Uh-

    24. HS

      Do you know what I mean? Like, when you look at the Colisons, you're like, "I don't know why you would if you were Stripe."

    25. AW

      Well, there are clear benefits to being a public company, for sure, but Stripe is an incredible company in that they can be incredibly profitable, so they can accomplish all their core financial goals without needing to go public.

    26. HS

      Listen, Alex, I loved having you on the show. Thank you so much for joining me. As I said, it's so much nicer to do this in person. I'm sorry for the many meandering pivots and turns, but this was fantastic.

    27. AW

      Yeah, this was a lot of fun.

Episode duration: 1:06:04


Transcript of episode jNbEr9F0wiE
