
No Priors Ep. 65 | With Scale AI CEO Alexandr Wang

Alexandr Wang was 19 when he realized that gathering data would be crucial as AI becomes more prevalent, so he dropped out of MIT and started Scale AI. This week on No Priors, Alexandr joins Sarah and Elad to discuss how Scale is providing infrastructure and building a robust data foundry that is crucial to the future of AI. While the company started out working with autonomous vehicles, it has expanded by partnering with research labs and even the U.S. government. In this episode, they get into the importance of data quality in building trust in AI systems, a possible future where we can build better self-improvement loops, AI in the enterprise, and where human and AI intelligence will work together to produce better outcomes.

Sign up for new podcasts every week. Email feedback to show@no-priors.com

Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @alexandr_wang

0:00 Introduction
3:01 Data infrastructure for autonomous vehicles
5:51 Data abundance and organization
12:06 Data quality and collection
15:34 The role of human expertise
20:18 Building trust in AI systems
23:28 Evaluating AI models
29:59 AI and government contracts
32:21 Multi-modality and scaling challenges

Sarah Guo (host), Alexandr (Alex) Wang (guest), Elad Gil (host)
May 22, 2024 · 39m · Watch on YouTube ↗

EVERY SPOKEN WORD

  1. 0:00 – 3:01

    Introduction

    1. SG

      (instrumental music plays) Hi, listeners, and welcome to No Priors. Today, I'm excited to welcome Alex Wang, who started Scale AI as a 19-year-old college dropout. Scale has since become a juggernaut in the AI industry. Modern AI is powered by three pillars: compute, data, and algorithms. While research labs are working on algorithms and AI chip companies are working on the compute pillar, Scale is the data foundry, serving almost every major LLM effort, including OpenAI, Meta, and Microsoft. This is a really special episode for me, given Alex started Scale in my house in 2016 and the company has come so far. Alex, welcome. I'm so happy to be talking to you today.

    2. AW

      Thanks for having me. I've known you all for, uh, for quite some time, so excited to be on the pod.

    3. SG

      Why don't we start at the beginning just for a broader audience? Talk a little bit about the founding story of Scale.

    4. AW

      Right before Scale, I was studying AI machine learning at MIT, and this was the year when DeepMind came out with AlphaGo, uh, where Google released TensorFlow, so sort of the, maybe the beginning of the deep learning hype wave or hype cycle. And, uh, I remember I was at college. I was trying to, to use, uh, neural networks. I was trying to train, you know, image recognition, uh, neural networks, and the thing I realized very quickly is that, uh, these models were very much so just a product of their data, and you s- I sort of played this forward and thought through it, and, you know, these models, or AI in general, is the product of, you know, three fundamental pillars. There's the algorithms, uh, the compute and the computation power that goes into them, and the data. Um, and at that time, it was clear, you know, there were, uh, there were companies working on the algorithms, uh, labs like OpenAI or Google's labs or, you know, a number of, of AI research efforts. Uh, there were... NVIDIA was already a very clear leader in building compute for these AI systems. Um, but there was nobody focused on data, and, uh, it was really clear that over the long arc of this technology, data was only gonna become more and more important. And so in 2016, uh, dropped out of MIT, did YC, and really started Scale to solve the data pillar of the AI ecosystem and be the organization that was gonna solve all the hard problems associated with, how do you actually, um, produce and create enough data to fuel this ecosystem? And really, this was the start of Scale as the, the data foundry for AI.

    5. SG

      It's incredible, uh, foresight because you describe it as, like, the beginning of the deep learning hype cycle. I don't think most people noticed that a hype cycle was yet going on, um, and so, uh, I just distinctly remember, uh, you know, you working through a number of early use cases, you know, building this company in my house at the time and discovering, I think far before anybody else noticed, that, um, the AV companies were spending all of their money on data.

  2. 3:01 – 5:51

    Data infrastructure for autonomous vehicles

    1. SG

      Um, how did you think about... Like talk a little bit about how the business has evolved since then 'cause it's certainly not just that use case today.

    2. AW

      AI is an interesting technology because it is, at the, at the core mathematical level, such a general-purpose technology. It could be... You know, it's basically, um, functions that can approximate nearly any function, including, like, intelligence, uh, and, and so it can be applied in a very wide breadth of use cases. And I think one of the challenges in building an AI over the past, you know, we've been at it for eight years now, um, has really been, uh, what are the applications that are gaining traction and how do you build the right infrastructure to fuel those applications? So, um, as an infrastructure provider, you know, we, we provide the data foundry for all these AI applications. We... Our burden is to be thinking ahead as to where are the breakthrough use cases in AI going to be and how do we basically lay down the tracks before the sort of, you know, freight train of AI comes rolling through? Um, we... You know, when we got started in 2016, this was, uh, the very beginning of the autonomous vehicle, uh, sort of cycle. Uh, it was, I think, right when we were doing YC was when Cruise got acquired, um, and it was sort of the beginning of, you know, the, the sort of, uh, the wave of autonomous driving being one of the key tech trends. And, um, and I think that, you know, we followed the early startup advice, you have to focus, uh, early on as a company, and so we, we built the very first data engine that supported, um, sensor-fused data, so supporting a combination of 2D data plus 3D data, so Lidars plus cameras that were built on, onto the, onto the vehicles. Um, and then that very quickly became, uh, an industry standard across, uh, across all the players, uh, you know, working with folks like General Motors and Toyota and Stellantis and many others. F- in the first few years of the company, we were just focused on autonomous driving and a handful of other robotics use cases, but that was the, that was sort of the, the primetime AI use case. And then starting in about 2019, 2020, um, it was an interesting moment where, uh, it was actually pretty unclear where the future of, you know, AI use cases, where AI applications were going to come from, and this is obviously pre-language model, pre-generative AI, and, and it was, uh, a period of high uncertainty. So we, uh, we then started focusing on government applications. That was one of the areas where it was clear that there was high applicability, um, and it was one of the areas th- that was becoming more and more important, um, globally. So we built the very first data engines to support government data. Um, this was to support, uh, mostly geospatial and satellite and over- other overhead imagery. This ended up fueling the first, um, AI program of record for the US DOD, um, and, and was sort of the, the, the start of our government business, and that technology ended up being critical years later, um, in the, in the Ukraine conflict.
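
As a rough illustration of the kind of record a sensor-fusion data engine has to handle (one frame tying camera images to a lidar sweep and its 3D box labels), here is a minimal sketch; the schema and field names are invented for this example, not Scale's actual format.

```python
# Hypothetical sketch of one sensor-fused annotation sample: a frame pairing
# synchronized camera images with a lidar sweep and its 3D box labels.
# The schema and field names are invented for illustration only.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Box3D:
    label: str                               # e.g. "car", "pedestrian"
    center_xyz: Tuple[float, float, float]   # position in the lidar frame, meters
    size_lwh: Tuple[float, float, float]     # length, width, height, meters
    yaw: float                               # heading angle, radians

@dataclass
class SensorFusedFrame:
    frame_id: str
    camera_images: List[str]                 # paths to the synchronized camera images
    lidar_points: str                        # path to the point cloud file
    cuboids: List[Box3D] = field(default_factory=list)

frame = SensorFusedFrame(
    frame_id="frame_000042",
    camera_images=["cam_front.jpg", "cam_left.jpg"],
    lidar_points="sweep_000042.bin",
    cuboids=[Box3D("car", (12.3, -1.8, 0.9), (4.5, 1.9, 1.6), yaw=0.05)],
)
```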

  3. 5:51 – 12:06

    Data abundance and organization

    1. AW

      And then also around that time was when we started working on generative AI, so we partnered with OpenAI at that time to do the very first experiments on RLHF on top of GPT-2. Um, this was, like, the, the primordial days of, of RLHF, and the models back then were, uh, really rudimentary. Like, they didn't... They truly... It did not seem like anything (laughs) to us, but we were just like, "You know, OpenAI, they're a bunch of smart people. We should work with them. We should partner with them." And so, um, we, we partnered with the team that, that originally invented RLHF, and then we basically continued innovating with them from 2019 onwards, but we didn't think that much about the, the underlying technological trend. Um, you know, y- they integrated this, uh, all of this technology into GPT-3 with the... There was a paper, InstructGPT, which was kind of the precursor to ChatGPT that we worked with them on. And then ultimately, you know, in 2022, DALL-E 2 and, and ChatGPT rolled around, and we ended up focusing a lot of our effort as a company into how do we fuel the data for generative AI? How do we be the data foundry for generative AI? Um, and, uh, today, you know, fast forward to today, uh, our data foundry fuels basically every major large language model in the industry. We work with OpenAI, Meta, Microsoft, um, many of the other players, partner with them very closely in fueling their AI development. And in that timeframe, the ambitions of AI have just, you know, um, totally exploded. I mean, we've gone from... You know, GPT-3 I think was... It was a landmark model, but it, it was... You know, there was a modesty to GPT-3 at the time. And now, you know, we're looking at building, you know, agents and very complex reasoning capabilities, um, multimodality, multilinguality. I mean, the, the infrastructure that we have to build to, to support all the directions that developers want to take this technology has been really staggering and, and quite, uh, quite incredible.

    2. EG

      Yeah. You've basically way, uh, surfed multiple waves of AI. And one of the big shifts that's happening right now is there's other types of parties that are starting to engage with this technology, so you're obviously now working with a lot of the technology giants, with government, um, with automotive companies. It seems like there's emergence now of enterprise customers and a platform for that. There's emergence of sovereign AI. How are you engaging with these other massive use cases that are coming now on the generative AI side?

    3. AW

      It's quite an exciting time because I think for the first time in a, in maybe the entire history of AI, AI truly feels like a general purpose technology which can be applied in, you know, a very large number of business use cases. I contrast this to, you know, the autonomous vehicle era where you really felt like we were building a very specific use case that happened to be very, very valuable. Now, it- its general purpose can be... It can, uh, be encompassed across the, the broad span. And as we think about what are the infrastructure requirements to support this broad industry and what is the, what is the broad arc of the technology, um, it's really one where I think, we think, "How do we empower data abundance," right? Um, there's a, there's this question that comes up a lot. You know, "Are we gonna run out of tokens?" Uh, and, uh, and, "What happens when we do?" And I think that that's a choice. I think we as an industry can either choose data abundance or data scarcity, um, and we view our role and our job in the ecosystem to be... to build data abundance. Um, w- the key to the scaling of these large l- language models and the, you know, these, these, uh, uh, language models in general is the ability to scale data. And I think that one of the fundamental bottlenecks to, you know, what's, what's in the way of us getting from GPT-4 to GPT-10 is, you know, data abundance. Are we gonna have the data to actually get there? And, um, our goal is, you know, how do we, how do we ensure that we have enough tokens to do that? And we've sort of... As a, as a community, we have, we've had the easy data which is all the data on the internet, um, and we've kind of exhausted all of the easy data, and now it's about, you know, forward data production, uh, that has high supervisory signal that is basically very valuable. And we think about this as a, you know, frontier data production. And the kinds of data that are really relevant and valuable to the models today, there's a... Um, you know, the quality requirements have just increased dramatically. It's not any more the case that these models can learn that much more from, you know, various comments on Reddit or whatnot. They need, uh, they need truly frontier data. And what does this look like? This is, you know, reasoning chain of thoughts from the world's experts or from mathematicians or physicists or biologists or chemists or lawyers or doctors. Um, this is agent workflow data of agents in enterprise use cases or in consumer use cases or, uh, even coding agents and, and other agents like that. This is, uh, multilingual data, so data that encompasses the full span of, you know, the, the many, many languages that are spoken in the world. This includes, uh, all the multimodal data, to your point. Like, you know, how do we integrate video data, uh, audio data, um, you know, uh, start including more of the esoteric data types that exist within enterprises and s- uh, exists within a lot of industrial use cases into these models? There's this very, uh, large mandate, I think, for th- for our industry to actually figure out what is the means of production by which we're actually going to a- be able to generate and produce more tokens, um, to fuel the future of this industry? And I think there's, there's a few sources, um, or there's a few answers to this. So, the first is, uh, we need, um, we need the best and brightest minds in the world to be contributing data. I think it's, um... 
One of the things I think is actually quite interesting about this technology is, um, you know, very smart humans, so PhDs or doctors or lawyers or experts in all these various fields, um, actually have a... can have an extremely high impact into the future of this technology by producing data that ultimately feeds into the algorithms. Um, if you think about it, it's actually... Their work is one of the ways that they can have, um, a very scaled society l- level impact. You know, there's, there's an argument that you can make that, um, uh, producing high quality data for AI systems is, is near infinite impact because, you know, even if you improve the model just a little bit, if you were to integrate that over all of the future invocations of that model, that's, like, a ridiculous amount of impact.
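
To make that "integrated over all future invocations" point concrete with purely hypothetical numbers, the back-of-the-envelope arithmetic looks something like this:

```python
# Toy back-of-the-envelope arithmetic with made-up numbers: a small quality
# improvement, integrated over every future invocation of a widely used model,
# adds up to an enormous amount of impact.
daily_queries = 1_000_000_000   # hypothetical: one billion model calls per day
improvement = 0.001             # hypothetical: +0.1 percentage point good-answer rate
years = 5

extra_good_answers = daily_queries * improvement * 365 * years
print(f"{extra_good_answers:,.0f} additional good answers over {years} years")
# prints: 1,825,000,000 additional good answers over 5 years
```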

  4. 12:06 – 15:34

    Data quality and collection

    1. AW

      So, I think that's something that's quite exciting.

    2. EG

      It's kind of interesting because Google's original mission was to organize the world's information and make it universally accessible and useful. And, um, they would go and they would, um, scan in books, right, from l- library archives, and they were trying to find different ways to collect all the world's information. And effectively, that's what you folks are doing or helping others do, right? You're effectively saying, "Where is all the expert knowledge and how do we translate that into data that can then be used by machines so that people can ultimately use that information?" And that's super exciting.

    3. AW

      It's exciting to the contributors who are in our network as well, because I think, you know, that there's- there's obviously, um, a monetary component and they're excited to- to do this work, but there's a- there's a very meaningful motivation which is, "How do I leverage my expert knowledge and expert insight and use that to fuel this entire AI movement?" Which I think is, um, is like a deep... You know, that's kind of like the deepest scientific motivation, which is, "How do I use my knowledge and capability and intelligence to fuel humanity and progress and knowledge, um, going into the future?"

    4. SG

      I think it's a somewhat undervalued thing where, um... This is gonna age me, but, like, there was a decade or so where, like, the biggest thing happening in technology was, uh, digitization of different processes.

    5. AW

      Mm-hmm.

    6. SG

      And I think there's actually some belief that like, "Oh, that's happened," right? Like, you know, interactions are digital and so, like, information is captured in relational database systems on, you know, customers and employees or whatever. But one of the big discoveries as an investor in this field over the last five years has been, like, the data is not actually captured-

    7. AW

      Yeah.

    8. SG

      ... for almost any use case you might imagine, um, for AI, right? Because I have multiple companies, and I'm sure Elad does too, and you in your personal investing where, you know, the first six months of the company is a question of, where are we going to get this data? You go to many of the incumbent, um, software and services vendors, and despite having done this task, you know, for years, they have not actually captured the information you'd want to teach a model.

    9. AW

      Yeah.

    10. SG

      Um, and, like, that, you know, that knowledge capture era that I think is happening at Scale is a really important part.

    11. AW

      To make a Dune 2 analogy, I mean, I think it really is spi-... You know, data production is very similar to spice production. It is the- it will be the- the lifeblood of all the future of these AI systems. Um, and, you know, so- so I think best and brightest people is one key source. Um, proprietary data is definitely a very important source as well. You know, uh, crazy stat, but JPMorgan's proprietary data set is 150 petabytes of data. Um, GPT-4 was trained on less than one petabyte of- of data. So, there's clearly so much data that exists within enterprises and- and, uh, governments that is proprietary data that can be used for training, um, incredibly powerful AI systems. And then I think there's this key question of what's the- what's the future of synthetic data and how synthetic data needs to emerge? And- and our perspective is that the critical thing is- is what we call hybrid human AI synthetic data. So, how can you build hybrid human AI systems such that, uh, AI are doing a lot of the heavy lifting but human experts and people, you know, the basically best and brightest, the- the smartest people, the sort of best at reasoning can contribute all of their insight and capability to ensure that you produce data that's of extremely high quality, of high fidelity to ultimately fuel the- the future of these models?
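
A minimal sketch of what such a hybrid human-AI data pipeline could look like, assuming a placeholder generation function and a placeholder expert-review step; none of these names come from Scale's actual tooling.

```python
# Minimal sketch of a hybrid human-AI data pipeline: a model drafts candidate
# responses, a human expert verifies or corrects each one, and only approved
# pairs enter the training set. All function names here are placeholders for
# whatever model API and review tooling is actually used.
from typing import Callable, List, Tuple

def build_hybrid_dataset(
    prompts: List[str],
    generate_draft: Callable[[str], str],                    # AI does the heavy lifting
    expert_review: Callable[[str, str], Tuple[bool, str]],   # human approves/edits
) -> List[dict]:
    dataset = []
    for prompt in prompts:
        draft = generate_draft(prompt)
        approved, corrected = expert_review(prompt, draft)
        if approved:
            dataset.append({
                "prompt": prompt,
                "response": corrected,
                "source": "hybrid_human_ai",
            })
    return dataset
```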

    12. SG

      I want to pull this thread a little bit because something you and I were talking about, both in the context of data collection and evals, is like, what do you do when the models are actually quite good,

  5. 15:34 – 20:18

    The role of human expertise

    1. SG

      right? Better than humans on many measured dimensions? Um, and so, like, can you talk about that, um, from both the data and perhaps, um, you know, we should talk about evaluation as well.

    2. AW

      I mean, I think philosophically, the question is not, is a model better than a human unassisted from- from a model? The question is, is a human plus a model together going to be able to produce better output than a model alone? And I think that'll be the case for a very, very, very long time. That- that humans are still... You know, human intelligence is complementary to machine intelligence that we're building, and they're going to be able to combine to build, you know, to do things that are, uh, strictly better than what the models are going to be able to do on their own.

    3. SG

      I have this optimism. Uh, Elad and I had a debate at one point that was, um, challenging for me, uh, philosophically about whether or not centaur play or, like, machine and human intelligence were complementary.

    4. AW

      My simple case for this is when we look at the machine intelligence, like the models that are produced, you know, we always... You know, you- you see things that are really weird. You know, there's like the- the ROT13 versus ROT8 thing, for example, where the models know how to do ROT13, they don't know how to do ROT8. Um, there's the reversal curse. You know, there's all these- these artifacts that indicate somehow that it is not like human intelligence or not like biological intelligence, and I think that's a- that's the bull case for humanity which is that, you know, there are certain qualities and attributes of human intelligence which are somehow distinct from the very separate and very different process by which we're training these algorithms. Um, and so then I think, you know, what- what does it look like in practice? It's, you know, if a model produces an answer or response, how can a human critique that response to improve it? How can a human expert, you know, highlight where there's factuality errors or where there's reasoning errors to improve the quality of it? How can the human aid in guiding the model over like a long period of time to produce reasoning chains that are very, um, that are very, uh, correct and deep and- and are able to drive, you know, the capability of these models forward? And so I think there's a lot that goes into... This is what we spend all of our time thinking about. What is the human expert plus, uh, plus model teaming that's going to help us keep pushing the boundary of what the models are capable of doing?

    5. EG

      How long do you think human expertise continues to play a role in that? So if I look at certain models, Med-PaLM 2 would be a good example where Google released a model where they showed that, um, the model output was better than the average physician. You could still get better output from a cardiologist, but if you just asked a GP a cardiology question, the model would do better as ranked by physician experts. Um, so it showed that already for certain types of capabilities, the model provided better insights or output than people who are trained to do some- some aspects of that. How far do you think that goes in terms of, uh, or when do you think human expertise no longer is additive to these models? Is that never? Is it three years from now? I'm sort of curious of the timeframe.

    6. AW

      I think it's never because I think that, um, you know, the, the key quality of human, uh, intelligence or biological intelligence is this ability to reason and optimize over very long time horizons. So, and this is biological, right? Because our, our goals as, as biological entities is to optimize over, you know, our lifetimes, optimize for reproduction, et cetera. So we have the ability as, as human intelligences to produce long-term goals, continue optimizing, adjusting, um, and reasoning over very long, very long time horizons. You know, current models don't have this capability because the models are trained on these like little nuggets of human intelligence. So they're, they're tr- you know, they're very good at like y- almost like a, a, uh, like a shot glass full of human intelligence, but they're very bad at continuing that intelligence over a long time period or a long time horizon. And so this, this fundamental quality of biological intelligence I think is something that will only be taught to the model over time through, you know, uh, through direct transfer via data, um, to, to fuel these models.

    7. SG

      You don't think there's a, like a, um, architectural breakthrough in planning that solves it?

    8. AW

      I think there will be architectural breakthroughs that improve performance dramatically, but I think if you think about it in- inherently, like these models are not trained to optimize over long time horizons in any way, and we don't have the environments to be able to, to get them to optimize for these like, you know, amorphous goals over long time horizons. So I think this is a somewhat, um, fundamental, uh, limitation.

    9. SG

      Before we, like, talk about, you know, some of the cool releases you guys have coming out and what's next for Scale, um, maybe we can, uh, zoom out and just congratulate you on the fundraise that you guys just did. Um-

    10. AW

      Thank you.

    11. SG

      ... a billion dollars at almost 14 billion in valuation, um, with, uh, you know, really interesting investors like AMD, Cisco, Meta. I wanna hear a little bit about the strategics.

  6. 20:18 – 23:28

    Building trust in AI systems

    1. SG

    2. AW

      Our mission is to serve the entire AI ecosystem and the broader AI industry. You know, we're an infrastructure provider. That's our role, is to be, um, as much as possible supporting the entire industry to flourish, uh, as, as much as possible. And we thought an important part of that was, um, how can we, uh, be an important part of the ecosystem and build as much ecosystem around this data foundry which is going to fuel the future of the industry, um, as much as possible, which is one of the reasons why we wanted to bring along, A, other infrastructure providers like Intel and AMD and folks who are also laying the groundwork for the future of the technology, um, but also, you know, key players in the industry, like, uh, like Meta. Um, folks like Cisco as well. You know, our view is that ultimately there's, there's the stack that we think about the te- There's the infrastructure, there's the technology, and there's the application. And our goal as much as possible is how do we, um, how do we leverage this, this data capability, this data foundry to empower every layer of that stack as much as possible, um, and, and build an in- a broader industry, um, viewpoint around what's needed for the future of data? I mean, I think that this is an, an exciting moment for us. I mean, we see our role, uh, you know, going back to the framing of what's holding us back from GPT-10? What's, what's in the way from GPT-4 to GPT-10? Um, we want to be investing into actually enabling that pretty incredible, uh, technology journey. And, you know, there's, there's tens of billions, maybe hundreds of billions of dollars investment going into the compute side of this equation. And one of the reasons why we thought it was important to, uh, to raise the money and continue investing is, you know, there's real investment that's gonna have to be made into the data production to actually get us there.

    3. SG

      With great power comes great responsibility. Um, if, uh, you know, if these AI systems are what we think they are in terms of societal impact, like, trust in those systems is a crucial question. Like, how do you guys think about this as part of your work at Scale?

    4. AW

      A lot of what we think about is how do we utilize... How does the data foundry, um, enhance the entire AI life cycle, right? And that life cycle goes from, you know, A, ensuring that there's data abundance as well as data quality going into the systems, but also being able to measure the AI systems, which builds confidence in, in AI, and also enables fur- further development and further adoption of the technology. And this is, this is the fundamental loop that I think every AI company goes through. You know, they, they get a bunch of data or they generate a bunch of data. They train their models, they evaluate those systems, and they sort of, you know, uh, go again in the loop. And so evaluation and measurement of the AI systems is a critical component of the life cycle, but also a critical component I think of, of society being able to build trust in these systems. You know, how are governments gonna know that these AI systems are, are safe and secure and fit for, uh, you know, broader adoption within their countries? How do... How are, um, enterprises gonna know that when they deploy an AI agent or an AI system that it's actually going to be good for the consumers and that it's not gonna create greater risk for them? How do, um, how are labs gonna be able to consistently measure what are the intelligences of my, of the AI systems that we build and how are we gonna, you know, how do they make sure they continue to develop responsibly as

  7. 23:28 – 29:59

    Evaluating AI models

    1. AW

      a result?

    2. SG

      Can you give our listeners a little bit of intuition for like what makes evals hard?

    3. AW

      One of the hard things that... You know, because we're building systems that we're trying to approximate and, and build human intelligence, um, grading one of these AI systems is, is not something that's very easy to do automatically. And it's, it's sort of like, um, you know, you have to kind of build IQ tests for these models, which in and of itself is a very fraught philosophical question: like, how do you measure the intelligence of a system? And this, uh, there's very practical problems as well. So most of the benchmarks that we as a community look at for the-

    4. SG

      The academic benchmarks, yeah.

    5. AW

      Yeah, the academic benchmarks that are what the industry uses to measure the performance of these algorithms are fraught with issues. Many of the models are overfit... on these benchmarks. They're sort of in the training datasets of these models, um, and so-

    6. SG

      You guys just did some interesting research here.

    7. AW

      Yes.

    8. SG

      Like, you published them.

    9. AW

      Yup. So we, one of the things we did is we published GSM1k, which was a, uh, a held-out eval. So we basically produced a new, um, evaluation of the math capabilities of models that there's no way would ever exist in the, in the training dataset to really see, um, uh, what the reported performance of the models was versus their actual capability. And what you notice is some of the models perform really well, but some of them perform much worse than their reported performance. And so this whole question of how we as a society are actually gonna measure these models is, is a really tough one. And our answer is we have to leverage the same human experts and the, kind of the best and brightest minds to do expert evaluations on top of these models to understand, you know, where are they powerful, where are they weak, and, and what's the sort of, um, what are the sort of risks associated with these, with these models? So, you know, one of the things that, um, we're very, we're, you know, we're going to, uh, we're very passionate about is there needs to be sort of public visibility and transparency into the performance of these models. So there need to be leaderboards, there need to be evaluations that are public that demonstrate, uh, in a, in a very rigorous, scientific way what the performance of these models are. And then we need to build the platforms and capabilities for governments, enterprises, labs to be able to do constant evaluation on top of these models to ensure that we're always developing the technology in a safe way and that we're always deploying it in a safe way. Um, so this is something that we think is, you know, just in the same way that our role as an infrastructure provider is to support the data needs for the entire ecosystem, we think that building this layer of confidence in the systems through accurate measurement is going to be fundamental to the further adoption and further development of the technology.
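
A minimal sketch of the kind of contamination check a held-out eval enables: score the same model on the public benchmark and on a freshly written, never-published set, and look at the gap. The helper functions and grading callback are hypothetical placeholders, not Scale's evaluation code.

```python
# Sketch of the overfitting check a held-out eval enables: score a model on the
# public benchmark and on a freshly written, never-published set, then compare.
# `model` and `grade_answer` are hypothetical placeholders.
def accuracy(model, problems, grade_answer):
    correct = sum(
        grade_answer(p["question"], model(p["question"]), p["answer"])
        for p in problems
    )
    return correct / len(problems)

def contamination_gap(model, public_set, held_out_set, grade_answer):
    public_acc = accuracy(model, public_set, grade_answer)
    held_out_acc = accuracy(model, held_out_set, grade_answer)
    # A large positive gap suggests the public benchmark leaked into training data.
    return public_acc - held_out_acc
```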

    10. SG

      Do you wanna talk about state of AI at the application layer? 'cause you have a, uh, viewpoint into that that very few people do.

    11. AW

      You know, after GPT-4 launched, there was sort of this, this frenzy of, of sort of an application build-out. And, um, and I think that there was, you know, there were all these, like, agent companies, all this excitement around agents. There was all these, like, you know, a lot of applications that were built out. And I actually think it's, it's an interesting, um, moment in the, in the, in the, the life cycle of AI, which is that, you know, GPT-4 I think w- as a model was a little early of a technology for us to have this entire hype wave around. And I think we, you know, the community very quickly discovered all the limitations of GPT-4, but, you know, we all know GPT-4 is not the terminal model that, that AIs... that we are going to be using. There are better models on the way. And so I think there was a, there was an element by which, you know, it's sort of a classic hype cycle. GPT-4 came out, lots of hype around building, um, applications around GPT-4, um, but it was, it was probably a few generations too early of a model to, for the thousand flowers to bloom. And so I think in the coming models, we're going to see, um, this, this sort of, like, trough of disillusionment I think we're going to come out of because the next, the future models are gonna be so much more powerful. And you're actually gonna have all of the fundamental capabilities you need to build agents or all sorts of incredible things on top of it. And we think wha- what we're very passionate about is how do we empower, um, application builders, so whether that be enterprises or governments or startups, to have, um, to build self-improvement into the applications that they build? So, uh, what we see from the large labs, um, like OpenAI and others, is that self-improvement comes from data flywheels. So how do you have a flywheel by which you're constantly, you know, getting new data that improves the, your model? You're constantly evaluating that system to understand where there's weaknesses and you're, you're sort of, like, continually hydrating this, this workflow. Um, we think that fundamentally every enterprise or government or startup, um, is going to need to build applications that, that have the self-improvement s- loop and cycle, and it's very hard to build. And so, you know, we built this product, our Gen AI platform, to really, to really build, you know, lay the groundwork and the platform to enable the entire ecosystem to be able to build these self-impro- improvement loops into their, into their products, um, as well as possible.
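
A minimal sketch of the self-improvement loop described here, with every function standing in for a real component (evaluation suite, targeted data collection, fine-tuning); the names, the 0.8 threshold, and the round count are illustrative assumptions, not a specific product's API.

```python
# Sketch of a self-improvement data flywheel: evaluate the model, find weak
# domains, collect targeted new data there, fine-tune, and repeat. Every
# function is a placeholder for a real component; the 0.8 threshold and the
# number of rounds are arbitrary illustrative choices.
def data_flywheel(model, eval_suite, collect_feedback, fine_tune, rounds=3):
    for _ in range(rounds):
        scores = {domain: eval_fn(model) for domain, eval_fn in eval_suite.items()}
        weak_domains = [d for d, score in scores.items() if score < 0.8]
        if not weak_domains:
            break                                            # nothing obviously weak; stop early
        new_data = collect_feedback(model, weak_domains)     # targeted frontier data
        model = fine_tune(model, new_data)
    return model
```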

    12. EG

      I, I was just curious. I mean, one thing related to that is you mentioned that, for example, JP Morgan has 150 petabytes of data relative... You know, that's 150 times what, um, some early GPT models trained on. How do you work with enterprises around those loops, or what other types of customer needs are you seeing right now, or application areas?

    13. AW

      One of the things that every, you know, all the model developers understand well but the enterprises don't understand super well is that, um, you know, not all data is created equal and high-quality data or frontier data is, is, can be, you know, 10,000 times more valuable than just any run-of-the-mill data, uh, within an enterprise. And so a lot of the challenge in, you know, or, or a lot of the problems that we solve with enterprises are how do you go from this giant mountain of data that sort of is, like, all over this... truly all over the place, um, and distributed everywhere within the enterprise to what are the... How do you compress that down and filter it down to the high-quality data that you can actually use to, you know, fine-tune or train, um, or continue to enhance these models to actually drive differentiated performance?
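
A minimal sketch of that compress-and-filter step, assuming a placeholder quality-scoring function (which in practice could be heuristics, a trained classifier, or a model-based rater); the names and the 1% keep fraction are illustrative only.

```python
# Sketch of compressing a large raw corpus into a small high-quality training
# subset: deduplicate, score each document for quality, and keep the top slice.
# `quality_score` is a placeholder; the 1% keep fraction is an arbitrary default.
def select_training_subset(documents, quality_score, keep_fraction=0.01):
    seen, unique_docs = set(), []
    for doc in documents:
        key = doc.strip().lower()
        if key and key not in seen:        # crude exact-match deduplication
            seen.add(key)
            unique_docs.append(doc)
    ranked = sorted(unique_docs, key=quality_score, reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:keep]
```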

    14. EG

      I think one thing that's interesting is that there's some papers out at Meta which basically show that, uh, actually narrowing the amount of data that you use creates better models. Um, so the output is better, the models are smaller, which means they're cheaper to run, they're faster to run. And so to your point, it's really interesting because a lot of people are sitting on these massive datasets and they think all that data is really important, and it sounds like you're really g- really working with enterprises to sort of narrow that down into what's the data that actually improves the model?

    15. AW

      Yes.

    16. EG

      So it's sort of an information, uh, theory question in some sense. Um-

    17. EG

      What, what are some of the launches that are coming from Scale now?

  8. 29:59 – 32:21

    AI and government contracts

    1. EG

    2. AW

      You know, we're building evaluations, uh, for the ecosystem. So one is that we're la- going to launch, um, uh, these private held-out evaluations and have leaderboards associated with these evals, um, for the leading LLMs, uh, in the ecosystem, and we're gonna rerun this contest, uh, periodically. So every few months, we're going to do a new set of held-out evals to basically consistently benchmark and monitor the performance of, of our models and continue adding more domains. So we're gonna start with areas like math and coding, instruction following, um, adversarial, uh, capabilities, and then we're gonna, over time, continue, you know, increasing the number of, of areas that, uh, we test these models on. We think about it as kind of like a, like an Olympics for LLMs, but instead of every four years, it'll be every few months, um, uh, so that's one thing we're quite excited about, um, and then, uh, we have an exciting launch coming with, uh, some of, uh, with our government, uh, customers. So, uh, one of the things that, uh, we see with, w- in the government space is they're trying to use, um, LLMs and they're trying to use these capabilities, and there's actually a lot of, uh, there's a lot of cases where, um, even the current agentic capabilities of the models can be extremely valuable to the, to the government. Um, and it's often in, in pretty boring use cases, like writing reports, or filling out forms, or pulling information from one place to another, but it's well within the capabilities of these models, um, and so we're gonna be... we're excited about launching, uh, some agentic features for our, for our government customers in, uh, with our Donovan product.

    3. EG

      These are applications you build yourselves, or an application building framework?

    4. AW

      So for our government customers, we basically build a, like a, uh, AI staff officer, so it's a, it's a full application, but it integrates with whatever model our customers think is appropriate for their use case.

    5. EG

      And do you think that Scale will invest in that for enterprise applications in the future?

    6. AW

      Our view for enterprises is, is fundamentally, like, how do we, how do we, w- for the applications that enterprises are gonna build, how do we help them build, uh, self-improvement into those products? So, we think about it much more as, uh, at the platform level for enterprises.

    7. EG

      Does the new OpenAI or Google, uh, release, uh, change your point of view on anything fundamentally? Multi-modality, um, you know, the applicability of voice agents, et cetera?

    8. AW

      You know, I think you tweeted about this, but one, one very interesting element is, uh, the, the direction that we're going in terms of consumer focus,

  9. 32:21 – 39:00

    Multi-modality and scaling challenges

    1. AW

      and it's, it's fascinating. I mean, I think multi-modality... Well, taking a step back, first off, I think it points to where there's, where there's still huge data needs. So multi-modality as an entire space is one where, um, for the same reasons that we've, like, exhausted a lot of the internet data, there's a lot of scarcity for good multi-mod- modal data that can empower these personal agents and these personal use cases. So I think there's, um, you know, as we wanna keep improving these systems and improving these personal agent use cases, there's, you know, we think about this a lot, what are the data needs that are actually, um, that are going to be required to actually fuel that? Um, I think the other thing that's fascinating is, is, um, the convergence, actually. So, both labs have been working, um, independently on, on d- various technologies, and, you know, Astra, which is Google's, uh, major sort of, uh, hubcap release, as well as 4o, you know, they're both shockingly similar, um, and, uh, sort of, uh, you know, demonstrations of the technology. And so there's... I think that was, that was very fascinating that the labs were sort of converging on the same end use cases, or the same visionary use cases for the technology.

    2. EG

      I think there's two reads of that. One is, like, there's an obvious technical next step here, um, and very smart people have independently arrived, and the other is, like, competitive intelligence is pretty good.

    3. AW

      Yeah, I-

    4. EG

      (laughs)

    5. AW

      ... I think both are probably true. I think both are true.

    6. EG

      Yeah. It's funny because when I used to work on products at Google, um, we'd spend two years working on something, and then the week of launch, somebody else would come out with something and we'd launch it, and then people would claim that we copied them. And so I do think a lot of this stuff just happens to be, in some cases, just where the whole industry is heading, and it's kind of people are aware that multi-modality is one of the really big areas, um, and a lot of these things are years of work going into it, so it's kind of interesting to watch that as an external observer, yeah. Yeah. I mean, this is also not a, like a, a training run that is a one-week copy effort, right?

    7. AW

      It... Well- and then I think the last thing that is, um, that, you know, I've been thinking a lot about is, like, when are we gonna get smarter models? So, you know, we got multi-modality capability. That's exciting. It's more of a lateral, um, expansion of the models, and the, the industry needs smarter models. We need GPT-5 or we need Gemini 2, or whatever that, those models are going to be, um, and so, to me it was, you know, uh, I was somewhat disappointed because I just want much smarter models that are gonna e- enable, kind of as you mentioned before, you know, way more applications to be built on top of them.

    8. EG

      The year is long. End of year. Um, okay, so quick fire, and Elad, chime in if you, um, have ones here. Uh, um, something you believe about AI that other people don't?

    9. AW

      My biggest belief here is that the, the path to AGI is, uh, is one that looks a lot more like curing cancer than, uh, than developing a vaccine, and what I mean by that is, I think that the, the path to build AGI is going to be, um, in inc- you know, you're going to have to solve a bunch of small problems where you don't get that much positive leverage between, um, solving one problem and solving the next problem, and there's just sort of, you know, it's like curing cancer, which is you have to then zoom in to each individual cancer and solve them independently, and eventually, over a multi-decade timeframe, we're gonna look back and realize that we've, we've, you know, built AGI. We've cured cancer, but the, the path to get there will be this, like, you know, quite plodding road of, of solving individual capabilities and building individual sort of, um, data flywheels to support this end mission, whereas I think a lot of people in the industry paint the path to AGI as, like, you know, eventually we'll just, boop, we'll get there. We'll, like, you know... (laughs) we'll, we'll, we'll, we'll, like, uh, we'll solve it, uh, in one fell swoop, and, um, I think there's a lot of implications for how you actually think about, you know, the technology arc and, and, and how the- how society is gonna have to deal with it. I think it's actually a pretty bullish case for society adapting to the technology because I think it's going to be, you know, consistent slow progress for quite some time, and society will have time to fully sort of, uh, uh, acclimate to the technology that develops.

    10. SG

      When you say solve, like, a problem at a time, right, if we just, like, pull away from the analogy a little bit, should, uh, should I think of that as, um, generality of multi-step reasoning is really hard, as, you know, Monte Carlo tree search is not the answer that people think it might be, um, we're just gonna run into scaling walls? Like, what, sort of what are the dimensions of, like, solving multiple problems?

    11. AW

      I think the main thing fundamentally is I think there's, there's very limited generality that we get from these models, um, and even for multi-modality, for example, uh, my understanding is there's no positive transfer from learning in one modality to other modalities. So, like, training off of a bunch of video doesn't really help you that much with your text problems and vice versa. And so, I think what this means is, like, each, um, sort of, uh, each niche of capabilities or each area of capability is go- going to require separate flywheels, data flywheels to be able to, to push through and drive performance.

    12. SG

      You don't yet believe in video as basis for world model that helps-

    13. AW

      I think that's-

    14. SG

      ... models reason?

    15. AW

      I think it's a great narrative; I don't think there's strong scientific evidence of that yet. Maybe there will be eventually, um, but I think that this is the, uh, I think the base case, let's say, is one where, you know, there's not that much generalization coming out of the models, and so we actually just need to slowly solve lots and lots of little problems to ultimately result in AGI.

    16. SG

      One last question for, for you is like, you know, leader of Scale, a scaling organization, like, what are you thinking about as a, as a CEO?

    17. AW

      And this will almost sound cliché, but just how early we are in this, in this technology. I mean, I think that there's, um, you know, it's, it's strange 'cause on the one hand, it feels like we're so late because the tech giants are investing so much and there's a bajillion launches all the time and, uh, there's, you know, uh, there's, there's all sorts of investment into this space, but-

    18. SG

      Markets look crowded in the obvious use cases.

    19. AW

      Yeah, exactly. Markets look super crowded, but I think fundamentally, we're still super early because the technology is, you know, 1/100 or 1/1000 of its future capability, and as we, as a community and as an industry and as a society ride that wave, it's just gonna be, you know, there's so many more chapters to the book. And so as a, you know, if you think about any organization, what we think about a lot is, is nimbleness. Like, how do we ensure that as this technology continues to develop that we're able to continue, um, adapting alongside, uh, the developments of the technology?

    20. EG

      All right. That's a great place to end. Thanks so much for joining us today.

    21. AW

      Yeah.

    22. SG

      Thanks, Alex.

    23. AW

      Thank you.

    24. SG

      Find us on Twitter @nopriorspod. Subscribe to our YouTube channel if you wanna see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way, you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.

Episode duration: 39:00
