No Priors Ep. 91 | With Cohere Co-Founder and CEO Aidan Gomez
EVERY SPOKEN WORD
75 min read · 14,641 words
- 0:00 – 0:36
Introduction
- Sarah Guo
(instrumental music plays) Hi, listeners, and welcome to No Priors. Today, we're hanging out with Aidan Gomez, co-founder and CEO of Cohere, a company valued at more than $5 billion in 2024, which provides AI-powered language models and solutions for businesses. Aidan founded Cohere in 2019. But before that, during his time as an intern at Google Brain, he was a co-author on the landmark 2017 paper, "Attention Is All You Need." Aidan, thanks for coming on today.
- Aidan Gomez
Yeah, thank you for having me. Excited to be here.
- Sarah Guo
Maybe we can start, uh, just a little bit with the
- 0:36 – 2:27
Co-authoring “Attention is all you need”
- Sarah Guo
personal background. Um, how do you go from growing up in the woods in Canada to, um, you know, working on the most important technical paper in the world?
- Aidan Gomez
A lot of luck and, and chance. Um, but yeah, I happened to go to school at the place where Geoff Hinton, uh, taught. And so, um, obviously, Geoff recently won the Nobel Prize. He's kinda, like, uh, attributed with being the, the godfather of, of deep learning. At U of T, the school where I went, he was a legend, and pretty much everyone who was in computer science studying at the school wanted to get into AI. Uh, and so in some sense, I, I feel like I was raised into AI. Like, as soon as I stepped out of high school, um, I was steeped in an environment that really saw the future, uh, and wanted to build it. Um, and then from there, it was a bunch of happy accidents. So I, I somehow managed to get an internship with Lukasz Kaiser, uh, at, at Google Brain. Um, and I found out at the end of that internship I wasn't supposed to have gotten that internship. It was supposed to have been for PhD students. And so they were, like, throwing a goodbye party for me, the intern, um, and Lukasz was like, "Okay, so Aidan, you're going back. How many, how many years have you got left in your PhD?" Uh, and I was like, "Oh, I'm going back into third year undergrad." Uh, and he was like, "We don't do (laughs) undergrad internships." So I think it was a bunch of, like, really lucky mistakes, uh, that led me, led me to that team.
- 2:27 – 4:04
Leaving Google and founding Cohere
- Sarah Guo
Working on really interesting, important things at Google, what, uh, convinced you that you should start Cohere?
- Aidan Gomez
Yeah, so I bounced around. Like, when I was working with Lukasz and Noam and the Transformer guys, I was in Mountain View, and then I went back to U of T, uh, started working with Hinton and my co-founder, Nick, in Toronto, uh, at Brain there. And then I started my PhD, and I went to England, um, and I was working with, uh, Jakob, who's another Transformer paper author, in Berlin, and collaborating with Justin-
- Sarah Guo
Mm-hmm. We had Jakob on the podcast.
- Aidan Gomez
Oh, nice. Yeah, yeah, yeah. Okay. Fan of the pod. Good, good. Um, so yeah, I, I was working with Jakob in Berlin, and then I was also collaborating remotely with Jeff Dean and Sanjay on Pathways, which was, like, their, you know, bigger-than-a-supercomputer training program. Uh, the idea was, like, wiring together supercomputers to create a new larger unit of compute that you could train models on. And at that stage, GPT-2 had just come out, and it was pretty clear the trajectory of the technology. Like, we were on a very interesting path, and these models that were ostensibly models of the internet, models of the web, um, were gonna yield some pretty interesting, interesting things. So I, I called up Nick, I called up Ivan, my co-founders, and I said, "You know, maybe we should figure out how to build these things. I, I think they're gonna be useful."
- Sarah Guo
For
- 4:04 – 6:15
Cohere’s mission and models
- Sarah Guo
anyone who doesn't know yet, can you just describe at the high level, like, what Cohere's mission is and then what the models and products are?
- Aidan Gomez
Yeah, so our, our mission, the way that we wanna create value in the world, um, is by enabling other organizations to adopt this technology and make their workforce more productive or transform their product, uh, and the services that they offer. So we're very focused on the enterprise. We're not going to build a ChatGPT competitor. What we wanna build is a, a platform and a series of products to enable enterprises to adopt this technology and, and make it valuable.
- Sarah Guo
And in terms of, like, your North Star of how you organize, um, the team and invest, uh, you obviously come from a research background yourself. Like, how much do you think, um, you know, Cohere's success is dependent on core models versus other, you know, platform and go-to-market support investments you make?
- Aidan Gomez
It's all of the above. Like, the models are the foundation, and if you're building on a foundation that, um, doesn't meet the customer's needs, then there, there's no hope. And so the models are crucial, and, um, it's like the heart of the company. But in the enterprise world, things like customer support, reliability, security, these are all key, and so we've heavily invested in both sides. We're not just a modeling organization, we're a modeling and go-to-market organization. Um, and increasingly, product is becoming a, a priority for Cohere, and so figuring out ways to shorten time to value for our customers. Um, yeah, over the past, like, 18 months, since the enterprise world sort of woke up to the technology, we've watched, we've watched folks build with our models, seen what they're trying to accomplish, seen the common mistakes that they make. That's been helpful. It's been sometimes frustrating, right? Watching the same mistake again and again. But we think there's a huge opportunity to be able to help enterprises avoid those mistakes and implement things right the first time. And so that's really where we're pushing towards.
- Sarah Guo
Yeah. Can we make that a little
- 6:15 – 8:14
Pitfalls of current AI
- Sarah Guo
bit more real? Like, what is the mistake that frustrates you most and how can product go meet that?
- Aidan Gomez
Yeah. Well, I- I think all language models are quite sensitive, uh, to- to prompts, to the way that you present data. They all have their own individual quirks. The way that you talk to one might not work for the way that you talk to another. And so when you're building a system like a RAG system, where there's an external database, it really matters how you present the retrieved results to the model. It matters how the data is actually stored in those- in those databases. The formatting counts. A- and these small details are often lost on people. They, uh, they overestimate the models. They think they're- they're like humans. And that has led to a lot of repeat failures. Uh, people try to implement a RAG system, they don't know about these, like, idiosyncratic elements of implementing one properly, and then it fails. And so in 2023, there were a lot of these POCs, a lot of people trying to get familiar with the technology, wrap their heads around it, and a lot of those POCs fail, uh, because of unfamiliarity, because of, um, yeah, these common errors that we've seen. And so moving forward, we have two approaches. One is making the models more robust. So the model should be robust to a lot of different ways that you present data. And the second piece is being more structured about the product that we expose to the user. So instead of just handing a model and saying, you know, "Prompt it, good luck," uh, actually putting more structure around it. So creating APIs that more rigorously define how you're supposed to use the model. These sorts of pieces, I think, just reduce the chances of failure, uh, and make these systems much more usable for the user.
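The formatting sensitivity Aidan describes can be made concrete with a small Python sketch: the same retrieved chunks, rendered with explicit structure before being handed to a model. The function names and document fields here are hypothetical illustrations, not Cohere's API.

```python
# Minimal sketch of presenting retrieved results to a model in a RAG system.
# All names (format_documents, build_rag_prompt, the doc fields) are
# hypothetical; the point is that delimiting and labeling each chunk matters.

def format_documents(docs: list[dict]) -> str:
    """Render retrieved chunks with explicit structure (title, source, text),
    rather than concatenating them into one undifferentiated blob."""
    sections = []
    for i, doc in enumerate(docs, start=1):
        sections.append(
            f"[Document {i}]\n"
            f"Title: {doc['title']}\n"
            f"Source: {doc['source']}\n"
            f"Text: {doc['text']}"
        )
    return "\n\n".join(sections)

def build_rag_prompt(question: str, docs: list[dict]) -> str:
    # Instructions, then clearly delimited evidence, then the question.
    return (
        "Answer the question using only the documents below. "
        "Cite documents by their number.\n\n"
        f"{format_documents(docs)}\n\n"
        f"Question: {question}"
    )

docs = [
    {"title": "Pump manual", "source": "ops/manual-7.pdf",
     "text": "Error E42 indicates a clogged intake filter."},
]
print(build_rag_prompt("What does error E42 mean?", docs))
```

A more structured API in this spirit would accept `docs` as a typed parameter instead of asking the user to splice text into a prompt themselves, which is the kind of guardrail Aidan describes.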
- Sarah Guo
What are
- 8:14 – 10:58
How enterprises are deploying AI today
- Sarah Guo
people trying to do? Can you give us a flavor of some of, like, the, um, biggest use cases you see in the enterprise?
- Aidan Gomez
It's super broad. Uh, so it spans pretty much every vertical. I mean, the common things are, like, Q&A. So speaking to a corpus of documents. For instance, if you're a manufacturing company, you might wanna build a Q&A bot for your engineers or your workers who are on the assembly line, uh, and plug in all of the- the manuals of the different tools and diagnostic manuals for common errors and parts, and then let the user chat to that instead of having to open up a thousand-page book and- and try to find what they need. Similarly, Q&A bots for the average enterprise worker. So plugging in your IT FAQ, your HR docs, all the things about your company, and having a centralized chat interface onto the knowledge of your organization so that they can get their questions answered. Those are some of the common ones. Beyond that, there are kind of specific functions that we power. Um, a good example might be for a healthcare company, they have these longitudinal health records of patients, and that consists of every interaction that that patient has with the- the healthcare system, from visits to a pharmacy, to the different labs or tests that they're getting, uh, to doctor's visits, and it can span decades. And so it's a huge, huge record of someone's medical history. And typically what happens is that patient will call in and they'll ring up the receptionist and be like, "My knee hurts. I need an appointment." And the doctor then needs to kind of comb through the past few entries, see has this come up before, and maybe they missed something that was two years ago 'cause they only have 15 minutes before an appointment. Um, but what we can do is we can feed that entire history in alongside the reason they're coming in. So contextually relevant, right? To what they said they're coming in for, and surface a- a briefing for the doctor. 
Um, and so this tends to be, one, dramatically faster for the doctor to review, but also often it catches things that a doctor couldn't possibly review before every patient meeting. They're not going through 20 years of medical history. Like that, it's just not possible. Um, but the model can do that. It can do that in under a second. So tho- those are the sorts of functions that we're seeing, summarization, Q&A bots, a lot of these, um, you might think of them as mundane, but the impact is immense.
- 10:58 – 14:37
Build vs. buy strategy for AI tools
- Sarah Guo
We see tons of startups working on problems such as, um, let's say, enterprise search overall, specialized applications to, uh, let's say, like, technical support for a particular vertical, even looking at health records and reasoning against them and retrieving from them. How do you think about, like, what the end state... um, there's no end state, but what some, um, stable equilibrium state is for how enterprises consume from, let's say, specialist AI-powered application providers versus custom applications built in-house with, uh, AI platforms and- and, uh, model APIs?
- Aidan Gomez
I think it's gonna be a hybrid. I think it's probably... you can imagine like a pyramid where the bottom of that pyramid, every organization needs this stuff, and it's like Copilot, like a generalist chatbot in the hands of every single employee to answer their questions. And then as you head up the- the pyramid, it's more specific to the company itself or the specific domain or product that they- they operate in or offer. Um, and as you push up that- that pyramid... it's much less likely you're gonna find an off-the-shelf solution to address it, uh, and so you're gonna have to build it yourself. What we've pushed organizations to do is have a strategy that encompasses that full pyramid. Yes, you need the- the generalist standard stuff. Maybe there's some industry-specific tools that you can go out and buy. But then if you're building, don't build those things that you could buy. Instead, focus on the stuff that no one's gonna sell to you and it gives you uniquely a competitive advantage. So we- we worked with this insurance company, uh, and they- they insure, like, large industrial development projects, and it turns out, like, I- I know nothing about this space. It turns out what they do is there's like an RFP, uh, put out by a mine or something, like, whatever the project is for insurance, and they have actuaries jump on that RFP, do tons of research about, you know, the land that it's on, the potential risks, et cetera, and then it's essentially a race to whoever responds first usually gets it. And so it's a- a time-based thing. How quickly can these actuaries-
- Sarah Guo
Hmm.
- Aidan Gomez
... put forward a good researched proposal? Um, and what we built with them was like a research assistant. So we plugged in all the sources of knowledge that these actuaries go to to do their research, uh, via RAG, and we gave them a chatbot, and it dramatically sped up their ability to respond to RFPs, and so it grew their business, uh, 'cause they were just winning many more of them. Um, and so it's tough for like, you know, we built horizontal technology. An LLM is kinda like a- a CPU. I don't know all the applications of an LLM, right? It's so broad and really the- the deep insight or the- the competitive advantage, the thing that puts you ahead, um, is listening to the customer and letting them tell you what would put them ahead. Um, and so that's- that's a lot of what we've been doing is just being a thought partner and helping brainstorm these projects and ideas, uh, that are- that are strategic to them.
- Sarah Guo
I'd wager that's- you know, this company is winning because the vast majority of their competitors haven't been able to move so quickly to, uh, adopting, you know, and building a- like, this research assistant product that is helping them. Like, what is the biggest barrier you see to, um, generally enterprise
- 14:37 – 20:04
Barriers to enterprise adoption
- Sarah Guo
adoption?
- Aidan Gomez
I think the big one is trust. Uh, so security is a big one, um, in particular in regulated industries like finance, uh, uh, healthcare. Data is often not in a cloud or if it is in a cloud, it can't leave their VPC, and so it's very locked down.
- Sarah Guo
Okay.
- Aidan Gomez
It's very sensitive. And so that's- that's a unique differentiator of Cohere. The fact that we haven't locked ourselves into one ecosystem and we're- we're flexible to deploy on prem if you want us, in VPC, outside of VPC, literally whatever the customer wants, we're able to touch more data, even the most sensitive data, uh, and provide something that's- that's more useful. So I- I would say security and privacy is probably the biggest one. Beyond that, there's knowledge, right? Like the- the knowledge to know how to build these- these systems. They're new. Uh, it's unfamiliar to folks. Um, you know, the people with the most experience have a few years of experience. Um, and so that's the other major piece. Uh, that bit, I think it's honestly just a time game. Like eventually-
- Sarah Guo
Okay.
- Aidan Gomez
... developers will become more familiar w- with building with this technology. Um, but I- I think it's gonna take another two or three years before it really permeates.
- Sarah Guo
Do you think in, uh, like a traditional hype cycle for enterprise technologies, probably for most technologies, but in particular enterprise, um, uh, you know, there's this trough of disillusionment concept of people get very excited about something and it ends up being harder to apply or more expensive than they thought. Do we see that in AI?
- Aidan Gomez
I'm sure we see some of it, for sure. Um, but I think honestly, like, the core technology is still improving at a steady clip and new applications are getting unlocked every few months. So I- I don't think we're in that trough of diff- disillusionment yet. Yeah, it feels like we're super early. It feels like we're really, really early. And if you look at the market, this technology just unlocks an entire new set of things that you can build. You just fundamentally couldn't build them before and- and now you can. And so there's a resurfacing of technology, products, systems that's underway. Even if we didn't train a single new language model, like it- okay, all the data centers blow up, we can't improve the LLMs.
- Sarah Guo
(laughs)
- Aidan Gomez
We only have what we have today. There's a half decade of work to go integrate this into the economy, to build all these things, to build the, you know, uh, RFP, insurance RFP response bot, to build the, uh, healthcare record summarizer. Like, there's a- a half decade of just resurfacing to go do. So there- there's a lot of work ahead of us. I- I think we're kind of past that point. There was a question of, oh, is there too much hype? Is this technology actually gonna be useful? But it's in the hands of 100 million people now, hundreds of millions of people now. It's in production. Uh, there's very clear value. The project is now... putting it to work and delivering it, uh, to the world.
- Sarah Guo
In this question of, like, integration into the real world, um, some piece of it is, of course, like, interfaces and change management and, like, figuring out how users are gonna understand the, the model outputs and, and guardrails and all of that. Um, specifically when we think about the model and specialization, uh, like, do you have some framework you offer customers or that you use internally around, um, what version of it they should invest in, right? So, we have pre-training, post-training, fine-tuning, retrieval, like in, in those sort of traditional sense, like prompting, especially as we get longer context. Like, how, how do you tell customers to make sense of w- how to specialize?
- Aidan Gomez
It really depends on the application. Like, there's some stuff, for instance, we partnered with Fujitsu, um, who's, like, the largest SI in, in Japan, um, to build a Japanese language model. There's just no way you can do that without intervening-
- Sarah Guo
Mm-hmm.
- Aidan Gomez
... on pre-training. You can't, like, fine-tune or post-train Japanese into (laughs) a model, effectively, and so you have to start from scratch. Um, on the other side, there's more narrow things, like if you want to change the tone of the model, or you want to, um, I, I don't know, change how it formats certain things, I think you can just do fine-tuning. You can take the, the end state. Um, and so there is this gradient. What we usually recommend to customers is start from the cheapest, easiest thing, which is fine-tuning, and then work backwards. And so start with fine-tuning, then go back into post-training, right? Like SFT or RLHF. Then, if you need to, and, you know, it, it's kind of a journey, right? Like, as you're talking about a production system and the constraints are getting higher and higher, you potentially will need to touch pre-training. Hopefully not all of pre-training. Hopefully it's, like, 10% of pre-training at the very end, or maybe 20% of pre-training. But yeah, that's usually how we think about it, is, like, this journey from the simplest, cheapest thing to the most sophisticated but most performant.
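The ladder Aidan describes, from cheapest to most invasive, can be summarized as a toy decision helper. The rules and the three inputs below are illustrative assumptions for the sketch, not Cohere guidance.

```python
# Toy sketch of the specialization ladder: fine-tuning -> post-training
# (SFT / RLHF) -> (partial) pre-training. The decision rules here are
# simplifying assumptions, chosen only to mirror the examples in the interview.

def recommend_adaptation(needs_new_language: bool,
                         needs_domain_behavior: bool,
                         only_style_or_format: bool) -> str:
    if needs_new_language:
        # e.g. adding Japanese: you can't fine-tune a language into a model.
        return "pre-training (possibly a continuation run, not from scratch)"
    if only_style_or_format:
        # Tone or formatting changes are the cheapest things to fix.
        return "fine-tuning"
    if needs_domain_behavior:
        return "post-training (SFT / RLHF)"
    return "prompting with the off-the-shelf model"

print(recommend_adaptation(False, False, True))   # fine-tuning
print(recommend_adaptation(False, True, False))   # post-training (SFT / RLHF)
print(recommend_adaptation(True, False, False))
```

The ordering of the checks encodes "start from the cheapest thing and work backwards": only escalate past fine-tuning when the requirement genuinely demands it.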
- Sarah Guo
Moving along the gradient from the cheapest thing makes sense to me.
- 20:04 – 24:25
Which types of companies should pretrain models?
- Sarah Guo
Uh, the idea that any enterprise customer will invest in pre-training is, I think, a bit more controversial. I, I believe some of the lab leaders would say, like, "Nobody should be touching this," and it doesn't make any sense for people from a scale of compute and data, data curation effort required and just sort of the talent required to do pre-training in any sort of competitive way. Like, w- how would you react to that?
- Aidan Gomez
I think if you're building, like, a... if you're a big enterprise and you're sitting on a ton of data, like, hundreds of billions of tokens of data, um, pre-training is a real lever that you're able to pull. I think for most, like, SMBs and, and certainly startups, it makes no sense. Like, you should not be pre-training a model. Um, but if you're a large enterprise, I think it's, it should be a serious consideration. The question is how much pre-training? It's not like you have to start from scratch and do a, you know, $50 million training run. But you can do a frac- you could do a $5 million training run. That's what we've seen succeed, these sort of continuation pre-training efforts. Um, so yeah, that, that's one of the offerings that we have. But of course, we don't jump straight into that. You don't need to spend massively if you don't want to. And, and usually, uh, the enterprise buying cycle or, or technology adoption cycle is quite slow, and so you have time to move back into it. I would say it's totally at the customer's discretion. Um, but to the folks who say that no one should be pre-training...
- Sarah Guo
No one outside of, let's say, AGI Labs should be pre-training.
- Aidan Gomez
That's empirically wrong.
- Sarah Guo
Maybe that's a, uh, like, a good jumping-off point into just, like, talking a little bit more about what's going on in the technical landscape and also what that means for Cohere. Like, what is the, what is the bar you set internally for Cohere? You said the model's the foundation. Um, and, uh, I believe you've also said, like, there's no market for last year's models. Like, how do you square that with the expense of, the capital expense of competition and the rise of open source models now?
- Aidan Gomez
Well, I think you have to spend... there's some, like, minimum threshold that you need to be spending at in order to build a, a model that's useful. Things get cheaper. The compute to train the model gets cheaper. Um, the sources of data, uh, well, in some directions, they get cheaper, in others not. With synthetic data, it's gotten dramatically cheaper, but with expert data, it's getting harder and harder and more expensive. And so what we've seen is today, you can build a model that's as good as GPT-4 in all the things that enterprises might care about for $10 million, $20 million. Like, just orders of magnitude less than what was spent to develop that model. And so if you're willing to wait six months or a year to build the technology, you can build it at a fraction of what those frontier labs have paid, uh, to develop it. And so that's been a key part of Cohere's strategy, is we don't need to build that thing first. What we'll do is we'll, we'll figure out how to do it dramatically cheaper, and we'll focus on the parts of it that matter to our customers. So we'll focus on the capabilities that our customers really depend on. Now, at the same time, we still have to spend. Like, relative to a regular startup, we have to pay for a supercomputer, and those things cost hundreds of millions of dollars a year. Um, so it is capital-hungry, but it's not capital-inefficient. Um, it's very clear that we'll be able to build a very profitable business off of what we're building. So that's the, the strategy, is don't lead, don't burn, you know, three, five, seven billion dollars a year to be at the front. Be six months behind and offer something to market to enterprises that actually fits their needs at a price point that makes sense for them.
- Sarah Guo
Why spend on the super computer and the training yourself at all if you have, um, increasingly open source options?
- Aidan Gomez
Well, you don't.
- 24:25 – 25:12
Addressing flaws in open-source models
- Aidan Gomez
Not really.
- Sarah Guo
Say more.
- Aidan Gomez
So for LLaMA, uh, yeah, you get, like, the base model, uh, at the end when it's cooled down and it has zero gradient. You get the, the post-trained model at the end when it's cooled down and has zero gradient. Taking those models and trying to, um, fine-tune them, it's just, it's not as effective as building it yourself, and you have much fewer levers to pull, um, than if you actually have access to the data, and you can change the data that goes into that process. And so we feel that by being vertically integrated and by building these models ourselves, we just have dramatically more leverage to offer our customers.
- Sarah Guo
Maybe if we go to, um, projection, and we'll hit on
- 25:12 – 29:54
Current and expected progress in scaling laws
- Sarah Guo
a few things that you've mentioned as well, um, where are we in scaling laws? Like, how much capability improvement do you expect over the next few years?
- Aidan Gomez
We're, we're pretty far along, I would say. Like, we're starting to enter into a sort of flat part of the curve, um, and we're certainly past the point where if you just interact with a model, you can know how smart it is. Like, the, the vibe checks, they're losing utility. And so instead, what you need to do is you need to get experts to measure within very specific domains like physics, math, uh, chemistry, biology, um... You, you need to get experts to actually assess the quality of these models, because the average person can't tell the difference at this stage between generations. Yes, like, there's still much more to go do, uh, but those gains are gonna be felt in very specialized areas and have impacts on more researchy, um, more researchy domains. I think for enterprises and the general sorts of tasks that they want to automate or tools that they want to build, the technology is already good enough or close enough that a, a little bit of customization will get them there.
- Sarah Guo
Mm-hmm.
- Aidan Gomez
So that- that's sort of the stage that we're at. There is, there's a new unlock in terms of the category of problems that you can solve, and that's reasoning. And so online reasoning is something that has been missing. These models, they don't have a, they previously didn't have an internal monologue, right? Like, they didn't really think to themselves. You would just ask them a question and then expect them to immediately answer that question. They couldn't reason through it. They couldn't fail, right, like make a mistake, catch that mistake, fix it, and try again. And so the fact that we now have reasoning models coming online, of course, OpenAI was the first to put it into production, but Cohere's been working on it, uh, for about a year now. Um, this category of tech, I think is, is really interesting. There's a new set of problems that you can go solve. Um, and it also changes the, it changes the economics. So before, if I had a customer come to me and say, "Uh, Aidan, I want your model to be better at X," or, "I want a smarter model," I would say, "Okay, you know, give us six to twelve months. We need to go, uh, spin up a new training run, train it for longer, train a big- bigger model, et cetera, et cetera." Um, now there's a, that, that was kind of the only lever we had to pull to improve the performance of our product. There's now a second lever, which is you can charge the customer more. You can say, "Okay, let's spend twice as many, uh, you know, tokens or, or let's spend twice as much time, at inference time, and you'll get a smarter model." So there's a much nicer product experience. "Okay, you want a smarter model? You can have it today, you just need to pay this." And so they have that option. They don't need to wait six months. And similarly for model builders, I don't need to go double the size of my super computer to hit a requisite intelligence threshold. I can just double the amount of inference time compute that my customers pay for.
So I think that's a really interesting structural change, uh, in how we can go to market and what products we can build and what we can offer to the customer.
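One simple form of the "second lever" is best-of-n sampling: sample the model several times and keep the candidate a verifier scores highest, so extra inference spend buys accuracy. This sketch uses stand-in functions (`model`, `verify`) with hard-coded simulated samples, not a real LLM API.

```python
# Sketch of trading inference-time compute for quality via best-of-n sampling.
# model() and verify() are stand-ins: model() simulates several stochastic
# samples (mostly wrong), and verify() checks answers exactly, which is
# realistic only in verifiable domains like math or code.

def model(question: str, attempt: int) -> int:
    # Simulated samples for the question "17 * 24"; only one is correct.
    simulated_samples = [398, 418, 408, 413]
    return simulated_samples[attempt % len(simulated_samples)]

def verify(question: str, answer: int) -> float:
    # Exact verifier for this toy question: 17 * 24 = 408.
    return 1.0 if answer == 17 * 24 else 0.0

def best_of_n(question: str, n: int) -> int:
    # Spend n model calls, return the highest-scoring candidate.
    candidates = [model(question, attempt) for attempt in range(n)]
    return max(candidates, key=lambda a: verify(question, a))

print(best_of_n("17 * 24", n=4))  # → 408
```

With n=1 the single (wrong) sample is returned; raising n raises the chance a correct candidate appears, which is intelligence bought with compute rather than with a new training run.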
- Sarah Guo
I agree. I think it's, um, uh, perhaps undervalued in the ecosystem right now how, uh, much more appealing it should be to all types of customers that you can move from a, like a Capex model of improvement to a consumption model of improvement, right? And it's not like, you know, these are apples and oranges things, but, um, but I- I think you'll see people invest a lot more in, you know, solving problems when they don't have to pony up for a, a training run and have this delay, as you described.
- Aidan Gomez
Yeah, it hasn't been clocked. Like, people haven't really priced in the impact of inference time compute delivering intelligence. Um, there's loads of consequences, even at, like, the chip layer, right? Like what sort of chips you want to build, what you should prioritize. For data center construction, um, if we have a new avenue, which is inference time compute, that doesn't require this densely interconnected super computer. You, it's fine to have nodes. You can do a lot more locally and less distributed. I, I think it has loads of impact up and down this chain, and it's a new paradigm of, um, what these models can do and, and how they do it.
- Sarah Guo
You were dancing around this, but because our, you know, your average person
- 29:54 – 32:29
Advances in multi-step problem solving and reasoning
- Sarah Guo
doesn't spend that much time thinking about, like, what is reasoning, right? Do you have any intuition you can offer people for, like, what are the types of problems this allows us to tackle better?
- Aidan Gomez
Yeah, I think any sort of multi-step problem. Um, like there's some multi-step problems you can just memorize, which is what we've been asking models to do so far. Um, like solving a polynomial, right? Like that- that... really, that should be approached multi-step. That's how humans solve it. We don't just get given a polynomial and then, boom. There's a few that maybe we've memorized, right? But by and large, you have to work through those problems, break them down, solve the smaller parts, and then compose it into the overall solution, and that's what we've been lacking. We've really lack- a- and we've had stuff like chain of thought, which has, um, enabled that, but it's sort of like a retrofitting. It's sort of like we train these models to just memorize input-output pairs, and we found a nice little hack to, uh, elicit the behavior that mimics reasoning. I think what's coming now is from scratch, the next generation of models that is being built and delivered will have that reasoning capability burnt into it from scratch. And it's- it's not surprising that it wasn't there to begin with because we've been training these models off of the internet, and the internet is like a set of documents which are the output of a reasoning process, with the reasoning all hidden. It's like a human wrote an article-
- Sarah Guo
Mm-hmm.
- Aidan Gomez
... and, you know, spent weeks thinking about this thing and deleting stuff and blah, blah, blah, um, but then posted the final product, and that's what you get to see. Everything else is implicit, hidden, unobservable. Um, and so it makes a lot of sense why the first generation of language models lacked this inner monologue, but now what we're doing is we're, with human data and with synthetic data, we're explicitly collecting people's inner thoughts. So we're asking them to verbalize it, and we're transcribing that, and we're gonna train on that and model that part of the problem-solving process. Um, and so I'm- I'm really excited for that. I think right now it's extremely inefficient, and it's quite brittle, similar to the early versions of- of language models. But over the next two or three years, it's gonna become incredibly robust and unlock just a whole new set of problems.
- Sarah Guo
What is the basic driver of the slowdown,
- 32:29 – 36:25
Key drivers behind the flattening curve of model improvements
- Sarah Guo
you know, reaching the flat part of the curve that you, you describe with scaling? Is it, is it the cost of, you know, increasingly expert data and collecting, as you said, like hitting reasoning traces that is harder and more expensive than just taking the data on the internet? Is it the difficulty of having evals for, you know, increasingly complex problems? Um, is it just overall, uh, cost of compute? Like w- w- why do you think that flattening is happening?
- AGAidan Gomez
When- when someone's making an oil painting, um, they do a- a back coat a- and just cover the whole- the whole canvas, and then they- they sort of paint in the shapes of, uh, you know, the mountains and the- the trees, and- and as you- you get more and more detailed and you're bringing out very fine brushstrokes, um, there's a lot more of them that you need to make. Before, you could just take a big wedge and just throw paint across the canvas and accomplish the thing that you wanted to accomplish. But as you start to get more and more targeted or, um, more and more detailed in what you're trying to accomplish, um, it requires a much more fine instrument. Uh, and so that's what we've seen with language models. We're able to do a lot of the common, simple, easy tasks quite quickly, but as we've approached much more specific sensitive domains, like science, uh, math, that's where we've started to see resistance to improvement. And in some places, we've gotten around that by using synthetic data, like in code, in math. These are places where the answer is very verifiable. You- you know when you're right or you're wrong, and so you can generate tons of synthetic data and just verify whether it's correct or not. You know it's correct. Okay, let's train on it. Um, in other areas that require testing, uh, and knowledge in the real world, like in biology, like in chemistry, um, there's a- there's a bigger bottleneck to creating that sort of data, and you have to go to experts who know the field, who- who have experienced it for decades, and basically distill their knowledge. Um, but eventually, you- you run out of experts, and you run out of that data, and you're at the frontier of what humans know about X, Y, or Z. There's just increasing friction to fill in these much finer details of this portrait. I think that's a fundamental problem. I- I don't think that there's any shortcuts, uh, around that. 
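The generate-and-verify loop described here for verifiable domains like math can be sketched in a few lines. The "generator" below is a random stand-in for a model, and all names are illustrative, not from any real library:

```python
import random

def generate_candidate():
    """A hypothetical 'model' proposing (problem, answer) pairs;
    deliberately noisy, so some proposed answers are wrong."""
    a, b = random.randint(1, 99), random.randint(1, 99)
    proposed = a + b + random.choice([0, 0, 0, 1, -1])  # sometimes off by one
    return f"{a} + {b} = ?", proposed, a + b

def build_synthetic_dataset(n):
    """Keep only candidates whose answers pass an exact verifier --
    the generate-then-verify loop available in verifiable domains."""
    kept = []
    for _ in range(n):
        problem, proposed, truth = generate_candidate()
        if proposed == truth:  # exact check: we *know* it's correct
            kept.append((problem, proposed))
    return kept

random.seed(0)
data = build_synthetic_dataset(1000)
print(len(data), "verified examples out of 1000 generated")
```

In biology or chemistry there is no cheap `proposed == truth` check, which is the bottleneck the passage describes.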
Um, you know, at some stage, we're gonna have to give these models the ability to run their own experiments to- to fill in areas of their knowledge that they're- they're curious about. Um, but I- I think that's quite- quite a ways away, uh, and it's gonna be tough to scale that. It will take many, many years to do. We will do it. We're gonna get there, 100%. Um, but for the stuff that I care about today with Cohere, I think there are many applications which this technology is ready for production for, and so the primary focus is getting it to production and ensuring that our- our economy adopts this technology and integrates it as quickly as possible, gets that productivity uplift. And so while that technical question is super interesting about, you know, why is progress slowing down, I- I think it should be kind of obvious, right? It's like... the models are getting so good they're hitting- they're running into the thresholds of human knowledge, um, which is really where they're getting their capability from.
- SGSarah Guo
You are so grounded in,
- 36:25 – 39:59
Exploring AGI
- SGSarah Guo
you know, getting the capabilities we have and will continue to progress even if the curve is flattening into production. I- I think I know this answer, but how much do you, or how much does Cohere think about, like, AGI and takeoff, and does that matter to you?
- AGAidan Gomez
Well, AGI means a lot of things to a lot of different people. I think I- I believe in us building generally intelligent machines, like, completely. The- it's like, of course we're gonna do that. Um, but AGI has been conflated.
- SGSarah Guo
How soon?
- AGAidan Gomez
We're already there. I- it's not a, you know, it's not a binary, it's not discrete. It's continuous, and we're, like (laughs) , uh, well on our way. We're- (laughs) we're pretty far down that road.
- SGSarah Guo
There's some, uh, definition elsewhere in industry that they're- like, you can put a break point at- even if you- even if you have this, um, continuous function, you- you can put a break point in, like- it- there's intelligence that replaces, like, an educated, adult professional in any digital role. Your view is there's no really important break point that's happening.
- AGAidan Gomez
That sort of, like, objective checklist thing, like, when you've checked all these boxes and you've got it, I think you can always find, like, a counter-example. You're like, "Oh, well it hasn't actually beaten this one human over here who's doing this, like, random (laughs) -
- SGSarah Guo
(laughs)
- AGAidan Gomez
... random thing." Um, no. I think it's- I think it's pretty continuous and we're, like, quite far- quite far along. Um, but the- the AGI that I- I really don't subscribe to is the super intelligence takeoff, self-improvement just leading to, uh, the terminator that exterminates us all.
- SGSarah Guo
Or creates abundance. Unclear.
- AGAidan Gomez
Yeah, (laughs) or creates abund- right, right. Yeah. Um, no. I think we'll be the one to create abundance. We don't need to wait for this god to emerge and do it for us. Let's go do it with the tech that we're- (laughs) we're building, you know? We don't need to depend on that. We can go do it ourselves. We will build AGI if what you mean is very useful, generally capable technology that can do a lot of the stuff, uh, that humans can do and flex into a lot of different domains. Um, if what you mean is, you know, are we gonna build God? Uh, no.
- SGSarah Guo
What do you think is the driver in that difference of opinion?
- AGAidan Gomez
I don't know. I- I think maybe I'm a little bit more in the weeds of the practical frustrations of the technology, where it breaks, where it's slow, where it- we start to see things plateau or slow down. Um, and perhaps others are more- maybe they're more optimistic. May- maybe they see- um, they see a- a curve increasing and they just think, "It goes on forever. Like, that will just continue arbitrarily," which I- I disagree with. I think there's- there's friction points, like there is genuinely friction that enters in. Like, maybe even if in theory, you know, like, a neural net is a universal approximator, it can learn anything, to universally approximate, you would need to build a neural net the size of the universe (laughs) . And so, like, there's some fundamental barriers to reaching limits that people extrapolate out to that I think will, um, bound the practically realizable, um, forms of this technology.
- SGSarah Guo
Are there domains where you, um, just believe LLMs as
- 39:59 – 42:10
Limitations of LLMs
- SGSarah Guo
we have them today are, like, not a good fit for prediction, right? And so an example might be, like, are we going to get to physics simulation from sequence-to-sequence models?
- AGAidan Gomez
I mean, probably, yeah. Like, physics is just, like, a series of states and, um, transition probabilities. So I- I think it's probably quite well-modeled by sequence modeling. But are there areas where it's poorly suited? Um, I'm sure. I'm sure that there are better models for certain things, more efficient models. Like, you can- you can take it- if you zoom into a specific domain, you can take advantage of structure in that domain to carve off some of the unnecessary generalities of the transformer, um, or of these- this category of architectures, um, and get a more efficient model. That's definitely true when you- when you zoom in.
- SGSarah Guo
And it doesn't sound like you think it's, like a- at its core, like a- a representation issue where it's just not gonna work.
- AGAidan Gomez
There's irreducible uncertainty in the world. There- like, there's things that you genuinely cannot know and, like, building a bigger model will not help you know this genuinely random, uh, or unobservable thing. Uh, and so those things, we'll- we'll never be able to model effectively until we learn how to observe them or, uh, you know. I think the transformer and this category of model can do much more than people give it credit for. It's a very general architecture. Many, many things can be phrased as a- as a sequence, and these models are just sequence models. Uh, and so if you can phrase it as a sequence, a transformer can do a fairly good job at picking up any regularity in it. But I- I'm- I'm certain that there are examples that I'm just not able to think of right now where sequence modeling is, uh, super inefficient. Like, you can do it with sequences, you can phrase a graph as a sequence, um, but it's just, like, the wrong model and you would pay dramatically less compute if you approached it from a- a different angle.
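The point that you *can* phrase a graph as a sequence, just inefficiently, can be illustrated with a trivial serialization (function names here are illustrative): the encoding is lossless, but it flattens away the adjacency structure a graph-native model could exploit directly.

```python
def graph_to_sequence(edges):
    """Serialize a set of directed edges as a flat token sequence.
    Possible, but the graph structure becomes implicit in the tokens."""
    tokens = []
    for u, v in sorted(edges):
        tokens += [str(u), "->", str(v), ";"]
    return tokens

def sequence_to_graph(tokens):
    """Recover the edge set, showing the encoding loses nothing."""
    edges = set()
    for i in range(0, len(tokens), 4):  # each edge is 4 tokens
        edges.add((int(tokens[i]), int(tokens[i + 2])))
    return edges

edges = {(0, 1), (1, 2), (2, 0)}
seq = graph_to_sequence(edges)
print(seq)
```

A sequence model must rediscover the neighborhood relations from token order, paying compute that a model built around the graph structure would not.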
- SGSarah Guo
Okay. One last question for you. So you, uh, concluded
- 42:10 – 44:15
What the market has mispriced
- SGSarah Guo
earlier that, um, scaling compute at inference time, like, ah, people have noticed but it's not really priced in, like, how big of a change this is. Is there anything else you think is not priced in by the market right now that, like, Cohere thinks about, that you think about?
- AGAidan Gomez
Yeah. I think there's this idea of, like, commoditization of models. I don't really think that's true. I don't think that models are actually getting commoditized. I, I think what you see is you see price dumping, um, and so you see people giving it out for free, giving it out at a loss, giving it out at zero margin, um, and so they see the prices coming down and they assume prices coming down means commoditization. I think in reality, the state of the world is there's a total technological refactor that's going on right now and will last the next 10 to 15 years, and it's kind of like we have to, we have to repave every road on the planet, and there's, like, four or five companies that know how to make concrete.
- EGElad Gil
(laughs)
- AGAidan Gomez
Okay? And, like, maybe today some of them give their concrete away for free, um, but over time there's ver- there's a very small number of parties that know how to do this thing and a huge job in front of us, and pressures to drive growth to show return on investment, it, it's an unstable present state to be operating at a loss or giving away very expensive technology for free. Um, so growth pressures of the market will push things in a certain direction and, yeah, I, you know, the price of Haiku 4x'd, uh, two weeks ago.
- EGElad Gil
Aidan, this has been super fun. Thank you so much for doing this with us.
- AGAidan Gomez
Yeah. My pleasure. My pleasure. It was super fun. Great seeing you.
- EGElad Gil
Find us on Twitter @nopriorspod. Subscribe to our YouTube channel if you wanna see our faces, follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.
Episode duration: 44:15