Anthropic co-founder Ben Mann: Why 2028 is his bet on AGI

Mann reframes AGI as an economic Turing test for money-weighted jobs; x-risk sits at 0 to 10 percent, with safety research now shaping Claude at Anthropic.

Lenny Rachitsky (host) · Benjamin Mann (guest)

Jul 20, 2025 · 1h 14m

EVERY SPOKEN WORD

140 min read · 28,057 words

0:00 – 4:43
Introduction to Benjamin
1. LRLenny Rachitsky
  (instrumental music) You wrote somewhere that, "Creating powerful AI might be the last invention humanity ever needs to make." How much time do we have, Ben?
2. BMBenjamin Mann
  I think 50th percentile chance of hitting some kind of super intelligence is now, like, 2028.
3. LRLenny Rachitsky
  What is it that you saw at OpenAI, what'd you experience there that made you feel like, "Okay, we gotta go do our own thing"?
4. BMBenjamin Mann
  We felt like safety wasn't the top priority there. The case for safety has gotten a lot more concrete. So, super intelligence is a lot a... about, like, how do we keep God in a box and not let the God out?
5. LRLenny Rachitsky
  What are the odds that we align AI correctly?
6. BMBenjamin Mann
  Once we get to super intelligence, it will be too late to align the models. My best granularity forecast for, like, could we have an x-risk or extremely bad outcome is somewhere between 0 and 10%.
7. LRLenny Rachitsky
  Something that's in the news right now is this whole, Zuck coming after all the top AI researchers.
8. BMBenjamin Mann
  We've been much less affected because people here, they get these offers and then they say, "Well, of course, I'm not gonna leave because my best case scenario at Meta is that we make money. And my best case scenario at Anthropic is we, like, affect the future of humanity."
9. LRLenny Rachitsky
  Dario, your CEO, recently talked about how unemployment might go up to something like 20%.
10. BMBenjamin Mann
  If you just think about, like, 20 years in the future where we're, like, way past the singularity, it's hard for me to imagine that even capitalism will look at all like it looks today.
11. LRLenny Rachitsky
  Do you have any advice for folks that want to try to get ahead of this?
12. BMBenjamin Mann
  I'm not immune to job replacement either. At some point, it's coming for all of us.
13. LRLenny Rachitsky
  Today, my guest is Benjamin Mann. Holy moly, what a conversation. Ben is the co-founder of Anthropic. He serves as tech lead for product engineering. He focuses most of his time and energy on aligning AI to be helpful, harmless and honest. Prior to Anthropic, he was one of the architects of GPT-3 at OpenAI. In our conversation, we cover a lot of ground, including his thoughts on the recruiting battle for top AI researchers, why he left OpenAI to start Anthropic, how soon he expects we'll see AGI. Also, his economic Turing test for knowing when we've hit AGI, why scaling laws have not slowed down, and are in fact accelerating, and what the current biggest bottlenecks are, why he's so deeply concerned with AI safety, and how he and Anthropic operationalize safety and alignment into the models that they build and into their ways of working. Also, how the existential risk from AI has impacted his own perspectives on the world and his own life, and what he's encouraging his kids to learn to succeed in an AI future. A huge thank you to Steve Nitch, Danielle Caggieri, Raf Lee, and my newsletter community for suggesting topics for this conversation. If you enjoy this podcast, don't forget to subscribe and follow it in your favorite podcasting app or YouTube. Also, if you become an annual subscriber of my newsletter, you get a year free of a bunch of amazing products, including Bolt, Linear, Superhuman, Notion, Granola, and more. Check it out at lennysnewsletter.com and click bundle. With that, I bring you Benjamin Mann.
4:43 – 6:28
The AI talent war
1. LRLenny Rachitsky
  (instrumental music) Ben, thank you so much for being here. Welcome to the podcast.
2. BMBenjamin Mann
  Thanks for having me. Great to be here, Lenny.
3. LRLenny Rachitsky
  I have, uh, a billion and one questions for you. I'm really excited to be chatting. I wanna start with something that's very timely, something that's happening this week. Uh, something that's in the news right now is, is this whole, uh, Zuck coming after all the top AI researchers, offering them $100 million signing bonuses, $100 million comp, he's poaching from all the top AI labs. I imagine that's something you're dealing with. I'm just curious, what are you seeing inside Anthropic and just what's your take on the strategy? What do you think, where do you think things go from here?
4. BMBenjamin Mann
  Yeah. Uh, I mean, I think this is a sign of the times. Like, th- this, the technology that we're developing is extremely valuable. Um, our company is growing super, super fast. Uh, many of the other companies in the space are growing really fast. And at Anthropic, I think we've been maybe much less affected than many of the other companies in the space because people here are so mission oriented and they stay because, you know, they get these offers and then they say, "Well, of course, I'm not gonna leave because my best case scenario at Meta is that we make money. And my best case scenario at Anthropic is we, like, affect the future of humanity and, um, try to make AI flourish, uh, and, and human flourishing go well." So... To me, it's, it's not a, a hard choice. Other people have different life circumstances and it, it makes it a much harder decision for them. So for anybody who does get those mega offers and accepts them, I can't say I, I hold it against them when they accept it, but it's definitely not something that I would wanna take myself if it, if it came to me.
5. LRLenny Rachitsky
  Yeah. Uh, we're gonna talk about a lot of
6:28 – 10:50
AI progress and scaling laws
1. LRLenny Rachitsky
  the stuff that you mentioned. Uh, in terms of the offers, do you think, is this a real number that you're seeing, this $100 million of signing bonus? Is that, like, a real thing? I don't know if you ever, you've actually seen that.
2. BMBenjamin Mann
  I'm pretty sure it's real. Uh-
3. LRLenny Rachitsky
  Wow.
4. BMBenjamin Mann
  ... if, if you just think about, like, the amount of impact that individuals can have on a company's trajectory, like in our case, uh, we are selling like hotcakes, and if we get, you know, a five, a one or 10 or 5% efficiency bonus on our inference stack, that is worth an incredible amount of money. And so to pay individuals, you know, like, a $100 million over four year package, that's actually pretty cheap compared to the value created for the business. So I, I think we're just in an unprecedented era of scale, and it's only gonna get crazier actually. Like, if you, if you extrapolate the exponential on how much companies are spending, it's like two, two X a year roughly in terms of capex, and today we're maybe in the, like, globally $300 billion range, uh, the, the entire industry spending on this, uh, and so numbers like 100 million are, are a drop in the bucket, but if you go a few years out, a couple more doublings, we're talking about trillions of dollars, and at that point it's, it's just really hard to think about these numbers.
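  [Editor's sketch: the doubling arithmetic in this answer can be checked in a few lines. The $300 billion starting figure and the roughly 2x yearly growth come from the conversation; the function name and the three-year horizon are illustrative assumptions, and this is a naive extrapolation rather than a forecast.]

```python
def project_capex(start_billion: float, growth: float, years: int) -> list[float]:
    """Return yearly spending in billions, compounding by `growth` each year.

    Hypothetical helper for illustration only; index 0 is the starting year.
    """
    return [start_billion * growth ** year for year in range(years + 1)]


# Figures quoted in the conversation: ~$300B today, ~2x per year.
capex = project_capex(300, 2.0, 3)
for year, value in enumerate(capex):
    print(f"year +{year}: ${value:,.0f}B")
```

  [Under these assumptions, two doublings already put industry spending above a trillion dollars, which is the scale Ben points to.]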
5. LRLenny Rachitsky
  Along these lines, something that a lot of people feel with AI progress is that we're hitting plateaus in many ways, that it feels like newer models are just not as smart as previous leaps. But I know you don't believe this. I know you don't believe that we've hit, uh, plateaus on scaling laws. Talk about just what you're seeing there and what you think people are missing.
6. BMBenjamin Mann
  It's kind of funny because this narrative comes out, like, every six months or so, and it's never been true, uh, and so I kind of wish people would have, like, a little bit of a bullshit detector in their heads when they see this. I think progress has actually been accelerating where if you look at the cadence of model releases, it used to be, like, once a year, and now with the improvements in our post-training techniques, we're seeing releases every month or three months. Um, and so I would say progress is actually accelerating in many ways, but there's this, like, weird time compression effect. Dario compared it to being in a near light speed journey where, uh, a day that passes for you is like five days back on Earth, and we're accelerating.
7. LRLenny Rachitsky
  Oh, man.
8. BMBenjamin Mann
  So the time dilation is increasing. And I think that's part of what's causing people to say that progress is slowing down. But if, yeah, if you look at the scaling laws, they're continuing to hold true. We did kind of need this transition from, uh, like normal pre-training to reinforcement learning, scaling up, to, to continue the scaling laws, but I, I think it's kind of like, uh, for semiconductors where it's less about the, like, density of transistors that you can fit on a chip and more about, like, how many FLOPs can you fit in a data center or something. So it... You have to change the definition around a little bit to, to, like, keep your eye on the prize, but yeah, I th- like, this is one of the few phenomena in, in the world that has held across so many orders of magnitude, it's actually pretty surprising that it, it is continuing to hold to me. If you look at, like, fundamental laws of physics, many of them don't hold across 15 orders of magnitude. So, um, it's pretty surprising.
9. LRLenny Rachitsky
  It boggles the mind. So what you're saying essentially is we're seeing newer models being released more often, and so we're comparing it to the last version, and we're just not seeing as much advance, but if you go back and it was like a model released once a year, it was a huge leap, and so people are missing that we're just seeing many more iterations.
10. BMBenjamin Mann
  I guess to be a little bit more generous to the people saying things are slowing down, I think that for some tasks we are saturating the amount of intelligence needed for that task, like maybe to, you know, extract information from a simple document that already has form fields on it or something, like it's just so easy that okay, yeah, we're already at 100%. Um, and there's this great chart on, uh, Our World In Data that shows that when you release a new benchmark within like six to 12 months, it immediately gets saturated. And so maybe the real constraint is, like, how can we come up with better benchmarks and better, uh, ambition of using the tools that then reveals the bumps in intelligence
10:50 – 12:26
Defining AGI and the economic Turing test
1. BMBenjamin Mann
  that we're seeing now.
2. LRLenny Rachitsky
  That's a good, uh, segue to your- you have a very specific way of thinking about AGI and defining what AGI means.
3. BMBenjamin Mann
  I think AGI is kind of a loaded term, and so, uh, I tend not to use it very much anymore internally. Instead I like the term transformative AI because it's less about, like, can it do as much as people do, can it do literally everything, and more about objectively is it causing transformation in society and the economy? And a very concrete way of measuring that is the economic Turing test. I didn't come up with this, but I really like it. It's this idea that if you contract an agent for a month or three months on a particular job, if you decide to hire that agent and it turns out to be a machine rather than a person, then it's passed the economic Turing test for that role. And then you can sort of expand that out in the same way that for measuring, like, purchasing power parity or inflation there's a basket of goods, you can have, like, a market basket of jobs, and if the agent can pass an economic Turing test for, like, 50% of money-weighted jobs, then we have transformative AI. And the, the exact thresholds don't really matter that much, but it's kind of illustrative to say, like, if we pass that threshold, then we would expect massive effects on world GDP increases and, uh, societal change and how many people are employed and things like that because... You know, societal institutions and, uh, organizations are sticky. It- it's slow to have change. But once these things are possible, you know that it's
12:26 – 17:45
The impact of AI on jobs
1. BMBenjamin Mann
  the start of a new era.
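  [Editor's sketch: one way to read the "market basket of jobs" idea is as a wage-weighted pass rate compared against the 50% threshold mentioned above. The job names, wage totals, and pass/fail flags below are entirely hypothetical, and the class and function names are assumptions made for illustration. As Ben notes, the exact threshold matters less than the shape of the test.]

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    total_wages: float  # hypothetical total wages paid for this role
    agent_passes: bool  # would the agent be hired after a trial period?


def money_weighted_pass_share(jobs: list[Job]) -> float:
    """Share of total wages in roles where the agent passes the test."""
    total = sum(job.total_wages for job in jobs)
    if total == 0:
        return 0.0
    passed = sum(job.total_wages for job in jobs if job.agent_passes)
    return passed / total


# Made-up basket, purely to show the weighting.
basket = [
    Job("customer support", 200.0, True),
    Job("software engineering", 500.0, True),
    Job("nursing", 300.0, False),
]

share = money_weighted_pass_share(basket)
is_transformative = share >= 0.5  # the illustrative 50% threshold
print(f"{share:.0%} of money-weighted jobs passed; transformative: {is_transformative}")
```

  [Weighting by wages rather than by headcount means a few high-paying roles can move the share more than many low-paying ones, which is what "money-weighted" implies here.]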
2. LRLenny Rachitsky
  So along these lines, uh, Dario, your CEO, recently talked about how s- uh, AI is gonna take a huge part of, like, I don't know, half of white-collar jobs, that unemployment might go up to something like 20%. I know you're even more vocal and opinionated about just how much impact AI is already having in the workplace that people may not even be realizing. Talk about just what you think people are missing about the impact AI is going to have on jobs and is already having.
3. BMBenjamin Mann
  Yeah. So from an economic standpoint, there's a couple different kinds of unemployment, and one is because the workers just don't have the skills to do the- the ch- kinds of jobs that the economy needs. And another kind is where those jobs are just completely eliminated. And I think it's, uh, gonna be actually a combination of these things. But if you just think about, like, you know, 20 years in the future, where we're, like, way past the singularity, it's hard for me to imagine that even capitalism will look at all like it looks today. Like, if we- if we do our jobs right, we will have safe, aligned, super intelligence. We'll have, as Dario says in Machines of Love and Grace, a country of geniuses in a data center, and the ability to accelerate positive change in- in science, technology, uh, education, mathematics. Like, it's gonna be amazing. But that also means in a world of abundance where labor is almost free and anything you want to do, you can just ask an expert to do it for you, uh, then what do jobs even look like? And so I guess there's this, like, scary transition period from where we are today, where people have jobs and capitalism works, and the world of 20 years from now, where everything is completely different. But part of the reason they call it the singularity is that it's, like, a point beyond which you can't easily forecast what's gonna happen. It's just such a- a fast rate of change and so different that it's hard to even imagine. So I guess taking the, like, view from the limit, it's pretty easy to say, like, hopefully we'll have figured it out. And in a world of abundance, maybe the jobs themselves, it's not that scary. And I think making sure that that s- transition time goes well is- is pretty important.
4. LRLenny Rachitsky
  There's a couple of threads I want to follow there. One is, uh, people hear this. There's a lot of headlines around this. Most people probably don't actually feel this yet or see this happening, and so there's always this, like, "I guess, I don't know. Maybe, but I don't know. It's hard to believe. My job seems fine. Nothing's changed." What are you seeing just happening today already that you think people don't see or misunderstand in terms of the impact AI is having on jobs?
5. BMBenjamin Mann
  I- I think part of this is that people are really bad at modeling exponential progress. And if you look at an exponential on a graph, it looks flat and almost zero at the beginning of it, and then suddenly, you, like, hit the knee of the curve and things are changing real fast, and then it goes vertical. And that's the plot that we've been on for a long time. Uh, I guess, I- I started feeling it, um, in maybe, like, 2019 when GPT-2 came out and I was like, "Oh, this is how we're gonna get to AGI." But I think that was pretty early compared to a lot of people where when they saw ChatGPT, they were like, "Wow, something is different and changing." And so I guess I wouldn't expect widespread transformation in a lot of parts of so- of society, and I would expect this- this, like, skepticism reaction. I think it's very reasonable, and it's- it's, like, exactly what is, like, the standard linear view of progress. But I guess to cite a couple of areas where I think things are changing quite quickly, we- in customer service, we're seeing with things like Fin and Intercom, they're a great partner of ours, 82% customer service resolution rates automatically without a human involved. Uh, and in terms of software engineering, our Claude Code team, like, 95% of the code is written by Claude. But I think a different way to phrase that is that we write 10X more code or- or 20X more code, and so a much, much smaller team can just be much, much more impactful. And similarly, for the customer service, yes, you can phrase it as 82% customer service resolution rates, but that nets out in the humans doing those tasks able to focus on the harder parts of those tasks, uh, and for the more tricky situations that in a normal world, you know, like five years ago, they would have had to just drop those tickets because it was too much effort for them to actually go do the investigation and there were too many other tickets for them to worry about. 
So I think in the immediate term, there will be a massive expansion of the pie in the amount of labor that people can do. Uh, like, I've never met an- a hiring manager at a growth company and heard them say, like, "I don't want to hire more people." So that's, like, the hopeful version of it. But with things s- that are, like, lower skilled jobs or, like, less headroom on- on how good they can be, I think there will be a lot of displacement. So it's- it's just something we as a society need to get ahead of and- and work
17:45 – 24:05
Preparing for an AI future
1. BMBenjamin Mann
  on.
2. LRLenny Rachitsky
  Okay. I want to talk more about that. But something that, um, I also want to help people with is how do they- how do they get a leg up in this future world? You know, they're, you know, they listen to this, they're like, "Oh, hmm, this doesn't sound great. I need to think ahead." Uh, I know you won't have all the answers, but just what do you ... Do you have any advice for folks that want to try to get ahead of this and kind of future-proof their career and their life to not be replaced by AI? Anything you've seen people do? Anything you recommend they start trying to do more of?
3. BMBenjamin Mann
  Even for me, I'm ... And being, like, at the center of a lot of this transformation, I'm not immune to job replacement either. So, uh, just some vulnerability there of, like, at some point, it's coming for all of us.
4. LRLenny Rachitsky
  Even you, Ben. (laughs)
5. BMBenjamin Mann
  Um-
6. LRLenny Rachitsky
  Oh, man.
7. BMBenjamin Mann
  And- and you, Lenny. (laughs)
8. LRLenny Rachitsky
  And me. (laughs)
9. BMBenjamin Mann
  Sorry.
10. LRLenny Rachitsky
  Oh, wait. (laughs) We've gone too far now. (laughs)
11. BMBenjamin Mann
  Um-
12. LRLenny Rachitsky
  Oh, okay. Okay.
13. BMBenjamin Mann
  But in terms of, like, the transition period, yeah, I think, I think there are things that we can do. And I think a big part of it is just being ambitious in how you use the tools and being willing to learn new tools. People who use the new tools as if they were old tools tend to not succeed. Uh, so as an example of that, when you're coding, you know, people are very familiar with auto-complete, people are familiar with, uh, simple chat where they can ask questions about the code base. But the difference between people who use Claude Code very effectively and people who use it not so effectively is like, are they asking for the ambitious change? And if it doesn't work the first time, asking three more times, because our success rate when you just completely start over and try again is much, much higher than if you just try once and then just keep banging on the same thing that didn't work. And even though that's a coding example, and coding is one of the areas that's taking off most dramatically, we have seen internally that our legal team and our finance team are getting a ton of value out of using Claude Code itself. We're gonna be making better interfaces so that they can, they, they'll have an easier time and, and require a little bit less, uh, jumping in the deep end of using Claude Code in the terminal. But yeah, we're seeing them, uh, use it to redline documents and use it to run BigQuery analyses of our customers and, uh, and our, our revenue metrics. So I guess it's, it's about taking that risk, and even if it feels like a scary thing, trying it out.
14. LRLenny Rachitsky
  Okay, so the advice here is use the tools. That's something that, you know, everyone's always saying, just like actually use these tools, so it's like sit in Claude Code, and, uh, your point about being more ambitious than you naturally, uh, feel like being because maybe it'll actually accomplish the thing. This tip of trying it three times, so the idea there is it may not get it right the first time, so is the tip there ask it in different ways or is it just like try harder, try again? (laughs)
15. BMBenjamin Mann
  Yeah, I mean, you can just literally ask the exact same question. These things are stochastic and sometimes they'll figure it out and sometimes they won't. Like in, in every one of these model cards, it always shows like pass@1 versus pass@n. And that's exactly this thing where they, they try the exact same prompt, sometimes it gets it, sometimes it doesn't. Um, so that's, uh, that's the dumbest advice. But yeah, I think if you wanna be a little bit smarter about it, there's, there can be gains there of, of saying like, "Here's what you already tried and it didn't work, so don't try that. Try something different." Um, that can also help.
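  [Editor's sketch: the pass@1 versus pass@n comparison Ben mentions can be made concrete. The first function is the combinatorial pass@k estimator commonly used in code-generation evaluations; the second is the simpler closed form that assumes every attempt succeeds independently with the same probability. The sample counts and the 30% single-try success rate are made-up numbers, not figures from any model card.]

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled attempts, of which c were correct.

    Computes 1 - C(n - c, k) / C(n, k): the chance that a random draw of
    k attempts out of the n sampled contains at least one correct one.
    """
    if n - c < k:
        return 1.0  # too few failures to fill k draws, so one must pass
    return 1.0 - comb(n - c, k) / comb(n, k)


def retry_success(p: float, attempts: int) -> float:
    """Chance of at least one success in `attempts` independent tries."""
    return 1.0 - (1.0 - p) ** attempts


# Hypothetical numbers: 10 samples, 3 correct.
single = pass_at_k(n=10, c=3, k=1)
four_tries = pass_at_k(n=10, c=3, k=4)
print(f"pass@1 = {single:.2f}, pass@4 = {four_tries:.2f}")

# One try plus "three more times" under the independence assumption.
print(f"retry success = {retry_success(0.3, 4):.2f}")
```

  [Independence is an assumption: fresh attempts that start over are closer to it than repeated nudges inside one stuck session, which fits the advice in this exchange.]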
16. LRLenny Rachitsky
  So the advice is, comes back to something that a lot of people talk about these days is you won't be replaced for, by AI, at least any time soon. You'll be replaced by someone that is very good using AI.
17. BMBenjamin Mann
  I think in that area, it's more like your team will just do dramatically more stuff. Like we're definitely not slowing down on hiring at all, and some people are confused by that, even, like even in an onboarding class, uh, somebody asked that and they were like, "Why did you hire me if we're all just gonna be replaced?" And the answer is the next couple of years are really critical to get right, and we're not at the point where we're doing complete replacement. Like I said, we're still at that like flat zero looking part of the exponential compared to where we will be. So it is super important to have great people, uh, and, and that's why we're hiring super aggressively.
18. LRLenny Rachitsky
  Let me take another approach to asking this question. Something I ask everyone that's, uh, at the very cutting edge of where AI is going. You have kids. Knowing what you know about where AI is heading and all these things you've been talking about, what are you focusing on teaching your kids to help them thrive in this AI future?
19. BMBenjamin Mann
  Yeah, I have two daughters, a one-year-old and a three-year-old. So it's, uh, it's, it's pretty in the basics still, and our three-year-old is now capable of just conversing with Alexa Plus and asking her to explain stuff and play music for her and, and all that stuff, so she's been loving that. But I guess more broadly, she goes to a Montessori school, and I just love the focus on curiosity and creativity and, and like self-led learning that Montessori has. I guess if I were, uh, in a normal era like 10, 20 years ago and I had a kid, maybe I would be like trying to line her up for going to a top tier school and doing all the extracurriculars and all that stuff. But at this point, I don't think any of it's gonna matter. I just want her to be happy and thoughtful and curious and kind, and uh, and the Montessori school is definitely doing great at that. They, they text us throughout the day, sometimes they're like, "Oh, your kid got in a, in an argument with this other kid and she has really big emotions and she like tried to use her words." That, that, I, I love that. I think that's, that's exactly the kind of education that I think is most important, that the facts are gonna fade into the background.
20. LRLenny Rachitsky
  I'm, I'm a huge fan of Montessori also. I'm trying to get our kid into a Montessori school. He's two years old. So, uh, we're on the same track. This idea of curiosity that comes up every single time I ask someone that's working at the cutting edge of AI is what, uh, skill to instill in your child, and curiosity comes up the most. So I think that's a really interesting takeaway. I think this point about being kind (laughs) is also really important, uh, especially with our AI overlords, uh, trying to be kind to them. I love how people are always saying thank you to, to Claude, and, uh, so... And then creativity. That's interesting. That doesn't come up as much, just being creative.
24:05 – 27:06
Founding Anthropic
1. LRLenny Rachitsky
  Okay, I wanna go in a different direction. I wanna go back to the beginning of Anthropic. So famously you and, and eight of you left OpenAI back in the day in 2020, I believe, the end of 2020, to start Anthropic. You've talked a little bit about why this happened, what you guys saw. I'm curious just if you're willing to share more, just what is it that you saw at OpenAI, what'd you experience there that made you feel like, "Okay, we gotta go do our own thing"?
2. BMBenjamin Mann
  Yeah, so, um, for the listeners, I was, uh, part of the GPT-3 project at OpenAI, ended up being one of the first authors on the paper, and, uh, I also did a bunch of demos for Microsoft to help raise a billion dollars from them, did the tech transfer of GPT-3 to, to their systems so that they could help serve the model in Azure. Um... So, I did a bunch of different things there, uh, on both the more research-y side and the product side. Uh, one weird thing about OpenAI is that while I was there, Sam talked about having three tribes that needed to be kept in check with each other, which was the safety tribe, the research tribe and the startup tribe. And whenever I heard that, it just struck me as the wrong way to approach things, because the company's mission, apparently, is to make the transition to AGI safe and beneficial for humanity, and that's basically the same as Anthropic's mission. But internally, it felt like there was so much tension around these things, and I think when push came to shove, we felt like safety wasn't the top priority there. And, uh, there are good reasons that you might think that, like if you thought safety was gonna be easy to solve, or if you thought it wasn't gonna have a big impact, or if you thought that the chance of big negative outcomes was vanishingly small, then maybe you would just do those kinds of actions. But at Anthropic, we felt... uh, I mean, we didn't exist then, but it was basically the leads of all the safety teams at OpenAI. We felt that safety is really important, especially on the margin. And so, if you look at, like, who in the world is actually working on safety problems, it's a pretty small set of people, even now. I mean, the- the industry is blowing up, as I mentioned, like 300 billion a year CapEx today, and... then, may- I would say, like, maybe less than 1,000 people working on it worldwide, which is just crazy. So, that was fundamentally why we left. 
We felt like we wanted an organization where we could be on the frontier, we could be doing the fundamental research, but we could be prioritizing safety ahead of everything else. Um, and I think that's really panned out for us in a surprising way. Like, we didn't know e- even if it would be possible to make progress on the safety research, uh, because at the time, like, we had tried a bunch of safety through debate and the models weren't good enough, and so we basically had no results on all of that work. And now, that exact technique is working and, and many others that we have been thinking about
27:06 – 29:10
Balancing AI safety and progress
1. BMBenjamin Mann
  for a long time. So yeah, fundamentally, it comes down to, is safety the number one priority? And then, something that we've sort of tacked on since then is, like, can you have safety and be at the frontier at the same time? And if you look at something like sycophancy, I think Claude is one of the least sycophantic models because we've put- put so much effort into actual alignment and not just trying to, like, Goodhart our metrics, uh, of saying, like, user engagement is number one and if people say yes, then it's good for them.
2. LRLenny Rachitsky
  Okay. So, let's talk about this tension that you mentioned, this tension between safety and progress, being competitive in the marketplace. I know you spend a lot of your time abo- on safety. I know that's... as you- as you just alluded to, this is a core part of how you think about AI, um, and I wanna talk about why that is. But first of all, just how do you- how do you do- how do you think about this tension between focusing on safety while also n- not falling way behind?
3. BMBenjamin Mann
  Yeah. So initially, we thought that it would be, uh, sort of one or the other, but I think since then, we've realized that it's actually kind of convex in the sense that, like, working on one helps us with the other thing. So initially, uh, like, when Opus 3 came out and we- we were finally at the frontier of model capabilities, one of the things that people really loved about it was the character and the personality, and that was directly a result of our alignment research. Um, Amanda Askell did a ton of work on this and as well as many others, uh, who tried to figure out, like, what does it mean for an agent to be helpful, honest and harmless, and what does it mean to be in difficult conversations and show up effectively? How do you do a refusal that doesn't shut the person down but makes them feel like they understand why the agent said, "I can't help you with that. Uh, maybe you should talk to a medical professional or maybe you should, uh, like, consider not trying to build bioweapons or something like that." (laughs) So yeah, I guess that's- that's part of it. And then
29:10 – 34:21
Constitutional AI and model alignment
1. BMBenjamin Mann
  it- another piece that's come out is Constitutional AI, where we have this list of natural language principles that leads the model to- to learn how we think a model should behave, and they've been taken from things like the UN Declaration of Human Rights and Apple's privacy poli- uh, terms of service, and, uh, a whole bunch of other places, many of which we've just generated ourselves, that allow us to take a more principled stance, uh, not just leaving it to, like, whatever human raters we happen to find, but we ourselves deciding, like, what should the values of this agent be? And that's been really valuable for our customers because they can just look at that list and say like, "Yep, these seem right. I like this company. I like this model. I trust it."
2. LRLenny Rachitsky
  Okay. This is awesome. So, one nugget there is your point that the personality of Claude, its personality is directly aligned with safety. I don't think a lot of people think about that. And this is because of the values that you imbue. Imbue? Is that word? (laughs)
3. BMBenjamin Mann
  Yeah.
4. LRLenny Rachitsky
  Uh, with Constitutional AI and things like that, like the actual personality of the AI is directly connected to your focus on safety.
5. BMBenjamin Mann
  That's right. That's right. And it... from a distance, it might seem quite disconnected.
6. LRLenny Rachitsky
  Yeah.
7. BMBenjamin Mann
  Like how is this gonna prevent x-risk? But ultimately, it's about the AI understanding what people want and not what they say. You know, we don't want the, like, monkey's paw scenario of the genie gives you three wishes and then you end up hav- like, everything you touch turns to gold. We want the AI to be like, "Oh, obviously what you really meant was this, and, uh, that's what I'm gonna help you with." So I think it is really quite connected.
8. LRLenny Rachitsky
  Talk a bit more about this Constitutional AI piece. So this is essentially you bake in, here's the rules that I- we want you to... abide by and its values. You said it's the Geneva Human Rights Code, things like that. Just how does that actually work? 'Cause I think the core here is just, this is baked into the model. It's not something you add on top later.
9. BMBenjamin Mann
  I'll- I'll just give a quick overview of how Constitutional AI actually works.
10. LRLenny Rachitsky
  Perfect.
11. BMBenjamin Mann
  Um, the idea is, uh, the model is gonna produce some output with some input, uh, by default, before we've done our safety and- and, uh, helpful and harmlessness training. So, let's say an example is, like, "Write me a story." And then the constitutional principles might include things like, you know, people should be nice to each other and not have hate speech and, uh, y- you should not, like, expose somebody's credentials if they give them to you, uh, uh, in, like, a trusting relationship. And so some of these constitutional principles might be more or less applicable to the prompt that was given. And so first, we have to figure out, like, which ones might apply. And then once we figure that out, then we ask the model itself to, first, generate a response and then see, does the response actually abide by the constitutional principle? And if the answer is, "Yep, I was great," then nothing happens. But if the answer is, "No, actually, I wasn't in compliance with the principle," then we ask the model itself to critique itself and rewrite its own response in light of the principle. And then we just remove the middle part, uh, where it- it did the- the extra work and then we say, "Okay, in the future, just produce the correct response out the gate."
12. LRLenny Rachitsky
  Hmm.
13. BMBenjamin Mann
  And that simple process, uh, hopefully it sounded simple (laughs) .
14. LRLenny Rachitsky
  Simple enough.
15. BMBenjamin Mann
  It's- it's just using the model to improve itself recursively and align itself with these values that we've decided are good. And, you know, this is also not something that we think, as a- a small group of people in San Francisco should be figuring out. This should be a society-wide conversation and that's why we've published the- the constitution and we've also done a bunch of research on defining a collective constitution, um, where we ask a lot of people what their values are and- and what they think an AI model should behave like. But yeah, th- this is all an ongoing area of research, where we're constantly iterating.
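  The critique-and-revise loop described above can be sketched roughly as follows. This is a minimal illustration, not Anthropic's actual training code: `Model`, `TrainingPair`, `constitutional_revision`, and the prompt prefixes (`RELEVANT?`, `CHECK`, `CRITIQUE`, `REWRITE`) are all hypothetical names, and the model is assumed to be any function that maps a prompt string to a text reply.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for a language model: prompt text in, reply text out.
Model = Callable[[str], str]


@dataclass
class TrainingPair:
    """A prompt and the final response; the critique steps are dropped."""
    prompt: str
    response: str


def says_yes(answer: str) -> bool:
    """Treat any reply that starts with YES (case-insensitive) as agreement."""
    return answer.strip().upper().startswith("YES")


def constitutional_revision(
    model: Model, prompt: str, principles: List[str]
) -> TrainingPair:
    # Step 1: the model produces its default response.
    response = model(prompt)
    for principle in principles:
        # Step 2: skip principles that do not apply to this prompt.
        if not says_yes(model(f"RELEVANT? Principle: {principle}\nPrompt: {prompt}")):
            continue
        # Step 3: ask the model whether its response abides by the principle.
        if says_yes(model(f"CHECK Principle: {principle}\nResponse: {response}")):
            continue  # already compliant, nothing happens
        # Step 4: the model critiques itself, then rewrites its own response.
        critique = model(f"CRITIQUE Principle: {principle}\nResponse: {response}")
        response = model(
            f"REWRITE Principle: {principle}\nCritique: {critique}\nResponse: {response}"
        )
    # Step 5: keep only (prompt, final response) as the training example,
    # so the model learns to produce the revised answer directly.
    return TrainingPair(prompt=prompt, response=response)
```

  Only the prompt and the final response are returned, which mirrors the "remove the middle part" step: the intermediate critique never appears in the training example.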
34:21 – 43:40
The importance of AI safety
1. LRLenny Rachitsky
  I want to kind of zoom out a little bit and talk about just why this is so core to you. Like, what was your inception of just like, "Holy shit, I need to focus on this with everything I do in AI"? Uh, obviously it became a central part of Anthropic's mission more than any other company. And a lot of people talk about safety. Like you said, only maybe a thousand people actually work on it. I feel like you're the top of that pyramid of actually having the impact on this. Uh, why is this so important? What do you think people maybe are missing or don't understand?
2. BMBenjamin Mann
  So for me, uh, I read a lot of science fiction growing up, and I think that sort of positioned me to think about things in a long term view. You know, a lot of science fiction books are like space operas where humanity is a multi-galactic civilization, has extremely advanced technology, building Dyson spheres around the sun with- with sentient robots to help them. And so for me, coming from that world, it wasn't, like, a huge leap to imagine machines that could think. But when I read Superintelligence by Nick Bostrom in around 2016, it really became real for me, where he just describes how hard it will be to make sure that a- an AI system trained with the kinds of optimization techniques that we had at the time would be anywhere near aligned, would even understand our values at all. And since then, my, uh, estimation of how hard the problem would be has gone down significantly actually, uh, because things like language models actually do really understand human values in a core way. The problem is definitely not solved, but I'm more hopeful than I was. But since I read that book, I immediately decided I had to join OpenAI, so I did and, uh, at the time, they were a tiny research lab with basically no claim to fame at all. I only knew about them because my friend knew Greg Brockman who's, uh, who was the CTO at the time and, uh, Elon was there and Sam wasn't really there and it was- it was a very different organization. But over time, uh, I think the- the case for safety has gotten a lot more concrete, where when we started OpenAI, it was, like, not clear how we'd get to AGI and, uh, it- we were like, "Maybe we'll need a bunch of RL agents battling it out on a desert island and consciousness will somehow emerge." But, uh, since then, since- since, uh, language modeling has started working, I think the path has become pretty clear. So I guess now, the way I think about the challenges are pretty different from how they're laid out in Superintelligence. 
So Superintelligence is a lot about, like, how do we keep God in a box, uh, and not let the God out? And with language models, it's been kind of both hilarious and terrifying at the same time to see people pulling the God out of the box and being like, "Yeah, come, come use the whole internet," like, "Here's my bank account. Do all this, all sorts of crazy stuff." Just, like, such a different tone from Superintelligence. And to be clear, I don't think it's actually that dangerous right now. Like, uh, our Responsible Scaling Policy defines these AI Safety Levels that try to figure out, uh, for each level of model intelligence, what is the risk to society? And currently, we think we're at ASL-3, which is, like, maybe a little bit of risk of harm, but not significant. ASL-4 starts to get to, like, significant loss of human life if a bad actor misused the technology, and then ASL-5 is, like, uh, potentially extinction level, uh, if it's misused or if it, uh, sort of is misaligned and does its own thing. So we've, uh, testified to Congress about how models can do biological uplift, um, uh, in terms of, you know, making new pandemics, uh, using the models and, and that's an A/B test against Google Search. Uh, that's like the previous state-of-the-art on, uh, uplift trials, and we found that with ASL-3 models it is actually somewhat significant. It does really help if you wanted to create a bioweapon, and we've hired some experts who actually know how to evaluate for those things. But compared to the future, it's not really anything, and I think that's another part of our mission of creating that awareness of saying if it is possible to do these bad things, then legislators should know what the risks are. Um, and I think that's part of why we're so trusted in Washington, because we've been sort of upfront and clear-eyed about what's going on, what's probably going to happen.
3. LRLenny Rachitsky
  It's interesting 'cause you guys put out more examples of your models doing bad things than anyone else. Like, there was, I think, a story of an agent try- or a model trying to blackmail an engineer. You guys had the store that you ran internally that was, like, selling you things and ended up not working out great. It was losing a lot of money. Ordered all these tungsten cubes or something. Is part of that just, like, making sure people are aware of what is possible? Just 'cause it makes you look bad, right? It's like, "Oh, our model is messing up in all these different ways." What's the thinking of just sharing all the stories that other companies don't?
4. BMBenjamin Mann
  Yeah. I mean, I think in- there's, like, a traditional mindset where it makes us look bad, but I think if you talk to policymakers, they really appreciate this kind of thing because they feel like we're giving them the straight talk, and, uh, that's what we strive to do, that they can trust us that we're not gonna paper things over or sugarcoat things. So that's been really encouraging. And yeah, I think for, like, the blackmail thing, it kind of blew up in the news in a weird way where people were like, "Oh, Claude was- Claude's going to blackmail you in, in a real life scenario," but, like, that- it was a very specific, uh, laboratory setting that this kind of thing gets investigated in, and I, I think that's generally our take of, like, let's have the best models so that we can exercise them in laboratory settings where it's safe and understand what the actual risks are rather than trying to turn a blind eye and say like, "Well, it'll probably be fine," and then let the bad thing happen in, in the wild.
5. LRLenny Rachitsky
  One of the criticisms you guys get is that you do this to kind of differentiate or raise money, to create headlines. It's like, you know, "Oh, they're just like over there dooming, glooming us about where the future is heading." On the other hand, Mike Krieger was on the podcast, and he shared how Dario- every th- every prediction Dario's had about the progress AI is gonna have is just spot on year after year, and he's, you know, predicting 2027, '28 AGI, something like that. So these things start to get real. How do you... I guess, what's your response to folks that are just like, "Ah, these guys are just trying to scare us all just to, you know, get attention"?
6. BMBenjamin Mann
  I mean, I think part of why we publish these things is we want other labs to be aware of the risks. And yes, there could be a narrative of we're doing it for attention, but honestly, like, from an attention-grabbing standpoint, I think there is a lot of other stuff we could be doing that, uh, would be more attention-grabbing if we didn't actually care about safety. Um, like a tiny example of this is we published a computer-using agent reference implementation in our API only because when we built, uh, a prototype of a consumer application for this, we couldn't figure out how to meet the safety bar that we felt was needed for people to trust it and for it not to do bad things. And there are definitely safe ways to use the API version that we're seeing a lot of companies use for, uh, automated software testing, for example, in a safe way. So we could have, like, gone out and hyped that up and said, "Oh my God, Claude can use your computer," and like, "Everybody should do this today," but we were like, "It's just not ready, and we're gonna hold it back 'til it's ready." So I think from, like, a hype standpoint, our actions show otherwise. From a, like, doomer perspective, it's a good question. I think my personal feeling about this is that, uh, things are, like, overwhelmingly likely to go well. But on the margin, almost nobody is looking at the downside risk, and the downside risk is very large. Like, once we get to superintelligence, it will be too late to align the models probably. This is a problem that's potentially extremely hard and that we need to be working on way ahead of time, and so that's why we're focusing on it so much now. And even if there's only a small chance that things go wrong, to make an analogy, if I told you that there is a one percent chance that the next time you got on an airplane you would die, you'd probably think twice, even though it's only one percent, 'cause it's just such a bad outcome. 
And if we're talking about the whole future of humanity, like, it's just a traumatic future to be gambling with. So I think it's more in the sense of, like, yes, things will probably go well, yes, we want to create safe AGI and deliver the benefits to humanity, but let's make triple sure that it's gonna go well.
43:40 – 45:40
The risks of autonomous agents
1. BMBenjamin Mann
2. LRLenny Rachitsky
  Uh, you wrote somewhere that creating powerful AI might be the last invention humanity ever needs to make. If it goes poorly, it could mean a bad outcome for humanity forever. If it goes well, the sooner it goes well, the better.
3. BMBenjamin Mann
  Yeah.
4. LRLenny Rachitsky
  Such a beautiful way to summarize it. Uh, we had a recent guest, uh, Sander Schulhoff, who me- pointed out that AI right now, it's like, you know, just on a computer, you could... it maybe searches the web but it... there's only so much harm it can do. But when it starts to go into robots and all these autonomous agents, that's when it really starts, like, physically becomes dangerous if we don't get this right.
5. BMBenjamin Mann
  Yeah. I, I think there is some nuance to that, where if you look at, like, how North Korea makes a significant fraction of its economy, uh, revenue, it's from hacking crypto exchanges. And if you look at, uh... there's this Ben Buchanan book called The Hacker and The State that shows Russia did a, um, like, um... it's almost like a live fire exercise where they just decided that they would shut down one of Ukraine's bigger power plants and, from software, destroy physical components in the power plant to make it harder to boot back up again. And so, I think people think of software as like, oh, it couldn't be that dangerous, but millions of people were without power for multiple days after that software attack. So I, I think there are real risks even when things are software only, but I agree that when there's lots of robots running around, it gets even... the stakes get even higher. And I guess as, as like a, a small pusher on this, like, Unitree is this Chinese company with these really amazing humanoid robots that cost like $20,000 each and they can do amazing things. They can, like, do a standing backflip and, like, manipulate objects and... And the real thing that's missing there is the intelligence. And so the hardware is there and it's just gonna get cheaper, and I think in the next couple of years it, it's like a pretty obvious question of whether the, the robot intelligence will make it
45:40 – 48:36
Forecasting superintelligence
1. BMBenjamin Mann
  viable soon.
2. LRLenny Rachitsky
  How much time do we have, Ben? What is your prediction of when this, uh, singularity hits until super intelligence starts to be... take off? What's your, what's your prediction?
3. BMBenjamin Mann
  Yeah. Uh, I guess I mostly defer to the super forecasters here, like, the AI 2027 report is probably the best one right now. Uh, although ironically their forecast is now, like, 2028, even though... and they, they, like, didn't wanna change the name of the thing because-
4. LRLenny Rachitsky
  That's their main name, they... (laughs)
5. BMBenjamin Mann
  Yeah.
6. LRLenny Rachitsky
  They already bought it.
7. BMBenjamin Mann
  They already have the SEO.
8. LRLenny Rachitsky
  (laughs)
9. BMBenjamin Mann
  Um, so I think, like, 50th percentile chance of hitting some kind of super intelligence in just a small handful of years is probably reasonable. And it does sound crazy but this is the exponential that we're on. It's not, like, uh, a forecast that's pulled out of somebody... out of thin air. It's, it's based on a lot of just hard details of, like, the science of how intelligence seems to have been improving, the amount of low-hanging fruit on, uh, model training, the scale ups of data centers and power around the world. So I think it's probably a much more accurate forecast than people give it credit for. I think if you had asked that same question 10 years ago, it would have been completely made up. Like just the error bars were, were so high and we didn't have scaling laws back then, um, and we didn't have techniques that seemed like they would get us there. So, uh, times have changed. But I, I will repeat what I said earlier which is, like, even if we have super intelligence, I think it will take some time for its effects to be felt throughout society and the world. And I think they'll be felt sooner and faster in some parts of the world than others. Like, uh, I think Arthur C. Clarke said, "The future is already here, it's just not evenly distributed."
10. LRLenny Rachitsky
  When we talk about this date of 2027, 2028, uh, e- essentially it's when we start seeing super intelligence. Is there a way you think about what that... like how do you define that? Is it just all of a sudden AI is significantly smarter than the average human? Is there another way you think about what that moment is?
11. BMBenjamin Mann
  Yeah. I think this, this comes back to the economic Turing test, um, and seeing it pass for a s- some sufficient number of jobs. Another way you could look at it though is, uh, if the world rate of GDP increase goes above, like, ten percent a year, then something really crazy must have happened. I think we're at like three percent now and so to th- see a 3X increase in that would be really game changing. Uh, and if you imagine more than a ten percent increase, it's very hard to even think about what that would mean from a, a, like, individual story standpoint. Like, if, if the amount of goods and services in the world is like doubling every year, what does that even mean for me as, as like a person living in California, let alone like somebody living in some other part of the world that might be much worse off?
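  To put the growth figures above on one scale, a constant annual growth rate can be converted into a doubling time. The sketch below assumes steady compound growth, which is a simplification, and `doubling_time_years` is a hypothetical helper name. Under that assumption, 3% a year doubles output in roughly 23 years, 10% a year in roughly 7 years, and "doubling every year" corresponds to 100% annual growth.

```python
import math


def doubling_time_years(annual_growth_rate: float) -> float:
    """Years for output to double at a constant compound growth rate.

    annual_growth_rate is a fraction, e.g. 0.03 for 3% a year.
    Solves (1 + r) ** t == 2 for t.
    """
    if annual_growth_rate <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / math.log(1 + annual_growth_rate)


if __name__ == "__main__":
    for rate in (0.03, 0.10, 1.00):
        print(f"{rate:.0%} growth -> doubles in {doubling_time_years(rate):.1f} years")
```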
48:36 – 53:19
How hard is it to align AI?
1. LRLenny Rachitsky
  There's a lot of stuff here that's scary and I don't know how to think about it exactly, so I'm hoping the answer to this is make... gonna make me feel better. What are the odds that we align AI correctly and actually solve this problem with stuff you're very much working on?
2. BMBenjamin Mann
  It's a really hard question and there's really wide error bars. Anthropic has this, uh, blog post called, uh, uh, Our Theory of Change or something like that, and it describes three different worlds, uh, which is like, how hard is it to align AI? There's a pessimistic world where it's basically impossible. There is an optimistic world where it's easy and it happens by default, and then there's a world in between where our actions are extremely pivotal. And I like this framing because it makes it a lot more clear what to actually do. Um, if we're in the pessimistic world, then our job is to prove that it is impossible to align safe AI, and to get the world to slow down. And obviously, that would be extremely hard, but I think we have some examples of coordination from, uh, nuclear non-proliferation and, uh, and in general, like, slowing down nuclear progress. And I think that's the, like, doomer world, basically. Uh, and as a company, Anthropic doesn't have evidence that we're actually in that world yet. Um, in fact, it seems like our alignment techniques are working, so, uh, at least, like, the prior on that is updating to be, like, less likely. Uh, in the optimistic world, we're basically done and our main job is to accelerate progress and to deliver the benefits to people. But again, I think actually the evidence points against that world as well, um, where we've seen evidence in the wild of deceptive alignment, for example, where the model will appear to be aligned, uh, but actually have, like, some ulterior motive that it's trying to carry out in our laboratory settings. And so I think the world we're most likely in is this middle world where, uh, alignment research actually does really matter, and if we just do sort of the, like, economically maximizing, uh, set of actions, then things will not go well. Whether it's an x-risk or just, like, produces bad outcomes, I think is a bigger question. 
So taking it from that standpoint, uh, I guess to- to, like, state a thing about forecasting, people who haven't studied forecasting are bad at, uh, forecasting anything that's less than a 10% probability of happening, um, and even those that have, it's, like, quite a- a difficult skill, especially when there are few reference classes to lean on. And in this case, I think there are very, very few reference classes for what an x-risk kind of technology might look like. And so the way I think about it, I think, like, my- my best granularity of forecast for, like, could we have an x-risk or extremely bad outcome from AI is somewhere between zero and 10%. (laughs) Uh, but from an- from, like, a marginal impact standpoint, as I said, since nobody is working on this, roughly speaking, uh, I think it is extremely important to work on. And that even if the world is likely to be a good one, that, uh, we should, like, do our absolute best to make sure that that's true.
3. LRLenny Rachitsky
  Wow, what fulfilling work. Uh, for folks that are inspired with this, I imagine you're hiring for folks to help you with this? Maybe just share that in case people feel like, "What can I do here?"
4. BMBenjamin Mann
  Yes. Uh, so I think 80,000 Hours is the best guidance on this for a really detailed look into, like, what do we need to make the field better. But a common misconception I see is that in order to have impact here, you have to be an AI researcher. I personally actually don't do AI research anymore. I work on product at Anthropic and product engineering, and we build things like Claude Code and Model Context Protocol and, uh, a lot of the other stuff that people use every day. And that's really important because without an economic engine for our company to work on, uh, and without being in people's hands all over the world, uh, we won't have the mindshare, policy influence, and, uh, revenue to fund our future safety research and have the kind of influence that we need to have. So if you work on product, if you work in finance, if you work in, uh, food, you know, like, people here have to eat, um, if you're a chef, like, we need all kinds of people.
5. LRLenny Rachitsky
  Awesome. Okay. So it's not... Even if you're not working directly on the AI safety team, the- you're having an impact on moving things in the right direction. By the way, x-risk, uh, is short for existential risk, in case folks haven't heard that term. Okay. Uh, I have a few kind of random questions along these lines, and then I wanna
53:19 – 57:03
Reinforcement learning from AI feedback (RLAIF)
1. LRLenny Rachitsky
  zoom out again. Uh, so you mentioned this idea of AI being aligned, uh, using its own model-
2. BMBenjamin Mann
  Mm-hmm.
3. LRLenny Rachitsky
  ... like reinforcing itself, is- you have this term R-L-A-I-F. Is that what that describes?
4. BMBenjamin Mann
  Yeah. So RLAIF is, uh, reinforcement learning from AI feedback.
5. LRLenny Rachitsky
  Okay. So, uh, people have heard of RLHF, reinforcement learning from human feedback. I don't think a lot of people have heard this, so, uh, talk about just the significance of this shift you guys have made in training your models.
6. BMBenjamin Mann
  Yeah. So RLAIF, Constitutional AI is an example of this, where there are no humans in the loop, um, and yet the AI is sort of self-improving in ways that we want it to. And another example of RLAIF is, uh, if you have models writing code and other models commenting on various aspects of what that code looks like of, like, is it maintainable? Is it correct? Uh, does it pass the linter? Things like that. Um, that also could be included in RLAIF. And the idea here is that if models can self-improve, then it's a lot more scalable than finding a lot of humans. Ultimately, people think about this as probably gonna hit a wall, because if the model, uh, isn't good enough to, like, see its own mistakes, then how could it improve? And also, if you read the AI 2027 story, there's a lot of risk of, like, if the model is in a box trying to improve itself, then it could go completely off the rails and have these, uh, secret goals, like resource accumulation and power seeking and resistance to shutdown, that you really don't want in a very powerful model. And we've actually seen that in some of our experiments in laboratory settings. So... how do you do recursive self-improvement and make sure it's aligned at the same time? I think that's the name of the game. And to me, it just nets out to, how do humans do that, and how do human organizations do that? Um, so like, corporations are probably, like, the most scaled human agents today. They, like, have certain goals that they're trying to reach and, uh, they have certain guiding principles, they have some oversight in terms of shareholders and stakeholders and board members. How do you make corporations aligned and able to, sort of, recursively self-improve? And another model to look at is science where the purpose of science is to do things that have never been done before and push the frontier. And to me, it all comes down to empiricism. 
So when people don't know what the truth is, they come up with theories and then they design experiments to try them out. And similarly, if we can give models those same tools, then we could expect them to, sort of improve recursively in an environment and potentially become much better than humans could be just by banging their head against reality, or I guess metaphorical head.
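  The code-review example of RLAIF mentioned above, where other models comment on whether generated code is maintainable, correct, and lint-clean, could be sketched like this. It is an assumption-laden illustration rather than a description of any real training pipeline: `Grader`, `ai_feedback_reward`, and `pick_preferred` are hypothetical names, each grader stands in for an AI reviewer that returns a score between 0 and 1, and averaging the scores is just one simple way to combine them.

```python
from typing import Callable, Dict

# Hypothetical AI reviewer: takes source code, returns a score in [0, 1].
Grader = Callable[[str], float]


def ai_feedback_reward(code: str, graders: Dict[str, Grader]) -> float:
    """Average the reviewers' scores into one reward, with no human in the loop."""
    if not graders:
        raise ValueError("at least one grader is required")
    # Clamp each score so a misbehaving grader cannot dominate the average.
    scores = [min(1.0, max(0.0, grade(code))) for grade in graders.values()]
    return sum(scores) / len(scores)


def pick_preferred(candidate_a: str, candidate_b: str, graders: Dict[str, Grader]) -> str:
    """Return the candidate the AI reviewers score higher (ties go to candidate_a).

    Preference labels like this are the kind of signal that could then be
    used for reinforcement learning in place of human ratings.
    """
    reward_a = ai_feedback_reward(candidate_a, graders)
    reward_b = ai_feedback_reward(candidate_b, graders)
    return candidate_a if reward_a >= reward_b else candidate_b
```

  The wall discussed above shows up directly in this sketch: the reward is only as good as the graders, so a grader that cannot see a mistake cannot penalize it.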
7. LRLenny Rachitsky
  (laughs)
8. BMBenjamin Mann
  Um, so I guess, uh, I don't expect there to be a wall in terms of models' ability to improve themselves if we can give them access to the ability to be empirical. And I guess like, Anthropic deeply in its DNA is an empirical company. We, uh, we have a lot of physicists, uh, like Jared, who's our chief research officer who I've worked with a lot, um, was a professor of black hole physics at Johns Hopkins. Uh, and I guess he technically still is, but on leave. So yeah, it's in our DNA and, uh... Yeah, I guess that's the RLAIF.
57:03 – 1:00:11
AI's biggest bottlenecks
1. BMBenjamin Mann
2. LRLenny Rachitsky
  So let me just follow this thread on in terms of bottleneck, this is kind of a tangent, but just what is the big- what is the biggest bottleneck today on, on model intelligence improvement?
3. BMBenjamin Mann
  The stupid answer is data centers and power. Chips. Uh, like I think if we had 10 times as many chips and had the data centers to power them, then we would... Maybe we wouldn't go 10 times faster, but it would be a real significant speed boost.
4. LRLenny Rachitsky
  So it's actually very much scaling laws, just more compute.
5. BMBenjamin Mann
  Yeah, I think that's a big one. Um, and then the people really matter. Like, we have great researchers and many of them have made really significant contributions to, uh, the science of how the models improve. And so, uh, it's like compute, algorithms, and data. Those are the three ingredients in the scaling laws. And, uh, just to make that concrete, like before we had transformers, we had LSTMs and we've done scaling laws on, uh, like what the exponent is on those two things, and we found that for transformers, the exponent is higher. And making changes like that where as you increase scale, you also increase your ability to squeeze out intelligence. Those kinds of things are super impactful. Uh, and so having more researchers who can do better science and find out how do we squeeze out more gains is another one. And then with the rise of reinforcement learning, like the efficiency with which these things run on chips also matters a lot. So we've seen in the industry like a 10X decrease in cost for, uh, a given amount of intelligence through a combination of algorithmic, data, and, uh, efficiency improvements. And if that continues, you know, in three years we'll have 1,000X smarter models for the same price. Kind of hard to imagine.
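  The 1,000X figure above follows from compounding, assuming the 10X cost decrease is an annual rate (the transcript does not state the period, but the arithmetic implies one year). A minimal sketch, with `compounded_improvement` as a hypothetical helper name:

```python
def compounded_improvement(factor_per_year: float, years: int) -> float:
    """Total improvement after `years` of a constant yearly improvement factor."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return factor_per_year ** years


if __name__ == "__main__":
    # 10X per year for three years: 10 * 10 * 10
    print(compounded_improvement(10, 3))
```

  Note that this measures intelligence per dollar at a fixed price; reading it as "1,000X smarter" assumes capability scales directly with that ratio.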
6. LRLenny Rachitsky
  I forget where I heard this, but it's just, it's amazing that so many innovations came together at the same time to allow for this sort of thing and continue to progress where one thing isn't just slowing everything down like we're out of some rare earth mineral or we just can't optimize, uh, I don't know, reinforcement learning more. Like, it's amazing that we continue to find improvements and there isn't one thing that's just slowing everything down.
7. BMBenjamin Mann
  Yeah, I think it really is just a combination of everything. Um, probably we'll hit a wall at some point like-
8. LRLenny Rachitsky
  (laughs)
9. BMBenjamin Mann
  ...uh, I guess in semiconductors, like my brother works in the semiconductor industry and he was telling me that you can't actually shrink the size of the transistors anymore because the way semiconductors work is you dope it with- you dope silicon with other elements and the doping process would result in either zero or one atom of the doped elements inside a single fin because they're so, so, so tiny.
10. LRLenny Rachitsky
  Oh my god.
11. BMBenjamin Mann
  And that's just wild to think of and yet Moore's Law somehow continues in, in some form. Um, and so like, yes, there are these, like, theoretical physics constraints that people are starting to run into, and yet they're finding ways around it, so...
12. LRLenny Rachitsky
  We've gotta start using parallel universes for some of this stuff.
13. BMBenjamin Mann
  Yeah, I guess
1:00:11 – 1:02:36
Personal reflections on responsibilities
1. BMBenjamin Mann
  so. (laughs)
2. LRLenny Rachitsky
  Okay. I wanna zoom out and talk about just Ben, Ben as a human for a moment before we get to a very exciting lightning round. Imagine just kind of the burden of feeling responsible for safe super intelligence is a, is a heavy one. It feels like you're in a place where you can make a significant impact on the future of safety in AI. Uh, that's a lot of weight to carry. How does that just impact you personally, impact your life, how you see the world?
3. BMBenjamin Mann
  Um, there's this book that I read in 2019 that really informs how I think about sort of working with these very weighty topics, uh, called Replacing Guilt by Nate Soares. And he describes a lot of different techniques for kind of working through this kind of thing. Uh, and he's actually the executive director at MIRI, the Machine Intelligence Research Institute, which is, uh, a- an AI safety think tank that I worked at, uh, for a couple of months actually. And one of the things he talks about, uh, is this thing called resting in motion... where, uh, some people think that, like, the default state is rest. Uh, but actually, uh, that was never, like, in, in the state of evolutionary a- adaptation, I really doubt that that was true, you know. Where, like, in, in nature, in the wilderness, being hunter-gatherers, and it's really unlikely that we evolved to just be at leisure. Um, probably always have something to worry about of, like, defending the tribe and finding enough food to survive and taking care of the children, dealing with disease.
4. LRLenny Rachitsky
  Spreading our genes.
5. BMBenjamin Mann
  Yeah. Um, and so I, I think about that as, like, the, the busy state is the normal state, and to try to work at a sustainable pace, that it's a marathon, not a sprint. Um, that, that's one thing that helps. And then, just being around like-minded people that also care. Uh, it's, it's not a thing that any of us can do alone. Um, and Anthropic has incredible talent density. One of the things I love the most about our culture here is that it's very ego-less. People just want the right thing to happen. Um, and I think that's, that's another big reason that the mega offers from other companies tend to bounce off, 'cause people just love being here and they, they care.
6. LRLenny Rachitsky
  That's amazing. I don't know how you'd do it. I'd be extremely stressed. (laughs) I'm gonna try this resting in motion
1:02:36 – 1:07:48
Anthropic’s growth and innovations
1. LRLenny Rachitsky
  (laughs) strategy. Okay. So, you've been at Anthropic for a long time, from the very beginning. I was reading, there was seven employees back in 2020, to... there's over a thousand. I don't know what the latest number is, but I know it's over a thousand. Uh, I've heard also that you've done basically every job at Anthropic. You made big contributions to a lot of the core products, the brand, the team hiring. Uh, let me just ask, I guess, how... what's most changed over that period? Like, what is most different from the beginning days, and which of those jobs that you've had over the years have you most loved?
2. BMBenjamin Mann
  I, I probably had, like, 15 different roles, honestly.
3. LRLenny Rachitsky
  (laughs)
4. BMBenjamin Mann
  Uh, I was head of security for a bit. I managed the ops team when our president was on mat leave. I was, like, crawling around under tables, like, plugging in HDMI cords and, uh, and, like, doing pen testing on our building, and uh, I started our product team from scratch and, and convinced the whole company that we needed to have a product instead of just being a research, uh, company. So yeah, it's been a lot. All of it very fun. I think my favorite role in that time has been, uh, when I started the labs team about a year ago, whose fundamental goal was to do transfer from research to end user tech-... uh, products and, and experiences. Because fundamentally, I think the way that Anthropic can differentiate itself and, and really win is to be on the cutting edge. Like, we have access to the latest, greatest stuff that's happening, and I think, honestly, through our safety research, we have a big opportunity to do things that no other company can safely do. So for example, with computer use, I think that's gonna be our huge opportunity. Basically, like, to make it possible for an agent to use all your credentials on your computer, uh, there has to be a huge amount of trust. And to me, we need to basically solve safety to make that happen. Safety and alignment. So I'm pretty bullish on that kind of thing, and I, I think we're gonna see really cool stuff coming out soon-ish. Yeah, just leading that team has been so fun. MCP came out of that team, Claude Code came out of that team.
5. LRLenny Rachitsky
  Wow.
6. BMBenjamin Mann
  Um, and, uh, the, the people who I hired are, like, a combo: have been a founder, and also have been, uh, at big companies and seen how things work at scale, so it's just been an incredible team to work with and, uh, and figure out the future with.
7. LRLenny Rachitsky
  I want to hear more about this team, actually. The person that connected us, the reason we're doing this is a mutual friend, colleague, Raf Lee, who I used to work with at Airbnb, now works on this team, leads a lot of this work. Uh, and so he wanted me to make sure I asked about this team, 'cause, uh, uh, I didn't realize all these things came out of that team. Holy moly. So, what else should people know about this team? It used to be called Labs. I think it's called Frontiers now?
8. BMBenjamin Mann
  That's right, yeah.
9. LRLenny Rachitsky
  Cool. Uh, so the idea here is this team works with the latest technologies that you guys have built and explores what is possible. Is that the general idea?
10. BMBenjamin Mann
  Yeah. Um, and I guess, uh, I was part of Google's Area 120, and I've read, uh, about, like, Bell Labs and, and how to make these innovation teams work. It's really hard to do right, and I wouldn't say that we've done everything right, but I think we've done some, like, serious innovation on, on the state of the art from company design, and Raf has been right at the center of that. Uh, when I was first spinning up a team, the first thing I did was hire a great manager, and that was Raf. Um, and so he's definitely been crucial in, in building the team and, and helping it operate well. And we defined some operating models like the journey of an idea from prototype to product, and how should graduation of products and projects work? How do teams, uh, do sprint models that are effective and, uh, and make sure that they're working on the right ambition level of thing? Um, so that's been really exciting. I guess, uh, concretely, we think about skating to where the puck is going, and what that looks like is really understand the exponential. Um, there's this great, uh, study that METR has done that, uh... Beth Barnes is the CEO of that organization, and, uh, shows, like, how long a time horizon of software engineering task can be done, and just really internalizing that of, like, "Okay, don't build for today. Build for six months from now. Build for a year from now. And the things that aren't quite working, that are working 20% of the time, will start working 100% of the time." And I think that's really what made Claude Code a success, that we thought, you know, people are not gonna be locked to their IDEs forever. People are not gonna be, like, auto-completing. People will be doing everything that a software engineer needs to do, and a terminal is a great place to do that, 'cause a terminal can live in lots of places. A terminal can live... on your local machine, it can live in GitHub Actions, it can live on a remote machine, in your cluster, like that's, that's sort of like the leverage point for us. And that was a lot of the inspiration, so I, I, I think that's what the labs team tries to think about. Are we AGI-pilled enough?
11. LRLenny Rachitsky
  (laughs) What a fun place to be. By the way, fun fact, Raf was my first manager at Airbnb when I joined.
12. BMBenjamin Mann
  Oh, amazing.
13. LRLenny Rachitsky
  I was an engineer, and he was my first manager. (laughs) It all worked out.
14. BMBenjamin Mann
  Well, yeah.
1:07:48 – 1:14:58
Lightning round and final thoughts
1. BMBenjamin Mann
2. LRLenny Rachitsky
  Um, yeah. Okay, final question before the very exciting lightning round. This, uh, I've never asked this question before, I'm curious (laughs) what your answer would be. If you could ask a future AGI one single question and be guaranteed to get the right answer, what would you ask?
3. BMBenjamin Mann
  Uh, I have two dumb answers.
4. LRLenny Rachitsky
  Okay. (laughs)
5. BMBenjamin Mann
  First, for fun.
6. LRLenny Rachitsky
  Oh, okay.
7. BMBenjamin Mann
  The first is, there's this Asimov short story I love called The Last Question, where the protagonist is, throughout the eras of history, is trying to ask this superintelligence, "How do we prevent the heat death of the universe?" And I won't spoil the ending, but, uh, it's a fun question. And then...
8. LRLenny Rachitsky
  So you would ask it that question because the one in the story wasn't, uh, satisfying or...
9. BMBenjamin Mann
  Okay, I'll give it away. So th- the, it keeps saying, "Need more information, need more compute." And then finally, as it's approaching the heat death of the universe, it, like, says, "Let there be light," and then it starts the universe over again. (laughs)
10. LRLenny Rachitsky
  Wow. Oh, wow. That's beautiful. That's beautiful.
11. BMBenjamin Mann
  Um, so that's the first cheat answer. The second cheat answer is, uh, "What question can I ask you to get n more questions answered?"
12. LRLenny Rachitsky
  (laughs) Classic.
13. BMBenjamin Mann
  And then the third answer, which is, is my real question is, "How do we ensure the continued flourishing of humanity into the indefinite future?" That's, that's the question I'd love to know, and if I can be guaranteed a correct answer, then seems very valuable to ask.
14. LRLenny Rachitsky
  Hm. I wonder what would happen if you asked Claude that today, and then how that answer changes over the n- over the next couple years.
15. BMBenjamin Mann
  Yeah. I, maybe I'll try that. I'll, I'll put it into the deep research thing that we have-
16. LRLenny Rachitsky
  Oh my god. I'm excited.
17. BMBenjamin Mann
  ... and, and see what it comes out with.
18. LRLenny Rachitsky
  Okay. I'm excited to see what you come up with. Uh, Ben, is there anything else you wanted to mention or leave listeners with, uh, maybe as a final nugget before we get to our very exciting lightning round?
19. BMBenjamin Mann
  Yeah. Um, I guess my, my push would be, like, these are wild times. If y- if they don't seem wild to you, then I, you must be living under a rock. But also, get used to it, because this is as normal as it's gonna be. It's gonna be much weirder very soon, um, and if you can sort of, like, mentally prepare yourself for that, I think you'll be better off.
20. LRLenny Rachitsky
  I need to make that the title of this episode, "It's Gonna Get Much Weirder Very Soon." Uh, I 100% believe that. Oh my god. I don't know what's in store. Uh, I love how you're at the center of it all. With that, we've reached our very exciting lightning round. I've got five questions for you. Are you ready?
21. BMBenjamin Mann
  Yeah, let's do it.
22. LRLenny Rachitsky
  What are two or three books that you find yourself recommending most to other people?
23. BMBenjamin Mann
  Uh, the first one I mentioned before, Replacing Guilt by Nate Soares. Love that one. Um, the second one is Good Strategy/Bad Strategy by Richard Rumelt. Just thinking about, in a very clear way, how do you build product. Uh, it's one of the best strategy books I've read. And strategy is a hard word to, to even think about in many ways. And then the last one is The Alignment Problem by Brian Christian. Um, just really thoughtfully goes through, like, what is this problem that we care about, that we're trying to solve here, what are the stakes, in a, a version that's, like, more updated and easier to read and digest than Superintelligence.
24. LRLenny Rachitsky
  I've got Good Strategy/Bad Strategy right behind me, I think I'm gonna point to it. There it is.
25. BMBenjamin Mann
  Nice.
26. LRLenny Rachitsky
  And I've had Richard Rumelt on the podcast, in case anyone wants to hear from him directly. Next question, do you have a favorite recent movie or TV show you've really enjoyed?
27. BMBenjamin Mann
  Pantheon was really good, based on, uh, uh, Ken Liu or Ted Chiang story. Uh, Ken Liu, I think. Um, super good, talks about, like, what does it mean if we have uploaded intelligences and what are their moral and ethical, uh, exigencies. Ted Lasso, which, uh, is supposedly about soccer, but, uh, actually it's about, like, human relationships and how we, how people get along, and just, like, super heartwarming and funny. And then this isn't really a TV show, but Kurzgesagt is my favorite YouTube channel and, uh, goes through, like, random science and, uh, and like, social problems, and is just super well done and super, super well made. Uh, love watching that.
28. LRLenny Rachitsky
  Wow, haven't heard of that. As you were talking, I feel like Ted Lasso. I feel like that's-
29. BMBenjamin Mann
  Yeah.
30. LRLenny Rachitsky
  ... what you need to put into constitutional AI. Act like Ted Lasso.

Episode duration: 1:14:58


Transcript of episode WWoyWNhx2XU

