- Aakash Gupta
In the future, are PMs gonna have to learn how to manage not just humans, but agents?
- Jake Brill
Absolutely, you're gonna be interacting with, managing, collaborating with other people. I think it's very much the case with agents, and so there's gonna be a new skill set that people need to develop.
- Aakash Gupta
Jake Brill is the head of Integrity Product at the hottest AI company in the world right now. He works at OpenAI. Before that, he was a director of product at Instacart and a product manager at Meta. So you guys just released GPT-5. It was a whirlwind launch. What did it feel like inside?
- Jake Brill
Truly energizing.
- Aakash Gupta
What do PMs need to know about agents?
- Jake Brill
If you asked your average product manager, "What's your AI strategy?" or "How are you building an AI-first product?", a lot of people wouldn't necessarily have had an answer for that. And over the past couple of years, it's become the case that if you're not building a product that has AI fundamentally in its DNA, you're not really keeping up with the future of digital technology.
- Aakash Gupta
One of the craziest things I heard about agents recently was that an agent internally was manipulating performance data. [laughs] How do you deal with a world where agents learn to cheat?
- Jake Brill
It's a hard problem, a very big research problem we're gonna keep working on.
- Aakash Gupta
You mentioned Slack. I think that was one of the coolest insights that came out about OpenAI in the last few weeks, that the company almost entirely runs on Slack.
- Jake Brill
Yeah, almost entirely. I would say conservatively, like, ninety percent of my written communication is in Slack.
- Aakash Gupta
If you're a company that's maybe not a model company, but you're trying to create an eval system as robust as OpenAI's, what would be your lessons for them?
- Jake Brill
Well, in a lot of cases, you don't have to build these evals from scratch. There are industry-standard evals that you can use.
- Aakash Gupta
What does the PM role look like in five years? Really quickly, I think a crazy stat is that more than fifty percent of you listening are not subscribed. If you can subscribe on YouTube, follow on Apple or Spotify podcasts, my commitment to you is that we'll continue to make this content better and better. And now on to today's episode. Jake Brill is the head of Integrity Product at the hottest AI company in the world right now. He works at OpenAI. Before that, he was a director of product at Instacart and a product manager at Meta. We have a ton to talk about, especially as it relates to how does OpenAI handle negative scenarios within AI? How does OpenAI build product? Jake, welcome to the podcast.
- Jake Brill
Thank you so much for having me. It's a real thrill to be here.
- Aakash Gupta
So you guys just released GPT-5. It was a whirlwind launch. You guys were actually everywhere. I think this was the most marketed launch I saw from you guys. What did it feel like inside?
- Jake Brill
Truly energizing. You know, we've been working on GPT-5 for a very long time, and in the days, weeks, and months leading up to the launch, the energy was just palpable. We knew how incredible a model we had. We knew how much of a game changer it would be for people to experience a reasoning model, because most people haven't. And so bringing that power to people just felt like we're one step further in our quest towards AGI and fulfilling our mission. The energy was great. On the day of the launch, it was a really well-done launch event, and seeing people's reactions in Slack, starting to see the graphs, watching people use this product, it felt incredibly fulfilling, because this is a lot of really smart people working really hard for a long time. Seeing it come to the world, it was just like, "Okay, we're here." Very exciting.
- Aakash Gupta
You mentioned Slack. I think that was one of the coolest insights that came out about OpenAI in the last few weeks, that the company almost entirely runs on Slack.
- Jake Brill
Yeah, almost entirely. I would say conservatively, like, ninety percent of my written communication is in Slack.
- Aakash Gupta
Wow. And what does it look like? Like, are there any special things about your guys' Slack setup? Do you guys have a lot of AI agents in there keeping things updated?
- Jake Brill
Yeah. You know, it's funny because at Instacart, we were really Slack-heavy as well, and Instacart was fully remote for my last several years at the company, so we relied a ton on Slack there. So when I joined OpenAI, I had something to fall back on: I was used to a company that ran heavily on Slack. In terms of what makes our usage of Slack special and more unique for us, yes, absolutely, we have AI agents in the channels. They can serve any number of purposes. There's one that we have that's like a general-purpose answer agent, which is really helpful. So if you have a general Q&A in the channels where that agent exists, you can get questions answered without one of your coworkers having to see the post and respond to it manually. We very much believe at OpenAI in using AI in all parts of our user and enterprise experience. So yeah, it's no different in Slack.
- Aakash Gupta
Very interesting. Now, you head up Integrity Product.
- Jake Brill
Mm-hmm.
- Aakash Gupta
What is Integrity Product's role in a big launch like GPT-5?
- Jake Brill
Sure. So there are a couple of big areas where we're involved in this launch. The first is what you might classically think of when it comes to integrity: preventing bad things from happening, preventing people from doing bad things with our models and our products. And we certainly did a lot of work in that space for GPT-5, both in the lead-up to the launch and following it. In addition to our work on preventing bad things from happening, one thing that often happens when we have new launches is we see a lot of people try ChatGPT for the first time or try our new models via the API for the first time. And another area that Integrity is responsible for is our identity system. So when we release new models, like this happened with our Image Gen model as well as GPT-5, you see an increase in traffic: people wanting to sign up for the first time, people logging into their accounts. You wanna make sure those systems don't fall over and that they're able to take meaningful increases in volume so that they maintain their world-class uptime and latency, because it would be a really bad user experience if people try to go use GPT-5 for the first time and they just get an error message. Similarly, our team is responsible for our financial systems. When we launch new models, it's often the case that that will lead to people converting to paid subscriptions or more people using prepaid API credits to use the model. And so we wanna make sure that our payment systems are up, they're running, they're seamless, you have good authorization rates, but also that you don't see an influx of people with stolen credit cards fraudulently using our models and our products. So there's a lot of work behind the scenes to make sure that GPT-5 shines in all of our different products.
- Aakash Gupta
Fun fact, Amazon put two holds on my account as I ordered the equipment for this in-person [chuckles] podcast today. So it's not just on the credit card side. Even for the person who's making a purchase, whether at OpenAI or at Amazon, the company is monitoring those accounts to see whether these are suspicious transactions, and flagging them appropriately.
- Jake Brill
Exactly.
- Aakash Gupta
Very interesting. So let's stick to GPT-5 for a second. What are the big streams of work that a company like OpenAI is doing to make sure that it meets your integrity standards? Is there red teaming? Are there certain evals? How would I put it into buckets?
- Jake Brill
Yeah, absolutely. We do red teaming, not just for new launches, but for our products all across the board. And red teaming happens at all stages of product development.
- Aakash Gupta
Right.
- Jake Brill
When you're training the model, when you're training your production systems to mitigate harms. We have both manual red teaming and automated red teaming. We'll do that at different checkpoints while the model is being trained. We do it with the final version of the model before it goes live. And then we also do red teaming after models go live, just to make sure: okay, we prevented the known jailbreaks, but are there new jailbreaks out in the wild? Our automated systems or penetration testers are trying to find ways to get around the protections in our model. As for other sorts of work that we do, there's a lot of work to make sure that our automated systems have high precision and high recall. If we're taking some sort of automated action, we wanna make sure that we're really accurate with that, because we take it very seriously if we block a generation or warn someone's account or, at the extreme, ban someone. Those are very serious interventions, and we hold ourselves to a really high standard: when we take those actions, we wanna have really high confidence that we're getting it right. Similarly, we wanna make sure that our recall is high so that we're not blind to false negatives, with all sorts of bad stuff going on out there that we're not aware of. We also do a lot to make sure that we have robust operational tooling and capacity. We project forward, we think X number of things will require manual review, and we wanna make sure our tools are sufficient so that people have extremely efficient workflows, and that we have enough human power and bandwidth so that we can get to things in a very timely fashion.
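The precision and recall trade-off Jake describes can be made concrete with a small sketch. This is only an illustration in Python: the `Counts` type, the function names, and the numbers are all made up for the example and do not come from the episode or from OpenAI's systems. Precision asks "of the items we actioned, how many were truly harmful?", while recall asks "of the truly harmful items, how many did we catch?".

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical outcome counts for an automated enforcement system."""
    true_positives: int   # harmful items the system correctly actioned
    false_positives: int  # benign items the system wrongly actioned
    false_negatives: int  # harmful items the system missed


def precision(c: Counts) -> float:
    """Share of actioned items that were actually harmful."""
    actioned = c.true_positives + c.false_positives
    return c.true_positives / actioned if actioned else 0.0


def recall(c: Counts) -> float:
    """Share of actually harmful items that the system caught."""
    harmful = c.true_positives + c.false_negatives
    return c.true_positives / harmful if harmful else 0.0


# Made-up numbers, for illustration only.
example = Counts(true_positives=90, false_positives=10, false_negatives=30)
print(f"precision = {precision(example):.2f}")  # 90 / (90 + 10)
print(f"recall    = {recall(example):.2f}")     # 90 / (90 + 30)
```

With these illustrative counts, a wrongly warned or banned account shows up as a false positive and lowers precision, while harm the system never sees shows up as a false negative and lowers recall.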
Episode duration: 1:21:05
Transcript of episode j4gdMHYfHaE