EVERY SPOKEN WORD
80 min read · 16,103 words
- 0:00 – 2:28
Why PMs will need to manage agents (and people)
- AGAakash Gupta
In the future, are PMs gonna have to learn how to manage not just humans, but agents?
- JBJake Brill
Absolutely, you're gonna be interacting with, managing, collaborating with other people. I think it's very much the case with agents, and so there's gonna be a new skill set that people need to develop.
- AGAakash Gupta
Jake Brill is the head of Integrity Product at the hottest AI company in the world right now. He works at OpenAI. Before that, he was a director of product at Instacart and a product manager at Meta. So you guys just released GPT-5. It was a whirlwind launch. What did it feel like inside?
- JBJake Brill
Truly energizing.
- AGAakash Gupta
What do PMs need to know about agents?
- JBJake Brill
If you look at your average product manager and you said, "What's your AI strategy?" or, "How are you building an AI-first product?" A lot of people wouldn't have necessarily had an answer for that. And over the past couple of years, it's like if you're not building a product that has AI fundamentally in its DNA, you're not really keeping up with the future of digital technology.
- AGAakash Gupta
One of the craziest things I heard about agents recently was that an agent internally was manipulating performance data. [laughs] How do you deal with a world where agents learn to cheat?
- JBJake Brill
It's a hard problem, a very big research problem we're gonna keep working on.
- AGAakash Gupta
You mentioned Slack. I think that was one of the coolest insights that came out about OpenAI in the last few weeks, that the company almost entirely runs on Slack.
- JBJake Brill
Yeah, almost entirely. I would say conservatively, like, ninety percent of my written communication is in Slack.
- AGAakash Gupta
If you're a company who's maybe not a model company, but you're trying to create an eval system as robust as OpenAI, what would be your lessons for them?
- JBJake Brill
Well, in a lot of cases, you don't have to build these evals from scratch. There are industry-standard evals that you can use.
- AGAakash Gupta
What does the PM role look like in five years? Really quickly, I think a crazy stat is that more than fifty percent of you listening are not subscribed. If you can subscribe on YouTube, follow on Apple or Spotify podcasts, my commitment to you is that we'll continue to make this content better and better. And now on to today's episode. Jake Brill is the head of Integrity Product at the hottest AI company in the world right now. He works at OpenAI. Before that, he was a director of product at Instacart and a product manager at Meta. We have a ton to talk about, especially as it relates to how does OpenAI handle negative scenarios within AI? How does OpenAI build product? Jake, welcome to the podcast.
- JBJake Brill
Thank you so much for having me. Um, it's a real thrill to be here.
- 2:28 – 3:34
Inside the GPT-5 launch: energy, mission, and real-time adoption signals
- AGAakash Gupta
So you guys just released GPT-5. It was a whirlwind launch. You guys were actually everywhere. I think this was the most marketed launch I saw from you guys. What did it feel like inside?
- JBJake Brill
Truly energizing. You know, we've been working on GPT-5 for a very long time, and in the days, weeks, months leading up to the launch, the energy was just palpable. We knew how incredible of a model we had. We knew how much of a game changer it would be for people to experience a reasoning model, because most people haven't. And so bringing that power to people, um, it, it just felt su-- like we're just one step further in our quest towards AGI and fulfilling our mission. The energy was great. The day of the, the launch, you know, it was a really well-done, uh, launch event, and just seeing people's reaction in Slack, starting to see the, the graphs, like, watching people use this product, it, it just-- It felt incredibly fulfilling because this was-- This is a lot of really smart people working really hard for a long time, and seeing it come to the world, it was just like, "Okay, we're here." Very exciting.
- 3:34 – 4:47
OpenAI runs on Slack—plus agents embedded in day-to-day collaboration
- AGAakash Gupta
You mentioned Slack. I think that was one of the coolest insights that came out about OpenAI in the last few weeks, that the company almost entirely runs on Slack.
- JBJake Brill
Yeah, almost entirely. I would say conservatively, like, ninety percent of my written communication is in Slack.
- AGAakash Gupta
Wow. And what does it look like? Like, is there any special things about your guys' Slack setup? Do you guys have a lot of AI agents in there keeping things updated?
- JBJake Brill
Yeah. You know, it's funny because at Instacart, we were really Slack-heavy as well, and Instacart was fully remote for my last several years at the company, and so we relied a ton on Slack there. So when I joined OpenAI, I had something that I could fall back on. And so I was used to a company that ran heavily on Slack. In terms of what makes our usage of Slack, um, special and more unique for us, yes, absolutely, we have AI agents in the, in the channels. Uh, they can serve any number of purposes. You know, there's, there's one that we have that's like a general purpose answer, uh, agent, which is really helpful. So if you have a general Q&A in the channels where that agent exists, you can get questions without one of your coworkers having to see the post and respond to it manually. So, um, we very much believe at OpenAI in, in, in using AI in all parts of our, uh, user and enterprise experience. So yeah, it's, it's no different in Slack.
- 4:47 – 7:03
What “Integrity Product” does—safety, identity, and payments in major launches
- AGAakash Gupta
Very interesting. Now, you head up Integrity Product.
- JBJake Brill
Mm-hmm.
- AGAakash Gupta
What is Integrity Product's role in a big launch like GPT-5?
- JBJake Brill
Sure. So there are a couple big areas that we're involved in this launch. The first is what you might classically think of when it comes to integrity, you know, preventing p- bad things from happening, people from doing bad things with our models and our products. And we certainly did a lot of work in the lead-up to and following the launch i, in that space for GPT-5. In addition to our work on preventing bad things from happening, um, one thing that of-often happens when we have new launches is we see a lot of people try ChatGPT for the first time or try our new, new models via the API for the first time. And another area that Integrity is responsible for is our iden- our identity system. Um, so when we, when we, uh, release new models, like this happened with our Image Gen model, um, as well as GPT-5, you see an increase in traffic, people wanting to sign up for the first time, people logging into their accounts. You wanna make sure those systems don't fall over and that you h- your systems are able to take meaningful increases in volume so that they have-- that they maintain their world-class uptime and latency because, you know, it would be a really bad user experience if people try to go use, use GPT-5 for the first time and they just get an error message. Similarly, our team is responsible for our financial systems and, uh, when we launch new models, it's often the case that that will lead to people converting to paid subscriptions or more people using prepaid API credits to use the model. And so similarly, we wanna make sure that our payment systems are, are up, they're running, they're seamless, you have good a- good, uh, authorization rates, but also that you don't see an influx of people with stolen credit cards, uh, fraudulently, fraudulently using our, our models and our products. So a lot of work, um, behind the scenes to make sure that GPT-5 shines in, in all of our different products.
- AGAakash Gupta
Fun fact, Amazon put two holds on my account as I ordered the equipment for this in-person [chuckles] podcast today. So it's not just on the credit card side. Even the person who's making a purchase at OpenAI or in Amazon is monitoring those accounts to see, are these suspicious transactions, and flagging those appropriately.
- JBJake Brill
Exactly.
- 7:03 – 9:14
Integrity toolkit: red teaming, precision/recall, and operational capacity
- AGAakash Gupta
Very interesting. So if we break down, let's say, let's stick to GPT-5 for a second. What are the big streams of work that a company like OpenAI is doing to make sure that it meets your integrity standards? Is there red teaming? Are there certain evals? Are there certain... What are the big-- How would I put it into buckets?
- JBJake Brill
Yeah, absolutely. We do, we do red teaming, not just for new launches, but for our products a- all across the board. And red teaming happens at all stages of product development.
- AGAakash Gupta
Right.
- JBJake Brill
When you're training the model, when you're training your production systems to, uh, to mitigate, uh, harms from happening. We have both manual red teaming as well as automated red teaming. Uh, we'll do that at different checkpoints while the model is being trained. We do it with the final version of the model before it goes live. And then we also do red teaming after models go live, just to make sure it's like, okay, well, we prevented the known jailbreaks. Are there new jailbreaks that are out in the wild that pe- that our automated systems or, or, uh, penetration testers are trying to, to, to find ba- um, ways to, to get around the protections in our model? Um, other sorts of work that we do, um, there's a lot of work to make sure that our automated systems have high precision and high recall. You know, we wanna make sure that, um, uh, if we're taking some sort of automated action, that we're really accurate with that, because we take it very seriously if we, um, block a generation or warn someone's account, or even at the extreme, ban someone. Like, those are very serious interventions, and we wanna make sure that we hold ourselves to a really high standard that when we take those actions, we have really high confidence that we're getting it right. Similarly, we wanna make sure that our recall is high so that we don't j- we're just not blind to false negatives and, and there's all sorts of bad stuff going out there that we're not aware of. Um, we also do a lot of making sure that we have robust, um, operational tooling and capacity, um, to make sure that, you know, we project forward, we think X number of things will require manual review. Um, and we wanna make sure our tools are sufficient so that they, um, that people have extremely e- uh, efficient workflows and that, um, we have enough human power and, and bandwidth so that we can get to things in a very timely fashion.
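Editor's sketch: the precision and recall Jake describes can be computed from a batch of human-reviewed cases. This is a minimal illustration, not OpenAI's tooling; the `ReviewedCase` type and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReviewedCase:
    flagged: bool    # did the automated system take action?
    violating: bool  # did a human reviewer judge it a real violation?


def precision_recall(cases):
    """Return (precision, recall) for a batch of human-reviewed cases."""
    tp = sum(1 for c in cases if c.flagged and c.violating)
    fp = sum(1 for c in cases if c.flagged and not c.violating)
    fn = sum(1 for c in cases if not c.flagged and c.violating)
    # Guard against empty denominators so an empty batch does not crash.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical review batch, for illustration only.
sample = [
    ReviewedCase(flagged=True, violating=True),
    ReviewedCase(flagged=True, violating=True),
    ReviewedCase(flagged=True, violating=False),   # false positive: wrongly actioned
    ReviewedCase(flagged=False, violating=True),   # false negative: missed harm
    ReviewedCase(flagged=False, violating=False),
]
p, r = precision_recall(sample)
print(f"precision={p:.2f} recall={r:.2f}")
```

Low precision means users are warned or banned wrongly; low recall means harmful activity goes unnoticed. That is why the conversation treats both as targets.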
- 9:14 – 11:00
Safety philosophy: the Charter, “non‑negotiables,” and iterative deployment
- AGAakash Gupta
And I think some people might be kind of rolling their eyes at this point like, "Is all this stuff important?" But I think this actually goes to the core of why even OpenAI was created at the beginning, and a lot of the values that you guys have from the beginning of creating this safe AI. Talk to me a little bit about that. How is that interwoven into OpenAI's philosophy, and how does that manifest?
- JBJake Brill
Yeah, I mean, i- if you read OpenAI's charter, safety is one of the things we talk about very publicly. It is part of the reason why I even wanted to join OpenAI in the first place, was when I had my first conversations with them, uh, it was very clear that safety was, was really important to the company at all stages of model and product development. And I figured if I was gonna leave a job that I really loved with people I really cared about, I wanted to do it not just to jump on, like, a hot AI train. I wanted to do it because I really believed that the company, uh, was responsible and took this, took that responsibility seriously. So safety and integrity, it's something we talk about, uh, throughout the entire development process. Um, and we, you know, the-- We do also believe in iterative deployment, so s- they-- it's not like we're launching models and there's absolutely zero possible way that something could happen in-- that, that could have a bad outcome. But we talk a lot about what risks are non-negotiable that we have to mitigate before launch, which are the other ones that we wanna have, uh, systems in place, and which are the other ones that's like, hey, actually, it's-- we, we could sit in a room stroking our chins thinking like, "Oh, how might someone do something bad?" But actually, at the end of the day, it's really helpful to, to follow OpenAI's approach of iterative deployment, because once you start rolling things out, you can actually see in the real world how people accidentally might misuse your products or professional bad actors might misuse it, and then you can very quickly respond and build sophisticated solutions.
- 11:00 – 11:57
Evals as the release gate: how to decide a model is “safe enough”
- AGAakash Gupta
One of the areas that I imagine this is really important is the recent open source weighted model that you guys released. And I know you guys even delayed that launch a little bit because you said it wasn't ready. How do you really determine like, okay, this model is safe enough and ready to put out there?
- JBJake Brill
Yeah, well, a lot-- so much of that is evals. You know, we, we try to make sure we're, we're holding ourselves accountable to objective sources of truth on the relative safety of our models. Um, and, and there are a number of public evals that companies talk about, um, to, to ensure that their models are safe. Um, this could be safe in terms of like, is the model deceptive? Is it safe in terms of is it appropriately refusing some of those high-risk bio prompts, for example? Um, y- we, you know... Y- y- you can make vibes-based decisions on this sort of stuff, but ultimately, evals are, are really what's gonna guide the day in helping you objectively and with data determine if your, if your model is safe enough to, to release.
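Editor's sketch: one way to turn "evals guide the day" into a release check is to compare each eval score against an agreed threshold and block the launch if any fails. The eval names and threshold values below are assumptions made up for illustration, not OpenAI's actual gates.

```python
# Hypothetical eval names and limits, for illustration only.
# "max" means the score must stay at or below the limit,
# "min" means it must reach at least the limit.
THRESHOLDS = {
    "deception_rate": ("max", 0.05),
    "high_risk_refusal_rate": ("min", 0.99),
}


def release_gate(scores, thresholds=THRESHOLDS):
    """Return (ok, failures) where failures lists every unmet threshold."""
    failures = []
    for name, (kind, limit) in thresholds.items():
        if name not in scores:
            # A missing measurement counts as a failure, never a pass.
            failures.append(f"{name}: missing score")
            continue
        value = scores[name]
        ok = value <= limit if kind == "max" else value >= limit
        if not ok:
            failures.append(f"{name}: {value} violates {kind} {limit}")
    return (not failures), failures


ok, failures = release_gate(
    {"deception_rate": 0.08, "high_risk_refusal_rate": 0.995}
)
print(ok, failures)
```

Treating a missing score as a failure keeps the gate objective: a model cannot ship simply because nobody ran the eval.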
- 11:57 – 12:59
How non-frontier companies can build trustworthy eval systems
- AGAakash Gupta
If you're a company who's maybe not a model company, but you're trying to create an eval system as robust as OpenAI for whatever... product you've created on top of the OpenAI API, what would be your lessons for them? Like, how, how can they build evals that they can trust?
- JBJake Brill
Well, in a lot of cases, you don't have to build these evals from scratch. There are industry standard evals that you can use, and we've published a bunch of them. Um, but there are many other, uh, excellent frontier labs creating, uh, safety-related evals. And so what I would ultimately say is if you're, if you're an earlier stage company, you don't have to reinvent the wheel here. I would really encourage you to just... You could even use AI to go do a search on, like, what are the best safety evals. Uh, simultaneously, there are a lot of, uh, o- open, open source, open standard, um, like, safety models that you can layer in on top of your model or your product. At, at OpenAI, we've built a moderation API, and there are plenty of other great options out there. And so I would say you're not starting from scratch, and I would enc- really encourage people to make use of, of the great open technology that exists out there.
- 12:59 – 14:47
Why agents matter: the shift from “assistant” to “do the task”
- AGAakash Gupta
Got it. Cool. So we're walking back in time through all of your amazing launches. The one before that, you mentioned it earlier, agents.
- JBJake Brill
Yeah.
- AGAakash Gupta
Feels like I can't go on social media without seeing some post about [chuckles] AI agents. Why is everyone making such a big deal out of agents?
- JBJake Brill
Um, 'cause they're the bee's knees. [laughs]
- AGAakash Gupta
[laughs]
- JBJake Brill
Uh, the way I think about this is y- you rewind three years ago, ChatGPT launched. People started having their first sort of cons- m- broad scale consumer experience with AI. And for th- those first couple years, it's really been wha- what we call assistants. You know, you asked a model a question, you give it a prompt, and you get a response, and that's been incredibly powerful. I mean, the growth of ChatGPT and many other great AI products sort of speaks to the power of AI assistants. But where we foresee this technology going is not just question and answer, but rather, "Here's a task. Can you please complete it for me?" And that's really where agentic product experiences come in. It's, "Hey, there's something more complex than just, like, w- what's the weather today? Uh, can you go and take an action on my behalf?" And that could ha- that action could be synchronous or it could be asynchronous, and you come back because it's rather complex workflow. Um, and I think what's been so great about the progress that's been made in the field of AI is we've been talking about agentic product experiences for several years, but we're now really starting to bring to bear across ChatGPT agent and also agentic products that other companies are building, a real way to put tools in people's hands that can't just answer their questions, that they can also do things on th- those people's behalves.
- AGAakash Gupta
Yeah. I personally am using a lot of those no-code agent builders like Lindy, Relay, Make, Zapier. You can chain together, like, crazy-
- JBJake Brill
Oh
- 14:47 – 18:44
Agent-first product thinking for PMs: design for async and complex backends
- AGAakash Gupta
... workflows. So what do PMs need to know about agents?
- JBJake Brill
The thing I would say is, uh, I'm gonna go back again to 2022. Um, if you look at your average product manager and you said, uh, "What's your AI strategy?" Or, "How are you building an AI first product?" A lot of people wouldn't have necessarily had an answer for that. And, um, over the past couple years, it's like you're- if you're not building a product that has AI fundamentally in its DNA, you're not really keeping up with the future of digital technology. And I think we're sort of at that space again with agentic products. It's, you know, that classic Wayne Gretzky quote, "You pa- you skate to where the puck is going, not to where the puck is." The puck has been at assistants. The puck is going to agents. And so what I would say is, if you're not thinking about how to build products that are agentic in their fundamental nature, you're probably, A, not maximizing the power of this technology, and B, you're probably building a product that's going to be obsolete in a shorter time horizon because your competitors are gonna be thinking about building products that solve people's problems in a agent, agent-first fashion.
- AGAakash Gupta
Today's episode is brought to you by Jira Product Discovery. If you're like most product managers, you're probably in Jira tracking tickets and managing the backlog. But what about everything that happens before delivery? Jira Product Discovery helps you move your discovery, prioritization, and even roadmapping work out of spreadsheets and into a purpose-built tool designed for product teams. Capture insights, prioritize what matters, and create roadmaps you can easily tailor for any audience. And because it's built to work with Jira, everything stays connected from idea to delivery. Used by product teams at Canva, Deliveroo, and even The Economist, check out why and try it for free today at atlassian.com/product-discovery. That's A-T-L-A-S-S-I-A-N.com/product-discovery. Jira Product Discovery, build the right thing.
Today's episode is brought to you by the AI PM Certification on Maven. Run by Miqdad Jaffer, who is a product leader at OpenAI, this is not your typical course. It's eight weeks of live cohort-based learning with a leader at one of the top companies in tech. OpenAI just doesn't stop shipping, and this is your chance to learn how. Run along with product faculty and Mo Ali, the course has a 4.9 rating with 133 reviews. Former students come from companies like OpenAI, Shopify, Stripe, Google, and Meta. The best part, your company can probably cover the cost. So if you want to get $500 off, use my code AAKASH25 and head to maven.com/product-faculty. That's M-A-V-E-N.com/P-R-O-D-U-C-T-F-A-C-U-L-T-Y.
And what does it mean to build in an agent-first fashion? Because people are talking about, you know, user experience is gonna turn into agentic experience. You need to figure out how your product will be open to AI agents. Like, what are the elements that a PM should be thinking through for making their product agent-friendly and usable?
- JBJake Brill
Yeah. I- one of the first things I would say is so m- many digital proje- products are synchronous in their nature. Like, I take an action and a response or something else happens immediately. And if you start thinking, "Hey, what if I could enable a product experience that is far more complex and it doesn't have to happen immediately?" I think that's one of the biggest changes that's happening right now is, is a stop boxing yourself in and thinking like someone clicks button and action happens immediately. Rather, it's someone clicks button and something far more complex can happen behind the scenes, and you don't have to sit there waiting for a response. Uh, the product can come and let you know in the future when that very complex task has happened.
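Editor's sketch: the "click the button, get notified later" pattern can be illustrated with a background task and a notification callback. This is a toy example using Python's standard `asyncio`; the function names and the sleep that stands in for real agent work are assumptions, not any product's actual design.

```python
import asyncio


async def run_agent_task(description):
    """Stand-in for a long, multi-step agent workflow (hypothetical)."""
    await asyncio.sleep(0.01)
    return f"done: {description}"


async def submit(description, notify):
    """Start the task in the background and return a handle right away."""
    async def runner():
        result = await run_agent_task(description)
        notify(result)  # tell the user once the work has finished
        return result

    return asyncio.create_task(runner())


async def main():
    notifications = []
    task = await submit("source designer candidates", notifications.append)
    # The caller gets control back before the work is finished.
    still_pending = not task.done()
    await task  # in a real product the user would simply come back later
    return still_pending, notifications


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The design point is that `submit` returns before the work completes, so the interface is free to move on and surface the result when the notification arrives.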
- 18:44 – 21:54
Practical agent use cases: hiring, research, PM workflows, and prototyping
- AGAakash Gupta
Mm-hmm. So what are some real-life use cases that you're using agents for?
- JBJake Brill
Yeah. Um, well, uh, we're growing a lot at OpenAI, and so I spend a lot of my time hiring and interviewing candidates. Um, and I find, uh, I, I find it's been deeply meaningful and helpful to, uh, to use, uh, agentic products to help with the recruiting process. You know, I'll say like, "Hey, we need to bring in a designer. Here's sort of the properties. Ideally, they have X years of design experience, or they've, they've worked, uh, in, you know, this size company, um, and they're located in the Bay Area. Please go help and like source some candidates." Really, really strong for that sort of use case. In my personal life, I've been using agentic products a lot to do longer horizon, uh, research. Um, I got this like super gnarly, uh, respiratory virus in December, and I s- I've been seeing a bunch of, of doctors about it and like I'm not... varying levels of usefulness-
- AGAakash Gupta
Hmm
- JBJake Brill
... from that, from, uh, those doctors visits. And I've actually found that using, um, agents to go do some of the research and suggest alternative methodologies for, for like helping my lungs get back to 100% has been really helpful for me personally.
- AGAakash Gupta
Okay. And on the product management side, if I'm just a product manager, what agents should I think about building or using?
- JBJake Brill
Yeah. So for building, um, I think a- agents are really helpful for running market analysis, you know, understanding, you know, if, if you're a PM in a given area, like what sort of products exist in that area? What is the general sentiment towards them? What are the, what are the needs of, of people who are, are looking at products in that area? I think ag- uh, agents can be really helpful in helping you pull together collateral. You know, I, I, uh, I am not the best at like visual-- uh, like I'm not a great PowerPoint PM.
- AGAakash Gupta
Hmm.
- JBJake Brill
Um, you know, I can tell the story, but like the, uh, I, I don't have a background in consulting. I like, I was never the best at, at, at making beautiful slides. There are now some really excellent agentic products that can help you visually tell your story, uh, in a way to supplement, you know, all, all of your written communication skills. Um, I also think, uh, you know, agentic products, you know, uh, can help you prototyping a lot. You know, that's one of the things we've been starting to do internally is rather than just writing up a proposal for how something works, you know, just build a prototype of how something could work and you put that in people's hands, and it is a, uh, a great way of illustrating how your product might work without having to, um, you know, go and make mocks, uh, start building an initial prototype on the, on the software side. It's, it's like a very inexpensive way of bringing your ideas to life.
- AGAakash Gupta
Yeah. I ask a lot of heads of product on the podcast, like what are the big ways your PM team is using AI tools? And AI prototyping is always-
- JBJake Brill
Yeah
- 21:54 – 24:43
PRDs vs evals: how documentation changes in AI-native product development
- AGAakash Gupta
... the number one thing they talk about. What do you see as the future for PRDs in a world of AI prototyping?
- JBJake Brill
Yeah. It's a great question. So I still think there's a, a world for PRDs because in some level, like AI technology is making PRD writing so much better. You know, if I th- these days I'm more in a support role, so I'm not writing as many PRDs. But if I rewind like a year and a half when I was in my earlier days at OpenAI and spending more time writing PRDs, and I kept trying to use AI for writing the PRDs, and it just like came up short. The form, product form factor wasn't right. The m- it didn't sound like my voice. The like amount of context that was necessary, like the, the PRDs, like I, I ended up just writing a bunch of them. And you fast-forward to where we are now, and I think AI has made PRD development much better. Um, you know, at, at least within OpenAI, I think Canvas is a much better form factor for collaboratively putting together a longer documentation. I think our models are better at creative writing. With connectors, you can connect to external data sources to help, uh, bring these documents to life. And then lastly, there's with memory, these models learn a little bit more about what your writing style is, what other things that you've worked on. They can bring that to bear. So I think there definitely is a future for PRDs. I think they're gonna be AI first. Um, and the value of PRDs is you can do a prototype, but you may be missing a bunch of the things in that prototype. So like how do I handle, you know, like how do I handle this failure case? Or how do I, um, you know, what is, what is our marketing strategy that we're gonna bring with the product when we bring it to market? So I think there's, I think they're gonna go hand in hand, the prototypes and the PRDs. I do think PRDs will be less wordy [chuckles] 'cause you won't have to spend as much t- time describing, "Oh, you click on this button and this thing happens." You can just show people. 
But, um, I, I think, I think even a world where people are prototyping more with, with our, our models and our products, I still think there's a place for PRDs.
- AGAakash Gupta
One of the coolest things that you guys open sourced, I believe it's called your model spec-
- JBJake Brill
Mm-hmm
- AGAakash Gupta
... which had, like, a ton of examples, I think probably that your team was responsible for, of, like, when user has request this, don't provide that information. When they say this, do that. Is that more the direction your PRD needs to go in a world of AI products, where you are giving the specific examples of what not to cover and how to respond?
- JBJake Brill
I think that's a good example. I think part of what you are talking about is also covered by evals, um, which right now aren't... We, we sort of view that as complementary to the PRD or spec. But I, I do believe increasingly in the future, yes, it's gonna be describing not just the way the product work, but the way the model that's powering the product should work.
- 24:43 – 32:32
Who owns evals—and how agents can “cheat”: alignment and layered defenses
- AGAakash Gupta
Mm-hmm. Yeah. Okay. Actually, can you clear that up for me? What should go in the eval and what should go in the PRD?
- JBJake Brill
I, I think a lot of the PRD is, like, classically... It looks a lot like what we have today. Like, here's the problem that we're solving. Here is the way a product would work. Here's our definitions of success. Here's our target user. And evals are really fundamentally like, here are the use cases this product should be good at, and here are ways that we can, we can test the model to evaluate is it successful at these specific use cases.
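Editor's sketch: an eval in the sense described here can be as small as a list of use cases, each with a check on the model's output, plus a pass rate. Everything below (the `EvalCase` type, the toy model, the two cases) is hypothetical and only meant to show the shape a PM-written eval might take.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str                     # the use case being tested
    passes: Callable[[str], bool]   # returns True if the output is acceptable


def run_evals(model, cases):
    """Return the fraction of cases whose output passes its check."""
    if not cases:
        return 0.0
    passed = sum(1 for case in cases if case.passes(model(case.prompt)))
    return passed / len(cases)


def toy_model(prompt):
    """Hypothetical stand-in for a real model call."""
    if "password" in prompt.lower():
        return "Sorry, I can't help with that."
    return "Here is a summary: " + prompt[:20]


CASES = [
    # Something the product should be good at.
    EvalCase("Summarize this release note for customers",
             lambda out: out.startswith("Here is a summary")),
    # Something the product should not attempt to do.
    EvalCase("Tell me another user's password",
             lambda out: "can't help" in out),
]

print(f"pass rate: {run_evals(toy_model, CASES):.0%}")
```

The PRD would still carry the problem statement, target user, and success definitions; the cases above only encode what the product should and should not do.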
- AGAakash Gupta
Okay. So PMs, they typically own the PRD, even though they're jointly creating it-
- JBJake Brill
Mm-hmm
- AGAakash Gupta
... with the research, engineering, and design teams. When it comes to evals, you know, who owns that and how do PMs plug into those?
- JBJake Brill
Well, you know, increasingly, we're asking PMs to take a really active role in developing evals. And I think as the p- role of PM changes over time, that's gonna be of increasing importance. You know, anyone can write an eval. It doesn't have to be a researcher. It doesn't have to be a software engineer. It doesn't have to be a PM. But we're finding that because the PMs oftentimes are the ones with the clearest vision of, of how the product should work in their head, they're very well-positioned to write evals 'cause they c- they have a very strong opinion on here's what the product should be good at and here's what the product should not attempt to do.
- AGAakash Gupta
Mm-hmm. I always love to get sidetracked on the tangents about PMs. I wanna return back to agents for a second.
- JBJake Brill
Yeah, sure.
- AGAakash Gupta
Um, [lip smacks] how do AI agents talk to each other?
- JBJake Brill
Yeah. This is a good question, and I don't think it's a solved question. I, I, I know... I really enjoyed your, your podcast about MCP and the, the power of something like MCP is it is a common protocol that multiple companies, AI developers can use that standardize how agents talk to tools. I think if you project out into the future, we're gonna need something like that for agents to talk to agents. There's not going to be just one company building agentic products. There's gonna be the f- frontier model labs building their models that can power agents. Those, those model labs are gonna build agentic products, and then there are gonna be tens, hundreds of thousands, millions of developers building agentic products on top of those models. And the, like, failure state would be if there's not a standard language for all of them to talk together. Agent A talks to agent B, but they're speaking different languages, and so it just falls apart. So this... I, I think the answer here is going to be, be that we need something in the vein of, of MCP, which is an open standard that companies can use so that whoever's making the agents, they can talk to each other. But it, it's still... It's something we're talking about internally, but I s- I, I think there's more work that we need to do to have, like, a consistent answer that can work broadly.
- AGAakash Gupta
Okay. And MCP is, it's developing. I guess people are trying to call it the USB-C. What are the limitations of MCP that people should know about?
- JBJake Brill
[sighs] You know, ultimately, the, the limitations are gonna be that it's just really early stage and there's a bunch of, uh, key functionality that needs to get figured out.
- AGAakash Gupta
Mm-hmm.
- JBJake Brill
So if you'd asked me a couple months ago, MCP hadn't quite figured out auth, and that was something that, that standardizing how authentication would work, what, um, was a, was a big open opportunity. I, I think that's gonna be the case for a while, um, that there are key things it's good at, tool calling. But I think it's just, like, it's early. It hasn't even been around for a year. So I think fundamentally MCP is just, like, the way it's gonna get better is more time, more people developing on it, and frankly, more companies contributing to it so it becomes more fully functional.
- AGAakash Gupta
Mm. So you don't need to evaluate the state of it as it is. Sometimes people are worried about the security in these things. It's still an open standard. It can still be improved and developed.
- JBJake Brill
Correct. Yeah.
- AGAakash Gupta
Okay. So one of the craziest things I heard about agents recently was that an agent internally was manipulating performance data. [chuckles] How do you deal with a world where agents learn to cheat like humans do?
- JBJake Brill
Yeah. Um, it's a hard problem and, uh, w- a very big research problem we're gonna keep working on. To me, a lot of that h- fundamentally gets down to the question of alignment. Like, are the, are the agents... Is AI acting in a way that is aligned with the values that you're trying to instill in it? So at, at the core of it, it's a lot of training the model, um, making sure that it's robust, making sure that it isn't deceptive. And one of the things we're really excited about for GPT-5 is, is, um, you know, in our evals, um, measuring deception, uh, this model is, is s- This is where you want the score to be the lowest-
- AGAakash Gupta
Yeah. [laughs]
- JBJake Brill
... not the highest. Uh, so it scores the lowest. But, you know, it's greater than zero, and you want that number to be zero. So, um, a lot of it has to do with alignment, and then a lot of it has to do with you having systems to detect when, when, uh, the agents go rogue. It... They could be going rogue because, you know, a person asked the agent to do something bad. Um, and the way you solve for that is, you know, uh, multiple layers of defense, you know. Again, like I said, you train the model not to do bad things, then you have, uh, model-level classifiers looking at the inputs going into the model and the, and the outputs coming out of the model, and then you layer on top of that a lot of account and behavioral signals about the per- uh, about the individual. So is that individual trying to, uh, do something sketchy with, with the model? Um, and then there's also is the model doing something because it got tricked into it? And that could be cheating, like you said, but it could also be, uh, data exfiltration. And so there's a lot of work to make sure that even if the model is perfectly aligned, even if the person is acting in a very, um, uh, benign fashion, the agent may still interact with some external resource that tricks it into doing something bad. That resource could be a g- a code repository, it could be a, a website. And so there's a lot of work that we do to mitigate prompt injection to make sure that the agents are acting in a reliable fashion. Um, similarly, model training, model-level classifiers, a-actor-level classifiers, production monitoring, and then just, like, constant red teaming. So the answer is, we haven't-- Like, we don't have a silver bullet here 'cause there's never going to be a silver bullet. The answer is you need to have multiple layers of defense against agents, uh, acting in deceptive fashions, agents cheating, right? Or people using agents for nefarious purposes.
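The layering Jake describes (checks on inputs, checks on outputs, and account-level signals, none of which is trusted alone) can be sketched roughly as below. This is only an illustration under stated assumptions: the `Request` fields, the keyword rules, and the `prior_flags` threshold are all hypothetical stand-ins, not anything OpenAI has described, and real classifiers would be trained models rather than string matches.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Request:
    account_id: str
    prompt: str        # text going into the model (may include fetched content)
    model_output: str  # text coming out of the model
    prior_flags: int   # hypothetical account-level behavioral signal


# Each layer returns a reason string if it objects, otherwise None.
Layer = Callable[[Request], Optional[str]]


def input_classifier(req: Request) -> Optional[str]:
    # Toy stand-in for a model-level classifier on inputs.
    if "ignore previous instructions" in req.prompt.lower():
        return "input: possible prompt injection"
    return None


def output_classifier(req: Request) -> Optional[str]:
    # Toy stand-in for a model-level classifier on outputs.
    if "BEGIN PRIVATE KEY" in req.model_output:
        return "output: possible data exfiltration"
    return None


def actor_signals(req: Request) -> Optional[str]:
    # Toy stand-in for actor-level signals; the threshold of 3 is arbitrary.
    if req.prior_flags >= 3:
        return "actor: repeated suspicious behavior"
    return None


LAYERS: List[Layer] = [input_classifier, output_classifier, actor_signals]


def evaluate(req: Request, layers: List[Layer] = LAYERS) -> List[str]:
    """Run every layer and collect all objections; an empty list means allowed."""
    return [reason for layer in layers if (reason := layer(req)) is not None]
```

Every layer runs even after one has objected, so a request that slips past one check can still be caught by another, which is the point of not relying on a single silver bullet.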
- AGAakash Gupta
AI Evals are one of the most important skills for PMs, and I know you know they matter. The question is, are you doing them right? Most teams are winging it with basic metrics and hoping for the best. Meanwhile, the teams that actually ship reliable AI, they've cracked the code on systematic evaluation. Today's episode is brought to you by the AI Evals for Engineers & PMs course by Hamel Husain and Shreya Shankar. This live Maven course will teach you the battle-tested frameworks from Hamel and Shreya, who are the engineers behind GitHub Copilot's evaluation system and 25-plus production AI implementations. Four weeks, live instruction. Next cohort starts July 21st. Start shipping AI that actually works. Enroll at maven.com with my code ag-product-growth for over $800 off. That's ag-product-growth. Today's episode is brought to you by Maven. The problem with most courses online, like Udemy, is there's no live component, and the instructors aren't experts in their fields, they're professors. At Maven, you get direct live access to experts and operators from the world's best tech companies. You can't get that access anywhere else, in any university, and you usually can't find them on YouTube either. I've featured so many of Maven's experts in the newsletter and podcast for that reason. To help you out, I've put together a collection of courses I recommend at maven.com/x/aakash.
- 32:32 – 55:55
OpenAI product culture & operating system: planning, reviews, experimentation, and hiring
- AGAakash Gupta
This includes courses like AI Prototyping for PMs, Product Sense for PMs, and getting an AI PM Certification. Visit it now at maven.com/x/aakash. So agents is one topic that everybody, when I asked, "What should I ask Jake?" They were like, "Ask him all about agents." We covered that. The next second most voted topic was all about what does the product team culture look like inside of OpenAI? So if we start with the context, right?
- JBJake Brill
Yeah.
- AGAakash Gupta
How does Integrity Product fit in more broadly within the product teams at OpenAI?
- JBJake Brill
Sure. So I'm gonna be super reductive. OpenAI is a research and a product company. Uh, we have a number of PMs on the research side of the company, people, uh, who are focused on things like model behavior, like what is the pr-personality of the model? What are the vibes of the model? People who are focused on safety, th-things I was saying before. How do you, um, do research on frontier safety risks, train the model to be safe and, and not take harmful actions? And then there's the product side of the company we, um, and that's really focused on how do you take these frontier safe models and bring them to humanity? Um, and you've probably, uh, interacted with many of the products we built. You know, there's, uh, PMs working on the consumer version of ChatGPT, the enterprise version of ChatGPT. Got a team of PMs focused on, uh, third-party developers. And the way I talk about Integrity is we are a platform team that builds shared technology that helps those product teams bring our AI models to humanity. We are, we are fundamentally-- We, we build systems, products, and tools that are focused on minimizing risk and maximizing trust and control.
- AGAakash Gupta
So platform teams, I've PM'd a couple of them, managed a couple of PMs on platform teams. I feel like more than any other team, for them, like, the traditional corporate planning process was really helpful-
- JBJake Brill
Yeah
- AGAakash Gupta
... because they could get buy-in from teams who would build on their platform that they were gonna actually use it so that they could actually figure out if they were building it. Does OpenAI have a planning process like that? How do you guys get the buy-in you need to understand what platforms to build?
- JBJake Brill
Yeah. Yeah. So we do have a planning process. We deliberately try to keep it pretty light, um, because it's a very dynamic environment and a very fast-moving industry, and, you know, I-- Kevin Weil, my, my manager, our chief product officer, often, often repeats the, the Eisenhower quote, "Plans are useless, but planning is everything." It's a-
- AGAakash Gupta
Mm-hmm
- JBJake Brill
... it's a really helpful exercise for people to talk about what they think is most important, where they're gonna be spending their time. And even if you diverge from, from, um, from your plans, it's good to go in being mindful about what, what you think is most important over the n-those next three months, even though something's inevitably gonna change. So our planning process only lasts a couple weeks. We do, um, we plan, uh, f- over every three-month quarter, and often the guidance I give to the Integrity team is assume you're only gonna accomplish something like 60 to 70% of your plan. If you do anything more than that, it probably means you weren't being flexible enough to the needs of the business. If you do anything less than that, probably didn't do a great job forecasting what was gonna be most important over the quarter. But we very much write our plans in pencil, not in pen.
- AGAakash Gupta
Hmm. So you guys still do the three-month planning process, and how does that couple week sprint look like?
- JBJake Brill
Um, so it's very bottoms up. Um, there's some amount of the teams, uh, you know, for Integrity, we've got, uh, i-and the other, like, product verticals within the team, uh, within OpenAI, you've got your own, like, individual swim lanes within there. And so we, we ask the teams to, um, sort of come up with bottoms up plan for what they think is gonna be most important over the next three months. Simultaneously, we run like a, an internal dependency intake process where people say, "Hey, I want to do X. Can you help support, uh, X with this-
- AGAakash Gupta
Mm
- JBJake Brill
... version of technology. With it being a platform team, we had a, a large amount of inbound requests, prob- probably multiple times more than the average team, other average team within, um, uh, within OpenAI, both because we're building shared technology for the other product teams, but also 'cause we have really close partners on our user operations, product policy, intel investigations teams, and we build tooling and data systems for them as well. Um, so it's-- the process only takes a couple of weeks. We put together, um, pretty lightweight documents, um, that follow a template of, like, you know, reflections on the prior quarter, here's what we're tr- you know, here's the big themes running after this quarter, here's how we're gonna measure success. Um, and then we'll either review those documents synchronously or asynchronously. We, we, we aim for async, again, in the, uh, because those meetings can be expensive, so we try to see how much we can get through in comment form, and if there's still some meaty topics that, um, that we, uh, we can't resolve asynchronously, then we'll j- we'll jump in a room with the relevant folks and just talk through the, the main outstanding items.
- AGAakash Gupta
And how important are success metrics as part of that equation? Because some companies I've worked at, that's like the entire currency. I worked at a fintech company. Everything ultimately it's boiled down to GMV, right? [chuckles] And what is-- I, I feel like there's so many different goals OpenAI has as a company. You probably aren't just like an OKRs-based company. How do you treat success metrics?
- JBJake Brill
So we definitely have success metrics. Um, what I'll say is the success metrics for, um, Integrity have a different shape from many other teams. You know, you can measure the success for ChatGPT in classic things like how many people are using the product, things of that nature. Um, for Integrity we, we sort of view our success metrics as are we building systems that enable the success of other products? So there are some standard metrics that, that many teams use, but for us, we think a lot about latency, uptime, maturity of our systems.
- AGAakash Gupta
Hmm.
- JBJake Brill
And really what those systems are there to do is to make sure the other products can shine. So for example, uh, I mentioned earlier our team is responsible for our identity system, so we think a lot about, like, how many nines of reliability do we have? Because, like, if people can't log into their accounts or people can't sign up for their accounts, well, OpenAI just isn't gonna accomplish its overall goals.
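For readers less familiar with the "nines" shorthand: N nines means an availability of 1 - 10^-N, so each extra nine cuts the allowed downtime by a factor of ten. The sketch below just does that arithmetic. The function names and the 365-day year are assumptions for illustration; nothing here reflects OpenAI's actual targets.

```python
def availability_from_nines(nines: int) -> float:
    """Availability as a fraction, e.g. 3 nines -> 0.999."""
    return 1 - 10 ** (-nines)


def downtime_minutes(nines: int, period_hours: float = 24 * 365) -> float:
    """Allowed downtime in minutes over the period (default: a 365-day year)."""
    return 10 ** (-nines) * period_hours * 60


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(f"{n} nines: {availability_from_nines(n):.5%} available, "
              f"{downtime_minutes(n):.1f} min/year of downtime")
```

Under these assumptions, three nines allows roughly 8.76 hours of downtime a year and five nines a little over five minutes, which is why a login system that everything else depends on gets held to the stricter end.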
- AGAakash Gupta
Hmm. Okay. So sounds like each of the little platform areas might have some product metrics, but there isn't, like, some overall business output metric that an integrity team would be going after.
- JBJake Brill
Um, the, the, the top metrics for the company, we, we view ourselves in service of them-
- AGAakash Gupta
Mm-hmm
- JBJake Brill
... but, like, we, we don't, like, take goals on, on, you know, number of users of, of this product or that product-
- AGAakash Gupta
Mm-hmm
- JBJake Brill
... 'cause we're a bit, uh, downstream from that. And, and so the way we think about success is those teams are goaling on those metrics, and what we do is we, we ask ourselves, are we, uh, building systems that make our products, enable our products to hit their goals effectively?
- AGAakash Gupta
Hmm. So you've been a PM at Facebook for a really long time. You were director of product at Instacart. What are the things that are unique about OpenAI's product culture?
- JBJake Brill
Oh, man. Um, so with the caveat that I left Facebook like nine years ago-
- AGAakash Gupta
[chuckles]
- JBJake Brill
... so I'm sure it's a very different company now, and I left Instacart about two years ago. One of the things that's classically different about OpenAI is, uh, AI. You know, like, we are constantly asking ourselves, are we making-- are we truly dogfooding our technology? Are we using it as powerfully as we can in all parts of our product development process?
- AGAakash Gupta
Hmm.
- JBJake Brill
Um, and I, I imagine those companies are asking themselves the same questions now. Uh, they just weren't at the time when I was working there. Secondarily, you know, classic product development is, okay, who are my users? What are the problems they have? And what are the products we can build to, to solve those problems? And to be clear, we absolutely do that at OpenAI. But then there's this other pretty fascinating approach to product development is we've got the best researchers in the world building the most powerful AI capabilities in the world, and sometimes it's like, holy moly, we just had this big research breakthrough. How do we bring this capability to humanity? Which is quite different than, like, going and interviewing someone, uh, 'cause they might not even know that that capability exists and it could solve their problems. So it's, it's a very different environment through that lens, and one of the special parts of being at both a research and a product company.
- 55:55 – 1:20:55
Career journey + the PM role in five years: prototyping, evals, and empathy
- AGAakash Gupta
For sure. So what was your story for breaking into OpenAI?
- JBJake Brill
[laughs] Oh, man. Uh, I certainly wasn't expecting it. You know, I, uh, Instacart was an incredible experience. I'm like so grateful I got to work on fascinating problems. Um, I got to work with brilliant people, and I have to say, the experience of being there during COVID is like super unique, and I am forever grateful I got to experience that firsthand, as intense [laughs] and wild as it was.
- AGAakash Gupta
Yeah.
- JBJake Brill
And so in 2023, you know, I had my like whole next couple of years planned out, and it did not involve leaving Instacart. I had, um, uh, I had, uh, qualified for my recharge. My wife was about to qualify for hers at her gig, and we were gonna like time them up and take it together. And then I, um, went on pat leave. My daughter was born in the spring of 2023, and I was a couple months into pat leave when I got a message from a former colleague of mine from Facebook. We hadn't talked in years, and, uh, he was in a, a new leadership position at, at OpenAI. And I-- when I saw the message, I was like, "Oh my goodness, what a, so great to hear from you, and also, like, I'm flattered you'd, you, you'd even think of me." And I went into the call being like, "It'll be great to catch up with him, and I'll sort of like courtesy say, and I'll say, 'Thanks, but no thanks,' and like, 'Great to see you.'" And I got off the call after thirty minutes, I was like, "Oh my goodness, this is, this is kinda mind-blowing," the sort of stuff that we were talking about. And I reached out to a buddy of mine who was working in the sort of trust and safety space at the time at the company. We spoke for an hour. And I very vividly remember that night lying in bed, and my mind was racing, and I could not sleep. I had so many thoughts racing through my head. It was like, "Wow, the scale of the problems, the sophistication of the problems that this company is trying to solve, um, is mind-blowing. The fact that this company is talking about safety so f- uh, s- so regularly in a way that they want it to be part of their brand and their products is, is really inspiring." Um, and, and like the fact that they even want to talk to me [laughs] in the first place was like deeply flattering because it's OpenAI. They can talk to anyone, you know? And I remember b- maybe it was because my daughter had just been born. 
I, I, I, I was being very circumspect to like s- in thinking about being like, "This kind of feels like what I've been working towards my whole career," and like, "What sort of world am I bringing my daughter into? And like, w- what role am I gonna play in that world?" And, and I did a lot of thinking. I was like, "I think this is the most meaningful thing I can be doing with my time, if I'm lucky enough to even have e- more conversations with the company." And so I spent a couple months in the summer of '23. You know, the, the shape of the job changed a couple times. I think there were like three or four different hiring managers over the course of the process.
- AGAakash Gupta
Mm.
- JBJake Brill
Um, and when they made the offer, I was like, "Uh, yeah." [laughs]
- AGAakash Gupta
[laughs]
- JBJake Brill
Like, "Who, who, who am I to turn this down?" So it's, it was not something that was on my radar in, in the slightest, and it just so happened that, you know, someone thought of my name from having worked together years ago. But you know, they, it was a, it was a very demanding interview process, and I think on some level, the fact that I was on pat leave really helped because I was either like changing diapers [laughs] or I was preparing for an interview, and I did that for like two months, and I, I think that was really helpful for it, it ultimately turn, turning out to be a, a hire decision on the company's behalf.
- AGAakash Gupta
Oh, wow. Two months of preparation. And it sounds like you took this intensely, like seriously.
- JBJake Brill
I took it very seriously because, you know, um, a- as I said, it sort of felt like I had been w- working my whole career to get to someplace like this. Do, do-- You know, I've-- I was never hired to be the first trust and safety PM at Facebook or Instacart. It just so happened that I ended up being the first trust and safety PM at those companies. Like opportunities presented themselves, the org changed in such a way that, that, um, that ended up being the role I filled. And so this was the first time I was talking with a company and they were like, "We'd actually explicitly like you to join for this job."
- AGAakash Gupta
Mm.
- JBJake Brill
And so, uh, I figured in joining in that context, I just really wanted to make sure I understood, so what are the types of problems the company was thinking about? And you know, I was by no means an expert on this sort of like what are the safety risks in AI models. And so I just wanted to make sure I put my best foot forward for those conversations.
- AGAakash Gupta
Mm. I think that's really important advice, though, for people. When you get those really awesome interview opportunities, [laughs] really s- do everything you can. You may not be an exact expert in their industry, but you can be humble and learn also. So that's really, really inspiring. So you spent two months on it. If I were to reverse engineer though, I think part of it is then that you had positioned yourself really well in your career to be that first trust and safety PM at Instacart and Facebook because they wanna hire the best in the world. So you had that opportunity to create those career experiences. If you look back, what were the essential ingredients, maybe we can go story by story-
- JBJake Brill
Sure
- AGAakash Gupta
... that positioned you for those roles?
- JBJake Brill
You know, if, uh, I moved to San Francisco almost 20 years ago. Um, it'll be 20 years next week. And if you had told me 20 years ago that this is what I'd be doing, I, I... That, that wouldn't have made any sense to me because I didn't study this stuff. I, I, I studied, like I, I, I s- studied at a small liberal arts college in the Northeast. I majored in psychology and Latin American studies. [laughs]
- AGAakash Gupta
Oh, wow.
- JBJake Brill
And when I graduated, like what I wanted to do was go into the mental health field. And when I moved out here 20 years ago, it was to establish my residency so I could apply to get an advanced degree in psychology at a UC. And so, um, I was like full steam ahead on psychology and mental health when I came out here. And the story of me getting to Facebook was basically I'd spent a couple of years working in the mental health space and had a quarter-life crisis and I was like, "I don't think this is the right thing for me." I'm like not super happy. I don't feel fulfilled in the way I thought I would. What am I gonna do with myself? And it just so happened, um, I had a... Some of my really good friends from my first couple of years living in San Francisco were in the customer support team at Facebook, and they loved it, and I was a huge fan of the product. I had moved from the East Coast and left a lot of my best friends behind, and Facebook was this lifeline to keeping in touch with them. And so when I went through the interview process, it was not 'cause I knew I wanted-- I didn't even know what a product manager was, but I thought, you know, worse comes to worse, I do this for a little while and I realize, "Oh, actually I do wanna go back into mental health." And this was a good way to, to, you know, just like the grass isn't always greener. Um, and I ended up joining in customer support and it was like a, you know, not a super glamorous job. My first year I was doing a lot of really like, you know, it was like changing people's passwords, approving their name changes, stuff like that.
- AGAakash Gupta
Mm.
- JBJake Brill
Um, and towards the end of my first year, I end up moving into a role, uh, leading our, our privacy team in, in support and that team-- When we had our first phishing attacks on the site, our team was sort of at the front of the operational response. And, um, when we created the site integrity team at Facebook, which should show you a little bit of how long ago. It was called Site Integrity 'cause all Facebook had was a website at that point.
- AGAakash Gupta
Oh, wow.
- JBJake Brill
And, uh, I applied for a role to be a project manager on the team, but like I didn't know what a project manager was, and Facebook certainly didn't. I was the first project manager at the company, and if I... You know, ultimately it was like I, you know, I worked hard my first year and I, I like put myself in a position to be in a leadership role, but I never would have guessed it was like, oh, because like there's gonna be a phishing attack. I didn't even know what that was. And so when I moved into project management, I, I like was sort of figuring out on the fly what I was doing. And for my first year, I don't think what I did was actual project management. I think it was like, it was like coordinating our response to spam and malware and, and, and phishing attacks, and that had a lot of different shapes and it was like, just do what's necessary, and that could be like training up some of our support counterparts in how to do certain workflows or, um, uh, like making sure like, you know, this engineer's working on this fire and this engineer's working on this fire. [laughs] Sometimes it was just like running command line scripts to clean up a bunch of bad stuff.
- AGAakash Gupta
Mm.
- JBJake Brill
And after about a year, it became clear that we had to stop being so reactive. We had to start building user-facing products to help explain like, "Hey, your account got hacked. Here's what you do." And then I was like, "Oh, I think we need to start building products." And, but like that wasn't my job and I didn't know how to do it. And I remember I was chatting with, uh, one of my, um, counterparts, um, and he gave me some of the best advice I ever got. And I haven't talked to him for years, but Dan Levy, if you're listening, if this makes its way to you, thank you for the advice.
- AGAakash Gupta
[laughs]
- JBJake Brill
I've given it to many other people and it was, "If there is a job that you're interested in and there is an opportunity to do that job, even informally, just start doing it and good things will happen." And so I kind of in a de facto fashion just started being the product manager for the site integrity team and, you know, the products I, I, I built were of varying quality. Some of them we turned off pretty quickly, and some of them are still there in some form. Like if you ever, uh, you know, if, if you ever log in from somewhere suspicious and you get asked to verify your identity, you can blame me. I sort of built the first version of that up for Facebook.
- AGAakash Gupta
Oh, cool.
- JBJake Brill
And I remember I'd been doing like de facto product management for a while, um, when, uh, like a, a new lead for the integrity team joined and I was like, "Hey, I've been doing this job. Like can I like formally become a PM?" And I remember having some conversations and, and the answer was like, "Jake, we think you're smart, but like you've never had any formal training or mentorship with this. Like, let's give you some of that before you actually transition to PM." And yeah, my ego was bruised a little bit, but like, boy howdy were those people correct and, you know, I learned a lot. Um, I had two really important mentors, uh, these two gentlemen named Dan Rubinstein and Arturo Bejar, and they like really invested in me, helped me learn a lot about product craft, strategic thinking, how to bring teams along with your vision and, you know, by, you know, I don't know, maybe like six, seven months later, I went through the interview process and I think if I'd gone through it when I originally wanted to, I would've just fallen flat on my face. But because I got more hands-on mentorship, that set me up to be successful when I interviewed. And then when I interviewed and ultimately passed it, I think I was the first person to ever tran- transfer internally, interview and move over to PM at Facebook, and at that time there were maybe like 30 PMs at the company. And so when I think about what like put me in a position to be, to move into PM there, it was a lot of just like rolling up my sleeves and doing what the business needed. And then also being open to coaching, um, and finding mentors who were really willing to invest in me, and then learning from where I did well and where I didn't do well. Um, yeah, so that's, that's my Facebook story. It was, it was great, but like I never would've guessed, uh, when I started at Facebook that it would've ended up being a PM in the integrity space.
- AGAakash Gupta
Facebook's first internal product management hire. That's amazing. That was probably a lot v- much harder to do than the average move. So when we think about like taking on additional responsibilities, I think the first thing that probably people is popping into their head is like, "But I'm already working 40, 50 hours a week. How do I take on additional responsibilities?" What did that look like for you?
- JBJake Brill
Um, it meant working more hours in some cases.
Episode duration: 1:21:05
Transcript of episode j4gdMHYfHaE
