Lenny's Podcast
Karina Nguyen: Why soft skills will be the new moat
Through the synthetic data, evals, and post-training behind Canvas and Tasks, creative skills, taste, and reasoning become the human edge as models level the coding playing field.
EVERY SPOKEN WORD
135 min read · 27,179 words
- 0:00 – 4:42
Introduction to Karina Nguyen
- LRLenny Rachitsky
Not only are you working at the cutting edge of AI and LLMs, you're actually building the cutting edge.
- KNKarina Nguyen
When I first came to Anthropic, I came in as an engineer. And then the reason why I switched to research is because I realized, oh my God, Claude is getting better at front ends. Claude is getting better at coding. I think Claude can develop new apps.
- LRLenny Rachitsky
What skills do you think will be most valuable going forward for product teams, in particular?
- KNKarina Nguyen
Creative thinking. You kind of want to generate a bunch of ideas, filter through them, and just build the best product experience. I think it's actually really, really hard to teach the model how to be aesthetic, or really good with visual design, or how to be extremely creative in the way it writes.
- LRLenny Rachitsky
What do you think people most misunderstand about how models are created?
- KNKarina Nguyen
When you taught the model some self-knowledge, like, "You actually don't have a physical body to operate in the physical world," the model would get extremely confused.
- LRLenny Rachitsky
(Intro music) Today, my guest is Karina Nguyen. Karina is an AI researcher at OpenAI, where she helped build Canvas, Tasks, the o1 chain-of-thought model, and more. Prior to OpenAI, she was at Anthropic, where she led work on post-training and evaluation for the Claude 3 models, built a document upload feature with a 100K context window, and so much more. She was also an engineer at The New York Times and a designer at Dropbox and at Square. It's very rare to get a glimpse into how someone working on the bleeding edge of AI and LLMs operates, and how they think about where things are heading. In our conversation, we talk about how teams at OpenAI operate and build products, what skills she thinks you should be building as AI gets smarter, how models are created, why synthetic data will allow models to keep getting smarter, and why she moved from engineering to research after realizing how good LLMs are gonna be at coding. If you enjoy this podcast, don't forget to subscribe and follow it in your favorite podcasting app or on YouTube. It's the best way to avoid missing future episodes, and it helps the podcast tremendously. With that, I bring you Karina Nguyen. This episode is brought to you by Enterpret. Enterpret unifies all your customer interactions, from Gong calls to Zendesk tickets to Twitter threads to App Store reviews, and makes them available for analysis. It's trusted by leading product orgs like Canva, Notion, Loom, Linear, monday.com, and Strava to bring the voice of the customer into the product development process, helping you build best-in-class products faster.
What makes Enterpret special is its ability to build and update customer-specific AI models that provide the most granular and accurate insights into your business, connect customer insights to revenue and operational data in your CRM or data warehouse to map the business impact of each customer need and prioritize confidently, and empower your entire team to easily take action on use cases like win-loss analysis, critical bug detection, and identifying drivers of churn with Enterpret's AI assistant, Wisdom. Looking to automate your feedback loops and prioritize your roadmap with confidence like Notion, Canva, and Linear? Visit E-N-T-E-R-P-R-E-T dot com slash LENNY to connect with the team and get two free months when you sign up for an annual plan. This is a limited-time offer. That's enterpret.com/lenny. This episode is also brought to you by Vanta, and I am very excited to have Christina Cacioppo, CEO and co-founder of Vanta, joining me for this very short conversation.
- CCChristina Cacioppo
Great to be here. Big fan of the podcast and the newsletter.
- LRLenny Rachitsky
Vanta is a longtime sponsor of the show but for some of our newer listeners, what does Vanta do and who is it for?
- CCChristina Cacioppo
Sure. So we started Vanta in 2018, focused on founders, helping them start to build out their security programs and get credit for all of that hard security work with compliance certifications like SOC 2 or ISO 27001. Today, we help over 9,000 companies, including some startup household names like Atlassian, Ramp, and LangChain, start and scale their security programs, and ultimately build trust by automating compliance, centralizing GRC, and accelerating security reviews.
- LRLenny Rachitsky
That is awesome. I know from experience that these things take a lot of time and a lot of resources, and nobody wants to spend time doing this.
- CCChristina Cacioppo
That is very much our experience, both before the company and to some extent during it. But the idea is with automation, with AI, with software, we are helping customers build trust with prospects and customers in an efficient way, and you know our joke, we started this compliance company so you don't have to.
- LRLenny Rachitsky
We appreciate you for doing that. And, you have a special discount for listeners, they can get $1,000 off Vanta at vanta.com/lenny. That's V-A-N-T-A dot com slash LENNY for $1,000 off Vanta. Thanks for that, Christina.
- CCChristina Cacioppo
Thank you!
- 4:42 – 8:21
Challenges in model training
- LRLenny Rachitsky
Karina, thank you so much for being here. Welcome to the podcast.
- KNKarina Nguyen
Thank you so much, Lenny, for inviting me.
- LRLenny Rachitsky
I'm very excited to have you here, because not only are you working at the cutting edge of AI and LLMs, you're actually building the cutting edge of AI and LLMs. You recently launched this feature, which is basically the first agent feature from OpenAI. I also just did this survey, I don't know if you know about this: I did a survey of my readers and asked them what tools they use every day in their work, and ChatGPT was number one, above Gmail, above Slack, above anything else. 90% of people said they use ChatGPT regularly.
- KNKarina Nguyen
(laughs) . That's crazy.
- LRLenny Rachitsky
It's- it's absurd and it wasn't around two years ago.
- KNKarina Nguyen
Yeah.
- LRLenny Rachitsky
Uh, also, we're recording this the week that OpenAI announced Stargate, which is this half-trillion-dollar investment in AI infrastructure. So there's just a lot happening constantly in AI, and you have a really unique glimpse into how things are working, where things are going, and how work gets done. So I have a lot of questions for you. I want to talk about how you operate and how you work at OpenAI, what skills are gonna matter more and less in the future, and also just where things are going broadly. So how does that sound?
- KNKarina Nguyen
Sounds great. Thank you so much. Yeah, I was extremely lucky to join Anthropic in the early days and learn a lot of things there, and I joined OpenAI around eight months ago. So yeah, I'm excited to dive more into it.
- LRLenny Rachitsky
Okay. I'm gonna definitely ask you about the differences between those-
- KNKarina Nguyen
Yeah.
- LRLenny Rachitsky
... but I wanna start more technical and just dive right in. I wanna talk about model training. People always hear about these big models being trained: how much data it takes, how long it takes, how much money it costs, how we're running out of data, which I wanna talk about. Let me just ask you this question: what do you think people most misunderstand about how models are created?
- KNKarina Nguyen
Model training is more an art than a science. We as model trainers think a lot about data quality. One of the most important things in model training is: how do you ensure the highest-quality data for a certain interaction or model behavior that you want to create? But the way you debug models is actually very similar to the way you debug software. One of the things I learned in the early days at Anthropic, especially with Claude 3 training, is that when you taught the model some self-knowledge, like, "Hey, you actually don't have a physical body to operate in the physical world," but at the same time had data that taught the model some function calls, like, "This is how you set the alarm," the model would get extremely confused about whether it can set an alarm when it doesn't have a body in the physical world. So the model gets confused, and sometimes it'll over-refuse. Sometimes it says, "I don't know," or, "Sorry, I cannot help you." So there is always a balanced trade-off: how do you make the model more helpful for users, while also not being harmful in other scenarios? It's always about how you make the model more robust so it operates across a variety of diverse scenarios.
- LRLenny Rachitsky
That is so funny. I never thought about that. Most of the data it's trained on kind of assumes it's a human describing the world and how they operate. It assumes there's a body and you can do things, and then the model's told, "You don't have a body."
- 8:21 – 12:38
Synthetic data and its importance
- KNKarina Nguyen
Yeah.
- LRLenny Rachitsky
Uh, okay. I wanna talk a little bit about data while we're on this topic. I know you have strong opinions here. There's this meme that models are gonna stop getting smarter because they're running out of data. They're trained in large part on the internet, and there's only one internet, and they've already been trained on it. What more can you show them about the world? And then there's this trend of synthetic data. What is synthetic data? Why do you think it's important? Do you think it's gonna work?
- KNKarina Nguyen
I think there are two questions here; we can unpack one at a time. People say they're hitting the data wall. I think people think in terms of pre-trained large models that are trained on the entire internet to predict the next token. But what the model is actually learning during that process is compression: the model learns to compress a lot of knowledge, and it learns how to model the world. With next-word prediction of something like "teach me how to drive a...", you only have a few words that can match: "car." So the model actually learns about the world in itself; it's modeling human behavior. And when you talk to pre-trained models, which are very, very large, they're actually extremely diverse and extremely creative, because you can talk to almost any Reddit user through a pre-trained model. But what's happening right now with the new paradigm of the o1 series is that the scaling in post-training itself is not hitting the wall, and that's because we went from the raw datasets of pre-training to an infinite amount of tasks that you can teach the model in the post-training world via reinforcement learning. Any task, for example: how to search the web, how to use the computer, how to write well. All sorts of tasks where you're trying to teach the model different skills. And that's why we're saying there's no data wall, because there will be an infinite amount of tasks, and that's how the model becomes extremely superintelligent. We're actually getting saturated on all benchmarks. So I think the bottleneck is actually in evaluations.
We don't have all the frontier evals. Like GPQA, which is Google-Proof Question Answering, a PhD-level intelligence benchmark: models are getting to more than the 60, 70% that a PhD gets. So we are literally hitting the wall on evals.
- LRLenny Rachitsky
I wanna follow up both those threads. So the first is on this idea of synthetic data. Is a simple way to understand it that the models are generating the data that future models are trained on, and you ask it to generate all these ways of doing stuff, all these tasks as you described? And then the newer model's trained on this data that the previous model generated?
- KNKarina Nguyen
Some tasks are synthetically curated. This is an active research area: how can you synthetically construct new tasks for the model to learn? Sometimes, when you develop products, you get a lot of data from the product and user feedback, and you can use that data too in this post-training world. Sometimes you still want to use human data, because some of the tasks can be really, really hard to teach. Like, only experts know certain knowledge about some chemicals, or biological knowledge, so you actually need to tap into expert knowledge a lot. So yeah, to me, synthetic data training is more for product work. It's rapid model iteration for similar product outcomes.
- 12:38 – 18:33
Creating Canvas
- KNKarina Nguyen
And we can dive more into it, but the way we made Canvas and Tasks and new product features for ChatGPT was mostly done by synthetic training.
- LRLenny Rachitsky
Let's actually get into that. That's really interesting. I wanna talk about evals, but let's follow that thread. So talk about how this helped you create Canvas.
- KNKarina Nguyen
So when I first came to OpenAI, I really had this idea: okay, it would be really cool for ChatGPT to actually change the visual interface, but also change the way it is with people. Going from being a chatbot to more of a collaborative agent and a collaborator is a step towards more agentic systems that ultimately become innovators. And the entire team of applied engineers, designers, product, and research formed almost out of nothing. It's just a collection of people who got together, and we rapidly started iterating with each other. Canvas was, I would say, the first project at OpenAI where researchers and applied engineers started working together from the very beginning of the product development cycle. There's a lot we learned along the way, but I definitely came in with the mindset that we need really rapid model iteration, so that it'll be much easier for engineers to work with the latest model possible, but also so we can learn, from user feedback or early internal dogfooding, how to improve the model very rapidly. And it's really hard to figure out, before you deploy a product, how people will use it. So the way you synthetically train the model is basically figuring out: what are the most core behaviors that you want this product feature to do? For Canvas, it came down to three main behaviors. The first was how to trigger Canvas for prompts like "Write me a long essay" or "Write me a piece of code," when the user intention is mostly iterating over a long document.
And when not to trigger Canvas for prompts like "Can you tell me more about president..." or other general questions. You don't want to trigger Canvas there, because the user intention is mostly getting an answer, not iterating over a long document. The second behavior is how we teach the model to update the document when the user asks. One of the behaviors we taught the model is to have some agency and autonomy: to literally go to the document, select specific sections, and either delete them or edit them, meaning highlight and rewrite certain sections. Sometimes the user would just say, "Change the second paragraph to be something friendlier," and we would have to teach the model to literally find the second paragraph in the document and change it to a friendly tone. So you teach both how to trigger an edit itself, and also how to get a higher-quality edit for that document. In the case of coding, for example, there's also the question of how good the model is at completely rewriting the document versus making very specific targeted edits. That's another layer of decision boundary within the edit itself: select the entire document and rewrite it completely, or make a very targeted custom change. When we first launched, we would bias the model towards more rewrites, because we saw the quality of the rewrites was much higher. But over time we kept shifting, based on user feedback and what we were learning from iterative deployment. Lastly, the third behavior that we taught the model synthetically is how to make comments on any document.
The way we did it is, we would use an o1 model to simulate a user conversation. Let's say, "Write me a document about XYZ." Then we used o1 to produce the document. And then we injected a user prompt like, "Make some comments, critique this piece of writing that you just made." And then we taught the model to make comments on very specific targeted parts of the document. There's also the question of what kind of comments you want the model to make. Do they make sense or not? How do you teach the quality of that? It all came down to measuring progress via very robust evals. But yeah, this is how you use o1 and synthetic data generation for this kind of training.
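As a rough illustration of the loop Karina describes (one model call simulates the user, another produces the document, a follow-up prompt asks for a self-critique), a sketch might look like the following. This is not OpenAI's actual pipeline: `call_model` is a stand-in for a real model endpoint, and the prompts and field names are assumptions.

```python
def call_model(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g. an o1-style model)."""
    return f"<model output for: {prompt.splitlines()[0][:40]}>"

def make_comment_example(topic: str) -> dict:
    """Build one synthetic training example for the 'comment on a document' behavior."""
    user_prompt = f"Write me a document about {topic}."
    document = call_model(user_prompt)  # simulated assistant turn producing the doc
    critique_prompt = (
        "Make some comments. Critique this piece of writing that you just made, "
        "anchoring each comment to a specific section.\n\n" + document
    )
    comments = call_model(critique_prompt)  # simulated comment-generation turn
    # The conversation plus the target comments becomes one post-training example.
    return {
        "conversation": [user_prompt, document, critique_prompt],
        "target": comments,
    }

example = make_comment_example("XYZ")
```

Sampling many such examples, then filtering them with evals for comment quality, is the general shape of the synthetic curation she outlines.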
- LRLenny Rachitsky
Okay, this is so interesting. So you talk about this idea of teaching the model, and you mention using synthetic data to teach the model different behaviors. Is a simple way to think about it that you do that by showing it what success looks like, using basically evals? Like, "Here's what doing this successfully would look like," and that teaches it, "Okay, I see, this is what I should do"?
- KNKarina Nguyen
Yeah, I agree. Yeah. Amazing. Yeah, you got it. (laughs)
- LRLenny Rachitsky
Okay. Got it.
- 18:33 – 20:28
Day-to-day operations at OpenAI
- LRLenny Rachitsky
I wanna start unpacking what your day-to-day looks like as you're building these sorts of things. Is it like you sitting there talking to some version of ChatGPT, crafting these evals?
- KNKarina Nguyen
Sometimes I do that. Sometimes I do sit with ChatGPT. (laughs) Actually, I learned this at Anthropic: people spend so much time just prompting models, doing qualitative bashing of the model all the time, and you actually get a lot of new ideas for how to make the model better. It's like, "Oh, this response is kind of weird. Why is it doing this?" And you start debugging, or you start figuring out new methods for how to teach the model to respond in a different way, to have a better personality, let's say. It's the same thing as how personality is made in the models; it's very similar methods. But yes, my time at OpenAI has changed. When I first came, it was mostly research IC work. I was writing code, chaining models, writing evals, working with PMs and designers to teach them how to even think about evaluations. That was a really cool experience, and I think it was an adaptation of how we do product management of AI features, or AI models. Now it's mostly management and mentorship. I'm still doing IC research code after 4:00 PM, but yeah, it's changed.
- LRLenny Rachitsky
All right. Don't talk too much about being a manager-
- KNKarina Nguyen
(laughs) Okay.
- LRLenny Rachitsky
... because everyone's firing their managers. Who needs managers anymore? That's what I hear now. Just kidding.
- 20:28 – 23:22
Writing evaluations
- LRLenny Rachitsky
It's interesting that so much of your time was spent teaching product teams how evals integrate and how important that is. I've heard this a few times, and I haven't personally experienced it yet, so I think it's an important thread to follow: writing these evaluations is going to become an increasingly important part of the job of product teams, especially when they're building AI features and working with LLMs. So can you talk a bit more about what that looks like? Is it, like, sitting there with an Excel spreadsheet, basically showing: here's the input, here's the output, here's how good the result was? Talk about what that actually looks like, very practically.
- KNKarina Nguyen
It certainly depends on what you're developing, but there are various types of evaluations. Sometimes I ask product managers (or model designers, which is a new role we have) to go through some of the user feedback, or think of various user conversations that should have triggered the feature. Like, under these circumstances, it should trigger Canvas. Then you have this ground-truth label: with this conversation, it should trigger Canvas; with that conversation, it should not. And you have a very binary, deterministic kind of eval for decisions about behaviors. When we were launching Tasks, for example, making correct schedules was actually really hard for the model. But we built out some deterministic evaluations: if the user says "7:00 PM," the model should say "7:00 PM." So you can have deterministic evals that are pass or fail. And the way it works is that sometimes I ask product managers to just go create a Google Sheet with different tabs: what's the current behavior, what's the ideal behavior, and why, or some notes. Sometimes we use it for evals; sometimes we use it for training, because if you give the spreadsheet to an o1 model, it can probably figure out how to teach itself the good behavior. (laughs) And there's a second type of eval that's prevalent: human evaluations. You can have specific trainers, or internal people, look at a conversation prompt and various completions from models, and choose a win rate.
Which model is the best? Which model produced the highest-quality comment or edit? Then you can have continuous win rates, and as you develop new models, they should always win over the previous models. So it depends on what you want to measure.
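The deterministic pass/fail evals described here can be sketched very simply: each case pairs an input with a ground-truth answer, and the check is exact. The toy "model" below is a stand-in for a real model call, and the case layout is an assumption for illustration only.

```python
def run_deterministic_eval(cases, model_fn):
    """Return the fraction of (prompt, expected) cases where model_fn is exactly right."""
    passed = sum(1 for prompt, expected in cases if model_fn(prompt) == expected)
    return passed / len(cases)

def toy_schedule_model(prompt: str):
    """Stand-in 'model': extracts the first clock time it sees in the prompt."""
    for token in prompt.split():
        if ":" in token and token[0].isdigit():
            return token
    return None  # nothing to schedule

# Ground-truth labels, in the spirit of "if the user says 7:00, schedule 7:00".
cases = [
    ("Remind me at 7:00 to call mom", "7:00"),
    ("Schedule lunch for 12:30 tomorrow", "12:30"),
    ("Tell me about Stargate", None),  # general question: no time to extract
]
pass_rate = run_deterministic_eval(cases, toy_schedule_model)
```

The human win-rate evals she mentions are the complementary case: no exact answer exists, so raters compare completions from two models instead of checking against a label.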
- 23:22 – 26:57
Prototyping and product development
- LRLenny Rachitsky
So interesting. Basically what I'm hearing, and this is something I'm learning about as I talk to people, is that product development might move from, "Here's a spec, a PRD, let's build it together. And then cool, let's review it. Are we happy with this?" to, "Hey, AI, build this thing for me, and here's what correct looks like," and I'm spending all my time on what correct looks like, on evals essentially.
- KNKarina Nguyen
You definitely want to measure the progress of your model, and this is where evals come in, because you can have a prompted model as a baseline already. The most robust evals are the ones where prompted baselines get the lowest score, because then you know: if you trained a good model, it should hill-climb on that eval all the time while not regressing on other intelligence evals. That's what I mean when I say it's more of an art than a science. If you optimize the model for this behavior, you don't want to damage other areas of intelligence. And this is happening all the time in every lab, on every research team. I would say prompting is also a way to prototype new product ideas. In the early days at Anthropic, when I was working on the file uploads feature, I remember I was just prompting the model. When we were launching the 100K context window, I was prototyping this in the local browser, and I did the demo. People really, really loved it, and they just wanted an API for file uploads or something. And that's when it clicked for me; I also wrote a blog post about this a long time ago. It clicked for me that prompting is a new way of product development, of prototyping, for designers and product managers. For example, one of the features I wanted to do was personalized recommended starter prompts. Whenever you come to Claude, it should recommend starter prompts based on what your interests are. And you can literally use prompting to experiment with that.
And another feature was generating titles for the conversations. It's a very small micro-experience that I'm really proud of. The way we did that was we took the user's five latest conversations and asked the model, "What's the style of the user?" And then, for the next new conversation, the generated title would be in the same style as the user. So it's really little micro-experiences like this that might happen.
- LRLenny Rachitsky
That's so cool. Did you do that at Anthropic or at OpenAI?
- KNKarina Nguyen
At Anthropic.
- LRLenny Rachitsky
Okay, cool. I love the file upload feature that Claude has, by the way. ChatGPT doesn't have that yet, is that right?
- KNKarina Nguyen
I think it has. I think the way-
- LRLenny Rachitsky
Somewhere in the middle?
- KNKarina Nguyen
... it's implemented is very different, though.
- LRLenny Rachitsky
Maybe it's the PDF feature, 'cause I use it all the time with Claude. Okay, that's cool. Someone needs to get on that. Man, it's wild how many features you've built that I use every day and that many people use every day. This prototyping point you made is really important. It's something that comes up a ton on this podcast also: how that is maybe the way AI has most impacted the job of product builders recently, just prototyping. Instead of showing just, "Here's a PRD. Here's a design," PMs more and more are just, "Here's a prototype of the idea that I have, and it's working. You can play with it."
- KNKarina Nguyen
Yeah. Yeah. Okay.
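The conversation-title micro-experience described above (summarize the style of the user's five latest titles, then reuse it) could be sketched roughly as below. `call_model` is a placeholder for a real LLM call, and both prompts are guesses at the shape, not the production prompts.

```python
def call_model(prompt: str) -> str:
    """Placeholder for a real LLM call."""
    return f"<model output for: {prompt[:30]}>"

def title_new_conversation(recent_titles: list[str], first_message: str) -> str:
    """Generate a title for a new conversation in the user's existing style."""
    style = call_model(
        "In one sentence, describe the naming style of these conversation "
        "titles:\n" + "\n".join(recent_titles[-5:])  # five most recent only
    )
    return call_model(
        "Generate a title for this conversation, matching this style: "
        f"{style}\n\nConversation opener: {first_message}"
    )

title = title_new_conversation(
    ["Debugging evals", "Canvas launch notes", "o1 prompt ideas"],
    "Help me plan a research roadmap",
)
```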
- 26:57 – 33:34
Building Canvas and Tasks
- LRLenny Rachitsky
I want to spend a little more time on how you operate. You talked about how you built and launched this Tasks feature. Is that the way you'd describe Tasks?
- KNKarina Nguyen
Yeah.
- LRLenny Rachitsky
So talk about how that emerged, and let's better understand how you collaborate with product teams and how OpenAI works in that way, whatever you can share there.
- KNKarina Nguyen
I think Canvas and Tasks go into the bucket of (?), where it's more short or medium term. And actually, the way Canvas and Tasks came about was that each started as one person prototyping and creating a spec. It's kind of like a PRD: creating a spec of the behavior of the model. I don't think Tasks is an extremely groundbreaking feature necessarily. What makes it really cool is that the models are so general: the model can now search, it can write sci-fi stories, it can search for stocks, it can summarize the news for you every day. Because the models are so general, you give people something familiar: notifications are very familiar, reminders are very familiar. So you create a familiar form factor for people. Same with Canvas, right? Google Docs is very familiar, but then you add a magical AI moment and it becomes very powerful. But operationally, it usually starts as a prototype, literally a prompted prototype of how you would want the model to behave.
For Tasks, for example, you need a little bit of design-systems thinking: okay, if the user says, "Remind me to go to lunch at 8:00 AM tomorrow," what kind of information does the model need to extract from that prompt in order to create a reminder? And this is how you design a spec for a new feature, like a tool. Canvas and Tasks are all tools. So, how do you create the tool spec? Then it's mostly developing a JSON schema: okay, from this prompt, maybe the model should extract the time that the user requested. Then you think about which format you want the time to be in, and how you want the model to notify you. Basically, the user gives an instruction to the model, and this instruction fires off every day, or whenever, at that particular time. For example, if you say, "Every day I want to know about the latest AI news," the model should rewrite that into, "Okay, search for the latest AI news," and this task will get fired at the particular time the user requested. So your design is the tool spec. And then, actually, sometimes it happens through conversations: people ask me to join an existing team, like, "Oh my God, we need researchers," or (laughs), "We need some support. We need to train the models." With Canvas, I just pitched the idea, and it got staffed quite immediately during the break. So it depends on the project.
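To make the tool-spec idea concrete, here is an illustrative example of the kind of JSON schema such a spec might define for Tasks: which fields the model should extract from "Remind me to go to lunch at 8:00 AM tomorrow." The field names and structure are assumptions for illustration, not OpenAI's actual schema.

```python
import json

# Hypothetical tool spec: what the model must extract to create a task.
TASK_TOOL_SPEC = {
    "name": "create_task",
    "description": "Schedule a reminder or a recurring instruction for the user.",
    "parameters": {
        "type": "object",
        "properties": {
            "instruction": {
                "type": "string",
                "description": "Rewritten imperative to run when the task fires, "
                               "e.g. 'Search for the latest AI news.'",
            },
            "time": {
                "type": "string",
                "description": "When to fire, e.g. ISO 8601 '2025-01-25T08:00'.",
            },
            "recurring": {"type": "boolean"},
        },
        "required": ["instruction", "time"],
    },
}

# A tool call the model might emit for "Every day I want the latest AI news at 8 AM":
example_call = json.dumps({
    "instruction": "Search for the latest AI news.",
    "time": "2025-01-25T08:00",
    "recurring": True,
})
```

The design choice she describes (rewriting the user's request into an imperative instruction plus a structured time) is what makes the task replayable by a scheduler later.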
And usually with staffing it's mostly a product manager, a model designer, an actual product designer, a couple of researchers, and a bunch of applied engineers, depending on the complexity of the project. For Tasks, it took, I don't know, two months or so to go from zero to one, basically. Oh, wow. For Canvas it was four or five months, I guess, to go from zero to one. And then you teach product managers how to build evals, and you think about how we not only ship the better feature but also think more long term. What kind of cool features do you want Tasks to have? I think it would be nice for Tasks to be a little more personalized. It'd be nice to be able to create tasks via voice, and on mobile, right? This is how you get a research roadmap: thinking about how the feature will be developed in the future. And then from there, you start creating datasets and evals. You want to make sure that goes well, and then you need to make a tradeoff between which methods you want to use. The reason I really love relying purely on synthetic data instead of collecting data from humans is that it's much more scalable and it's cheap. You literally sample from the model, you teach the core behaviors of the model, and that will generalize to all sorts of diverse coverage. And when you launch the beta feature, you learn so much from the users that all your synthetic sets can be shifted toward the distribution of how users actually behave in the product, and this is how you improve. 
Uh, and this is what happened with Canvas too when we launched-
- LRLenny Rachitsky
Okay.
- KNKarina Nguyen
... from beta to GA.
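The loop she describes, generating synthetic training data cheaply and then shifting it toward observed user behavior after a beta launch, can be sketched as a toy illustration. The task categories and the "observed" numbers here are invented, and a real pipeline would sample actual prompts from a model rather than category labels:

```python
import random
from collections import Counter

random.seed(0)

# Invented task categories for a Tasks-style feature.
CATEGORIES = ["reminder", "recurring_search", "daily_summary"]

def sample_synthetic(n: int, weights: list[int]) -> list[str]:
    """Sample n synthetic training examples according to category weights."""
    return random.choices(CATEGORIES, weights=weights, k=n)

# Before launch: a uniform guess about what users will want.
pre_beta = sample_synthetic(1000, weights=[1, 1, 1])

# After beta: logs (invented numbers) show users mostly set up recurring searches.
observed = Counter({"recurring_search": 70, "reminder": 20, "daily_summary": 10})

# Shift the synthetic set toward the observed usage distribution.
post_beta = sample_synthetic(1000, weights=[observed[c] for c in CATEGORIES])

print(Counter(post_beta).most_common(1)[0][0])  # recurring_search
```

The point of the sketch is the reweighting step: the synthetic distribution starts as a guess and gets pulled toward real product usage, which is the beta-to-GA improvement she mentions.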
- LRLenny Rachitsky
This episode is brought to you by Loom. Loom lets you record your screen, your camera, and your voice to share video messages easily. Record a Loom and send it out with just a link to gather feedback, add context, or share an update. So, now you can delete that novel-length email that you were writing. Instead, you can record your screen and share your message faster. Loom can help you have fewer meetings and make the meetings that you do have much more productive. Meetings start with everyone on the same page and end early. Problem solved, time saved. We know that everyone isn't a one-take wonder when it comes to recording videos, so Loom comes with easy editing and AI features to help you record once and get back to the work that counts. Save time, align your team, stay connected, and get more done with Loom. Now part of Atlassian, the makers of Jira. Try Loom for free today at loom.com/lenny. That's L-O-O-M dot-com/lenny.
- 33:34 – 35:36
Understanding the job of a researcher
- LRLenny Rachitsky
Something that I wanna help people understand, and I don't even 100% understand this myself: what's the simplest way to understand the job of a researcher versus, say, a model designer and the other folks involved? What's the simplest way to understand what researchers do at OpenAI?
- KNKarina Nguyen
So the projects that I described are mostly product-oriented; the research is mostly product research. Another component of my team is longer-term exploratory projects, which are more about developing new methods and understanding those methods under a variety of circumstances. To develop new methods, you need to follow a very similar recipe of building evals, but much more sophisticated evals. You want out-of-distribution evals; if you want to measure generalization, you need to capture that. It's basically more science-y. If we talk about synthetic data, one of the hardest things about synthetic data is: how do you make it more diverse? Diversity in synthetic data is one of the most important questions right now. So exploring ways to inject diversity, as a general method that will work everywhere, is one of the research explorations. Others are more about developing new capabilities. It's always about working on a new method until you have signs of life that it's working, and then either you think about how to make it more general, or you think about how to make it very useful. And this is how longer-term projects become medium- and short-term projects.
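One way to see why diversity in synthetic data is a measurable research question, not just a vibe, is a crude metric like the distinct-n ratio (unique n-grams divided by total n-grams). This is only an illustration I'm adding, not a method attributed to her team:

```python
def distinct_n(texts: list[str], n: int = 2) -> float:
    """Unique n-grams / total n-grams across a corpus; higher = more diverse."""
    grams = []
    for t in texts:
        toks = t.lower().split()
        grams.extend(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return len(set(grams)) / len(grams) if grams else 0.0

# A duplicate-heavy synthetic set vs. a varied one.
dup_heavy = ["write a poem about the sea"] * 5
varied = [
    "write a poem about the sea",
    "draft a limerick on volcanoes",
    "compose a haiku about rust",
    "summarize today's AI news",
    "explain quantum tunneling simply",
]

print(distinct_n(dup_heavy))  # 0.2
print(distinct_n(varied))     # 1.0
```

A real diversity method would go far beyond n-grams (semantic dedup, coverage over a taxonomy, embedding dispersion), but even this toy score separates a collapsed synthetic set from a varied one.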
- LRLenny Rachitsky
That makes sense.
- KNKarina Nguyen
Mm-hmm.
- LRLenny Rachitsky
Essentially, working on developing ways to make the model smarter, o4, o5, o6-
- KNKarina Nguyen
Yeah.
- LRLenny Rachitsky
... new ways to, uh... Like, o1 was a big breakthrough, right? The way it-
- KNKarina Nguyen
Yeah.
- LRLenny Rachitsky
... operates where it's not just, "Here's your answer." It actually thinks and has-
- KNKarina Nguyen
Right.
- LRLenny Rachitsky
... takes time to think through the process of coming up with an answer. Okay.
- KNKarina Nguyen
Yeah.
- LRLenny Rachitsky
Very helpful. Speaking of that,
- 35:36 – 42:15
The future of AI and its impact on work and education
- LRLenny Rachitsky
I was thinking about the future, where things are going. I wanna spend some time on this insight that you are basically building the cutting edge of AI, at the very bleeding edge of where AI is going and where it is. So I'm very curious to hear your take on how you think the world, and how people work, is going to change, based on where you see things going. I know it's a broad question, but let's say in the next three years: how do you see the world changing? How do you see people's ways of working changing?
- KNKarina Nguyen
It's a very humbling experience to have been in both labs, I guess. When I first came to Anthropic, I was like, "Oh no, I really love frontend engineering." And the reason I switched to research is that I realized at that time: oh my god, Claude is getting better at frontends. Claude is getting better at coding. I think Claude could develop new apps, and it could develop new features for the thing I'm working on. So it was this meta realization: oh my god, the world is actually changing. And when we first launched 100K context, obviously I was thinking about form factors: file uploads were very natural, very familiar to people. You couldn't make infinite chats in the Claude.ai app, right? But within 100K context, because file uploads is form follows function, the form factor of file uploads enabled people to literally upload anything: books, reports, financials, and then ask the model to do any task. And I remember enterprise customers, like financial customers, were really interested in that. It's actually one of the very common tasks people do in that setting. It was kind of crazy to see how some of the redundant tasks are getting automated by these smart models. And we're entering an era where I sometimes don't know, for example, whether o1 gives me the correct answer or not, because I'm not an expert in that field. 
And I don't even know how to verify the outputs of the models, because only experts know and can verify them. So basically there are trends going on. The first trend is that the cost of reasoning and intelligence is drastically going down. I had a blog post about this. Maybe I should update it with the latest benchmarks, because at that time everybody was doing MMLU, one benchmark, and then we quickly saturated the benchmarks, and now we need to do the same plot with another frontier eval. But the cost of intelligence is going down because it becomes much cheaper. Smart, small models are becoming even smarter than large models, and that's because of distillation research. This happened with Claude 3 Haiku. I was, like, working on, like,
- LRLenny Rachitsky
(sighs)
- KNKarina Nguyen
... post-training for Claude 3 Haiku, and I realized it was much smarter than Claude 2, which was way bigger, something like that. The power is that small models become very intelligent, fast, and cheap. We are moving down that road. That has multiple implications, but it means people will have more access to AI, and that's really good. Builders and developers will have much better access to AI, but it also means all the work that has been bottlenecked by intelligence will be unblocked. I'm thinking about healthcare, right? Instead of going to a doctor, I can give ChatGPT a list of symptoms and ask whether I have a cold, the flu, or something else. I can literally get access to almost a doctor. And there have been some research studies around that.
- LRLenny Rachitsky
Yeah. There was a New York Times story about that, where they compared doctors, to doctors using ChatGPT, to just ChatGPT.
- KNKarina Nguyen
Yeah. (laughs)
- LRLenny Rachitsky
And just, just ChatGPT was the best-
- KNKarina Nguyen
Yeah. (laughs)
- LRLenny Rachitsky
... of them all. Like, Chat... like, doctors made it worse.
- KNKarina Nguyen
That's crazy. Yeah. (laughs) That's crazy, right? (laughs) Education: I would have dreamt of having a tool like ChatGPT when I was young; I could have learned so much. People can now learn almost anything from these models. They can learn a new language, they can learn how to build new apps, anything they want. And it's humbling to have launched Canvas and brought that to people, enabling them to do something they couldn't ever do before. There's something magical around that experience. So education will have massive implications. Same with scientific research, right? The dream of any AI researcher is to automate AI research. It's kind of scary, I would say, which makes me think that people management will stay. Emotional intelligence is one of the hardest things for the models, and creativity in itself is one of the hardest things. So writers, I don't think people should be worried as much. I think it'll alleviate a lot of redundant tasks for people.
- LRLenny Rachitsky
This is awesome. Okay, I wanna follow this thread for sure. And it's funny, what you described is: you were an engineer at Anthropic and you're like, "Okay, Claude is gonna be very good at engineering. This potentially isn't gonna be a long-term career, so I'm gonna move into research. And AI is gonna need me for a long time to build it, (laughs) to make it smarter."
- KNKarina Nguyen
I would say we still have... I think the Canvas team still has really cool front-end engineers, people who really care about interaction design, the interaction experience. I don't think models are there yet, but I think we can get the models to that top 1% of front-end for sure.
- 42:15 – 47:50
Soft skills in the age of AI
- KNKarina Nguyen
- LRLenny Rachitsky
So what I wanna move on to next along these lines is just, uh... and this is just speculation, but, uh, what skills do you think will be most valuable going forward for product teams in particular? So if folks are listening and they're like, "Okay, this is s- scary. What should I be building now to help me stay ahead and not be in trouble down the road?" What skills do you think are gonna be most... more and more important to build?
- KNKarina Nguyen
Yeah. I think creative thinking: you want to generate a bunch of ideas, filter through them, and just build the best product experience. And listening: you want to build something where the most general model will not replace you. Often you build something and make it really, really good for a specific set of users, and actually the moat is now in your user feedback. The moat is in whether you listen to them, whether you can rapidly iterate. The moat is in there. There's an abundance of ideas which can work out, so I wouldn't be worried. In fact, I wish people in AI fields were a little bit more creative, connecting the dots across different fields, to develop really cool new generations and new paradigms of interaction with this AI. I don't think we've cracked that problem at all. A couple of years ago, I was telling some people: you kind of want to build for the future. It doesn't necessarily matter whether the model is good or not good right now; you can build product ideas such that by the time the models are really good, it will work really well. And I think that just happened naturally. For example, at Anthropic, Claude Artifacts, and I feel like the early ideas of Canvas go back to 2022, before ChatGPT, like a writing IDE. But the Claude 1.3 model itself wasn't there yet to make really high-quality edits, for example in coding. 
And I see startups like Cursor doing super well. That's because they iterate so fast, they invent new ways of training models, they move really fast, they listen to users, and they have massive distribution. It's kind of cool.
- LRLenny Rachitsky
That's really helpful, actually. So what I'm hearing is that soft skills, essentially, are gonna be more and more important and powerful. You just talked about management-
- KNKarina Nguyen
Mm-hmm.
- LRLenny Rachitsky
... leading people, being creative and coming up with innovative insights, listening. There's a post I wrote that I'll link to, where I try to analyze how AI will impact product management. We're actually very aligned; my sense was the same, that soft skills are gonna become more and more important, and the things that are gonna be replaced are the hard skills. Which is interesting, 'cause usually people value the hard skills: coding, design, writing really well. And it's interesting that AI is actually really good at that, 'cause it's taking a bunch of data, synthesizing it, and creating a thing, versus all these fuzzy things around what influences and convinces people to do things, and aligning, listening, like you said, creativity. Does anything come up along those lines as I say that?
- KNKarina Nguyen
I think it's actually really, really hard to teach the model how to be aesthetic, or really good at visual design, or how to be extremely creative in the way it writes. I still think ChatGPT kind of sucks at writing, and that's because it's bottlenecked by this creative reasoning. I think prioritization is one of the most important things. For a manager... actually, AI research progress is bottlenecked by management, by research management, because you have a constrained set of compute, and you need to allocate that compute to the research paths you feel most convinced about. You need really high conviction in a research path to put compute behind it; it's a return-on-investment kind of situation. And I'm thinking a lot about, across all my projects, which projects are higher priority. So prioritization, and also, at the lower levels, which experiments are really important to run right now and which are not, and cutting the rest. So I feel like prioritization, communication, management, people skills like empathy, understanding people, collaboration. I don't think Canvas would have been an amazing launch if it wasn't about the people. It's a wonderful, great group of people, and I got a chance to work with people like Lee Byron, who's a co-creator of GraphQL, and some of the best Apple designers. How do you create that collaboration between people? That's something that's still human, I think.
- 47:50 – 53:34
AI’s role in creativity and strategy development
- KNKarina Nguyen
- LRLenny Rachitsky
Let me just follow this thread a little bit, 'cause I imagine people listening are like, "Okay, but once we have AGI or ASI, it'll do all this." There's a world where, like, why isn't all this done? I think it's easy to just assume all that. I'm curious about this idea of creativity and listening: why do you think AI isn't good at it, other than that it's just very hard to train it to do this well? Is there anything there, just why this is especially difficult for LLMs to get good at?
- KNKarina Nguyen
I think currently it's difficult for many reasons. It's still an active research area, and it's something my team is working on: okay, how do we teach the model to be more creative in its writing? And actually, this new paradigm where the models think more should lead to better writing in itself. But when it comes down to idea generation, or discriminating what good visual design is, I feel like it hasn't learned enough examples from people to discriminate very well. I do think it's because there are not that many people who are actually really good at this, and it's not accessible for models to learn from those people, I guess. So I definitely think that's why it sucks. (laughs) But-
- LRLenny Rachitsky
Yeah, that makes sense.
- KNKarina Nguyen
Um-
- LRLenny Rachitsky
Basically, there's not enough of you yet, researchers teaching it to do these things, people that have incredible taste and-
- KNKarina Nguyen
Mm-hmm.
- LRLenny Rachitsky
... creativity that can teach these things. You could argue this will come, but-
- KNKarina Nguyen
Right.
- LRLenny Rachitsky
... I'm not- (laughs) we don't need to keep going down that thread.
- KNKarina Nguyen
Yeah.
- LRLenny Rachitsky
Uh, let me ask you a specific question. In this post I wrote, I made an argument that a lot of people disagreed with: that strategy is something AI tooling will become increasingly great at and take over. There's this sense that strategy is a thing people will continue to be much better at, that you can't offload to AI, basically developing your strategy, telling you what to do to win. My case is: isn't strategy just taking all the inputs, all the data you have available, understanding the world around you, and coming up with a plan to win? It feels like an LLM would be incredibly smart at this. What's your take?
- KNKarina Nguyen
I think so too. Again, you teach the model all sorts of tools and capabilities and reasoning, right? When it comes down to Canvas specifically right now, it would be very cool for the model to just aggregate all the feedback from users: summarize for me the top five most painful flows or user experiences. And then the model itself is very capable of thinking, of knowing how it's being made, of figuring out how to create datasets for itself to train on. I don't think we are far away from that kind of self-improvement, models becoming self-improving. Then product development basically becomes self-improving; it's kind of like its own organism or something. I guess strategy is more data analysis, and coming up with... I think what models are really good at is connecting the dots. If you have user feedback from this source, and you also have an internal dashboard with metrics, and then you have other kinds of feedback or input, it can co-create a plan for you, even recommendations. And I think this is one of the most common use cases for ChatGPT, coming up with these sorts of things.
- LRLenny Rachitsky
That makes sense. Like, essentially a human can only comprehend so much information at once and look at so much data at once to synthesize takeaways. And as you said, these context windows are huge. Now here's all the information, what's the most important thing I should do?
- KNKarina Nguyen
Yeah, same with scientific research. Ideally the model will be able to suggest new ideas, or iterate on an experiment: given the empirical results of previous experiments, how do you come up with new ideas or methods?
- LRLenny Rachitsky
Yeah. Yeah, man. Uh, okay. So just to close the loop on this part of the thread: the skills you're suggesting people focus on building and leaning into are soft skills like creativity, management, influence, collaboration, looking for patterns. Is that generally where your mind is at?
- KNKarina Nguyen
Yeah. I'm thinking a lot about how we make organizations more effective, and I think this is mostly management, I guess. How do you organize research teams, or teams generally, compose teams such that they maximally succeed, perform at the maximum of what's possible? We can literally create the next generation of computers; it's just a matter of conviction and the way you manage through that. It's scaling organizations, or scaling product research, I guess.
- LRLenny Rachitsky
Yeah, I think, like, you're basically building this thing, and not doing it efficiently is limiting the potential-
- KNKarina Nguyen
Yeah.
- LRLenny Rachitsky
... of the human species right now.
- KNKarina Nguyen
Right.
- LRLenny Rachitsky
It's mismanagement within the research teams at OpenAI and Anthropic and some of these other labs.
- KNKarina Nguyen
Yeah, it's kind of crazy to think about it.
- LRLenny Rachitsky
Holy
- 53:34 – 57:11
Comparing Anthropic and OpenAI
- LRLenny Rachitsky
moly.
- KNKarina Nguyen
(laughs)
- LRLenny Rachitsky
Okay, so speaking of Anthropic and OpenAI, you've worked at both. Very few people have worked at both companies and seen how they operate. I'm curious just what you've noticed about the differences between these two, how they operate, how they think, how they approach stuff. What can you share along those lines?
- KNKarina Nguyen
It's more similar than different. Obviously there are differences, and they come down to nuances, I would say culture. I really love Anthropic, and I have a lot of friends there, and I also love OpenAI, and I still have a lot of friends there. So it's not about enemies. People think the AI labs are all, "Yeah, they're competitors, they're enemies." It's actually a fun, big community of people doing the same thing. I would say what I've learned from Anthropic is this real care and craft toward model behavior, model character, model training. I've been thinking a lot about what makes Claude Claude, and what makes ChatGPT ChatGPT. It always comes down to the operational processes that lead to the model that gets output. The reason Claude has so much more personality, and is more like a librarian (laughs)... I don't know why I visualize Claude as a librarian (laughs), very nerdy or something. It's because it's a reflection of the creators who are making the model: a lot of detail around the character and the personality, whether the model should follow up on this question or not, what's the correct ethical behavior for the model in this scenario. It's a lot of craft, and curated datasets. And this is where I learned that part of the art, I guess, at Anthropic. Also, Anthropic is much smaller. When I joined, it was, what, 60, 70 people? When I left, it was 700 people. And obviously the culture changed so much. I really enjoyed the early-days startup vibes. 
And people knew each other like a family, but the culture shifted. I would say I learned from Anthropic that they're much better at focusing and prioritization, very hardcore prioritization, I guess, and they need to do it. But I think OpenAI is much more innovative, much more of a risk-taker in terms of product, and research actually. Nobody really cares if your full-time job is just teaching the model how to be a creative writer. There's some luxury in that research freedom, which maybe comes with scale? I don't know. But I feel like I have much more creative product freedom to do almost anything within OpenAI. Like, I've lost track
- LRLenny Rachitsky
(laughs)
- KNKarina Nguyen
... of shaping ChatGPT into the version that we want. Yeah, it's probably more bottoms up, I guess.
- LRLenny Rachitsky
Yeah. That's how I was thinking about it. It feels like OpenAI is more bottoms up, distributed; people bubble up ideas and try stuff, and that leads to more products launching, more things just kind of being tried.
- KNKarina Nguyen
Right.
- LRLenny Rachitsky
Versus more of a, "Let's just make sure everything we do is awesome..."
- KNKarina Nguyen
Right.
- LRLenny Rachitsky
... "and great and craft," and thinking deeply about every...
- KNKarina Nguyen
Right.
- LRLenny Rachitsky
... every investment. That's really interesting. I've never heard it described this way.
- 57:11 – 1:07:13
Innovations and future visions
- LRLenny Rachitsky
Uh, Karina, we've covered so much ground. This is gonna help a lot of people with so many, uh, ways of thinking about where the future's going. Before we get to our very exciting lightning round, I'm curious if there's anything else that you think might be helpful to share or get into?
- KNKarina Nguyen
One of my regrets from the early days at Anthropic: I think there was some luxury of time, because it was pre-ChatGPT, to actually come in with a bunch of ideas and prototype almost every day. And we did a lot of cool ideas. Claude in Slack was actually one of the first tool-using products: Claude could operate in your workplace. It's kind of cool, because you could say, "@Claude, summarize the thread." Maybe you had an entire conversation with someone and you want a summary of what happened; you can say, "@Claude, summarize this." It was also really fun to iterate on the model itself, when you just talk to the model in Slack forever. It created some social element, kind of like my journey in Discord; people learned so much about prompting and how to work with Claude. Actually, one feature that was an early Tasks prototype: every Monday, Claude would summarize the entire channel. Or every Friday, it would summarize a bunch of channels and give you the news about the organization. It's a really cool form factor, and I think form factor is a really important question in AI. Especially since we haven't even figured out how to create an awesome product experience with the o-series models. It's the shift from the synchronous, real-time, give-an-answer paradigm to a more asynchronous paradigm of agents working in the background. But then the question is: the agents should build trust with you, right? 
And trust builds over time, just like with humans. This collaboration between you and a model is so important, because you build trust and the model learns your preferences, so it can become more personalized. It will start predicting the next action you want to take on the computer. It's much more predictive. We went from personal computers to personal models (laughs), basically. Yeah.
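The "every Monday, Claude summarizes the channel" prototype she mentions is essentially a stored instruction plus a scheduler. A toy sketch, with all names and fields invented for illustration:

```python
import datetime

def due_today(schedule_day: str, today: datetime.date) -> bool:
    """A recurring task fires when today's weekday matches its schedule."""
    return today.strftime("%A") == schedule_day

# The user's request, rewritten by the model into a standing instruction.
tasks = [{"day": "Monday", "instruction": "Summarize the #research channel."}]

def run_scheduler(tasks, today, send):
    for t in tasks:
        if due_today(t["day"], today):
            # In a real system the instruction plus channel history would be
            # sent to the model; here we just hand the instruction to a callback.
            send(t["instruction"])

fired = []
run_scheduler(tasks, datetime.date(2024, 1, 1), fired.append)  # 2024-01-01 is a Monday
print(fired)  # ['Summarize the #research channel.']
```

The interesting part is that the model's only job is the rewrite step, turning a conversational request into a standing instruction; everything else is ordinary scheduling plumbing.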
- LRLenny Rachitsky
That's, uh... why is it not a thing? That seems like such an obvious feature that every LLM should have, a Slack bot version of them. Is that a thing I can install, or is that not a thing right now?
- KNKarina Nguyen
I know that Claude in Slack was sunsetted in, like, 2023 or something. But that's because, like, I think, uh... I think after ChatGPT it was mostly, like, a focus on consumer use cases or, like, enterprise use cases.
- LRLenny Rachitsky
Hmm.
- KNKarina Nguyen
Uh...
- LRLenny Rachitsky
Bummer.
- KNKarina Nguyen
I think they didn't want, like... I think the form factor of, like, Claude in Slack was kind of constrained a little bit, uh, when you wanted to-
- LRLenny Rachitsky
Hmm.
- KNKarina Nguyen
... like, develop new features. Um-
- LRLenny Rachitsky
Bummer. I want that.
- KNKarina Nguyen
I know that ChatGPT had, like, a Slack bot too. So I don't know, like, maybe it will come back sometime.
- LRLenny Rachitsky
All right. I would, I would pay for that. Uh, any other memories from that time of early days? 'Cause that's a really special place to have been, is early days Anthropic. Any other memories or stories from that time that might be interesting to share?
- KNKarina Nguyen
I think the very first launch when we felt like that was, like, the Claude 100K context launch, when the models could take in an entire, like, book and give you, like, a summary of the book or something. Um, or take, like, multi-file financial reports and then, like, give you an answer, um, to a very specific question. I think there was something in there that was kind of like, "Oh my god, this is, like, a really cool new capability." Not, like, model capability, but more like the capabilities that came from the product form factor itself, rather than, like, the model capability as much. Um, I think, like, other prototypes that we were thinking about... yeah, there's, like, one about Claude workspaces. And it's kind of the same idea: like, Claude and I would have this shared workspace, and that shared workspace is like a document, and we can both, like, tweak it in the document. And I feel like sometimes product ideas lag, and they lag for, like, two years (laughs), um, just like in this case.
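For a rough sense of what "the entire book fits in the window" means in practice, here is a back-of-the-envelope check. It assumes the common ~4-characters-per-token heuristic; `fits_in_context` and the reserved-answer budget are made-up names for illustration, and a real product would count with the model's actual tokenizer.

```python
CONTEXT_TOKENS = 100_000  # the window size discussed above

def fits_in_context(text: str, reserved_for_answer: int = 4_000) -> bool:
    """Rough check that a document, plus room for the model's answer,
    fits in a 100K-token window.

    Estimates tokens as len(text) // 4 (a crude heuristic); real systems
    would tokenize the text exactly.
    """
    est_tokens = len(text) // 4
    return est_tokens + reserved_for_answer <= CONTEXT_TOKENS
```

By this estimate, a window of 100K tokens holds on the order of 400,000 characters, which is roughly a full-length book, matching the use case described above.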
- LRLenny Rachitsky
It's interesting, there are these milestones that kind of, uh, open up our view of what is happening and where things are going. ChatGPT I think was the first, just, like, "Wow, this is much better than I would have thought." You talked about 100K context windows, where you could upload a book and ask it questions, have it summarize. I actually use that all the time when I have interview guests and they wrote a book. I sometimes don't have time to read the whole book (laughs), so I use it to help me understand what the most interesting parts are, and then I actually dive into the book, just to be clear. (laughs) Uh, and then, I don't know, maybe, like, voice was another one, where you could talk to, say, ChatGPT. Were there any other moments where you were like, "Wow, this is much better than I thought it was gonna be"?
- KNKarina Nguyen
Yeah. I think, like, uh, the computer use agents, like, the model operating the desktop. And you can essentially think of, you know, a new kind of experience where the model can learn the way you browse. And from that preference, it can just, like, browse just like you. And it's kind of like a simulation, a simulated persona. And it's actually very similar to the idea of, like, okay, maybe Sam Altman doesn't have a lot of, um, time. Maybe I want to talk to, like, his simulation and ask... Or, like, for example, yeah, I really appreciate some of the technical mentorship from, like, Jakob. But he doesn't have a lot of time, so it's like, I really want to ask him these questions. Seeing how he would respond in simulated environments like this, um, would be really cool. (laughs)
- LRLenny Rachitsky
That's a great place to plug LennyBot. I have one of those. It's trained on all of my podcasts and newsletters.
- KNKarina Nguyen
Oh, cool.
- LRLenny Rachitsky
And I... It sits on many models. I don't know which one exactly they use, but it's exactly that. It's, uh... And it's not even me, it's all the guests that have been on the podcast and all the newsletter-
- KNKarina Nguyen
Cool.
- LRLenny Rachitsky
... posts I wrote. And you could just ask it, "How do I grow my product? How do I develop a strategy?" And it's actually shockingly good.
- KNKarina Nguyen
Do you feel like it reflects who you are, like, what are the-
- LRLenny Rachitsky
Yeah.
- KNKarina Nguyen
Okay.
- LRLenny Rachitsky
It's... The best part of it is you can talk to it. It's built... There's an ElevenLabs voice version that's trained on my voice now for this podcast, and it's actually very good.
- KNKarina Nguyen
(laughs) .
- LRLenny Rachitsky
And people, like, have told me they sit there for hours talking to it.
- KNKarina Nguyen
Wow.
- LRLenny Rachitsky
And somebody, uh, told it, "Interview me like I am on Lenny's podcast. Ask me questions about my career." And he did a half-hour podcast episode with LennyBot.
- KNKarina Nguyen
Oh, my God. That's so fun.
- 1:07:13 – 1:11:36
The potential of AI agents
- KNKarina Nguyen
nicely.
- LRLenny Rachitsky
You mentioned something as we were getting into this extra section that we ended up going down, this idea of, uh, agents using a computer. I know this is actually something you're gonna launch today, the day we're recording, which will be out by the time this comes out, called Operator. Can you talk about this very cool feature that people will have access to?
- KNKarina Nguyen
Yeah. So, uh, I unfortunately did not work on that, but I'm really, really excited about, like, this launch. Um, it's basically an agent that can complete a task in its own, like, virtual computer. Like, in its own virtual environment. You can literally give it any task, like, "Order me a book on Amazon," and then ideally the model will either, like, follow up with you, like, "Which book do you want?" Or, like, know you so well that it will, like, start recommending, like, "Oh, here are five books that I might recommend to you to buy." And then you're like, "Yeah, help me buy it." And then, uh, the model goes off, uh, into its own, like, virtual little browser and, like, completes the task and buys the book on Amazon. And then if you give the model, like, credentials, credit cards, obviously it comes with, like, a lot of trust and, like, safety. Um, then it will just complete the thing for you. It's a virtual assistant. (laughs)
- LRLenny Rachitsky
It's interesting how this just sounds like obviously this should happen. Like, why is this not already a thing? Which is also mind-blowing that-
- KNKarina Nguyen
(laughs)
- LRLenny Rachitsky
... we're just assuming this should exist, like just some AI doing things for you on a computer that you just ask it to do.
- KNKarina Nguyen
Yeah. (laughs)
- LRLenny Rachitsky
Like, it's absurd.
- KNKarina Nguyen
It's actually really hard. And I think, like, um, we're still cracking this, but I feel like... I don't know if you use, like, Tuple? It's like a pair programming product.
- LRLenny Rachitsky
Hm, nope.
- KNKarina Nguyen
But, um, I don't know if you love pair programming, so if you use-
- LRLenny Rachitsky
Oh, yeah, Shopify uses this. I remember it came up on a podcast episode.
- KNKarina Nguyen
Oh, nice. Yeah, so it's a very cool product where you can just, like, call anyone at any time and then, like, share your screen, and the other person can, like, have access to the screen and, like, start, like, literally operating your computer. And it's very, like, real time. The latency is, like, very, um... it's, like, very high quality. Um, and I kind of want the same. It's like, I want to, like, pair program with, like, my model, and, like, the model should even talk to me. Like, point to, like, a very specific section in my code, in VS Code, and, like, tell me, like... I mean, teach me. And you can have, like, different modes. It's like a product right here for you. (laughs) I don't know. Um, somebody should build this.
- LRLenny Rachitsky
(laughs) It sounds like a startup just got birthed-
- KNKarina Nguyen
Yes.
- LRLenny Rachitsky
... from someone listening to this. You mentioned that it's very hard to do this, uh, agent controlling a computer as you and helping out. What makes it so hard for whatever, however much you can explain briefly? (laughs)
- KNKarina Nguyen
Much of it is, like, uh, because right now the model is operating on, like, pixels instead of, like, language or whatnot. Like, pixels are actually really, really hard for the models, because of, like, visual perception. And I think there's still, like, a lot of, like, multimodal research going on. Um, but I think, like, language scales so much easier compared to, like, multimodal because of that. Another thing that, I guess, like, my team is working on is, like, how do you derive human intent, um, very correctly? It's, like, sometimes, does the model know enough information to ask a follow-up question or, like, to complete the task? You kind of don't want, like, an agent to, like, go off for, like, 10 minutes and then come back with, like, an answer that you didn't even want. That actually creates a much worse user experience. And this just comes with, like, teaching the model, like, people skills. It's like, you know, what do people like? Kind of like creating, like, a mental model of the user and, like, caring about the user in order to ask certain questions. Like, actually that part is, like, hard for the models.
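The "does the model know enough to act, or should it ask?" decision she describes can be caricatured as a slot-filling gate. This is a hypothetical sketch, not how the production models decide: `missing_details` and the `title`/`budget` fields are invented for the book-ordering example above.

```python
def missing_details(task: dict, required: tuple[str, ...]) -> list[str]:
    """Return the required fields the user hasn't specified yet.

    An agent would ask a follow-up question iff this list is non-empty,
    rather than going off for ten minutes on a guess.
    """
    return [field for field in required if not task.get(field)]

# "Order me a book" with no title: the agent should ask, not guess.
task = {"action": "buy_book", "title": None, "budget": "$20"}
missing = missing_details(task, ("title", "budget"))
```

In a real agent, deciding what counts as "required" is itself a learned judgment about user intent, which is exactly the hard part she points at; the gate only shows where that judgment plugs in.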
- LRLenny Rachitsky
That relates to what we talked about earlier with kind of the soft skill, people skills pieces.
- KNKarina Nguyen
Yeah.
- LRLenny Rachitsky
Not where these models are strong yet. Okay.
- 1:11:36 – 1:14:33
Final thoughts and career advice
- LRLenny Rachitsky
I'm gonna skip the lightning round.
- KNKarina Nguyen
Sure.
- LRLenny Rachitsky
I wanna ask just one question from the lightning round.
- KNKarina Nguyen
Mm-hmm.
- LRLenny Rachitsky
Something fun. (laughs)
- KNKarina Nguyen
Yeah. (laughs)
- LRLenny Rachitsky
Uh, okay. So when AI replaces your job, Karina, I'm curious what you're gonna... And it gives you a stipend, gives you a monthly stipend. Here's your, here's your salary for the month. What would, what would you want to do? What do you want to spend your time on? What will you be doing in this future world?
- KNKarina Nguyen
I've been thinking about this a lot actually. I feel like I have a lot of job options. (laughs) I would love to be a writer, I think. I think that would be super cool, uh, to just, like, write, like, short stories, like sci-fi stories, um, novels. I really like art history. So, you know those, like, um, conservationists who are, like, in the museums, who just, like, try to preserve, like, art paintings, by just, like, painting through-
- LRLenny Rachitsky
Mm-hmm.
- KNKarina Nguyen
... a lot of that? I think that would be really cool-
- LRLenny Rachitsky
Hmm.
- KNKarina Nguyen
... um, (laughs) to do. Um, yeah. Um-
- LRLenny Rachitsky
That sounds beautiful.
- KNKarina Nguyen
I don't know.
- LRLenny Rachitsky
Uh, what I'm hearing is you need to nerf these models to not get very good at writing so that you can continue.
- KNKarina Nguyen
Yeah. (laughs)
- LRLenny Rachitsky
Although at that point, you don't need to do it for... Like, you don't need people to buy it. You're just doing it for fun, so it doesn't even matter if they're incredibly good at writing or art, art conservation. Oh, man. What an episode, what a conversation. What a wild time we're living in. Karina, thank you so much for being here. Two final questions. Where can folks find you online if they wanna reach out and follow up on anything, and how can listeners be useful to you?
- KNKarina Nguyen
You can find me... I'm on Twitter. It's @karina_mien. You can also shoot me an email from my website. Um, and I'm... My team is hiring, and so, like, I'm looking for research engineers, research scientists, as well as, like, machine learning engineers. So, like, people who come from, like, product engineering who want to, like, learn, like, model training. Um, I'm actually hiring for, like, my team. My team is called, like, Frontier Product Research, and we train models. We develop new methods, but for product-oriented outcomes. Yeah.
- LRLenny Rachitsky
What a place to work. Holy moly. Uh, what's the best way for people to apply for these, uh, very lucrative roles?
- KNKarina Nguyen
I think you can shoot me a DM on Twitter.
- LRLenny Rachitsky
Okay.
- KNKarina Nguyen
Or, um, I'm yet to create a job description.
- LRLenny Rachitsky
Okay. This is the job description. Uh, you are-
- KNKarina Nguyen
Or you can apply on the, like, post-training team. Yeah.
- LRLenny Rachitsky
Okay. Well, so you're gonna get a flood of DMs. I hope you're prepared. (laughs) Karina, thank you so much for being here. This was incredible.
- KNKarina Nguyen
Thank you so much, Lenny.
- LRLenny Rachitsky
Bye, everyone.
- KNKarina Nguyen
It was fun.
- LRLenny Rachitsky
(instrumental music) Thank you so much for listening. If you found this valuable, you can subscribe to the show on Apple Podcasts, Spotify, or your favorite podcast app. Also, please consider giving us a rating or leaving a review, as that really helps other listeners find the podcast. You can find all past episodes or learn more about the show at lennyspodcast.com. See you in the next episode.
Episode duration: 1:14:33