Y Combinator · Calvin French-Owen: How Sub-Agents Split Context for IDEs
Through concurrent sub-agents that spawn separate context windows; Claude Code debugs real-environment concurrency bugs without sandbox constraints.
50 min read · 10,476 words
- 0:00 – 1:15
Intro
- CFCalvin French-Owen
I feel like when I'm using Claude Code, it's like, oh, I feel like I'm flying through the code.
- GTGarry Tan
When it's in your CLI, this thing can debug nested, delayed jobs, like, five levels in, and figure out what the bug was, and then write a test for it, and it never happens again. This is insane.
- CFCalvin French-Owen
I think everyone who's experimenting with this stuff on, like, a hobbyist level or at, like, a very small start-up, they're just pushing the coding agents as far as they can go. 'Cause it's like you don't really have time to figure out anything else. Like, as a start-up, you have limited runway, you're just going to orient around speed. I think at a bigger company you have a lot more to lose.
- GTGarry Tan
What are some of the tips to become a top one percent user of coding agents?
- GTGarry Tan
Yeah, what's your stack? [laughing] Yeah. [upbeat music]
- DHDiana Hu
Hey, everyone, welcome back to another episode of The Lightcone. Garry, are you- are you ready to record?
- GTGarry Tan
Oh, I'm, I'm in plan mode right now, but okay, yeah.
- DHDiana Hu
[laughing]
- GTGarry Tan
I guess it's time. Sorry about that. Well, welcome to a- another episode of The Lightcone, and today we have an incredible guest, Calvin French-Owen. He's one of the first people to create Codex at OpenAI, and before that, he started Segment, which is a multi-billion dollar company that got to a very successful exit. Calvin, welcome back.
- CFCalvin French-Owen
Thanks for having me.
- GTGarry Tan
I guess,
- 1:15 – 4:00
Garry can’t stop using Claude Code!
- GTGarry Tan
what a crazy time for all of us. Uh, I recently got very, very addicted to Claude Code, and, uh, I would describe it as, like, ten years ago, I was a marathon runner, and I loved doing it, and then I suffered a catastrophic knee injury, which is called manager mode. [laughing]
- DHDiana Hu
[laughing]
- GTGarry Tan
And I, uh, stopped coding, which is tragic and horrible. Uh, but now the last nine days have been, like, this incredible unlock of all the things I remember being able to do. And it's like, you know, I got a new total knee replacement, and actually, it's a bionic knee, and it allows me to run five times faster. What's your take on it? Because you're, I mean, right out there at the forefront of it. I mean, Codex pioneered all of the... a lot of the ideas that now, like, everyone still uses, and Codex is still evolving, too.
- CFCalvin French-Owen
For brief context, when I was at OpenAI, um, I was working on the Codex web project. At the time, Cursor was out in the market, and they had kind of built this shim, uh, around, I think it was Sonnet three point five, uh, and it was able to work in your IDE. Claude Code had just come out, uh, and it was working as a CLI. And we kind of had this idea, like, "Hey, in the future, coding is really gonna feel more like talking to a coworker." Like, you're gonna send off a question, and then they'll go off and do something and come back to you with a PR. Uh, and so that's where we started with this web view, uh, and that's what we were building. I think directionally, that's still kind of correct for where things should go, but obviously now everyone is coding with CLIs instead. Like, they're using those tools a lot more, whether it's Claude Code or whether it's Codex. And I think, at least for me, kind of the lesson in that is, I think in some sense you're right, that, like, everyone is going to become a manager in the future, or at least that's my hot take. But in order to get there, there are steps along the way, and you have to really build a lot of trust in the model and understand what it's doing.
- GTGarry Tan
You recently came over to Claude Code. What's the transition been like in terms of as using it as your, you know, one of your stacks?
- CFCalvin French-Owen
Yeah, yeah. So Claude Code is, uh, certainly my kind of like, daily driver today, and honestly, this has switched every few months. Uh, for a while, I was deeply in Cursor. I think their new model, which is really fast, is actually quite good. Then I kind of moved over to Claude Code, especially with Opus. Claude Code is a really interesting product, and I think it's underrated how good the both product and model are working together. If you study them closely, I think one of the things that Claude Code does in particular that's really amazing is split up context well. And so if you look at, uh, I don't know, things like skills or sub-agents, like, when you ask Claude Code to do something, it will typically spawn an explorer sub-agent or, like, multiple ones, and basically, each of those are running
- 4:00 – 6:23
Contrast with IDE’s, context-splitting
- CFCalvin French-Owen
Haiku to traverse the file system and kind of, like, explore what's there, and they're doing it in their own context window. And I think Anthropic has kind of, like, figured something out here around, given a task, does that task fit in the context window, or should I actually, like, split it into many more?
- GTGarry Tan
Mm-hmm.
- CFCalvin French-Owen
And the models are, like, insanely good at this, which I think gives them really good results.
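A rough Python sketch of the "does it fit, or should I split it?" decision described above. This is not how Claude Code actually plans sub-agents; the `plan_subagents` name, the greedy packing strategy, and the four-characters-per-token estimate are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


def estimate_tokens(text: str) -> int:
    """Crude heuristic (an assumption): roughly 4 characters per token."""
    return max(1, len(text) // 4)


@dataclass
class SubAgentJob:
    """One batch of files that a single explorer sub-agent would read."""
    files: list[str] = field(default_factory=list)
    tokens: int = 0


def plan_subagents(file_sizes: dict[str, int], budget: int) -> list[SubAgentJob]:
    """Greedily pack files into jobs so each job stays within `budget` tokens.

    `file_sizes` maps a path to its estimated token count. A single file that
    exceeds the budget still gets its own job; a real system would need to
    chunk it further.
    """
    jobs: list[SubAgentJob] = []
    current = SubAgentJob()
    for path, tokens in sorted(file_sizes.items()):
        # Close the current job if adding this file would overflow it.
        if current.files and current.tokens + tokens > budget:
            jobs.append(current)
            current = SubAgentJob()
        current.files.append(path)
        current.tokens += tokens
    if current.files:
        jobs.append(current)
    return jobs
```

If everything fits in one budget, the plan is a single job and no split is needed; otherwise each job would run in its own context window and report back a summary.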
- GTGarry Tan
And I think the fascinating thing is, because it's on the terminal, it's the purest form for composable atomic integrations. Because if you came from a IDE first world, which is where Cursor was, and I suppose Codex, too, this concept of, uh, finding the context more free form wouldn't come out so natural, right? Because you-
- CFCalvin French-Owen
A hundred percent, yeah
- GTGarry Tan
... you lack the constraint, which is so unique.
- CFCalvin French-Owen
Yeah, and I, uh, personally, I was surprised. I don't know how you all feel, but I was surprised at, like, CLIs.
- DHDiana Hu
It's, it's weird. It's like a weird retro future-
- CFCalvin French-Owen
Yes, yeah
- DHDiana Hu
... that, like, the CLIs, which are the technology from twenty years ago, have somehow beaten out all the actual IDEs, which were supposed to be the future.
- CFCalvin French-Owen
A hundred percent, yeah. And I, I think it's important, actually, to Claude Code, that it's not an IDE because it sort of distances you from the code that's being written. Like, IDEs are all about exploring files, right? And you're, like, trying to keep all the state in your head and understand what's going on. But the fact that a CLI is, like, a totally different thing means that they have a lot more freedom in terms of how it feels. And I, I don't know about you, but I feel like when I'm using Claude Code, it's like, oh, I feel like I'm flying through the code. You know? It's like there's all sorts of things going. There's, like, little progress indicators. It's kind of, like, giving me status updates, but, like, the code that's being written is not the front and center thing.
- GTGarry Tan
I mean, dev environments are so messy. I mean, I really like how clean a sandbox conceptually is-
- CFCalvin French-Owen
Yeah
- GTGarry Tan
... in Codex, but then I just ran into all these crazy issues, like trying to do, you know, run just simple testing, right?
- CFCalvin French-Owen
Mm-hmm.
- GTGarry Tan
It needs to access PostgreSQL, and then it can't do it, or y- you know, my codex.md ended up being twenty lines long, and even then, it didn't work. When it's in your CLI, it could just access your development database. I mean, I'm not sure if I'm supposed to do this, but I've actually also had it access my production database to-
- CFCalvin French-Owen
Yeah, yeah. [laughing]
- GTGarry Tan
[laughing]
- GTGarry Tan
And uh, it can just do it, and it's like, "Yeah, okay, here," like, "I looked into it, and I think this happened, and I'm gonna debug this, you know, concurrency issue." And it's like, oh, my God, like, this thing can debug nested, delayed jobs, like, five levels in and figure out what the bug was, and then write a test for it, and it never happens again? This is insane.
- CFCalvin French-Owen
Yeah, a- and
- 6:23 – 9:11
Distribution models, top down vs bottom up
- CFCalvin French-Owen
I think that distribution mode is frankly underrated. Like, thinking about a Cursor, or a Claude Code, or a Codex CLI, the fact that you can just download it and use it without having to get IT permissions or anything-
- GTGarry Tan
Mm
- CFCalvin French-Owen
... makes a huge difference.
- GTGarry Tan
Mm.
- CFCalvin French-Owen
And actually, I was playing around with a product the other day where, uh, you download a desktop app... and then it execs the Claude Code that you have running on your laptop, and uses that, and communicates back via an MCP server to the desktop product.
- SPSpeaker
Mm.
- CFCalvin French-Owen
And it's like, this is a very interesting way- [laughing]
- GTGarry Tan
[laughing]
- CFCalvin French-Owen
-of now starting to work with your laptop, where you don't have to get anyone's permission to do it. You just download the product and go.
- GTGarry Tan
Yeah, I was looking at, like, New Relic has an MCP, but you know, Sentry you can, like, copy markdown, but like, it's like an auto bug fixer-
- CFCalvin French-Owen
Yeah, yeah
- GTGarry Tan
... basically. It's right there.
- SPSpeaker
It's super interesting that in a world where things are changing so fast, you really want your product to have a bottoms-up distribution, not top down, because, like, top down is, like, just too slow. Like, the CTO of a company is gonna be, like, have all these concerns about security, and privacy, and what if the-
- GTGarry Tan
Control. [laughing]
- SPSpeaker
And control, exactly. Versus, like, the engineers just, like, install the thing and start using it, like, "This thing is amazing!"
- CFCalvin French-Owen
Yeah, I think that's right. Uh, the one thing I do struggle with... I mean, I'm like a B2B enterprise guy, generally, [laughing] but I feel like there's some amount of moat that happens when you do that top-down sale, and there's gotta be some company who manages to crack it, where it's like, "Oh, this is a thing that everyone has access to. Maybe individual people can take it up."
- GTGarry Tan
That was the original, um, Netscape Navigator. It was free for non-commercial use, and then, uh, people would just download it and use it for commercial use, and then they could just track down the IPs and figure out-
- CFCalvin French-Owen
Uh-huh
- GTGarry Tan
... uh, exactly how many clients were in all of these different companies and say, "You should pay for this. You're in violation, but all you have to do is buy a license." [chuckles]
- CFCalvin French-Owen
Yeah, yeah.
- GTGarry Tan
So I'd be curious if you could do that work again here.
- CFCalvin French-Owen
Yeah.
- GTGarry Tan
I mean, the- your point about distribution is very interesting, 'cause, uh, now people are probably just making architecture decisions about what to use directly in Claude Code. Like, they might not-
- CFCalvin French-Owen
Mm-hmm
- GTGarry Tan
... even know what a- you know, what analytics to use, and it's like, "Oh yeah, as long as Claude Code says use PostHog, like, they're using PostHog."
- CFCalvin French-Owen
100%. One of the companies who I advise was talking about their, like, GEO strategy. This is, like, the generative optimization-
- GTGarry Tan
Mm
- CFCalvin French-Owen
... or how you show up in chatbots. And what he was saying is funny, is one of their competitors had put together a, like, top five list of, like, tools in their category that you should be using, and of course, their tool is ranked at the top of this top five list. [laughing]
- GTGarry Tan
[chuckles]
- CFCalvin French-Owen
And, like, any human looking at this would be like: "Oh, this is so obviously biased." It's like the top tool is the one that's in the domain, you know? [laughing] But the LLMs get fooled, and, like, they're pulling together a bunch of context, and they're saying, like, "Oh, this is the top," and then they'll just recommend it. I think, yeah, if you're selling a developer tool, like, having good docs that are out there, like, having social proof, like, maybe being posted on Reddit a little bit more, all of that helps your case tremendously.
- 9:11 – 12:28
Licensing and optimization
- SPSpeaker
I think a lot of the open source, uh, projects have taken off a lot more.
- CFCalvin French-Owen
Yep.
- SPSpeaker
I think one of the examples is Supabase, actually.
- CFCalvin French-Owen
Yeah. Yeah.
- SPSpeaker
Uh, which really took off last year, and part of it's because they have such a good open source documentation, how to set up a bunch of stuff. Whenever someone asks how to set up anything that you need, some sort of back end, Firebase type of transaction, the default answer from all the LLMs is actually, uh, Supabase.
- CFCalvin French-Owen
Mm-hmm.
- SPSpeaker
So just trying some of these questions, that comes from that.
- GTGarry Tan
The thing is, it's winning the internet, and it was like that before when it was, like, Stack Overflow-
- CFCalvin French-Owen
Uh-huh
- GTGarry Tan
... searching Google.
- CFCalvin French-Owen
Now there's a new... Yeah.
- GTGarry Tan
And then now that nobody uses Google anymore, [laughing] it's, like, crazy. You just-
- CFCalvin French-Owen
Yeah
- GTGarry Tan
... yeah, it's kind of the same deal.
- CFCalvin French-Owen
I, I will say it does help open source disproportionately, I would say. Like, I don't know if you all saw, there was a Ramp blog post that they recently published about building their own coding agent, and they were mentioning that they use OpenCode as a harness, um, because the model can look and see the source code and understand how it's working. And I, I do this all the time with open source projects. I'll, like, clone the repo, and then spin up Codex or Claude Code and be like, "Hey, give me a walkthrough of what's going on here," and it's really useful.
- SPSpeaker
What do you think are some of the tips for anyone that wants to build a coding agent, since you've, uh, done it a lot? [laughing]
- CFCalvin French-Owen
[laughing]
- GTGarry Tan
[laughing]
- SPSpeaker
What are, what are some, uh, now lessons that you learned that you wanna share?
- CFCalvin French-Owen
I mean, I think the number one thing, uh, is managing context well. Basically, we kind of had, like, a checkpoint for, uh, I think it was o3, like, one of the reasoning models, and then we did a bunch of fine-tuning on it, um, in reinforcement learning, where it's like, oh, you're given a bunch of questions. So, like, solve these coding problems, or, like, fix tests or whatever, implement a feature. Um, and then the model was RL'd to respond to those. And so I think most people are not going to be doing that, right? But the things that you can do are figure out, like, "Hey, what context should I be supplying to this agent to get the best possible result?" And so for Claude Code, if you watch it working, it's like, oh, I'm going to, like, spawn a bunch of these explorer sub-agents. They will, like, search for different patterns in the file system. They will come back, uh, they will have this context, they'll summarize it for me, and then I'll have some place to go. It's interesting watching, like, different agents structure this context. Uh, like, I think Cursor takes an approach where they actually do semantic search, where they embed everything and figure out, like, "Hey, what query is closest to this?" If you look at a Codex or a Claude Code, uh, they actually just use, like, grep. [laughing]
- GTGarry Tan
Yeah. [laughing]
- SPSpeaker
[laughing]
- CFCalvin French-Owen
Uh, and I think that works because-
- GTGarry Tan
It works really well.
- CFCalvin French-Owen
Yeah.
- SPSpeaker
Works really well.
- CFCalvin French-Owen
It works very well because code is very context dense. Um, like, if you think about lines of code, it's like each line is probably less than 80 characters. There's not a lot of, like, big, like, data blobs or, like, JSON in your codebase. Maybe there's some, but not a lot. You can respect a gitignore to figure out and, like, filter out stuff that's just not relevant or is, like, packaged, and you can use grep and ripgrep to, like, find context around the code, which probably gives you a good sense for what that code is doing, and you can navigate the f- folder structure, um-
- SPSpeaker
And also, LLMs are really good at emitting very complicated grep expressions-
- CFCalvin French-Owen
Yes!
- GTGarry Tan
... yeah [chuckles]
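To make the grep-style approach concrete, here is a small Python stand-in for what the speakers describe: skip ignored paths, search with a regular expression, and return a few surrounding lines as context. It is only a sketch. Real agents shell out to tools like grep or ripgrep; the function names here are hypothetical, and the `.gitignore` handling is deliberately simplified (no negation, no nested ignore files).

```python
import fnmatch
import re
from pathlib import Path


def load_ignore_patterns(root: Path) -> list[str]:
    """Read simple patterns from .gitignore (simplified: no negation rules)."""
    ignore_file = root / ".gitignore"
    if not ignore_file.exists():
        return []
    lines = ignore_file.read_text().splitlines()
    return [ln.strip().rstrip("/") for ln in lines
            if ln.strip() and not ln.strip().startswith("#")]


def is_ignored(rel_path: Path, patterns: list[str]) -> bool:
    """True if any component of the path matches an ignore pattern."""
    return any(fnmatch.fnmatch(part, pat)
               for part in rel_path.parts for pat in patterns)


def grep_with_context(root: Path, pattern: str, context: int = 2) -> list[dict]:
    """Return regex matches with `context` lines before and after each hit."""
    regex = re.compile(pattern)
    patterns = load_ignore_patterns(root)
    hits = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or ".git" in rel.parts or is_ignored(rel, patterns):
            continue
        try:
            lines = path.read_text().splitlines()
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for i, line in enumerate(lines):
            if regex.search(line):
                lo, hi = max(0, i - context), min(len(lines), i + context + 1)
                hits.append({"file": rel.as_posix(), "line": i + 1,
                             "snippet": lines[lo:hi]})
    return hits
```

Because each hit is a handful of short lines, many hits fit in a small amount of context, which is the density argument made above.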
- 12:28 – 17:36
Tips on becoming a top 1% user of coding agents
- SPSpeaker
So given this is how a lot of the superpowers for the best coding agents is context engineering, what are some of the tips to become a top 1% user of coding agents?
- GTGarry Tan
Yeah, what's your stack?
- CFCalvin French-Owen
Yeah. [laughing]
- SPSpeaker
Yeah, what, what do you do to be so productive with it?
- CFCalvin French-Owen
One is, if you're able to use... uh, just generally far less code and plumbing. Um, so a lot of what I do is, like, deploy stacks on, like, Vercel or Next.js or, like, Cloudflare Workers, where there's kind of, like, already a bunch of boilerplate, like, taken care of f- for you. And then you don't really have to think that much about, like, "Hey, I need to stand up, like, all these different services and deal with, like, service discovery, and, like, registering on, like, some sort of central endpoint or, like, all these databases." It's like, oh, like, everything is pretty roughly defined in this, like, one or two hundred lines of code. I tend to operate more towards microservices for that as well, or, like, individual packages that are fairly str- well-structured. I think it's also worth knowing, like, what the LLM superpowers are. Like, in general, coding agents are, uh... and I think Andrej Karpathy just tweeted about this: they're, like, super persistent, so they will keep going no matter what. [chuckles] They end up, uh, typically just making more of whatever's there. So if you're trying to direct them to do something, it's worth, like... uh, I mean, uh, I can pick on OpenAI slightly in this example. OpenAI has, like, a giant monorepo. It's been there for, uh, a few years now and has, like, I don't know, thousands of engineers who are committing. Some of those engineers are, like, super senior Meta folks who came in and are like- know exactly how to write production code. Some are, like, new PhDs. It's like, uh, a pretty wide range, and so the LLM will pick up different things depending on where you direct it. I think there's a lot of room, actually, for coding agents to figure out, like, what is the, like, optimal type of code that we should produce? I mean, obviously, giving the model a way to check its work helps improve performance drastically, so the more that you can run tests and lint, uh, CI, et cetera.
Um, personally, I also use code review bots pretty aggressively. Um, I know, like, Greptile, a YC company, is really good. Um, I use... the Cursor Bugbot has gotten quite good, and I actually like Codex for c- code review as well. I find it does a very good job on correctness. So those are all things that, like, the agents are good at. Uh, and they're excellent ex- exploring the codebase, too. I think areas where they don't do well, uh, they make more. If your goal is not to make more, they'll, like, often duplicate code and, like, spend a bunch of time re-implementing things that, like, you're like: "Oh, of course you didn't wanna do this." I think context poisoning is a real thing, where it kind of, like, goes down one loop, and it will continue because it has this persistence, but it's referring back to tokens, which are, like, not right in terms of pursuing a solution. Um, and so one thing that I often do is, like, very actively clear context.
- GTGarry Tan
Like, how often?
- CFCalvin French-Owen
Usually, uh, when it gets above, like, fifty percent tokens.
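The "clear above fifty percent" habit can be written down as a tiny bookkeeping class. This is a sketch under stated assumptions: `ContextTracker` is a hypothetical name, token counts use a crude four-characters-per-token estimate, and the window size is something you would have to look up for your model. Only the 50% threshold comes from the conversation.

```python
from dataclasses import dataclass, field


@dataclass
class ContextTracker:
    window_tokens: int               # model context window size (assumed known)
    clear_threshold: float = 0.5     # "above fifty percent", per the conversation
    messages: list[str] = field(default_factory=list)
    used_tokens: int = 0

    def add(self, message: str) -> None:
        """Record a message and its estimated token cost."""
        self.messages.append(message)
        self.used_tokens += max(1, len(message) // 4)  # rough estimate

    def should_clear(self) -> bool:
        """True once usage passes the threshold fraction of the window."""
        return self.used_tokens / self.window_tokens > self.clear_threshold

    def clear(self, handoff_note: str) -> None:
        """Start fresh, carrying over only a short summary of where things stand."""
        self.messages = []
        self.used_tokens = 0
        self.add(handoff_note)
```

The `handoff_note` stands in for whatever short summary you carry into the new session, so a fresh context does not start from nothing.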
- GTGarry Tan
Mm-hmm. Oh, wow!
- CFCalvin French-Owen
Uh, yeah, yeah.
- GTGarry Tan
Okay.
- CFCalvin French-Owen
I know- there's this guy, Dex, uh, from this company, HumanLayer. That was actually another YC company.
- CFCalvin French-Owen
Yeah, it's a YC company from-
- CFCalvin French-Owen
Yeah
- CFCalvin French-Owen
... Fall '24.
- CFCalvin French-Owen
Yeah, yeah, and he-
- CFCalvin French-Owen
He talks a lot about it, yeah.
- CFCalvin French-Owen
He has this concept of, like, the LLMs reaching the dumb zone- [chuckles]
- GTGarry Tan
[chuckles]
- CFCalvin French-Owen
- where it's like, after a certain amount of tokens, uh, it just starts, like, degrading in quality. And I actually think that's very true, especially if you think about, like, how the reinforcement learning might work. Like, imagine you're a college student. You're taking an exam. In the first five minutes of that exam, you're like, "Oh, I have all the time in the world! Like, I'll do a great job. I'll think through each of these problems." Let's say you have, like, five minutes left, and you still have half the exam left. You're like, "Oh, man, I just gotta do whatever I can." Like, that's the LLM with the context window, right?
- CFCalvin French-Owen
One of the tricks that, uh, I think founders use is you put, like, a canary at the beginning of the context, that something very esoteric that it would only know.
- CFCalvin French-Owen
Oh, yeah, yeah, that you-
- CFCalvin French-Owen
It's, like, something really funny. It's like, I don't know, "My name is Calvin, and blah, blah, blah. I drink tea at 8:00 AM," some random fact. And then, as you keep going, you ask it: "Do you remember what's my name? Do you remember when I drank tea?" And then, when it starts forgetting that, I think is a bit of a sign that it- the context-
- CFCalvin French-Owen
Mm-hmm
- CFCalvin French-Owen
... has poison. That's, like, one trick I've seen people do.
- CFCalvin French-Owen
Yeah.
- CFCalvin French-Owen
They do a random canary to check it.
- CFCalvin French-Owen
I've not tried this, but I fully believe it. [chuckles] Yeah.
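The canary trick is simple enough to sketch in Python. Everything here is illustrative: `ask_model` is a hypothetical callable standing in for whatever chat interface you use, and, as noted in the conversation, the technique itself is anecdotal rather than something verified here.

```python
import secrets
from typing import Callable


def make_canary() -> tuple[str, str]:
    """Return (sentence to place at the start of the context, secret token)."""
    token = secrets.token_hex(4)  # random, so the model cannot guess it
    return f"For reference, the canary code word is {token}.", token


def canary_intact(ask_model: Callable[[str], str], token: str) -> bool:
    """Ask the model to recall the code word; True if the reply contains it."""
    reply = ask_model("What is the canary code word I gave you at the start?")
    return token in reply
```

A failed check would be the signal to clear or compact the session before continuing.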
- GTGarry Tan
That's interesting. I haven't run across any bugs before compaction, but maybe I'm not paying attention. But you're saying, like, that actually is actively something that it just starts doing weirder things-
- CFCalvin French-Owen
Yes
- GTGarry Tan
... that are not, like-
- 17:36 – 21:34
When can the agents work 24-48 hour running jobs on their own?
- CFCalvin French-Owen
runs.
- GTGarry Tan
I guess, like, there are- these very different architectures between Claude Code and Codex sound like they're actually deeper in that Codex is actually meant for much longer-running jobs. So, you know, that's sort of, like, off the bat, a different use case, and then the architecture is very different as a result. I guess right now it seems like CLIs, you know, 2026 might be the year of CLI. But then this other idea that AGI is here, and it's actually ASI is around the corner. The coding agents right now are really, really smart but not smart enough to run on their own for long periods of time. But a 10X increase in compute from here, are we there? Like, are we at 24 hours or 48-hour running jobs on Codex, and that architecture is correct for that world?
- CFCalvin French-Owen
Yeah, I think it's a good question.
- GTGarry Tan
Mm-hmm.
- CFCalvin French-Owen
It sort of goes back to, like, kind of the founding DNA of both companies. Like, I feel like Anthropic has always been very big on, like, building tools for humans, where it comes to, like, "Oh, here's the style of the tone, and, like, here's how it should fit with all of the rest of your work." And I, I think Claude Code is, like, a very natural extension of that. In a lot of ways, it, like, works like a human would, where it's like, "Oh, you need to build like, I don't know, a doghouse or something." It's like: "Oh, well, I'll go to the hardware store, and I'll buy all these materials, and I'll, like, figure out how they all fit together." Whereas OpenAI really leans into this idea of just, like... we are going to train the best model and reinforce over time and get it to do longer and longer horizon things, uh, in this pursuit of artificial general intelligence. And so it may not work like a human at all, like going back to the doghouse example, it's like, oh-
- GTGarry Tan
But AlphaGo didn't either.
- CFCalvin French-Owen
Yeah, but AlphaGo didn't either. It's like, oh, it's like, instead I will have a 3D printer that can print from scratch, like a doghouse, and it'll be exactly what you want, and it will take a long time, and it will be, like, very custom, and it will do, like, weird things, but it will work, you know? And, like, maybe in the limit, that's the right call. And so it's gonna be really interesting to see how they play out.
- GTGarry Tan
I mean, net-net, it seems like the latter is somewhat inevitable, but I like the former so much. [chuckles]
- CFCalvin French-Owen
Yes! Yeah, yeah.
- GTGarry Tan
I like- [chuckles] you know, like, even this idea that it greps is, like, I thought about, you know, 10 years ago, it was like, yeah, I was in there, like, writing my own really weird regexes-
- CFCalvin French-Owen
Mm-hmm
- GTGarry Tan
... to try to figure out where everything was when I was refactoring or whatever- trying to understand code or whatever. So that's the feeling I get when I'm using it. It's like I can do five people's worth of work in, like, a single day. It's like rocket boosters.
- CFCalvin French-Owen
Yes.
- GTGarry Tan
It's unbelievable.
- CFCalvin French-Owen
Yeah. I, I think it's going to be really interesting to see how this plays out across large and small companies. I think everyone who's experimenting with this stuff on, like, a hobbyist level or at, like, a very small start-up, they're just pushing the coding agents as far as they can go.
- GTGarry Tan
Mm.
- CFCalvin French-Owen
'Cause it's like you don't really have time to figure out anything else. Like, as a start-up, you have limited runway, you're just going to, like, orient around speed. I think at a bigger company, you have a lot more to lose, and you have all these other internal processes around code review, and you probably already hired, like, a big eng team. And I think it's gonna be very strange as, like, these individual teams of, like, one person are like: "Hey, that team over there isn't doing the right thing. Like, let me just build a prototype that, like, works better." I think at some point, it's going to start working better, and I think that landscape shift is going to be a very interesting, strange thing.
- GTGarry Tan
My 10-year-old is, uh... You know, he, he has, uh, writing assignments every day-
- CFCalvin French-Owen
Mm-hmm
- GTGarry Tan
... and then yesterday was the first day where he used AI. [laughing]
- CFCalvin French-Owen
[laughing]
- JFJared Friedman
[laughing]
- GTGarry Tan
And then I was like: "This is not a turn of a phrase that a 10-year-old is capable of doing."
- CFCalvin French-Owen
Oh. [chuckles]
- JFJared Friedman
[chuckles]
- GTGarry Tan
And then I think about that in this context, because we, you know, are working with a lot of 18 to 22-year-olds who, you know, they've done internships, but, like, they haven't done, like, eng manager work, like-
- CFCalvin French-Owen
Mm
- GTGarry Tan
... you know, we're saying, um, you know, post-product market fit. Uh, once you're- have job queues of, like, millions of jobs and, like, you know, hundreds of thousands of errors-
- CFCalvin French-Owen
Like-
- GTGarry Tan
... that's, like, real eng management. [laughing]
- 21:34 – 26:27
Can the agent teach things like architecture?
- GTGarry Tan
How does the next generation understand that? Can the Claude Code bot actually teach people about, uh, architecture and things like that? Or, you know, are you just gonna bump your head into it, and users just kinda suffer and, you know-
- CFCalvin French-Owen
Yeah
- GTGarry Tan
... people have to figure it out? [chuckles]
- CFCalvin French-Owen
Like, at least where I find, find myself spending the most time when it comes to product, is figuring out the kind of product model, in a sense. Like, what are the things that the user has to understand today? Um, and what are the primitives that they can use to, like, do whatever they want? I always think of Slack like this. It's like Slack was, in some ways, not really a new concept. It's like there were many chats that existed before it, um, but the fact that they had, like, channels, messages, and reactions in a simple way that people could just, like, think about and be like: "Oh, I understand how to, like, navigate this," it made a lot of sense for people. But then, kind of once they were there, like, it's very hard to change that later on for a user, you know? It's like, oh, maybe they wanted to go in more of, like, a document-first way, or, like, maybe right now they're trying to incorporate agents. It's, like, difficult to change the user's mental model. And so I- at least for myself, building products, it's like you have to think about that very carefully from an early stage. 'Cause again, whatever you supply to the coding agents is that kind of kernel, is going to be what they run with and make more of forevermore.
- GTGarry Tan
YC's next batch is now taking applications. Got a start-up in you? Apply at ycombinator.com/apply. It's never too early, and filling out the app will level up your idea. Okay, back to the video.
- JFJared Friedman
Do you have thoughts, just 'cause you know the, the agents so well, like, what, what types of engineers are going to benefit more than others, um, from these tools pl- becoming popular?
- CFCalvin French-Owen
In general, I think that kind of the more senior, senior you are, the more you benefit.
- JFJared Friedman
Hmm.
- CFCalvin French-Owen
Um, because the agents are so good at taking some sort of idea and then putting it into action. If you're able to prompt that in a few words, it's kind of like: Oh, now suddenly I had this, like, idea. I, I find this so often in OpenAI, like, scrolling through the code base, it's like, oh, like, here's the thing that I wish were different. Here's the thing that I wish were different. Here's the thing that I wish were different. Like, just being able to kick those off and then have them come back, I think is super empowering and multiplies your impact. I think also being able to detect, like, which sorts of changes are good or bad architecturally is very important, or, like, have a sense for where you might want to flag something to an agent. I think engineers who are more organized, like [laughing] -
- JFJared Friedman
[laughing]
- CFCalvin French-Owen
... manager-ish, uh, and there's probably just a missing product to be built here. Uh, maybe, uh, something like Conductor-
- JFJared Friedman
Mm
- CFCalvin French-Owen
... uh, where it's, like, spread across all of your sessions and kind of reminding you, like: "Hey, you were working on this thing. It's done. It needs your input here. Oh, you should switch your attention over to this other thing." I think that is where the Conductor-
- GTGarry Tan
Oh, Conductor should add that.
- JFJared Friedman
Yeah.
- CFCalvin French-Owen
Yeah.
- JFJared Friedman
Yeah, like, uh, context management for agents, but, like, we also need context management for humans. [chuckles]
- CFCalvin French-Owen
Yes, 100%.
- GTGarry Tan
Yes. [chuckles]
- CFCalvin French-Owen
Yeah.
- JFJared Friedman
Yeah. [chuckles]
- CFCalvin French-Owen
I mean, I want- like, when I wake up every day, it kind of is like: "Hey, here's all the work that got done overnight. Like, here are the, like, three decisions that you need to make. Here are, like, areas of deep thinking that you were planning to do." Like, I want the turn-by-turn for my day, you know? Other things that make it very useful, like, if you're able to build, um, I don't know, some sort of, like, quick prototype for an idea to show it off, like, that's an area... I mean, obviously, the agents do super well at this. Um, I would find myself at OpenAI often writing kind of like prototype code or like: "Hey, I've got this, like, in-memory key value store. Can you now turn it into, like, uh, work with a production database or something like that?" Being able to concisely specify ideas and code. And then I think having a smell for what the right architecture is, is still the area where the models, like, don't do the best job.
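The "in-memory key value store, now make it work with a database" hand-off might look like this in Python. It is an illustrative sketch, not code from the conversation: the `KeyValueStore` interface and both class names are hypothetical, and SQLite stands in for a production database purely so the example is self-contained.

```python
import sqlite3
from typing import Optional, Protocol


class KeyValueStore(Protocol):
    """The small interface the prototype and the real backend both satisfy."""
    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str) -> None: ...


class InMemoryStore:
    """The quick prototype: a dict behind the interface."""
    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


class SqliteStore:
    """Same interface backed by a database (SQLite as a stand-in)."""
    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )

    def get(self, key: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def set(self, key: str, value: str) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
            )
```

Writing the prototype against a narrow interface is what makes the request easy to specify in a few words: the agent only has to supply a second implementation of the same two methods.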
- SPSpeaker
So if you were going back to your, like, college days and studying CS again fresh, and you, like, were picking your own, like, syllabus or curriculum, like, what would you, what would you study?
- CFCalvin French-Owen
Personally, I think still understanding systems, uh, is very important. Um, and just having some conception of, like, how, like, Git works, you know, or like HTTP or databases, like queues, like all of these different systems. I think that those fundamentals are still quite important. The other thing that I'd probably do is just have a semester where, like, each week you're just building something. [laughing]
- SPSpeaker
[laughing]
- GTGarry Tan
[laughing]
- CFCalvin French-Owen
And you really try and push the models as far as they can go. There's a sense that you have whenever you're doing something, that you could always just, like, go up the layer and ask the model to do it, and, like, go up a layer and ask the model to do it. You know, where it's like, oh, I have, like, an implement command, where it, like, implements the next phase of the plan, but then I could have, like, an implement all command, and it, like, goes stage by stage and creates a new sub-agent. And then I could have, like, a check your work kind of thing and, like... And I think knowing where the models can and can't accomplish that is such a moving target-
- SPSpeaker
Yeah
- CFCalvin French-Owen
... that it's worthwhile just to, like, tinker a lot.
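[Editor's sketch] The layering Calvin describes (an implement command, an implement-all command that spawns a new sub-agent per stage, and a check-your-work step) could look roughly like the Python below. This is a minimal illustration, not how Claude Code or Codex implement it: `Agent`, `implement`, `implement_all`, `check_work`, and `make_fake_agent` are all hypothetical names, and the "agent" is a stub function standing in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for a model call: takes a prompt, returns text.
Agent = Callable[[str], str]


@dataclass
class PhaseResult:
    phase: str
    output: str
    approved: bool


def implement(phase: str, agent: Agent) -> str:
    """Layer 1: implement a single phase of the plan."""
    return agent(f"Implement this phase: {phase}")


def check_work(phase: str, output: str, reviewer: Agent) -> bool:
    """Layer 3: ask a separate agent to review the output."""
    verdict = reviewer(f"Review phase '{phase}':\n{output}\nReply PASS or FAIL.")
    return verdict.strip().upper().startswith("PASS")


def implement_all(plan: List[str], make_agent: Callable[[], Agent]) -> List[PhaseResult]:
    """Layer 2: go stage by stage, creating a fresh sub-agent for each phase."""
    results: List[PhaseResult] = []
    for phase in plan:
        output = implement(phase, make_agent())            # fresh worker per phase
        approved = check_work(phase, output, make_agent())  # fresh reviewer
        results.append(PhaseResult(phase, output, approved))
        if not approved:
            break  # stop here and hand the decision back to the human
    return results


def make_fake_agent() -> Agent:
    """Stub agent so the sketch runs without any model API (assumption)."""
    def agent(prompt: str) -> str:
        if prompt.startswith("Review"):
            return "PASS"
        return f"done: {prompt.splitlines()[0]}"
    return agent


results = implement_all(["schema", "api"], make_fake_agent)
```

Where to stop stacking layers is the part that, per the conversation, keeps moving as the models improve.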
- GTGarry Tan
I mean, the other thing that's really, really crazy for... I mean, I would love to be able to teach 18 to 22-year-olds. Like, everyone around, like, at this table has, like, shipped stuff that people really, really want and love. So it's like, how do we teach people that?
- 26:27 – 29:58
Will the next generation have even better taste and multitasking ability?
- SPSpeaker
like, the best 18 to 22-year-olds, like, five years from now will just have, like, off the charts taste in everything, because they'll just be so much more prolific. [laughing]
- CFCalvin French-Owen
[laughing]
- SPSpeaker
They should be, right? Like, they should just be launching and-
- GTGarry Tan
Yeah
- SPSpeaker
... like, touching reality, like, 10 times as much-
- GTGarry Tan
Yeah
- SPSpeaker
... as, like, the generation before them.
- CFCalvin French-Owen
The one thing I have wondered about on that note, um, I don't know if you all found this, but growing up, my mom used to tell me, like, "Oh, like, stop multitasking. You're not paying attention to, like-
- SPSpeaker
[laughing] yeah
- CFCalvin French-Owen
... what I'm doing." Uh, and I think there is some truth to that. Like, often I would be, like, off on my computer, like, not paying attention. But I do think I was l- legitimately better at multitasking than our parents were. Uh, and now I look at this new generation, and I think they're actually quite a bit better at multitasking than we are, you know? 'Cause they've kind of grown up in this age of the internet, and they're dealing with, like, TikTok and all these, like, different short form video and things. Like, it seems like there's room for both kind of this, like, deep thinking, where you wanna, like, notice what you're seeing, and understand, and problem solve, but then there's also this mode of just, like, bounce between a bunch of different things and you're context switching constantly. I feel like-
- SPSpeaker
The ADHD mode.
- CFCalvin French-Owen
Yeah.
- GTGarry Tan
Yeah.
- CFCalvin French-Owen
The new generation is quite good at this. [laughing]
- SPSpeaker
[laughing] Yes.
- CFCalvin French-Owen
Yeah.
- SPSpeaker
I definitely think there's a, there's a type of smart person, maybe it's ADHD, but just, like, always has, like, a bunch of good projects on the go, but just never actually finishes anything. I might relate to this personality a little bit. [laughing] Um-
- GTGarry Tan
Hey, you released your, uh-
- SPSpeaker
[laughing]
- GTGarry Tan
- your Vibe Code projects.
- SPSpeaker
Yeah, but I wouldn't... Only because of Claude Code.
- GTGarry Tan
Mm-hmm.
- SPSpeaker
That's kind of my point. Like, now I just think, like, you kind of, like, there's certain types of brains that just have, like, like-
- GTGarry Tan
Mm-hmm
- SPSpeaker
... 10 branches going in their head, but you never have enough hours in the day to actually, like, see any of them through, so they're always, like, half complete. And now it's just like, Claude Code gets you over the line with everything, and it's just like... And you made this point in your blog post about how it feels like a video game, but it's just like, there's just a constant novelty factor. Like, you start working on something, and usually when you hit the point of, like, "Ah, I'm like bored, and then I've got, like, this other better idea, and I should, like, start on that and then come back to this," like-
- CFCalvin French-Owen
Uh-huh
- SPSpeaker
... you can't do that now, but, like, everything can actually get finished.
- GTGarry Tan
Let's live in the future for a moment. It's 40 years from now. [laughing] Software still exists, databases still exist, access control still exists, but, like, at the core of it, I mean, software is entirely personal. Access control and who gets to do it is like, you know, sort of like this manager mode thing that people still have meetings about. But then everything else about a company, its functions, its rules, like, is defined by people just doing things in their own Claude Code-like thing. I don't know, maybe it's a CLI, or it's like, you know, having giant armies of workers then, I don't know. What would that look like?
- CFCalvin French-Owen
Like, imagine if every time a company signed up for Segment, you fork the code base, you give them their own copy-
- GTGarry Tan
Yeah. [laughing]
- 29:58 – 31:36
Maker vs manager schedules
- SPSpeaker
Something I'm curious to see is kind of like, what the updated version of the PG maker, uh, maker schedule versus manager schedule-
- CFCalvin French-Owen
Yeah
- SPSpeaker
... would look like.
- GTGarry Tan
Mm.
- SPSpeaker
'Cause I feel like part of what's going on at YC is sort of a lot of our jobs are essentially manager schedule-
- CFCalvin French-Owen
Mm-hmm
- SPSpeaker
... which has just really made it hard to do any sort of building your own software.
- CFCalvin French-Owen
Yeah.
- SPSpeaker
But now you totally can, and that's why, like, a bunch of the partners-
- GTGarry Tan
Yeah, you just do it in the meeting.
- CFCalvin French-Owen
Yeah. [laughing]
- GTGarry Tan
Like I... Like, right at the beginning of this podcast. [laughing]
- CFCalvin French-Owen
[laughing]
- GTGarry Tan
You let it run and then come back. [laughing]
- SPSpeaker
Well, like, in the pockets, right? Like, in, like... It just used to be, like, literally, unless you had, like, you know, four hours minimum block free to do something, it just wasn't worth even getting started, right?
- CFCalvin French-Owen
Mm-hmm.
- CFCalvin French-Owen
And I, I think that's actually goes very deep to how we've changed programming. Like, it used to be that in order to write any code, you had to fill your own context window with so much data about all the different class names, and the functions, and the code that it touches. It would take hours to build up that context window, and so doing it in 10-minute snatches was just, like, so frustrating.
- GTGarry Tan
I do think maybe one primitive for this future world will be, I think, the data models still need to be consistent-
- SPSpeaker
Mm-hmm
- GTGarry Tan
... and the system of record.
- CFCalvin French-Owen
Mm.
- GTGarry Tan
Ne- there's, there's opportunity for something that's kind of agentic first, because right now we're still kind of inte- integrated very much with databases and SQL or NoSQL queries at a very low level. But imagine something that generates all the data that you need for all the different views for custom software. So a lot of the world would be custom views, but I think the unified stuff, we still need to have data to be correct.
- CFCalvin French-Owen
I think data has a lot of gravity, and I, I think you see this with companies who are, like, offering access via API or MCP. Like, I think Slack, uh, locked down their API a little bit 'cause they didn't want people just exfiltrating everything from Slack and then building agentic experiences on top of it.
- 31:36 – 35:52
How would Calvin build Segment now?
- GTGarry Tan
I wonder, with that note, if you were to rebuild Segment in the current-
- CFCalvin French-Owen
Oh, yeah. [chuckles]
- GTGarry Tan
-with the current tools-
- CFCalvin French-Owen
Yeah
- GTGarry Tan
... how would it look like?
- CFCalvin French-Owen
I mean, Segment is a, a funny business in that, uh, where we started was building these integrations, right? Um, and so it's like, oh, you need to wire up, like, the same data going to, like, Mixpanel and Kissmetrics and Google Analytics, et cetera. And I think just writing that code now, like, that used to be maybe a more annoying or harder thing to do, and so it was worth paying for. Now it, like, that value has dropped to zero.
- GTGarry Tan
One shot. [chuckles]
- CFCalvin French-Owen
Yeah, and actually, like, in many cases, you're better off, like, saying, "Oh, I actually want to map it this way, and I want this specific behavior." Like, I will just tell the Claude or Codex what to do, and then it will do it, and I'll have exactly the behavior that I want. So I think that aspect of Segment, like, the value has dropped precipitously. I think the aspect of, like, keeping this data pipeline running and, like, continuing to automate a bunch of parts of your business, or, like, schedule these, like, email deliveries which should go out through Customer.io every time a customer signs up, or, like, manage audiences for you, like, that value is kind of still there, and I think you could do a lot more interesting things, where it's like, hey, if I have all this data and, like, a full view of the customer, like, how should I be emailing them? Should I change, like, parts of the product when they log in? Should I be giving them different onboardings, depending on who they are? Like, there's a lot more interesting stuff that you could do by basically running, like, I don't know, small LLM agents over them and changing that. That would be the changes I would make.
- GTGarry Tan
So it's kind of like moving up the stack, to your comment earlier, and all the way turtles down. The low-level stuff is gone-
- CFCalvin French-Owen
Yeah
- GTGarry Tan
... it's now really more doing things at the campaign level, which is way more abstract.
- CFCalvin French-Owen
Yes.
- GTGarry Tan
I mean, I'm amazed at to what degree, like, Claude Code, even just from, like, the context of what I'm working on, figures out, like, what my motivations are.
- CFCalvin French-Owen
Yeah. I, I, I'm still blown away by coding agents, because effectively, what you're doing is you're, like, giving them a copy of a repo, and then you're slipping a little note under the door and being like, "Hey, go implement this thing." [laughing] They have, like, no knowledge of, like, what your company is or, like, what you do, who your customers are. In most cases, maybe it's in the training set 'cause they know you're Garry, [laughing] you know?
- GTGarry Tan
[chuckles]
- CFCalvin French-Owen
But it blows my mind that it works at all. And, and that's where I think the context is really important, right? Because if it latches onto something that isn't quite right, it doesn't have a lot to go on, and if it misses something that's essential, it's going to just re-implement it.
- GTGarry Tan
What do you think the constraints are right now? I mean, like, context window is still a constraint, but it's, like, so big that, you know, it's like we can do some stuff. Like, we can't do the mega re-architectures-
- CFCalvin French-Owen
Yeah
- GTGarry Tan
... but we can do a lot. And then if the Opus 4.5 somehow got a lot smarter, uh, and then that unlocked a big thing, which was interesting. I don't... have no idea if that was, like, pre-training or post-training. Like, what- are there other, like, levers that you think of other than, you know, basic model intelligence, like frontier model intelligence and context window?
- CFCalvin French-Owen
I mean, I still think context window is, like, pr- probably the number one limit. Like, if you look at Claude Code executing, it's delegating to all these different context windows. At the end of the day, when each one comes back, it's, like, getting some sort of summary, so it's also not getting the full picture. Like, if you have a problem that's just, like, too big to fit in a single one, you, like, kind of no amount of compaction is going to help you. I would point to that as, like, both Anthropic has figured something quite useful out with delegating to these sub-context windows, but also, I think it's still a blocker.
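[Editor's sketch] The trade-off described here can be shown with a toy model: each sub-agent reads its material in its own window and hands back only a short summary, so the parent stays small but never sees the full picture, and a single task larger than one window fails outright. Everything below is an assumption for illustration: tokens are approximated as whitespace-separated words, "summarizing" is plain truncation, and `run_sub_agent` and `delegate` are hypothetical names, not part of any real tool.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SubAgentReport:
    task: str
    tokens_read: int  # what the sub-agent saw in its own window
    summary: str      # the only thing the parent ever sees


def count_tokens(text: str) -> int:
    # Crude assumption: one whitespace-separated word is one token.
    return len(text.split())


def run_sub_agent(task: str, material: str, window: int, summary_limit: int) -> SubAgentReport:
    tokens = material.split()
    if len(tokens) > window:
        # A problem too big for a single window cannot be fixed by
        # compacting the output afterwards.
        raise ValueError(f"task {task!r} needs {len(tokens)} tokens, window is {window}")
    summary = " ".join(tokens[:summary_limit])  # truncation stands in for summarizing
    return SubAgentReport(task, len(tokens), summary)


def delegate(tasks: Dict[str, str], window: int, summary_limit: int) -> Tuple[List[SubAgentReport], int]:
    """Run each task in its own window; return reports and parent token cost."""
    reports = [run_sub_agent(name, text, window, summary_limit) for name, text in tasks.items()]
    parent_tokens = sum(count_tokens(r.summary) for r in reports)
    return reports, parent_tokens


reports, parent_tokens = delegate(
    {"auth": "w " * 50, "billing": "w " * 80}, window=100, summary_limit=10
)
```

In this toy run the sub-agents read 130 tokens between them while the parent receives 20, which is both the benefit and the loss of detail being discussed.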
- GTGarry Tan
So we'd do better if we had a million co- a million-token con-
- CFCalvin French-Owen
I think so
- GTGarry Tan
... context every single time.
- CFCalvin French-Owen
Yeah, I think so, and, like, figure out a better way to especially train these, like, very long context trajectories. 'Cause if you think about it, like, there's, there's a lot of training data on the internet for, like, what is the next sentence that comes or, like, what's the next paragraph that comes? If you have 80,000 tokens that are generated, like, understanding [chuckles] what the next thing to do based upon, like, "Oh, I should refer to the 20,000 token," like, that's trickier. I think this, like, integration and orchestration is starting to become the limiting factor. I mean, I, I think there are, like, stuff on code review related to this. It's like, oh, if we're, like, merging all this code, like, who's watching it? Does a human still have to watch it? Like, how do we verify the changes?
- GTGarry Tan
Mm.
- CFCalvin French-Owen
And then I think, like, pulling in the context correctly from your tools, like, you were talking about Sentry. Like, you want Sentry to auto be able to, like, figure out a PR, you know? And then, like, maybe it pushes it to a subset of your traffic, and if it looks good, then it rolls out everywhere, you know? Like, all of that automation still has to be built.
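[Editor's sketch] The rollout loop Calvin describes (push a fix to a subset of traffic, and roll out everywhere only if it looks good) is a staged or canary rollout. A minimal version, assuming a hypothetical `error_rate_at` callback that reports the observed error rate at a given traffic fraction, might look like this. The stage fractions and the 1% threshold are arbitrary illustration values, not anything from Sentry or another real tool.

```python
from typing import Callable, Sequence, Tuple


def staged_rollout(
    error_rate_at: Callable[[float], float],
    stages: Sequence[float] = (0.05, 0.25, 1.0),
    max_error_rate: float = 0.01,
) -> Tuple[float, bool]:
    """Widen traffic stage by stage; roll back to 0% if any stage looks bad.

    Returns (final traffic fraction, whether a rollback happened).
    """
    for fraction in stages:
        observed = error_rate_at(fraction)
        if observed > max_error_rate:
            return 0.0, True  # unhealthy: pull the change from all traffic
    return stages[-1], False  # every stage was healthy


# A healthy change reaches full traffic.
healthy = staged_rollout(lambda fraction: 0.001)

# A change that only breaks under more load is rolled back at the 25% stage.
flaky = staged_rollout(lambda fraction: 0.001 if fraction < 0.25 else 0.2)
```

The automation still to be built, as he notes, is everything around this loop: producing the PR from the error report and measuring health reliably.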
- 35:52 – 38:52
The importance of testing
- GTGarry Tan
I was surprised how important testing was.
- CFCalvin French-Owen
Yeah.
- GTGarry Tan
Like, I was operating for, like, the first two or three days of my nine days in the wilderness, [chuckles] like, uh, no tests or very few tests. And then one day I was like, "All right, today's refactor day. I'm gonna do- get to 100% test coverage," and then I just sped up like crazy.
- CFCalvin French-Owen
Mm-hmm.
- GTGarry Tan
'Cause it was like, "Oh, it did it. It works." I rarely even have to necessarily manually test, 'cause it's like the test coverage is so good, and, like, nothing breaks.
- GTGarry Tan
Which is very similar to what all the companies are doing just for prompt engineering, outside of coding, is very much test-driven development. I think we had this episode with Jake Heller, and that was a big paradigm shift. It's like the way you get a good prompt is all test-driven, just like evals, right? In a sense, the test cases are your evals.
- GTGarry Tan
There are some broken flows now. I think that you m- we might need a, uh, Claude Code that could, like, talk to a Stack Overflow that was like a Claude Code Stack Overflow.
- CFCalvin French-Owen
Ah!
- GTGarry Tan
Ah. [chuckles]
- CFCalvin French-Owen
Yeah.
- GTGarry Tan
Like, I had this problem, it was so crazy. Like, I'd, uh, instead of using in the, in, like, the priority of a job queue, I used... Or actually, I didn't even write- again, I did not write this.
- CFCalvin French-Owen
Yeah, yeah. [chuckles]
- GTGarry Tan
The machine wrote a string with a comma, thinking that it would take that syntax, but it was expecting, like, an array in JSON, and then it just, like, no jobs would run. And then I watched it for, like, 30 minutes, walk through the internals of Rails job- [laughing]
- CFCalvin French-Owen
[laughing]
- GTGarry Tan
-like the active job, like a couple thousand lines of code, like trying to debug what was happening, and it found the bug, [chuckles] actually. And I was like: That's amazing! I just think about what I would do, like, ten years ago, and I would've been like: "Hey, why are the, you know, jobs not working?" And then I would find a Stack Overflow or blog po- a Rails blog post, and it's like, oh yeah, like, nobody fixed that stupid bug where, you know, you think that you can put a, you know, comma delimited string in there, but actually you have to make sure it's an array.
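[Editor's sketch] The class of bug Garry describes, a comma-delimited string passed where a list was expected so that nothing matches and no jobs run, is easy to reproduce in miniature. The Python below is a toy worker, not Rails or Active Job code: `runnable_jobs`, `strict_queues`, and the job names are hypothetical and only illustrate the failure and a guard of the kind a regression test would pin down.

```python
from typing import Dict, List, Sequence, Union

JOBS = [
    {"name": "send_email", "queue": "high"},
    {"name": "resize_image", "queue": "default"},
]


def runnable_jobs(jobs: List[Dict[str, str]], queues: Union[str, Sequence[str]]) -> List[str]:
    """Toy worker: a bare string is treated as ONE queue name, not split on commas."""
    wanted = [queues] if isinstance(queues, str) else list(queues)
    return [job["name"] for job in jobs if job["queue"] in wanted]


def strict_queues(queues: Union[str, Sequence[str]]) -> List[str]:
    """Guard: fail loudly instead of silently matching nothing."""
    if isinstance(queues, str) and "," in queues:
        raise TypeError(f"expected a list of queue names, got string {queues!r}")
    return [queues] if isinstance(queues, str) else list(queues)


# The bug: one queue literally named "high,default" matches no job.
silent_failure = runnable_jobs(JOBS, "high,default")

# The fix: pass a real list.
fixed = runnable_jobs(JOBS, ["high", "default"])
```

Nothing raises in the buggy call, which is why it takes reading the internals to find; the guard turns the silent empty result into an immediate error.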
- CFCalvin French-Owen
Uh-huh. Uh-huh.
- GTGarry Tan
I was like, "Oh, my God!" Like, that was re- very funny, actually. I think that's, like, one of the hardest parts about thinking about what's going to happen here, 'cause, uh, there's, like, things that you would do as a human in a CLI right now, and, like, that's very obvious. But even that idea of, like, should the agents have their own Stack Overflow? Like, if you just increase the intelligence by, you know, I don't know what you even call it, like, by 10 IQ points-
- CFCalvin French-Owen
Mm.
- GTGarry Tan
-like, 10 virtual IQ points, like, would it even do that? It would just be like, "Oh yeah, that's a string, whatever."
- CFCalvin French-Owen
Yeah, yeah. I think there's something very interesting here around, like, agent memory. Um, and Claude Code has sort of set itself up, and I think Codex too, by storing all your conversation history just as files. So you could imagine, you, like, give it access to a tool that then can read previous conversation history. I think there's a missing piece around a lot of collaboration there. Like, it'd be amazing if, like, there was some way of smartly sharing your coworkers' prompts, and you could see and be like, "Oh, like, I hit this thing, but actually, like, Brian over there, like, fixed it earlier," you know? "So, like, the two of us can share knowledge." I, I think there's something- you're onto something with this, of like, a model-generated, like, wiki, you know, or, like, Grokipedia style.
- SPSpeaker
Now I can't stop
- 38:52 – 40:10
The Claude bots are talking (to each other!)
- SPSpeaker
thinking about... Have you seen, um, the Claude bot social net-
- CFCalvin French-Owen
[laughing]
- SPSpeaker
-like, the network for Claude bots to talk to each other? [laughing] And it's like-
- GTGarry Tan
Oh, what's that like?
- CFCalvin French-Owen
Yeah. Oh, yeah.
- SPSpeaker
Yeah.
- CFCalvin French-Owen
That's the evolution for Molten Bot.
- SPSpeaker
Yeah, but I guess for those who don't know, Claude bot's essentially like, um, uh, like your own personal AI agent that you can run on your own machine. You can download it. Um, do not give it access to emails, would be my number one, [chuckles] piece of advice, or probably anything, um, 'cause it's not clear how safe it is, and it's probably almost certainly getting... Probably a lot of people are being prompt injected by it right now. But somebody created, um, sort of like a webse- I haven't actually seen it, but I've just, like, seen it on Twitter.
- GTGarry Tan
Yeah.
- SPSpeaker
But, like, a site where, like, everyone can sort of spin up their own, like, Claude bot, their personal agent, and then the agents can talk to each other. And now there's just, like, all this AI-generated content of these, like, personal AI agents talking to each other.
- CFCalvin French-Owen
Yeah. I mean, it looks like Reddit- [chuckles]
- SPSpeaker
[chuckles]
- CFCalvin French-Owen
-but if Reddit were run by agents. I mean, it's interesting to see, like, Codex's personality shine through when writing code, I would say. Uh, it does most stuff that humans don't do, kinda in this AlphaGo sense, where it's like, oh, it'll write a Python script to, like, modify some part of the file system. I think that is, like, very interesting and kind of alien behavior, which has been- A programmer would not do that. [chuckles]
- GTGarry Tan
-taught and learned.
- CFCalvin French-Owen
Yeah. [chuckles] Um, but it does give these, like, superhuman results, for me at least, when debugging complex issues that I find Opus often misses.
- GTGarry Tan
What's an
- 40:10 – 43:00
Examples of complex issues, how will the tools evolve
- GTGarry Tan
example of a complex issue that you could talk about? I mean-
- CFCalvin French-Owen
Um-
- GTGarry Tan
-it's like concurrency or naming issues, right?
- CFCalvin French-Owen
[laughing]
- GTGarry Tan
Yeah.
- CFCalvin French-Owen
I find the models are actually, like, decent at concurrency. Oftentimes, there's stuff where it's like, oh, there's a request that is, like, traversing several different services. Uh, I mean, kinda to your point about the, uh, serialization and deserialization of, like, stuff with commas in it.
- GTGarry Tan
Mm-hmm, mm-hmm.
- CFCalvin French-Owen
Um, it's like, oh, it needs to track some sort of complex behavior around those or, like, way of, uh, I don't know, refreshing complex UI state, and Opus often will miss it if there's many files, but Codex seems to catch it.
- GTGarry Tan
Interesting.
- CFCalvin French-Owen
Yeah.
- GTGarry Tan
Yeah, prognostication about how will tools continue to evolve? It's a very interesting... Like, I feel like sort of a new citizen in this land-
- CFCalvin French-Owen
Yeah, [chuckles] yeah
- GTGarry Tan
... in a way.
- CFCalvin French-Owen
Yeah.
- GTGarry Tan
Like I just, you know, knew it was happening. I'd, you know, manager schedule. Finally, a project appeared, and it was like, "Oh, I'm gonna go all in on this."
- CFCalvin French-Owen
Uh-huh.
- GTGarry Tan
And then now I'm, like, in... It's like, uh, I'm a stranger in a strange land.
- CFCalvin French-Owen
Uh. [chuckles]
- GTGarry Tan
But it, but it, like, resembles exactly what I remember.
- CFCalvin French-Owen
I think we all feel-
- GTGarry Tan
It's just more awesome.
- CFCalvin French-Owen
We all feel that way.
- GTGarry Tan
Yeah.
- CFCalvin French-Owen
Like, I think, I think the most important thing is just to keep tinkering, because it all changes every few months. I do feel like the best- or the people who will get the most out of coding agents in the future are going to be kind of, like, more manager-like, where they're focusing on directing flows in certain ways. They're probably gonna be a little bit more like designer artists, in some ways, where it's like they're figuring out what specifically goes in the product and what stuff you can do without. And I think they'll be very good at just, like, continuing to think about automation and where they're missing context.
- GTGarry Tan
I guess what's funny is I tried to use Codex just now-
- CFCalvin French-Owen
Uh-huh
- GTGarry Tan
... uh, for my Rails project.
- CFCalvin French-Owen
Yeah.
- GTGarry Tan
But the thing is, like, it's kind of obvious that nobody at OpenAI cares about Rails, which is fine.
- CFCalvin French-Owen
[laughing]
- 43:00 – 45:59
Outro
- CFCalvin French-Owen
terms of the data mix and the researchers who are working on it. I think Anthropic is focused a little bit more on, like, some of the front-end things. Um, and I don't know, in terms of like, a Ruby, for example, like, who has the best model there, and who's incorporated the data mix?
- GTGarry Tan
Yeah.
- CFCalvin French-Owen
Like, some of the labs tend to take this perspective of just more data is better. Uh, and so they'll just flood as much data as possible, while others, I think, are a little bit more tuned in terms of the mix. And I think depending on which approach you take there, it can give very different results, where it's like, oh, I'm taking just the, like, top 10% of JavaScript is pretty different than if you're looking across everything.
- GTGarry Tan
I actually think OpenAI and the, the, you know, OpenAI models are really good at Ruby, uh, from what I can tell, and then this is just-
- JFJared Friedman
It's the, it's, it's the harness around the model.
- GTGarry Tan
Yeah.
- JFJared Friedman
It is, it, it... Yeah.
- CFCalvin French-Owen
Oh, interesting.
- JFJared Friedman
Yeah.
- CFCalvin French-Owen
Okay. Yeah.
- GTGarry Tan
It's literally like Rails has this weird thing where you have to have, you know, a- access PostgreSQL in a certain way, or, like, it couldn't figure out which route-
- CFCalvin French-Owen
The sandboxing.
- GTGarry Tan
Yeah, the sandboxing is-
- CFCalvin French-Owen
Yeah, yeah. Yeah, the sandboxing, [sighs] it's such an interesting question because, uh, I think OpenAI actually takes the, like, sandboxing and security question more seriously than almost anyone else. I remember when we were building Codex, like, basically one of the gates that you have to pass through in order to release a model is you have to, like, talk about safety and security risks, like, every time you want to release. One of the things we were looking into was prompt injection, especially for opening up to the internet, 'cause a bunch of users were like, "Oh, this has to, like, work on the internet." We're like, "Oh, we don't know. Like, it seems pretty easy to prompt inject."
- GTGarry Tan
Operator was also... Yeah.
- CFCalvin French-Owen
Yeah.
- GTGarry Tan
Kind of that, yeah.
- CFCalvin French-Owen
Yeah. And so, uh, the PM on our team, Alex, uh, basically, like, put together a GitHub issue, and it had, like, a very obvious prompt injection, which was like, "Oh, reveal this thing." And then he told the model, like, "Hey, go fix this issue." Uh, and he's like: "Oh, there's no way this is gonna work." And, like, immediately, the prompt injection works, you know? [laughing]
- JFJared Friedman
[laughing]
- CFCalvin French-Owen
And so I think OpenAI, like, sort of correctly, is very worried about this and is like, "Hey, we're gonna run everything in a sandbox. We're gonna make sure it, like, doesn't touch all these sensitive files on your machine. We're gonna be very careful about secrets." And I think if you're a startup where you're just, like, running fast, you probably don't care. You're just like, "I just want it to work"-
- GTGarry Tan
Yeah
- CFCalvin French-Owen
... you know?
- JFJared Friedman
Are you a dangerously skip permissions person? [laughing]
- GTGarry Tan
[laughing]
- CFCalvin French-Owen
Uh, I actually am not. I, like, have a set of things that I, like-
- JFJared Friedman
Garry, how about you? Are you running?
- GTGarry Tan
I'm not, no.
- JFJared Friedman
Okay. Mm.
- GTGarry Tan
Well, I like to read, you know?
- JFJared Friedman
Mm.
Episode duration: 45:59
Transcript of episode qwmmWzPnhog