No Priors Ep. 47 | With Sourcegraph CTO Beyang Liu

Coding in collaboration with AI can reduce human toil in the software development process and lead to more accurate, less tedious work for coding teams. This week on No Priors, Sarah talked with Beyang Liu, the co-founder and CTO of Sourcegraph, which builds tools that help developers innovate faster. Their most recent launch was an AI coding assistant called Cody. Beyang has spent his entire career thinking about how humans can work in conjunction with AI to write better code. Sarah and Beyang talk about how Sourcegraph is augmenting the coding process in a way that ensures accuracy and efficiency, starting with robust, high-quality context. They also consider what the future of software development could look like in a world where AI can generate high-quality code on its own, and where that leaves humans in the coding process.

Chapters:
0:00 Beyang Liu's experience
0:52 Sourcegraph premise
2:20 AI and finding flow
4:18 Developing LLMs in code
6:46 Cody explanation
7:56 Unlocking AI code generation
11:00 Search architecture in LLMs
16:02 Quality assurance in the data set
18:03 Future of Cody
22:48 Constraints in AI code generation
30:28 Lessons from Beyang's research days
33:17 Benefits of small models
35:49 Future of software development
42:14 What skills will be valued down the line

Sarah Guo (host) · Beyang Liu (guest)
Jan 18, 2024 · 46m · Watch on YouTube ↗

EVERY SPOKEN WORD

  1. 0:00–0:52

    Beyang Liu’s experience

    1. SG

      (instrumental music plays) Hi, listeners, and welcome to another episode of No Priors. This week, we're talking to Beyang Liu, the co-founder and CTO of Sourcegraph, which builds tools that help developers innovate faster. Their most recent launch was an AI coding assistant called Cody. We're excited to have Beyang on to talk about how AI changes software development. Welcome.

    2. BL

      Cool. Thanks, Sarah. It's great to be on. Thanks for having me.

    3. SG

      Yeah. So you guys founded Sourcegraph all the way back in 2013, right? I feel like I met you and Quinn at GopherCon either that year or the year after. Do you remember?

    4. BL

      Uh, yeah, I think that's right. We met at one of those, like, after-conference events. And I remember you asked me a bunch of questions about developer productivity and code search and what we were doing back then.

    5. SG

      Many listeners to the podcast are technical, but can you describe the core thesis of the company?

  2. 0:52–2:20

    Sourcegraph premise

    1. BL

      Quinn and I are both developers by background. We felt that there was kind of, like, this gap between the promise of programming, being in flow and getting stuff done and creating something new, that everyone experiences. It's probably the reason that many of us got into programming in the first place, the joy of creation. Then you compare that with the day-to-day of most professional software engineers, which is a lot of toil and a lot of drudgery. When we drilled into that, you know, why is that, I think we both realized that we were spending a lot of our time reading and understanding the existing code rather than building new features, 'cause all that is a prerequisite for being able to build quickly and efficiently. And that was a pain point that we saw again and again, both with the people that we collaborated with inside the company we were working at at the time, Palantir, as well as a lot of the enterprise customers that Palantir was working with. So we were kind of getting dropped into large banks and Fortune 500 companies and building software, embedded with their software teams. And if anything, the pain points they had around understanding legacy code and figuring out the context of the code base so they could work effectively were 10X, 100X the challenges that we were experiencing. So it was partially scratching our own itch and partially like, hey, the pain we feel is reflected across all these different industries trying to build software.

    2. SG

      Yeah, and we're gonna come back to context and how important it is for, um-

    3. BL

      (laughs)

    4. SG

      ... using this generation of AI. But I want to go,

  3. 2:20–4:18

    AI and finding flow

    1. SG

      actually go back to some roots you have in thinking about AI, and your internship at the Stanford AI Research Lab way back when.

    2. BL

      Yeah.

    3. SG

      Uh, like, that wasn't the starting point for Sourcegraph. It was more like, "Oh, we need, like, super grep," right? Like, we just need a version of search that works in real environments and is useful for getting to flow.

    4. BL

      Yeah.

    5. SG

      When in the story of Sourcegraph did you start thinking about how advancements in AI could change the product?

    6. BL

      My first love in terms of computer science was actually AI and machine learning. That's what I concentrated in when I was a student at Stanford. I worked in the Stanford AI Lab with Daphne Koller. She was my advisor. Mostly doing computer vision stuff in those days. And it was very different then. We're now living through the neural net revolution. You know, we're well into it. It's just neural nets everywhere. Back then, it was still kind of the dark ages of neural nets, after the first initial successes they had in the late '80s and '90s doing OCR. But after that, the use cases sort of petered out. And by the time I was doing it, the conventional wisdom, the thing that they told us in machine learning 101, was like, "You know, neural nets were this thing that we tried a decade or so ago, but it didn't really pan out. So these days, we're mostly focused on graphical models and statistical learning techniques. You know, really trying to be explicit about modeling the probability distribution of what we're trying to represent."

    7. SG

      We actually had Daphne and one of her other former students, Lukas Biewald, now of Weights & Biases, on the podcast as well. And both of them were also, like, lamenting the dark ages, when, like, neural-

    8. BL

      (laughs)

    9. SG

      ... nets were, like, this weird niche thing, and, "We're gonna work on graphical models instead." But it's very cool to see so many people who have, like, an interest and technical passion in this emerge at the other end and be like, "Aha. Now is the time." So, at what point were you like, "Okay, I'm gonna look at this and we're gonna try to

  4. 4:18–6:46

    Developing LLMs in code

    1. SG

      work on it at Sourcegraph"?

    2. BL

      Yeah, it's great. It really feels like a homecoming of sorts. And we're very fortunate that a lot of the underlying skillsets, I think, transfer pretty well. I mean, it's all linear algebra and matrix operations underneath the hood, and that stuff is still applicable. And a lot of the intuitions, like the value of sparsity and things like that, are still kind of applicable. I'm still waiting for the statistical learning and maybe some of the convex optimization stuff to reemerge. I wouldn't count it entirely out yet. I feel like the pendulum always swings back the other way. It swung away from statistical learning and convex optimization and graphical models for now. But I think they'll reemerge, especially as we try to get deeper into interpreting how and why neural nets and attention are as good as they are. But to answer your question, when did we start thinking about this at Sourcegraph? I want to say it was circa 2017, 2018 that we started to revisit some of this, because I think 2017 was when the attention paper came out and you started to see more applications of LMs in the space of code. I think TabNine was one of the earliest-

    3. SG

      Yep.

    4. BL

      ... to market there with LM-based autocomplete. I remember chatting with someone who had essentially implemented that on top of GPT-2 at the time. And it wasn't nearly as good as it is now, even then, you know, two or three years ago. But we ran some early experiments applying LMs, specifically embeddings, to code search, and that yielded some interesting results. Again, the quality wasn't at the point where we were ready to productionize it yet, but it was certainly enough to keep us going. I think things really picked up September or October of last year. It was a confluence of factors. One, our internal efforts just reached a level of maturity where we started being more serious, devoting more time to it. Second thing is I went on paternity leave, so I was able to step away from the day-to-day stuff a little bit, and that gave some time and room for experimentation. And then, of course, at the end of November, ChatGPT landed, and that changed the game for everyone. There was a ton of interest and excitement that really gave us a big kick to start exploring in depth the efforts that we already had underway.

    5. SG

      Awesome. And so explain what Cody is today.

    6. BL

      Cody is an AI coding

  5. 6:46–7:56

    Cody explanation

    1. BL

      assistant. It integrates into your editor, whether you're using VS Code or JetBrains. We also have experimental support for Neovim, and, as an Emacs user, Emacs support is on the way. We've also integrated it into our web application. So if you go to sourcegraph.com and go to a repository page, there's an Ask Cody button that allows you to ask high-level questions about that code base. And in terms of feature set, it supports a lot of the features that other coding assistants support: inline completions, high-level Q&A informed by the context of your code base, and specific commands, like generate a unit test or fix this compiler error, that are kind of like inline actions in the editor. And our main point of differentiation is that across that feature surface area, we augment the large language model that we're using underneath the hood with all the context that we can pull in through Sourcegraph, and through techniques that we have refined over the past decade building a really awesome code understanding tool for human developers.

    2. SG

      Okay. So you have said, and I think it is, like, a more interesting point of view now, that there is an

  6. 7:56–11:00

    Unlocking AI code generation

    1. SG

      argument that choosing and structuring large repo context is the key unlock for code generation and, like, AI code functionality. Can you explain how you guys approach it?

    2. BL

      Yeah. So in many ways, the context problem... So, you know, context, another word for it is retrieval-augmented generation. The basic idea... I mean, listeners of your pod are probably familiar with this, but just for the ones that are, you know, tuning in and-

    3. SG

      Yeah.

    4. BL

      ... and unfamiliar, the idea is that large language models get a lot smarter when they're augmented with some sort of context-fetching ability, the most common of which is typically a search engine. So there's a number of examples out there of doing this. Bing Chat is one example. Perplexity is another example. They're building Google competitors where they integrate the large language model with web search functionality. And fetching search results into the context window of the language model helps basically anchor it to specific facts and knowledge, which helps it hallucinate less and generate more accurate responses. We essentially do the same thing for code, using a combination of code search and also something we call graph context to pull relevant code snippets and pieces of documentation into the context window, in a way that improves code generation and high-level Q&A about the code. And so on the code search end, we're essentially incorporating the technologies that we've built to help human developers over the past decade. If you look at the core feature set of Sourcegraph, the bread and butter really is code search, which allows you to go from, you know, I'm thinking of a function or I'm thinking of an error message, to quickly pinpointing the needle in the haystack in a giant, giant universe of code. And then from there, it's sort of this walking the reference graph of code. So go to definition, find references, in a way that doesn't require you to set up a development environment or tangle with any build systems. It just all kind of works. The analogy there is, we want to make exploring and searching code as easy as it is to explore and search the web. That's a huge unlock for humans being able to take advantage of the institutional knowledge embedded in that data source. And it turns out those same actions, the code search and then walking the reference graph, turn out to be really useful for surfacing relevant pieces of context that you can then place into a language model's context window, which makes it much better at generating code that fits within the context of your code base, and also at answering questions accurately without making as much stuff up.
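
      To make the retrieval-augmented generation loop he describes concrete, here is a minimal sketch. Every name in it (Snippet, search_code, llm_complete) is a hypothetical stand-in, not Sourcegraph's or Cody's actual API:

      ```python
      from dataclasses import dataclass
      from typing import List

      # Hypothetical stand-ins for a code-search backend and an LLM call.
      @dataclass
      class Snippet:
          path: str
          text: str

      def search_code(query: str, repo: str, limit: int) -> List[Snippet]:
          """Stand-in for a code search engine (keyword, embeddings, graph)."""
          raise NotImplementedError

      def llm_complete(prompt: str) -> str:
          """Stand-in for a hosted large language model."""
          raise NotImplementedError

      def answer_code_question(question: str, repo: str) -> str:
          # 1. Retrieve: find code snippets relevant to the question.
          snippets = search_code(query=question, repo=repo, limit=10)
          # 2. Augment: anchor the model to real code to reduce hallucination.
          context = "\n\n".join(f"# {s.path}\n{s.text}" for s in snippets)
          prompt = (
              "Answer using only the repository code below.\n\n"
              f"{context}\n\nQuestion: {question}\nAnswer:"
          )
          # 3. Generate: the model answers grounded in the fetched context.
          return llm_complete(prompt)
      ```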

    5. SG

      Actually, I'm very interested. Do you do both, let's say, traditional information retrieval approaches like ranking, along with AST traversal?

    6. BL

      Yeah.

    7. SG

      Or, like, is there information missing from the graph context that's also useful, either for your humans using search or for the models using search?

    8. BL

      Yeah, there's a ton of data sources. Let's start with the search side. The search problem is really like, "Hey, the user asked a question. Now find me all the pieces of code or pieces of documentation that could be relevant to answering that question." We really view that as a generalized search problem.

  7. 11:00–16:02

    Search architecture in LLMs

    1. BL

      It has a lot of parallels to end-user search, with the difference being that for human search, it's really important we get the, quote-unquote, right result in the top three. Otherwise, people will ignore it. Whereas with language models, you actually have a little bit more flexibility, because you have a context window of, these days, at least 2,000 tokens. In some cases, much longer, right?

      In terms of how you do that fetching, the overall architecture is very similar to how you would design a search engine. So you have a two-layered architecture. At the bottom layer are your underlying retrievers. The base case here would be just keyword search. Or, you know, the fancy way of saying that nowadays is sparse vector search, if you use the kind of one-hot encoding where ones correspond to the presence of certain dictionary words. Anyways, that's just keyword search. It actually works reasonably well. If you talk to a lot of RAG practitioners, you'll find that the kind of dirty secret is that keyword search can probably get you more than 90% of the way there. We'll talk about embeddings in a little bit. But on keyword search alone, there's a lot that we do. It's a combination of classic keyword search, combined with things that work well for code, like regular expressions and string literals. Also really important is how you index the data, meaning what you're treating as, quote-unquote, the document in your keyword search backend. We found that it's absolutely essential, if you're searching over code, to parse things, so you can extract specific functions and methods and classes, along with the corresponding docstring, and treat those as separate entities in your system, rather than indexing at the file level or trying to do some more naive chunking.

      So there's the keyword search. We also have an embeddings-based search, or dense vector search, where you basically run those same documents, those functions and symbols and code, through an LM, take out the kind of internal representation, the embeddings vector, and then do a nearest-neighbor search against that. There's a couple other techniques you can use to surface relevant context too, like matching filenames and things like that.

      Anyways, you have this basket of underlying retrievers, and the goal of the retrievers is to preserve 100% recall. So make sure you don't miss anything, but also get the candidate result set down to a size where you can use a fancier method to bump the really relevant stuff up in the context window. And that's where the second layer of the architecture comes into play: the re-ranking layer. Again, if you're implementing a search engine, this is how you do it, right? After your layer one has proposed all the candidates, you have a fancier re-ranking layer that would be too slow to invoke across the entire document corpus. But once you've scoped it down to a smaller set, you can apply the re-ranker, and the purpose of the re-ranker is really to bump the right result, or the most relevant results, up to the top. So, optimizing for precision over recall.

      So that's kind of like the general architecture of the search backend that powers Cody.
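
      The two-layer architecture he describes (a basket of cheap retrievers tuned for recall, then a more expensive re-ranker tuned for precision) might be sketched like this; the retrievers and the scoring function are simplified placeholders, not Sourcegraph's implementation:

      ```python
      from typing import Callable, Dict, List

      Retriever = Callable[[str], List[str]]  # query -> candidate documents

      def retrieve_candidates(query: str, retrievers: List[Retriever],
                              limit: int = 200) -> List[str]:
          """Layer 1: union every retriever's results (keyword/sparse search,
          embeddings/dense search, filename matching, ...). Optimize for
          recall so nothing relevant is missed, while shrinking the corpus
          to a candidate set a fancier method can afford to score."""
          seen: Dict[str, None] = {}  # insertion-ordered de-duplication
          for retriever in retrievers:
              for doc in retriever(query):
                  seen.setdefault(doc, None)
          return list(seen)[:limit]

      def rerank(query: str, candidates: List[str],
                 score: Callable[[str, str], float]) -> List[str]:
          """Layer 2: apply a scorer that would be too slow to run over the
          whole corpus but is affordable on a few hundred candidates.
          Optimize for precision so the most relevant code lands at the top
          of the context window."""
          return sorted(candidates, key=lambda d: score(query, d), reverse=True)
      ```

      The point he makes about keyword search getting you most of the way there is why it sits in layer 1: it is cheap enough to run over everything, and the expensive judgment is deferred to the re-ranker.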

    2. SG

      Awesome. Yeah, I think one of the things that I believe, and we believe at Conviction, is that people are gonna build pipelines that look like search pipelines attached to a large language model in many more domains.

    3. BL

      Yeah.

    4. SG

      And, like, you should treat that entire... Like, you guys are building a very sophisticated version here, having worked on search for a while. But the parts beyond the language model itself are quite important, for example, the embeddings model and your chunking strategy, and they're actually pretty data-specific.

    5. BL

      Yep.

    6. SG

      Right? We were just talking about this, and I think people are gonna end up with domain-specific and even fine-tuned embeddings models, from companies like Voyage or in-house, because I think there's a lot of headroom on performance there.

    7. BL

      Yep, absolutely. I think the Voyage folks are doing really interesting stuff, working on an embeddings model for code. We're kind of collaborating with them at the moment. They're a really smart set of folks. And I think you're absolutely right. There are so many components in these AI systems outside of the, quote-unquote, main language model that are really important. And really, what this comes down to is a data quality and data processing pipeline, which is something that people have realized for a long time, right? Your model architecture can only go so far if your data is garbage. So you really need a high-quality data pipeline, and that means not only having, in our domain, high-quality code that can serve as the underlying data, but also a way to structure that data so you can maximize your ability to extract signal from noise.
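
      One concrete piece of that pipeline is the chunking discussed earlier: parsing source so that each function or class, with its docstring, becomes its own indexed document rather than a whole file or a fixed-size chunk. For Python source alone, a rough sketch using the standard-library ast module could look like this (Sourcegraph indexes many languages, so its actual parsing is necessarily more general):

      ```python
      import ast
      from typing import Iterator, Tuple

      def chunk_python_source(source: str) -> Iterator[Tuple[str, str]]:
          """Yield (name, code) pairs, one per function or class definition,
          so each definition plus its docstring is indexed as its own
          document instead of indexing at the file level."""
          tree = ast.parse(source)
          for node in ast.walk(tree):
              if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                                   ast.ClassDef)):
                  code = ast.get_source_segment(source, node)
                  if code is not None:
                      yield node.name, code
      ```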

    8. SG

      Do you take into account the quality of code

  8. 16:02–18:03

    Quality assurance in the data set

    1. SG

      in this pipeline in some way? Because, you know, you're working on customer code bases. Like, if they're anything like the code bases I've interacted with-

    2. BL

      (laughs)

    3. SG

      ... like, there's a variance in quality, but that's the real world. So what do you mean by high-quality code here?

    4. BL

      I mean, we kind of implicitly do right now, because built into Cody is this notion of, like, which code is it referencing? It's gonna reference the code in your code base first, and that's probably the most relevant code if you're trying to work on day-to-day tasks in a private code base. We're probably gonna release a feature soon, something our customers have requested: basically, the ability to point Cody at areas of the code base that are better models of what good looks like. We talked with a lot of enterprise customers where, when we say, "Hey, Cody has the context of your code base, it will go and do a bunch of code searches when it's generating code for you," their initial reaction is like, "Uh, can I tell it to ignore-" (laughs)

    5. SG

      (laughs)

    6. BL

      "... large parts of the code? Because there's certain parts of the code where, like, yeah, those are anti-patterns, we're trying to, like, deprecate that or migrate away from that pattern." And we're like, "Yeah, absolutely. That's actually, like, a, a very easy thing to do at the, the, the search layer." And, and the nice part of, of this too is, um, when you're doing RAG, you can, you can be very explicit about the, the type of information, the type of data you're fetching into the context window. You basically, like, can give someone, like, a lever that they could turn on or off, or, like, a slider at query time that kind of controls what you tag in, uh, as context. So, you know, maybe sometimes you really do want the full code base as context when you're doing something like a completions or you're just trying to, you know, get something out the door. Other times maybe you want to be a little bit more thoughtful about what context you're attending to because, uh, you have a, a- another goal, uh, in mind. You know, not only do you want to ship the feature, uh, but you also want to uplevel the quality of your code or make it look more like some golden copy you have somewhere in your code base.

    7. SG

      You just mentioned completions, and then there's the other sort of

  9. 18:03–22:48

    Future of Cody

    1. SG

      user experience model that we've seen, which is chat, in terms of how people interact with code generation capabilities.

    2. BL

      Yep.

    3. SG

      Um, where do we go from here, right? Is it, like, agents? Is it more reliability? Like, what do you want to build Cody into?

    4. BL

      Yeah, so I think there's the short term and the long term to think about. In the short term, I think there's a ton more surface area in the developer inner loop, and kind of, like, human-in-the-loop use cases. We-

    5. SG

      Sorry, describe what you mean by inner loop.

    6. BL

      When you think about the software development life cycle, this kind of iterative cycle through which we build software, there's an inner loop and an outer loop. The outer loop is the entire ring of: you plan for a feature, you decide what you want to build, you go and actually implement the feature, you write the tests, you submit it to CI, you submit it to code review. And then, provided you pass all that, it's time to deploy it into production. Once it's in production, you've gotta observe and monitor it and react to any issues that happen along the way. So that's the outer loop. That sort of happens at the team level, or maybe the organizational level. The inner loop is the cycle that a single developer iterates on, potentially multiple times per day. And this is really the engine of how you iterate to something that is a working patch that actually delivers the feature. So in one invocation of the outer loop, there are many inner loops that you go through, because as a developer, unless you're, like, a superstar genius who's already written this feature before, on the first attempt at implementing a new feature you're gonna get a lot of stuff wrong. You're gonna figure stuff out along the way, you're gonna acquire more context and realize, oh, there's this other thing that exists that I should be using. And so it's that learning process that you wanna accelerate as much as possible. And if you look at the landscape of code AI today, the systems that are actually in production and in use, they're all inner loop tools. So anything that is in your editor, doing inline completions or chat, that's assisting you in the process of writing the code and accelerating your inner loop as a developer. And there's just a ton of opportunity there. We think of it mainly in terms of, beyond chat and completion, these specific use cases that represent forms of toil, or are just a little bit tedious or repetitive or non-creative, that we can help accelerate. And so we've broken those out into distinct use cases that map to commands in Cody. So there's a command to generate a unit test informed by the context of your code base, there's a command to generate docstrings, there's a command to explain the code, again pulling in context through the graph and through code search. Basically, these are like laser beams that allow us to focus on key pain points in the developer inner loop, things that disrupt you, slow you down, and maybe take you out of flow. Ton of stuff there. That's all near term. In the longer term, I think the vision that we and a lot of folks are working toward is: hey, can we get to the point where the system can write the feature itself, the code writes itself, so to speak? And the-

    7. SG

      An AI engineer, yeah.

    8. BL

      An AI engineer, exactly. The interface for that, the way we describe it, is: can you take an issue description, either a bug report or the description of a new feature that you want to add, and can your system generate a pull request, a change set, that implements the spec you provide, without human intervention or human supervision in the actual process of writing the code? And so in the long term, we are working towards that. I think we're still a ways from getting there. There will be a range of issues that can be supported, in terms of complexity, right? There are certain bugs and issues that, on the whole, are kind of a form of toil. No one wants to do them because it's busy work, even though it might be really important busy work, you know, like keeping your dependencies up to date and things like that. Those are probably the things we'll be able to completely automate first, and then we'll slowly work our way up towards more sophisticated features.

    9. SG

      Migrating database schema.

    10. BL

      (laughs) Yeah, exactly. There's probably a two-by-two you wanna draw here, between how tedious it is and how high-stakes it is. And, you know, we'll slowly try to migrate up into the upper-right quadrant.

    11. SG

      You don't trust my AI to do that yet?

    12. BL

      (laughs)

    13. SG

      Actually, I do want to talk about the constraints, 'cause, like,

  10. 22:48–30:28

    Constraints in AI code generation

    1. SG

      I've been thinking a bunch about this too. And, like, one, if you take inspiration from the iterative process of real humans writing code, I'm like, okay, there's pseudocode in my head and I'm going to test something and then I gotta, like, remember how something works.

    2. BL

      Yeah.

    3. SG

      There's now, within a small community of people working on this, an increasingly interesting vein of thought, which is like, okay, we're gonna invest more in, sometimes people call it system two thinking-

    4. BL

      Mm-hmm.

    5. SG

      ... or, you know, variations of test-time search, like, generate more samples, and because it is code, do different types of validation, right?

    6. BL

      Yeah.

    7. SG

      There's another school that's just like, make the model better, right? Like, we don't need any validation.

    8. BL

      Yeah, yeah, yeah.

    9. SG

      We just need more reasoning, right? I don't know if there are others that you think about, but, like, are those the right dimensions of constraint? Like, be more right in terms of what we show the end user, or just, you know, have the model be-

    10. BL

      Yeah, yeah, yeah.

    11. SG

      Yeah.

    12. BL

      So, I mean, just to restate what you said, I think that's a good way to slice it. Like, the two examples you mentioned were: is the approach to just integrate validation methods into the kind of chain of thought and execution, and maybe we can get by with, like, small, dumb models as long as there's a feedback loop, and work on-

    13. SG

      Or what we have today, right?

    14. BL

      Or what we have today.

    15. SG

      Yeah.

    16. BL

      And then another school of thought would be like, hey, we really need much smarter models that don't make the same sorts of, like, stupid mistakes as are made today. I think that's an interesting way to slice it. Another way to slice it, which has been top of mind for me, is: if your goal is issue to pull request, one way to do it is you could take a model of whatever size and basically decompose that task down into sub-tasks. So if you're trying to implement this complex feature: which files do you need to edit? What functions do you add to each file? And what unit tests do you need to validate that functionality? You can keep decomposing it 'til you're at the level where today's language models can solve that, and then you chain 'em together, right? So that's one way, kind of like break it down and then build it from the bottom up. The other way to do this is... I mean, you could just say the first way is wrong. Like, the first way is how humans do it. But, you know, it's not necessarily the case that the best way to do it for a machine is the way that humans do it. Another way you could do it is just say, "Hey, let's expand the context window of the model so it can attend to a large chunk of the existing code base," and then just ask it to generate the diff. And if that is reliable enough... it'll probably be unreliable, but if it's reliable enough such that it works 1% of the time, then you can just roll the dice 100 times. And as long as you have a validation mechanism, as long as it outputs the unit tests, which you can quickly review, then you just roll the dice 100 times, and chances are at least one of them will be correct, and that's the one you go with. That latter approach is the approach that papers like AlphaCode, or systems like AlphaCode, take when they're trying to tackle these programming-competition-type problems. The limiting factor in the first approach, the bottom-up approach, is what percentage of the time a single step in your whole process works. Because you're essentially rolling the dice n times, and if your success rate each time is, you know, 90%, then it basically decays to zero the longer your chain of execution is. The more steps that are required, the exponentially less likely you are to get all the way to full success. And right now, I think the fidelity of today's systems is far less than 90% for each step. So I think this is the issue that everyone building agents in that way is encountering: like, how can we-

    17. SG

      You have compounding failure. Yeah.

    18. BL

      You have compounding failure. And then you have a kind of similar issue on the other side of things, which is, if you're trying to do the AlphaCode thing, we've gotten that to work decently well for programming-competition-style problems. But building a new feature within the context of a large code base, if you try to zero-shot it, I think the number of times you'd have to roll the dice would be basically cost-prohibitive or time-prohibitive. For both approaches, I think context quality can play a key role, because what we found is, for Cody, for example, when our context-fetching engine works, the quality of code generated by Cody is like night and day. The ability of today's LMs to pick up on patterns in the existing code, understand what existing APIs are in use, pick up on, like, the testing framework that you're using, it's really, really good. And so it raises the reliability level up from "this is a complete dice roll, we definitely need to keep the human in the loop" to the point where you're like, okay, maybe if we improve this context-fetching engine just a little bit more, we can get to the point where we can start chaining these two-, three-, four-step workflows together into something that works. So I guess the short answer to your question, how do we get to more reliable agents: for us, the answer relies heavily on context quality, and on fetching the right context for the right use cases, quickly.
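
      The trade-off between the two approaches comes down to a few lines of arithmetic: chaining multiplies per-step success rates, while sample-and-validate only needs one of n independent rolls to pass. The rates below are illustrative, not measured:

      ```python
      # Decompose-and-chain: every step must succeed, so end-to-end
      # reliability decays exponentially with chain length.
      per_step = 0.90
      for steps in (1, 5, 10, 20):
          print(f"{steps:>2} chained steps: {per_step ** steps:.1%} end-to-end")
      # 20 steps at 90% each succeed end-to-end only ~12% of the time.

      # Sample-and-validate (the AlphaCode-style approach): one success out
      # of n rolls is enough, provided a validator (e.g. unit tests) exists.
      per_roll, rolls = 0.01, 100
      print(f"{rolls} rolls at {per_roll:.0%} each: "
            f"{1 - (1 - per_roll) ** rolls:.1%} chance of at least one success")
      # ~63%, workable only when generating and validating samples is cheap.
      ```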

    19. SG

      Yeah. I guess I have a lot of optimism if you look at this as just, like, an engineering problem, with a pipeline that has a bunch of different inputs, each of which you can improve from here, and you're doing trade-offs against improvement in any part of that pipeline. And that could include how we turn that natural language issue into something that a model can plan from or that we decompose, right? To what the context quality is to solve that, to what the efficiency trade-off is of, like, go sample new solutions from the language model versus what is the quality-

    20. BL

      Yeah.

    21. SG

      ... of your feedback from runtime evaluation? And there are different types of feedback you could get.

    22. BL

      Yeah.

    23. SG

      Like, I assume that, for any given level of language model quality, there's some optimal pipeline, and I think we're very far from that today, and then all of the dimensions are improving. So-

    24. BL

      Yeah.

    25. SG

      ... I still kind of think the AI engineer is gonna come sooner rather than later.

    26. BL

      Yeah, I'm optimistic. You see very promising signs, especially when the context engine works. And I think you raise an interesting point. It still is a bit of an open question. Maybe the question comes down to: this system, this AI engineer, how much of the architecture of that system is gonna be captured at the model layer, embedded in the parameters of some very large neural network, or something that looks like a neural network, versus how much of it is gonna be in, I guess, a more traditional software system, kind of a traditional, you know, boxes-and-arrows architecture? And my honest answer is, I'm not exactly sure. It's not like we don't have any model-layer stuff going on at all. It's certainly something that we're interested in. But I think our philosophy is, we always want to do the simplest thing, or what feels like the simplest thing, first.

  11. 30:28–33:17

    Lessons from Beyang’s research days

    1. BL

      I think, you know, when I was doing machine learning research, this was a principle that I took away, because doing the simple thing establishes a baseline. Oftentimes, doing the fancier thing is sexier, and certainly these days it's trendier, right? 'Cause you can kind of claim the mantle, like, "Ah, you know, I made my own LLM, a Beyang LLM, and I trained it on my own data."

    2. SG

      (laughs)

    3. BL

      Uh, "Now I'm- I have, you know, AI or, uh, ML street cred because I- I did something with the model layer." But the lesson I took away from- from my research days was really the importance of establishing baseline, because oftentimes if you do the fancy thing first, uh, you might have something that looks like a good result because it... you know, it's gonna work, uh, to some degree. Uh, but then someone else might come along and do a much dumber, uh, simpler thing.... Cheaper and one that, uh, can be improved more iteratively, and it's gonna work as well or better than, than your solution. There's, like, many examples of that. Uh, I think there was a... The most recent example that comes to mind was, um, there was some paper in Nature that was published where some, uh, research group trained, uh, a very large neural network to do, like, climate prediction. You know, a very important problem. Predicting the weather, it's very tough, right? Um, and the thought was like, you know, using the power of, uh, nat- uh, magic of neural networks, we could, we could actually train something to predict the weather. And lo and behold, you know, like, it yie- it generated good predictions and, and it was published in Nature. And then a year later, there was another paper that was published in Nature where, um, a- another research group, uh, trained a neural network for this exact same application, uh, but in this case, the neural network was, was one neuron (laughs) . It was (laughs) , i- it was literally just, like, a single, uh, aggregator and, and that, uh, uh, uh, performed as well as, as the, uh, gigantic, uh, neural net. So e- basically establish a baseline first and, and that, that was kind of like what informed our initial prioritization of, of RAG over fine tuning. It's not that we don't think that there's value in fine tuning, uh, or there's value in, uh, training at the model layer. It's that, you know, RAG helps you establish a baseline and I think you're still gonna wanna do RAG anyways. Like, even if you have fine tuned models in, in the mix, RAG is still sort of this, like, last mile data or, or context, um, and so you'll wanna do that anyways. So why not do that first and establish a baseline that will actually inform where you wanna invest in at the, the kind of training layer?

    4. SG

      I absolutely agree with that characterization. And I'd say, if you approach RAG first, you'll benefit from improvements at the model layer, internal or external, right?

    5. BL

      Absolutely.

    6. SG

      One question for you before we zoom out from some of the technical stuff. Does the offering of small models, like the

  12. 33:17–35:49

    Benefits of small models

    1. SG

      7B or 8x7B size, that are quite capable, I think, surprised a lot of people-

    2. BL

      Yep.

    3. SG

      ... from Mistral. Like, do small models that show higher-level reasoning change your point of view at all, or how you guys approach this?

    4. BL

      We're very bullish on small models, so we've actually integrated Mistral into Cody. You can use Mistral as one of the models in Cody chat, as of last week. And it's just amazing to see the progress on that side. I mean, there's a lot to like about small models. They're cheaper and faster, and if you can make them approach the quality of the larger models for your specific use case, then it's a no-brainer to use them. I think we also like them in the context of completions. The primary model that Cody uses for inline completions right now is StarCoder 7B, and with the benefit of context, that actually matches the performance of, you know, larger proprietary models. And we're just scratching the surface of what's possible there with context fetching right now, so we're very bullish on pushing that boundary up even further. And again, with a smaller model, inference goes much faster and it's also much cheaper, which means we can provide a faster, cheaper product to our users. What's not to like there? I think there is a question with the smaller models, specifically in the context of RAG, because there's been some research showing that the in-context learning ability of large language models is a little bit emergent. Like, it emerges at a certain model size, or maybe a certain volume of training data. And if you fine-tune a medium-size-ish model, sometimes it loses the ability to do effective in-context learning, because, I think the intuition is, it's devoting more of its parameter space to memorizing the training set, so it can do better rote completion rather than something that approaches general reasoning ability. So that's something that we're watchful for, and it does mean that in certain use cases, chat for instance, Cody still uses some pretty large models. And we have seen better results with models that have more of a general reasoning ability, because they're able to better take advantage of the context that's fetched in.

    5. SG

      We can't, at this time here, not make predictions, so one-

    6. BL

      (laughs)

    7. SG

      ... is just, you have thought about software development

  13. 35:49–42:14

    Future of software development

    1. SG

      and how to change it for literally a decade now, probably longer, since you had to think about it to start the company. What does it look like five years from now?

    2. BL

      That is a great question. Where my mind goes is... Well, I guess, to answer where software development will go in the next five years, maybe it's informative to look at how it's evolved over the past. There's a seminal work called The Mythical Man-Month, written in the '70s about software development, that today, oddly enough, despite all the technological changes, still rings very true. And the core thesis of that book is that software development is this strange beast of knowledge work that's very difficult to measure. The common mistake that people make, again and again, is to treat it as some sort of factory-style work, where commits or lines of code are kind of commodities, and the goal is just to ship as many of those widgets out as possible. Whereas anyone who's spent a month inside a software development org, working as an actual software creator, knows that there's such high variance in terms of the impact that a line of code can make. You have some features that eat up many lines of code and have very little business impact, and there are also one-line changes that can be game changers for the product you're building. And so, when I look forward at how software development is gonna change, I like to place it in the context of solving a lot of the challenges that that book called out in the '70s that still exist today. And I think the core problem of software development is one of coordination and visibility. To develop the volume of software that we need in today's world requires teams of software developers, often large teams, building complex systems, features that span many layers of the stack. And a lot of the slowness, a lot of the pain points, and a lot of the toil of software development comes from the task of coordinating human labor across all these different pieces, among many different people with different areas of specialization and different incentives at play. And I think the real potential of large language models, and AI more generally, is to bring more cohesion to that process. The gold standard is to really try to get a team of software developers to operate as if you were of one mind, you know, one really, really insanely intelligent, productive person, with a coherence of vision and a unity of goals and a clarity of focus. And there are a couple of ways in which AI can do that, specifically two. One is working from the bottom up, making individual developers more productive, such that more and more scope of software can be produced by a single human. If a single human brain is producing that software, then of course there will be more of a coherence of vision, because it's just you building everything, and you can ensure there's a consistency of experience and code quality there. The other way of doing this is giving the people responsible for the overall execution of a software engineering team, you know, the team lead or an engineering leader or director, visibility into how the code base is changing, actually helping you keep up to date with the changes that are happening across the area of code that is your responsibility.
I don't know of a single, you know, director- or VP-level engineering leader today who reads through the entire git commit log of their code base, because doing so would be literally so tedious and so time-consuming that you wouldn't have time for any of the other parts of the job that are critical as an engineering leader. But with the benefit of AI, I think we now have a system that can read a lot of the code on your behalf, summarize the key bits, and grant engineering leaders, at long last, the sort of visibility and transparency into how the system as a whole is evolving, so they can attend to the parts that need attention, and also make visible to all the other people on the team how things are evolving, so that everyone has the context of the overall narrative that you're trying to drive when you're shipping day to day and making changes to the code base.
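
      The commit-log digest he imagines could be wired up roughly like this; the git invocation is real, while summarize stands in for any LLM call and is hypothetical:

      ```python
      import subprocess

      def summarize(text: str) -> str:
          """Hypothetical stand-in for an LLM summarization call."""
          raise NotImplementedError

      def weekly_digest(repo_path: str, area: str) -> str:
          """Sketch: digest a week of commits touching one area of a code
          base, the kind of summary an engineering leader could actually
          read."""
          log = subprocess.run(
              ["git", "-C", repo_path, "log", "--since=1.week", "--stat",
               "--", area],
              capture_output=True, text=True, check=True,
          ).stdout
          return summarize(
              "Summarize the key changes in these commits for an engineering "
              "leader; flag risky or architectural changes:\n\n" + log
          )
      ```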

    3. SG

      I'm just gonna take this to its logical conclusion, Beyang. So, like, Brooks's Law from this book was that adding manpower to a late software project makes it later, right? So I think the future is just me and, like, you know, a Jira/Linear/Shortcut interface, a really good spec, and, like, one sprint later-

    4. BL

      (laughs)

    5. SG

      ... my engineer is done 'cause I didn't have to communicate with my team. That's it.

    6. BL

      Yeah. If your goal is to build software as it exists today, then yes. I think in the future, a single human will be able to build applications that today require large numbers of people to coordinate. On the other side of things, though, I think that the demand for software... we're nowhere close to reaching the demand for good, high-quality software. And I think human beings have a tendency to take any system or technology that we're given and push it to the limits, or stretch it as far as we can. So I think the other thing that's gonna happen is that our ambitions as a species for building complex, sophisticated software are gonna grow with the capabilities that we have. And so I still think we will have large teams of software developers in the future. It's just, each individual will be responsible for far more feature scope than they are today. And the system as a whole will be more sophisticated and more powerful. But people will still have to coordinate. (laughs)

    7. SG

      So, what do you think will matter in that,

  14. 42:14–46:36

    What skills will be valued down the line

    1. SG

      like, future, in terms of what software engineers need to know how to do, right? And the little bit of color I'll give you here is, we ran this hackathon early in the year for a bunch of talented undergrads who, like, are working on startups or had built really good machine learning demos or done interesting research or something. So they are people who are like, "I learned to code around AI tools," which is a wild idea to me.

    2. BL

      Yeah. Yeah. Yeah.

    3. SG

      Right? Like, "I started on Cursoras . It was my first IDE," or whatever. And a point of view that was a little surprising to me in I think, like, M- March of this year was, uh, like, "We just don't need to learn to code anymore," right? And I'm like, "Uh," like-

    4. BL

      (laughs)

    5. SG

      ... "How could you say that?" Like, you know, like-

    6. BL

      (laughs)

    7. SG

      ... "They don't even teach garbage collection anymore." Like, grumpy old man. Um, like-

    8. BL

      Yeah. Yeah. Yeah.

    9. SG

      ... w- uh, like, "Where's the CS fundamentals?" Like, what do you think people need to know? Like, what will be valued?

    10. BL

      So, my take on this, and, you know, here's the advice I would give to myself, or a younger sibling, or my child, if they were at that age where they're trying to determine what skills they should invest in: I think coding is still gonna be an incredibly valuable skill moving forward. I think, in the limit, the things that are gonna be valuable, that are gonna differentiate humans operating in collaboration with AI... If you think about the layers through which software delivers value, at the very top you have the product-level concerns, the user-level concerns, like, "How do I design the perfect user experience? How do I make this piece of software meet the business objectives that I'm trying to achieve?" And then you have, at the very bottom, the very low level: okay, what data structures, what algorithms, what specific things underneath the hood are happening that are gonna roll up to the high-level goals that I wanna achieve? And then you have a lot of stuff in the middle that is really just mapping the low-level capabilities that you're implementing to the high-level goals that you're trying to achieve. And I think what AI will do is compress the middle, because in the middle is really just a lot of abstractions and middleware and other things that are necessary today and require a lot of human labor to implement. It's more boilerplate-y, it's more tedious, repetitive, non-differentiating, it's more mechanical, but it's all necessary today, because you've gotta connect the dots from the high-level goals to the low-level functionality. But the actual creative points, the real linchpins around which software design turns, are really gonna be the high-level goals, what you're trying to achieve, and then the low-level capabilities. My maybe-a-bit-contrarian hot take here is that CS fundamentals, if anything, are gonna grow in importance. You know, the stuff you learn in a coding bootcamp, maybe that gets automated away, but the fundamentals of knowing which data structures to use, what their properties are, how you can compose them creatively into solutions that meet high-level goals, that is the creative essence of software development. And I think humans will have the ability to spend more time connecting those dots in the future, because they'll just need less time spent on that middleware piece. So I still think CS fundamentals are very important, and also domain expertise. If you're trying to build software in a given domain, really understanding what moves the needle in that domain is gonna be really important.

    11. SG

      Awesome. Beyang, I think we're out of time. It was a great conversation. Thank you so much for doing this.

    12. BL

      Thank you so much for having me. This was really fun. (instrumental music)

    13. SG

      Find us on Twitter @nopriorspod. Subscribe to our YouTube channel if you wanna see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way, you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com. (instrumental music)

Episode duration: 46:36
