
What legal agents inherit from coding agents: Lessons from Legora

Three patterns shaped Legora's legal-AI agent: what they could reuse from coding agents, what they had to translate, and what they had to invent. Hear Staff Software Engineer Jakob Emmerling share how Legora rebuilt document editing, linting, and bulk review around coding-agent principles and see the agent live.

May 21, 2026 · 28m

EVERY SPOKEN WORD

  1. SP

    [on-hold music]

  2. SP

    Please welcome to the stage Staff Software Engineer at Legora, Jakob Emmerling.

  3. SP

    [upbeat music] [audience applauding]

  4. SP

    Hi, I'm Jakob. I'm an engineer at Legora, where we are building collaborative AI for lawyers. And today, I wanna talk about how we learn from coding agents when building an agent to do legal work. But before we get into it, a bit of context about what we do at Legora. We're basically building, um, AI-based workspace for lawyers to do end-to-end legal tasks on. And we have over a thousand customers today, including some of the largest law firms in the world. We're originally founded in Stockholm, valued over five billion dollars now, and did kind of a record sprint from one to one hundred million in ARR. But that's not really why you should listen to me today. This here is. Six months ago, we had a realization of how we built agents and, um, realized that we need to do something different. And this all stemmed from this chart that probably everybody in this room here is very familiar with. We've all seen over the last years coding-- like, AI and coding going from bad autocomplete to good autocomplete, chatbots, agents, and background agents and beyond. And it doesn't look like it's gonna stop anytime soon. And we have heard talks this morning about new stuff in Claude Code and how organizations are leveraging AI to make software engineering more efficient, and it's just gonna, like, keep getting more and more important for our day-to-day work as software engineers. And, uh, the interesting part when looking at this is, and if, like, if you're building in any other vertical, you probably noticed a similar thing, is that other verticals outside of coding were actually quite behind looking back, like, six months ago. And, uh, this is a realization we had back then. And, uh, we thought, we thought a lot about, like, why this is and, like, what are the reasons coding is accelerating so fast? Why are agents so powerful for coding and not for legal work yet? And, um, we started, like, thinking about, like, in which ways is coding similar to legal work? 
And like, what are the analogies that we can draw to, to learn from it? And there's actually a lot of parallels, uh, once you start looking into it. For example, both are-- both coding and legal work are based heavily on prior work. Both lawyers and engineers work a lot with text-based documents. There are strict conventions within organizations and firms, and there's also, like, a strong review culture. For example, as engineers, we are reviewing each other's pull requests before code ships to production, or now also the pull request of agents. And lawyers do a similar thing where, like, an associate would draft a, a document, and before it goes out to any client, a partner will review it and sign off on it. And, uh, this is not something, like, magical about legal and coding and some unique combination of those two that leads us all this-- leads to all these parallels. There's actually-- Uh, there's a lot of parallels between any kind of knowledge work and coding. So if you, like, build in any other vertical, it's a very interesting exercise to do. And, uh, I mean, after looking at this, the obvious next question becomes: How do we make use of this? And like, how do we, how do we get in on coding agents getting better and better to make our vertical better? And it's basically-- There's basically three buckets how you can, how you can learn from coding agen-agents, as we found. First of all, there's stuff that you can reuse one-to-one. Stuff like to-dos, planning, sub-agents, sandboxes, human-in-the-loop. There's a lot of things that over time, coding agents figured out how to solve, like, a certain UX or how to make agents good at certain long-running tasks. And these things turned out to be pretty universal for agents in general. And it's also exactly the things that you get for free when using, like, the Anthropic Agent SDK or managed agents even. Next up is a bit more interesting. 
There's, uh, stuff you can translate, and these are things that look very similar in your domain to some, some sub-problem in coding. And, um, it's not as simple as just copying them. You kinda need to, like, look at the pattern of how coding solved something and translate that pattern to your domain. And we're gonna look at some examples, uh, of all of these, so it's gonna get a bit more clear, hopefully. And lastly, probably the most exciting part, because it's actually about, uh, building new stuff, is stuff you need to invent for your domain. And for us, that's stuff like grounding every answer in citations so lawyer c-- lawyers can verify where certain claims in, in the agent's output come from or working with large sets of documents for due diligence use cases. Okay, we're gonna walk through all of these different ones now after I grab a bit of water. First off, we're gonna look at planning mode and human-in-the-loop as an example for what you can reuse from coding agents. You all probably notice when you work on, like, a, a bigger, bigger task together with Claude Code or your coding agent of choice, you start, instead of just, like, firing off a prompt and let it rip for a few hours, you start off, like, planning, like, a detailed plan of how the work should be done. And this basically solves a few different things. First of all, you're exploring the problem together with the agent, you're gathering a lot of context, and you're also making upfront decisions so that then the agent doesn't need to make decisions for you when it actually executes the work. And yeah, it turns out this works exactly the same way in legal, if you want it to. Um, if there's any bigger legal task a lawyer would work at, now in Legora with our agent, um, they would just plan all the work beforehand, iterate on a plan, and when they're happy with all the decisions and assumptions made, they can let the agent execute on the plan. 
So it's literally just like a one-to-one translation of the UX that coding agents have established and found out to work very well for human-agent collaboration. Next up, a similar example, approval of tool calls or in general, like, dangerous actions. Like for example, in your coding agent, you don't want your agent to execute, like, random shell commands that are not sandboxed. Uh, you want it to, like, ask you, is, like, should I run this? Is this safe? And you answer yes or no, and then the agent a-acts accordingly. It's a similar thing in, in an agent working with legal stuff. You don't want it to randomly go and delete a bunch of important client documents. You want to be in the loop for certain actions. So same again as the established UX, and we just took it as it is and built it into our agent. And we don't-- And therefore, we didn't need to do the whole iteration loop of figuring out what is the perfect UX for this use case. We could just take what coding agents already figured out and apply to our domain. Okay, next up is gonna be a bit more interesting. Um, it's about document editing, and it's something where we needed to translate from the coding agent domain to our domain. To give a bit of context here, lawyers basically love working with Microsoft Word files. They spend a lot of their time in Microsoft Word drafting stuff and, uh, like redlining documents, reviewing documents, and going back and forth. So obviously, if you build an agent for legal work, this is something you need to solve very well. And, uh, for us, I mean, this is also something we solved quite a while back in some way, and that's what I want to look at firstly and walk you through how we solved this problem before, after we had this grand realization [chuckles] of doing it differently. Um, okay. Maybe also interesting to know why, why, like, DOCX editing is a bit more challenging than just editing plain text files. 
A DOCX file is basically a zip file of a bunch of XML files with a lot of metadata and a lot of noise in there. So it's not as simple as, like, editing a Markdown file. So what we did, uh, initially, uh, we had s-some kind of top-level agent handing off an editing intent to another reasoning model, and then that model got context about the current document that needs to be edited and a bunch of instructions on, like, high level what should be edited. And it then reasoned about, okay, which edits to make throughout the documents. So if you have, for example, like a fifty-page template that you wanna fill out, that model would then figure out, oh, I need to insert something on page one, page three, page five, page six. But it doesn't write out the full edits because that would be, like, a lot of tokens. And then the models used to get very lazy and just started, like, filling out the templates mid, um, mid, uh, document. So what we then did with these, like, individual, like, editing markers where we know, uh, in this place we need to insert something like this, in this place we need to insert something like this. We gave them to individual models and had them write out the full edits with, like, style information and also pay attention to what was before and after it to, like, synthesize it into the document. And, uh, this worked very well and solved a lot of the exhaustiveness problems, but it also brought a bunch of other problems. Because basically what you have with this kind of setup is you have a lot of individual LLM calls with independent reasoning, different context, different tool, uh, tools, and you just have, like, all these hand-off problems. So for example, if you add, like, a bunch of new tools and things your, your top-level agent can do, suddenly your agent starts handing off to this editing to, uh, like, this editing reasoning model and gives it some instructions about pulling context via this tool that this editing model doesn't even have. 
So you run into all these weird, weird issues, like, the more powerful your, your agent gets. And, uh, of course you want-- You don't wanna, like, limit how, how powerful your agent can be and, uh, limit in, in too many ways. So if we, uh, contrast that to what all the coding agents out there converge to, we see it's quite different. Basically, like, all the coding agents out there work in a way where they, they just read, edit, and verify things in a loop. Like, you read usually line-based, um, simple plain text files that these agent need to read. And editing then happens with some tool that either does string replace, patches, or line-based editing. And then afterwards you-- the model either reasons about what to do next or run some static type checking, linting, stuff like this. And, um, yeah, when we, when we looked at this, and we-- I mean, as engineers, we on a daily basis use these coding agents. We also tried playing with it on, like, bigger documents. For example, we threw in these large tracing documents and had it made, like, surgical edits to twenty places, and it just worked. And it didn't have all these weird exhaustiveness issues that we, we run, that we ran into with, um, editing legal documents in the past. And this made us very intrigued to basically change how our agent does document editing. So what we tried then is, uh, basically the exact same thing, [chuckles] just, uh, with a bit of different tool implementations to make it work. So we have the same editing loop. We have a read tool, an edit tool, and a verify step. And the reading happens in the form of an intermediate representation of the DOCX file. We basically take the DOCX file, transform it in a flat representation that is like a single text-based file that the agent can interact with. And then we have a bunch of editing, uh, tools that work on this intermediate representation, and it can just go back and forth between reading and editing, see its own edits, and keep looping. 
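A minimal sketch of the read, edit, verify loop described here, in Python. It is not Legora's implementation: the function names (`make_toy_docx`, `flatten`, `read_tool`, `edit_tool`, `verify`, `run_demo`) are hypothetical, the "DOCX" is a toy zip holding only `word/document.xml`, and the flat representation is assumed to be one line of text per paragraph. Writing edits back into the XML with styles preserved is deliberately left out.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# WordprocessingML namespace used by word/document.xml inside a DOCX zip.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def make_toy_docx(paragraphs):
    """Build an in-memory zip holding only word/document.xml.

    Assumption: a toy container for this sketch, not a file Word would open.
    """
    body = "".join(
        f"<w:p><w:r><w:t>{escape(p)}</w:t></w:r></w:p>" for p in paragraphs
    )
    xml = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml)
    return buf.getvalue()


def flatten(docx_bytes):
    """Intermediate representation: one plain-text line per paragraph."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return [
        "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        for p in root.iter(f"{{{W_NS}}}p")
    ]


def read_tool(lines):
    """What the agent sees: numbered lines, like a source file."""
    return "\n".join(f"{n}: {text}" for n, text in enumerate(lines, start=1))


def edit_tool(lines, line_no, old, new):
    """String-replace edit on one line; fails if `old` does not occur exactly once."""
    text = lines[line_no - 1]
    if text.count(old) != 1:
        raise ValueError(f"expected exactly one match for {old!r} on line {line_no}")
    lines[line_no - 1] = text.replace(old, new)


def verify(lines, stale_phrase):
    """Return the line numbers that still contain the phrase to be replaced."""
    return [n for n, text in enumerate(lines, start=1) if stale_phrase in text]


def run_demo():
    lines = flatten(make_toy_docx([
        "The Employee is entitled to 25 vacation days.",
        "The notice period is three months.",
        "Unused vacation days expire on 31 March.",
    ]))
    # Read, edit, verify, and repeat until the check comes back clean.
    while (todo := verify(lines, "vacation days")):
        edit_tool(lines, todo[0], "vacation days", "days of annual leave")
    return lines
```

In a real harness the model would decide each edit after calling `read_tool`; here the loop in `run_demo` stands in for that decision so the example stays self-contained.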
And, um, yeah, I wanna quickly talk about the first time that we tried this out because it was quite a realization for us to see this work. So what we did back then was we built, um, a POC. So we got intrigued by coding agents solving these exhaustive editing things. So we built a POC of editing DOCX documents in the same way. And, uh, as any good POC goes about, you don't have any, like, structured eval set you run against it. It's like start vibe checking it and talking to the model and see, see how well it works. And you have these, like, ideas about what things are hard and what are easy. So I looked at this with a colleague of mine who is, uh, who was deep down the challenges of, uh, editing legal documents. He, he built our first Word add-in initially. Uh, so he knows about all the hard parts about editing legal documents exhaustively. And I asked him, "Okay, like, what should we try with this? Like, what's something hard? What, uh, what was challenging before?" And, um, he said, "Ah, just, like, pass it this ten-page document and, and ask it to translate it paragraph by paragraph from English to Swedish." Because apparently that was something that, like, in our previous setup or in, uh, in, like, previous models, usually was very challenging to get done exhaustively. So we did this. We passed the thing in, asked the agent to, um, translate paragraph by paragraph from English to Swedish, and, uh, it just kept-- like, started editing paragraph by paragraph, as you would expect. Then sometimes it got a bit lost, started rereading the whole thing, sees, "Oh, I have forgot a paragraph up here," goes back, edits the paragraph, and just keeps, keeps doing this for, like, ten minutes. And, um, at the end, we open up the document and, uh, everything was, was translated. And, uh, the funniest part about this was that to test how good this, like, new harness and tool design works, we ran this whole thing on Haiku, so this wasn't even, like, a good model. 
And, uh, I think this, like, this was really a moment for us where we realized, okay, this might actually work. Like, there might actually be something, like, about mirroring tool design from coding agents and getting a lot of benefits from it. And, uh, I think the interesting part here is that-- or, like, uh, m-my mental model is kind of that you wanna have the model almost feel like it's inside a coding agent harness, and it just does a legal task. Because then suddenly you get these benefits from all the reinforcement learning and fine-tuning that is done on the coding agent harnesses, because your harness looks very similar in tool design, and leads to very similar trajectories in, in tool coding. And, um, a lot of stuff you just get, get for free, but it's pretty cool. Okay, that's it for editing. Another example I wanna quickly go over is linting for legal documents. Yeah, this is exactly what it looks like. It's, uh, basically ESLint [chuckles] but for, for legal documents. And, uh, you can imagine, I mean, as, as engineers, we, we use a lot of, like, static type checkers and tools, and they are very powerful also for agents to get, like, a feedback loop on all the mechanical stuff that you want to have them do right. And, um, turns out also in legal documents, there's a lot of, like, static things that you, you can actually verify that help an agent on, like, a bigger task. For example, if you have, like, a, a big contract that references, like, a paragraph later in the beginning, and an agent, like, removes the paragraph that's being referenced, like, it's cool to have a, like, a static way to check that all references are still intact and, and then give the agent this feedback loop of, "Hey, you might wanna edit, uh, the section at the bottom where you reference this paragraph." 
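The cross-reference check mentioned here can be sketched as a small static linter. This is an illustration under assumptions, not the linter from the talk: headings are assumed to be lines starting with a section number (for example `4 Termination`), references are assumed to look like `Section 3` or `Clause 4.2`, and the name `lint_cross_references` is hypothetical.

```python
import re

# Assumption: a heading is a line that starts with a section number, e.g. "4.2 Notice".
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+\S")
# Assumption: references are written as "Section N" or "Clause N.M".
REFERENCE = re.compile(r"\b(?:Section|Clause)\s+(\d+(?:\.\d+)*)")


def lint_cross_references(lines):
    """Return (line_number, reference_text) for every reference to a missing section."""
    defined = {m.group(1) for line in lines if (m := HEADING.match(line))}
    findings = []
    for line_no, line in enumerate(lines, start=1):
        for ref in REFERENCE.finditer(line):
            if ref.group(1) not in defined:
                findings.append((line_no, ref.group(0)))
    return findings


# A contract where section 3 was deleted but is still referenced on line 3.
contract = [
    "1 Definitions",
    "2 Annual leave",
    "Subject to Section 3, the Employee receives 25 days of leave.",
    "4 Termination",
    "Notice under Clause 4 must be in writing.",
]

for line_no, ref in lint_cross_references(contract):
    print(f"line {line_no}: {ref} points to a section that does not exist")
```

Each finding carries a line number, so it can be handed back to the agent as feedback in the same way a compiler or linter error would be.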
So it's also something very cool and, uh, you c- you can take this way further by, for example, doing LLM-based things inside of this linter tool, like lint more less mechanical stuff, so you can, like, make this feedback loop feel very similar to coding. Okay, that's it for stuff that we translated while building our agent. Lastly, I wanna talk about stuff that we had to invent when building, building our agent. And, uh, th- there's, like, equivalence for this in a- any domain. There's, like, very domain-specific things that people working inside a domain need to solve every day, and you want your agent to solve it equally well as a human could do to get the best possible outcomes. A good example for this, in our case, is, um, due diligence. So you can imagine you have two companies, company A buying company B, and there's a lawyer in the middle who then gets the task to make sure everything is fine with this transaction. And what they need to do is basically they need to go through all the contracts that co- company B has with other parties and review them. And, uh, as one can imagine, if this is, like, a large company, they have a lot of contracts, like thousands and thousands of contracts you need to review. And also other, other, like, binding documents. And, uh, I mean, of course, lawyers do this, this task today, and there's a lot of tools that help them do this. And we also have such a tool on the platform already. It's called Tabular Review. Um, it's what we see here on the, on the slide also. It's, uh, this grid-like interface where every row is a document, and then you can do a structured data extraction by adding columns to it and specifying what you wanna extract. So instead of reading the whole document, you can then, like, lean on an LLM to extract the relevant pieces of information from you and help you get, like, a good overview of large document set. 
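The grid described here (one row per document, one column per extraction question) can be sketched as a small data structure. This is a hedged illustration, not the product's data model: `Column`, `Cell`, `build_grid`, and `filter_rows` are hypothetical names, and `stub_extract` is a keyword match standing in for the LLM call a real system would make.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Column:
    name: str
    prompt: str   # what a real system would send to the model
    keyword: str  # only used by the stub extractor below


@dataclass
class Cell:
    value: str
    citation: str  # source sentence backing the value, empty if none


def stub_extract(column: Column, text: str) -> Cell:
    """Stand-in for an LLM call: answer 'yes' and cite the first matching sentence."""
    for sentence in text.split(". "):
        if column.keyword.lower() in sentence.lower():
            return Cell("yes", sentence.strip())
    return Cell("no", "")


def build_grid(documents: Dict[str, str], columns: List[Column]) -> Dict[str, Dict[str, Cell]]:
    """One row per document, one cell per column."""
    return {
        name: {col.name: stub_extract(col, text) for col in columns}
        for name, text in documents.items()
    }


def filter_rows(grid: Dict[str, Dict[str, Cell]], column_name: str, value: str) -> List[str]:
    """Names of the documents whose cell in `column_name` equals `value`."""
    return [doc for doc, row in grid.items() if row[column_name].value == value]


documents = {
    "employment_a.txt": "This Employment Agreement starts today. Either party may terminate with notice.",
    "insurance.txt": "Insurance policy for the warehouse.",
}
columns = [Column("Termination clause", "Does the document contain a termination clause?", "terminat")]
grid = build_grid(documents, columns)
print(filter_rows(grid, "Termination clause", "yes"))
```

Keeping a citation next to each value is what lets a reviewer jump from a cell to the supporting passage instead of rereading the whole document.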
And then you, as a lawyer, you would filter down on specific parties or specific red flags that you wanna then, like, follow up on and dive deeper into. So if-- when we're building our agent to do legal work, we obviously need to give it a way to do the same kind of work. So what we do is we just give it access to use Tabular Review, the feature that we have on the platform, in the same way that a human would use it. So our agent can go take a folder of documents, throw it into this tool, and then we generate all these, like, these cell values, and the agent basically specify what to extract. And then it can go and, like, filter down this huge grid of data and figure out what's, what's relevant for, for the task at hand. And, uh, yeah, there's, there's analogies for this probably in all different domains. For example, accountants would wanna have some way to do reconciliation very mechanically. Doctors also have probably some very specific tasks they need to solve. So you can, like, really think of this as, like, the last, I don't know, twenty percent of your agent to make it really well for, for a specific, specific domain. Okay, that's the, the three different categories of things. And now we're gonna have a quick live demo of how this looks like in our, in our agents today. We're gonna pray that the live demo works. [chuckles] Okay. Is it visible? Yes. Great. So this is Legora. Uh, you log in, you get a chat box, like in a lot of AI tools. And, uh, we can look a bit around. I prepared a project [lip smack] that's called Employment Agreements. And if we look here, we have, uh, a bunch of files in here. It's not that many. We have a bunch of employm- employment agreements for different em-- uh, fictive employees here. And we have an HR policy that's, uh, like specifying how we, like, how we-- what kind of benefits our company has and a bunch of stuff like this. 
So if we wanna go in here now and, uh, give our agent a task, we can, um, ask it to-- I wanna give every, um, employee an extra week of vacation during Christmas. Um, let's plan out the work we need to do for that. Drop in a prompt here. And, um, now what we expect the agent to do is, uh, to look-- first of all, make itself familiar with its environment. It's searching for stuff here now. For example, it's searching for employment agreements, vacation time off to figure out, okay, what agreements do we have? What policies do we have? And then we want it to, like, reason a bit about, okay, what needs to be done for this change and create us a plan. And now it's creating this plan. And if we look in here, we're gonna see-- Okay, this is still streaming in. At least it says we need to do a bunch of steps here. First of all, we need to review all employment agreements. We need to amend some employment agreements to add the Christmas shutdown clause. Very nice. And then we also need to update our HR policy manual. And I mean, the cool thing here is that, like, if you would drop an agent into a code base and ask it to do a random thing, it would first go out and, like, collect all this context. Like, we didn't need to tell it that we have a policy that we need to update, or we have five employment agreements we need to update. It basically goes and collects all this-- these things, uh, that need to, need to happen for this change. And, uh, if you-- I mean, we could now iterate with the agent on this and tell it to, "Oh, no, we don't want an extra section for this," or phrase it like this or whatever. But, uh, for the, for the purpose of this demo, we are fine with this, and we can send off the plan here and have it execute what we planned out. And, and what happens now under the hood is this exact editing loop I, I talked about. First, it starts reading the documents. Then it reasons about what edits to do. 
Then it calls some editing tool, and it goes back and reads the documents to see that the edits were made and if everything is looking fine. So it's, uh, yeah, thinking a bunch here now about how to do this, which is great because it means it's hopefully gonna work. And this is the moment where good salespeople start to talk about stuff. [laughs] I'm not very used to demoing the product, so [laughs] it's gonna be a bit boring. [laughs] Oh, but we should pretty soon, uh, get, uh, some stuff back. Okay, now it's, uh, copying over the-- all the documents it wants to modify to a different space to modify them then, so I can review the changes before they get written back. And in a second, we should start to get the first edit streamed in here. Yes. So it's good. Okay, if we collapse this, we can see now that it starts editing the different employment agreements, and I can look in here. Ah, it's actually interesting. There was one employment agreement that I had from before for testing where it already had the Christmas shutdown clause, but now it decided to also unify this one to make the dates uniform to the other ones. So that's pretty cool. Um, but I basically see here what it's adding to the individual documents and all the edits it's making. And, uh, then I can also see the redline version of the original document. So if I cl-click in one of the agreements here. Um, okay, this is a boring one because it's the one that had the clause. But this is one of the employment agreements that it edited. I can go in here, and I see it redlined with the right formatting in the right indentation. It added a clause here to the benefits section about the Christmas shutdown that, uh, every employee is gonna benefit from. And now it also goes and updates the annual leave policy. And in the same way, I can go to the policy here. This is our policy, and then we should get a redline for, yeah, annual leave, uh, on the right section about the new thing we just added. 
And, uh, now it also gets a bit ambitious. It wants to draft a employee announcement memo to communicate to our employees that they have more vacation now. Um, yeah, we could like select now which, which template to use for that memo and how that should look like. And it's just gonna go and, and draft this thing. But, uh, yeah, we don't need to wait for it 'cause it's just gonna spit out a document in the end. Okay, that's it for, uh, for this part of the demo. Another thing I wanna show related to the due diligence, like mass document review use case is, uh, this one here. So here we have, um, another Legora project with a bunch of files in there. We have like around a hundred files in here that are like random documents that might pop up in the due diligence of a company. For example, we have-- I don't even what's-- know what's in here. It's like insurance policies, workers' compensation. It's, uh, random documents. And I, I, of course, don't wanna go through them one by one and start sorting them in. So what I do instead, I take my prepared prompt here because this is too long to speak reliably [laughs] and put this in here. And I'm basically asking the agent to... Oh, we also quickly need to go somewhere else here. Got... That's the right one. Uh, so I'm basically asking the agent to do a structured review of all files in this project. I want to know what categories of contract exists, which are the interesting parties, and if there's any red flags in the stuff. Yeah, it's a very legal, uh, [laughs] legal-specific prompt here. And, uh, then based on that, I wanted to also-- after it found all the different types of agreement, I wanted to put the different, um, employment agreements in a folder for future work. And again, like, uh, it doesn't wanna jump right into the task, which is fair because, uh, it's not the most detailed prompt here. [laughs] So it, uh, tries to, like, write up a plan of how to do this, but, uh, looks kind of fine. 
So we're just gonna have it do the thing and see what comes out of it. And then now it starts to first create a tabular review, which is this grid-like document extraction thing I talked about earlier. And, uh, in a second, we should actually have this thing to look at it. It also figured out that there's like a hundred documents here, and that it'll take a few minutes while the AI processes each file. It's very aware of [laughs] how this works. Okay. I mean, under the hood now it needs to write out all the different files it wanna-- it wants to put in this tabular review, so it's actually like a quite long tool call. So that's why it's gonna take a second for this to go through. But, uh, it's looking good. Okay, there we go. So now it like, it's still doing stuff, and I could stay here now and it would like keep talking and keep doing the task. But, uh, what I wanna do instead, I wanna actually open this up and look at this, uh, this part of the platform. So this is Tabular Review. We have-- we saw there's a grid on the screenshot earlier. We have, um, all our documents here in, uh, different rows. You can open them up and see the documents here. And then we have columns that define data that needs to be extracted. For example, here, the agent decided it wants to extract the document category and it writes a detailed prompt out here, what that means, and how to, how to extract this data point. And, uh, the cool thing here is that this is actually way more than just like a flat grid of data. It's actually a fully interactive reviewable surface of-- for data. So I can open this up here and, um, I can see the document on the right here, and I can see the extracted data points. And what I would do now, um, as a, as a lawyer, I would go in here and in some cases actually verify the answers. So for example, I can see, okay, this document is, uh, flagged as compliance regulatory. 
I can see the reasoning of the LLM doing this task of, uh, why it was flagged that way, and I can even click on here and get highlights in the document where it pulled that content from. And I mean, this, this example is pretty boring because the document is like five rows. I'm gonna find something a bit more interesting where we're gonna see this in action. So I have some employment agreements again here. They keep coming back. Um, so here I have a bit of a longer document, and here it classifies it as employment agreement. And then I could then go here after I verified this and actually marks-- mark this as verified. And then I can also see like the progress of verification is being tracked. So if you're like some more senior lawyer doing this kind of task, you could like ask an associate to review all the different, uh, data points here to double-check, and then you can collaborate on this surface with AI and also between different humans. And then I can see also here, it extracted the parties here. I can click on here for the citations, get taken to the right part of the document. And here it also raised a bunch of red flags and concerns about, uh, stuff being placeholder in our employment agreements, which is probably not that great in the real world. So yes, this is, uh, this is pretty cool. And if I go back here, then I should also see... Yeah, I don't know where it is. Uh, yeah, here. We see that the thing is verified. Get a cool green check mark. So if we go back to the agent that's hopefully finished our task in the meantime, we actually see it came back with something. So it did this whole thing in the background. It created this Tabular Review that we looked at now. And I mean, we also don't need to look at this, but we can use it as a human-in-the-loop step for verifying outcomes, just get a bit of a better understanding of the underlying data. And, um, when we come back here, the agent figured out the document categories. 
It figured out the key parties. There's some interesting red flags here that, uh, I can't judge how good they are 'cause, uh, I don't know [laughs] what is, what is in there and what not. And, uh, it also moved all the employment agreements to a specific folder. So if I click in here now, we have this folder here about employment agreements, and it has all the employment agreements in these like hundreds of documents in there. Yeah. So that's it, that's it for the demo. You can go back to the slides. Great. Okay. To quickly zoom out, out a bit and, um, round this off, I think it's very interesting to also think about why coding as a domain is so much ahead in terms of like AI adoption. And there's probably a lot of different reasons for that. There is, for example, the fact that engineers are just more willing to try new tools and to like adopt new technology in their work. It might also be the case that coding gets so much focus now because solving coding unlocks a lot of growth in like other piece-- uh, other niches of, of software engineering, and you can accelerate progress much quicker by solving it. But, uh, the cool thing is, if you're building any other vertical, you don't really care why it's ahead because you can just keep looking at what coding agents ship, and you can reuse what's usable for your domain. You can translate stuff that's similar, but not really the same. And then the last part you actually invent and, and come up with for your specific, uh, domain agent. And yeah, that's the framework. Uh, for any vertical agent, you can just keep looking at coding agents. Whenever they ship something new, you steal the thing and benefit from it. [laughs] Yeah, that's it for today. Thank you. [audience cheering] [upbeat music]

Episode duration: 28:42


Transcript of episode nho1YAEPuwA
