Fighting financial crime with Claude Cowork

Leveraging Claude Cowork to optimize high-stakes workflows and fight financial crime. In-house MCPs, MCP gateways, evals, deploying critical workflows in production for analyst teams.

May 22, 2026 · 27m

0:00 – 0:14
Intro
1. SPSpeaker
  [on-hold music]
0:14 – 2:29
Qonto’s mission and why financial crime is a top AI priority
1. SPSpeaker
  Please welcome to the stage Senior Staff Software Engineer at Qonto, Stefano Amorelli.
2. SPSpeaker
  [upbeat music] [audience applauding]
3. SPSpeaker
  Thank you very much. Thank you very much, guys. I'm very, very happy to be here with you today. I'm Stefano Amorelli, and, uh, I work at, uh, Qonto. Uh, Qonto is a fintech based in France, and we provide business banking online for SMEs and financial tools. We have more than six hundred thousand customers, and we operate in more than eight markets in Europe. So if you work in finance, you probably know that financial crime is the number one priority. Just to give you an idea, between two and five trillion US dollars are laundered every year in the world. So it's a big business for criminals, and thanks to AI, the bad guys are getting better and better, but so are we at preventing financial crime. What I want to show you today is what we are cooking at Qonto, how we are using AI, how we are using Claude, how we created a system around that, that puts security and compliance first, so that we can apply AI in critical environments with very sensitive data. I'm also gonna touch very quickly ab- uh, on the topic of evals. How do you run evaluations, what you actually evaluate, and why they are important in these use cases. And I want to leave you also with some food for thought on how some of the key takeaways and lessons learned from this project can also drive AI adoption, especially if you are in a bigger organization across different departments. Let's get started and see what's the life cycle of a financial crime.
2:29 – 4:00
The manual investigation workflow: alerts trigger a heavy human process
1. SPSpeaker
  Uh, so whenever somebody does a suspicious transaction, uh, what we have is an alerting system. Uh, this is fully automated. And once an alert gets created, that's when the manual process begins. So we have a human agent that picks up the case, prioritizes it, and gathers data from so many different data sources. So imagine going to Google Web Search, uh, to, uh, third-party tools, to internal dashboards. So what we see is, uh, these financial investigators that have three different monitors, at least, and dozens of browser tabs, more than, uh, than I have, um, and gather all this data, all these documents, and compile a big document. Some of it is in their brain, some of it they write down, and then they have to reason, uh, within that information and make a judgment whether there was indeed a criminal activity or not. Very manual process, can take a long time, and right now-- Well, in the past, uh, in the near past, was handled by humans, but it's a bit of an inhuman, uh, process, okay? We ask a lot from them. So what's the role of AI? How can we apply AI to make this process better? So one way, obviously, we can, uh, you know, spin up Claude and ask questions, so that's the general
4:00 – 5:01
Where AI fits: predictive ML for alerts, agentic AI for investigations
1. SPSpeaker
  usage. We can upload documents, ask general questions without any plugin or maybe a simple skill, but that's not exciting, right? We want to, to dive a little bit deeper on how we can leverage, really leverage AI into this use case. Another phase is the native AI. So what we call the first line of defense is the alerting system that I mentioned before. And there, generative AI is not really the main character, because we use predictive AI, traditional machine learning. We really value, uh, speed and accuracy, and most likely it's, uh, it also, uh, combines deterministic, uh, rules, and most likely gen AI is not the best fit. But what's interesting is how we can apply agentic AI to the second line of defense. So the manual process that right now takes a long time. In the industry, not many players are adopting age- uh, AI and agentic AI in that step
5:01 – 6:32
Model and interface choices: Opus 4.7 + Claude Cowork for investigators
1. SPSpeaker
  of the process. So we're gonna focus on phase three, agentic AI for the second line of defense. The first question that we asked ourselves is, all right, which model do we choose for that? And the answer was a bit easy in this case. Uh, [laughs] we went with Opus 4.7. Um, I would argue that maybe you do not need Opus 4.7 for everything you do, even if you're doing coding. For some cases, you might, um, survive with other models, cheaper models. Uh, but in this case, if you're handling so much information, you do need a model that is able to reason across a very long context window. And really there is no better frontier model right now in the mar- market than Opus 4.7. We also use Claude Cowork as an entry point, as a user interface for the, um, investigators, for the anti-financial crime investigators. First of all, because it's easy to onboard them, so non-technical stakeholders, they can use Cowork. But also it's very powerful, uh, for creating complex plugins, uh, that contain multiple skills and are packaged with the tools and MCP servers that they need, as we will see in a few slides. If we,
6:32 – 7:32
Why long-context reasoning matters: GraphWalks benchmark and scattered facts
1. SPSpeaker
  uh, take the numbers, if we bring out the numbers of why Opus 4.7 shines in these tasks, there is a benchmark that is, uh, very valuable to evaluate which model is the best, and the-- it is the GraphWalks benchmark. So what this benchmark demonstrates is how well the large language model is able to reason across a document and find the facts that are spread across the document. So n- not necessarily facts that are close to each other in the context window, but it's able to find the connections across all the context window. And that's exactly the best benchmark you want to have a look at, especially in investigations where information is scattered around all the context window effectively. And Opus 4.7 is the leader right now in the market. You cannot find a better model than that for now. Um, no matter how intelligent
7:32 – 8:58
Data access is the real bottleneck—and MCP raises security/compliance concerns
1. SPSpeaker
  the model is, if it doesn't have access to data, it's not as cool. It's not as useful. So when we're talking about data access, what comes to mind? Maybe somebody from the audience wants to say it. When you want to give access to data to a large language model, what do you think about?
2. SPSpeaker
  MCP.
3. SPSpeaker
  MCP, yes. Correct. Who is running MCP in production? Raise your hand. Okay. Who had an interesting conversation or concerns about security and compliance when it comes to MCP? Okay, I see a lot of hands. Uh, yes, so when you mention MCP to [laughs] risk and compliance, when you mention it to security, you can raise some eyebrows, to put it mildly. Uh, some people even say that the S in MCP stands for security. But the reality is that, uh, you can address these concerns. You can build it with security first and compliance, uh, first in mind so that you can build a case and address these objections. How do you do that? You create the boundaries, and you implement a system, a harness that, uh, is compliant and secure. And that's exactly the main takeaway of, uh, this talk that I want to share with you. Um, especially in fin
8:58 – 9:28
The data landscape: many sources, multimodal inputs, and action-taking APIs
1. SPSpeaker
  crime investigations, the data is scattered across so many different data sources, okay? So the challenge here is that we have our knowledge, uh, database. Uh, we have, uh, OSINT, so open-source intelligence. We have internal databases, dashboards, KYC, KYB data, so also multimodality. We also want to take automated actions, uh, through our internal API endpoints. It's a mess. Uh, every data entry point uses a different
9:28 – 10:59
Security-first requirements: remote MCP servers, OAuth, RBAC, audit trails, human oversight
1. SPSpeaker
  programming language. It's internal, it's external, and we want to unify the experience so that it's secure and compliant. How do we do that? Well, from a technical perspective, let's, uh, dig a bit deeper. Uh, what we decided to use, first of all, is remote MCP servers. So we want a centralized place where we can manage them, where we can monitor them, and that means that we have OAuth by default, so we have strong authentication. That's the first technical requirement that you want-- we want to apply. Uh, second of all, we also want to make sure that the session tokens, so that the-- apart from the authentication, the permissioning system is cryptographically safe enough. So even if the tokens-- a session token gets leaked, uh, it's not a security-- a big security concern. And for that, we leverage PASETO tokens, so it's, uh, platform-agnostic security tokens. It's a relatively new technology from 2018 that fits perfectly the case for short-lived, uh, security tokens. We also want role-based access control. We want to give access to specific data to specific people only. We don't want everybody to access everything inside the company. Another important point is that we want an audit trail. We want to know exactly who is accessing what and when. For
10:59 – 13:33
System architecture overview: Cowork plugin → MCP gateway → federated MCP servers
1. SPSpeaker
  compliance reasons, we do not want to fully automate the process just yet. So we want to keep human judgment, at least on the critical decisions, because these actions, these financial crime investigations, have consequences, have legal consequences as well. So these are a few technical requirements, but what does the architecture look like? Uh, this is how we implemented these technical requirements. This is the architecture. The entry point is Claude Cowork. So we have, uh, a plugin, um, and that's the entry point for our anti-financial crime analysts. And the Cowork plugin is connected to the MCP gateway. So that's the most important building block of our infrastructure. The role of the MCP gateway is to authenticate the user, first of all, and to, uh, implement role-based access control, so certain users, based on identity, can access only, uh, certain data. And it's also, uh, to implement the audit trail, so everything that is done is logged in an append-only database. Um, another cool thing of the MCP gateway is that it connects to downstream MCP servers. So you have multiple data sources, multiple MCP servers deployed internally, and the cool thing is that any of these MCP servers can be implemented in Go, Python, TypeScript, COBOL. Or maybe not COBOL. Uh, please don't do that. Um, but it's cool because it's agnostic, so it becomes very easy to link a new MCP server, a new data source. And you get audit log, you get identity, um, management, and role-based access control by default out of the box. Okay? And then obviously, MCP servers connect to the downstream APIs. So, as I mentioned before, the entry point is our Cowork plugin. Uh, we're gonna, uh, dig a bit deeper into that, um, later in the presentation. But just to give you an overview of the idea of the plugin. We sat together with investigators, we did a few investigations together as engineers, and we crafted, uh, this plugin based on their domain knowledge and domain expertise.
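Editor's sketch: the gateway responsibilities described in this segment (role-based access control per team, an append-only audit trail, and forwarding to a downstream server) can be illustrated roughly as below. This is a minimal sketch under stated assumptions, not Qonto's implementation: the `Gateway` class, the `POLICY` mapping, and the team, server, and tool names are all hypothetical, and an in-memory list stands in for the append-only database.

```python
import json
import time
from dataclasses import dataclass, field

# Hypothetical policy: which teams may reach which downstream MCP servers.
POLICY = {
    "fincrime-investigators": {"kyc", "osint", "transactions"},
    "customer-support": {"knowledge-base"},
}


@dataclass
class Gateway:
    policy: dict
    # In-memory stand-in for an append-only audit store (assumption).
    audit_log: list = field(default_factory=list)

    def _audit(self, user, team, server, tool, allowed):
        # Every attempt is recorded, whether or not it is allowed.
        self.audit_log.append(json.dumps({
            "ts": time.time(), "user": user, "team": team,
            "server": server, "tool": tool, "allowed": allowed,
        }))

    def visible_servers(self, team):
        # Users only see the servers their team is entitled to.
        return sorted(self.policy.get(team, set()))

    def call_tool(self, user, team, server, tool, handler):
        allowed = server in self.policy.get(team, set())
        self._audit(user, team, server, tool, allowed)
        if not allowed:
            raise PermissionError(f"{team} may not access {server}")
        # 'handler' stands in for forwarding to the downstream MCP server.
        return handler()
```

In this sketch the audit record is written before the permission check raises, so denied attempts are also traceable, which matches the "who is accessing what and when" requirement stated earlier in the talk.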
Very cool because it's centrally, um... It's an artifact that is centrally, uh, shipped, and it's versioned, and can be updated easily, even by the investigators themselves. Uh,
13:33 – 15:35
End-to-end auth flow and RBAC operations: SSO identity, token minting, Terraform policy
1. SPSpeaker
  very shortly, again, what the MCP gateway does, uh, SSO. So we use single sign-on internally at our company, and the MCP gateway is responsible for authenticating the user and minting the short-lived tokens so the MCP servers know which permissions the user has. And we also then have the audit pipeline. So if we have a look at the end-to-end flow of, uh, an MCP call request, we see the user that is using Cowork is initiating a session. That means that if they are not logged in, we have a single sign-on page connected to an identity provider, and the identity provider provides the identity of the logged-in user to the MCP gateway. At this point, we already have everything audited, everything logged. And if the user is authenticated and has the right permissions, then they can perform the MCP server tool calls. Effectively, the, um, the users, uh, based on their permissions, are not even able to see MCP servers that they are not supposed to have access to. So the MCP gateway has multiple roles. Uh, one is to validate the token from OAuth. One is to resolve the identity, authorize the request, and forward that to the downstream MCP servers. How do we do role-based access control? Well, in our case, we have a Terraform file, so it's also versioned, it's also auditable, um, and we can see the history of the changes where we define which teams, based on the identity, have access to which MCP servers. So it looks like this. Uh, practically it's just a Terraform file that we version in a, a GitHub repository. But let's dig
15:35 – 18:07
Implementation details: ContextForge gateway, Kubernetes deployment, streamable HTTP, Paseto validation, OTEL instrumentation
1. SPSpeaker
  a bit, uh, deeper into some code snippets. So here is the architecture again, a little bit more detailed. Here I split the MCP gateway from the authorization gateway, whose responsibility is identity management. And the MCP gateway is based on an open source package. Uh, we use ContextForge. And it's, uh, connected to downstream MCP servers that are federated, so they are internally deployed and maintained. How does it look in practice? So this is the authorization gateway. Uh, we get a token from the auth, from the single sign-on. Uh, we verify the authenticity of that, and we get the permissions from the identity. Okay? Uh, eventually, once we validate the permissions, we can mint a bearer token. So why do we do that? Because we want a short-lived token to be shared with the downstream MCP servers. We don't want to share the token that comes from the, um, single sign-on. So we mint the token, the PASETO token. And how it looks in practice for downstream MCP servers, we just have a YAML file that contains the configuration of these, uh, servers. They are deployed on our Kubernetes cluster. That means they can be reached only by the MCP gateway. They cannot be reached directly. And we use streamable HTTP by default. Stateless streamable HTTP. That means that we can scale it, uh, much easier. Inside the, uh, downstream MCP servers, we need to validate the authenticity of the bearer token, and that's where we use PASETO public tokens. And very shortly, how they work is basically you have a Base64 encoded payload that is not encrypted. So even, uh, if it's exposed, it's supposed to be like that, but it's signed so that we can verify that its content has, uh, not been tampered with. So, um, another important thing for the audit trail is that we instrument all the MCP tool calls that we do. And here we use conventional, uh, OTEL plus some additional fields that are not supported by, uh, the standard just yet.
18:07 – 19:07
What investigators experience: one interface, interactive widgets, faster case understanding
1. SPSpeaker
  now let's see a very quick video of how it looks in practice with a small example. So here I am asking to run analysis. It's very sped up, and it's a bit masked. Uh, I'm sorry, I cannot share, uh, exactly the customer data. Uh, but what I wanted to show you here is that Cowork is not only able to gather information from different data sources, but also able to render on-the-fly widgets. So now we are using inline, uh, widgets much faster, much quicker, much better than, uh, the previous artifacts that would take much longer to generate, and they're also interactive. So in this case, you can see there are a few actions at the bottom so the, um, the investigator can trigger actions. There, there are also dropdowns. They can edit the visualization of the, of the charts. So very quick video, but it really changes the life of investigators, who before needed
19:07 – 19:38
Operational monitoring and auditability: Grafana + ClickHouse visibility into tool usage
1. SPSpeaker
  to go across so many different tools, gather the data, find the information, reason about it. Now they-- with just in one interface, they, they can have a dashboard created for them and also leverage AI to have suggestions and reasoning on the findings. So let's go back to the slides and see how does it look on the backend. So here what you can see is, uh, Grafana dashboard. It's on production, and it uses, uh, ClickHouse as a database. That
19:38 – 21:09
Plugin design lessons: modular skills, orchestrator + meta-skill, XML prompts, explicit tool scoping
1. SPSpeaker
  is a perfect fit for the audit trail. So here we can see all the tool calls, all the authorization flows that happened during the day, and we can also see who is accessing what. So we have a user, uh, and then we have a tool name, how many calls, how much time does it take. Everything is traced and instrumented. But let's have a look at the plugin itself because there are some learnings that are worth sharing as well. Uh, how we structured such a complex plugin that contains so much domain knowledge and expertise. Well, the first thing is good prompting hygiene. So instead of creating one huge prompt of, uh, one thousand lines, we split it across different sub-skills. That's the main thing, and the main skill, we call it our orchestrator, refers to the different sub-skills based on the operation that the investigator wants to do at different steps. You know, we also use what we call a meta skill that always runs at the end and verifies the results, uh, at the end of the call, whenever the investigator runs the plugin. Inside the prompt of each skill, we use XML-structured prompts. We find it much more efficient than traditional prompting, and we also specify which MCP servers and tools are used by the skill itself. So we save some time, uh, for the large language model to discover exactly the tools that it needs
21:09 – 23:40
Evals for trust: tool correctness, grounding, and reasoning quality (LLM-as-judge)
1. SPSpeaker
  to perf-- to use, uh, the skill. Okay, but, uh, once we build that, then the question was from different stakeholders, "That's very cool, but, uh, does it work? Like, can we trust it?" Uh, and the answer to this question, a good way to answer this question is evals. You want to bring to the table some quantitative facts, some data that demonstrate that the plugin is performing as you would expect. You can evaluate different things. Here I try to outline the f- the three things that I believe are the most important in this use case, and one is the ability to-- for the plugin to call the right tools, but also the ability to, uh, call them in the right order, for example, and the ability to not come up with random facts, with hallucinations. All the data that is displayed on the dashboard must be grounded in, uh, the reference documents. Another thing for when we ask the large language model to reason and suggest, uh, conclusions on the investigation is not only about the output, the expected output. You also want to evaluate what's the reasoning behind the output that the large language model gave. So let's say there is an investigation. Uh, the large language model said, "Okay, this is criminal behavior," and maybe it's correct, but we also want to know that the reasoning behind that, how it reached the conclusion, is also correct. And this is, uh, done with the large, uh, language model as a judge, and it's another good parameter to evaluate. So what's the impact of evals? So for engineering, it's great because then you can iterate on the plugin. You can change a model. You can see if you introduce, uh, regressions. For compliance, they are much happier, uh, because now they can sleep at night. Uh, we can prove to them that, uh, the data is, uh, is correct, that the accuracy is what we expect and, uh, is compliant with our requirements. Uh, but it's also for our end users very important because we are introducing a new tool, a new technology.
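Editor's sketch: the first two eval dimensions mentioned above (calling the right tools in the right order, and grounding displayed data in reference documents) can be scored deterministically, roughly as below. This is a minimal sketch under stated assumptions: the function names, the tool names in the usage note, and the naive substring check for grounding are all hypothetical simplifications. The third dimension, reasoning quality with a large language model as a judge, needs a model call and is not sketched here.

```python
def tool_call_score(expected: list, actual: list) -> dict:
    """Compare the tools an agent called against the tools a reference run expects."""
    expected_set, actual_set = set(expected), set(actual)
    recall = len(expected_set & actual_set) / len(expected_set) if expected_set else 1.0
    # Order check: the expected tools must appear in 'actual' as a subsequence.
    remaining = iter(actual)
    in_order = all(tool in remaining for tool in expected)
    return {
        "recall": recall,
        "in_order": in_order,
        "unexpected": sorted(actual_set - expected_set),
    }


def ungrounded_values(displayed: dict, sources: list) -> list:
    """Return the dashboard fields whose value appears in none of the source documents."""
    corpus = " ".join(sources)
    # Naive exact-substring grounding; a real check would normalize values (assumption).
    return [name for name, value in displayed.items() if str(value) not in corpus]
```

For example, `tool_call_score(["get_alert", "get_kyc"], ["get_kyc", "get_alert"])` reports full recall but `in_order` as false, and `ungrounded_values` flags any displayed field that cannot be traced back to a retrieved document, which is the kind of quantitative evidence the talk suggests bringing to compliance stakeholders.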
We want to show them that they can trust it, and they-- we don't want them to second-guess every
23:40 – 27:50
Scaling AI adoption with an MCP gateway: a reusable compliance “flywheel” and future automation levels
1. SPSpeaker
  time they use the tool. But that's not only about financial crime. All these learnings about building security, uh, compliance, uh, having a mindset of creating a system around, uh, data access that is, um, se-secure and compliant can also be leveraged in many other use cases. What we are seeing at Qonto, now that we have an MCP gateway and our first plugins, is that more and more teams are building their own plugins, which is very cheap. Everybody can do it with the right prompting skills, leveraging the same MCP servers, the same MCP gateway we created. Maybe they need a new MCP server, which can be deployed, uh, also very quickly in just a matter of a few days. And what it becomes is basically a flywheel where you drive AI adoption inside your company, especially if you're working in a bigger enterprise, with security and compliance in mind. Because with MCP gateway, you get, uh, audit trail, you get role-based access control, strong authentication, identity management out of the box. So initially, this project took, uh, a few weeks to build. Uh, but what we are seeing is that new use cases are adopted in just a matter of a few days. This is just the beginning. Uh, what we saw today is still having human in the loop. But thanks to evals, thanks to keep-- thanks to our effort to keep improving the plugin and, uh, uh, measuring its accuracy, our vision is to move more and more forward, uh, towards human, uh, on the loop. That means that they review the decisions that AI can take autonomously. And then there is the dream scenario where we have humans completely out of the loop, where AI takes all the decisions, possibly much longer term, but a possibility if we can demonstrate the accuracy through evals. So some key takeaways. Evals, start with evals. I like to compare, uh, evals to TDD. Uh, so if you're a software engineer, you're probably familiar with test-driven development.
In any case, if you are building critical workflow-- workflows, you most likely will need evals at some point, so why not start with it, invest in it as soon as possible, so you can leverage and be faster, uh, later on. Access to data is king. So you want to be able to access the, the data you need. Uh, you want to give the large language model the power that it needs, but sec-- in a secure and compliant way. We want to, to make sure that, um, especially in these sensitive data topics, in these, uh, workflows where we need to perform such critical operations, we keep that in mind from the beginning. In the beginning, I said that financial crime is a huge bus-business, uh, from two to five trillion, uh, US dollars laundered every year. But what's even more worrying is that a very small percentage of that amount currently gets detected and seized by authorities. So in my honest opinion, this is a very good use case of AI, uh, where we're using AI that is not only for building, uh, stuff, cool stuff, but also to have, uh, a very big impact on, uh, on society. And even if you're not working in financial crime, in the financial industry, um, or some, uh, you know, sensitive topics like healthcare, sensitive industries like that, I hope you can still apply some of these takeaways in your day-to-day job so that we can build together a more trustworthy AI. Thank you. [applause] [upbeat music]

Episode duration: 27:51

Transcript of episode tUoO4ucrNc0
