EVERY SPOKEN WORD
25 min read · 5,174 words
Speaker
[upbeat music] Hey, how's it going, everyone? Um, I'm Dylan. Uh, I work on the product team at AirOps, and yeah, excited to walk you through g- uh, how, you know, t- the title says how AirOps chases friction, um, with building AI products with Claude. And I guess the main really big takeaway I want you guys to come away with is building agents, um, and just making agents accessible is honestly a really hard problem. Um, I guess, like, with developers, it's a bit easier. People are used to, um, kind of all these different concepts, but when you try to make these accessible to, you know, personas like marketers, there are a lot of friction points in. Gonna talk through some of the friction points that we, um, have seen and battled with. So just to start off, quick intro of who we are at AirOps. Um, we are a growth marketing, um, platform for AI search. And AI search for you guys is kind of like SEO, but for engines like ChatGPT, Gemini, Claude. You know, buyers are asking on Claude different, you know, questions of, "Hey, I want to buy, like, these pair of sunglasses," you know. How do you know and how are you making sure that you are showing up for these searches? So we help brands see how they're appearing in search, um, identify gaps, take action on those gaps, whether that be, you know, creating content, refreshing content, um, and then being able to measure the impact of whether or not, you know, the actions that they're taking actually worked. And so quick agenda. Um, just gonna walk through real quickly how we got here, kind of our approach, um, for agents. Um, a quick just run through of what we just launched last week with AirOps Next. Um, and then two friction points that we really focused and dialed in on for our launch, um, when it comes to making and giving people the power of Claude, um, and making that super accessible. And then closing with a couple of other friction points that we're really looking at for our next, um, you know, kind of next act. 
So how we got here. Um, AirOps used to be or still is, but we were mainly focused on orchestrating content through a traditional workflow builder. So node-based style, kind of like an n8n, uh, n8n, where you can drag and drop these different nodes. You can, um, orchestrate different variables and how things flow through this, you know, elaborate workflow. And with a workflow builder style, and especially with our, our core customer audience being marketers, um, you would hit this complexity ceiling where you're trying to teach a content marketer what Liquid text is, what JSON is, and all these, you know, different concepts. Um, and they also had this short shelf life, which, um, as new models came out, you know, Claude would release, um, you know, Opus, uh, 4.6, 4.7, you know, it keeps on going. You would have to update these different steps. Um, it changes the way that you're building this workflow, so customers are constantly having to go and update this workflow they just spent so much time in building. Um, and also just if they update a step in, like step one, they don't realize that variables and outputs are referenced in step 20. So there's, there's a bunch of, like, complexity and just it was really brittle. And scaling enterprise use cases obviously then required someone technical to help really guide through this workflow creating process. So our goal was to really lower this barrier to entry for, um, content marketers to build and ship, you know, their ideas and the way they want to create content. Um, and also while still maintaining, um, that quality bar, which, um, is a very big focus for them because of course, brands don't want to just be pumping, you know, AI slop out and, you know, having governance around keeping those enterprise standards. So this is a quick just preview of-- or not preview, but a look at what our workflow studio looked like. This use case was actually to create a content brief with internal links.
So as you can see, it's, it gets kinda gnarly in there. Um, [chuckles] just for something that does seem pretty simple, uh, it just goes to show, like, how much thought goes into a lot of these workflows and the way that they want to create and, you know, orchestrate content. So how do-- how can we take this, you know, kind of structure and create an agent experience while still harnessing, you know, the way that we use Claude Code and these other, uh, agent tools? So I think the first really breaking point when we decided to invest, uh, heavier into agents was really with the re-launch-- with the, with the release of Opus 4.5. I think that's when a lot of people started to see, um, really how smart the models were in tool calling, being able to follow instructions, um, to, uh, in a, in a way that, you know, they weren't breaking the, the standards that they were setting for it. And for us, we at the same time were trying to make building easier. And one of the ways that we do that, and I'll, I'll dive more into it a bit later, is through kind of this document-based style builder. And the first thing that we tried doing was actually taking this kind of like Google Doc in a way and with an LLM compiling a workflow in the background, which was an interesting idea. It was, like, a bit brittle. Like, there's a lot of, you know, error points that can happen there trying to turn, like, this non-deterministic, um, you know, kind of instructions and making this actual workflow. Uh, then went after that was then, you know, using a traditional agent orchestration framework, um, where you're, you know, kind of defining these nodes. You're letting an LLM decide different, um, decision points. And we got to a point where we were getting pretty good outputs, but with those traditional frameworks, they honestly are, like, pretty brittle. If I ever wanna change the way I want to orchestrate different sub-agents, um, I basically have to make code changes, right?
Like, I actually have to go and change the way I'm routing these different nodes and steps together. And that's when we really decided to, you know, invest heavier into the Claude Agent SDK, where it's, it's honestly pretty, pretty awesome that you can kind of orchestrate agents just through Markdown files and provide skills, um, in different contexts really just by manipulating this environment and harness rather than actually, like, you know, doing that programat- pro-programmatically through, um, you know, a traditional agent framework. So that's kind of led to our launch, which we just had last week with AirOps Next. And briefly, just to run through exac- what did, what did we exactly launch? Um, the first one was Quill, which was kind of our, our branding of our agent captain for content marketers. Um, Quill had access to all the data that we provide to teams, um, whether that be AI search data, the brand context in Brand Kit, which houses, like, literally everything about a brand. So, you know, you know, Quill is following the instructions of how content should be created. And, um, just making it easier throughout the UI to get people from insights they see in the dashboards to actual actions. Um, being able to take the findings and gaps and get to that in the very next step. The next one is Playbooks. So Playbooks is basically our new building experience. Um, and for all you, like, developers out there, it's pretty much like a skill. That's very much like what we grounded it in, is like how can we make skills accessible? And marketers are all, like, used to the kind of a document-based style. Um, but we just allowed for, like, collaboration on these Playbooks/skills and also, uh, governance and versioning. So, you know, you'll have people with, you know, 10 different versions of this playbook of, uh, how do I wanna create this, this piece of content. And just real quick, just results that we've seen from customers.
Um, we did, you know, a case study with Parallel. We helped them produce and create content, and they saw 130% increase in citation rate, uh, 42% increase in share of voice, and they were able to go live in one week, which for us is a huge accomplishment just because traditionally, especially 'cause we work with these enterprise customers, um, it usually takes around a month, at least with, like, the workflow builder of constant feedback, going back and forth. Like, "Hey, this is, you know, not really how I wanna be speaking in my, in my blog." Um, there's, you know, citations that are, like, kind of being who say or aren't right. They're not... These are other citations that I wanna use for this piece of content. So it was really incredible to be able to get to, um, that acceptance criteria in such a short amount of time. And other just, like, quick customer quotes too. Um, incredible to see how agents has moved what really, like, LLMs can do and how marketers are viewing using, you know, these more agentic workflows. Um, you know, Animal saying it felt more like a mid-level strategist for their team. And, um, at Rippling, being able to focus on off-boarding a lot of the tedious tasks and really focusing on, like, where does my expertise come in, and being able to add that value to the agent while it's creating that content and, um, feeding that unique context. So we were able to really accomplish this by focusing on two main friction points. So when building and trying to bring high-quality agents to these more, you know, highly professional and enterprise use cases, um, we really had to focus on how do we fit into the actual workflow of these content marketers and, you know, how do we make sure that we ensure, uh, quality outputs through the way that we're building our harness and orchestrating our harness. So the first one, um, friction point I wanna talk through is, uh, I, I have it saying, like, endless use cases forces intentionality. 
And I think a lot of us have probably been there too when we first started using Claude Code or using an agent, and we're like, "This thing is, like, really powerful. Like, I can do a ton of different things." You're like, uh, it, it's really easy to start sprawling into this spiral of, um, yeah, just like there's so many different use cases. And I think, uh, not only on, like, the product engineering side, but also for customers, how do we be very intentional about what is the actual, like, problem that we're trying to solve here, and how do we really force it and understand, like, what is, um, you know, this workflow that we're trying to solve for people? So, um, kind of like in my head now, I have this, like, like, mini, like, Steve Ballmer head just going like, "Marketers, marketers, marketers." You know, just like really trying to focus on who is, like, my customer in this use case that, that I want to tackle. Um, and just a quick glimpse into what does, like, this workflow look like for a content marketer. This is one example use case for content creation where they'll discover, um, what do they wanna create on. They'll research of this specific topic that they want to make sure that they're ranking in AI search or traditional search for. They'll draft a brief, generate the article, and then add any sort of, like, internal linking and best practices when it comes to SEO and AEO. And throughout this whole process, there are different human review points. Um, human review comes up a ton and human in the loop comes up a ton in content marketing, uh, especially just with when you're being cognizant about the content that you're pushing out. You, you really wanna make sure that it is adhering to the way that your brand talks, to the information that you're serving out on the internet, um, and just giving you the best shot of also being surfaced, um, within kind of like generative AI search. 
So that whole process kind of led us to focus on two parts when it comes to the most... I gue- these are kinda like the two most important parts we saw with the content marketing flow. One is having a document-based kind of like IDE, which again was like our playbook view. We wanted, uh, you know, marketers are super familiar with documents. They are used to Google Docs. Um, they've probably been u- working with Docs for, for years and ages, even before, like, technology. Like, you have this piece of paper that you're kind of defining how you wanna do something. So making that familiar versus, you know, this node-based workflow builder. Um, transparency was also really important. One thing that users actually really liked about the workflow builder was that they could actually see what tools were being used at each step. So how do you still have a document, but at the same time when I read through I can understand, like, this, this is the exact tool that I'm using at this instant. Um, this is the type of context I'm feeding at this point. Um, and just being transparent around that. And that kinda goes hand in hand with control as well. How can I make sure that I'm still in control of this, you know, set of instructions or, uh, workflow that I'm building? Um, especially now that it is really, like, not as, like, deterministic because it is, like, a document. So, um, those are three areas that w- that we focused on. And then again on enforcing human review. How can we bring governance, configurability, accountability into an agentic workflow? Which I think is pretty unique for us, and I don't think, um, a lot of brands, like, kinda, like, tackle, like, that problem. I think a lot of it is w- like, with agents, also with coding, is you kind of let it go. It'll, it'll finish what it's done with, and human review in that instance with coding is usually, like, with PR reviews after it's literally finished the whole job.
It's rarely ever, like, while it's going through the actual, you know, coding process. It'll, you know, kind of like ask for feedback of taste in, in different areas. So, um, that was, like, another area that we, that we really focused on. And with that, I actually wanna jump into a quick, uh, just, like, demo of what this looks like within, um, within AirOps. So, yeah. The first thing I wanna cover is our playbook. And again, it's kind of like this skill, this natural language builder where I can go in and either, like, type in, like, with a slash command and, you know, define different inputs, different outputs, and also all these different tools that, you know, content marketers can use and that they're super used to using. And with it-- when it comes to tools as well, you can add any MCP. So if there's other outside connectors that you usually use, you can use those and access them. Um, we also have the ability to schedule different triggers. So this gives this kind of like always on skill or playbook or agent that, you know, can do this certain action at either a scheduled cadence based off webhooks. Um, we also have monitor, which we've, um, kinda like partnered with, like, uh, Parallel when it comes to just being able to put a query in. I like saying, you know, watch the internet in a way. So when certain things happen, um, it would trigger off this playbook to then run. And then the last one is, uh, AEO insights. So whenever a metric drops, let's say like my citat- my citation rate dropped in the last, um, like, last week, then it would trigger off one of these playbooks. It can go and, like, do this research and come back to me of, "Hey, like, this is the reason why, um, this happened." And as I jump through into this kind of process of, you know, someone basically created this SOP of how they wanna create this blog. Uh, if I come to the outline section, we enforce human review by adding the ability to assign different users at the end of each section. 
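As a rough illustration of the AEO insights trigger described above (a metric such as citation rate drops week over week, and that fires a playbook), here is a minimal Python sketch. The class name `MetricDropTrigger`, the 10% threshold, and the `investigate` function are all hypothetical; the talk does not describe how AirOps implements triggers.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MetricDropTrigger:
    """Hypothetical trigger: run a playbook when a metric falls by a set fraction."""
    metric: str
    min_relative_drop: float  # e.g. 0.10 fires on a drop of 10% or more
    playbook: Callable[[str, float, float], str]

    def check(self, previous: float, current: float) -> Optional[str]:
        # Without a positive baseline there is no meaningful relative drop.
        if previous <= 0:
            return None
        drop = (previous - current) / previous
        if drop >= self.min_relative_drop:
            return self.playbook(self.metric, previous, current)
        return None


def investigate(metric: str, previous: float, current: float) -> str:
    # Stand-in for kicking off a research playbook run.
    return f"investigate {metric}: {previous:.2f} -> {current:.2f}"


trigger = MetricDropTrigger("citation_rate", 0.10, investigate)
```

Calling `trigger.check(0.40, 0.30)` would start the playbook, while a small dip such as `trigger.check(0.40, 0.39)` returns `None` and nothing runs.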
So as it, as, you know, the agent goes through this whole playbook, it'll come to this section step and decide to... We-- I think we have a tool in the background that will then, like, fire off. And basically, since I'm assigned, I'm the only person that can actually unblock this agent. Other people can still leave, like, comments on the outputs and artifacts and different feedback, but I'm kind of like this gatekeeper now that, um, has to review this piece of content, and we're trying to-- we do that to ensure, like, that governance. And what that actually looks like and how that's surfaced to users, one way is through our inbox. So we have an Air O- uh, we have an inbox within AirOps where every single time human review happens or different, you know, opportunities are surfaced, users can actually come in here and see those directly. So I can, you know, click on one of these, um, items. It will open up our basically, like, agent run. So on the right side is basically this agent, like, running through its whole process. I can kinda see its thought traces. On the left side, all the different, you know, outputs and artifacts that are, you know, one, defined by me through that playbook. And I can then either, you know, edit this existing document, leave certain comments, and then, you know, kind of kick it off and approve it. The other way that we also surface this governance and human review is through our grid. So our grid is our way of orchestrating content or orchestrating basically content at scale. So in this specific example, I'm just kinda showing, um, like, the way that you can actually collaborate with these. So Augustine is also in this document with me. Um, I also-- I'm able to edit it, also leave human review. If I close this out, I can see within the grid all these different, um, you know, kind of outputs that are, that are running. 
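The assigned-reviewer gate described above can be sketched in a few lines of Python. This is an assumption-laden illustration, not AirOps's implementation: `ReviewGate` and `run_playbook` are hypothetical names, and the real system presumably pauses an agent run rather than a simple loop.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ReviewGate:
    """Hypothetical gate at the end of a playbook section."""
    section: str
    assignee: str
    approved: bool = False
    comments: List[Tuple[str, str]] = field(default_factory=list)

    def comment(self, user: str, text: str) -> None:
        # Anyone can leave feedback on the outputs.
        self.comments.append((user, text))

    def approve(self, user: str) -> None:
        # Only the assigned reviewer can unblock the agent.
        if user != self.assignee:
            raise PermissionError(f"{user} is not assigned to review {self.section!r}")
        self.approved = True


def run_playbook(sections: List[str], gates: Dict[str, ReviewGate]) -> Tuple[List[str], Optional[str]]:
    """Run sections in order and stop at the first unapproved gate.

    Returns (sections completed so far, section the run is blocked on or None).
    """
    completed: List[str] = []
    for section in sections:
        completed.append(section)  # stand-in for the agent doing the work
        gate = gates.get(section)
        if gate is not None and not gate.approved:
            return completed, section
    return completed, None
```

With a gate on the outline section assigned to one user, the run stops after the outline until that user calls `approve`; other users can still call `comment`, but their `approve` raises `PermissionError`.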
We're basically kind of like running s- uh, skills at scale, where each one of these rows is a specific job that I wanna accomplish, and I'm running that playbook. Um, and I can also then see, like, human review at scale and click in these different, uh, cells to, um, then leave, like, my feedback and ensure that, you know, all this content's talking in my right tone, uh, tone of voice. Uh, awesome. The second friction point I wanted to talk about was with agents, um, the biggest worry is around consistency, especially when you're coming from this workflow base. So how, you know, how did we tackle going about producing quality outputs? Um, how do we, you know, make sure that customers are also seeing that and, and, you know, being aligned to that as well? One of, um, the ways that we visualize this, and actually, like, this, this graphic here was, um, created and, and used in reference by our VP of sales, which, like, funny enough, is actually, like, one of the best explanations I feel like I've seen of harness engineering. Um, you know, on the product side, we've, we've tried a couple of times, but it's just funny, like, from, you know... The go-to-market side, still is the best way that we've been able to explain this, where, um, you kind of have this car where you then have an engine, the model obviously being, you know, Claude, Opus or Sonnet, whatever engine that you wanna pick for that. And everything else that goes around then, everything else that you build on top of it, is super important in terms of creating a super great agent. So the two that I wanna focus on, and the one that we focused on the most, was around tools and the way that we're orchestrating context. And, you know, the Claude Agent SDK and also Claude Managed Agents, uh, API have been super helpful in terms of being able to iterate on this quickly. 
Um, and just making sure that we're like kinda like programmatically setting these different, um, sub-agents and, yeah, they've, they've been like super instru- uh, instrumental in, in terms of us getting to a quality, uh, output. So the first one I wanna cover is around tools, and basically the background on tools are you could give an agent a bunch of primitive tools. So in this specific example, what we were always trying to accomplish with Claude and our agent was helping, um, Claude understand what is wrong with a page on my website. We have different tools like access to, uh, you know, traffic data, citation data, um, scrapers, so I can find like similar competitor pages. And we started off like honestly with a skill of, "Hey, here's like, here's the way of dissecting whether or not a page is losing, what's wrong with the specific page. Are like the schemas off? How does it compare to competitor pages?" And it would kinda go on like these like safari trips, and it's honestly like a bit like token inefficient. So one thing that we wanna focus on is how can we create specialized tools of jobs that Claude does that we know are, is gonna do like over and over again, and how can we just make that a bit more deterministic where, you know, Claude can then put a URL and immediately spit out everything about that page, as well as different structured, you know, content gaps, uh, versus like other similar pages in that, um, in that space, as well as what should be my target keywords and my target prompts for it. The second tool that we made, uh, in just like a workflow that we saw that was super common with content marketers was our page versus tool, which was essentially being able to benchmark my page against, you know, top ranking pages in that space, um, and finding what exactly is behind those pages and how can I, you know, how can I close those gaps? So this is just like a really simple way of getting context efficiently.
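To make the primitive-versus-specialized distinction concrete, here is a small Python sketch of one deterministic tool that bundles several primitive lookups, so the agent makes a single call with a URL instead of looping through each primitive itself. Every name here (`analyze_page`, the `primitives` keys, the `topics` field) is hypothetical, and the target keywords and prompts mentioned in the talk are left out.

```python
from typing import Any, Callable, Dict


def analyze_page(url: str, primitives: Dict[str, Callable[[str], Any]]) -> Dict[str, Any]:
    """Hypothetical specialized tool: one call that runs the primitives in a fixed order."""
    page = primitives["scrape"](url)
    competitors = [primitives["scrape"](u) for u in primitives["similar_pages"](url)]

    # Content gaps: topics that competitor pages cover and this page does not.
    competitor_topics = set().union(*(c["topics"] for c in competitors))
    gaps = sorted(competitor_topics - set(page["topics"]))

    return {
        "url": url,
        "traffic": primitives["traffic"](url),
        "citations": primitives["citations"](url),
        "content_gaps": gaps,
    }


# Stub primitives standing in for real scrapers and data sources.
PAGES = {
    "https://example.com/sunglasses": {"topics": ["polarized lenses"]},
    "https://competitor.example/a": {"topics": ["polarized lenses", "uv protection"]},
    "https://competitor.example/b": {"topics": ["frame materials"]},
}
stub_primitives = {
    "scrape": lambda u: PAGES[u],
    "similar_pages": lambda u: ["https://competitor.example/a", "https://competitor.example/b"],
    "traffic": lambda u: 1200,
    "citations": lambda u: 7,
}
report = analyze_page("https://example.com/sunglasses", stub_primitives)
```

The design point is that the ordering and joining of the lookups lives in code, so the model only has to read one structured result.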
Um, it's, it's kind of like a code mode in a way. I know that's been, you know, something that's been popular nowadays is being able to be more programmatic in terms of how we're fetching context versus kinda like looping through these different, um, you know, tool calls. It's like, can I actually just like produce code that will fetch exactly what I need, um, in, you know, one loop. The second one is through sub-agents. Sub-agents have definitely been instrumental and crucial in terms of getting to that quality of output. Um, in general, like what we tell users too with playbooks, and honestly when you're first creating your agent harness, is to actually just start off with, you know, Claude itself and just have it go through its own tool calls, and really not trying to make it too complex in all the context that you're trying to, you know, give to it. So that's, that's where we start off with. And we were reaching a couple of, um, kind of like error spots when it came to the quality of the outputs we're getting. So what we did was decide to add on, like over time, certain sub-agents. The first one was a compliance check, so being able to just make sure that we're not like kinda polluting that main context window just because context rot honestly is like still a huge problem and will probably continue to be a problem when it comes to like what are, you know, the tokens and texts that you're attending to. Um, so being able to spin off the sub-agent that will go through and have everything that it needs to know about my brand and whether or not the content that I just produced is following those rules. It'll, you know, come back with a score, whether or not it adhered to them, what were the things wrong with it, and then the agent can then take that feedback and then kinda make edits. The second one we did was around writing. 
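The compliance loop described above (check the draft in a separate context, return a score and a list of problems, let the main agent revise) might be sketched like this. This is only a stand-in: the real sub-agent is a model with its own context window, whereas `compliance_check` below is a simple phrase match, and all names and the banned-phrase rule format are assumptions.

```python
from typing import Callable, Dict, List, Tuple


def compliance_check(draft: str, banned_phrases: List[str]) -> Dict[str, object]:
    """Stand-in for the compliance sub-agent: returns only a compact report."""
    lowered = draft.lower()
    violations = [p for p in banned_phrases if p.lower() in lowered]
    score = 1.0 - len(violations) / len(banned_phrases) if banned_phrases else 1.0
    return {"score": score, "passed": not violations, "violations": violations}


def revise_until_compliant(
    draft: str,
    banned_phrases: List[str],
    revise: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> Tuple[str, Dict[str, object]]:
    """Main loop sees the report, not the brand rules, and edits from the feedback."""
    for _ in range(max_rounds):
        report = compliance_check(draft, banned_phrases)
        if report["passed"]:
            break
        draft = revise(draft, report["violations"])
    return draft, compliance_check(draft, banned_phrases)


def strip_phrases(draft: str, violations: List[str]) -> str:
    # Toy reviser: delete each flagged phrase and tidy the spacing.
    for phrase in violations:
        draft = draft.replace(phrase, "")
    return " ".join(draft.split())


final_draft, final_report = revise_until_compliant(
    "Our revolutionary tool is simple to set up.",
    ["revolutionary", "best-in-class"],
    strip_phrases,
)
```

Only the small report crosses back into the main loop, which is the context-saving property the talk attributes to spinning the check off as a sub-agent.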
Um, we first again, like tried just, just using like the regular Claude, you know, harness to actually write the content, but we found that it was better to spin off a sub-agent with its own very focused context window to focus solely on just writing that piece of content. Um, so it's not distracted by any of the research that was created, um, or any of maybe like old compliance checks, um, and just being able to like very focus in on that one job. The next one is, um, a brand kit sub-agent. So again, like brand kit for us is basically just like kinda like a knowledge base or context layer of everything you need to know about a brand. And this is something that we kick off actually at the beginning of all of our runs, where we have a, um, brand kit sub-agent that will go and fetch all of the relevant context it needs, and then stores it as like an internal artifact. So then throughout the whole process, um, our main agent loop can then reference those artifacts versus having to like use tools in our MCP to refetch like that context. 'Cause then what happens is, um, in different sub-agents, you might actually have different brand context that was fetched over time. So just being able to fetch that up front, store it as an artifact, and then just redirect the agent to always be referencing, um, that same artifact. And then finally, um, just still being able to add custom ones. This is more, more for us like an internal, um, tool where like some of our like solutions architects who work for our customers can spin off different sub-agents when, when needed. Um, and just really helps with just maintaining that context. I think that's definitely been the biggest learning is although like context windows do continue to grow, um, you know, you have like a million context window with, um, with Opus 4.7, but just still being really cognizant that just because it is, you know, larger doesn't mean that you should be like using the whole thing.
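The fetch-once, reference-everywhere pattern for brand context can be sketched as follows. `ArtifactStore`, `start_run`, and the sub-agent functions are hypothetical names for illustration; the talk does not say how AirOps stores its internal artifacts.

```python
import copy
from typing import Any, Callable, Dict


class ArtifactStore:
    """Hypothetical per-run store for internal artifacts."""

    def __init__(self) -> None:
        self._artifacts: Dict[str, Any] = {}

    def put(self, name: str, value: Any) -> None:
        # Store a snapshot so later changes to the source do not leak into the run.
        self._artifacts[name] = copy.deepcopy(value)

    def get(self, name: str) -> Any:
        return self._artifacts[name]


def start_run(fetch_brand_context: Callable[[], Dict[str, Any]], store: ArtifactStore) -> None:
    # The brand kit sub-agent runs once, at the beginning of the run.
    store.put("brand_kit", fetch_brand_context())


def writer_subagent(store: ArtifactStore) -> str:
    return f"write in a {store.get('brand_kit')['tone']} tone"


def compliance_subagent(store: ArtifactStore) -> str:
    return f"check against a {store.get('brand_kit')['tone']} tone"


fetch_calls = 0


def fetch_brand_context() -> Dict[str, Any]:
    # Stand-in for the tool calls that would fetch brand context.
    global fetch_calls
    fetch_calls += 1
    return {"tone": "friendly"}


store = ArtifactStore()
start_run(fetch_brand_context, store)
outputs = [writer_subagent(store), compliance_subagent(store)]
```

Because both sub-agents read the same stored snapshot, they cannot end up with different versions of the brand context, and the fetch happens once regardless of how many sub-agents run.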
You should still be very efficient of what sorts of contexts are, you know, you're letting the model attend to. Yeah, and in terms of results, what we saw was actually a 8% decrease in like fewer tokens consumptions around at least that specific tool. So instead of, again, like that tool spinning off different primitives and, you know, going over and over again, you actually just have this one tool call that fetches a page, um, and then all the results from it. The second one from that also like those specialized tools was also speed. Um, beforehand, while it was making like those 20 different tool calls to fetch all the context it needs, you just have this one entry point that spits out everything that you need. Um, and then when it comes to the quality side, we had 10 enterprise customers while we were do- running this beta actually start publishing con- uh, publishing content in under two weeks. So they were able to self-serve and also, you know, get to a quality of output through these agents, which was something that used to be kind of like a hold-handy experience. Um, it was also had, like, a really high, um, kind of like ceiling. So yeah, um, when it comes to like building these agents and with Claude being able to execute and making it easier to execute on things, um, it's really easy to think that there are no more problems to solve. But really, every single time a problem is solved, just that friction point always keeps moving. So those are kind of the two that we, um, were really chasing after and we really want to tackle, and there's a ton more that we also want to continue chasing. But I think just overall, it's a great thing to just continue to chase friction because that really is how you create production, you know, agents and make those more accessible to users outside of, you know, more technical spaces.
So just to quickly close out, um, the next two kind of friction points that we're looking to battle and excited to share more learnings on are around really self-improvement and feedback loops, which there were awesome talks yesterday around dreaming sequences and you know, how-- I, I think the most interesting one is actually how do you structure summaries of different traces? Um, what, what's the best way of collecting the most relevant memories? And also like forgetting is actually a feature, like being able to forget certain types of memories. And the last one is benchmarking content creation agents. Um, I think something really interesting about our space is that it isn't, uh, law, it isn't coding, it's not something that is really easy to say whether or not it's, you know, something's correct. There is a lot of taste that goes into a piece of content that you create, and there's a lot of opinions of how I want certain context to be formatted and created. So what are like the best ways that we are creating benchmarks so that every single time we do change our harness, we add a sub-agent, or we decide to add a skill to our harness, how do we know that it's actually improving outputs and not just, you know, kind of vibes that, that we're going after? Um, but yeah, I'm around all day, would love to, you know, chat with you guys and hear about what you guys are building. But yeah, hope this was, was helpful. Thank you. [audience applauding] [upbeat music]
Episode duration: 26:46
Transcript of episode M5uwBawBDpw