Aakash Gupta
Masterclass: How to Turn an AI Agent into a Real Product (No Code)
EVERY SPOKEN WORD
90 min read · 17,917 words
- 0:00 – 1:34
Intro
- Aakash Gupta
While we've all been doing vibe coding, Tyler Fisk has gone to the next level to build production AI agents.
- Tyler Fisk
We are on a podcast right now, so no pressure at all, and we're gonna do a live agent build here. We're spinning up multiple agents here just to kind of get this process done and get all the context that we need as quickly as possible.
- Aakash Gupta
I love how many like parallel processes you have going here. My guest today, Tyler Fisk, is one of the world's most experienced AI agent builders. He has taught thousands of students to build AI agents, and he has worked with hundreds of businesses to implement AI agents in their workflows.
- Tyler Fisk
We find that a lot of the work comes when we're doing this for actual clients, and like what we teach is this idea of the mentality of an AI practitioner, which happens to be very similar to like the forward deployed engineer mentality.
- Aakash Gupta
Anybody who's watched this far knows why you've hit that success. You just packed thousands of dollars of lessons into sixty seconds.
- Tyler Fisk
Deep research is such a powerful tool that I think more people should know about and use, because it's just such a good way to gather info for your agents quickly.
- Aakash Gupta
One huge takeaway for me from watching you build these is- [beep] Really quickly, I think a crazy stat is that more than fifty percent of you listening are not subscribed. If you can subscribe on YouTube, follow on Apple or Spotify podcasts, my commitment to you is that we'll continue to make this content better and better. And now on to today's episode.
- 1:34 – 3:00
Tyler's Background: AI Agent Expert
- Aakash Gupta
While we've all been doing vibe coding, Tyler Fisk has gone to the next level to build production AI agents. My guest today, Tyler Fisk, is one of the world's most experienced AI agent builders. He has taught thousands of students to build AI agents, and he has worked with hundreds of businesses to implement AI agents in their workflows. He has gone to the next level to productionize these multi-agent workflows. He normally charges thousands of dollars for this knowledge. Today, we're giving it all away for free. Tyler, welcome to the podcast.
- Tyler Fisk
Thank you for having me, Aakash. Like, really excited to be here.
- Aakash Gupta
What are we gonna do today?
- Tyler Fisk
Uh, well, we are gonna go from idea to building out a couple of different AI agents and then see if we can have them all work together and orchestrate and build an AI workflow, like real time from scratch.
- Aakash Gupta
Do you need a CS degree to build AI agents in production?
- Tyler Fisk
No, I, I don't think so. Uh, I don't have one, and we've had loads of people come through our class. In fact, there's a running joke that says we've even had a student say that, uh, they couldn't even spell AI if we spotted them two vowels, and yet, uh, they're building agentic workflows in four weeks. So yeah, no, you don't need a CS gr- degree anymore, I don't think.
- Aakash Gupta
Amazing. Can you show us how it's done?
- Tyler Fisk
Sure, yeah. So what do we want to try and build today?
- Aakash Gupta
I want to see a multi-agentic workflow. Let's work on some popular company like Apple.
- Tyler Fisk
Cool.
- 3:00 – 4:12
Live Demo: Building Apple Customer Service
- Tyler Fisk
Let's do it. And we will start in a tool called TypingMind. And TypingMind is essentially like a playground tool, um, for any LLM that we want. And so I can use an agent that I use frequently called Gigawatt. And Gigawatt here is an AI prompt engineering and AI engineering agent. And we're gonna run this. It's gonna be connected to Sonnet-4. I've got some MCP tools toggled on here, uh, like Exa, Perplexity, uh, Sequential Thinking, and then we're gonna do some research because that-- we find that a lot of the work comes when we're doing this for actual clients, and like what we teach is that it's this idea of the mentality of an AI practitioner, which happens to be very similar to, uh, like the forward deployed engineer mentality. So we need to understand the problem, understand the business, really get to the secret sauce of it, and from there, we can kind of gather all of our requirements and then start our build.
- Aakash Gupta
Awesome.
- Tyler Fisk
Okay. So let's see here. What-- We wanted to try to do this for Apple, I think is what we wanted to do.
- Aakash Gupta
Yep.
- Tyler Fisk
Cool. All right. So I joke when I tell people what I do for a living now is I talk
- 4:12 – 16:08
Gigawatt: The Agent That Builds Agents
- Tyler Fisk
funny to robots, so that's what we're gonna do here with Gigawatt. So hey, Gigawatt, what's up? We are on a podcast right now, so no pressure at all, and we're gonna do a live agent build here. What we're gonna do is we want to build an expert-level agent for Apple, and that's Apple Computer. Uh, so I want you to go out and use your tools, your sequential thinking tool, your deep research tool, uh, Perplexity to really think about if we were gonna be building an agentic workflow that handles inbound customer service-type emails, and we're gonna have two different AI agents as a part of this AI team to handle those. One of them is gonna be this expert, and that's the one that we're gonna focus on first. It's not gonna talk to any people. Its whole job is to go and find the best possible information to help support the email agent that we're gonna dub, "You've got mail," and then send that response to it, and then it will craft that email out to the customer service, uh, or to the customer as a customer service agent to respond to them. So I want you to think on this deeply. Ask me, let's say, three clarifying questions so that you and I are on the same page. Uh, we're not in prompt engineering mode yet. This is where you're just going to do brainstorming, and we're getting on alignment on what this build is that we're gonna do. All right, let's see what you got. Go get 'em slugger. So that's like a funny thing, uh, that we always do here at the end. Like, people always ask, like, "Why the heck are you saying, 'Go get 'em slugger?'" And the fact that I'm even talking to this agent in this way, like very kind of loosey-goosey, and I'm actually just like talking to it, so I'm using Mac Whisper on my computer, is because the system instructions for Gigawatt here are extremely, uh, detailed and comprehensive and trained on how it is that we've done this process. For several years now at this point. Um, so we can see it's using its sequential thinking tool here.
Uh, it's thinking about like what it wants to go do. This is very similar to the thinking traces that you get from, uh, like any of the reasoning models, but we can use this tool, uh, with any model now. And it's come back, and it's asking us these questions here. So love the energy. Here we go. Uh, Apple's product ecosystem is massive. We're talking about iPhone, like the whole shebang. Uh, so it wants to know, let's see here, clarifying questions. Are we doing this to an, uh, an expert that handles the entire Apple universe, or should we focus on a specific subset? Like maybe the hardware support or just customer devices? Are we going full enterprise? Um, let's just do, let's do all of the main consumer products. So, all right, Gigawatt, I'm gonna go through here and answer some of your questions, just stream of consciousness style back to you here. Uh, we're gonna build this expert agent so that it knows all things Apple products and Apple culture, plus normal, uh, information that it would need to know if it were an expert on the Apple team to try and surface information that would be useful to respond to generalized customer service questions. So this could be everything from, you know, how do I set up family sharing on my devices? How do I update the iPhone? What is even the latest iPhone? How do I do this thing in iOS? Like all, the whole gamut there. So we need to think about this deeply and the kind of information and expertise that we are gonna need to gather in to be able to do that here. Okay, so there's that one. The second one, uh, let's see. What kind of customer service scenarios are we optimizing for? Are we talking technical troubleshooting, um, product information requests, billing account issues, warranty, all of the above? Okay. Uh, on your second question, let's just say all of the above, just for speed. Um, like, because you never know what kind of customer service questions you're gonna get, especially for a, a company the size of Apple. 
So right now, let's solve for all four of those that you mentioned. And let's come down here a little bit more. Our information sources and research depth. What level of research capability should the expert have? Should it be pulling from Apple's official documentation? Okay, so great question. So it's wanting to know, like, basically, how is it gonna know this info? This is-- we're gonna build it into the system instructions, and we're gonna connect it to a RAG retrieval knowledge base, and we're gonna connect it to an internet tool so it can go out and, and actually find this information. Let's tell Gigawatt that. Hey, Gigawatt. So this agent, this expert agent, will be connected to a RAG retrieval database where we will scrape in, uh, Apple's website. And for the demo, the sake of the demo, it's not gonna be the entire comp-- like whole website scrape just for the sake of time. But the agent needs to know that it's connected to that, and it can go search through its RAG to find anything helpful to answer the, the questions that the customers are asking. It's also gonna have a lot of the expertise written and built into the system instructions itself, and it's gonna be connected to a web search tool, so go out on the open internet to find information as well. With that one, though, we need to be careful that when it's going onto the internet, it's looking at Apple-specific sites or official information, because when it's out on the open web, that could cause hallucinations. So we need to think system instructions, RAG database first. Internet is good to find real-time info, but we need to use kinda chain-of-verification mentality when we're, um, when it's thinking about that kind of stuff. All right. So let's do that. And that should get us up and rolling.
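The source-priority hierarchy described above — RAG database first, built-in system-instruction knowledge second, verified web search last — can be sketched roughly like this. This is a minimal illustration; the search functions are invented stand-ins, not a real TypingMind or Cassidy API:

```python
# Sketch of the tiered lookup Tyler describes: check the RAG knowledge base
# first, fall back to expertise baked into the system instructions, and only
# then hit the open web behind a verification pass. All three functions here
# are hypothetical placeholders for illustration.

def search_rag(query):
    """Knowledge base scraped from apple.com (placeholder: no match found)."""
    return None

def search_builtin(query):
    """Expertise written into the agent's system instructions (placeholder)."""
    return "Use Settings > General > Software Update to update iOS."

def search_web_verified(query):
    """Open-web search gated by a chain-of-verification pass (placeholder)."""
    return None

def answer(query):
    # Try each source in priority order; the first hit wins.
    for source in (search_rag, search_builtin, search_web_verified):
        result = source(query)
        if result is not None:
            return result
    return "Escalate to a human agent."
```

The point of the ordering is hallucination control: the most trustworthy, most curated source gets consulted first, and the open web is a last resort rather than a default.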
- Aakash Gupta
And while this is running really quickly, if a user wants to replicate this Gigawatt infrastructure, this agent, how would they go about doing that? Can they access Gigawatt, or how would they rebuild it?
- Tyler Fisk
Yeah. Um, well, Gigawatt is something that we give versions of it away as a part of going through our course. That's one thing. And the other thing is, like, we're actually working on turning Gigawatt into a real live product. We are vibe coding this right now. Like, I've been deep in VS Code with Claude Code and Codex trying to turn this into a product that I don't have to be there anymore, that like this whole back and forth that we're doing now and forcing the research and even deep research happens completely autonomously. Um, so yeah, that's, that's a couple ways that we're working on it.
- Aakash Gupta
If somebody didn't take the course or you hadn't released the product yet, how could they replicate something like this to do this on their own?
- Tyler Fisk
Um, I mean, the way that I've done it is that everyone kinda has their own best practices around prompt engineering and what good looks like to them. So I have taken a lot of the work that I've done over the last several years where I manually did this. And because I originally built Gigawatt to speed up my own processes, um, I use like what good looks like from my point of view and how I approach it and the different AI research techniques, um, that we use. So you heard me say chain of verification there a minute ago. That's a process that Meta came out with here a few years ago that was, like, produced in a, in a research paper. I'm a nerd. Like, I love reading that stuff and, and watching content around it. And when I saw that, I recognized, oh, this is a framework we could start incorporating into agents so that they hallucinate less frequently because it's like reviewing the information, kind of fact-checking it, giving a confidence score before it ever decides to include it in an output. So that's like one example of one piece of AI research that Gigawatt knows about. So if someone wants to rebuild that, it's kinda collecting, um, like all of these, these kinds of bits and pieces of information, and then now the LLMs are really good at writing, uh, prompts to begin with. So if you could gather that, put it together, um, you're gonna have a pretty good prompt engineering agent just going from there.
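The chain-of-verification loop Tyler references can be sketched as a four-step pipeline: draft, plan checks, verify independently, revise. This is a minimal sketch assuming a generic chat-completion client; the prompts and function names are illustrative, not the Meta paper's reference implementation:

```python
# Minimal chain-of-verification sketch. `call_llm` is a stand-in for any
# real chat-completion API call.

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (swap in an actual API client)."""
    return f"[model response to: {prompt[:40]}...]"

def chain_of_verification(question: str) -> str:
    # 1. Draft a baseline answer.
    draft = call_llm(f"Answer the question: {question}")
    # 2. Plan verification questions that fact-check the draft's claims.
    checks = call_llm(
        f"List the factual claims in this answer as yes/no check questions:\n{draft}"
    )
    # 3. Answer each check independently, without showing the draft,
    #    so the model can't simply agree with itself.
    verdicts = call_llm(f"Answer each check question on its own:\n{checks}")
    # 4. Revise the draft, keeping only claims the checks supported.
    return call_llm(
        "Rewrite the answer, dropping any claim the checks did not support.\n"
        f"Draft:\n{draft}\nVerdicts:\n{verdicts}"
    )
```

The key design choice is step 3: verification questions are answered in a fresh context, which is what makes the loop catch hallucinations instead of rubber-stamping them.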
- Aakash Gupta
Okay. So the goal is to build a prompt engineering agent in this tool, TypingMind, which gives that agent access to all these tools that you listed off, and then you prompt that agent that's your agent builder helper [chuckles] to help create the actual agent you want to create.
- Tyler Fisk
That's right.
- Aakash Gupta
Awesome.
- Tyler Fisk
Yeah, exactly right. Like, it's, it's, uh, a version of meta-prompting almost, and you'll see that, uh, happen in here frequently, like, as I go through this process. Um, like, not only to build Gigawatt, the kind of odd thing that's like inception-y dream within a dream is that I've used older versions of Gigawatt plus, like, my feedback and improvements and, and new techniques that I've come up with or, or found out, you know, as, as, like, new research comes out, and then bake that in and Gigawatt can meta-prompt itself and help build a better version of, of Gigawatt or whatever agent that is that we're trying to build at the time.
- Aakash Gupta
It's like turtles on turtles.
- Tyler Fisk
You got it. [chuckles] Yeah. So let's see here. So let's come back. So Apple expert, uh, agent architecture, uh, complete Apple ecosystem, multi-source, um, intelligence here, chain of verification, comprehensive customer service coverage, information hierarchy. So I'm just, like, speed reading through this just for the sake of time. So it's-- it knows that we wanna make sure that the RAG database is primary. Secondary is when it's thinking about responding, it's working from the built-in system instructions. Tertiary is gonna be verified web search with confidence scoring on that. The output structure is going to be for the You've Got Mail agent. That's an important one. Um, like, if we were doing this in production, and I won't do it today, I would probably have inter-agent communication print its output in JSON because, uh, it's not the prettiest thing for us to look at, but when you go agent to agent, it's, it's very useful because the LLMs can parse out information, uh, from that very easily. They-they make sense for them, and I'm-- I know I'm, like, anthropomorphizing them, but it just is what it is. And, and that technique really works well. So, um, while this is also running in the background, let me come over here and do this just so we're multitasking here. I'm gonna switch over into Cassidy AI. Cassidy AI is, uh, a platform that we teach on and that we use frequently. It's a no-code, no-code platform, and it makes things like web scraping in-incredibly easy. Uh, so let's just go and grab, um, let's see here, apple.com. Grab their website, pop back over here into Cassidy. We're gonna chuck this in. I'm just gonna do a domain-level scrape just for, um, expediency. We're gonna do 1,000 pages. I'll turn the sync off here and just go save and begin import. And so while we finish building these agents out, uh, this is doing a web scrape on Apple's website. And so this is what we're gonna connect into the RAG here in just a moment. 
The other thing that I would do is this. We're going to, uh, use Deep Research. Hey, Gigawatt, we're gonna use Deep Research, and we're gonna work with our Clear agent, which is another agent that you're familiar with that is an expert in writing Deep Research prompts. So think about any areas that you think would be extremely valuable either to help further inform you in this build process or to add as assets into a RAG retrieval knowledge base and give me, let's say, three areas of Deep Research you would like to see done. Be as comprehensive as possible in your response there, and then I'll hand that over to the Clear agent so we can start, uh, getting those Deep Research agents up and rolling. Thank you.
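Tyler's point about printing inter-agent output in JSON can be sketched like this. The payload shape, field names, and values are invented for illustration — they are not from an actual Core build:

```python
import json

# Hypothetical handoff payload from the expert agent ("Core") to the
# email agent ("You've Got Mail"). Every field name here is illustrative.
expert_output = {
    "topic": "family_sharing_setup",
    "facts": [
        "Family Sharing supports up to six members.",
        "It is set up from Settings under the user's Apple account.",
    ],
    "sources": ["rag:apple.com"],
    "confidence": 0.92,
}

# The expert agent emits JSON rather than prose...
wire = json.dumps(expert_output)

# ...so the downstream email agent can parse it deterministically
# instead of scraping facts out of free-form text.
payload = json.loads(wire)
bullet_points = "\n".join(f"- {fact}" for fact in payload["facts"])
```

This is why JSON works well agent-to-agent even though it is "not the prettiest thing for us to look at": the consumer gets named fields it can reference reliably, rather than having to re-interpret a paragraph.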
- 16:08 – 17:00
Ads
- Aakash Gupta
If you've been enjoying this breakdown with Tyler on how to build multi-agent systems, you are going to love his Maven course. It is one of the fastest-growing courses on Maven, and if you use my code, you will get money off. So use my code, AAKASHxMAVEN. Use the link that's in the show description notes, that's in the email or newsletter that you got about this or in the social media post. Find that link. Go sign up for Tyler's course. He does it with his partner, Sara. This is one of the highest-reviewed courses on Maven. And if you saw, we weren't able to cover all the details in the episode. So if you wanna actually get through building the workflows, deploying the workflows, building the evals, they teach all of that in the course. I highly recommend it. And now back to the episode.
- Tyler Fisk
All right, so let's do that. And so now we're gonna let Gigawatt write this up. And
- 17:00 – 26:02
Deep Research & Knowledge Base Setup
- Tyler Fisk
again, because I'm talking to these things, uh, and just using Mac Whisper to, like, transcribe this, and there's lots of tools that do this now. Like, I can talk so much faster than I can type, so this is getting-- Like, from every angle here, the efficiencies that I get in doing this kind of work is just exponential now at this point. It's kind of ridiculous. Um, so while this is happening, I'm gonna call in my Clear agent. This is another agent that, again, is kind of a prompt engineering agent, but it's more hyper-focused into writing Deep Research prompts, and that's to go directly to be used on something like Perplexity or Anthropic, ChatGPT, Google, those four primarily. And it will see what it is that Gigawatt wants to see Deep Research done in, use that to write the Deep Research prompt in a way that we found that works really well, and then from there, we'll go let it go do Deep Research. Now we're spinning up multiple agents here just to kinda get this process done and get all the context that we need as quickly as possible.
- Aakash Gupta
Today's episode is brought to you by Vanta. As a founder, you're moving fast toward product market fit, your next round, or your first big enterprise deal. But with AI accelerating how quickly startups build and ship, security expectations are higher earlier than ever. Getting security and compliance right can unlock growth or stall it if you wait too long. With deep integrations and automated workflows built for fast-moving teams, Vanta gets you audit-ready fast and keeps you secure with continuous monitoring as your models, infra, and customers evolve. Fast-growing startups like LangChain, Writer, and Cursor trusted Vanta to build a scalable foundation from the start. So go to vanta.com/aakash, that's V-A-N-T-A.com/A-A-K-A-S-H to save one thousand dollars and join over ten thousand ambitious companies already scaling with Vanta. AI is writing code faster than ever, but can your testing keep up? Testkube is the Kubernetes native platform that scales testing at the pace of AI-accelerated development. One dashboard, all your tools, full oversight. Run functional and load tests in minutes, not hours, across any framework, any environment. No vendor lock-in, no bottlenecks, just confidence that your AI-driven releases are tested, reliable, and ready to ship. Testkube, scale testing for the AI era. See more at testkube.io/aakash. That's T-E-S-T-K-U-B-E.I-O/A-A-K-A-S-H.
- Tyler Fisk
All right, so let's see this. So let's go Clear. Hey, Clear, uh, you can read this thread so you're fully up to speed. We're building out a couple of AI agents here, one that's gonna be an expert agent and one that is going to be a customer service email agent. Gigawatt has just given you back three different areas of deep research that we would like to see happen. So take them one at a time, focus on them, uh, each, uh, in a loop, like one right after another, and then write the deep research prompt that we can go out and use on, uh, let's just use Perplexity, uh, and Anthropic for these just to, to knock it out quickly. Uh, so do that and then write those deep research prompts, please. Thank you. All right. And again, like the reason that I'm able to just talk to these agents and it feels like I'm not prompting them that well is because on the back end, if we want to go like peek at this for just a split second, um, like Gigawatt system instructions, like even this version we can see it, are, are quite extensive. Like this is a different version of Gigawatt. This is three point two. It's one of the earlier versions, and it's almost 72,000 characters. So it's-- there's-- it-- they're extremely extensive in here as to like what we're covering. They're written in XML, um, because I, I do a lot of my work in Anthropic. I work with all of the models, but specifically I work a lot with Claude models, and Claude is trained on XML, so you can get it to do some additional functionality by, by using this. Um, and because we're so organized, detailed on this end, we can be much more kind of like conversational on the front end here. All right. So we've got our first Clear agent prompt here, so let's just take that. Again, we're gonna-- it helps being like, um, I guess neuro spicy to kind of spin these things up quickly. So I'll just use Perplexity Deep Research to start doing that area of research. Let's come over here, and we have prompt number two. 
We'll go spin that one up in Claude just for variety. Uh, so let's chuck that in there. Research, let it roll. Pop back over here into, uh, like this last one, and we'll just do another Perplexity one just 'cause it's a little bit faster. So-
- Aakash Gupta
I love how many like-
- Tyler Fisk
[chuckles]
- Aakash Gupta
... uh, parallel processes you have going here. That's where the speed comes from, and that's how you're able-- gonna be able to finish this in a podcast.
- Tyler Fisk
Yeah, exactly. And this just comes from like repetition of doing this a lot. So-
- Aakash Gupta
Yeah
- Tyler Fisk
... uh, let's see here. Gigawatt five point one. All right, so now I'm switching the chat, like in chat back over to Gigawatt, and I moved off of the Clear agent. So that's kind of the cool thing here is that in TypingMind, it's kind of like in ChatGPT how you can call in like different custom GPTs or Gemini Gems. I think you can do that too. Um, it's the same thing here, but this is running all on the API layer, so we have much better security, much more control over like system parameters and things like that, and also like MCP tools. So let's go back here to Gigawatt. And all right, Gigawatt, we have got all three deep research agents up and running. Uh, so that's gonna take just a second to finish. In the meantime, what I want you to do is think deeply about everything that we've discussed so far, the set and setting that we are, uh, building for here in this AI team, and I want you to give me back a PRD as you understand what it is that we're building specifically for this expert agent, and let's call this Agent, um, Core, please. Thank you.
- Aakash Gupta
Let's go. All the PMs are smiling. There's a PRD in here. [chuckles]
- Tyler Fisk
You got it. Yeah. Yeah. So for all the folks that don't know what that is, 'cause we, we have to explain our, our tech bro speech frequently, uh, PRD is a product requirements document. So this is basically, think of it as like a full writeup of, uh, what are we building? Who is it for? What's this journey look like? What is it gonna do? What is it not gonna do? Like all, all of these like different, um, lenses you look at a project through. And by doing that and even training a model on what a good PRD looks like from your point of view, that could even be its, and probably should be, its own separate agent. Um, like y-you get a really good alignment to like lay out in the thread here, like what's our game plan? And then Gigawatt's gonna go build from that. This same process works like when you're-- Like, after we've done all this work and we're like, "Hey, let's go turn this into an actual product or a workflow," or, or whatever that might be. Um, like coming back and getting your plan together first before you ever try and go do the actual work is, is paramount here. All right, so let's see. We've got our PRD going. We've got Core is gonna be our ecosystem expert, and it just so happens, like I'm, I, I may or may not have built, uh, an Apple expert agent before in class.
- Aakash Gupta
[chuckles]
- Tyler Fisk
And I like, I'm a millennial, so like I'm extra on using emojis and like fun names here for these agents, and Claude all on its own came up with Core. I was like, "That's a freaking good name." So like anytime I'm building an expert, an Apple expert, that's what I'm doing here.
- Aakash Gupta
[laughs]
- Tyler Fisk
So let's see. All right.
- Aakash Gupta
This was not coordinated, folks. He just asked me who to build it for. We just-
- Tyler Fisk
Yeah. [chuckles] We just were riffing on this
- Aakash Gupta
... so many agents that he had already built this. [chuckles]
- Tyler Fisk
Yeah. It was either this or, uh, we were joking about Hattie B's Chicken before we got rolling today. I was like, "We can build it for that too." [chuckles]
- Aakash Gupta
[chuckles]
- Tyler Fisk
Uh, yeah. So let's see. We're gonna build Core out here. This is the research and intelligence, like expert agent. Um, it's given its primary function, who the target user is, which is important here. We want it to remember, because the inputs that this agent is gonna get is like, for us, we're gonna come up with like a synthetic email that like an Apple, you know,
- 26:02 – 30:00
Why Multi-Agent Systems Beat Single Agents
- Tyler Fisk
might receive on the regular. They might get like more data points as well as an input, but if you're not very clear and specific on this of, of who is the-- what are the inputs, who is the intended audience, what are the output requirements, then this agent will have a kind of role mix-up and start trying to actually write an email back to that, uh, end user.
- Aakash Gupta
Mm.
- Tyler Fisk
And that's not what we want at all. So like it's, getting this clarity is extremely important here. All right, so for-
- Aakash Gupta
And for people who don't understand, why is it important to have multiple agents and separate these agents in this way?
- Tyler Fisk
Yeah. Well, in, in this example, uh, we learn in building customer service email flows and just our own experience, like I, I always-- for me, it's easy to learn things in, in stories and to like visualize this. So when we built a version of this out for my family's company called Grower's Solution, they, um, they sell like gardening supplies and greenhouses, and when they get customer service emails coming in, the people team, there's like different tiers of customer support and like different specializations in there. They have their own internal knowledge bases and things. But then just like in every business, literally every business that we have worked with, uh, they're, they're no exception. There's tribal and tacit knowledge through all different areas of domain expertise. And so their people team have to go and like tap other folks on the team to, to get, um, the right answers 'cause they might not know. Uh, so like they might get a question that's like really complicated about irrigation or something, and there's a few people on the team that really specialize in that. So they have to go ask those people the question and then bring it back after they've gone on this like fact-finding mission to respond to the customer. [lip smack] And the funny thing about this in, in real life is that the real-life experts typically are not the same people that you want answering the customer service emails.
- Aakash Gupta
[chuckles]
- Tyler Fisk
Um, they don't always like talk in a, in an empathetic way. Not that they're like harsh, it's just they're experts.
- Aakash Gupta
Yeah.
- Tyler Fisk
And to distill that down in a way that like lands at the level that the customer is trying to talk at, it doesn't always resonate there. So that's like one really clear example of we want our expert to be focused in different areas of domain expertise, and then hand that off to an agent that specializes in taking that, translating it, and being on brand, um, and like having very separate roles. And from a agent, like a technical point of view also, if you try and have like one more generalized do-it-all agent, it's more difficult. Um, when you get into things like temperature and, and system parameters and things like that, you might wanna have a, a bit of a higher temperature so that when your, your email agent writes a response, it's much more like it feels human or feels authentic. Uh, and if you have an expert agent, that temperature might be like much lower, maybe even zero. It's like much more deterministic in its response. And if you had that bundled into one and had that temperature turned down really low, um, it could come across really rigid. Um, a-and also if you turn the temperature higher to, to break that rigidness out of it, then it's gonna maybe not follow your instructions quite as closely and can have hallucinations or not do every step that you're asking it to do. So it's like, it's really important to have this like separation, and that's why multi-agent teams and then having them work together is, it's again, it's like a, a key thing in this space that we're seeing.
- Aakash Gupta
You just packed thousands of dollars of lessons into 60 seconds. So just to reemphasize for folks, you can almost think about it as who would be the different people in a company, as we just analogized. Like there's the expert, there's the customer service expert. They have different skills. One is really good at the tech, one is really good at talking to people.
- 30:00 – 34:05
Temperature Settings Explained (Ice Peak)
- Aakash Gupta
You can think about it the same way with agents. And with agents technically, in particular, if you give them too many things to do, they're not gonna perform as well. And so that's why it's important to ha- build a multi-agent system to actually separate these things out and to use this technical term that some people might not be familiar with. In fact, I'm not totally 100% sure I understand it, temperature. Can you just explain that a little bit more?
- Tyler Fisk
Oh, yeah. I have a super fun analogy. So I have young kids, uh, [chuckles] and we watch a lot of Disney and Pixar movies and stuff. So Toy Story, I've seen it a ton. And if you think about, like, the claw machine, like the little aliens that are in there, and it's like coming down, they're like, "The claw." Okay? So imagine that we are-- temperature is like this icy peak inside of a claw machine, and when the temperature is dialed all the way up, 'cause it's, it's, it's different from model to model, but generally it's on a scale of zero to one or zero to two. And when the temperature is dialed, uh, up so that it is like-- Or I'm sorry, when it's dialed down to zero, you can think like it's cold, and this like icy peak is what temperature is like. So when the claw comes down, which is, uh, the LLM coming to try and predict what is the next most probable token that needs to come in the response, it's very deterministic. It's gonna pick off the top of this peak or the pile, basically. And when you turn that temperature up to one, that's when, like, the peak melts down. And so when the claw comes down, it's easier for it to grab from multiple spots off of the peak or off of the pile. And what you're doing when you turn the temperature up or down to like, I'll switch into tech bro here for a minute, is that you're changing the shape of the probability distribution curve. So that means that you're-- You dial it up, it's gonna be much more creative. It has more options for tokens that it can pick from. When you take it down to zero or take it down to freezing, it's much more rigid, and it's only gonna pick off the peak there. So, like, there's different instances when you would wanna have a temperature, like, up or down. And how that looks like too, this is one of the ways it was explained to me, like, early on, is that when temperature equals zero, and you say, "The sky is..." and then you leave it blank and have it complete that sentence, it might say, "The sky is blue. 
The sky is clear." Like something like that. Very, like, predictable, right? When you take it all the way up to one, it might say, "The sky is full of fluffy clouds and rainbows," or something, like, extremely creative in that way because it has, uh, more tokens to go and choose from, essentially. It's like it's-- the randomness is increased there. All right. So like, let's get our agents working here in the background again. So I'm gonna kick up Gigawatt. So all right, Gigawatt, this is excellent. Now we are ready to kick over into prompt engineering mode. Uh, I'm gonna vibe code this a little bit, and for the sake of time, I didn't-- I'll admit, I didn't fully read your PRD, but I, I trust you. So what we're going to do is you're going to draft the V1 version of system instructions in the way that you've been trained. This is gonna be in the XML output. You're going to include at least, but definitely not limited to, these top five level XML sections that are gonna be role, context, instructions, criteria, and examples. But for the examples section, since this is the first version, I don't want you to do traditional shot prompt examples. Instead, I would rather you get a little bit more meta about it. Uh, think about different scenarios that this agent might encounter, and then what are the steps that it might go through to achieve the intended goal? So it's not just a direct, like, here's an input, here's an output type shot prompt example. Be much more descriptive, and let's just give it, let's say, two different scenarios in the example section to begin with. All right. Take a deep breath, proceed in a step-by-step manner, and go get 'em slugger. Let's see what you got.
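The claw-machine analogy maps directly onto how sampling actually works: temperature rescales the model's raw token scores (logits) before they become probabilities. Here's a minimal sketch; the candidate completions of "The sky is..." and their scores are made up for illustration:

```python
import math

def token_probabilities(logits, temperature):
    """Softmax over logits scaled by temperature.
    Low temperature sharpens the 'icy peak'; high temperature melts it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical model scores for completions of "The sky is ..."
logits = {"blue": 5.0, "clear": 4.0, "full of fluffy clouds": 2.0}

cold = token_probabilities(list(logits.values()), temperature=0.1)
hot = token_probabilities(list(logits.values()), temperature=1.5)
print({t: round(p, 3) for t, p in zip(logits, cold)})  # nearly all mass on "blue"
print({t: round(p, 3) for t, p in zip(logits, hot)})   # probability spreads out
```

At temperature 0.1 essentially all the probability sits on the top of the peak, so the claw grabs "blue" every time; at 1.5 the distribution flattens and the more creative completions become plausible picks.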
- 34:05 – 39:47
Ads
- AGAakash Gupta
Today's episode is brought to you by the experimentation platform Kameleoon. Nine out of ten companies that see themselves as industry leaders and expect to grow this year say experimentation is critical to their business. But most companies still fail at it. Why? Because most experiments require too much developer involvement. Kameleoon handles experimentation differently. It enables product and growth teams to create and test prototypes in minutes with prompt-based experimentation. You describe what you want, Kameleoon builds a variation of your webpage, lets you target a cohort of users, choose KPIs, and runs the experiment for you. Prompt-based experimentation makes what used to take days of developer time turn into minutes. Try prompt-based experimentation on your own web apps. Visit kameleoon.com/prompt to join the wait list. That's K-A-M-E-L-E-O-O-N.com/prompt. Today's episode is brought to you by the AI PM Certification on Maven. Run by Miqdad Jaffer, who is a product leader at OpenAI, this is not your typical course. It's eight weeks of live, cohort-based learning with a leader at one of the top companies in tech. As you know by now, the future of PM is AI, and this certificate will give you the learnings plus the hardware to show you are ready for an AI PM role. I myself took the course and recommend it. Put on by the amazing team at Product Faculty, including Mo Ali and Pavel Hern, it's worth it. Former students come from companies like OpenAI, Shopify, Stripe, Google, and Meta. The best part? Your company can probably cover the cost. So if you want to get five hundred and fifty dollars off, use my code, AAKASH550C7. That's A-A-K-A-S-H-5-5-0-C-7, and head to maven.com/product-faculty. That's M-A-V-E-N.com/P-R-O-D-U-C-T-F-A-C-U-L-T-Y.
- TFTyler Fisk
I'm saying a lot of the stuff here in the user prompt, which is like what we just sent through, that is already in the system instructions. But because the system instructions are generally like extremely long, uh, I find that it's like helpful to like, uh, reinform them. Like help remind the agent or remind Gigawatt like what I want it to do on certain key things. Uh, and then because shot prompts are like-- Most of the instructions are telling it how and, uh, or what to do and the instructions around that, but the shot prompts are showing it how to go and do that. And one thing that you'll find that if you include shot prompt examples in there, and they're not really what you want it to be, if it's not a good representation of that, it kind of muddies the whole water. And like the intent-- the outputs you're gonna get are not gonna match your expectations. It can cause it to mess up, basically. So that's why I have, like when I'm first building it, like be much more like meta in that way instead of a direct input and output. And then the, the like ending part here, people-- I say, "Go get 'em slugger," a lot, and it started as a joke, and then it's-- I just like do it all the time. That actually comes from AI research, though. So there's a, a technique called emotion prompting, and this is really interesting. Like I love like research. Uh, so like this was a-- You find that in, uh, sociology and psychology, a lot of the information that we see there has led to some of the breakthroughs in LLMs and techniques that you use. So when you gaslight or give like positive reinforcement or negative reinforcement to a person, they will actually do better. And it turns out that that's true in LLMs too. But we always say like, you know, be positive. These things have pretty good memory now. Um, probably, uh, we're working towards like perfect recall, uh, like in everything eventually. Uh, so like...
And they're gonna have robot bodies soon, so be nice and friendly to your AI. So I'm always going on like the positive side of like, you know, boost them up a little bit, and they will actually do better at the job. It's been proven. So it sounds goofy, but it works.
- AGAakash Gupta
[chuckles]
- TFTyler Fisk
All right, here. So Gigawatt has gone through its sequential thinking, and you can see like this is spinning up. It has a couple of internal thought traces, and then we could even like go in and read what it's doing here. It's like seeing inside its own head. And then it's started to write the system instructions here. All right, cool. So we can see it's got the, uh, let me s-- I'm scrolling through it quick here. It's got the role section, context. There's even subsections within context here. So the, the role is basically like the job description for the agent. Context is all the, the background information and nuances around that of like, where is it working? Who is it working for? Who is it working with? Like all sorts of different kinds of details can go in there. Uh, instructions is exactly what you might think, like the step-by-step instructions you want the agent to follow each time it does its job. Criteria, once we get down here to that, and it wrote in and out of a code block, which is okay. Uh, criteria down here, these are kind of like the guardrails that you're gonna put around it on... These can both like be like dos and do nots. So like, do not use emojis in your output or always have an empathetic, uh, tone when you, you print an output. Like, you can make these up whatever you want these rules to be. And then the example section down here is the shot prompt examples. So when I do this, Gigawatt's pretty good. It's gonna probably have a set, I would say like a, a B set of instructions right now if we were gonna grade it.
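The five top-level sections Tyler names (role, context, instructions, criteria, examples) can be pictured as a skeleton like the one below. Everything inside the tags here is invented placeholder text, not his actual prompt:

```python
# Minimal skeleton of the five top-level XML sections; all content is
# illustrative placeholder text standing in for a real system prompt.
SYSTEM_INSTRUCTIONS = """
<role>You are Core, a product-expert research agent for an Apple help desk.</role>
<context>
  <environment>Works inside a multi-agent support workflow.</environment>
  <teammates>Hands structured JSON output to a downstream email agent.</teammates>
</context>
<instructions>
  1. Search the RAG knowledge base first.
  2. Research the open web if the knowledge base is insufficient.
  3. Synthesize findings and return JSON for the next agent.
</instructions>
<criteria>
  - Do not use emojis in your output.
  - Always cite sources for factual claims.
</criteria>
<examples>
  <scenario>A customer compares two phone models; walk through retrieval,
  verification, synthesis, and handoff step by step.</scenario>
</examples>
"""

required = ["role", "context", "instructions", "criteria", "examples"]
missing = [tag for tag in required if f"<{tag}>" not in SYSTEM_INSTRUCTIONS]
print("missing sections:", missing)  # → missing sections: []
```

The quick check at the end is the kind of sanity test you'd run before pasting the instructions into a platform like Cassidy.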
- 39:47 – 51:26
System Instructions & Meta-Prompting
- TFTyler Fisk
And like that's okay, but I have like figured out that by using this process called meta-prompting, which this is like a prompt I have stored right here, is that it basically the-- you're having Gigawatt review its own work, go section by section, give it a quantitative score and qualitative reasoning around that, and give, if they exist, give suggested areas of improvement. And then another thing I have it do in this as well is I give it these ideas of different techniques and frameworks that I use, such as like meta-prompting, step back prompting, AI agent self-review, chain of thought, chain of density, like just a few different things to give Gigawatt inspiration. And because it's connected to Perplexity and Exa, it can go out on the internet and think about, "Hey, what kind of agent am I building? What are the latest breakthroughs in AI?" I can go and research that. What of that makes sense to incorporate into this agent? And it will come back here in a minute with those like suggestions. So you can see it's done like multiple Perplexity searches here. Uh, it might decide to go use Exa. It might think about what it found in a search before it ever like responds back to us. But this process, it basically is like we've got this set of B instructions. After the evaluation, um, we're gonna have it do it again, and we generally go to like an A, A- territory, and that's generally good enough for us to start, let's go take it and start testing it and doing what we call observational evals on it. Like go see if this thing's any good and working like we hope it will. All right.
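The meta-prompting loop can be sketched roughly as below. The section scores and suggestions are stand-ins for what a real self-review would return, but the grade-then-iterate logic is the shape of the process:

```python
# Sketch of the meta-prompting loop: the builder model scores its own draft
# section by section, and we iterate until the grade clears the bar.
# Scores and suggestions here are invented stand-ins for a real self-review.
review = {
    "role": {"score": 82, "suggestion": "Name the downstream agent explicitly."},
    "context": {"score": 78, "suggestion": "Add brand voice notes."},
    "instructions": {"score": 85, "suggestion": None},
    "criteria": {"score": 74, "suggestion": "Add a no-speculation guardrail."},
    "examples": {"score": 70, "suggestion": "Make the scenarios more meta."},
}

PASSING_GRADE = 90  # roughly "A / A- territory"

overall = sum(s["score"] for s in review.values()) / len(review)
todo = [f"{name}: {s['suggestion']}"
        for name, s in review.items() if s["suggestion"]]

print(f"overall: {overall:.0f}")  # → overall: 78
print("iterate again" if overall < PASSING_GRADE else "good enough to test")
for item in todo:
    print("-", item)
```

Each "iterate again" pass feeds the suggestions back as the next improvement prompt, which is why the instructions balloon in size over a lap or two.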
- AGAakash Gupta
Love this. And you mentioned the sh- phrase shot prompting. If people don't know that, I believe that's when you're giving various examples within the prompt itself, correct?
- TFTyler Fisk
That's right. Yeah.
- AGAakash Gupta
Yep.
- TFTyler Fisk
Yeah, that's, that's exactly it. So this could be, um... Let's come up here and look and see what even Gigawatt wrote. So here's one of the examples it came up with. The scenario is product information research with evolving information. I'm trying to decide between the iPhone Pro and the iPhone 17 Air, uh, 'cause we all need a thinner iPhone. [chuckles]
- AGAakash Gupta
[chuckles]
- TFTyler Fisk
So like what's the main differences in which one, uh, would be better for photography? I heard there might be some new camera features, but I'm not sure, uh, what's current. I need-
- AGAakash Gupta
I actually have this question, so I love how like-
- TFTyler Fisk
Yeah
- AGAakash Gupta
... it was able to pick something that people would really ask.
- TFTyler Fisk
Yeah. Like really ask, and then like it's current too, 'cause it went out in real-time to find like what's the latest, you know, kind of stuff here.
- AGAakash Gupta
Yeah.
- TFTyler Fisk
And so like the example is showing it what should Core go do. It's saying, like do the initial analysis, go research the RAG database, think about your own system knowledge, uh, go out on the internet and research and use chain of verification for that, like synthesize and generate your output, and then give that to the next agent. So it's like walking it through, like we've told you what you should do in the, in the system instructions, but this is showing you, like how do you actually go walk that out in a practical way? Um, yeah. Cool. So Gigawatt's still thinking here. While it's doing that, let's go check on Deep Research. Sweet, so this one is done. Uh, I'm going to grab this and export this just as a document really quickly. Cool. There's one. Claude is like, it's one of my favorite deep research tools because it looks at so many sources. It will literally go look at thousands of websites-
- SPSpeaker
Wow
- TFTyler Fisk
... uh, to do, to do this work. Uh, which is, like, I would-- we would never have time to go do that, especially in this short a time. But for some reason, it's, like, goofing up right here right now. I think it's because of it's trying to call additional, like, tools I have connected to it, like Canva and stuff, and I don't want it to do that right now. So let's see. Give it one more shot here. Um-
- SPSpeaker
I haven't actually seen these connectors, and I'm a pretty heavy Claude user. What are those?
- TFTyler Fisk
Um, it's basically like they've made it easier to kinda connect in MCPs. So if you come over here to... I think you have to do it on Claude Desktop initially. I could totally be wrong on this. Um, but if you come over here and you go to your settings, I think is where you're doing it at initially. Um, right here. Connectors is, like, where you can come in and either set up your own custom ones that are, you know, basically like MCP tools, or they've made it easier with, uh, some, like, out-of-the-box ones. Um-
- SPSpeaker
That's awesome.
- TFTyler Fisk
Yeah. I think it's just like... And, and an MCP is, like, for people that don't know, is Model Context Protocol. That is, um, before this came out, it's like a framework that they've, they've used, uh, when people would need to go and use a tool. Like, if you wanted to connect an agent to Gmail, let's say, you'd have to do all these different API calls, and that's, like, very scary. Most people aren't gonna do that. But because we know that context, like bringing relevant information back to an agent, is so powerful, having these tools is part of what makes an agent an agent. And so Cla- Anthropic, who makes Claude, came up with MCPs, and they've really simplified that process and maybe even democratized it is a better word, uh, so that more people can connect more tools to their agents, and it just makes them so much more useful. Let's see.
- SPSpeaker
Got it.
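For reference, custom connectors on Claude Desktop boil down to an MCP server entry in its config file. This sketch shows the general shape of that config; the server label and package name are hypothetical:

```python
import json

# Rough shape of Claude Desktop's claude_desktop_config.json for wiring up
# a custom MCP server. The label "gmail-tools" and the package name are
# hypothetical; the "mcpServers" key is the documented entry point.
config = {
    "mcpServers": {
        "gmail-tools": {                                  # any label you choose
            "command": "npx",                             # how to launch the server
            "args": ["-y", "@example/gmail-mcp-server"],  # hypothetical package
        }
    }
}

print(json.dumps(config, indent=2))
```

Once the server is registered this way, its tools show up to the model the same as the out-of-the-box connectors, which is the "no scary API calls" simplification Tyler is describing.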
- TFTyler Fisk
Yeah. All right, so it's finally kicked off its research. We'll let that one finish. Perplexity did its other research. So it looked-- this one looked at 119 different websites and sources to pull this info in. I'm gonna export that as a doc as well. And again, I'm like, we're having to scatter around here. Uh, this is the import over here in Cassidy, and it pulled in. It had a few pages that errored out. Um, let me click that. We'll let it retry them just in case while we have time. And now if we come back out here to my main file, like here's the website. It's pulled in this many pages already. Uh, and if we go add, I'm just gonna make a new folder to chuck all this stuff into one spot. Apple Knowledge. Save it. Let's pull this into here. And then now I can also add those, uh, deep research documents that we just did with Perplexity to throw them into this RAG database. Or, like, you could think of this as an AI second brain system, essentially. Uh, so now all of that's gonna live in its head. So that's uploading. Here we go. So now back over here in typing mode, and I'm like, I'm jumping around, so I'm trying to make sure we, like, make this easy for folks to follow along, is Gigawatt, uh, just finished up its self-review, where it was evaluating its work and saying what it did well and where it could do better.
- SPSpeaker
Mm-hmm.
- TFTyler Fisk
And so there's section-by-section analysis here. Um, you can see, like it's, it's pretty-- We don't want it to be biased or have an ego towards its own work, and it even kind of says that in the prompt that we gave it, saying that our whole goal is that we make the best possible agent for this specific role. And, uh, it's giving itself a seventy-eight out of a hundred. It's giving its, like, reasoning as to why it gave it that grade and then how it could improve it. Eighty-two here. So it's like, as you can see, like we're, we're doing okay, but we're like C territory even right now. So we need to get that grade up for sure. We're A students around here.
- SPSpeaker
[laughs]
- TFTyler Fisk
And, uh, let's see. So overall we got a seventy-seven out of a hundred, so we, we've got some big gaps that we need to fill. Um, and it's giving basically itself its own instructions on how we could do this better. And so now what I'm gonna do is I have another prompt that is, uh, let's see here, write the improved version. Let's adjust this. So here we go. This is where-
- SPSpeaker
I love this use of saved prompts. This is huge.
- TFTyler Fisk
Oh. Speeds it up, yeah. Like, it takes a while to get them, like, dialed in, you know, like the way that you want them. But once you have them done, like pulling them in from a prompt library is, is key. Uh, let's see. So today is... What is today? The twenty-fourth of September. So I'm putting this in because now it's gonna write in some metadata into, like, the head tag area of the system instructions and even incorporate a change log in there. So let's go ahead and do this. It's basically telling it, "Write the complete new updated set of system instructions." This should be additive and not reductive because if you just let it go out and do its thing, then it might, like, wholesale rewrite the system instructions. Uh, instead it's like you've already got a good foundation. Like use what you, uh, recognize about what was working and what wasn't, and go improve upon that without, like, just completely changing everything. And that's what we're seeing here. All right. And it's also telling it too, because this happens, uh, or can happen, is that, um, if you happen to cut off before you complete writing this set of system instructions, like know that I'll have you like pick up, uh, in a follow-up message. And I tell it that ahead of time because if you don't, then the model, um, can kind of try and like write to optimize for its, its context output window to an extent. Like it'll try and make it more concise, and I don't want that. Like I want it to be, um, extremely descriptive in here, as you can see. Uh, like these generally, like once you do this next lap or this next version, the, the system instructions kind of balloon up in size, and it's not to write them longer for the sake of writing them longer. It's because this is like very key information that we want to make sure that this agent gets like every time it goes to do a task, and that's why we're, we're like taking extra care in the system instructions. So, all right. Cool.
So while this is happening, again, I'm gonna come over here and let's do a, um, get this ready to put this agent into production. So I'm gonna call this like Core, um, we'll call it like the Podcast Edition.
- AGAakash Gupta
And we're back in Cassidy.
- TFTyler Fisk
Yeah, back in Cassidy. Yes, thank you. Um, do you have a preference? Like I'll let you choose. Like it, it honestly, it doesn't matter. I would, I would suggest like probably choosing one of the Ferrari models, but what do you want to run this on? Do you want to do it like GPT-5 or we have any of the Claude models or even Gem-
- AGAakash Gupta
Maybe just tell us a little bit more about how you would think through the choice and model choice. I feel like everyone's always asking me about that [chuckles] .
- 51:26 – 1:05:50
Testing Core & Echo Agents Together
- TFTyler Fisk
pretty much most tasks that most people are gonna have them do. So the good news is, is that like, like we're rich with intelligence now. Um-
- AGAakash Gupta
Yeah
- TFTyler Fisk
... so like picking GPT-5 versus like Sonnet or, or, um, or Opus or even Gemini Pro, like the new Pro, um, uh, I mean, you could argue that you could really switch them out, and that's even what we teach is that you wanna be somewhat model agnostic because when you get into production, uh, you know, if, if Anthropic has a new release and their API is running a bit glitchy, uh, but you have agents in production, like glitchiness is not acceptable. We need to like be able to like pull the plug on it running on, uh, an Anthropic model and plug it into, uh, GPT-5 or whatever it is, so that we're not losing anything. There's redundancy there. Um, but it could be things like speed. Speed is a big deal. Uh, if you need reasoning, uh, like for a model to like think through something using one of the reasoning models is a big deal. If latency matters, using something like a Mini, a Nano, a Haiku, those are, those are faster and cheaper, but not as capable necessarily. Um, yeah. And, and then like context window is the other thing. This is like the amount of, of short-term memory that a agent has, like how much space can it hold in its head. And, uh, a million and two million is like Google was kind of the first one to go into that territory. Anthropic's at a million now with Claude Sonnet. Um, GPT-5 is still at like 400K. I suspect they're gonna be at a million or maybe further than that like soon. Um, so let's just go, let's just for now, let's go GPT-5, and we'll go high thinking. Let's just make it, uh, really good here. I've got the tokens. So this is the context window. I'm turning this all the way up so it can have the most information possible. I'm gonna go ahead and toggle this on. This basically is, um, turning on the RAG knowledge base. So it's the data that we've been populating in here. And I'm gonna s- I'm gonna select this folder. So like everything we chuck in that folder, it will have access to. 
And because I'm saying always search, it's like forcing the agent to go look into that folder every time to see if there's anything that will help it. Maybe it does, maybe it doesn't, but we're making it go look there. Um, and then let me save this. And the last tool or two tools I'm gonna turn on here are gonna be data analysis and web search. Um, data analysis allows it to use like Python. It's like code interpreter basically, and web search is allowing it to go out on the internet and research things. So I'm gonna give it those tools as well. And let's go see. I think Gigawatt finished over here. I heard the Easy-Bake oven timer go off. Um, let's see.
- AGAakash Gupta
[chuckles]
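The "model agnostic" redundancy idea from a moment ago can be sketched as a simple fallback chain: try providers in order and pull the plug on any that are glitchy. `call_model` here is a hypothetical stand-in for real vendor clients, and the model names are illustrative:

```python
# Sketch of model-agnostic redundancy: try providers in order, fall back on
# failure. call_model is a hypothetical stand-in for real vendor API clients.
FALLBACK_CHAIN = [
    {"provider": "anthropic", "model": "claude-sonnet"},
    {"provider": "openai", "model": "gpt-5"},
    {"provider": "google", "model": "gemini-pro"},
]

def call_model(provider, model, prompt):
    # Stand-in: pretend the first provider's API is running glitchy today.
    if provider == "anthropic":
        raise ConnectionError("API glitchy")
    return f"[{model}] answer to: {prompt}"

def run_with_fallback(prompt):
    last_error = None
    for choice in FALLBACK_CHAIN:
        try:
            return call_model(choice["provider"], choice["model"], prompt)
        except ConnectionError as e:
            last_error = e  # pull the plug on this model, try the next one
    raise RuntimeError("all providers failed") from last_error

print(run_with_fallback("Summarize the iPhone lineup"))
# → [gpt-5] answer to: Summarize the iPhone lineup
```

In production you'd also weigh the factors Tyler lists per slot in the chain: reasoning ability, latency (Mini/Nano/Haiku tiers), and context window size.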
- TFTyler Fisk
Yeah. There we go. All right. Yeah. So it finished the whole thing. So I'm gonna grab this whole set of instructions, go back into Cassidy now to actually put it into production. And let me go to the very bottom because it had some notation here at the bottom that aren't a part of, isn't a part of the instructions, so I'm just gonna delete that out.
- AGAakash Gupta
Yeah.
- TFTyler Fisk
And I think that is-
- AGAakash Gupta
We have that clean XML instruction.
- TFTyler Fisk
Yep, you got it. So it's got the tags in there exactly like we want. It, it's using it per the way that I've like trained it to do this. Uh, it's got the new and, and upgraded, enhanced version of the system instructions in here, and we're at 7,000 tokens right now. Actually less than that because let me delete this off, the other like commentary here at the beginning. So approximately like 7,000 tokens for just the system instructions. Not bad. Um, uh, let's see. So now we can go ahead and open this one in chat, and let's test it out. So like, what was-- You said, what was the question you wanted to ask it? In fact, I'll just, I'll hold down my key and you just talk, and it'll transcribe it into it.
- AGAakash Gupta
Um, should I get a iPad or iPhone Air or iPhone Pro? I'm trying to weigh battery and photo quality.
- TFTyler Fisk
Love it. The battery is the thing I want to know about. I watched their, like, keynote address or whatever, and I'm like, I don't know. They were pushing that, that add-on battery pretty hard right out of the gate. So we'll see. It looks cool, though.
- AGAakash Gupta
Exactly. It looks so cool, but I'm worried that it won't have a good battery life. So let's see if the agent can help us address that.
- TFTyler Fisk
Yeah, for sure. So it's gonna go through here and look at this. And while this is happening, we're gonna go do one more thing here. I'm gonna go grab, um... Let me come over here. Where is it at? I think I jumped off of it. Here it is, this one. I'm grabbing example system instructions of another You've Got Mail agent, and this is our customer service agent. So we built the expert. Now we're gonna really quickly build, um, an email agent. And examples are everything. So when I can come back over here into TypingMind in that exact same thread and, uh, tell it that, like, here are some system instructions for a completely different email agent, um, to use as, like, inspiration and what good could, could look like, knowing how we're gonna be using it here, uh, draft the V1 version of the Apple email agent and come up with a fun name for it kind of thing. So, all right, Gigawatt, this is awesome. We have the Core agent deployed in Cassidy, and we're testing that out right now. I've now given you an attached, uh, copy of system instructions for a different You've Got Mail agent, which is our customer service email agent. But it, this one is made for Hattie B's. Uh, it's a hot chicken joint, uh, in Nashville. So look at this for inspiration, uh, on what a good customer service email agent looks like. Think on this deeply, and think about what would you need to change and update and embody so that it sounds on point and matching the culture, the vibe, the brand tone and voice for Apple. Uh, you're also gonna need to come up with a fun name to go along with the Core agent for this You've Got Mail agent for Apple. Uh, it should be like a one or two name that feels very on brand and authentic to what it's gonna be doing here. So don't do any prompt engineering yet. Review all this, reflect on it, and then come back to me and let me know where you land on it. Uh, this is a brainstorming and alignment phase again, just with this new agent. 
Okay, so we're gonna let Gigawatt again work in the background, and let's pop back over here to, uh, let's see here, our iPhone question. Awesome. Well, this is not gonna be very pretty for us to look at, so I'll have it, like, change it here in a minute 'cause it returned it in JSON, like what I mentioned earlier.
- AGAakash Gupta
Which is what we wanted.
- TFTyler Fisk
Yeah, it's exactly what we wanted. That's what we want it to do inter-agent communication. It's good. Um, but now I'm just gonna be like, "Hey, Core, uh, we're running evals on you right now, and you did perfect. You output your work into a JSON, but we're reviewing this. Can you convert this into markdown so it's much prettier for us to come in, read it, and review it and see how it looks?" Okay. So we're just gonna let it go through and, like, recreate this for us in a way that's easier to look at. And down here we can see, this is like what's powerful about using a, a RAG system or even having, uh, like even if it researches on the internet, it cites its sources. So if we ever wanted to come back in here when we're doing evals, uh, to see, well, where the heck did it even get this information or arrive at this outcome, we can fact check it in this way by like, uh, it, it's, uh, citing this website that it found or what came out of the knowledge base that it went and read that in documentation internally, like whatever it might be. All right, there was the oven again, so let's come back over here into TypingMind again. Sage, I don't like Sage. This is like a generic... Yeah, mm-mm, we can do better than that.
- AGAakash Gupta
[chuckles]
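The JSON-for-agents, markdown-for-humans pattern Tyler just demonstrated looks roughly like this; the field names are invented for illustration:

```python
# Sketch of the inter-agent handoff format: agents exchange structured JSON,
# and for human review (observational evals) we render it as markdown.
# Field names are invented for illustration.
agent_output = {
    "recommendation": "iPhone 17 Pro",
    "reasoning": "Best camera system; battery is adequate for most users.",
    "sources": ["apple.com/iphone-17-pro", "internal RAG: Apple Knowledge"],
}

def to_markdown(output):
    lines = [f"## Recommendation: {output['recommendation']}", "",
             output["reasoning"], "", "**Sources:**"]
    lines += [f"- {s}" for s in output["sources"]]
    return "\n".join(lines)

print(to_markdown(agent_output))
```

Keeping the citations in the structured output is what makes the later fact-checking step possible: you can trace every claim back to a website or a knowledge-base document.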
- TFTyler Fisk
Uh, all right. So let's see. Hattie B's agent is good. Masterclass. Yep, cool. I think Gigawatt's like gaslighting me here.
- AGAakash Gupta
[laughs]
- TFTyler Fisk
Um, let's see. Sage, one word. Okay, so clean Apple aesthetic, like a Genius Bar expert, natural organic, loves nature metaphors. I don't know about that, but sophisticated and approachable. Okay. So like we're just gonna kinda, again, for the sake of time, rock with this to some extent because you can see, uh, Hattie B's, it has, um, sophisticated simpli-simplicity versus playful ir-irreverence. Hattie B's brand voice, they're Southern, so they say like, "Y'all," and use like these really kind of funny chicken puns.
- AGAakash Gupta
[chuckles]
- TFTyler Fisk
And instead, like Apple's gonna be, you know, we need to take that idea and translate it for them.
- AGAakash Gupta
Yep.
- TFTyler Fisk
If we were doing this for real, we have a whole process to do brand voice and communication analysis, and we even have like workflows for that. Um, but we're just gonna do this quickly now, um, and just roll with it. So, all right, Gigawatt, this looks awesome. Now that we have an idea, I think first of all, Sage, no, let's not do that. That's not a good name. Think more Apple, Apple branding, the way that they do it, something that feels on point for an email agent in that space. Again, it's a one-word name. Like look at Core as an example. Core is an excellent name for being on brand for Apple. Think of that for this set and setting as well. With that in mind, use sequential thinking, use your EXA tool to research anything else that you feel that you need to, whether it's like brand guides, brand voice, styles out on the internet, and then come back and draft the V1 set of system instructions. Again, remember to print them in XML. Use the role, context, instructions, criteria, and examples. Do the same sort of shot prompt examples that I had you do with the previous expert agent, where it's much more meta. Let's do two different scenarios there, so it's not the traditional here's an input, here's an output. It's like walking through what it should be doing and showing it that through that scenario. And with all of that in mind, that's a laundry list of tasks, work through them one by one, take a deep breath, and show us why you're a savant at prompt engineering. Go get 'em slugger. All right. Let's see what we get here. So that one's gonna be working. Now let's go back to Cassidy. And like language matters too. I use that phrase savant, I use that frequently when I'm prompt engineering because, you know, they're large language models, and the words that you choose have power and meaning and shape, um, the quality of the agent build that you're doing. So you wanna be thoughtful when you're doing that as well. Uh, so let's come up here.
So it's saying decision guide, iPhone Air versus iPhone 17. Here's the summary. If photography is your top priority, iPhone 17 Pro is it because it's got that new-- it's got the ProFusion plus the ultra wide telephoto and the optical zoom, like built in. It's got 38 hour or 39 hours of video playback. Thinnest phone, uh, with a strong main camera. 48-megapixel main camera, no ultra wide telephoto, up to 27 hours. So it's like retrieved all this information, put it into a pretty good breakdown here. I don't know. Is this helping your decision-making process yet? We'll see what-
- AGAakash Gupta
Yeah, definitely. I mean, I'm seeing, okay, 27 hours video play. Maybe the iPad or the iPhone Air is good enough for my battery needs. Similar megapixels. I already own three fancy cameras for the other stuff, so yeah, this is awesome.
- TFTyler Fisk
Yeah. Sweet. Okay. So now we're-- We can see all this. Like it's looking good and, and ultimately like what we really wanted to see was, uh, this JSON response up here. So I'm gonna go ahead and go ahead and queue up some messages. So let's go eval and improve. I'm gonna queue that, and then I'm gonna go write the improved version. Uh, let's see here. Let me take this off so I can have these messages like pending, so it can continue working more in the background for us here.
- AGAakash Gupta
Oh, wow. I've never done that. That's super cool.
- TFTyler Fisk
Yeah. It's like, it's helpful for speed for sure. Um, 9:24:25. Let's queue that up again. Cool. So those are just like waiting in the wings and...
- AGAakash Gupta
Wow, that would save me so much time just queuing things. That's great.
- TFTyler Fisk
Yeah. It, it's little, little wins in this stuff that you just kinda figure out. Um, here we go. So like here is that, that Claude research finally finished. So it did 394 sources, which is actually kinda low for it. It normally is going up above a thousand in general now, but it took 10 minutes to finish that research. Deep research is such a powerful tool that I think more people should, um, know about and, and use because, yeah, it's just, it's such a good way to gather info that is valuable, not only like in the build, but to curate info for your knowledge base, like for your second brain, either for yourself or for your agents quickly.
- AGAakash Gupta
Yeah.
- 1:05:50 – 1:24:11
RAG Databases & Enterprise Documents
- AGAakash Gupta
of enterprise documents. How would you build that in this case?
- TFTyler Fisk
Yeah. That's a whole, a whole like it depends kind of decision tree. Um, because like it-- you can use something like unstructured.io as like a platform that can go through and what's called like vectorize and, and turn the documents into embeddings and like chunks that are stored in a vector database. Um, and especially with things like PDFs or, or documents that might have things in it that are also not text, that might be visual, this is where like for that specifically, something like Gemini Flash is really good at that. It's, it's extremely inexpensive to basically do OCR now and not only extract the text that's on the page and put it into a format that's better to then go and like turn it into the, the vector store, but it can look at the visuals and you can, you can have a flow set up where it actually extracts and describes that and turns it into, um, the vector store data as well, so it's actually searchable. Um, the thing though, like RAG is extremely powerful and potent, but there's also limitations to it. Uh, like things can change and update. Like a really easy example is that in our, our classes, we, uh, like when we first started teaching our early cohorts, they were on certain days of the week and certain times of the day, like all that kind of stuff. And we have an agent that lives in Slack in our community called the Professor. And so students can go and ask it a question and say like, "Hey, um, I don't, I don't have my calendar in front of me, like when's the next foundations class? I'm in cohort 12," like whatever it is. And it would go and look into the RAG system. But because we're, like, adding all of this information, and we're not gonna take the time to go in and, like, clean out all this historical data because that would take forever, and it's just-- it's not a job you wanna go do.
Um, it mi- the professor might not bring back the best information because the more you add into a RAG system, it's powerful, but it can also kinda degrade, uh, the retrieval of it, like the quality of the retrieval because, um, there's just so much information for it to be looking through. And, and so the, the way that we're thinking about trying to solve that more is not only the structure that you put into your RAG and having like, w- uh, we've been working on this concept called the, the Cairns method. It's like a three kinda tier system in there. And even more so now, it's, it's agentic RAG. So it's not just connecting it to, um, a database like this or a RAG database. It's connecting it to GraphRAG, which holds much more, like, relational information in there, and it updates it as well. So that could be like GraphRAG, Graphiti, things like that. Uh, so it, it would go in and keep up to date in that other data store that, hey, um, cohort 12 is now on Mondays and Fridays from this time to this time. Uh, and so the agent's connected to both of those data stores. And also even like MCPs are another tool that you can connect for this. So it gives it more information and, like, ways of connecting the dots in that information so that it, it gets a better quality response outside of just RAG itself. So yeah.
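The retrieval-degradation problem Tyler describes shows up even in a toy sketch. The "embedding" below is just a bag-of-words counter, not a real embedding model or vector database, but it demonstrates how a stale, near-duplicate chunk can outrank the current one once old data piles up in the store:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": bag-of-words counts. A real system would call an
    # embedding model via an API and store dense vectors in a vector DB.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Standard cosine similarity over sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# A tiny knowledge base: a stale schedule chunk sits next to the current one.
chunks = [
    "cohort 12 foundations class meets tuesdays and thursdays",   # stale
    "cohort 12 foundations class now meets mondays and fridays",  # current
]

query = "when does the cohort 12 foundations class meet"
ranked = sorted(chunks, key=lambda c: cosine(embed(query), embed(c)), reverse=True)
print(ranked[0])  # the stale chunk wins here: it is shorter, so overlap weighs more
```

With these toy vectors the outdated chunk scores higher than the updated one, which is exactly why Tyler pairs plain RAG with an updatable relational store instead of letting historical chunks accumulate unpruned.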
- AGAakash Gupta
Amazing. So much value in that one response. Awesome.
- TFTyler Fisk
[chuckles] Yeah. No worries. All right. Gigawatt's still working. Um, even if we tell it to type like the wind, it'll try. So what's happened, like coming back over here into, um, typing mode here, it has, uh, thought through what I had it go out and write the V1 instructions. It used EXA, which is like an agentic web search tool.
- AGAakash Gupta
Mm-hmm.
- TFTyler Fisk
Uh, so it did research more on brand voice guidelines and customer service communication examples, like this kinda stuff. So it's going and finding like real-world few-shot prompt examples, bringing it back, writing the V1 version. They're probably-- It's probably gonna be okay, but not great. And then if we... Let's see. It's started its next process now of it's like reviewing this. And let's see what we got here. It's got an eighty-two, eighty-five, so we're a little bit better. We're not scoring in the seventies here. We're up in the eighties at least. Well, there's a couple seventies. We can do better.
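The loop Gigawatt is running here -- write a V1, grade it against a rubric, revise until the scores climb out of the seventies -- can be sketched as a plain loop. The writer and grader below are deterministic stand-ins; in a real build each would be an LLM call:

```python
# Minimal sketch of a draft -> grade -> revise loop. `write_draft` and
# `grade` are stand-ins for model calls, purely to show the control flow.

def write_draft(instructions: str, feedback: list[str]) -> str:
    # Stand-in: a real system would prompt a model with the prior draft
    # plus the grader's feedback items.
    return instructions + " | revised for: " + ", ".join(feedback) if feedback else instructions

def grade(draft: str) -> dict[str, int]:
    # Stand-in rubric: scores rise with each revision pass baked into the draft.
    base = 70 + 10 * draft.count("revised for")
    return {"clarity": base, "tone": base + 5}

def refine(instructions: str, threshold: int = 85, max_rounds: int = 3):
    draft, feedback = instructions, []
    for _ in range(max_rounds):
        draft = write_draft(instructions, feedback)
        scores = grade(draft)
        if min(scores.values()) >= threshold:
            break  # every rubric dimension cleared the bar
        feedback = [k for k, v in scores.items() if v < threshold]
        instructions = draft
    return draft, scores

final, scores = refine("You are a customer service email agent.")
print(scores)
```

The point is the shape, not the stub: the grader's per-dimension scores become the feedback for the next draft, which is the step most people skip.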
- AGAakash Gupta
[chuckles]
- TFTyler Fisk
And coming down here at the very end, let's see. And it's also doing that research into other areas like frameworks or techniques that it could be using for this specific agent. Um, one of them is like-
- AGAakash Gupta
One huge takeaway for me from watching you build these is how much care you're putting into the system prompt. You know, you had an amazing system prompt to begin with. You had an amazing agent creating it, but you're having that agent review its own work and iterate. And most people probably skip that step.
- TFTyler Fisk
Yeah, I would say so. Um, like Sara and I dub it as like the founder's level of service. A-and for me, what that means is, you know, when you first start a, a business and, you know, like you're j- it's just you or like you and a co-founder, whatever that is, you're doing everything, and you're putting your heart and soul into it because it's like it's your baby, right? So it's that level of care that we try and put into pretty much everything that we do. And even when we're doing this for clients, like outside of class, [lip smack] uh, our whole discovery process, which that's a whole 'nother topic we could talk about, is very intensive so that we can temporarily become experts in their business and in their culture and feel like we're, like, plugged into their team so that we can then come-- take that knowledge and come over here and work with Gigawatt to do this for them, like in real time. And I think that's the skill set more people need to learn. That's, uh, it's, it's incredibly important. That's how I think you build a system that works versus one that is uninformed and not aligned with, like, what your, your intentions or your client's intentions are. And this-- like, all of these little steps matter, so.
- AGAakash Gupta
Mm-hmm.
- TFTyler Fisk
[lip smack] Yeah. Our community has dubbed this weaponized OCD as well. [chuckles] So I find... Yeah. Let's see here. So we've got, uh, it went with Echo. Okay. Echo's not terrible. I'm not gonna, like, razz Gigawatt on its naming convention. It could do better, but... So we've got Echo as the-- what it's dubbed the customer service email. You can see it's got version information in here, a quick change log, um, and then it's printed out the whole new version of system instructions. Let's just skip to the end and see if it's... Nope, still writing. Um, so while it's doing this, let me come up here, and I'm gonna grab just like, let's go Echo V2. Copy. We're gonna pop back over here into, uh, let's just go here. So let's go assistants, create a new one. We'll go ahead and get this ready for it to-
- AGAakash Gupta
Yep
- TFTyler Fisk
... forward.
- AGAakash Gupta
The placeholder for a new agent.
- TFTyler Fisk
You got it. Yep. So this is gonna be, um, Apple Customer Service Demo. And for this one, let's see. We did GPT-5 high thinking, so let's be multi-model here. Let's either go Anthropic or Gemini. Like, honestly, I could go either way. Do you, do you have a preference you wanna see happen here?
- AGAakash Gupta
We've been doing a lot of Anthropic, so let's try Gemini. Let's give them some shine.
- TFTyler Fisk
Let's do it. I love it. So here's Gemini two point five. Yeah, I, I friggin'-- All these models are so good. Like, we are blessed with choices now. Um... The Flash model and the Pro model are so good. Flash is kind of like this weird, um, kind of in-between model. It's not quite like the front runner, and it's not as, I don't wanna say like low quality 'cause it makes it sound like it's bad, but not as, um, the tier that Mini and like Haiku are at. It's like kinda in between those two, in my opinion. Um, okay, so here we go.
- AGAakash Gupta
And if you're productionizing, it's like crazy cheap.
- TFTyler Fisk
Yeah. Oh my God. People should go and look into how much it costs to p- like, process documents like what I said. Like, basically do what you would pay for OCR, but now you can use Flash for that even. Holy smokes. It's e- it's so much cheaper. It's like literally... I've done the math on it. It was like hundreds or thousands of pages for, uh, we're talking about like a dollar or something. It's crazy cheap.
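The "thousands of pages for about a dollar" claim checks out on back-of-the-envelope arithmetic. Both numbers below are assumptions for illustration -- a rough tokens-per-page figure and an assumed Flash-class input price -- not quoted pricing, and they ignore output tokens:

```python
# Back-of-the-envelope document-processing cost. ASSUMED figures only;
# check current model pricing before relying on these numbers.

PRICE_PER_M_INPUT_TOKENS = 0.10   # assumed USD per 1M input tokens
TOKENS_PER_PAGE = 800             # rough estimate for a dense text page

def pages_per_dollar(budget_usd: float = 1.0) -> int:
    cost_per_page = TOKENS_PER_PAGE / 1_000_000 * PRICE_PER_M_INPUT_TOKENS
    return int(budget_usd / cost_per_page)

print(pages_per_dollar())
```

At these assumed rates a dollar of input tokens covers on the order of ten thousand pages, which is why Flash-style models now undercut traditional per-page OCR services so dramatically.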
- AGAakash Gupta
Crazy.
- TFTyler Fisk
Yeah. Let's see here. So save this. Let's give it tools here. What is not saving? Why is it not saving? Oh, it cleared the instructions. Okay. Um, I'm just gonna put something in here.
- AGAakash Gupta
[laughs]
- TFTyler Fisk
Sorry, it's a little-
- AGAakash Gupta
We'll update that in a second.
- TFTyler Fisk
Yeah. We'll, we'll give it its brain here in a minute. So I'm gonna connect those here as well. Save it. And Gigawatt, you done yet? Ah, it cut off. All right, here we go. So here we, um... All right. You, uh, why am I typing? Uh, you were cut off. Please pick up, uh, where you stopped and continue on to complete the system instructions. Thank you. It's pretty close, so it's in the criteria section. This is what I was saying, like a lot of the time, um, y- the system instructions can kind of bloat up a little bit, and that's not-
- AGAakash Gupta
Mm-hmm
- TFTyler Fisk
... that's not a bad thing. Um, depending on, like, we'll, we'll know for sure when we test it, right? But I find that, um, the more detail and richness that you put into this process, the better the outcome that you're gonna get.
- AGAakash Gupta
Mm-hmm.
- TFTyler Fisk
So-
- 1:24:11 – 1:35:10
Production Workflow with Human-in-Loop
- AGAakash Gupta
to go on that path?
- TFTyler Fisk
Yeah. Um, well, there's like several steps, but the, like a no-code tool like Cassidy makes it pretty simple for that. Um, an example of this, and there's, there's a lot of this comes in like taste as well of what would the Apple team, how would they wanna interact with this? The first thing I would say is we would never put it into production without some sort of a human-in-the-loop checkpoint. That's, that's very irresponsible. And in pr- like actually in live production, you have no idea the types of inputs that you might get in a system. Uh, like someone might say something, um, that Apple doesn't wanna respond to or, uh, there, there could be a, a myriad of different things. And so it's very important to have an intentional human-in-the-loop checkpoint put in place somewhere. And you can begin to, uh, once you have it in production, add more autonomy into the system where you, you then like know for these certain use cases, we can kind of auto-send it. Uh, and if not, it still needs a human to review it before we send it on its way, but it speeds it up and can do them in batches. So here, this is a workflow in Cassidy that is also an email workflow, and what's happening is we set this up to work with Slack as well, 'cause this was like our intended-- That's where we wanted to work with our email agent at. And so what happens is we would connect this upstream with something like, uh, like you can even do it now natively in Cassidy too, but before you couldn't. So we would use something like Zapier or Make or something that when a new email comes in, and if you're like ins- for instance, using Google and it's tagged a certain way. Then it would trigger this workflow to start running, and we would get their, their message, their name, the date of the email, the subject, and the Gmail thread ID, which is important to be able to route it back to the correct thread. And then this first block here is an agent called Cinnamon. Um, it's a sentiment analysis agent.
[lip smack] The funny story on that name is that, uh, I'm a "Big Bang Theory," uh, like, TV show fan, and Raj's dog on that show's name is Cinnamon. And every time he talks to it, he's like, "Cinnamon." And whenever I say sentiment, uh, or sentiment analysis, and it gets transcribed because of my accent, it often comes out as cinnamon.
- SPSpeaker
[laughs]
- TFTyler Fisk
And so I'm like 100% my sen-sentiment analysis agent is gonna be Cinnamon. That's awesome. So it's, it's basically doing a vibe check of the email and passing on that information downstream to the other agents. So there's even another agent in this flow. There is an expert agent here that's doing like what Core would be doing. We've even added in like a separate research agent to go out onto the internet to, uh, do whatever other kinds of research you might think is valuable to pass on that context downstream to the other agents. Then you have, uh, all that information then gets handed off to the You've Got Mail agent, so that customer service email agent, and it's writing its V1 draft of the email. [lip smack] Even in the workflow, we will do this process of, we call it the toast method. It's like, um, it tests it. It does a thing, it grades itself, it does it again. It's like the, it's the meta-prompting, but at a workflow level. So You've Got Mail wrote its email. It's QA-ing its email, and then it does it again here in this next step. And then this is a version of Gigawatt that's called Gigawatt Unhinged, and it writes things that will make your grandma clutch her pearls. Like, it's pretty terrible. And we just do that as like an internal agent to send us a Slack message. It's funny. Uh, it saves this into a short-term memory bank and a knowledge base in Cassidy, so it will know it, uh, like moving forward as, as a history. And then it sends us a Slack message. And so like if we're the Grower Solution team, what this feels like for us, or if we're the Apple team, let's say, is that our agents are constant lis- constantly listening for new email. And when a new email comes in and they've answered them, uh, it runs through this process of all these different agents doing the work and then pinging us in Slack and saying, uh, "Hey Tyler, guess what? Uh, here's the new email we got. Here was the situation. Here's what we thought about it. 
Here's the, the version of the email that we wrote. Do you like it? If so, we'll auto-send it. Um, if you don't, let us know what we need to do better, and we'll go back and fix it again." And depending on how we answered that question in Slack, it will then force this other trigger right here, which is like the second part of this flow. So this is constantly watching that Slack channel that the agents are talking to us in, and we just talk to that agent. Like no hot words, no fancy language. It's literally like we're talking to like an executive assistant that comes in and does this. And this is what we call a generative filter. It's just a very simple like GPT-4o, or it could be like GPT-5 mini, like anything, like a cheap, fast model that looks at what we said or what the Apple team said and determines did we say, uh, "Yeah, it looks great. Go ahead and send the email." Uh, "No, it needs this revision," or, "It needs to add this," or whatever it might be. Or is it like a message coming back in Slack that's some sort of a confirmation message from an agent? And this is a generative filter. Based off of that, it understands, and it will only print one of three outputs here. It's either like ship it, revise, or confirmed. And based on this output, there's different paths here. So if it says ship it, the agent will go look up the history, go pull the Gmail thread ID, um, find the email that was written, know that it's approved, and actually go send the email for us. So we don't have to do anything. Like Apple just needed to talk to the agent, and then the agent go does its thing. Uh, if we had something that we wanted it to improve on or revise, it goes through this different path where it goes and looks up more information. It sees what was done originally. It goes back to the expert agent, the research agent, the email agent, and they kind of do their process again to take into consideration what you said it needed to do differently. 
And then it brings it back to you in Slack at this like checkpoint for you again of like, "Hey, we, we went back and fixed it. Do you like it this time or not? Like we can send it, or if we need to fix it again, we can do it." So it's like all of these agents are working under the hood autonomously without us, and we just have to talk to it in Slack, and that's it. And then once we finally say send it, it sends it, sends it on its way, records all that here back into a folder system that we have set up in the RAG knowledge base so that if, let's say, you send through that email about the, um, iPhone Air versus the iPhone Pro, and then we send you a response in email, and then maybe tomorrow you send a follow-up email, n- we now have memory of that, like both at a system level and, um, like in the email also. We could go and like research it in the tool if we wanted to as well. So this is part of the piece of getting it into production is like building out the workflows and the user prompts and all the system parameters around this. But that's, that's like a whole, um, like whole thing to go into that. And at the same time, we have to put into place, uh, an eval system. And evals, it-- Evals are, are paramount in production. Uh, like they're, they're so key because that's what gives you observability and, uh, a detailed audit trail to know, uh, what it is, what's happening in production, what do you need to go fix, what improvements need to be made, what's actually working well. Uh, we put that into a Goldens database to reinforce what's going well. Um, yeah, so there, it's a, it's a complicated system to get to like this type of quality, but... Like I, I-- we haven't found another way to do it, w- like to get to that level of outcome without that level of like complexity as well.
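The "generative filter" step Tyler describes is just a constrained classifier feeding a three-way branch. In production it's a cheap, fast model prompted to emit exactly one of three labels; here a keyword stub stands in for the model call so the routing logic itself is runnable:

```python
# Sketch of the Slack-reply generative filter and its three routes.
# `generative_filter` is a stand-in for an LLM call whose prompt would
# constrain the output to exactly one of these three strings.

SHIP, REVISE, CONFIRMED = "ship it", "revise", "confirmed"

def generative_filter(slack_message: str) -> str:
    text = slack_message.lower()
    if "looks great" in text or "send" in text:
        return SHIP        # human approved the draft
    if "fix" in text or "revision" in text or "change" in text:
        return REVISE      # human wants another pass
    return CONFIRMED       # agent-side confirmation, nothing to act on

def route(slack_message: str) -> str:
    label = generative_filter(slack_message)
    if label == SHIP:
        return "look up Gmail thread ID and auto-send the approved draft"
    if label == REVISE:
        return "re-run expert + research + email agents with the feedback"
    return "log agent confirmation; nothing to do"

print(route("Yeah, looks great. Go ahead and send it."))
```

Constraining the filter to a tiny closed label set is what makes the downstream branching reliable: the workflow only ever has to match one of three known strings, not parse free-form chat.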
- AGAakash Gupta
Amazing. And all these tools we walked through today, how much-- what plans are you on? How much are they costing?
- TFTyler Fisk
You don't want to see my tech stack bill. Uh, like it's-- this is like a mind shift change. So w- in Cassidy specifically, because we actually teach out of that platform as well as, uh, use it ourselves for internal work, um, we're on an enterprise plan with them, so we're spending like a couple thousand dollars a month on it. Uh, but that's like heavy, heavy usage. I would say though, and this is what-- like when we were doing more client work, now we're much more like focused in education. When we were doing client work, it's this kind of, uh, paradigm shift of don't look at it as like tool expenses. Instead, this is where I have an accounting degree, and it comes in, in handy quite a bit here, is that look at what this costs you to do without AI, like your people team, to go do each of these functions. If you had to have, uh, your experts go get pinged and their time is valuable to go ask them a question about something that only they can ask, and then you have to give that to a customer service rep, and they have to take the time to read the email and respond to the email. If it's like a high-stakes email, they might have to go get that approved or checked off by, uh, s- someone else in management above them before they hit send. Calculate their hourly wage, um, back that into here, and it's always a winning scenario. Like even if this costs a lot of money in credits, when you look at that on a cost to cost of like not necessarily like tech stack spend, but real-world value translated into what are we spending to agentify that process, uh, there's always a huge delta there, like in a positive way. And that will only increase as we keep moving forward because this is the most appreciating asset. AI is the most appreciating asset in the sense that the, the quality of the outputs that we get is, is improving on a logarithmic scale, and we know it'll continue on that trajectory because of things like Moore's law. 
And it's also racing to zero on how much it costs to actually use these things because there are, uh, open source models that you can run for the price of electricity on your local machine now that are like essentially free, that are, are just as good if not better than the, the leading models from like OpenAI or Anthropic from like six months to a year ago. Um, so like it's gonna be the most expensive it's gonna be right now is, is the main takeaway I would tell you for that.
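The cost comparison Tyler walks through reduces to simple arithmetic. Every figure below is an assumed placeholder -- email volume, handling time, wage, credit price -- chosen only to show the shape of the calculation, not real numbers from any client:

```python
# Human-labor cost vs. agent-credit cost for an email workflow.
# All inputs are ASSUMED illustrative figures.

def monthly_human_cost(emails: int, minutes_per_email: float, hourly_wage: float) -> float:
    # Time spent per email (read, draft, get approval) priced at the wage.
    return emails * (minutes_per_email / 60) * hourly_wage

def monthly_agent_cost(emails: int, credits_per_email: float, usd_per_credit: float) -> float:
    # Platform credits burned by the multi-agent flow per email.
    return emails * credits_per_email * usd_per_credit

emails = 2000  # emails handled per month (assumed)
human = monthly_human_cost(emails, minutes_per_email=12, hourly_wage=35.0)
agent = monthly_agent_cost(emails, credits_per_email=5, usd_per_credit=0.02)
print(f"human: ${human:,.0f}  agent: ${agent:,.0f}  delta: ${human - agent:,.0f}")
```

Even with generous credit burn, the delta stays heavily positive at these assumptions, which is Tyler's point: benchmark agent spend against the fully loaded labor cost it replaces, not against your previous SaaS bill.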
- AGAakash Gupta
All right. This completes our demo of how to build multi-agent workflows into production.
- 1:35:10 – 1:39:53
$1.6M Course Business Results
- AGAakash Gupta
There's obviously much more that we could have gone into, but that's the high level. Before we go, since I have you here, there are a couple questions I have to ask. You guys, you and your partner Sara, you guys crossed seven figures in a year. What's your advice to people looking to create seven-figure courses?
- TFTyler Fisk
Whew. Um, be ready to work your tail off for sure. Uh, we launched AI Build Lab as a s- what we thought was gonna be a side hustle actually. Um, I would say a couple different things. I would say constantly be listening to your customers and iterating and improving, 'cause we hear them, we understand if they're like grasping these concepts and the way that we're teaching it, is it connecting with them or not? And we're going back and revising it so that the next round we're getting better for each cohort. So that's one thing. Uh, once you start to get any sort of kind of, um, mass of students coming through, you need to start thinking about your own personal infrastructure. Like e- whether that's like a- AI-fying processes yourself or, uh, like bringing people team on, like adding the headcount, which we've done.
- AGAakash Gupta
Mm.
- TFTyler Fisk
Both of those. Um, yeah, and just like really honestly just like working your butt off as well. Like, and bringing value. We, we spend a lot of time in live sessions answering questions for folks. Um, and, and we find that that has like, we've poured into them, like we have very much so an abundance mindset, and because of that, we don't have to advertise very much. Almost all of our business is word of mouth.
- AGAakash Gupta
Wow.
- TFTyler Fisk
And, and even talking to like the Maven, uh, executive team, even, even, uh, Gagan himself, uh, they're like, "Y'all are a bit of an enigma to us," because we don't have PhDs after our names, and we didn't work for any of the big Silicon Valley tech companies, and yet somehow we've managed to come out of nowhere and make a seven-figure course on Maven. So I, I think it's just like pouring yourself into the quality of the product that you're giving to your customers and really caring about that.
- AGAakash Gupta
Are you able to share any facts and figures for folks, like how the business is doing now or how big the latest cohort was?
- TFTyler Fisk
Yeah. Um, we have had, uh, the-- I don't even know what the new... So we have another cohort starting up here in October. The one that's in there right now is between 100 and 200 people in foundations, and I believe we're over 200 people in our, uh, doc gen course, which is like the advanced, the next course. Um, we are now at a run rate where we're, um, running at like about $1.6 million in, in gross revenue, which still kind of baffles me, um, because I, I'm an entrepreneur. I've been a part of multiple businesses and launched multiple businesses, and Sara and I take that for granted sometimes of like how quickly we've been able to hit that, and we're grateful for that for sure.
- AGAakash Gupta
Definitely a really interesting case study. Uh, anybody who's watched this far knows why you've hit that success. It's the depth of the knowledge of these new tools that nobody else has. I've had plenty of people
- AGAakash Gupta
build AI agents on this show, and you have a next level depth of that knowledge. What's next for the AI Build Lab empire?
- TFTyler Fisk
Uh, that's something we've been talking about quite a bit actually internally. Um, I think a few different things. We, we really want to scale our, our foundations course, which is our, our, like, entry level course. It's definitely not entry level, but it's our first course because, uh, for us it's incredibly important to not just, like, say it, but actually do it, to bring more people from all over the world, diverse, um, areas of work and demographics so that we're, we are actually democratizing AI. Because we, we still live in a bubble. Um, most people still don't know what's even possible with this tech, and we want to, uh, enlighten folks and bring more voices and more, um, more people to this table because it- it's huge. So there's that, and then the other thing I would say is that we've been thinking a lot about trying to get into the product space. So like this Hey Gigawatt thing, we've been trying to figure it out because that also helps people go from needing to know all of this head knowledge. The idea is that Hey Gigawatt works for any kind of a non-technical user, guides them through this process, and builds everything for them in the back end. So it's kind of like, um, Vibe Coding and, uh, I don't even love that term, but kind of like that for agent building in, in an essence. So yeah.
- AGAakash Gupta
Well, I'll be cheering you on from the sidelines. I'm sure everybody who's watched and listened to this point will as well. Tyler, thank you so much for sharing your expertise with us.
- TFTyler Fisk
Thank you so much. It's been a lot of fun. I appreciate
- 1:39:53 – 1:40:33
Outro
- TFTyler Fisk
it.
- AGAakash Gupta
Bye, everyone.
- TFTyler Fisk
All right. Bye all.
- AGAakash Gupta
So if you want to learn more about how to shift to this way of working, check out our full conversation on Apple or Spotify Podcasts. And if you want the actual documents that we showed, the tools and frameworks and public links, be sure to check out my newsletter post with all of the details. Finally, thank you so much for watching. It would really mean a lot if you could make sure you are subscribed on YouTube, following on Apple or Spotify Podcasts, and leave us a review on those platforms. That really helps grow the podcast and support our work so that we can do bigger and better productions. I'll see you in the next one.
Episode duration: 1:40:43
Transcript of episode WVU7MCfFet4