Aakash Gupta

Masterclass: How to Turn an AI Agent into a Real Product (No Code)

Tyler Fisk reveals how to build AI agents that actually work in production. He breaks down the meta-prompting system, shows how to build multi-agent architectures live, and demonstrates why most teams fail at production agents.

---

Full Writeup: https://www.news.aakashg.com/p/tyler-fisk-podcast
Transcript: https://www.aakashg.com/how-to-build-production-ai-agents-complete-masterclass-with-tyler-fisk/

---

Timestamps:
00:00:00 - Intro
00:01:34 - Tyler's Background: AI Agent Expert
00:03:00 - Live Demo: Building Apple Customer Service
00:04:12 - Gigawatt: The Agent That Builds Agents
00:16:08 - Ads
00:17:00 - Deep Research & Knowledge Base Setup
00:26:02 - Why Multi-Agent Systems Beat Single Agents
00:30:00 - Temperature Settings Explained (Ice Peak)
00:34:05 - Ads
00:39:47 - System Instructions & Meta-Prompting
00:51:26 - Testing Core & Echo Agents Together
01:05:50 - RAG Databases & Enterprise Documents
01:24:11 - Production Workflow with Human-in-Loop
01:35:10 - $1.6M Course Business Results
01:39:53 - Outro

---

🏆 Thanks to our sponsors:

  1. Maven: Get $135 off Tyler's course with my code AAKASHxMAVEN - https://maven.com/sara-davison/scale-with-aiworkflows-foundations?promoCode=AAKASHxMAVEN
  2. Vanta: Get $1,000 off AI security & compliance at vanta.com/acos - http://vanta.com/aakash
  3. Testkube: Leading test orchestration platform - http://testkube.io/
  4. Kameleoon: Leading AI experimentation platform - http://www.kameleoon.com/
  5. The AI Evals Course for PMs: Get $1155 off with code 'ag-evals' - https://maven.com/parlance-labs/evals?promoCode=ag-evlas

---

Key takeaways:

  1. Stop Vibe Coding: Most teams write one prompt, test twice, ship to production, and hope for the best. Tyler's rule: "We would never put it into production without a human-in-the-loop checkpoint."
  2. Use Meta-Prompting to Build Agents: Tyler built Gigawatt—an agent with 72,000 characters of system instructions that builds other agents. It researches the domain, writes V1 instructions, evaluates itself (scores out of 100), identifies gaps, and rewrites to V2. Quality goes from 77% to 86%+.
  3. Build Multi-Agent Architectures: Don't build one agent that does everything. Separate concerns like you'd separate teams. For Apple: Core (expert agent, temp=0, finds facts) + Echo (email agent, temp=0.7, writes responses).
  4. System Instructions Need 7K-9K Tokens: The structure includes Role (job description), Context (business details), Instructions (step-by-step process), Criteria (guardrails), and Examples (meta reasoning).
  5. Temperature Is Your Secret Weapon: Tyler's Toy Story analogy: imagine an icy peak in a claw machine. Temp=0 (frozen): the claw picks from the top only—deterministic, precise. Temp=1 (melted): the claw grabs anywhere—creative, varied.
  6. Information Hierarchy Prevents Hallucinations: Priority order: RAG database first (scraped company docs), system instructions second (built-in expertise), web search third (with chain-of-verification). When agents search without verification, they hallucinate.
  7. Build Complete Workflows: Tyler's 9-step production workflow with 5+ agents: Email arrives → Sentiment analysis (Cinnamon) → Expert research (Core) → Email writing (Echo) → QA loop → Human checkpoint (Slack) → Generative filter → Send → Log to memory.
  8. Observational Evals Come First: Test 20+ different scenarios manually. Include edge cases and adversarial inputs. Document every failure. Save golden examples. Only after building confidence do you add systematic evals in production.
  9. Calculate ROI as Labor Cost Reduction: Traditional cost: $460/day (expert time + customer service rep + manager review) = $138K/year. AI cost: $153/day (platform fees + API credits + human review) = $45.9K/year. Savings: $92K annually (a 67% reduction). A quick worked version of these numbers follows this description.
  10. Emotion Prompting Actually Works: Tyler ends every prompt with "Go get 'em slugger." Based on research: positive reinforcement improves LLM outputs by ~15%. The same psychology that works on humans works on LLMs. "Be nice to your AI. They're gonna have robot bodies soon."

---

👨‍💻 Where to find Tyler:
Instagram: https://www.instagram.com/tyfisk/
LinkedIn: https://www.linkedin.com/in/tyfisk/
AI Build Labs: https://www.linkedin.com/company/ai-build-lab/

---

👨‍💻 Where to find Aakash:
Twitter: twitter.com/aakashg0
LinkedIn: linkedin.com/in/aagupta/
Newsletter: news.aakashg.com

#aiagents #productmanagement

🧠 About Product Growth: The world's largest podcast focused solely on product + growth, with over 187K listeners. Hosted by Aakash Gupta, who spent 16 years in PM, rising to VP of Product, this 2x/week show covers product and growth topics in depth.

🔔 Subscribe and turn on notifications to get more videos like this.
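A quick sanity check on the ROI math in takeaway 9. The per-day and per-year figures are from the episode; the ~300 working days per year used to bridge them is an assumption inferred from those totals:

```python
# ROI framing from the episode: daily labor cost vs. daily AI-workflow cost.
# Assumption: ~300 working days/year, inferred from $460/day ≈ $138K/year.
WORKING_DAYS = 300

traditional_per_day = 460   # expert time + customer service rep + manager review
ai_per_day = 153            # platform fees + API credits + human review

traditional_per_year = traditional_per_day * WORKING_DAYS   # 138,000
ai_per_year = ai_per_day * WORKING_DAYS                      # 45,900

savings = traditional_per_year - ai_per_year                 # 92,100 (~$92K)
reduction = savings / traditional_per_year                    # 0.667 (~67%)

print(f"Savings: ${savings:,} per year ({reduction:.0%} reduction)")
```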

Aakash Gupta (host) · Tyler Fisk (guest)
Oct 16, 2025 · 1h 40m · Watch on YouTube ↗

CHAPTERS

  1. Why this episode: from “vibe coding” to production AI agents

    Aakash sets the stage for a live build with Tyler Fisk, positioning him as a seasoned practitioner who takes agent building from prototypes to production workflows. Tyler previews the plan: go from idea to a working multi-agent system in real time, without requiring a CS background.

  2. Tooling overview: TypingMind playground + the “Gigawatt” agent that builds agents

    Tyler introduces TypingMind as a flexible LLM playground and brings in Gigawatt—his prompt/AI engineering agent—connected to models and tools (search, sequential thinking). He explains the “AI practitioner / deployed engineer” mentality: research first, requirements next, then build.

  3. Defining the Apple customer support use case and agent roles

    They scope the workflow: handle inbound Apple customer-service emails using two agents—an internal expert and an outward-facing email responder. Tyler has Gigawatt ask clarifying questions to align on scope (consumer products, all support categories) and information sources (RAG + web with verification).

  4. How to replicate Gigawatt: meta-prompting, research frameworks, and chain-of-verification

    Aakash probes how others can recreate Gigawatt if they don’t have access. Tyler explains it’s an accumulation of best practices and research techniques, including Meta’s chain-of-verification approach to reduce hallucinations and improve factuality.
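
    A minimal sketch of the chain-of-verification loop described here, assuming a generic llm(prompt) helper as a placeholder rather than Gigawatt's or TypingMind's actual API:

    ```python
    # Chain-of-verification sketch: draft -> verification questions ->
    # independent checks -> revised answer. `llm` is a placeholder call.
    def llm(prompt: str) -> str:
        raise NotImplementedError("wire up your model provider here")

    def chain_of_verification(question: str) -> str:
        draft = llm(f"Answer the question:\n{question}")

        # Ask the model to list factual claims worth double-checking.
        checks = llm(
            "List 3-5 verification questions that test the factual claims "
            f"in this draft answer:\n{draft}"
        )

        # Answer each verification question without showing the draft,
        # so the checks are independent of the original reasoning.
        verifications = llm(f"Answer each question independently:\n{checks}")

        # Rewrite the draft using only claims that survived verification.
        return llm(
            "Revise the draft so it is consistent with the verified answers. "
            f"Drop anything contradicted.\n\nDraft:\n{draft}\n\n"
            f"Verification Q&A:\n{verifications}"
        )
    ```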

  5. Building the knowledge base fast: Cassidy scraping + parallel Deep Research agents

    Tyler switches to Cassidy (no-code) to scrape apple.com into a RAG knowledge base while also launching Deep Research in parallel. He introduces a second helper agent (“Clear”) specialized in writing high-quality Deep Research prompts for tools like Perplexity and Claude.

  6. Creating a PRD for the expert agent (“Core”) and why multi-agent beats single-agent

    Gigawatt produces a PRD for the Apple expert agent, “Core,” to lock alignment on inputs/outputs and avoid role confusion. Tyler explains why separating expert reasoning from customer communication mirrors real organizations and improves controllability, tone, and reliability.
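
    A rough illustration of that separation of concerns; the agent names and temperatures come from the episode, while the config shape and handoff are entirely assumed:

    ```python
    from dataclasses import dataclass

    @dataclass
    class Agent:
        name: str
        role: str
        temperature: float  # low = deterministic expert, higher = varied writing

    # Expert agent: finds facts and cites sources; kept deterministic.
    core = Agent("Core", "Apple product expert; researches and cites answers", 0.0)

    # Communication agent: turns Core's findings into a customer-ready email.
    echo = Agent("Echo", "Apple-voice customer email writer", 0.7)

    def run(agent: Agent, prompt: str) -> str:
        # Placeholder: call your LLM provider with agent.role as the system
        # prompt and agent.temperature as the sampling temperature.
        raise NotImplementedError

    def handle_ticket(question: str) -> str:
        findings = run(core, question)
        return run(echo, f"Write a reply using only these findings:\n{findings}")
    ```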

  7. Temperature explained with the “icy peak” claw-machine analogy

    Tyler breaks down temperature as a control over randomness/creativity by altering the probability distribution of token selection. He connects it to agent design: experts often benefit from low temperature, while customer-facing writing may benefit from higher temperature.
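
    The claw-machine analogy maps onto how temperature rescales the next-token distribution. A minimal sketch with made-up logits, not tied to any specific model API:

    ```python
    import math, random

    def sample(logits: dict[str, float], temperature: float) -> str:
        """Pick the next token after dividing logits by temperature."""
        if temperature == 0:
            # Frozen peak: always take the single most likely token.
            return max(logits, key=logits.get)
        scaled = {tok: l / temperature for tok, l in logits.items()}
        total = sum(math.exp(l) for l in scaled.values())
        probs = {tok: math.exp(l) / total for tok, l in scaled.items()}
        # Melted peak: lower-probability tokens now get a real chance.
        return random.choices(list(probs), weights=list(probs.values()))[0]

    logits = {"iPhone": 3.0, "iPad": 2.0, "Mac": 1.0}
    print(sample(logits, temperature=0))    # always "iPhone"
    print(sample(logits, temperature=1.0))  # usually "iPhone", sometimes others
    ```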

  8. System instructions V1 for Core: XML structure, meta-examples, and self-review (meta-prompting loop)

    Tyler has Gigawatt draft Core’s system instructions in XML with defined sections (role, context, instructions, criteria, examples). He avoids early “shot prompts” and instead uses scenario-based meta-examples, then runs a structured self-evaluation to iterate from a ‘B’ to an ‘A’ version.
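
    A skeletal version of the XML-sectioned system prompt described here; the section names come from the episode, while the filler text is purely illustrative:

    ```python
    # System-instruction skeleton with the five sections Gigawatt drafts for Core.
    # Real instructions in the episode run ~7K-9K tokens; this shows only the shape.
    SYSTEM_INSTRUCTIONS = """
    <role>
    You are Core, an internal Apple product expert supporting customer service.
    </role>
    <context>
    Business details: consumer products, all support categories, store policies.
    </context>
    <instructions>
    1. Check the RAG knowledge base first.
    2. Fall back to built-in expertise, then verified web search.
    3. Return findings as structured output with citations.
    </instructions>
    <criteria>
    Never guess pricing or availability; escalate anything outside consumer support.
    </criteria>
    <examples>
    Scenario: a customer asks which MacBook suits video editing -> reason about
    use case, budget, and current lineup before recommending.
    </examples>
    """
    ```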

  9. Deploying Core in Cassidy: model choice tradeoffs, RAG + tools, and first test query

    Tyler moves Core into Cassidy, enabling RAG, web search, and data analysis, and discusses model selection (speed, reasoning, latency, cost, context window, redundancy). They test Core with a device-purchase question; Core returns a structured JSON answer with citations, then converts it to readable markdown for review.
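
    The episode doesn't show Core's exact JSON schema, so this converter sketch assumes a simple answer-plus-citations shape purely for illustration:

    ```python
    import json

    # Hypothetical shape for Core's structured output; the real schema may differ.
    raw = json.dumps({
        "answer": "The iPhone 15 supports USB-C charging.",
        "citations": ["https://www.apple.com/iphone-15/specs/"],
    })

    def to_markdown(payload: str) -> str:
        data = json.loads(payload)
        lines = [data["answer"], "", "Sources:"]
        lines += [f"- {url}" for url in data["citations"]]
        return "\n".join(lines)

    print(to_markdown(raw))
    ```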

  10. Building the email agent (“Echo”): Apple tone/brand alignment and cross-model quirks (XML tags)

    Using example system instructions from a different email agent (Hattie B’s), Tyler has Gigawatt adapt the voice to Apple and generate a new email-focused agent. He deploys the agent on Gemini, notes how newer models emulate XML tag patterns (think/scratchpad/answer), and runs observational evals by having Echo turn Core’s research into a customer-ready email.
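
    A small sketch of pulling the final reply out of a response that uses the think/scratchpad/answer tag pattern mentioned here (tag names as in the episode; the parsing approach and sample text are assumptions):

    ```python
    import re

    response = """<think>Customer is asking about a refund window.</think>
    <scratchpad>Policy: 14 days for most Apple Store purchases.</scratchpad>
    <answer>Hi! You have 14 days from delivery to return your order.</answer>"""

    def extract(tag: str, text: str) -> str:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        return match.group(1).strip() if match else ""

    # Only the <answer> section should reach the customer; the rest is reasoning.
    print(extract("answer", response))
    ```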

  11. From agents to production workflow: human-in-the-loop, Slack approvals, and branching logic

    Tyler shows what productionization looks like in Cassidy: triggers from email, sentiment analysis, expert research, drafting, QA loops (“toast method”), and a Slack-based approval step. A generative filter routes decisions (ship/revise/confirm), enabling safe autonomy while maintaining oversight and audit trails.
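
    One way to picture the branching logic around the human checkpoint; the ship/revise/confirm outcomes are from the episode, while the function names and Slack plumbing are placeholders:

    ```python
    # Generative-filter routing sketch: the workflow only sends autonomously when
    # the draft passes QA and the Slack approval step says "ship".
    def send_email(draft: str) -> None: ...      # placeholder for the email step
    def log_to_memory(draft: str) -> None: ...   # placeholder for the audit trail

    def route(draft: str, qa_passed: bool, slack_decision: str) -> str:
        if not qa_passed:
            return "revise"                      # back to Echo for another pass
        if slack_decision == "ship":
            send_email(draft)
            log_to_memory(draft)
            return "sent"
        if slack_decision == "revise":
            return "revise"
        return "confirm"                         # hold for explicit human sign-off
    ```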

  12. Scaling considerations: RAG for enterprise docs, GraphRAG/agentic RAG, evals, and cost framing

    They discuss ingesting large enterprise document sets (OCR, chunking, embeddings) and the limitations of naive RAG as corpora grow or become outdated. Tyler emphasizes agentic RAG/GraphRAG for relational, updatable knowledge, and frames costs in terms of labor saved and improving model economics over time; then they close with Tyler’s course-business results and product ambitions for “Hey Gigawatt.”
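
    A bare-bones illustration of the naive ingest-and-retrieve loop whose limits are discussed here; chunk size, embedding model, and similarity scoring are all assumptions, not the episode's stack:

    ```python
    # Naive RAG: split documents into overlapping chunks, embed them, and retrieve
    # by cosine similarity. This is the baseline that agentic RAG/GraphRAG improves on.
    def chunk(text: str, size: int = 800, overlap: int = 100) -> list[str]:
        step = size - overlap
        return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

    def embed(texts: list[str]) -> list[list[float]]:
        # Placeholder: call any embedding model here.
        raise NotImplementedError

    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
        return dot / norm if norm else 0.0

    def retrieve(query: str, chunks: list[str], index: list[list[float]], k: int = 3):
        q = embed([query])[0]
        ranked = sorted(zip(chunks, index), key=lambda p: cosine(q, p[1]), reverse=True)
        return [c for c, _ in ranked[:k]]
    ```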
