Aakash Gupta

Masterclass: How to Turn an AI Agent into a Real Product (No Code)

Tyler Fisk reveals how to build AI agents that actually work in production. He breaks down the meta-prompting system, shows how to build multi-agent architectures live, and demonstrates why most teams fail at production agents.

---

Full Writeup: https://www.news.aakashg.com/p/tyler-fisk-podcast
Transcript: https://www.aakashg.com/how-to-build-production-ai-agents-complete-masterclass-with-tyler-fisk/

---

Timestamps:
00:00:00 - Intro
00:01:34 - Tyler's Background: AI Agent Expert
00:03:00 - Live Demo: Building Apple Customer Service
00:04:12 - Gigawatt: The Agent That Builds Agents
00:16:08 - Ads
00:17:00 - Deep Research & Knowledge Base Setup
00:26:02 - Why Multi-Agent Systems Beat Single Agents
00:30:00 - Temperature Settings Explained (Ice Peak)
00:34:05 - Ads
00:39:47 - System Instructions & Meta-Prompting
00:51:26 - Testing Core & Echo Agents Together
01:05:50 - RAG Databases & Enterprise Documents
01:24:11 - Production Workflow with Human-in-Loop
01:35:10 - $1.6M Course Business Results
01:39:53 - Outro

---

🏆 Thanks to our sponsors:
  1. Maven: Get $135 off Tyler’s course with my code AAKASHxMAVEN - https://maven.com/sara-davison/scale-with-aiworkflows-foundations?promoCode=AAKASHxMAVEN
  2. Vanta: Get $1,000 off AI security & compliance at vanta.com/acos - http://vanta.com/aakash
  3. Testkube: Leading test orchestration platform - http://testkube.io/
  4. Kameleoon: Leading AI experimentation platform - http://www.kameleoon.com/
  5. The AI Evals Course for PMs: Get $1155 off with code ‘ag-evals’ - https://maven.com/parlance-labs/evals?promoCode=ag-evlas

---

Key takeaways:
  1. Stop Vibe Coding: Most teams write one prompt, test twice, ship to production, and hope for the best. Tyler's rule: "We would never put it into production without a human-in-the-loop checkpoint."
  2. Use Meta-Prompting to Build Agents: Tyler built Gigawatt, an agent with 72,000 characters of system instructions that builds other agents. It researches the domain, writes V1 instructions, evaluates itself (scores out of 100), identifies gaps, and rewrites to V2. Quality goes from 77% to 86%+.
  3. Build Multi-Agent Architectures: Don't build one agent that does everything. Separate concerns like you'd separate teams. For Apple: Core (expert agent, temp=0, finds facts) + Echo (email agent, temp=0.7, writes responses).
  4. System Instructions Need 7K-9K Tokens: The structure includes Role (job description), Context (business details), Instructions (step-by-step process), Criteria (guardrails), and Examples (meta reasoning).
  5. Temperature Is Your Secret Weapon: Tyler's Toy Story analogy: Imagine an icy peak in a claw machine. Temp=0 (frozen): the claw picks from the top only, so output is deterministic and precise. Temp=1 (melted): the claw grabs anywhere, so output is creative and varied.
  6. Information Hierarchy Prevents Hallucinations: Priority order: RAG database first (scraped company docs), system instructions second (built-in expertise), web search third (with chain-of-verification). When agents search without verification, they hallucinate.
  7. Build Complete Workflows: Tyler's 9-step production workflow with 5+ agents: Email arrives → Sentiment analysis (Cinnamon) → Expert research (Core) → Email writing (Echo) → QA loop → Human checkpoint (Slack) → Generative filter → Send → Log to memory.
  8. Observational Evals Come First: Test 20+ different scenarios manually. Include edge cases and adversarial inputs. Document every failure. Save golden examples. Only after building confidence do you add systematic evals in production.
  9. Calculate ROI as Labor Cost Reduction: Traditional cost: $460/day (expert time + customer service rep + manager review) = $138K/year. AI cost: $153/day (platform fees + API credits + human review) = $45.9K/year. Savings: $92K annual (67% reduction).
  10. Emotion Prompting Actually Works: Tyler ends every prompt with "Go get 'em slugger." Based on research: positive reinforcement improves LLM outputs by ~15%. The same psychology that works on humans works on LLMs. "Be nice to your AI. They're gonna have robot bodies soon."

---

👨‍💻 Where to find Tyler:
Instagram: https://www.instagram.com/tyfisk/
LinkedIn: https://www.linkedin.com/in/tyfisk/
AI Build Labs: https://www.linkedin.com/company/ai-build-lab/

---

👨‍💻 Where to find Aakash:
Twitter: twitter.com/aakashg0
LinkedIn: linkedin.com/in/aagupta/
Newsletter: news.aakashg.com

#aiagents #productmanagement

🧠 About Product Growth: The world's largest podcast focused solely on product + growth, with over 187K listeners. Hosted by Aakash Gupta, who spent 16 years in PM, rising to VP of product, this 2x/week show covers product and growth topics in depth.

🔔 Subscribe and turn on notifications to get more videos like this.
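The ROI figures in takeaway 9 of the description can be checked with a few lines of arithmetic. This is a minimal sketch: the daily costs come from the description, while the 300 working days per year is an assumption inferred from $460/day mapping to $138K/year, and the function name is purely illustrative.

```python
# Daily costs are from the description; WORKING_DAYS is an assumption
# inferred from $138,000 / $460 = 300 days per year.
WORKING_DAYS = 300


def annual_cost(daily_cost, days=WORKING_DAYS):
    """Scale a daily cost to a yearly figure."""
    return daily_cost * days


traditional = annual_cost(460)   # expert time + service rep + manager review
ai = annual_cost(153)            # platform fees + API credits + human review
savings = traditional - ai
reduction = savings / traditional

print(f"Traditional: ${traditional:,}/year")
print(f"AI-assisted: ${ai:,}/year")
print(f"Savings:     ${savings:,}/year ({reduction:.0%} reduction)")
```

Under that assumption the numbers reproduce the quoted totals: $138,000 versus $45,900, a saving of $92,100 or about 67%.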

Aakash Gupta (host), Tyler Fisk (guest)
Oct 15, 2025 · 1h 40m

At a glance

WHAT IT’S REALLY ABOUT

A live demo of building production multi-agent customer support products without code

  1. Tyler builds a two-agent team—an expert “Core” agent plus an email-writing “Echo” agent—to mirror real company roles and improve reliability, tone, and task focus.
  2. The workflow emphasizes rigorous upfront discovery and documentation (PRD), then iterative system-instruction drafting using self-critique (“meta-prompting”) to raise quality before deployment.
  3. They set up a RAG knowledge base by scraping Apple’s site and importing deep-research reports, while enforcing an information hierarchy (RAG first, then system knowledge, then verified web).
  4. The episode explains practical agent-engineering concepts—temperature tradeoffs, tool/MCP connectors, and structured inter-agent handoffs (often JSON)—to reduce hallucinations and improve orchestration.
  5. For production, Tyler shows a no-code automation pattern with sentiment analysis, QA loops, Slack-based human approval, and logging/evals, plus guidance on model selection and cost justification via labor replacement value.
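The production pattern in point 5 can be pictured as a small pipeline. The episode builds it in no-code tools, so the Python below is only a sketch under stated assumptions: every function is a hypothetical stand-in for an agent or an approval step, the keyword-based sentiment check is a toy, and only a subset of the workflow (sentiment, research, drafting, QA loop, human approval, logging) is modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    email: str
    sentiment: str = ""
    facts: str = ""
    draft: str = ""
    log: list = field(default_factory=list)


def analyze_sentiment(ticket):
    # Toy stand-in for the sentiment-analysis agent.
    ticket.sentiment = "negative" if "angry" in ticket.email.lower() else "neutral"


def research(ticket):
    # Toy stand-in for the expert agent; a real one would query the knowledge base.
    ticket.facts = "FACTS: (placeholder answer from the knowledge base)"


def write_reply(ticket):
    # Toy stand-in for the email-writing agent.
    tone = "apologetic" if ticket.sentiment == "negative" else "friendly"
    ticket.draft = f"[{tone}] {ticket.facts}"


def passes_qa(ticket):
    # Toy QA check: the draft must exist and carry the researched facts.
    return bool(ticket.draft) and ticket.facts in ticket.draft


def run_workflow(email, human_approves, max_qa_rounds=3):
    """Run one email through the pipeline; nothing is sent without approval."""
    ticket = Ticket(email)
    analyze_sentiment(ticket)
    research(ticket)
    for _ in range(max_qa_rounds):
        write_reply(ticket)
        if passes_qa(ticket):
            break
    else:
        # The loop never broke, so QA failed on every round.
        ticket.log.append("escalated: QA failed")
        return ticket, False
    approved = human_approves(ticket.draft)  # e.g. a Slack approval in the demo
    ticket.log.append("sent" if approved else "held for human edit")
    return ticket, approved
```

For example, `run_workflow("I am angry about my order", lambda draft: True)` returns a ticket whose log records that the reply was sent, while a reviewer callback returning `False` holds the draft instead.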

IDEAS WORTH REMEMBERING

5 ideas

Split agents by role to match real org structure and optimize outputs.

Tyler argues experts shouldn’t write customer emails directly; separating a deterministic expert agent from a higher-temperature brand-voice agent improves factuality, empathy, and instruction-following.
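One way to picture the split is as two separate configurations rather than one do-everything agent. The sketch below uses the temperatures quoted in the episode description (0 for Core, 0.7 for Echo); the `AgentConfig` class, the role strings, and the request shape are hypothetical and not tied to any particular platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    name: str
    role: str
    temperature: float


# Temperatures follow the episode description; the role text is illustrative.
CORE = AgentConfig("Core", "Expert: find and verify facts.", 0.0)
ECHO = AgentConfig("Echo", "Email writer: turn facts into an on-brand reply.", 0.7)


def build_request(agent, content):
    """Assemble a provider-agnostic request (hypothetical shape)."""
    return {
        "agent": agent.name,
        "system": agent.role,
        "temperature": agent.temperature,
        "input": content,
    }


facts_request = build_request(CORE, "What does the warranty cover?")
email_request = build_request(ECHO, "Write the reply using Core's facts.")
```

Keeping the two configurations separate means the factual step and the tone step can be tuned and tested independently.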

Use an explicit information hierarchy to curb hallucinations.

Core is instructed to consult the RAG first, then system knowledge, then carefully verified web search with “chain of verification” and confidence checks—reducing ungrounded claims.
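The ordering can be sketched as a simple fallback chain. This is an illustration only: in the episode the hierarchy is expressed in Core's system instructions rather than in code, and the dictionary lookups, the `web_search` callable, and the `verify` callable here are all hypothetical stand-ins.

```python
def answer_with_hierarchy(question, rag, system_knowledge, web_search, verify):
    """Return the first grounded answer, trying sources in priority order."""
    # 1. RAG database first (scraped company docs).
    hit = rag.get(question)
    if hit is not None:
        return {"answer": hit, "source": "rag"}

    # 2. Built-in system knowledge second.
    hit = system_knowledge.get(question)
    if hit is not None:
        return {"answer": hit, "source": "system"}

    # 3. Web search last, and only if the result passes verification.
    candidate = web_search(question)
    if candidate is not None and verify(question, candidate):
        return {"answer": candidate, "source": "web_verified"}

    # Nothing grounded was found: say so rather than guess.
    return {"answer": None, "source": "none"}
```

The final branch is the important one for hallucinations: an unverified web result is treated the same as no result.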

Iterate system prompts with structured self-critique to reach “production-grade.”

Gigawatt drafts system instructions, grades them section-by-section (role/context/instructions/criteria/examples), and then rewrites additively; this replaces “ship the first prompt” behavior with repeatable improvement.
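The draft, grade, rewrite cycle can be sketched as a loop. This is not Gigawatt's implementation: the five section names come from the episode, but the toy grader, the additive rewrite, and the default target of 86 (borrowed from the "77% to 86%+" figure in the description) are assumptions made for illustration.

```python
SECTIONS = ("role", "context", "instructions", "criteria", "examples")


def refine(draft, grade, rewrite, target=86, max_rounds=5):
    """Grade a draft per section and rewrite the weakest one until it scores well."""
    history = []
    for _ in range(max_rounds):
        scores = grade(draft)  # section name -> score out of 100
        overall = sum(scores[s] for s in SECTIONS) / len(SECTIONS)
        history.append(overall)
        if overall >= target:
            break
        weakest = min(SECTIONS, key=lambda s: scores[s])
        draft = rewrite(draft, weakest)
    return draft, history


def toy_grade(draft):
    # Hypothetical grader: more detail in a section earns a higher score.
    return {s: min(100, 60 + 10 * len(draft[s])) for s in SECTIONS}


def toy_rewrite(draft, section):
    # Additive rewrite: existing content is kept and detail is appended.
    return {**draft, section: draft[section] + ["added detail"]}


v1 = {s: ["first pass"] for s in SECTIONS}
final, history = refine(v1, toy_grade, toy_rewrite, target=75)
print(history)
```

In a real setup both `grade` and `rewrite` would be model calls; the loop structure is what replaces shipping the first prompt.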

Prefer structured inter-agent handoffs (e.g., JSON) even if humans don’t like it.

Tyler notes JSON is easier for downstream agents to parse reliably; humans can request a markdown view for review while keeping machine-to-machine formatting stable.
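A minimal sketch of such a handoff, using only Python's standard `json` module. The field names (`question`, `facts`, `confidence`) are hypothetical; the episode does not specify a schema.

```python
import json

REQUIRED_FIELDS = {"question", "facts", "confidence"}  # hypothetical schema


def core_handoff(question, facts, confidence):
    """Serialize the expert agent's findings for the downstream agent."""
    return json.dumps({"question": question, "facts": facts, "confidence": confidence})


def parse_handoff(payload):
    """Parse and validate a handoff so a malformed one fails loudly."""
    data = json.loads(payload)
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"handoff missing fields: {sorted(missing)}")
    return data


def to_markdown(payload):
    """Render the same payload as markdown for a human reviewer."""
    data = parse_handoff(payload)
    lines = [
        f"**Question:** {data['question']}",
        f"**Confidence:** {data['confidence']}",
        "**Facts:**",
    ]
    lines += [f"- {fact}" for fact in data["facts"]]
    return "\n".join(lines)


payload = core_handoff("Is the battery covered?", ["Covered for one year."], "high")
print(to_markdown(payload))
```

The machine-to-machine format stays fixed, and the markdown view is derived from it on demand.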

Parallelize research and build steps with multiple specialized helper agents.

He runs web scraping, deep research (Perplexity/Claude), PRD drafting, and prompt iteration concurrently, dramatically reducing time-to-first-working-system.
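In code, running independent preparation steps side by side looks roughly like the sketch below, which uses Python's `asyncio.gather`. The task names mirror the steps mentioned above, but the sleeps are placeholders for real work and the helper functions are hypothetical.

```python
import asyncio


async def run_task(name, seconds):
    # Placeholder for a real helper agent; the sleep simulates its work.
    await asyncio.sleep(seconds)
    return f"{name}: done"


async def gather_context():
    """Start all helper tasks at once and wait for every result."""
    return await asyncio.gather(
        run_task("web scrape", 0.02),
        run_task("deep research", 0.03),
        run_task("PRD draft", 0.01),
    )


results = asyncio.run(gather_context())
print(results)
```

`asyncio.gather` returns results in the order the tasks were passed in, regardless of which finishes first, and the total wait is roughly that of the slowest task rather than the sum of all three.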

WORDS WORTH SAVING

5 quotes

I joke when I tell people what I do for a living now is I talk funny to robots.

Tyler Fisk

We’re spinning up multiple agents here just to kind of get this process done and get all the context that we need as quickly as possible.

Tyler Fisk

The real-life experts typically are not the same people that you want answering the customer service emails.

Tyler Fisk

Temperature is like this icy peak inside of a claw machine… you’re changing the shape of the probability distribution curve.

Tyler Fisk
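The "shape of the probability distribution" remark corresponds to standard temperature scaling of a softmax. The sketch below is a generic illustration with made-up logits, not any provider's sampling code; treating temperature 0 as a pure argmax is an assumption of this sketch.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Treated here as greedy decoding: all probability on the top score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top candidate (the frozen peak), while higher ones spread it across the alternatives (the melted peak).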

We would never put it into production without some sort of a human-in-the-loop checkpoint. That’s very irresponsible.

Tyler Fisk

No-code agent building (TypingMind, Cassidy)
Multi-agent architecture: expert vs. email agent
PRD-first build process
RAG knowledge base setup via web scraping
Deep Research prompts and importing outputs into RAG
Meta-prompting and agent self-review scoring
Production workflow: human-in-the-loop, Slack approvals, evals/observability
Temperature tuning and role separation
MCP/connectors and tool access
Model choice tradeoffs: speed, cost, context window, redundancy

High quality AI-generated summary created from speaker-labeled transcript.
