
Teaching agents to learn from your team

An agent that improves itself daily by treating its instructions as code: edited, reviewed, and merged like any PR. The talk covers writing skills that teach agents how to think (not what to do) and closing the feedback loop so team judgment flows back automatically.

May 22, 2026 · 28 min

CHAPTERS

  1. Why most agents stall at “80% done” and how to close the gap

    Petra opens by highlighting a common pattern: many people can build an agent, fewer keep one running, and even fewer are happy with it in production. The talk focuses on bridging that last-mile gap where agents “kind of work” but require constant babysitting.

  2. Meet Buzz: an agent that triages social mentions and drafts replies

    Buzz monitors social mentions of Warp and helps the team decide whether to reply, like, or skip. When a reply is needed, it drafts a message to reduce effort while preserving human authenticity in final sending.

  3. The hard part: judgment, taste, and avoiding “obviously AI” replies

    Petra explains why social engagement is difficult for agents: it requires nuance and brand-aligned decisions. Bad AI replies are easy to spot and can harm community trust, so the bar for quality is high.

  4. Why standard agentic loops work for coding—but not for fuzzy tasks

    The talk contrasts coding agents that can iterate against unit tests or external checks with social-reply agents that lack clear, immediate evaluators. Feedback from real users is slow, complex, and risky to use as a trial-and-error loop.

  5. Attempt #1: “Nail the prompt” turns into brittle checklists

    Their first approach was iterative prompt crafting—trying to encode everything upfront. This devolved into a long list of rules that broke on novel situations and produced robotic output.

  6. Shift from rules to principles: teach how to think, not what to do

    They reframed instructions as principles similar to how you’d onboard a new team member. This reduced instruction length significantly and improved generalization and tone.

  7. Attempt #2: Human critique still made the agent regress into hyper-specific rules

    Petra describes a test-and-fix cycle: collect mentions, compare Buzz’s outputs, and provide detailed human feedback. But when asked to “learn,” the agent tended to add narrow rules rather than improving principles.

  8. Teaching the agent to learn: a meta-skill that updates instructions appropriately

    They added a dedicated learning skill that compares agent output, human feedback, and existing instructions to identify gaps. The goal is for the agent to refine its principles in the right place, not append random rules.

  9. The key breakthrough: learn from the team’s normal workflow (Slack)

    To avoid ongoing training burden, they designed a feedback loop that piggybacks on what the team already does. Buzz posts suggestions in Slack; teammates react with emojis indicating the actual action taken and leave occasional notes.

  10. Turning Slack breadcrumbs into continuous improvement via PRs

    Buzz compares what it suggested versus what the team did, extracts takeaways, and proposes instruction updates. Changes are delivered as GitHub pull requests and summarized back in Slack for quick review.

  11. Human factors: keep it simple and make the agent feel like a teammate

    Petra emphasizes that feedback loops fail when they ask too much of people. Low-friction design and teammate-like interaction (name, personality, Slack presence) increase engagement and improve the quality of feedback.

  12. What it looks like in practice: skills, channel output, and example improvement

    Screenshots illustrate Buzz’s skill files (principle-based instructions), Slack triage messages, and a specific learning example (“don’t correct users” in certain contexts). The agent updates instructions in place rather than tacking on arbitrary rules.

  13. Results and operating model: volume handled, metrics, and cloud orchestration

    Petra shares operational numbers and how Buzz runs in the background via Warp’s orchestration. Beyond triage and replies, Buzz also generates analytics (graphs, distributions, who’s replying) to keep the program healthy.

  14. Takeaway: design the feedback loop, not the perfect prompt

    The conclusion stresses that initial prompts only need to be “good enough.” The durable advantage comes from a feedback loop that continuously improves the agent as situations evolve and team preferences become clearer.
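
The comparison step described in chapters 9 and 10 (what Buzz suggested versus what the team actually did) can be sketched roughly as follows. This is a minimal illustration, not Buzz's actual implementation: the emoji names, the `Suggestion` record, and the helper functions are all hypothetical, and the talk summary does not specify which emojis the team uses or how Slack data is fetched.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical emoji-to-action mapping; the real emojis are not given in the summary.
EMOJI_TO_ACTION = {
    "speech_balloon": "reply",
    "heart": "like",
    "no_entry_sign": "skip",
}


@dataclass
class Suggestion:
    """One triage suggestion posted to Slack (hypothetical record shape)."""
    mention_id: str
    suggested_action: str   # what the agent proposed: "reply", "like", or "skip"
    reactions: List[str]    # emoji names teammates added to the message
    note: str = ""          # optional free-text note left by a teammate


def actual_action(s: Suggestion) -> Optional[str]:
    """Return the action the team took, or None if no mapped emoji was added."""
    for emoji in s.reactions:
        if emoji in EMOJI_TO_ACTION:
            return EMOJI_TO_ACTION[emoji]
    return None


def disagreements(suggestions: List[Suggestion]) -> List[Tuple[str, str, str, str]]:
    """Collect (mention_id, suggested, taken, note) where the team overrode the agent."""
    out = []
    for s in suggestions:
        taken = actual_action(s)
        if taken is not None and taken != s.suggested_action:
            out.append((s.mention_id, s.suggested_action, taken, s.note))
    return out


def agreement_rate(suggestions: List[Suggestion]) -> Optional[float]:
    """Share of reacted-to suggestions where the team's action matched the agent's."""
    judged = [s for s in suggestions if actual_action(s) is not None]
    if not judged:
        return None
    agreed = sum(1 for s in judged if actual_action(s) == s.suggested_action)
    return agreed / len(judged)


sample = [
    Suggestion("m1", "reply", ["heart"], "too minor to warrant a reply"),
    Suggestion("m2", "skip", ["no_entry_sign"]),
    Suggestion("m3", "like", []),  # nobody reacted yet, so it is not judged
]
overrides = disagreements(sample)
rate = agreement_rate(sample)
```

In a loop like the one the talk describes, the overrides and their notes would be the raw material handed to the learning skill, which then proposes an instruction change as a pull request. Suggestions with no reaction are left out rather than counted as agreement, so silence from the team is not mistaken for approval.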
