How to Build an Internal AI Agent That Evolves Itself

AnswerThis builds AI agents for evidence-based scientific workflows and has scaled past $2 million in ARR with just two full-time employees — largely because they built an internal AI ops agent that processes over 100 emails a day, closes support tickets, and updates their CRM automatically. In this recent batch talk, founder Ayush Garg breaks down the architecture of a self-extending agent that builds its own tools when it encounters tasks it can't handle yet, how his non-technical co-founder trains the agent by giving it feedback in Slack, and the three types of memory — factual, behavioral, and procedural — that any founder can copy to build an internal agent for their own business.

May 19, 2026 · 5 min

CHAPTERS

0:01 – 0:31
Internal AI ops agent: outcomes and why it matters
Ayush introduces AnswerThis and explains how a small team scaled to ~$2M ARR with the help of an internal AI ops agent. He frames the agent as a force multiplier that reduces the time founders spend on operational work.
- AnswerThis builds AI agents for evidence-based scientific workflows
- Small team (2 FTE + contractors) reaching significant ARR
- Internal AI ops agent is a key leverage point
- Goal: share a setup others can replicate
0:31 – 1:01
What the agent actually does day-to-day (email, support, CRM, feedback)
He details the concrete operational tasks the agent handles across the business. The emphasis is on replacing repetitive, high-volume workflow work with an always-on assistant.
- Processes 100+ emails/day
- Closed 400+ customer support tickets
- Updates CRM after meetings
- Collects user feedback across channels
- Handles customer support workflows
1:01 – 1:16
Instantly queryable business context (asking the agent business questions)
Beyond automation, the agent becomes a single interface to business status and customer context. Instead of checking multiple apps, the team can query the agent directly.
- Ask questions like lead status and open customer issues
- Replaces context-switching across multiple tools
- Creates a unified, searchable operational picture
- Turns ops data into an on-demand interface
1:16 – 1:31
Core differentiator: a self-extending agent that writes new tools
Ayush explains that the real power is not a static task list but continuous capability growth. When the agent encounters repeated unmet needs, it triggers a coding sub-agent to create a new tool that persists.
- Agent identifies repeated tasks it can’t yet do
- Delegates tool creation to a coding sub-agent
- New tools become permanent for future sessions
- Shifts the agent from fixed automation to evolving automation
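The trigger described above can be sketched as a counter over requests the agent could not complete. This is a minimal sketch, not AnswerThis's implementation: the `GapTracker` name, the threshold value, and the shape of the tool request are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GapTracker:
    """Counts tasks the agent could not complete and flags repeats.

    Hypothetical helper: the threshold and request shape are assumptions.
    """
    threshold: int = 3
    misses: Counter = field(default_factory=Counter)

    def record_miss(self, capability: str) -> Optional[dict]:
        """Record one unmet need; return a tool request when it hits the threshold."""
        self.misses[capability] += 1
        if self.misses[capability] == self.threshold:
            # Emitted exactly once per capability, so the coding sub-agent
            # is not asked to build the same tool twice.
            return {
                "action": "create_tool",
                "capability": capability,
                "seen": self.misses[capability],
            }
        return None
```

In a setup like the one in the talk, the returned request would presumably be handed to the coding sub-agent, which writes the tool so that later sessions can use it.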
1:31 – 2:01
System architecture: thin harness + task queue for inbound channels
He outlines the high-level setup: the Claude Code CLI wrapped in a thin Python harness and fed by a queue of tasks from Slack, email, and other sources. The agent works through the queue task by task and keeps the capabilities the CLI provides natively.
- Claude Code CLI as the main agent, wrapped in Python
- Slack/email/other messages routed into a task queue
- Agent iterates through tasks rather than one-off prompts
- Thin harness philosophy enables flexibility and reliability
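A thin harness of this kind might look like the sketch below: inbound messages become `Task` objects on a queue, and a loop hands each one to a CLI agent. The `Task` fields, the prompt format, and the `drain` function are hypothetical. The `claude -p` invocation assumes a `claude` binary on PATH that prints a single response for a prompt; the runner is injectable so any other CLI can be substituted.

```python
import queue
import subprocess
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    source: str  # e.g. "slack" or "email"
    text: str    # the inbound message body


def run_cli_agent(prompt: str) -> str:
    """Run one non-interactive agent turn through a CLI.

    Assumption: `claude -p <prompt>` prints one response and exits.
    """
    result = subprocess.run(
        ["claude", "-p", prompt],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def drain(tasks: "queue.Queue[Task]",
          run_agent: Callable[[str], str] = run_cli_agent) -> list:
    """Process queued tasks one at a time until the queue is empty."""
    replies = []
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            return replies
        # Tag the prompt with its channel so the agent knows where it came from.
        replies.append(run_agent(f"[{task.source}] {task.text}"))
        tasks.task_done()
```

Keeping the harness this small means most behavior lives in the CLI agent and its tools rather than in the wrapper code.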
2:01 – 2:32
Injecting company-specific logic via read-only codebase + database access
To handle business-specific questions, the agent is given a read-only snapshot of the codebase and database. A cron job keeps these updated so the agent can infer subscription logic and app behavior directly from source.
- Read-only access to codebase and database for factual grounding
- Cron job refreshes snapshots each release
- Agent answers support questions by reading source of truth
- Learns subscription logic and system behavior from code
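One way to enforce read-only access at the database layer is to open the snapshot in a mode where writes are rejected. The sketch below uses SQLite purely for illustration; the talk does not say which database AnswerThis uses, and the `open_snapshot` and `ask` helpers are hypothetical names.

```python
import sqlite3


def open_snapshot(path: str) -> sqlite3.Connection:
    """Open a database snapshot so that SQLite rejects any write."""
    # mode=ro is part of SQLite's URI filename syntax; uri=True enables it.
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def ask(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list:
    """Run a query against the snapshot and return all rows."""
    return conn.execute(sql, params).fetchall()
```

With a connection opened this way, a `SELECT` works as usual, while an `INSERT` or `UPDATE` raises `sqlite3.OperationalError`, so a mistaken agent action cannot alter the data it reads from.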
2:32 – 3:02
Tooling layer: startup service CLIs + a coding CLI that can modify the agent
Self-evolution depends on two tool categories: operational CLIs for key services and a coding agent that can edit the agent’s own code. This enables on-demand creation of new capabilities when gaps appear.
- Expose tools like Intercom, Fathom, Stripe as CLIs
- Provide a separate coding agent as a CLI
- Coding agent can edit the main agent’s code/tooling
- Unblocks tasks by creating missing tools on request
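If every tool is a CLI, the agent can find its own capabilities by listing a directory. The following sketch assumes tools live as executable files in a single folder; the folder layout and both function names are hypothetical, not details from the talk.

```python
import os
from pathlib import Path


def discover_tools(tools_dir: str) -> list:
    """Return the names of executable files in tools_dir, sorted."""
    names = []
    for entry in Path(tools_dir).iterdir():
        if entry.is_file() and os.access(entry, os.X_OK):
            names.append(entry.name)
    return sorted(names)


def tool_prompt(tools_dir: str) -> str:
    """Render a short tool list the agent can read at the start of a turn."""
    tools = discover_tools(tools_dir)
    if not tools:
        return "No tools installed yet."
    return "Available CLIs: " + ", ".join(tools)
```

Under this layout, a tool written by the coding agent shows up on the next turn just by being saved into the folder and marked executable, with no registry to update.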
3:02 – 3:33
From skeleton to full toolkit: examples of autonomous tool creation
Ayush shares how the agent grew into a robust system by continuously authoring tools, now totaling 45+ CLIs. He gives a concrete example of creating a landing-page uptime monitor via a cron job.
- Agent has created 45+ CLIs over time
- Capability growth happens through real operational needs
- Example: landing-page monitoring for ad reliability
- Agent can create cron jobs and operational checks autonomously
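A landing-page monitor of the kind mentioned above can be very small. The sketch below is an assumption about what such a tool might look like, not the one the agent wrote: the function names, the alert format, and the rule that only a 200 response counts as healthy are all illustrative.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, or 0 if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except OSError:
        return 0  # DNS failure, refused connection, timeout, and similar


def check_pages(urls: Iterable[str],
                fetch: Callable[[str], int] = fetch_status) -> list:
    """Return one alert line for every page that did not answer with 200."""
    alerts = []
    for url in urls:
        status = fetch(url)
        if status != 200:
            alerts.append(f"DOWN {url} (status {status})")
    return alerts
```

A cron entry would then run a script that calls `check_pages` on a schedule and posts any alerts to a channel such as Slack; the schedule and the alert destination are left to the reader.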
3:33 – 4:03
Editable personality and memory via instructions.md (feedback loop)
A critical component is an instruction file loaded every turn, which the agent can edit. This creates an employee-like coaching loop where feedback becomes persistent behavior change.
- instructions.md loaded on every agent turn
- Agent can edit instructions.md itself
- Feedback becomes persistent behavioral updates
- Enables continuous improvement without manual code changes
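The loop above depends on two things: the file is re-read on every turn, and the agent has a way to write to it. A minimal sketch under those assumptions follows. The function names, the bullet format for rules, and the prompt layout are hypothetical; only the `instructions.md` filename comes from the talk.

```python
from pathlib import Path


def load_instructions(path: str) -> str:
    """Read the instruction file fresh, so edits apply on the next turn."""
    p = Path(path)
    return p.read_text(encoding="utf-8") if p.exists() else ""


def add_rule(path: str, rule: str) -> None:
    """Append one feedback-derived rule as a bullet, skipping exact duplicates."""
    line = f"- {rule.strip()}"
    text = load_instructions(path)
    if line in text.splitlines():
        return
    # Start on a new line if the file does not already end with one.
    prefix = "" if not text or text.endswith("\n") else "\n"
    with open(path, "a", encoding="utf-8") as f:
        f.write(prefix + line + "\n")


def build_prompt(path: str, task: str) -> str:
    """Prefix the task with the current instructions."""
    return f"{load_instructions(path)}\n\nTask: {task}"
```

Because `build_prompt` reads the file each time, a rule added in response to feedback takes effect on the next task without a restart or a code change.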
4:03 – 4:34
Support-quality story: non-technical feedback that permanently fixes errors
He illustrates the value of editable memory with a customer support case: the non-technical co-founder corrected recurring mistakes directly in Slack. The agent updated its instructions/tooling and eliminated that mistake class going forward.
- Non-technical co-founder notices recurring support mistakes
- Provides feedback directly to the agent in Slack
- Agent updates instruction set (and tooling as needed)
- Entire category of mistakes stops recurring
4:34 – 5:04
The three memories an internal agent needs: factual, behavioral, procedural
Ayush generalizes the approach into a mental model: factual memory (truth about the business), behavioral memory (how to act), and procedural memory (how to do recurring tasks). Each maps to a component of the architecture.
- Factual memory: codebase + database (how the startup works)
- Behavioral memory: instructions and feedback (how it should behave)
- Procedural memory: encoded tools for recurring tasks
- Model ties directly to reliability and self-improvement
5:04 – 5:33
Copy-the-stack checklist: minimal steps to replicate the setup
He closes with a practical recipe to recreate the system: use a coding-capable CLI harness, grant read-only access to core truth sources, add initial CLIs plus a coding agent, and maintain an editable instruction file. Connect via Slack/email and deploy.
- Use Claude Code (or similar) as the main harness
- Give read-only access to codebase and database
- Provide basic service CLIs plus a coding-agent CLI
- Load an instruction file on every turn and let the agent edit it
- Connect through Slack/email (e.g., via SSH) to operationalize