Aakash Gupta: How to Become a Builder PM (n8n, Claude Code, OpenClaw)
CHAPTERS
Why “builder PM” is more than using trendy tools
Mahesh and Aakash frame the moment: PMs are increasingly expected to prototype, automate, and even push code. They open by rejecting the misconception that simply using Claude Code or OpenClaw makes you a builder PM. What matters is responsibility for outcomes and the ability to ship first versions quickly.
- •PM expectations shifting toward building, prototyping, and shipping
- •Misconception: tool usage ≠ builder PM identity
- •Builder PMs focus on outcomes—getting to real users fast
- •AI compresses timelines from PRD → mock → prototype → signals
Defining a builder PM: customer clarity + first-version shipping without a dev team
Mahesh defines a builder PM as someone who can work backwards from customers and produce a usable first version, often for the first ~10 customers, without relying on dedicated engineering. The role is positioned as “diffusing AI benefits into the economy,” not just managing roadmaps.
- •Builder PM = talk to customers, decide what to build, ship v1
- •Engineering background helps, but mindset/approach is central
- •Goal: reach initial customers and validate demand quickly
- •AI makes prototyping/building accessible to non-engineers
Agent fundamentals via n8n: models vs scaffolding (knowledge, memory, tools, guardrails)
Using a “human development” analogy, Mahesh breaks down agents into the intelligence layer (model) plus scaffolding: knowledge/context, memory (state), tools (actions), and guardrails. He argues builder PMs must understand these primitives to debug and scale agent behavior.
- •Model = intelligence; agent = harness/scaffolding around it
- •Four pillars: knowledge/context, memory, tools/actions, guardrails
- •Why understanding layers matters for reliability and iteration
- •n8n as a great environment for learning these concepts
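The model-versus-scaffolding split above can be sketched in a few lines of plain Python. This is a minimal illustration, not how n8n implements its agent node: `stub_model`, the `Agent` class, the keyword-based tool routing, the `blocked_terms` guardrail, and the sample policy text are all hypothetical stand-ins chosen to make the four pillars visible.

```python
from dataclasses import dataclass, field
from typing import Callable


def stub_model(prompt: str) -> str:
    """Stand-in for an LLM call; it only reports how much prompt it received."""
    return f"(model reply based on {len(prompt)} prompt characters)"


@dataclass
class Agent:
    model: Callable[[str], str]                          # intelligence layer
    knowledge: list[str] = field(default_factory=list)   # context documents
    memory: list[str] = field(default_factory=list)      # prior turns (state)
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)  # actions
    blocked_terms: tuple[str, ...] = ()                  # a very crude guardrail

    def run(self, user_input: str) -> str:
        # Guardrail: refuse before the model ever sees the request.
        if any(term in user_input.lower() for term in self.blocked_terms):
            return "Request blocked by guardrail."
        # Tools: call every tool whose name appears in the request (toy routing).
        tool_results = [fn(user_input) for name, fn in self.tools.items()
                        if name in user_input.lower()]
        # Assemble the prompt from knowledge, memory, tool output, and the new turn.
        prompt = "\n".join([*self.knowledge, *self.memory, *tool_results, user_input])
        reply = self.model(prompt)
        # Memory: persist both sides of the turn for later requests.
        self.memory += [f"user: {user_input}", f"agent: {reply}"]
        return reply


agent = Agent(
    model=stub_model,
    knowledge=["Hypothetical policy: large contracts need legal review."],
    tools={"search": lambda q: f"[search results for: {q}]"},
    blocked_terms=("password",),
)
print(agent.run("search for recent n8n releases"))
print(agent.run("what is the admin password?"))
```

Swapping `stub_model` for a real model call changes the intelligence layer only; the knowledge, memory, tools, and guardrail code stays the same, which is the point of treating them as separate layers when debugging.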
Live demo: build a basic n8n agent from scratch (and watch it fail without context)
Mahesh creates a minimal n8n AI agent with a chat model and tests it with general questions, then with a current-events query to show knowledge cutoff limitations. The point: a bare model is useful but insufficient for real-world tasks that require fresh information.
- •Create agent node + connect an LLM chat model
- •Inspect inputs/outputs to understand what’s actually sent to the model
- •Demonstrate knowledge cutoff with real-world/current event question
- •Motivation for adding tools and external context
Adding tools and memory: search augmentation + conversational continuity
He adds a web search tool (Tavily) to retrieve up-to-date information and then shows the next failure mode: without memory, the agent can’t reference prior turns. Adding a memory component enables follow-up questions and continuity without re-calling tools every time.
- •Tool calling to fetch fresh info from the web
- •Memory solves “stateless assistant” behavior
- •Agent decides when to call tools vs use memory
- •Builder PM takeaway: reliability requires proper scaffolding
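The difference between a stateless agent and one with memory can be shown with a toy example. This is a sketch under stated assumptions: `ToyAgent` and `fake_search` are hypothetical, `fake_search` merely stands in for a web search tool such as Tavily, and a real agent lets the model decide when to call a tool rather than using a dictionary lookup.

```python
class ToyAgent:
    """Answers from memory when it can; otherwise calls the search tool."""

    def __init__(self, search_tool, use_memory=True):
        self.search_tool = search_tool
        self.use_memory = use_memory
        self.memory = {}        # topic -> fact remembered from earlier turns
        self.tool_calls = 0

    def ask(self, topic):
        if self.use_memory and topic in self.memory:
            return self.memory[topic]      # continuity: no tool call needed
        self.tool_calls += 1
        fact = self.search_tool(topic)     # fresh information from outside the model
        if self.use_memory:
            self.memory[topic] = fact
        return fact


def fake_search(topic):
    # Hypothetical stand-in for a real web search call.
    return f"latest info about {topic}"


stateless = ToyAgent(fake_search, use_memory=False)
stateful = ToyAgent(fake_search, use_memory=True)
for a in (stateless, stateful):
    a.ask("n8n pricing")
    a.ask("n8n pricing")   # follow-up on the same topic

print(stateless.tool_calls, stateful.tool_calls)  # prints: 2 1
```

The stateless agent pays for a tool call on every turn, while the agent with memory answers the follow-up from what it already retrieved.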
Company knowledge with RAG: ingest contracts, chunking, embeddings, retrieval
Mahesh demonstrates building a knowledge base by uploading a contract (MSA), chunking it, generating embeddings, and storing it for retrieval. The agent can then answer contract-specific questions grounded in company documents instead of generic legal advice.
- •RAG pipeline: data loader → text splitting → embeddings → vector store
- •Chunk size/overlap shown as a concrete tuning lever
- •From generic answers to contract-grounded responses
- •PM relevance: bring company knowledge into agent workflows
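The loader → splitting → embeddings → vector store pipeline can be approximated with the standard library alone. This is a minimal sketch, not the n8n nodes used in the demo: chunking here is by character count, `embed` is a toy word-count vector standing in for a real embedding model, the "vector store" is an in-memory list, and the contract text is made up for illustration.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=80, overlap=20):
    """Split text into fixed-size character chunks that overlap by `overlap`."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text):
    """Toy embedding: word counts. A real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question, store, k=1):
    """Return the k chunks whose vectors are most similar to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


# Made-up contract text, for illustration only.
contract = (
    "Either party may terminate this agreement with thirty days written notice. "
    "Liability of the vendor is capped at the fees paid in the prior twelve months. "
    "All invoices are payable within forty five days of receipt."
)
store = [(chunk, embed(chunk)) for chunk in chunk_text(contract)]  # in-memory "vector store"
print(retrieve("How is liability capped?", store))
```

Notice that the fixed-size split cuts words and sentences in half; the overlap is what keeps a clause that straddles a boundary retrievable, which is why chunk size and overlap are worth tuning.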
Multi-agent workflows and “email-to-analysis” automation
Moving beyond a single agent, Mahesh shows a multi-agent system that triggers from Gmail: send an email request and receive structured contract-risk analysis back automatically. This illustrates end-to-end orchestration: ingestion, retrieval, analysis, and delivery through real channels.
- •Multi-agent orchestration with connectors (e.g., Gmail)
- •Published workflow triggers automatically on new emails
- •Generates structured risk analysis reports
- •Pattern: treat the agent system like a service, not a chat toy
Evals and ground truth: measuring agent quality before it gets you fired
Mahesh emphasizes that agents don’t self-police; humans bear the consequences. He demonstrates creating ground-truth labels (e.g., lawyer-reviewed contract terms) and running evaluation workflows to score risk detection and modification quality.
- •Evaluations are needed because agents are not accountable for their output
- •Ground truth creation (human-labeled expected outputs)
- •LLM-as-judge + rule checks to assess performance
- •Example metrics: risk detection vs suggestion/modification quality
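An evaluation against ground truth might look roughly like the following. This is a sketch, not the workflow from the demo: the labels, the agent output, and the keyword-matching `stub_judge` (a stand-in for an LLM-as-judge call) are all hypothetical.

```python
def score_risk_detection(predicted, expected):
    """Rule check: compare predicted risk labels with human-labeled ground truth."""
    predicted, expected = set(predicted), set(expected)
    true_pos = len(predicted & expected)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(expected) if expected else 0.0
    return {"precision": precision, "recall": recall}


def stub_judge(suggestion, reference_keywords):
    """Stand-in for an LLM-as-judge call: 1.0 if every keyword appears, else 0.0."""
    words = reference_keywords.lower().split()
    return 1.0 if all(word in suggestion.lower() for word in words) else 0.0


# Hypothetical ground truth for one contract, as a reviewer might label it.
ground_truth = {
    "risks": ["uncapped liability", "auto-renewal", "unilateral termination"],
    "modification_keywords": "cap liability",
}
# Hypothetical agent output for the same contract.
agent_output = {
    "risks": ["uncapped liability", "auto-renewal", "late payment penalty"],
    "modification": "Cap liability at 12 months of fees.",
}

detection = score_risk_detection(agent_output["risks"], ground_truth["risks"])
quality = stub_judge(agent_output["modification"], ground_truth["modification_keywords"])
print(detection, quality)
```

Keeping risk detection and modification quality as separate scores makes it clear whether a regression comes from the agent missing risks or from weak suggested edits.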
Where n8n falls short: scaling beyond the first customers and into production
n8n is positioned as ideal for learning and early traction, but it becomes limiting for collaborative development, testing, containerization, and production operations. Mahesh highlights the lack of a clean path from visual workflows to code-based teamwork.
- •Great for rapid prototypes, debugging, and early users
- •Weaknesses: production hardening, tests, containers, collaboration
- •Limited “code mode” visibility for teams
- •Transition point: when you need scale, latency, maintainability
When to use Claude Code: from workflows to code + reusable skills
Mahesh recommends using n8n briefly, then moving to Claude Code (and tools like Cowork) to combine delegation with real codebases. Claude Code is framed as a universal work engine: if it can code reliably, it can perform many knowledge-worker tasks reliably too.
- •Suggested path: 2 weeks n8n → move to Claude Code
- •Claude Code supports both non-coders and experienced engineers
- •Concepts: skills, sub-agents, hooks, scheduled jobs
- •Why it’s “hot”: merges delegation + code iteration in one place
What changed in Dec 2025: agent loop productization + computer control + long-horizon jobs
Mahesh explains the shift: Anthropic’s agent loop (context → actions → evals) became a product surface, supercharged by computer control (files + bash + browser) and longer-horizon models. This collapses entire categories of point solutions (context companies, action companies, eval tooling) into one platform.
- •Agent loop: manage context, take actions, run evals iteratively
- •Computer control unlock: file system + bash + browser as universal tools
- •Long-horizon capability expanded from minutes to hours
- •Result: fewer specialized wrappers; more work done end-to-end in one tool
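The context → actions → evals loop can be written down as a small control loop. This is a generic sketch of the pattern as described here, not Anthropic's implementation: `run_agent_loop`, `toy_act`, and `toy_eval` are hypothetical, and the toy action simply lengthens a string until the check passes.

```python
def run_agent_loop(task, act, evaluate, max_steps=5):
    """Repeat: build context, take an action, evaluate; stop when the eval passes."""
    context = [f"task: {task}"]
    for step in range(1, max_steps + 1):
        result = act(context)                 # action: e.g. edit a file, run a command
        passed, feedback = evaluate(result)   # eval: e.g. run tests, apply a checklist
        context.append(f"step {step}: {result!r} -> {feedback}")  # feed back into context
        if passed:
            return result, context
    return None, context


# Toy action and eval: "improve" a draft until it is long enough.
def toy_act(context):
    return "draft" + "!" * (len(context) - 1)


def toy_eval(result):
    ok = len(result) >= 7
    return ok, "ok" if ok else "too short"


result, history = run_agent_loop("write a draft", toy_act, toy_eval)
print(result, len(history))  # prints: draft!! 4
```

The `max_steps` cap is the simplest possible budget; longer-horizon jobs amount to raising that budget while keeping the eval step honest.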
Live Claude Code demo: PRD review automation + continuous learning from your edits
Mahesh shows a PRD/two-pager review workflow: Claude applies a checklist, inserts comments directly into the document, and produces an annotated output. He then layers on a learning system that monitors his manual edits, writes “learner.md,” and suggests checklist updates once a pattern repeats, creating a human-in-the-loop improvement flywheel.
- •PRD review skill: apply checklist and comment in-doc automatically
- •Checklist stored as a versioned artifact in the repo
- •Post-review learner agent compares AI output vs human edits
- •Continuous improvement loop: detect patterns → propose checklist updates → human approval
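The detect → propose → approve loop can be sketched as follows. This is an assumption-heavy illustration, not the skill shown in the demo: the edit patterns, the checklist items, the threshold of three repeats, and both function names are hypothetical, and in practice the patterns would come from comparing the AI's review with the human's edits.

```python
from collections import Counter


def propose_checklist_updates(edit_log, checklist, threshold=3):
    """Suggest new checklist items for edit patterns seen at least `threshold` times."""
    counts = Counter(edit_log)
    return [pattern for pattern, n in counts.items()
            if n >= threshold and pattern not in checklist]


def apply_approved(checklist, proposals, approve):
    """Human-in-the-loop step: only proposals the reviewer approves are added."""
    return checklist + [p for p in proposals if approve(p)]


# Hypothetical data: patterns a learner might extract from the reviewer's manual edits.
checklist = ["states the customer problem", "defines success metrics"]
edit_log = [
    "adds rollout plan", "adds rollout plan", "tightens title",
    "adds rollout plan", "defines success metrics", "defines success metrics",
    "defines success metrics",
]
proposals = propose_checklist_updates(edit_log, checklist)
checklist = apply_approved(checklist, proposals, approve=lambda p: True)
print(proposals)  # prints: ['adds rollout plan']
```

One-off edits stay below the threshold and items already on the checklist are not proposed again, so the reviewer only sees genuinely new, recurring patterns.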
Beyond PRDs: competitive analysis, mocks, prototypes, and dashboards in one compressed cycle
Mahesh outlines how Claude Code can cover the full PM cycle: competitive research, mock creation, prototype building, and analytics/dashboarding. The core claim is time compression: what used to take months can be reduced dramatically when one tool spans research-to-build-to-measure.
- •Competitive analysis agents that research and synthesize reports
- •Generate mocks/visualizations without waiting on design cycles
- •Convert mocks into working prototypes customers can touch
- •Instrument usage and build dashboards to read signals fast
OpenClaw deep dive: delegation through channels + sandboxed machines + model flexibility
OpenClaw is presented as an open-source pattern built on an agentic loop, emphasizing delegation (do work asynchronously) and channel integration (WhatsApp/Slack/Signal/etc.). It runs in a controlled environment (e.g., Mac Mini or VM) and can swap models (including open-source) to avoid vendor limits.
- •OpenClaw as a pattern: agent loop + delegation + channels
- •Connectors for WhatsApp/Slack/Signal/Telegram and more
- •Sandbox approach: dedicate a machine/VM to the agent’s work
- •Model flexibility: not tied to a single provider’s limits
Enterprise reality: security constraints, sandboxing as the next frontier, and AI PM interviews
They discuss why big companies can’t simply deploy OpenClaw broadly and argue the future is provider-controlled sandboxes (secure VMs) that reproduce issues, test fixes, and return solutions safely. Mahesh closes with how AI PM interviews are changing: more case studies, more system design, and an expectation that candidates use modern tools during the process. He also shares his personal compensation trajectory and why he left big tech to build independently.
- •Bigco constraint: access, permissions, and security concerns
- •Sandboxing/controlled VMs as the key unsolved platform problem
- •AI PM interviews: case studies + system design + tool fluency expected
- •Career perspective: compensation trajectory in AI, and why autonomy beats bureaucracy