Aakash Gupta

You'll be left Behind as an AI PM If You Don't Use ChatGPT Apps

Colin Matthews is back with the definitive guide to ChatGPT apps. MCP protocol explained, live app building demo, and eval strategies. Plus: why every PM should learn this new distribution channel.

Full Writeup: https://www.news.aakashg.com/p/colin-matthews-podcast
Transcript: https://www.aakashg.com/chatgpt-apps-guide-colin-matthews/
Chippy: https://chippy.build/

----

Timestamps:
0:00 - Intro
3:09 - What Are ChatGPT Apps?
8:25 - Architecture & How They're Built
10:32 - Ads
11:24 - Building First App
19:52 - Live Demo: Healthcare App
30:18 - Ads
33:12 - Improving with Evals
40:19 - PM Role & Prototyping Debate
52:01 - Ideas for Solo Builders
54:38 - Colin's Solopreneur Year
1:01:26 - Outro

----

🏆 Thanks to our sponsors:
1. Colin's ChatGPT Apps Course: Next cohort starts February 13th on Maven - https://bit.ly/4qd2ryx
2. Vanta: Automate compliance, Get $1,000 with my link: https://www.vanta.com/lp/demo-1k?utm_campaign=1k_offer&utm_source=product-growth&utm_medium=podcast
3. Land PM Job: 12-week experience to master getting a PM job - https://www.landpmjob.com/
4. Naya One: Accelerate AI adoption in financial services - https://nayaone.com/
5. Mobbin: The world's largest mobile & web design library - Get 20% off: https://mobbin.com/?via=aakash

----

Key Takeaways:
1. ChatGPT apps = MCP + widgets - The Model Context Protocol (invented by Anthropic) lets AI agents call external tools. OpenAI added UI widgets on top to create embedded app experiences directly in chat.
2. 900M weekly active users = massive distribution opportunity - This is the new SEO. Early data shows 26% higher conversion from AI traffic vs traditional search. Every enterprise will eventually build here.
3. You're building for multiple platforms - MCP works across ChatGPT, Claude (coming soon), Cursor, and other AI tools. Build once, distribute everywhere. Gemini doesn't support it yet.
4. Apps get called based on tool descriptions - Your metadata matters. Like SEO but for LLMs. Run evals to test if correct prompts trigger your tools. Iterate on descriptions to improve discovery.
5. Three eval categories: direct, indirect, negative - Direct: user names your app. Indirect: user describes outcome. Negative: irrelevant request shouldn't trigger your tool. Test all three systematically.
6. PMs should prototype but engineers ship production - Use tools like Chippy to prototype quickly and test concepts. Show stakeholders real interactions. Engineering team builds the production version.
7. Enterprise-first, solo builders second - Large companies (Target, Uber, Canva) are early adopters chasing distribution. But huge opportunity for indie builders once public marketplace launches.
8. Best opportunities: embedded collaboration tools - Spreadsheets, task lists, whiteboards where ChatGPT can partner with you. Not just search results, but actual interactive experiences.
9. Error analysis on observability logs is critical - Track what prompts triggered which tools with what parameters. Look for mismatches between expected and actual behavior. Iterate tool descriptions.
10. Marketplace launching by end of 2024/early 2025 - Currently only launch partners can publish. Public marketplace coming soon means anyone can ship apps and reach ChatGPT's massive user base.

----

👨‍💻 Where to find Colin Matthews:
LinkedIn: https://www.linkedin.com/in/colinmatthews-pm/?originalSubdomain=ca
Newsletter: https://blog.techforproduct.com/

👨‍💻 Where to find Aakash:
Twitter: https://www.x.com/aakashg0
LinkedIn: https://www.linkedin.com/in/aagupta/
Newsletter: https://www.news.aakashg.com

#chatgptapps #aipm

----

🧠 About Product Growth: The world's largest podcast focused solely on product + growth, with over 200K listeners.

🔔 Subscribe and turn on notifications to get more videos like this.

Colin Matthews (guest), Aakash Gupta (host)
Jan 22, 2026 · 1h 2m

CHAPTERS

  1. Why ChatGPT apps are a major opportunity for product builders

    Aakash frames the episode as a practical breakdown of what the emerging “ChatGPT App Store” means for builders, beyond the headline demos. Colin previews why embedded apps inside chat can create new distribution and product surface areas for teams and solo builders.

    • Goal: move past hype/use-case montages to builder-relevant implications
    • Chat-based apps as a new distribution channel (potentially underrated)
    • What PMs should understand now before the marketplace fully opens
  2. What “ChatGPT apps” are (and how the App Store concept works)

    Colin defines ChatGPT apps as branded, designed interactive experiences that run inside a ChatGPT conversation—more than text outputs or web results. They let companies control user experience, guide interactions, and sometimes hand users off to complete actions in the company’s own product.

    • Embedded UI + interaction directly inside the chat thread
    • Companies control branding and workflow vs relying on generic web search answers
    • App-store-like browsing/installing is expected to evolve from today’s limited discoverability
    • Apps can also deep-link users back to the native product for completion steps (e.g., checkout)
  3. Discovery, intent, and why enterprises care (Target/Expedia examples)

    They unpack why discovery—even if currently ‘hidden’—can become powerful once ChatGPT starts surfacing relevant apps automatically. Colin explains why AI-referred users can convert better and why enterprises want a deterministic way to appear inside ChatGPT rather than playing “Whac-A-Mole” with SEO.

    • Automatic surfacing: ChatGPT may propose an app even if user didn’t install it
    • AI traffic can have higher intent and higher conversion despite lower volume
    • Deterministic presence vs volatile web-search ranking battles
    • Example pattern: build cart in-chat, finish purchase in first-party app/site (Target)
  4. What regular builders can ship: utilities and micro-apps inside chat

    Colin predicts a familiar curve: big brand apps plus small utility apps (like early iOS “flashlight apps”). He outlines lightweight but sticky ideas—spreadsheets, to-do lists pinned to the top, and other tools that create repeat usage within ChatGPT.

    • Utility apps: spreadsheet-like collaboration, to-do/task checklist UI, pinned tools
    • More complex apps: maps, navigation, search, integrations—few limits long-term
    • Early apps are bare-bones because the platform is early; complexity will increase
    • Opportunity for niche workflows where chat needs structured UI to be useful
  5. Architecture deep dive: MCP + tool calling + UI widgets

    Colin explains the underlying mechanism: Model Context Protocol (MCP) lets an AI client discover and call external tools. OpenAI’s UI “widgets” layer (now being incorporated into MCP) adds the ability to render interactive interfaces inside chat instead of only returning text/data.

    • MCP (from Anthropic): a standard for agents to call tools over the internet
    • Flow: user request → model decides to use a tool → fetch tool list → call tool with parameters
    • Caching of tool schemas/availability; refresh when needed
    • Widgets/UI: tool returns data plus UI code/URL so ChatGPT can render an interface in-chat
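
    The flow in the bullets above can be sketched as a toy, in-process simulation. This is not the MCP SDK: `ToolServer`, `ChatClient`, `search_products`, and the `widget://` URL are hypothetical names made up for illustration, and a simple name match stands in for the model's decision about which tool to use.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Tool:
    name: str
    description: str
    handler: Callable[[dict], dict]


@dataclass
class ToolServer:
    """Hypothetical stand-in for a remote MCP server, kept in-process."""
    tools: dict = field(default_factory=dict)
    list_calls: int = 0  # counts how often the tool list is fetched

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def list_tools(self) -> list:
        self.list_calls += 1
        return [{"name": t.name, "description": t.description}
                for t in self.tools.values()]

    def call_tool(self, name: str, params: dict) -> dict:
        return self.tools[name].handler(params)


class ChatClient:
    """Hypothetical stand-in for the AI client; caches the tool list."""

    def __init__(self, server: ToolServer):
        self.server = server
        self._cache: Optional[list] = None

    def tool_list(self, refresh: bool = False) -> list:
        # Fetch once, then reuse the cached list until a refresh is requested.
        if self._cache is None or refresh:
            self._cache = self.server.list_tools()
        return self._cache

    def choose_tool(self, prompt: str) -> Optional[str]:
        # Toy routing: match the tool name against the prompt text.
        # A real model reasons over the tool descriptions instead.
        for tool in self.tool_list():
            if tool["name"].replace("_", " ") in prompt.lower():
                return tool["name"]
        return None

    def handle(self, prompt: str, params: dict) -> dict:
        name = self.choose_tool(prompt)
        if name is None:
            return {"text": "answered without a tool"}
        return self.server.call_tool(name, params)


def search_products(params: dict) -> dict:
    # The tool returns data plus a pointer to UI the client could render in-chat.
    return {
        "data": {"query": params["query"], "results": 3},
        "widget_url": "widget://products/list",  # hypothetical URL
    }


server = ToolServer()
server.register(Tool("search_products", "Search the product catalog.", search_products))
client = ChatClient(server)
print(client.handle("Search products for a desk lamp", {"query": "desk lamp"}))
```

    In this sketch, repeated prompts reuse the cached tool list, and `tool_list(refresh=True)` forces a new fetch, which mirrors the caching bullet above.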
  6. Building a ChatGPT app: the easy path (Chippy) vs the hard path (DIY)

    Colin contrasts building with his platform (Chippy) versus spinning up your own MCP server and UI bundling pipeline. Chippy is positioned as an iteration-friendly environment that previews the chat experience without repeatedly rebundling and reconnecting inside ChatGPT.

    • Easy path: Chippy spins up MCP server + tool scaffolding optimized for ChatGPT apps
    • Preview/testing loop built-in: simulated chat + UI rendering before connecting to ChatGPT
    • Hard path: host MCP server, define tools, build UI, handle bundling/packaging for rendering
    • Extra complexity: form factors (full screen vs inline), OpenAI guidelines, auth, deployment
  7. Live setup: connecting an MCP app and ways users invoke it

    They walk through connecting an app via a generated MCP URL and enabling it in ChatGPT settings. Colin explains three invocation modes: typing the app name, manually selecting/tagging it, or relying on ChatGPT to choose it automatically for relevant prompts.

    • Developer workflow: create connector, paste MCP URL, configure auth/name
    • Invocation modes: name-based, manual selection, or automatic relevance-based routing
    • “Automatic invocation” becomes the distribution battleground (showing up on relevant queries)
    • Marketplace status: public publishing not yet open; sharing via URL works for testing
  8. Live build: a healthcare reviews manager app (tools + UI)

    Aakash chooses a healthcare domain and they build a hospital reviews app with three tools: view reviews, share reviews, and review analytics. They discuss PM value: using these apps as prototypes/specs and learning the full AI product loop quickly.

    • Domain selection: hospital/surgeon reviews as revenue-relevant use case
    • Tool design: viewing, sharing, analytics as separate callable capabilities
    • Plan mode to draft tool breakdown before generating implementation
    • PM angle: prototype-to-spec workflow; engineers can productionize later
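
    A minimal sketch of how the three-tool split might be declared. The tool names, parameters, descriptions, and sample reviews are all invented for illustration (the episode does not show the app's actual schema); the `name` / `description` / `inputSchema` layout follows the shape MCP uses for tool definitions.

```python
from statistics import mean

# Made-up sample data, for illustration only.
REVIEWS = [
    {"id": 1, "hospital": "General", "rating": 5, "text": "Great care"},
    {"id": 2, "hospital": "General", "rating": 3, "text": "Long wait"},
    {"id": 3, "hospital": "Lakeside", "rating": 4, "text": "Friendly staff"},
]

# One entry per callable capability; descriptions are what the model reads.
TOOLS = [
    {
        "name": "view_reviews",
        "description": "List patient reviews for a hospital.",
        "inputSchema": {
            "type": "object",
            "properties": {"hospital": {"type": "string"}},
            "required": ["hospital"],
        },
    },
    {
        "name": "share_review",
        "description": "Share one review through a channel such as email.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "review_id": {"type": "integer"},
                "channel": {"type": "string"},
            },
            "required": ["review_id", "channel"],
        },
    },
    {
        "name": "review_analytics",
        "description": "Summarize review count and average rating for a hospital.",
        "inputSchema": {
            "type": "object",
            "properties": {"hospital": {"type": "string"}},
            "required": ["hospital"],
        },
    },
]


def view_reviews(hospital: str) -> list:
    return [r for r in REVIEWS if r["hospital"] == hospital]


def share_review(review_id: int, channel: str) -> dict:
    review = next((r for r in REVIEWS if r["id"] == review_id), None)
    if review is None:
        raise ValueError(f"unknown review id: {review_id}")
    return {"shared": review["id"], "channel": channel}


def review_analytics(hospital: str) -> dict:
    ratings = [r["rating"] for r in view_reviews(hospital)]
    return {
        "hospital": hospital,
        "count": len(ratings),
        "average_rating": mean(ratings) if ratings else None,
    }


# Maps each declared tool name to the function that serves it.
HANDLERS = {
    "view_reviews": view_reviews,
    "share_review": share_review,
    "review_analytics": review_analytics,
}
```

    Keeping each capability as its own tool gives the model three distinct descriptions to choose between, which is what makes routing testable later.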
  9. Observability and evals: measuring whether the right tool gets called

    Colin demonstrates logs/observability showing the user prompt, tool called, and parameters selected by ChatGPT. He introduces OpenAI’s eval categories—direct, indirect, and negative—to systematically test discoverability and prevent irrelevant tool calls.

    • Observability: inspect prompts, tool routing, parameters, and outputs
    • Eval types: direct (names app), indirect (describes outcome), negative (should not call tool)
    • Build a “golden set” from real interactions and log expected behavior
    • Tool-routing quality is a core metric: does the model choose the intended tool?
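
    A golden set covering the three eval categories could be scored with a small harness like the one below. This is a sketch under assumptions: the prompts, the app name "Review Manager", and the `toy_route` function are hypothetical. In practice the `route` argument would be replaced by whatever records the tool ChatGPT actually called for each prompt.

```python
from collections import defaultdict
from typing import Callable, Optional

# Each case records the prompt, its eval category, and the tool we expect.
# expected=None means no tool should be called (a negative case).
GOLDEN_SET = [
    {"category": "direct",
     "prompt": "Use Review Manager to show reviews for General",
     "expected": "view_reviews"},
    {"category": "indirect",
     "prompt": "What are patients saying about General?",
     "expected": "view_reviews"},
    {"category": "negative",
     "prompt": "What's the weather in Toronto?",
     "expected": None},
]


def run_evals(route: Callable[[str], Optional[str]], cases: list):
    """Return per-category pass counts and the list of failing cases."""
    totals = defaultdict(lambda: {"passed": 0, "total": 0})
    failures = []
    for case in cases:
        actual = route(case["prompt"])
        bucket = totals[case["category"]]
        bucket["total"] += 1
        if actual == case["expected"]:
            bucket["passed"] += 1
        else:
            failures.append({**case, "actual": actual})
    return dict(totals), failures


def toy_route(prompt: str) -> Optional[str]:
    # Hypothetical router: only fires when the word "review" appears.
    return "view_reviews" if "review" in prompt.lower() else None


summary, failures = run_evals(toy_route, GOLDEN_SET)
for category, counts in summary.items():
    print(category, counts)
for failure in failures:
    print("FAILED:", failure["prompt"], "->", failure["actual"])
```

    With this toy router the direct and negative cases pass while the indirect case fails, because the prompt describes the outcome without using the word "review". That is the kind of gap the indirect category is meant to expose.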
  10. Improving performance with eval feedback: fix tool metadata/descriptions

    They run an “auto eval” and see a mismatch: the prompt “I want to share a review” should trigger the share tool but calls the view tool due to ambiguous wording. Colin shows how small edits to tool descriptions (metadata) can improve routing behavior and recommends iterating via tests rather than guessing.

    • Auto evals provide fast directional feedback but may differ from in-product ChatGPT behavior
    • Failure analysis: the word “sharing” in the view tool description caused misrouting
    • Fix approach: refine tool descriptions, add clarity and examples where needed
    • Broader lesson: iterate with evals instead of overthinking a PRD in advance
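
    The misrouting and its fix can be illustrated with a deliberately crude model. Everything here is an assumption for illustration: the before/after descriptions are invented, and the word-overlap scorer is only a stand-in, since a real LLM does not pick tools by counting shared words. The point is the workflow: change a description, rerun the same prompt, compare.

```python
import re
from typing import Optional


def stems(text: str) -> set:
    """Crude stemming: words of 4+ letters, cut to their first 4 letters."""
    return {w[:4] for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= 4}


def route(prompt: str, descriptions: dict) -> Optional[str]:
    """Pick the tool whose description shares the most stems with the prompt."""
    prompt_stems = stems(prompt)
    scores = {name: len(prompt_stems & stems(desc))
              for name, desc in descriptions.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


PROMPT = "I want to share a review"

# Hypothetical "before" metadata: the view tool mentions sharing.
BEFORE = {
    "view_reviews": "View reviews for browsing or sharing with the team.",
    "share_review": "Send a review link by email.",
}

# Hypothetical "after" metadata: each tool describes only its own job,
# and the share tool includes an example prompt.
AFTER = {
    "view_reviews": "List and read patient reviews for a hospital.",
    "share_review": ("Share a single review by email or link. "
                     "Example: 'I want to share a review'."),
}

print("before:", route(PROMPT, BEFORE))
print("after: ", route(PROMPT, AFTER))
```

    Under this toy scorer the original wording sends the prompt to `view_reviews`, and the revised wording sends it to `share_review`. Rerunning the full eval set after each edit guards against a fix for one prompt breaking another.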
  11. PM role debate: prototyping is a skill that amplifies core PM work

    Aakash raises concerns about endlessly expanding PM responsibilities (citing Itamar Gilad’s framework). Colin argues AI prototyping isn’t a new responsibility category; it’s a skill (like Figma) that strengthens communication, stakeholder alignment, and customer discovery—when used with intent.

    • Reframe: prototyping supports core PM work (alignment, discovery), not a separate job duty
    • Comparable to earlier tools (Balsamiq/Figma) for visualizing ideas quickly
    • Risk: “vibe coding” without purpose won’t replace fundamentals like research and strategy
    • Best use: faster feedback loops and clearer communication with stakeholders/customers
  12. Strategy mind map: benefits, who should build, and when it matters

    They map benefits across personal skill-building and enterprise growth. Colin expects pods (PM/design/engineering) to own the effort, with PMs focusing on why/priority and ongoing iteration (evals, analytics, incremental shipping) to capture AI-driven demand and retention.

    • Primary enterprise benefit: growth—get in front of high-intent ChatGPT users
    • Team structure: cross-functional pod (design form factor + engineering complexity + PM prioritization)
    • PM decision criteria: test form factors, watch competitor adoption, assess retention/re-engagement value
    • Operational loop: ship small, evaluate usage, improve tool calling and outcomes
  13. Ideas for solo builders + cross-platform bet: MCP beyond ChatGPT

    Colin suggests building embedded collaborative utilities where structured UI beats pure chat (spreadsheets, presentations, whiteboards). He highlights distribution as the advantage—even if feature depth lags incumbents—and notes MCP makes apps portable to other clients (Claude, Cursor, Lovable), reducing single-platform risk.

    • Good solo ideas: structured collaborative UIs where ChatGPT can co-edit with users
    • Compete via distribution: “good enough” embedded alternatives can win adoption
    • E-commerce and content creation are natural fits (cart building, mini Canva/Figma-like flows)
    • MCP portability: potential to work across ChatGPT, Claude (in progress), and other MCP-enabled tools
  14. Colin’s solopreneur year: experimentation, stack, and tool choices

    Aakash shifts to Colin’s creator/builder journey: balancing operations with new bets, quickly dropping ideas that won’t reach excellence, and aiming for a durable software product alongside teaching. Colin shares his stack (Replit for UX exploration; VS Code + Claude Code/Codex, Neon, Render) and why he prefers VS Code over Cursor.

    • Time split: maintaining the business vs testing new commercial/interest-driven bets
    • Experiments: multiple SaaS apps, two prototyping tools, RAG exploration, podcast trial
    • Stack: VS Code + Claude Code/Codex, Git/GitHub, Neon (DB), Render (hosting), vendors like Voyage for embeddings
    • Tool philosophy: prioritize code-gen quality over IDE “bells and whistles”
  15. Wrap-up: platform readiness, cautious optimism, and calls to action

    They close with a balanced view: the form factor is promising, but adoption depends on marketplace/discovery execution by OpenAI. Aakash recaps the episode as a masterclass for PMs and encourages engagement and subscriptions.

    • Still early: few launch partners; success hinges on marketplace + discovery UX
    • Expectation: clarity within a few months whether it becomes a major platform
    • Recap: why PMs should learn now to avoid being behind as AI PMs
    • Outro: follow/subscribe, newsletter bundle promos, audience feedback invitation
