How I AI

An exclusive inside look at GPT-5

In this episode, I share my hands-on experience with OpenAI’s GPT-5, the company’s new frontier model. As one of the first users outside of OpenAI to test the model, I put GPT-5 head-to-head with GPT-4.1 across real-world product use cases, from writing PRDs to generating code to assisting with visual design work. This is my unfiltered look at what GPT-5 can (and can’t) do, and how it changes the game for builders.

*What you’ll learn:*
1. How GPT-5 differs from previous models with its engineering-focused approach to problem-solving and tendency to prioritize technical details over business context
2. A comparative analysis of how GPT-5 and GPT-4.1 generate different types of product requirement documents and prototypes for the same prompt
3. Why GPT-5 excels at technical writing, functional requirements, and code generation while potentially skipping important business discovery questions
4. The model’s impressive spatial awareness capabilities when generating images for interior design and other visual tasks
5. Practical considerations for choosing the right model based on your specific use case and audience
6. How GPT-5’s extensive tool-calling behavior and bullet-point communication style reflect its engineering-oriented design

*Brought to you by ChatPRD, an AI copilot for PMs and their teams:* https://www.chatprd.ai/howiai

*25k giveaway:*
To celebrate 25,000 YouTube followers, we’re doing a giveaway. Win a free year of my favorite AI products, including v0, Replit, Lovable, Bolt, Cursor, and, of course, ChatPRD, by leaving a rating and review on your favorite podcast app and subscribing to the podcast on YouTube.
To enter: https://www.howiaipod.com/giveaway

*Where to find Claire Vo:*
• ChatPRD: https://www.chatprd.ai/
• Website: https://clairevo.com/
• LinkedIn: https://www.linkedin.com/in/clairevo/
• X: https://x.com/clairevo

*In this episode, we cover:*
(00:00) Introduction to GPT-5
(04:34) Testing GPT-5 in ChatPRD for document generation
(07:10) Comparing GPT-5 and GPT-4.1 on business vs. technical orientation
(11:22) Side-by-side comparison of PRDs generated by both models
(15:23) Where GPT-5 excels: Technical considerations and documentation quality
(17:35) Comparing prototypes generated from different model outputs
(19:57) Testing homepage critique capabilities between models
(23:14) OpenAI’s strengths in API design and developer support
(25:37) GPT-5’s performance as a coding assistant
(27:26) Examining GPT-5 in ChatGPT’s interface
(28:50) Testing GPT-5’s front-end design capabilities
(31:17) Personal use case: bathroom remodel planning
(33:45) Comparing GPT-5 vs. GPT-4 for interior design visualization
(38:10) Summary of key findings and recommendations

*Tools referenced:*
• OpenAI: https://openai.com/
• ChatGPT: https://chat.openai.com/
• Claude: https://claude.ai/
• Gemini: https://gemini.google.com/
• Cursor: https://cursor.sh/
• v0: https://v0.dev/
• Lovable: https://lovable.dev/
• Bolt: https://bolt.com/
• LaunchDarkly AI Configs: https://launchdarkly.com/docs/home/ai-configs

*Other reference:*
• Benjamin Moore paints: https://www.benjaminmoore.com/

_Production and marketing by https://penname.co/._
_For inquiries about sponsoring the podcast, email jordan@penname.co._

Claire Vo, host
Aug 7, 2025 · 40m

CHAPTERS

  1. 0:00 – 4:34

    GPT-5’s core identity: “for engineers, by engineers”

    Claire frames GPT-5 as a deeply technical model whose default posture is implementation-focused: code, refactors, and detailed execution. She previews the central tradeoff explored throughout the episode—GPT-5’s engineering strength versus potentially weaker business/stakeholder-friendly framing.

    • GPT-5 feels engineered for coding and technical problem-solving
    • More “what/how” execution than “who/why” product discovery
    • Expectation setting: great for engineers, mixed fit for business audiences
    • Episode roadmap: product docs, prototyping, coding, ChatGPT UI, personal tests
  2. 4:34 – 7:10

    Workflow setup: how Claire evaluates models in real products (ChatPRD)

    Before benchmarking, Claire explains her model ecosystem and how she chooses models per task rather than looking for a single “best” model. She uses ChatPRD as a controlled environment with established prompts, A/B testing habits, and satisfaction metrics to judge whether GPT-5 earns a place on her team of tools.

    • Uses multiple providers/models across tools (Cursor, ChatPRD, ChatGPT, etc.)
    • Evaluates models as “team members” matched to use cases
    • ChatPRD has extensive prompt/model testing and high user satisfaction baseline
    • Testing method: swap models via config (LaunchDarkly AI Configs) under same system prompt/context
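The testing method described above (same system prompt and context, only the model changes) can be sketched as a small comparison harness. This is a minimal illustration, not ChatPRD's actual setup: `call_model`, `fake_caller`, the model names `model-a` and `model-b`, and the `bullet_ratio` metric are all hypothetical stand-ins, and no real provider SDK or LaunchDarkly call is shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical signature: (model_name, system_prompt, user_message) -> reply text.
# In a real app this would wrap whichever provider SDK you use.
ModelCaller = Callable[[str, str, str], str]


@dataclass
class Comparison:
    user_message: str
    replies: Dict[str, str]  # model name -> reply text


def compare_models(
    call_model: ModelCaller,
    models: List[str],
    system_prompt: str,
    user_messages: List[str],
) -> List[Comparison]:
    """Send every message to every model under one shared system prompt."""
    results = []
    for message in user_messages:
        replies = {m: call_model(m, system_prompt, message) for m in models}
        results.append(Comparison(message, replies))
    return results


def bullet_ratio(text: str) -> float:
    """Fraction of non-empty lines that start with a bullet marker."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    bullets = sum(ln.startswith(("- ", "* ", "• ")) for ln in lines)
    return bullets / len(lines)


def fake_caller(model: str, system_prompt: str, user_message: str) -> str:
    # Offline stand-in so the sketch runs without network access.
    # The replies are invented and do not represent any real model.
    if model == "model-b":
        return "- Add a pricing toggle\n- Add a usage warning"
    return "Who is the target user?\nWhich metric matters most?"
```

Because only the model name varies between calls, any difference in the replies (for example, how bullet-heavy they are) can be attributed to the model rather than to the prompt.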
  3. 7:10 – 11:22

    GPT-5 in ChatPRD: early chat behavior and stylistic tells

    Running the same ChatPRD context side-by-side, Claire notices GPT-5’s strong developer voice and preference for bullets/markdown. Even when tuned toward more natural language, GPT-5 still reveals its technical orientation in how it asks questions and drives the conversation.

    • Same prompt/context yields similar structure, but different focus
    • GPT-5 defaults to markdown bullets and a developer-like tone
    • Prompt tweaks can soften style, but core “engineering” bias remains
    • Practical implication: outputs may need adaptation for PM/stakeholder readability
  4. 11:22 – 15:23

    Business lens vs. engineering lens: feature ideation divergence (GPT-4.1 vs GPT-5)

    Claire compares how GPT-4.1 and GPT-5 brainstorm features for conversion: GPT-4.1 probes metrics, personas, and goals, while GPT-5 moves quickly toward solutions and implementation details. The feature ideas overlap, but the framing differs—user/business-centric vs. spec/implementation-centric.

    • GPT-4.1: discovery questions (personas, metrics, goals) and business impact framing
    • GPT-5: faster jump to features, mechanics, and execution details
    • Difference summarized as “who/why” (4.1) vs “what/how” (5)
    • Potential risk: GPT-5 can skip product discovery steps PMs value
  5. 15:23 – 17:35

    PRD outputs side-by-side: verbosity, structure, and “code artifacts”

    When generating full PRDs, GPT-5 produces longer, denser documents and even includes code-like artifacts at the top—signals of technical training. Claire discusses the upside for engineering execution and the downside for stakeholder alignment when details become overwhelming.

    • GPT-5 PRDs are significantly more detailed (sometimes excessively)
    • Notable developer artifacts (e.g., code-block comment) despite PRD request
    • More personas/use cases, but more feature-centric framing
    • Tradeoff: precision for build vs. clarity for alignment and communication
  6. 17:35 – 19:57

    Where GPT-5 clearly wins: functional requirements and technical considerations

    Claire highlights GPT-5’s standout advantage in functional requirements and technical considerations—areas where specificity matters and engineers naturally ask follow-up questions. She suggests this may enable a natural division of labor: PM-friendly docs from one model and engineering specs from GPT-5.

    • Functional requirements are richer (prioritization, edge cases like warnings)
    • User experience descriptions are more specific—useful for downstream prototyping
    • Technical considerations section is markedly stronger and more “engineering-native”
    • Potential workflow split: business narrative vs. engineering spec generation
  7. 19:57 – 23:14

    Downstream test: prototypes generated from each PRD (v0 integration)

    Claire evaluates how each PRD performs when fed into a prototyping tool. She prefers GPT-4.1’s simpler, more colorful initial design, but finds GPT-5’s verbosity generates a prototype packed with components and upsell ideas—better for ideation and remixing.

    • GPT-4.1 prototype: cleaner, more colorful, easier to parse at a glance
    • GPT-5 prototype: more gray/blue, but far more components and options
    • Verbosity becomes an asset when prototypes are used for inspiration, not shipping
    • Takeaway: choose model based on whether you want clarity or abundance of ideas
  8. 23:14 – 25:37

    Homepage critique showdown: tone, criticality, and promptability

    Testing critique on ChatPRD’s homepage, Claire finds GPT-4.1 harsher and more blunt, while GPT-5 is more balanced and sandwich-style in feedback. This becomes a practical test of “instructability”—how well each model can be pushed to match a desired critique tone using prompts.

    • GPT-4.1 delivers sharper, more negative critique by default
    • GPT-5 starts more encouraging and measured
    • Even with “be more critical,” GPT-5 remains more diplomatically structured
    • Important for app builders: test whether prompts reliably tune tone/behavior
  9. 25:37 – 27:26

    OpenAI as a platform: API design, tooling primitives, and developer experience

    Claire gives unsponsored credit to OpenAI’s strength beyond the raw model: APIs, controls, tooling primitives, and developer support. She notes improvements around tool calling, reasoning, and configurability that make building LLM products easier compared to other providers.

    • OpenAI advantage often comes from platform DX, not just model quality
    • Improved primitives/controls help application developers ship faster
    • Tool calling/reasoning controls are highlighted as meaningful upgrades
    • Recommendation: developers should review updated docs and capabilities
  10. 27:26 – 28:50

    GPT-5 as a coding assistant in Cursor: speed, refactors, and tool-calling intensity

    In real development work, GPT-5 becomes Claire’s daily driver due to speed and code quality on a major feature build. The main drawback: it’s an aggressive tool caller (hitting limits) and communicates heavily in bullet points—raising questions about efficiency and token/tool overhead.

    • Fast performance; helpful for large codebases and refactoring
    • High code quality and “thoughtful” engineering partner behavior
    • Very heavy tool calling (search/read cycles; can hit tool-call limits)
    • Communication style: bullet-point-heavy, engineer-Slack vibe; possible cost/perf implications
  11. 28:50 – 31:17

    GPT-5 inside ChatGPT: Canvas prototyping and front-end design taste

    Switching to ChatGPT’s interface, Claire tests GPT-5 Thinking with Canvas to prototype a blog matching ChatPRD’s style. She finds the output more polished and “classy” than typical generic AI UI, but flags contrast/readability issues that need improvement.

    • Uses GPT-5 Thinking + Canvas for UI prototyping with a reference screenshot
    • Design sense is higher polish than many out-of-the-box AI prototypes
    • Issues: background/text contrast and CSS readability need work
    • Implication: ChatGPT may become more viable for lightweight prototyping workflows
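One common way to quantify the background/text contrast issue flagged above is the WCAG 2.x contrast ratio. The episode does not say that Claire ran such a check, so the sketch below is only an assumption about how a generated prototype could be audited. The function names are illustrative, and colors are assumed to be six-digit hex strings.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light, per the WCAG 2.x formula."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as '#rrggbb' or 'rrggbb'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio, from 1.0 (identical colors) up to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """WCAG AA asks for at least 4.5:1 for normal text and 3:1 for large text."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)
```

Running `passes_aa` over the text and background color pairs in a generated stylesheet gives a quick signal on which combinations need adjusting before the prototype is shared.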
  12. 31:17 – 33:45

    Personal benchmark: bathroom remodel planning and spatial reasoning in images

    Claire stress-tests GPT-5 with a consumer workflow: bathroom remodel layouts and visualizations. She reports improved spatial awareness and better adherence to layout instructions, with strong image results after a few iterations—suggesting real consumer value beyond coding.

    • Uses remodel planning as a practical “everyday” benchmark
    • GPT-5 better interprets spatial instructions (left/right/back wall, placement)
    • Image generation improves after a couple of do-overs; outputs feel more accurate
    • Signals broader strength: spatial reasoning across layout and visualization tasks
  13. 33:45 – 38:10

    Tile-and-paint side-by-side: GPT-5 vs GPT-4o for color matching and mockups

    Uploading tile samples, Claire asks for matching Benjamin Moore paints and receives unexpectedly specific, well-labeled options including names and paint codes. Compared with GPT-4o’s less coherent mockup, GPT-5 produces clearer, instruction-following renderings and consistent references—reinforcing her view that spatial awareness is improved.

    • GPT-5 returns specific paint names + codes and crisp text rendering
    • Offers to generate full mockups and follows detailed material placement instructions
    • Outputs include more coherent 3D-like renderings aligned to the prompt
    • GPT-4o comparison: the layout makes less sense and instruction following is weaker for this task
  14. 38:10 – 40:11

    Final recommendations: when to choose GPT-5 vs older models

    Claire concludes GPT-5 is exceptional for engineering: technical writing, specs, and production coding, with notable gains in Canvas/front-end and image spatial reasoning. For PM/stakeholder artifacts, older models may remain preferable due to business framing, concision, and tone—making GPT-5 best used as a specialized teammate rather than a universal replacement.

    • Best fit: engineers, technical docs, functional requirements, coding assistants
    • Caveats: bullet-point bias and heavy tool calling; may need optimization by tools
    • PM/stakeholder work may prefer GPT-4.1/4o/o3 for business orientation and brevity
    • Consumer upside: improved Canvas prototyping and image generation/spatial awareness
