An exclusive inside look at GPT-5

In this episode, I share my hands-on experience with OpenAI’s GPT-5, the company’s new frontier model. As one of the first users outside of OpenAI to test the model, I put GPT-5 head-to-head with GPT-4.1 across real-world product use cases, from writing PRDs to generating code to assisting with visual design work. This is my unfiltered look at what GPT-5 can (and can’t) do, and how it changes the game for builders.

*What you’ll learn:*
1. How GPT-5 differs from previous models with its engineering-focused approach to problem-solving and tendency to prioritize technical details over business context
2. A comparative analysis of how GPT-5 and GPT-4.1 generate different types of product requirement documents and prototypes for the same prompt
3. Why GPT-5 excels at technical writing, functional requirements, and code generation while potentially skipping important business discovery questions
4. The model’s impressive spatial awareness capabilities when generating images for interior design and other visual tasks
5. Practical considerations for choosing the right model based on your specific use case and audience
6. How GPT-5’s extensive tool-calling behavior and bullet-point communication style reflect its engineering-oriented design

*Brought to you by ChatPRD, an AI copilot for PMs and their teams:* https://www.chatprd.ai/howiai

*25k giveaway:* To celebrate 25,000 YouTube followers, we’re doing a giveaway. Win a free year of my favorite AI products, including v0, Replit, Lovable, Bolt, Cursor, and, of course, ChatPRD, by leaving a rating and review on your favorite podcast app and subscribing to the podcast on YouTube. To enter: https://www.howiaipod.com/giveaway

*Where to find Claire Vo:*
ChatPRD: https://www.chatprd.ai/
Website: https://clairevo.com/
LinkedIn: https://www.linkedin.com/in/clairevo/
X: https://x.com/clairevo

*In this episode, we cover:*
(00:00) Introduction to GPT-5
(04:34) Testing GPT-5 in ChatPRD for document generation
(07:10) Comparing GPT-5 and GPT-4.1 on business vs. technical orientation
(11:22) Side-by-side comparison of PRDs generated by both models
(15:23) Where GPT-5 excels: Technical considerations and documentation quality
(17:35) Comparing prototypes generated from different model outputs
(19:57) Testing homepage critique capabilities between models
(23:14) OpenAI’s strengths in API design and developer support
(25:37) GPT-5’s performance as a coding assistant
(27:26) Examining GPT-5 in ChatGPT’s interface
(28:50) Testing GPT-5’s front-end design capabilities
(31:17) Personal use case: bathroom remodel planning
(33:45) Comparing GPT-5 vs. GPT-4 for interior design visualization
(38:10) Summary of key findings and recommendations

*Tools referenced:*
• OpenAI: https://openai.com/
• ChatGPT: https://chat.openai.com/
• Claude: https://claude.ai/
• Gemini: https://gemini.google.com/
• Cursor: https://cursor.sh/
• v0: https://v0.dev/
• Lovable: https://lovable.dev/
• Bolt: https://bolt.com/
• LaunchDarkly AI Configs: https://launchdarkly.com/docs/home/ai-configs

*Other reference:*
• Benjamin Moore paints: https://www.benjaminmoore.com/

_Production and marketing by https://penname.co/._
_For inquiries about sponsoring the podcast, email jordan@penname.co._

Claire Vo, host

Aug 7, 2025 · 40m

CHAPTERS

0:00 – 4:34
GPT-5’s core identity: “for engineers, by engineers”
Claire frames GPT-5 as a deeply technical model whose default posture is implementation-focused: code, refactors, and detailed execution. She previews the central tradeoff explored throughout the episode—GPT-5’s engineering strength versus potentially weaker business/stakeholder-friendly framing.
4:34 – 7:10
Workflow setup: how Claire evaluates models in real products (ChatPRD)
Before benchmarking, Claire explains her model ecosystem and how she chooses models per task rather than looking for a single “best” model. She uses ChatPRD as a controlled environment with established prompts, A/B testing habits, and satisfaction metrics to judge whether GPT-5 deserves a place in her set of tools.
7:10 – 11:22
GPT-5 in ChatPRD: early chat behavior and stylistic tells
Running the same ChatPRD context side-by-side, Claire notices GPT-5’s strong developer voice and preference for bullets/markdown. Even when tuned toward more natural language, GPT-5 still reveals its technical orientation in how it asks questions and drives the conversation.
11:22 – 15:23
Business lens vs. engineering lens: feature ideation divergence (GPT-4.1 vs GPT-5)
Claire compares how GPT-4.1 and GPT-5 brainstorm features for conversion: GPT-4.1 probes metrics, personas, and goals, while GPT-5 moves quickly toward solutions and implementation details. The feature ideas overlap, but the framing differs—user/business-centric vs. spec/implementation-centric.
15:23 – 17:35
PRD outputs side-by-side: verbosity, structure, and “code artifacts”
When generating full PRDs, GPT-5 produces longer, denser documents and even includes code-like artifacts at the top—signals of technical training. Claire discusses the upside for engineering execution and the downside for stakeholder alignment when details become overwhelming.
17:35 – 19:57
Where GPT-5 clearly wins: functional requirements and technical considerations
Claire highlights GPT-5’s standout advantage in functional requirements and technical considerations—areas where specificity matters and engineers naturally ask follow-up questions. She suggests this may enable a natural division of labor: PM-friendly docs from one model and engineering specs from GPT-5.
19:57 – 23:14
Downstream test: prototypes generated from each PRD (v0 integration)
Claire evaluates how each PRD performs when fed into a prototyping tool. She prefers GPT-4.1’s simpler, more colorful initial design, but finds GPT-5’s verbosity generates a prototype packed with components and upsell ideas—better for ideation and remixing.
23:14 – 25:37
Homepage critique showdown: tone, criticality, and promptability
Testing critique on ChatPRD’s homepage, Claire finds GPT-4.1 harsher and more blunt, while GPT-5 is more balanced, sandwiching criticism between positive feedback. This becomes a practical test of “instructability”: how well each model can be pushed to match a desired critique tone using prompts.
25:37 – 27:26
OpenAI as a platform: API design, tooling primitives, and developer experience
Claire gives unsponsored credit to OpenAI’s strength beyond the raw model: APIs, controls, tooling primitives, and developer support. She notes improvements around tool calling, reasoning, and configurability that make building LLM products easier compared to other providers.
27:26 – 28:50
GPT-5 as a coding assistant in Cursor: speed, refactors, and tool-calling intensity
In real development work, GPT-5 becomes Claire’s daily driver due to speed and code quality on a major feature build. The main drawback: it’s an aggressive tool caller (hitting limits) and communicates heavily in bullet points—raising questions about efficiency and token/tool overhead.
28:50 – 31:17
GPT-5 inside ChatGPT: Canvas prototyping and front-end design taste
Switching to ChatGPT’s interface, Claire tests GPT-5 Thinking with Canvas to prototype a blog matching ChatPRD’s style. She finds the output more polished and “classy” than typical generic AI UI, but flags contrast/readability issues that need improvement.
31:17 – 33:45
Personal benchmark: bathroom remodel planning and spatial reasoning in images
Claire stress-tests GPT-5 with a consumer workflow: bathroom remodel layouts and visualizations. She reports improved spatial awareness and better adherence to layout instructions, with strong image results after a few iterations—suggesting real consumer value beyond coding.
33:45 – 38:10
Tile-and-paint side-by-side: GPT-5 vs GPT-4o for color matching and mockups
Uploading tile samples, Claire asks for matching Benjamin Moore paints and receives unexpectedly specific, well-labeled options including names and paint codes. Compared with GPT-4o’s less coherent mockup, GPT-5 produces clearer, instruction-following renderings and consistent references—reinforcing her view that spatial awareness is improved.
38:10 – 40:11
Final recommendations: when to choose GPT-5 vs older models
Claire concludes GPT-5 is exceptional for engineering: technical writing, specs, and production coding, with notable gains in Canvas/front-end and image spatial reasoning. For PM/stakeholder artifacts, older models may remain preferable due to business framing, concision, and tone—making GPT-5 best used as a specialized teammate rather than a universal replacement.

