Gemini 3 vs. Claude Opus 4.5 vs. GPT-5.1 Codex: Which AI model is the best designer?

I put three cutting-edge AI models to the test in a head-to-head design competition. Using the exact same prompt, I challenged Google’s Gemini 3, Anthropic’s Opus 4.5, and OpenAI’s Codex 5.1 to redesign my blog page, evaluating them on visual design quality, user experience improvements, and SEO optimization capabilities. One model produced a beautiful, polished, production-ready redesign. One was fine. And one completely whiffed. If you’re trying to figure out where each model fits in your workflow (design, planning, back-end, or something else), this episode will save you a lot of trial and error.

*What you’ll learn:*
1. How each AI model approaches the same design challenge differently
2. Why planning capabilities dramatically impact design quality
3. The specific visual and functional improvements each model made
4. Which model excels at front-end design versus back-end functionality
5. How to strategically choose the right AI model for different parts of your workflow
6. The importance of model-switching based on specific use cases

*Blog design:* https://www.chatprd.ai/blog

*Brought to you by:*
Lovable (build apps by simply chatting with AI): https://lovable.dev/

*Where to find Claire Vo:*
ChatPRD: https://www.chatprd.ai/
Website: https://clairevo.com/
LinkedIn: https://www.linkedin.com/in/clairevo/
X: https://x.com/clairevo

*In this episode, we cover:*
(00:00) Introduction to the AI design challenge
(01:25) The question: Which model is the better designer?
(03:08) The prompt used for all three models
(04:10) Gemini 3 Pro’s approach and results
(06:00) Opus 4.5’s approach and results
(10:54) Codex 5.1’s approach and disappointing results
(14:51) Comparing the three designs side by side
(16:03) Analyzing the change logs and SEO improvements from each model
(22:43) Final verdict
(23:00) Conclusion and next steps

*Tools referenced:*
• Gemini 3 Pro: https://deepmind.google/models/gemini/pro/
• Anthropic Opus 4.5: https://www.anthropic.com/news/claude-opus-4-5
• OpenAI Codex 5.1: https://platform.openai.com/docs/models/gpt-5.1-codex
• Cursor: https://cursor.com/

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email jordan@penname.co.

Claire Vo, host

Dec 3, 2025 · 25m

CHAPTERS

AI models as “design engineers”: the one-shot redesign showdown setup
Claire Vo sets up a mini experiment to test whether today’s top coding models are also great designers. She’ll take a poorly designed blog page from her site and have three models redesign it in one shot, using the same prompt and same code input.
Sponsor message: Lovable for building apps and websites by chatting with AI
A brief sponsor segment introduces Lovable as a tool that turns chat prompts into deployable apps and websites. It’s positioned as faster than traditional no-code for building functional products.
Why this test matters: pretty one-shots vs improving real existing sites
Claire explains the motivation: social feeds show impressive AI-generated designs, but the real question is who can improve an already-live page with real constraints. She frames the models as potential teammates to trust for iterative product work.
The baseline: the ChatPRD blog page and the test environment
Claire introduces the specific target page—the ChatPRD blog—calling out that it’s visually bland and functionally improvable. She runs the comparison inside Cursor, model-by-model, to keep the workflow consistent.
The exact shared prompt: redesign + UX + SEO/navigation best practices
She reveals the simple prompt used for all three models. The intent is to mimic how she’d talk to a colleague—high level, outcome-focused, without micromanaging design details.
Gemini 3 Pro: fast, serviceable redesign with a few layout quirks
Gemini 3 Pro produces a more modern blog layout with a featured post hero and card grid. Claire notes solid improvements like tags, dates, and hover effects, but also calls out spacing issues and missing functional upgrades like pagination handling.
Claude Opus 4.5: tool-assisted planning leads to the most polished design
Opus 4.5 stands out by creating a step-by-step to-do list in Cursor and executing it systematically. The result is a more refined visual design with thoughtful micro-interactions, better asset usage from the repo, and stronger overall UX polish.
GPT-5.1 Codex: weak front-end instincts and confusing UX choices
Codex 5.1 also plans via to-dos, but at a more generic level. Claire finds the visual output disappointing—default “AI purple” styling, weaker logo/background pairing, and UX problems like unclear linking and odd category browsing behavior.
Side-by-side comparison: what each model’s design pattern reveals
Claire compares the three outputs at a high level, noticing recurring conventions (featured post + card grid). She concludes Opus 4.5 delivers the best overall craft, Gemini is acceptable with some shortcomings, and Codex isn’t suited for front-end design work.
Change logs as evaluation: comparing design, UX, and SEO edits per model
Claire asks each model to summarize its changes, using the summaries as a structured way to compare what actually shipped. This reveals that some models go beyond the listing page into article templates and technical SEO, while others keep changes shallow.
Deep dive: Gemini vs Opus on article pages, related content, and CTAs
Claire checks whether the models changed individual blog post pages, not just the blog listing page. Gemini adds related articles and solid technical SEO; Opus improves the subscribe CTA component and metadata presentation but doesn’t clearly add related links.
Final verdict and practical takeaway: Opus 4.5 wins; model-switching is the skill
Claire declares Opus 4.5 the best “designer” based on design quality, usability, and overall implementation detail. She emphasizes that teams should treat models as specialists and switch models depending on whether the task is design, planning, SEO, or backend coding.
Wrap-up: shipping the new design and channel call-to-action
Claire reflects on how remarkable it is to generate three viable redesign options quickly and plans to ship the preferred design. She closes with where to find the show and how viewers can support it.
