Gemini 3 vs. Claude Opus 4.5 vs. GPT-5.1 Codex: Which AI model is the best designer?

I put three cutting-edge AI models to the test in a head-to-head design competition. Using the exact same prompt, I challenged Google’s Gemini 3, Anthropic’s Opus 4.5, and OpenAI’s Codex 5.1 to redesign my blog page, evaluating them on visual design quality, user experience improvements, and SEO optimization capabilities. One model produced a beautiful, polished, production-ready redesign. One was fine. And one completely whiffed. If you’re trying to figure out where each model fits in your workflow (design, planning, back-end, or something else), this episode will save you a lot of trial and error.

*What you’ll learn:*
1. How each AI model approaches the same design challenge differently
2. Why planning capabilities dramatically impact design quality
3. The specific visual and functional improvements each model made
4. Which model excels at front-end design versus back-end functionality
5. How to strategically choose the right AI model for different parts of your workflow
6. The importance of model-switching based on specific use cases

*Blog design:* https://www.chatprd.ai/blog

*Brought to you by:*
Lovable (build apps by simply chatting with AI): https://lovable.dev/

*Where to find Claire Vo:*
ChatPRD: https://www.chatprd.ai/
Website: https://clairevo.com/
LinkedIn: https://www.linkedin.com/in/clairevo/
X: https://x.com/clairevo

*In this episode, we cover:*
(00:00) Introduction to the AI design challenge
(01:25) The question: Which model is the better designer?
(03:08) The prompt used for all three models
(04:10) Gemini 3 Pro’s approach and results
(06:00) Opus 4.5’s approach and results
(10:54) Codex 5.1’s approach and disappointing results
(14:51) Comparing the three designs side by side
(16:03) Analyzing the change logs and SEO improvements from each model
(22:43) Final verdict
(23:00) Conclusion and next steps

*Tools referenced:*
• Gemini 3 Pro: https://deepmind.google/models/gemini/pro/
• Anthropic Opus 4.5: https://www.anthropic.com/news/claude-opus-4-5
• OpenAI Codex 5.1: https://platform.openai.com/docs/models/gpt-5.1-codex
• Cursor: https://cursor.com/

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email jordan@penname.co.

Claire Vo, host

Dec 3, 2025 · 25 min

WHAT IT’S REALLY ABOUT

Three coding models redesign a blog page, Opus wins decisively

Host Claire Vo runs a “one-shot” redesign challenge on an underwhelming ChatPRD blog page, using the same codebase and prompt in Cursor across three new coding models: Gemini 3 Pro, Opus 4.5, and GPT-5.1 Codex.
Gemini 3 produces a clean, modern refresh with a featured-post hero, card grid, and strong technical SEO additions (e.g., schema, semantic HTML, breadcrumbs, related articles).
Opus 4.5 delivers the most polished, brand-aligned visual design, adds thoughtful UX micro-interactions and empty states, and approaches the task with a clear, tool-driven step plan.
GPT-5.1 Codex underperforms on front-end design and UX coherence (despite decent copywriting and some schema work), leading to Claire’s final verdict: Opus 4.5 is the best “designer,” while Codex is better suited for back-end roles.

IDEAS WORTH REMEMBERING

5 ideas

Opus 4.5 is the strongest front-end “design engineer” in this test.

Its output is described as the most beautiful and most functionally coherent overall, with better spacing, higher-quality UI details, and a more deliberate implementation process driven by a step-by-step plan.

Planning quality correlates with implementation quality.

Opus 4.5 creates a granular to-do list (layout, post display, SEO specifics), while Codex’s plan is broader and Gemini’s is more direct; the more structured plan produced the most consistently polished result.

Gemini 3 Pro is serviceable visually and surprisingly strong on article-level SEO.

While Claire prefers Opus visually, she later credits Gemini for going beyond the listing page to improve individual blog posts with elements like related articles, semantic HTML, and JSON-LD/breadcrumb work.
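To make the JSON-LD/breadcrumb idea concrete, here is a minimal sketch of how breadcrumb structured data for a blog post could be generated. It follows the public schema.org `BreadcrumbList` shape; the helper names, the `Crumb` type, and the `example.com` URLs are hypothetical and are not taken from the episode or from any model's actual output.

```typescript
// Hypothetical types and helpers for illustration; not from the ChatPRD codebase.
interface Crumb {
  name: string;
  url: string;
}

interface ListItem {
  "@type": "ListItem";
  position: number;
  name: string;
  item: string;
}

interface BreadcrumbList {
  "@context": "https://schema.org";
  "@type": "BreadcrumbList";
  itemListElement: ListItem[];
}

// Build a schema.org BreadcrumbList from an ordered trail (root first).
function buildBreadcrumbJsonLd(trail: Crumb[]): BreadcrumbList {
  return {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    itemListElement: trail.map((crumb, index): ListItem => ({
      "@type": "ListItem",
      position: index + 1, // schema.org positions start at 1
      name: crumb.name,
      item: crumb.url,
    })),
  };
}

// Serialize the object for embedding in the page's HTML.
function toJsonLdScript(data: object): string {
  // Escape "<" so a value containing "</script>" cannot close the tag early.
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}

// Placeholder trail: Home > Blog > a single post.
const exampleTrail: Crumb[] = [
  { name: "Home", url: "https://example.com/" },
  { name: "Blog", url: "https://example.com/blog" },
  { name: "Example post", url: "https://example.com/blog/example-post" },
];
const breadcrumbScript: string = toJsonLdScript(buildBreadcrumbJsonLd(exampleTrail));
```

The resulting string would go in the post page's HTML, so that search engines can read the trail even when the visible breadcrumb is styled differently.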

Micro-interactions and empty states are differentiators, not just layouts.

All three gravitate toward familiar blog patterns (featured post + grid), but Opus stands out with hover CTA details (arrow slide-in), reading-time badges, and graceful placeholders for missing images.
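As a rough illustration of the reading-time badge mentioned above, the sketch below estimates minutes from a word count. The 200 words-per-minute figure is an assumed convention, and the function names are hypothetical; the episode does not say how Opus computed its badges.

```typescript
// Assumed average reading speed; a common convention, not a value from the episode.
const WORDS_PER_MINUTE = 200;

// Estimate reading time in whole minutes, never less than 1.
function estimateReadingMinutes(
  text: string,
  wordsPerMinute: number = WORDS_PER_MINUTE
): number {
  // Split on runs of whitespace and drop empty fragments (e.g. from empty input).
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0).length;
  return Math.max(1, Math.ceil(words / wordsPerMinute));
}

// Format the label a post card could display.
function readingTimeLabel(text: string): string {
  return `${estimateReadingMinutes(text)} min read`;
}
```

Flooring the estimate at one minute keeps very short posts from showing a "0 min read" badge.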

GPT-5.1 Codex struggles with front-end UX coherence in this scenario.

Claire calls out “AI slop” purple gradients, unclear navigation/category behavior, missing/incorrect library display, and a featured module that lacks CTA/linking—suggesting weak product-design instincts despite coding strength elsewhere.

WORDS WORTH SAVING

5 quotes

Which of these new models is actually the best designer?

— Claire Vo

I think it's easy to one-shot something and make it look beautiful… But if you have an existing site and you wanna make it better, who's your trusted design engineer?

— Claire Vo

Opus 4.5… triggered a to-do list… and gave a step-by-step flow it was gonna follow.

— Claire Vo

It gave me AI slop purple gradient. Like, we do not need any more purple to blue gradients in AI designs.

— Claire Vo

Codex 5.1 is just not your front-end girl.

— Claire Vo

One-shot redesign experiment in Cursor
Single shared prompt and same input code
Design/UX patterns: hero featured post + card grid
Planning behavior: to-do lists vs direct execution
Micro-interactions and component polish
SEO upgrades: structured data, metadata, breadcrumbs
Model-switching by workflow role

High quality AI-generated summary created from speaker-labeled transcript.
