How I AI

Gemini 3 vs. Claude Opus 4.5 vs. GPT-5.1 Codex: Which AI model is the best designer?

Claire Vo on three coding models redesigning a blog page: Opus wins decisively.

Claire Vo, host
Dec 3, 2025 · 25m
One-shot redesign experiment in Cursor; single shared prompt and same input code; design/UX patterns: hero featured post + card grid; planning behavior: to-do lists vs direct execution; micro-interactions and component polish; SEO upgrades: structured data, metadata, breadcrumbs; model-switching by workflow role
AI-generated summary based on the episode transcript.

In this episode of How I AI, host Claire Vo runs a “one-shot” redesign challenge on an underwhelming ChatPRD blog page, using the same codebase and prompt in Cursor across three new coding models: Gemini 3 Pro, Opus 4.5, and GPT-5.1 Codex. Opus wins decisively.

At a glance

WHAT IT’S REALLY ABOUT

Three coding models redesign a blog page; Opus wins decisively

  1. Host Claire Vo runs a “one-shot” redesign challenge on an underwhelming ChatPRD blog page, using the same codebase and prompt in Cursor across three new coding models: Gemini 3 Pro, Opus 4.5, and GPT-5.1 Codex.
  2. Gemini 3 produces a clean, modern refresh with a featured-post hero, card grid, and strong technical SEO additions (e.g., schema, semantic HTML, breadcrumbs, related articles).
  3. Opus 4.5 delivers the most polished, brand-aligned visual design, adds thoughtful UX micro-interactions and empty states, and approaches the task with a clear, tool-driven step plan.
  4. GPT-5.1 Codex underperforms on front-end design and UX coherence (despite decent copywriting and some schema work), leading to Claire’s final verdict: Opus 4.5 is the best “designer,” while Codex is better suited for back-end roles.

IDEAS WORTH REMEMBERING

5 ideas

Opus 4.5 is the strongest front-end “design engineer” in this test.

Its output is described as the most beautiful and most functionally coherent overall, with better spacing, higher-quality UI details, and a more deliberate implementation process driven by a step-by-step plan.

Planning quality correlates with implementation quality.

Opus 4.5 creates a granular to-do list (layout, post display, SEO specifics), while Codex’s plan is broader and Gemini’s is more direct; the more structured plan produced the most consistently polished result.

Gemini 3 Pro is serviceable visually and surprisingly strong on article-level SEO.

While Claire prefers Opus visually, she later credits Gemini for going beyond the listing page to improve individual blog posts with elements like related articles, semantic HTML, and JSON-LD/breadcrumb work.
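To make the JSON-LD/breadcrumb work concrete, here is a minimal sketch of what breadcrumb structured data can look like. It is not the code Gemini generated, which the summary does not show. The `BreadcrumbList` and `ListItem` types come from schema.org; the helper names `breadcrumb_jsonld` and `jsonld_script_tag`, and the `example.com` URLs, are hypothetical.

```python
import json


def breadcrumb_jsonld(trail):
    """Build a schema.org BreadcrumbList from ordered (name, url) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            # Positions are 1-based and follow the order of the trail.
            {"@type": "ListItem", "position": i, "name": name, "item": url}
            for i, (name, url) in enumerate(trail, start=1)
        ],
    }


def jsonld_script_tag(data):
    """Serialize structured data into a script tag for the page head."""
    # Escape "</" so no value can close the script element early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical trail; example.com stands in for the real site.
trail = [
    ("Home", "https://example.com/"),
    ("Blog", "https://example.com/blog"),
    ("Sample post", "https://example.com/blog/sample-post"),
]
print(jsonld_script_tag(breadcrumb_jsonld(trail)))
```

Whatever a model generates along these lines is worth running through a structured-data validator before shipping, since a malformed block is easy to miss in a visual review.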

Micro-interactions and empty states are differentiators, not just layouts.

All three gravitate toward familiar blog patterns (featured post + grid), but Opus stands out with hover CTA details (arrow slide-in), reading-time badges, and graceful placeholders for missing images.
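A reading-time badge is a small detail that is cheap to compute. The sketch below shows one common approach, assuming an average reading speed of 200 words per minute. The episode summary does not say how Opus derived its value, so both the constant and the function names are assumptions.

```python
import math
import re

WORDS_PER_MINUTE = 200  # assumed average reading speed, not from the episode


def reading_time_minutes(text, wpm=WORDS_PER_MINUTE):
    """Estimate reading time in whole minutes, never less than one."""
    words = len(re.findall(r"\S+", text))
    return max(1, math.ceil(words / wpm))


def reading_time_badge(text):
    """Format the estimate as the label shown on a post card."""
    return f"{reading_time_minutes(text)} min read"


# 450 words at 200 wpm is 2.25 minutes, rounded up to 3.
print(reading_time_badge("word " * 450))
```

Rounding up and clamping to one minute avoids a "0 min read" badge on very short posts.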

GPT-5.1 Codex struggles with front-end UX coherence in this scenario.

Claire calls out “AI slop” purple gradients, unclear navigation/category behavior, missing/incorrect library display, and a featured module that lacks CTA/linking—suggesting weak product-design instincts despite coding strength elsewhere.

WORDS WORTH SAVING

5 quotes

Which of these new models is actually the best designer?

Claire Vo

I think it's easy to one-shot something and make it look beautiful… But if you have an existing site and you wanna make it better, who's your trusted design engineer?

Claire Vo

Opus 4.5… triggered a to-do list… and gave a step-by-step flow it was gonna follow.

Claire Vo

It gave me AI slop purple gradient. Like, we do not need any more purple to blue gradients in AI designs.

Claire Vo

Codex 5.1 is just not your front-end girl.

Claire Vo

QUESTIONS ANSWERED IN THIS EPISODE

5 questions

What specific Cursor features/tool-calls enabled Opus 4.5’s to-do planning flow, and can you force Gemini/Codex to use the same approach?

You used one prompt to mimic a “colleague request.” What single extra constraint (brand colors, component library, no gradients, etc.) most improves outcomes without overprompting?

Gemini seemed to add stronger article-page SEO (related articles, semantic HTML, schema). What exact code diffs did it introduce, and are they safe/accurate for real SEO?

Opus didn’t explicitly mention JSON-LD in its summary—did it actually add it, and how would you validate schema correctness across models?

Codex produced the best hero copy but the weakest UX. How would you split responsibilities (copy vs layout vs SEO) across multiple models in one workflow?

Chapter Breakdown

AI models as “design engineers”: the one-shot redesign showdown setup

Claire Vo sets up a mini experiment to test whether today’s top coding models are also great designers. She’ll take a poorly designed blog page from her site and have three models redesign it in one shot, using the same prompt and same code input.

Sponsor message: Lovable for building apps and websites by chatting with AI

A brief sponsor segment introduces Lovable as a tool that turns chat prompts into deployable apps and websites. It’s positioned as faster than traditional no-code for building functional products.

Why this test matters: pretty one-shots vs improving real existing sites

Claire explains the motivation: social feeds show impressive AI-generated designs, but the real question is who can improve an already-live page with real constraints. She frames the models as potential teammates to trust for iterative product work.

The baseline: the ChatPRD blog page and the test environment

Claire introduces the specific target page—the ChatPRD blog—calling out that it’s visually bland and functionally improvable. She runs the comparison inside Cursor, model-by-model, to keep the workflow consistent.

The exact shared prompt: redesign + UX + SEO/navigation best practices

She reveals the simple prompt used for all three models. The intent is to mimic how she’d talk to a colleague—high level, outcome-focused, without micromanaging design details.

Gemini 3 Pro: fast, serviceable redesign with a few layout quirks

Gemini 3 Pro produces a more modern blog layout with a featured post hero and card grid. Claire notes solid improvements like tags, dates, and hover effects, but also calls out spacing issues and missing functional upgrades like pagination handling.

Claude Opus 4.5: tool-assisted planning leads to the most polished design

Opus 4.5 stands out by creating a step-by-step to-do list in Cursor and executing it systematically. The result is a more refined visual design with thoughtful micro-interactions, better asset usage from the repo, and stronger overall UX polish.

GPT-5.1 Codex: weak front-end instincts and confusing UX choices

GPT-5.1 Codex also plans via to-dos, but at a more generic level. Claire finds the visual output disappointing: default “AI purple” styling, weaker logo/background pairing, and UX problems like unclear linking and odd category browsing behavior.

Side-by-side comparison: what each model’s design pattern reveals

Claire compares the three outputs at a high level, noticing recurring conventions (featured post + card grid). She concludes Opus 4.5 delivers the best overall craft, Gemini is acceptable with some shortcomings, and Codex isn’t suited for front-end design work.

Change logs as evaluation: comparing design, UX, and SEO edits per model

Claire asks each model to summarize its changes, using the summaries as a structured way to compare what actually shipped. This reveals that some models go beyond the listing page into article templates and technical SEO, while others keep changes shallow.

Deep dive: Gemini vs Opus on article pages, related content, and CTAs

Claire checks whether the models changed individual blog post pages, not just the blog listing page. Gemini adds related articles and solid technical SEO; Opus improves the subscribe CTA component and metadata presentation but doesn't clearly add related links.

Final verdict and practical takeaway: Opus 4.5 wins; model-switching is the skill

Claire declares Opus 4.5 the best “designer” based on design quality, usability, and overall implementation detail. She emphasizes that teams should treat models as specialists and switch models depending on whether the task is design, planning, SEO, or backend coding.
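One way to picture model-switching by role is a simple lookup table. This is an illustrative sketch, not a workflow shown in the episode. The design and back-end assignments follow Claire's verdict, while the planning and SEO assignments and the fallback are assumptions drawn from this summary's observations. The strings are display names, not API identifiers, and `pick_model` is a hypothetical helper.

```python
# Role-to-model table. Names are display labels from this summary,
# not API identifiers; planning and SEO rows are assumptions.
MODEL_BY_ROLE = {
    "design": "Claude Opus 4.5",    # best "designer" per the verdict
    "planning": "Claude Opus 4.5",  # produced the most granular to-do list
    "seo": "Gemini 3 Pro",          # credited for article-level SEO work
    "backend": "GPT-5.1 Codex",     # described as better suited to back-end roles
}

# Assumed fallback for roles the table does not cover.
DEFAULT_MODEL = "Claude Opus 4.5"


def pick_model(role):
    """Return the model assigned to a workflow role, case-insensitively."""
    return MODEL_BY_ROLE.get(role.strip().lower(), DEFAULT_MODEL)


for role in ("design", "SEO", "backend"):
    print(f"{role}: {pick_model(role)}")
```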

Wrap-up: shipping the new design and channel call-to-action

Claire reflects on how remarkable it is to generate three viable redesign options quickly and plans to ship the preferred design. She closes with where to find the show and how viewers can support it.
