How I AI
Gemini 3 vs. Claude Opus 4.5 vs. GPT-5.1 Codex: Which AI model is the best designer?
CHAPTERS
AI models as “design engineers”: the one-shot redesign showdown setup
Claire Vo sets up a mini experiment to test whether today’s top coding models are also great designers. She’ll take a poorly designed blog page from her site and have three models redesign it in one shot, using the same prompt and same code input.
- •Goal: identify which model is the best at redesigning an existing page (not generating a new one)
- •Models tested: Gemini 3 Pro, Claude Opus 4.5, GPT-5.1 Codex
- •Constraint: same prompt, same input code, one-shot redesign
- •Evaluation focus: visual appeal + UX + SEO/navigation improvements
Sponsor message: Lovable for building apps and websites by chatting with AI
A brief sponsor segment introduces Lovable as a tool that turns chat prompts into deployable apps and websites. It’s positioned as faster than traditional no-code for building functional products.
- •Build apps/websites by chatting; customize, automate, deploy
- •Target users: marketers, PMs, founders
- •Positioning: full apps with functionality (not just static pages)
- •Claim: compress weeks/months of work into a weekend
Why this test matters: pretty one-shots vs. improving real existing sites
Claire explains the motivation: social feeds show impressive AI-generated designs, but the real question is who can improve an already-live page with real constraints. She frames the models as potential teammates to trust for iterative product work.
- •Benchmark wins don’t always translate to practical design work
- •Redesigning an existing page is harder than generating a fresh landing page
- •Testing is framed as: “Who’s the trusted design engineer?”
- •Use case chosen: a live blog page that needs both aesthetic and functional upgrades
The baseline: the ChatPRD blog page and the test environment
Claire introduces the specific target page—the ChatPRD blog—calling out that it’s visually bland and functionally improvable. She runs the comparison inside Cursor, model-by-model, to keep the workflow consistent.
- •Target: ChatPRD blog listing page (current design is “not very good”)
- •Realistic scenario: handing a rough early launch to a designer for a refresh
- •Tooling: Cursor IDE for running each model against the codebase
- •Method: repeat the same workflow across models for fair comparison
The exact shared prompt: redesign + UX + SEO/navigation best practices
She reveals the simple prompt used for all three models. The intent is to mimic how she’d talk to a colleague—high level, outcome-focused, without micromanaging design details.
- •Prompt: redesign blog page to improve visual appeal and UX
- •Add: best practices for SEO and navigation
- •Same prompt + same code input across all models
- •Testing philosophy: evaluate how models respond to natural workplace requests
Gemini 3 Pro: fast, serviceable redesign with a few layout quirks
Gemini 3 Pro produces a more modern blog layout with a featured post hero and card grid. Claire notes solid improvements like tags, dates, and hover effects, but also calls out spacing issues and missing functional upgrades like pagination handling.
- •Output includes featured hero (most recent post) + card layout below
- •Adds tags, release dates, and image hover/zoom effects
- •Doesn’t meaningfully address pagination/edge cases
- •Minor UX issue: top tag area feels cramped against navigation
- •Verdict on this round: good, fast, but not her favorite visually
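For readers wondering what "pagination handling" on a blog listing might involve, here is a minimal TypeScript sketch. It is not code from the episode or from the ChatPRD codebase: `Page`, `paginate`, and the clamping behavior are all hypothetical choices, shown only to illustrate the edge cases (empty lists, out-of-range page numbers) a redesign could cover.

```typescript
// Hypothetical helper: none of these names come from the ChatPRD codebase.
interface Page<T> {
  items: T[];
  page: number;       // 1-based page actually served (after clamping)
  totalPages: number; // always at least 1, even for an empty list
  hasPrev: boolean;
  hasNext: boolean;
}

function paginate<T>(all: T[], requestedPage: number, pageSize: number): Page<T> {
  // Reject zero, negative, fractional, or NaN page sizes.
  if (pageSize < 1 || Math.floor(pageSize) !== pageSize) {
    throw new RangeError("pageSize must be a positive integer");
  }
  const totalPages = Math.max(1, Math.ceil(all.length / pageSize));
  // Clamp bad page requests instead of rendering an empty page.
  const requested = isFinite(requestedPage) ? Math.floor(requestedPage) : 1;
  const page = Math.min(Math.max(requested, 1), totalPages);
  const start = (page - 1) * pageSize;
  return {
    items: all.slice(start, start + pageSize),
    page,
    totalPages,
    hasPrev: page > 1,
    hasNext: page < totalPages,
  };
}
```

Clamping is one possible design choice; a real site might prefer to return a 404 for out-of-range pages so that search engines do not index duplicates.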
Claude Opus 4.5: tool-assisted planning leads to the most polished design
Opus 4.5 stands out by creating a step-by-step to-do list in Cursor and executing it systematically. The result is a more refined visual design with thoughtful micro-interactions, better asset usage from the repo, and stronger overall UX polish.
- •Triggers a to-do list/tool call with a clear implementation plan
- •To-dos include layout redesign + post display enhancements + SEO (canonical/meta/structured data)
- •Uses existing site assets (background imagery, design elements) instead of generic gradients
- •Adds micro-interactions: hover arrow/CTA, improved card treatments
- •Includes reading time, author/date metadata; handles missing images with a tasteful placeholder state
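As an illustration of the card details mentioned above (reading time and a fallback for missing images), the following TypeScript sketch shows one way such values could be derived. It is an assumption-laden example, not what Opus 4.5 generated: the 200 words-per-minute rate, the `PostSummary` shape, and the placeholder path are all hypothetical.

```typescript
// Assumption: 200 wpm is an illustrative default, not a value from the episode.
function estimateReadingMinutes(text: string, wordsPerMinute = 200): number {
  const trimmed = text.trim();
  const words = trimmed === "" ? 0 : trimmed.split(/\s+/).length;
  // Never show "0 min read", even for very short or empty posts.
  return Math.max(1, Math.ceil(words / wordsPerMinute));
}

// Hypothetical post shape and placeholder path, for illustration only.
interface PostSummary {
  title: string;
  author: string;
  body: string;
  imageUrl?: string;
}

const PLACEHOLDER_IMAGE = "/images/post-placeholder.svg";

function cardData(post: PostSummary) {
  return {
    title: post.title,
    byline: post.author + " · " + estimateReadingMinutes(post.body) + " min read",
    // Fall back to a placeholder when the post has no image.
    image: post.imageUrl ? post.imageUrl : PLACEHOLDER_IMAGE,
  };
}
```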
GPT-5.1 Codex: weak front-end instincts and confusing UX choices
GPT-5.1 Codex also plans via to-dos, but at a more generic level. Claire finds the visual output disappointing: default “AI purple” styling, weaker logo/background pairing, and UX problems like unclear linking and odd category browsing behavior.
- •Planning is broad: investigate layout → redesign → apply SEO (less design-specific)
- •Visual design issues: generic purple/blue gradient; poor logo treatment on colored background
- •Copy is a bright spot (stronger headline/context)
- •Featured post section lacks clear CTA/link behavior and feels redundant
- •Browse/category behavior appears broken or incomplete; the overall page is neither attractive nor functional
Side-by-side comparison: what each model’s design pattern reveals
Claire compares the three outputs at a high level, noticing recurring conventions (featured post + card grid). She concludes Opus 4.5 delivers the best overall craft, Gemini is acceptable with some shortcomings, and Codex isn’t suited for front-end design work.
- •Common pattern across models: featured post hero + multi-column card grid
- •Opus 4.5 has the strongest detail polish and usability touches
- •Gemini 3 is competent but less refined and less context-aware
- •GPT-5.1 Codex struggles with aesthetics and interaction design
- •Takeaway: model choice should match role in the workflow (design vs backend vs planning)
Change logs as evaluation: comparing design, UX, and SEO edits per model
Claire asks each model to summarize its changes, using the summaries as a structured way to compare what each model actually changed. This reveals that some models go beyond the listing page into article templates and technical SEO, while others keep changes shallow.
- •Workflow tip: ask agents to summarize changes to speed comparison and review
- •Gemini summary includes: hero, cards, typography, breadcrumbs, related articles, JSON-LD, semantic HTML
- •Opus summary is extensive: grid/cards, reading time, category pills, breadcrumbs, empty states, spacing and patterns, metadata/OG
- •Codex summary is shortest and feels “lazy,” though it does mention schema.org/JSON-LD
- •Summaries help verify whether changes touched listing pages, article pages, or both
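Several of the summaries mention schema.org/JSON-LD. For context, here is a minimal TypeScript sketch of what generating `BlogPosting` structured data for an article page can look like. The `PostMeta` shape and the function name are hypothetical, and the output of each model in the episode may look quite different; only the schema.org property names (`headline`, `datePublished`, `author`, and so on) are standard.

```typescript
// Hypothetical input shape; field names are illustrative, not from the episode.
interface PostMeta {
  title: string;
  description: string;
  url: string;
  authorName: string;
  datePublished: string; // ISO 8601, e.g. "2025-01-01"
  imageUrl?: string;
}

// Returns a JSON string intended for a <script type="application/ld+json"> tag.
function blogPostingJsonLd(post: PostMeta): string {
  const data: Record<string, unknown> = {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    headline: post.title,
    description: post.description,
    url: post.url,
    datePublished: post.datePublished,
    author: { "@type": "Person", name: post.authorName },
  };
  if (post.imageUrl) {
    data["image"] = post.imageUrl;
  }
  // Escape "<" so a title containing "</script>" cannot close the tag early.
  return JSON.stringify(data).replace(/</g, "\\u003c");
}
```

The escaped string is still valid JSON, so parsers read the original title back unchanged.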
Deep dive: Gemini vs Opus on article pages, related content, and CTAs
Claire checks whether the models changed individual blog post pages, not just the blog listing page. Gemini adds related articles and solid technical SEO; Opus improves the subscribe CTA component and metadata presentation but doesn’t clearly add related links.
- •Gemini: updates article pages; adds related articles + metadata; strong JSON-LD/breadcrumbs/semantic HTML
- •Opus: improves article metadata (author/date/reading time) and upgrades the newsletter subscribe CTA widget
- •Opus CTA looks great but includes an unwanted purple gradient and outdated subscriber count text
- •Observation: models can excel at small component redesigns (widgets) as much as full pages
- •Implication: combine models—one for design polish, another for SEO depth if needed
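The episode does not show how Gemini selects related articles. As a rough illustration of one common approach, this TypeScript sketch ranks other posts by the number of tags they share with the current one. `TaggedPost`, `relatedPosts`, and the default limit of 3 are assumptions made for the example.

```typescript
// Hypothetical post shape; not taken from the ChatPRD codebase.
interface TaggedPost {
  slug: string;
  tags: string[];
}

// Rank other posts by how many tags they share with the current post.
function relatedPosts(current: TaggedPost, all: TaggedPost[], limit = 3): TaggedPost[] {
  return all
    .filter((p) => p.slug !== current.slug) // never recommend the post itself
    .map((p) => ({
      post: p,
      shared: p.tags.filter((t) => current.tags.indexOf(t) !== -1).length,
    }))
    .filter((entry) => entry.shared > 0) // drop posts with no tag overlap
    .sort((a, b) => b.shared - a.shared) // most overlap first
    .slice(0, limit)
    .map((entry) => entry.post);
}
```

Tag overlap is only one heuristic; recency or shared category could be used as tie-breakers.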
Final verdict and practical takeaway: Opus 4.5 wins; model-switching is the skill
Claire declares Opus 4.5 the best “designer” based on design quality, usability, and overall implementation detail. She emphasizes that teams should treat models as specialists and switch models depending on whether the task is design, planning, SEO, or backend coding.
- •Winner: Claude Opus 4.5 for front-end design and overall UX/SEO practicality
- •Reason: better planning/tool use + stronger detail orientation in implementation
- •Gemini 3: solid, could improve with more structured planning
- •GPT-5.1 Codex: valuable elsewhere (backend/functional work), not as a front-end designer
- •Broader lesson: build AI fluency by mapping models to roles and testing repeatable use cases
Wrap-up: shipping the new design and channel call-to-action
Claire reflects on how remarkable it is to generate three viable redesign options quickly and plans to ship the preferred design. She closes with where to find the show and how viewers can support it.
- •Time-to-output: three redesign options in about 20 minutes or less
- •She plans to ship the Opus-driven design and share details in show notes
- •Episode positioning: “best designer model” among recent releases
- •Viewer CTA: like/subscribe/comment; podcast available on major platforms
- •Where to learn more: howiaipod.com