Claude Opus 4.6 vs GPT-5.3 Codex: Which is the better software engineer?

I put the newest AI coding models from OpenAI and Anthropic head-to-head, testing them on real engineering work I’m actually doing. I compare GPT-5.3 Codex with Opus 4.6 (and Opus 4.6 Fast) by asking them to redesign my marketing website and refactor some genuinely gnarly components. Through side-by-side experiments, I break down where each model shines (creative development versus code review) and share how I’m thinking about combining them to build a more effective AI engineering stack.

*What you’ll learn:*
1. The strengths and weaknesses of OpenAI’s Codex vs. Anthropic’s Opus for different coding tasks
2. How I shipped 44 PRs containing 98 commits across 1,088 files in just five days using these models
3. Why Codex excels at code review but struggles with creative, greenfield work
4. The surprising way Opus and Codex complement each other in a real-world engineering workflow
5. How to use Git concepts like worktrees to maximize productivity with AI coding assistants
6. Why Opus 4.6 Fast might be worth the 6x price increase (but be careful with your token budget)

*Brought to you by:*
WorkOS: Make your app enterprise-ready today: https://workos.com?utm_source=lennys_howiai&utm_medium=podcast&utm_campaign=q22025

*Detailed workflow walkthroughs from this episode:*
• How I AI: GPT-5.3 Codex vs. Claude Opus 4.6: Shipping 44 PRs in 5 Days: https://www.chatprd.ai/how-i-ai/gpt-5-3-codex-vs-claude-opus-4-6
• How to Combine Claude Opus and GPT-5.3 Codex for High-Velocity Code Refactoring: https://www.chatprd.ai/how-i-ai/workflows/how-to-combine-claude-opus-and-gpt-5-3-codex-for-high-velocity-code-refactoring
• How to Redesign a Marketing Website Using Claude Opus 4.6 for Creative Development: https://www.chatprd.ai/how-i-ai/workflows/how-to-redesign-a-marketing-website-using-claude-opus-4-6-for-creative-development

*In this episode, we cover:*
(00:00) Introduction to new AI coding models
(02:13) My test methodology for comparing models
(03:30) Codex’s unique features: Git primitives, skills, and automations
(09:05) Testing GPT-5.2 Codex on a website redesign task
(10:40) Challenges with Codex’s literal interpretation of prompts
(15:00) Comparing the before and after with Codex
(16:23) Testing Opus 4.6 on the same website redesign task
(20:56) Comparing the visual results of both models
(21:30) Real-world engineering impact: 44 PRs in five days
(23:03) Refactoring components with Opus 4.6
(24:30) Using Codex for code review and architectural analysis
(26:55) Cost considerations for Opus 4.6 Fast
(28:52) Conclusion

*Tools referenced:*
• OpenAI’s GPT-5.3 Codex: https://openai.com/index/introducing-gpt-5-3-codex/
• Anthropic’s Claude Opus 4.6: https://www.anthropic.com/news/claude-opus-4-6
• Cursor: https://cursor.sh/
• GitHub: https://github.com/

*Other references:*
• Tailwind CSS: https://tailwindcss.com/
• Git: https://git-scm.com/
• Bugbot: https://cursor.com/bugbot

*Where to find Claire Vo:*
ChatPRD: https://www.chatprd.ai/
Website: https://clairevo.com/
LinkedIn: https://www.linkedin.com/in/clairevo/
X: https://x.com/clairevo

_Production and marketing by https://penname.co/._
_For inquiries about sponsoring the podcast, email jordan@penname.co._

Claire Vo, host

Feb 10, 2026 · 30m

WHAT IT’S REALLY ABOUT

Opus 4.6 builds fast; GPT-5.3 Codex reviews like a principal engineer

Claire Vo tests OpenAI’s Codex desktop app (GPT-5.x Codex) against Anthropic’s Claude Opus 4.6/4.6 Fast using an ambitious, repeatable benchmark: redesigning an existing marketing site and refactoring production app components.
She finds GPT-5.x Codex can be overly literal and hard to steer for creative, greenfield redesign work, often overfitting to the last instruction and failing to expand scope beyond a couple pages without heavy guidance.
Opus 4.6 performs better at planning and executing broad, long-running builds (especially in Cursor’s Plan mode), producing a cohesive site redesign after an initial “Tailwind slop” iteration that improved with direction.
Her winning stack is multi-model: use Opus to generate and implement features/design (80–90% done), then use GPT-5.3 Codex as a rigorous reviewer for architecture, performance, edge cases, and hardening before shipping—helping her merge ~93k LOC across 44 PRs in 5 days.

IDEAS WORTH REMEMBERING

5 ideas

Codex’s UI is optimized for “real Git work,” not just chat-based coding.

Codex foregrounds repositories/projects, branches, worktrees for parallel agent work, diffs, and PR creation—useful for power users and for teaching Git concepts visually.
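To make the worktree idea concrete, here is a minimal Python sketch that builds one `git worktree add` command per agent task, so each agent gets its own branch and its own checkout directory. The helper name `worktree_commands`, the `agent/<task>` branch naming, and the sibling-directory layout are assumptions for illustration; the episode summary does not describe how the Codex app sets worktrees up internally.

```python
from pathlib import PurePosixPath


def worktree_commands(repo: str, tasks: list[str]) -> list[list[str]]:
    """Build one `git worktree add` command per task (hypothetical helper).

    Each task gets its own branch and its own sibling directory, so
    several agents can edit the same repository without sharing a checkout.
    """
    repo_path = PurePosixPath(repo)
    commands = []
    for task in tasks:
        branch = f"agent/{task}"  # assumed branch naming scheme
        checkout = repo_path.parent / f"{repo_path.name}-{task}"
        commands.append(
            ["git", "-C", str(repo_path), "worktree", "add", "-b", branch, str(checkout)]
        )
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them; pass each list to
    # subprocess.run(cmd, check=True) to actually create the worktrees.
    for cmd in worktree_commands("/work/site", ["redesign", "refactor"]):
        print(" ".join(cmd))
```

Because every worktree is a separate directory on a separate branch, two agents can run in parallel and each produce its own diff and PR.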

GPT-5.x Codex is reliable but too literal for creative redesigns.

In marketing copy and structure it repeatedly mirrored the user’s phrasing (e.g., explicit “PLG vs enterprise” segmentation, “dense workflow”), and over-rotated the whole page toward the last prompt rather than balancing goals.

Harness matters: Cursor’s planning/to-do scaffolding improved long-task execution.

Claire suspects part of Codex’s weaker redesign experience came from the Codex app’s less mature conversational/task workflow, whereas Cursor’s Plan mode and tooling helped Opus stay organized and autonomous.

Opus 4.6 excels at greenfield and broad, cohesive implementation.

After initial styling missteps, Opus rebuilt the site with a more sophisticated visual system, aligned to brand colors, reused repo assets, and consistently propagated styles across pricing and other pages.

Best results come from pairing models with complementary strengths.

Claire’s repeatable loop is: Opus builds the feature to 80–90%; Codex reviews and finds high-impact issues/edge cases; Opus implements fixes—creating a fast, high-quality iteration cycle.
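The build, review, and fix loop above can be sketched in Python as follows. This is an illustration under stated assumptions, not Claire's actual tooling: `build_review_loop`, `fake_builder`, and `fake_reviewer` are hypothetical names, the two callables stand in for calls to a builder model (Opus in her workflow) and a reviewer model (Codex), and the `max_rounds` cap is an added safeguard that the episode summary does not mention.

```python
from typing import Callable

# Hypothetical signatures: a builder takes (task, previous draft, review
# findings) and returns a new draft; a reviewer takes a draft and returns
# a list of issues (empty when it has nothing left to flag).
Builder = Callable[[str, str, list[str]], str]
Reviewer = Callable[[str], list[str]]


def build_review_loop(task: str, build: Builder, review: Reviewer, max_rounds: int = 3) -> str:
    """Alternate between a builder and a reviewer until the review is clean."""
    draft = ""
    findings: list[str] = []
    for _ in range(max_rounds):
        draft = build(task, draft, findings)  # builder writes or fixes the code
        findings = review(draft)              # reviewer looks for issues
        if not findings:
            break                             # nothing left to fix
    return draft


def fake_builder(task: str, draft: str, findings: list[str]) -> str:
    """Stand-in for the builder model: first pass, then apply fixes."""
    if not draft:
        return f"{task}: first pass"
    return draft + "; fixed " + ", ".join(findings)


def fake_reviewer(draft: str) -> list[str]:
    """Stand-in for the reviewer model: flags one issue until it is fixed."""
    return [] if "fixed" in draft else ["missing empty-state handling"]


if __name__ == "__main__":
    print(build_review_loop("pricing page", fake_builder, fake_reviewer))
```

Keeping the builder and reviewer as plain callables means either side can be swapped for a different model without touching the loop itself.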

WORDS WORTH SAVING

5 quotes

I've shipped more code in the last five days than I think I have in the last month.

— Claire Vo

One of the things that I've noticed about the GPT-5.x Codex models is they are so literal.

— Claire Vo

After two prompts, [it] literally made the headline 'A Dense Product Workflow for AI-Powered Teams.'

— Claire Vo

Opus 4.6 was just a lot better at planning for itself so that it could execute a long-running task.

— Claire Vo

It really replicates the principal software engineer experience… you will fight them tooth and nail to build anything for you, but they are more than happy to tear apart someone else's code.

— Claire Vo

Codex desktop app Git-centric workflow (projects, branches, worktrees, diffs, PRs)
Skills and scheduled automations in Codex
Marketing site redesign as evaluation benchmark
GPT-5.x Codex literalness and prompt overfitting
Opus 4.6 planning/execution in Cursor Plan mode
Two-model workflow: build with Opus, review/harden with Codex
Opus 4.6 Fast cost/performance tradeoffs

High quality AI-generated summary created from speaker-labeled transcript.
