Claude Opus 4.6 vs GPT-5.3 Codex: Which is the better software engineer?

I put the newest AI coding models from OpenAI and Anthropic head-to-head, testing them on real engineering work I’m actually doing. I compare GPT-5.3 Codex with Opus 4.6 (and Opus 4.6 Fast) by asking them to redesign my marketing website and refactor some genuinely gnarly components. Through side-by-side experiments, I break down where each model shines (creative development versus code review) and share how I’m thinking about combining them to build a more effective AI engineering stack.

*What you’ll learn:*

  1. The strengths and weaknesses of OpenAI’s Codex vs. Anthropic’s Opus for different coding tasks
  2. How I shipped 44 PRs containing 98 commits across 1,088 files in just five days using these models
  3. Why Codex excels at code review but struggles with creative, greenfield work
  4. The surprising way Opus and Codex complement each other in a real-world engineering workflow
  5. How to use Git concepts like work trees to maximize productivity with AI coding assistants
  6. Why Opus 4.6 Fast might be worth the 6x price increase (but be careful with your token budget)

*Brought to you by:*

  • WorkOS (Make your app enterprise-ready today): https://workos.com?utm_source=lennys_howiai&utm_medium=podcast&utm_campaign=q22025

*Detailed workflow walkthroughs from this episode:*

  • How I AI: GPT-5.3 Codex vs. Claude Opus 4.6 - Shipping 44 PRs in 5 Days: https://www.chatprd.ai/how-i-ai/gpt-5-3-codex-vs-claude-opus-4-6
  • How to Combine Claude Opus and GPT-5.3 Codex for High-Velocity Code Refactoring: https://www.chatprd.ai/how-i-ai/workflows/how-to-combine-claude-opus-and-gpt-5-3-codex-for-high-velocity-code-refactoring
  • How to Redesign a Marketing Website Using Claude Opus 4.6 for Creative Development: https://www.chatprd.ai/how-i-ai/workflows/how-to-redesign-a-marketing-website-using-claude-opus-4-6-for-creative-development

*In this episode, we cover:*

  (00:00) Introduction to new AI coding models
  (02:13) My test methodology for comparing models
  (03:30) Codex’s unique features: Git primitives, skills, and automations
  (09:05) Testing GPT-5.2 Codex on a website redesign task
  (10:40) Challenges with Codex’s literal interpretation of prompts
  (15:00) Comparing the before and after with Codex
  (16:23) Testing Opus 4.6 on the same website redesign task
  (20:56) Comparing the visual results of both models
  (21:30) Real-world engineering impact: 44 PRs in five days
  (23:03) Refactoring components with Opus 4.6
  (24:30) Using Codex for code review and architectural analysis
  (26:55) Cost considerations for Opus 4.6 Fast
  (28:52) Conclusion

*Tools referenced:*

  • OpenAI’s GPT-5.3 Codex: https://openai.com/index/introducing-gpt-5-3-codex/
  • Anthropic’s Claude Opus 4.6: https://www.anthropic.com/news/claude-opus-4-6
  • Cursor: https://cursor.sh/
  • GitHub: https://github.com/

*Other references:*

  • Tailwind CSS: https://tailwindcss.com/
  • Git: https://git-scm.com/
  • Bugbot: https://cursor.com/bugbot

*Where to find Claire Vo:*

  • ChatPRD: https://www.chatprd.ai/
  • Website: https://clairevo.com/
  • LinkedIn: https://www.linkedin.com/in/clairevo/
  • X: https://x.com/clairevo

_Production and marketing by https://penname.co/._
_For inquiries about sponsoring the podcast, email jordan@penname.co._
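The notes mention using Git work trees alongside AI coding assistants but do not spell out the exact commands. As a rough sketch only, here is one way to plan a separate worktree per model so each can edit the same repository on its own branch. The repository path, agent labels, and branch names are hypothetical; the only real interface used is the standard `git worktree add -b <new-branch> <path> <commit-ish>` form, and the function builds the commands without running them.

```python
from pathlib import Path


def plan_worktree_commands(
    repo_dir: str, tasks: dict[str, str], base: str = "main"
) -> list[list[str]]:
    """Build one `git worktree add` command per task, without executing anything.

    `tasks` maps a hypothetical agent label (e.g. "opus") to a hypothetical
    branch name. Each worktree is planned as a sibling directory of the repo.
    """
    repo = Path(repo_dir)
    commands = []
    for agent, branch in tasks.items():
        worktree_path = repo.parent / f"{repo.name}-{agent}"
        # git -C <repo> worktree add -b <new-branch> <path> <commit-ish>
        commands.append(
            ["git", "-C", str(repo), "worktree", "add",
             "-b", branch, str(worktree_path), base]
        )
    return commands


if __name__ == "__main__":
    # Hypothetical setup: two models working on the same redesign in parallel.
    planned = plan_worktree_commands(
        "/work/site", {"opus": "redesign-opus", "codex": "redesign-codex"}
    )
    for cmd in planned:
        print(" ".join(cmd))
```

Printing the commands first, instead of running them, makes it easy to check the branch and directory names before anything touches the repository.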

Claire Vo, host
Feb 11, 2026 · 30m

Episode Details

EPISODE INFO

Released: February 11, 2026
Duration: 30m
Channel: How I AI

SPEAKERS

  • Claire Vo, host

EPISODE SUMMARY

In this episode of How I AI, "Claude Opus 4.6 vs GPT-5.3 Codex: Which is the better software engineer?", host Claire Vo explores one takeaway: Opus 4.6 builds fast; GPT-5.3 Codex reviews like a principal engineer. She tests OpenAI’s Codex desktop app (GPT-5.x Codex) against Anthropic’s Claude Opus 4.6 and Opus 4.6 Fast using an ambitious, repeatable benchmark: redesigning an existing marketing site and refactoring production app components.
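The episode notes describe Opus 4.6 Fast as a 6x price increase and warn about token budgets. A minimal arithmetic sketch of that tradeoff follows, assuming "6x" means the per-token price is multiplied by six. The base price used in the example is a made-up placeholder, not a published rate.

```python
def fast_mode_cost(
    tokens: int, base_price_per_million: float, multiplier: float = 6.0
) -> dict[str, float]:
    """Compare the cost of a token volume at a base price and at a multiplied price.

    `multiplier` defaults to 6.0, reading the "6x price increase" from the
    notes as "six times the base price" (an assumption).
    """
    base = tokens / 1_000_000 * base_price_per_million
    fast = base * multiplier
    return {"base": base, "fast": fast, "extra": fast - base}


def tokens_within_budget(
    budget: float, base_price_per_million: float, multiplier: float = 6.0
) -> int:
    """Return how many tokens a fixed budget covers at the multiplied price."""
    price_per_million = base_price_per_million * multiplier
    return int(budget / price_per_million * 1_000_000)


if __name__ == "__main__":
    # Placeholder price of 10.0 per million tokens, chosen only for illustration.
    print(fast_mode_cost(2_000_000, 10.0))
    print(tokens_within_budget(120.0, 10.0))
```

Under this reading, the same budget buys one sixth as many tokens in fast mode, which is why the token budget deserves attention before switching it on for long sessions.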
