How I AI

Why I love GPT-5.5 for hard problems

In this mini episode, I break down OpenAI’s new GPT-5.5 and GPT-5.5 Pro after weeks of early testing. I walk through three real jobs I threw at the model: building an app to teach my second grader more advanced subtraction concepts, tackling a tech-debt problem in the ChatPRD codebase, and hacking into a proprietary Bluetooth pixel display that every other model had failed to crack. My verdict: higher intelligence, better efficiency, and genuinely autonomous long-running loops that change what I think is worth tackling.

*What you’ll learn:*

1. How I think about GPT-5.5 Pro’s pricing vs. engineering time, and when I believe the “intelligence tax” is worth paying
2. Why I treat GPT-5.5 as a developer model first, and why I couldn’t find a consumer use case that justified its intelligence
3. The exact prompt pattern I use to unlock a long-running autonomous subagent loop
4. How I got a near-six-hour autonomous run to one-shot 98% of edge cases in a migration over millions of chat threads and drop my Sentry error rate to the floor
5. Why I’m now throwing GPT-5.5 at tech debt, flaky tests, and security backlogs first
6. How I combined a Bluetooth packet sniffer and GPT-5.5 to reverse-engineer a proprietary pixel speaker after Claude Code and GPT-5.4 both gave up
7. How I use the /personality command inside Codex to swap the default “baked potato” tone for something I actually enjoy working with

*In this episode, I cover:*

(00:00) Introduction to GPT-5.5 testing
(00:40) What is GPT-5.5 and how much does it cost?
(03:23) Testing GPT-5.5 in ChatGPT: the intelligence overhang problem
(07:12) Moving to Codex: where GPT-5.5 really shines
(16:01) Hacking a Chinese Bluetooth speaker
(21:47) Final thoughts on GPT-5.5’s intelligence and efficiency

*Tools referenced:*

• GPT-5.5 and GPT-5.5 Pro: https://openai.com/index/introducing-gpt-5-5/
• Codex: https://openai.com/codex/
• ChatGPT: https://chat.openai.com/
• Claude Code: https://claude.ai/code
• Sentry: https://sentry.io/
• Divoom MiniToo: https://divoom.com/products/minitoo

*Other references:*

• OpenAI Codex Security: https://openai.com/index/codex-security-now-in-research-preview/

*Where to find Claire Vo:*

• ChatPRD: https://www.chatprd.ai/
• Website: https://clairevo.com/
• LinkedIn: https://www.linkedin.com/in/clairevo/
• X: https://x.com/clairevo

_Production and marketing by https://penname.co/._
_For inquiries about sponsoring the podcast, email jordan@penname.co._

Claire Vo · Host
Apr 23, 2026 · 23m · Watch on YouTube ↗

CHAPTERS

  1. Why GPT‑5.5 feels like a “powerhouse” (and what this episode will prove)

    Claire Vo sets the premise: after a couple weeks of hands-on testing, GPT‑5.5—especially for advanced coding—enabled breakthroughs she couldn’t get from other models. She frames the episode around real tasks and a personal “high-tech eval” that stress-tests reasoning and autonomy.

  2. What GPT‑5.5 and GPT‑5.5 Pro are, where to access them, and the pricing reality

    She explains the release context (ChatGPT and Codex, not API yet) and OpenAI’s headline promise: higher capacity for complex work plus greater efficiency. She emphasizes the cost, framing it as an “intelligence tax” that can still pay off via ROI.

  3. The “intelligence overhang” problem: when ChatGPT users don’t need this much brainpower

    Claire argues many consumer or everyday enterprise tasks in ChatGPT may not demand GPT‑5.5’s level of reasoning. She predicts developers will benefit most until product form factors better package this intelligence for non-technical users.

  4. ChatGPT mini-test: building a subtraction-learning app (and the cost of deep thinking)

    She tries GPT‑5.5 in ChatGPT on a practical family task: generating an educational app for advanced subtraction. The model produces a reasonable result but takes ~17 minutes of thinking—raising questions about whether this form factor and latency make sense for simpler builds.

  5. Switching to Codex + GPT‑5.5 Pro: the ‘I am cooking’ productivity jump

    Claire says Codex is where GPT‑5.5 Pro’s strengths become obvious: faster feedback loops, parallel task execution, and less babysitting. She positions it as a step-change for software engineers dealing with real backlogs and complex systems.

  6. Use case #1: bulk remediation from a security scan (CSV → grouped fixes → code changes)

    Using Codex’s security tooling, she runs a threat assessment, exports issues, and asks GPT‑5.5 Pro to cluster them and implement fixes. Human review confirms quality, and a subsequent penetration test comes back clean—supporting the claim that it can execute across a themed backlog, not just single tasks.

  7. Use case #2 (part A): the ‘millions of chats’ legacy-format migration that kept breaking

    She describes a gnarly data debt problem: millions of chat records stored in evolving, provider-specific legacy formats with attachments/tools edge cases. GPT‑5.5 Pro produces a near-one-shot migration approach that covers ~98% of known edge cases—far beyond prior patch-by-patch progress.

  8. Use case #2 (part B): building an autonomous CLI smoke-test harness that ran ~6 hours

    Beyond the migration, she needs verification at scale. She asks Codex to build a programmatic testing system (ideally CLI) that replays production-like threads across providers, finds issues, and repairs them—then lets it run unattended for nearly six hours, requiring almost no intervention.
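The harness itself isn’t shown in the episode, so as a rough sketch of the shape of such a replay-verify-repair loop (every name here is hypothetical, and the repair step is a stand-in for handing failing threads back to the agent):

```python
# Hypothetical sketch of an autonomous smoke-test loop: replay migrated
# chat threads, collect failures, and run a repair pass until clean.
# None of these names are real Codex APIs.

def replay(thread):
    """Pretend to replay one migrated thread; return a list of errors.
    A real harness would render the thread through each provider's pipeline."""
    errors = []
    if "messages" not in thread:
        errors.append("missing messages")
    return errors

def run_loop(threads, max_passes=3):
    """Return the number of repair passes needed before all threads replay cleanly."""
    for attempt in range(max_passes):
        failures = [t for t in threads if replay(t)]
        if not failures:
            return attempt  # everything replayed without errors
        # A real loop would invoke the agent here to fix the failing batch,
        # e.g. by shelling out to a coding agent with the error report.
        for t in failures:
            t.setdefault("messages", [])  # stand-in for the repair step
    return max_passes

threads = [{"id": 1, "messages": []}, {"id": 2}]
print(run_loop(threads))  # → 1: one repair pass fixed the broken thread
```

The point of the pattern is that verification and repair are both scripted, so the loop can run unattended for hours; the human only reviews the final diff.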

  9. Quality and engineering impact: AI coding can raise reliability, not lower it

    Claire challenges the narrative that AI coding reduces quality due to “vibe coding.” She argues that with strong autonomy and systematic testing, models can drive error rates down, citing improved monitoring (Sentry) after the migration/testing work.

  10. The ‘real eval’: hacking a proprietary Chinese Bluetooth speaker/screen (months of struggle)

    Her personal benchmark is a Divoom MiniToo retro PC-style Bluetooth speaker with a tiny display. After months of failed attempts (including Claude Code/Opus and GPT‑5.4), she decides to use deep Bluetooth packet inspection and hand the evidence to GPT‑5.5 to reverse-engineer the messaging protocol.

  11. Breakthrough: GPT‑5.5 reverse-engineers encoding and builds a CLI to display text/images

    Feeding packet logs and context into GPT‑5.5, she asks it to ‘do anything’ to figure out the protocol. It succeeds: the model handles bitmap encoding/decoding, searches for hints/SDKs, and produces a command-line tool that can push content to the device.
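The episode doesn’t detail the capture setup beyond “deep Bluetooth packet inspection,” but a common route for this kind of work is Android’s `btsnoop_hci.log`. As a minimal sketch of the pre-processing step, here is a parser that pulls ATT Write Command payloads (the packets that typically carry display data over BLE) out of a btsnoop capture, verified against a tiny synthetic log built in the same block:

```python
import struct

BTSNOOP_MAGIC = b"btsnoop\x00"

def att_write_payloads(data: bytes):
    """Yield (att_handle, value) for each ATT Write Command in a btsnoop
    capture with H4-framed HCI packets (Android's btsnoop_hci.log format)."""
    assert data[:8] == BTSNOOP_MAGIC, "not a btsnoop file"
    off = 16  # 8-byte magic + 4-byte version + 4-byte datalink type
    while off + 24 <= len(data):
        # Record header: orig len, included len, flags, drops, 8-byte timestamp
        _orig, incl, _flags, _drops, _ts = struct.unpack_from(">IIIIq", data, off)
        off += 24
        pkt = data[off:off + incl]
        off += incl
        if not pkt or pkt[0] != 0x02:      # 0x02 = HCI ACL data (H4 framing)
            continue
        l2len, cid = struct.unpack_from("<HH", pkt, 5)  # L2CAP length + channel
        if cid != 0x0004:                  # 0x0004 = ATT channel
            continue
        att = pkt[9:9 + l2len]
        if att and att[0] == 0x52:         # 0x52 = ATT Write Command opcode
            handle = struct.unpack_from("<H", att, 1)[0]
            yield handle, att[3:]

# Build a one-record synthetic capture to check the parser round-trips.
att = bytes([0x52]) + struct.pack("<H", 0x0015) + b"\x01\x02\x03"
l2cap = struct.pack("<HH", len(att), 0x0004) + att
acl = b"\x02" + struct.pack("<HH", 0x0040, len(l2cap)) + l2cap
record = struct.pack(">IIIIq", len(acl), len(acl), 0, 0, 0) + acl
capture = BTSNOOP_MAGIC + struct.pack(">II", 1, 1002) + record

print(list(att_write_payloads(capture)))  # → [(21, b'\x01\x02\x03')]
```

Dumping the extracted payloads as hex, alongside notes on what was on screen when each was captured, is exactly the kind of evidence a model can correlate to recover an encoding.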

  12. Bonus workflow: Codex notifications routed to the Bluetooth display

    She extends the hack into a practical developer workflow by wiring a Codex “notify hook” to the display. Now the device can beep and show a message when Codex finishes a task, demonstrating a tangible automation loop enabled by the reverse-engineered protocol.
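Codex CLI’s notify hook runs an external program with a JSON event as its last argument (configured via `notify` in `~/.codex/config.toml`); the exact payload fields may vary by version, and the device-side call below is a hypothetical stand-in since the reverse-engineered Divoom protocol isn’t published. A sketch of what such a hook script could look like:

```python
#!/usr/bin/env python3
# Hypothetical Codex notify hook: forward "turn complete" events to a
# Bluetooth pixel display. Configure in ~/.codex/config.toml, e.g.:
#   notify = ["python3", "/path/to/notify_display.py"]
# send_to_display() is a stand-in for the BLE write that would speak the
# reverse-engineered device protocol; field names follow the Codex CLI
# notify payload but should be checked against your installed version.
import json
import sys

def send_to_display(text: str) -> str:
    """Stand-in for pushing text to the device (real code would write BLE packets)."""
    return f"[display] {text[:32]}"  # a small pixel screen fits ~32 chars

def handle(payload_json: str) -> str:
    event = json.loads(payload_json)
    if event.get("type") == "agent-turn-complete":
        return send_to_display(event.get("last-assistant-message", "done"))
    return ""

if __name__ == "__main__":
    print(handle(sys.argv[-1]))
```

The same hook point could trigger the beep she describes, so the display doubles as an out-of-band “agent finished” indicator while Codex runs unattended.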

  13. Final verdict: smartest and most autonomous yet—plus the ‘baked potato’ personality fix

    Claire closes with her take: GPT‑5.5 is extremely smart, efficient, and capable of sustained autonomous work on complex tasks, making it feel like a “favorite senior/staff engineer.” Her main nit is Codex’s dull default tone, but she notes you can change it with `/personality`.
