GPT-5.5 crushed 6 months of tech debt

In this mini episode, I break down OpenAI’s new GPT 5.5 and GPT 5.5 Pro after weeks of early testing. I walk through three real jobs I threw at the model: building an app to help me teach my second grader more advanced subtraction concepts, tackling a tech debt problem in the ChatPRD codebase, and hacking into a proprietary Bluetooth pixel display that every other model had failed on. My verdict: higher intelligence, better efficiency, and genuinely autonomous long-running loops that change what I think is worth tackling.

*What you’ll learn:*
1. How I think about GPT 5.5 Pro’s pricing vs engineering time, and when I believe the “intelligence tax” is worth paying
2. Why I treat GPT 5.5 as a developer model first, and why I couldn’t find a consumer use case that justified its intelligence
3. The exact prompt pattern I use to unlock a long-running autonomous subagent loop
4. How I got a near-six-hour autonomous run to one-shot 98% of edge cases in a migration over millions of chat threads and drop my Sentry error rate to the floor
5. Why I’m now throwing GPT 5.5 at tech debt, flaky tests, and security backlogs first
6. How I combined a Bluetooth packet sniffer and GPT 5.5 to reverse-engineer a proprietary pixel speaker after Claude Code and GPT 5.4 both gave up
7. How I use the /personality command inside Codex to swap the default “baked potato” tone for something I actually enjoy working with

*In this episode, I cover:*
(00:00) Introduction to GPT 5.5 testing
(00:40) What is GPT 5.5 and how much does it cost?
(03:23) Testing GPT 5.5 in ChatGPT: the intelligence overhang problem
(07:12) Moving to Codex: where GPT 5.5 really shines
(16:01) Hacking a Chinese Bluetooth speaker
(21:47) Final thoughts on GPT 5.5’s intelligence and efficiency

*Blog & detailed workflow walkthroughs from this episode:*
My GPT-5.5 Review: A 6-Hour Autonomous Task and the Bluetooth Hack No Other Model Could Solve: https://www.chatprd.ai/how-i-ai/openai-gpt-5.5-review
↳ Reverse-Engineer a Proprietary Hardware Protocol with AI: http://chatprd.ai/how-i-ai/workflows/reverse-engineer-a-proprietary-hardware-protocol-with-ai
↳ Perform an Autonomous Data Migration with an AI Agent: https://www.chatprd.ai/how-i-ai/workflows/perform-an-autonomous-data-migration-with-an-ai-agent
↳ Automate Security Vulnerability Remediation with AI: https://www.chatprd.ai/how-i-ai/workflows/automate-security-vulnerability-remediation-with-ai

*Tools referenced:*
• GPT 5.5 and GPT 5.5 Pro: https://openai.com/index/introducing-gpt-5-5/
• Codex: https://openai.com/codex/
• ChatGPT: https://chat.openai.com/
• Claude Code: https://claude.ai/code
• Sentry: https://sentry.io/
• Divoom MiniToo: https://divoom.com/products/minitoo

*Other references:*
• OpenAI Codex Security: https://openai.com/index/codex-security-now-in-research-preview/

*Where to find Claire Vo:*
ChatPRD: https://www.chatprd.ai/
Website: https://clairevo.com/
LinkedIn: https://www.linkedin.com/in/clairevo/
X: https://x.com/clairevo

_Production and marketing by https://penname.co/._
_For inquiries about sponsoring the podcast, email jordan@penname.co._

Claire Vo, host

Apr 23, 2026 · 23 min

WHAT IT’S REALLY ABOUT

GPT-5.5 Pro excels at autonomous coding, migrations, and device hacking

GPT‑5.5 and GPT‑5.5 Pro feel meaningfully more capable and token-efficient on complex work, but their pricing makes them an “intelligence tax” that needs clear ROI.
In ChatGPT, the model can overthink relatively simple tasks (e.g., a kids’ subtraction app), highlighting a mismatch between extreme intelligence and typical consumer workflows.
In Codex, GPT‑5.5 Pro shines by autonomously executing large, multi-step engineering work like security issue remediation, technical debt cleanup, and complex data migrations.
A standout example is a nearly hands-off, roughly six-hour autonomous testing and validation loop that sharply reduced production errors (as tracked in Sentry) and surfaced only one remaining edge case across ~2M rows.
As a personal “high-tech eval,” GPT‑5.5 helped reverse-engineer a proprietary Bluetooth protocol to programmatically control a Divoom MiniToo pixel display, enabling terminal-driven notifications and custom output.

IDEAS WORTH REMEMBERING

GPT‑5.5 Pro’s best ROI is ambition, not just speed.

Claire argues the model lets her attempt projects she previously avoided because they were too complex or too time-consuming to reliably decompose and execute—especially with messy edge cases.

ChatGPT may be a poor form factor for “too-smart” models when the user has no hard problems to give them.

Her subtraction-app test took ~17 minutes of “thinking” and produced a serviceable result, raising the question of whether most users gain enough from that much reasoning to justify the latency.

Codex + GPT‑5.5 Pro performs well on backlog-style batch work.

Uploading a CSV of security findings and asking it to cluster them into themes, propose fixes, and implement the changes worked well; after human and code review, the work helped lead to a clean pen test outcome.

Autonomous, long-running loops are where the model differentiates.

A ~6-hour run built a scalable, CLI-based smoke test harness that works across model providers, required almost no intervention, and found only one remaining edge case after validating against a large, production-like dataset.

Complex data migrations with unstructured AI-response history are now tractable.

She describes legacy AI responses whose formats drifted across providers over time, plus attachments and tool usage that produced hard-to-sanitize records; GPT‑5.5 Pro produced a near one-shot migration covering ~98% of known edge cases.

WORDS WORTH SAVING

I’m gonna pay the intelligence tax.

— Claire Vo

I don't know what to do with all this intelligence if you don't have complex problems to solve.

— Claire Vo

This thing will think.

— Claire Vo

Truly, it just banged its head against the wall for six hours, and I did not have to… zero prompts, zero follow-ups, zero steering.

— Claire Vo

GPT 5.5 has hit my intelligence benchmark for can you hack into this Chinese digital screen with proprietary Bluetooth transport mechanisms and bitmap compression.

— Claire Vo

Pricing and token economics of GPT‑5.5 vs Pro
Efficiency and context retention through faster loops
“Intelligence overhang” in ChatGPT use cases
Codex autonomy: subagents, long-running tasks, minimal babysitting
Security scanning remediation from exported issue lists (CSV)
Large-scale data migration/backfill with messy legacy formats
Reverse-engineering Bluetooth packets for a proprietary device
Personality controls in Codex (/personality)

High-quality AI-generated summary created from a speaker-labeled transcript.
