I let Codex run for 6 hours. Here’s what happened.

In this 30-minute episode, I walk through my favorite feature in Codex: the /goal command. I show how Goals transform AI from a turn-based assistant that needs constant ‘what’s next?’ prompting into an autonomous agent that can work for hours on complex, multi-step tasks. I share three real examples: eliminating thousands of Sentry errors, cleaning 3,900 emails down to 68, and organizing hundreds of Linear tasks.

*What you’ll learn:*
1. What Goals are and how they differ from standard prompts
2. How I used /goal to eliminate hundreds of error logs in my codebase over a five-hour autonomous run
3. The non-technical use cases that make Goals incredibly powerful: cleaning up 3,900 emails in under four hours and organizing hundreds of project management tasks in Linear
4. How to write effective /goal prompts with measurable outcomes, verification methods, and constraints
5. When not to use Goals and what makes a strong versus weak Goal
6. Why Goals represent a fundamental shift in how we work with AI, from babysitting the model to managing it

*Brought to you by:*
Mercury: Radically different banking loved by over 300K entrepreneurs: https://mercury.com/

*In this episode, we cover:*
(00:00) Introduction
(01:50) What is /goal and when should you use it?
(02:45) The difference between prompts and Goal-based loops
(04:06) Claire’s first five-hour 45-minute autonomous coding task
(05:05) How to manage a Goal lifecycle: view, pause, resume, and clear
(06:06) How to write strong goals: outcomes vs. outputs
(07:34) The six components of effective Goals
(08:57) Example: Reducing P95 checkout latency with /goal
(09:36) Demo: Using /goal to eliminate Sentry errors in ChatPRD
(13:18) Demo: Burning down Vercel API errors
(17:28) Non-technical use case: Cleaning 3,900 emails with /goal
(21:24) Demo: Using /goal to clean up Linear project tasks
(24:41) When not to use /goal
(26:10) Why /goal changes everything

*Tools referenced:*
• Codex: https://openai.com/codex/
• Sentry: https://sentry.io/
• Vercel: https://vercel.com/
• Linear: https://linear.app/

*Other reference:*
• OpenAI blog post “Using Goals in Codex”: https://developers.openai.com/cookbook/examples/codex/using_goals_in_codex

*Where to find Claire Vo:*
ChatPRD: https://www.chatprd.ai/
Website: https://clairevo.com/
LinkedIn: https://www.linkedin.com/in/clairevo/
X: https://x.com/clairevo

_Production and marketing by https://penname.co/._
_For inquiries about sponsoring the podcast, email jordan@penname.co._

Claire Vo, host

May 27, 2026 · 30m

CHAPTERS

0:04 – 1:36
Goals in Codex: the feature behind “run it overnight” workflows
Claire introduces Codex “Goals” as the mechanism that enables long-running autonomous work—especially the kind people claim they ran overnight. She previews both technical and non-technical ways Goals can take over repetitive, multi-step tasks.
- What “Goals” are in Codex and why they matter for autonomy
- What viewers will learn: when to use Goals, how to write them, and demos
- Promise of long-running tasks without constant prompting
1:36 – 4:07
Prompt vs. goal-based loop: how autonomy actually works
Using an OpenAI developer blog diagram, Claire contrasts turn-based prompting with a goal loop that continuously works, checks, and chooses next steps. The key shift is evidence-driven iteration until a completion condition is met.
- Prompting is request/response/wait; Goals keep iterating without you
- Goal loop cycle: work → verify/check → decide next step
- “Keep going / what’s next?” is a signal you should use /goal
- Similarity to agentic “loops” people describe online
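The work → verify → decide cycle described above can be sketched in a few lines of Python. This is only an illustration of the loop shape, not how Codex implements Goals: `run_goal_loop`, `LoopResult`, `fix_one`, and the `max_iterations` cap are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LoopResult:
    done: bool
    iterations: int
    history: List[str]


def run_goal_loop(
    work: Callable[[int], str],
    is_done: Callable[[], bool],
    max_iterations: int = 50,  # hypothetical safety cap, not a Codex setting
) -> LoopResult:
    """Repeat work -> verify -> decide until the completion check passes."""
    history: List[str] = []
    for i in range(1, max_iterations + 1):
        history.append(work(i))  # work: attempt one step
        if is_done():            # verify: check the completion condition
            return LoopResult(True, i, history)
        # decide: not done yet, so loop again and pick the next step
    return LoopResult(False, max_iterations, history)


# Toy usage: "fix" one of three open errors per iteration.
open_errors = ["E1", "E2", "E3"]


def fix_one(i: int) -> str:
    return f"iteration {i}: fixed {open_errors.pop()}"


result = run_goal_loop(fix_one, lambda: not open_errors)
print(result.done, result.iterations)
```

A plain prompt corresponds to calling `work` once and waiting for the human; the loop replaces the human's "keep going" with a check against evidence.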
4:07 – 5:08
Claire’s first multi-hour autonomous run (5h 45m) and why it was different
Claire explains that, despite years of using AI coding agents, she couldn’t reliably run multi-hour autonomous tasks until Codex Goals. With Goals, Codex self-manages progress long enough to tackle complex, messy tasks without constant babysitting.
- Historical limitation: agents needed too much human steering
- Goals improved self-management and sustained iteration
- Real-world benchmark: a 5h 45m autonomous coding task
- Why this matters even for “non-OS-level” engineering work
5:08 – 6:09
Managing the Goal lifecycle: view, pause, resume, remove
Claire outlines the practical controls for running Goals safely. You can start a goal, inspect it, pause it if it goes off track, resume later, or clear it entirely.
- /goal to start (and to view the current goal)
- Pause and resume controls for long runs
- Remove/clear when done or if direction is wrong
- You don’t need to let it run 6–24 hours unchecked
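One way to picture the lifecycle is as a small state machine. The sketch below is a mental model only: the state names and the action labels (`start`, `pause`, `resume`, `clear`) are assumptions made for illustration and are not the actual Codex command syntax. Viewing a goal is omitted because it does not change state.

```python
from enum import Enum


class GoalState(Enum):
    NONE = "none"      # no goal set (or goal cleared)
    ACTIVE = "active"  # goal is running
    PAUSED = "paused"  # goal is kept but not running


# Allowed transitions: (current state, action) -> next state.
# Action labels are hypothetical, not Codex commands.
TRANSITIONS = {
    (GoalState.NONE, "start"): GoalState.ACTIVE,
    (GoalState.ACTIVE, "pause"): GoalState.PAUSED,
    (GoalState.PAUSED, "resume"): GoalState.ACTIVE,
    (GoalState.ACTIVE, "clear"): GoalState.NONE,
    (GoalState.PAUSED, "clear"): GoalState.NONE,
}


def apply(state: GoalState, action: str) -> GoalState:
    """Return the next state, or raise if the action is invalid here."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action!r} while goal is {state.value}") from None


state = GoalState.NONE
for action in ["start", "pause", "resume", "clear"]:
    state = apply(state, action)
    print(action, "->", state.value)
```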
6:09 – 7:10
Writing strong goals: outcomes over outputs, with measurable success
Claire reframes goal-writing as a PM/engineering success-criteria skill: define what “done” looks like rather than dictating steps. She uses the checkout latency example to show what makes a goal testable and safe.
- Goals should describe an outcome, not just an instruction
- Measurability: define thresholds and repeatable checks
- Guardrails/constraints prevent ‘cheating’ (e.g., deleting a slow page)
- Best for problems with executable surfaces an LLM can iterate on
7:10 – 8:42
The six components of an effective Goal (and a reusable template)
Claire walks through OpenAI’s recommended structure: outcome, verification, constraints, boundaries, iteration policy, and stop conditions. She highlights a practical template that forces evidence-based progress updates and clear escalation when blocked.
- Outcome: what must be true at completion
- Verification: how to prove it (tests, benchmarks, metrics)
- Constraints & boundaries: what can’t regress and what tools/files are allowed
- Iteration policy + stop condition: how to choose next experiments and when to ask for help
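The six components can be treated as a checklist before a goal is handed to the agent. The following is a minimal sketch, assuming a hypothetical `GoalSpec` class; the field names mirror the six components listed above, while the example wording is illustrative and is not the template from the OpenAI post.

```python
from dataclasses import dataclass, fields


@dataclass
class GoalSpec:
    outcome: str = ""
    verification: str = ""
    constraints: str = ""
    boundaries: str = ""
    iteration_policy: str = ""
    stop_condition: str = ""

    def missing(self) -> list[str]:
        """Names of components that were left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        """Join the six components into one goal body, one per line."""
        if self.missing():
            raise ValueError(f"incomplete goal, missing: {self.missing()}")
        return "\n".join(
            f"{f.name.replace('_', ' ').capitalize()}: {getattr(self, f.name)}"
            for f in fields(self)
        )


# A weak goal names an instruction and nothing else.
weak = GoalSpec(outcome="Refactor this code")
print(weak.missing())

# A stronger goal fills in all six components (illustrative wording).
strong = GoalSpec(
    outcome="P95 checkout latency is below the agreed threshold",
    verification="Run the checkout benchmark after every change",
    constraints="The correctness suite stays green",
    boundaries="Only touch the checkout system",
    iteration_policy="Report what changed, the result, and the next experiment",
    stop_condition="Stop and report evidence if blocked",
)
print(strong.render())
```

The rendered text is meant as the body you would paste after /goal; the check simply makes a missing finish line visible before a multi-hour run starts.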
8:42 – 10:13
Example goal: reducing P95 checkout latency with benchmarks and guardrails
The checkout performance example shows how to turn a vague optimization request into a rigorous loop. The improved goal specifies the benchmark, preserves correctness, constrains the surface area, and requires reporting after each iteration.
- Target: bring P95 checkout latency below a threshold
- Verify via checkout benchmark; keep correctness suite green
- Limit scope to checkout system to avoid risky changes
- Per-iteration reporting: what changed, results, next experiment; stop with evidence if blocked
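To make the verification step concrete, here is a hedged sketch of what "P95 below a threshold, with the correctness suite green" could look like as an executable check. The episode does not state the threshold or how the benchmark computes percentiles; the 400 ms value, the sample data, and the nearest-rank method are assumptions chosen for illustration.

```python
def p95(samples_ms: list[float]) -> float:
    """95th percentile using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # ceil(0.95 * n) in integer arithmetic to avoid float rounding surprises
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def goal_met(samples_ms: list[float], tests_green: bool, threshold_ms: float) -> bool:
    """Done only if latency is under the threshold AND correctness still holds."""
    return tests_green and p95(samples_ms) < threshold_ms


# Hypothetical benchmark output: 18 fast requests and 2 slow ones.
samples = [120.0] * 18 + [300.0, 900.0]
THRESHOLD_MS = 400.0  # assumed value; the source only says "a threshold"

print(p95(samples))
print(goal_met(samples, tests_green=True, threshold_ms=THRESHOLD_MS))
print(goal_met(samples, tests_green=False, threshold_ms=THRESHOLD_MS))
```

Tying the finish line to both conditions is what blocks the "cheating" path: a change that speeds up checkout by breaking it fails the check.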
10:13 – 13:15
Case study: eliminating Sentry ‘invalid operation range’ errors in ChatPRD
Claire describes a painful, recurring class of document-editing errors and how a Goal-based loop systematically resolved them. Codex categorized failures, implemented root-cause fixes, replayed historical events, and drove the error count to zero.
- Problem: diff-based editor failing on complex document structures (tables, bullets, formatting)
- Prior approach was whack-a-mole: fix one case, another appears
- Goal approach: review every Sentry trace, categorize, fix root cause, replay events
- Outcome: systematic framework improvements and sustained ‘zero errors’ afterward
13:15 – 16:47
Live demo setup: burning down Vercel API errors with /goal
Claire sets a Goal to triage and eliminate errors in Vercel logs for a chat endpoint over the last two weeks. Codex uses plugins to access logs, classifies errors as user-facing vs. internal, opens PRs for fixes, and downgrades non-critical noise to warnings.
- Goal design: classify each error category, find root cause, fix via branch/PR
- Non-user-facing issues become warnings to reduce alert fatigue
- Success state: no user-facing errors and only appropriate warnings remain
- Codex plan: inventory repo → pull/group errors → fix/validate by category → publish PRs/report
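The triage rule in this goal (group errors by category, fix user-facing ones, downgrade the rest to warnings) can be sketched as follows. This is an illustration of the decision rule only: `LogError`, the category names, and the sample data are hypothetical, and nothing here calls the Vercel API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogError:
    category: str
    user_facing: bool


def plan_actions(errors: list[LogError]) -> dict[str, str]:
    """Group errors by category and pick one action per category."""
    grouped: dict[str, list[LogError]] = defaultdict(list)
    for err in errors:
        grouped[err.category].append(err)
    return {
        category: (
            "fix via branch/PR"
            if any(e.user_facing for e in items)
            else "downgrade to warning"
        )
        for category, items in grouped.items()
    }


def goal_satisfied(remaining: list[LogError]) -> bool:
    """Success state: no user-facing errors are left in the logs."""
    return not any(e.user_facing for e in remaining)


# Hypothetical two weeks of logs for a chat endpoint.
errors = [
    LogError("stream timeout", user_facing=True),
    LogError("stream timeout", user_facing=True),
    LogError("client aborted request", user_facing=False),
]

print(plan_actions(errors))
print(goal_satisfied(errors))
```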
16:47 – 18:48
Why error burn-down is a killer /goal use case (and expected runtime)
Claire explains that Goals shine for lingering, high-volume quality issues that are annoying to babysit. She notes these runs can take hours, but the payoff is systematic reduction of noisy, recurring operational problems.
- Multi-hour runs are normal (2–3 hours expected here)
- Measurable backlog: a finite set of log errors to eliminate
- Operational payoff: ‘Sentry zero / error zero’ as a workflow
- Particularly strong for tech debt, flaky tests, and hard-to-reproduce client errors
18:48 – 21:22
Non-technical power move: cleaning 3,900 emails with Gmail access
Claire demonstrates her favorite non-coding use case: using /goal to triage and clean an overloaded inbox. Codex reads emails via the Gmail plugin, labels and categorizes messages, unsubscribes, and reduces thousands of messages to a small judgment queue.
- Goal: categorize bulk/promotions/spam, unsubscribe, and ask for judgment when needed
- Run stats: ~3h 52m and ~6M tokens (not cheap)
- Results: labels/folders created, unsubscribe actions taken, response-needed items surfaced
- Inbox reduction: ~3,900 emails down to ~68 requiring review
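A quick back-of-the-envelope calculation puts the run stats in perspective. The inputs are the approximate figures quoted above (~3,900 emails, ~68 left for review, ~6M tokens), so the results are rough estimates rather than measured values.

```python
# Approximate figures from the run stats above.
emails_before = 3900
emails_for_review = 68
tokens_used = 6_000_000

review_share = emails_for_review / emails_before  # fraction needing a human
handled_share = 1 - review_share                  # fraction handled by the agent
tokens_per_email = tokens_used / emails_before    # average cost per message

print(f"left for human review: {review_share:.1%}")
print(f"handled autonomously:  {handled_share:.1%}")
print(f"tokens per email:      {tokens_per_email:.0f}")
```

On these numbers, under 2% of the inbox needed human judgment, at an average of roughly 1,500 tokens per email, which is why the run is effective but "not cheap".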
21:22 – 24:25
Non-technical demo: cleaning up Linear tasks for better project hygiene
Claire applies /goal to task management, asking Codex to archive/cancel stale issues for already-released podcast episodes while keeping future work open. The agent uses clear rules to identify outdated tasks and bulk-update statuses.
- Problem: Linear backlog spiraled due to agent-generated tasks and stale work
- Goal: cancel unfinished tasks from past episodes; keep only current/future tasks
- Codex approach: find the team, identify stale episode tasks, apply status rules
- Outcome expectation: cleaner backlog and better labeling/assignment hygiene
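The status rule for this cleanup is simple enough to write down explicitly. Below is a minimal sketch, assuming a hypothetical `Task` record with an episode release date; the status strings are placeholders, and the code does not use Linear's API or its real workflow states.

```python
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class Task:
    title: str
    episode_release: date
    status: str  # placeholder values: "todo", "in_progress", "done", "canceled"


def clean_backlog(tasks: list[Task], today: date) -> list[Task]:
    """Cancel unfinished tasks for released episodes; leave everything else."""
    cleaned = []
    for task in tasks:
        stale = task.episode_release < today and task.status != "done"
        cleaned.append(replace(task, status="canceled") if stale else task)
    return cleaned


# Hypothetical backlog, evaluated as of the episode date.
today = date(2026, 5, 27)
backlog = [
    Task("Write show notes (old episode)", date(2026, 4, 1), "todo"),
    Task("Publish clips (old episode)", date(2026, 4, 1), "done"),
    Task("Book guest (upcoming episode)", date(2026, 6, 10), "todo"),
]

for task in clean_backlog(backlog, today):
    print(task.status, "-", task.title)
```

Spelling out the rule this way is the same move as writing a strong goal: the agent gets a testable definition of "stale" instead of a vague request to tidy up.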
24:25 – 25:56
When not to use /goal: small edits, vague finish lines, and ‘refactor’ requests
Claire cautions against using Goals for trivial one-line changes or ambiguous objectives with no measurable completion condition. She emphasizes that Goals require a durable objective, an evidence-based finish line, and a multi-turn investigation path.
- Too heavy for simple edits; Goals are outcome-driven, not output-driven
- Avoid vague goals like ‘make customers happy’ without metrics
- ‘Refactor this code’ is often too undefined for reliable completion
- Best-fit triad: durable objective + measurable evidence + multi-step path
25:56 – 30:20
Why /goal changes everything: autonomy, PM skill leverage, and AI-as-colleague dynamics
Claire reflects on how Goals shift her workflow from constant steering to managerial oversight. She argues that Goals speed up quality-of-life improvements and tech-debt burn-down, and that they reward strong outcome definitions, especially from product managers.
- Less babysitting: set the goal, step away, review results
- Quality impact: systematic elimination of error classes and operational drag
- PM relevance: outcomes-not-outputs becomes directly executable via validation criteria
- Workstyle shift: managing an agent feels more like managing a human colleague