GLM 5.2 is SO GOOD (and almost free)

I put GLM 5.2, the open-weight coding model from Z.AI, through four real tasks inside my actual codebase: a codebase architecture audit, a UI redesign, and a 45-minute autonomous bug-hunting session pulling from Sentry and Vercel logs. Total cost: $3.36 for roughly 6 million tokens, a prioritized bug-fix dashboard I’m actually shipping from, and a landing page redesign that matched Chat PRD’s design system on the first try. *What you’ll learn:* 1. What “open-weight” actually means and why it matters for cost and vendor independence 2. How to connect GLM 5.2 to Cursor and Claude Code 3. How it performs on codebase exploration and autonomous architecture summarization in a real production Next.js app 4. Whether GLM 5.2 can match an existing design system 5. How the model handles a 45-minute long-running autonomous task 6. Where GLM 5.2 stumbled 7. The actual cost breakdown *Brought to you by:* Mercury—Radically different banking loved by over 300K entrepreneurs: https://mercury.com/ *In this episode, we cover:* (00:00) What open-weight models are and why GLM 5.2 is worth testing (01:38) GLM 5.2 model overview (04:02) Capabilities and benchmark results (06:02) How to set up GLM 5.2 in Cursor (08:37) How to set up GLM 5.2 in Claude Code (11:04) Live test 1: codebase exploration and architecture audit on ChatPRD (12:43) Live test 2: generating an HTML architecture and roadmap page (16:37) Live test 3: redesigning the How I AI landing page in Cursor (20:57) Live test 4: 45-minute autonomous task, pulling Sentry errors and Vercel logs (22:35) Where it struggled (23:49) My verdict on the output (25:23) Cost breakdown *Tools referenced:* • z.ai: https://z.ai • GLM 5.2: https://z.ai/blog/glm-5.2 • OpenRouter: https://openrouter.ai • Cursor: https://cursor.com • Claude Code: https://docs.anthropic.com/en/docs/claude-code • Sentry: https://sentry.io • Vercel: https://vercel.com Other references: • SWE-Bench Pro leaderboard (coding benchmark scores referenced in episode): https://www.swebench.com • Frontier Suite and Post-Train Bench (additional benchmarks cited): https://scale.com/leaderboard • Use Claude Code with OpenRouter: https://openrouter.ai/docs/cookbook/coding-agents/claude-code-integration *Where to find Claire Vo:* ChatPRD: https://www.chatprd.ai/ Website: https://clairevo.com/ LinkedIn: https://www.linkedin.com/in/clairevo/ X: https://x.com/clairevo _Production and marketing by https://penname.co/_ _For inquiries about sponsoring the podcast, email jordan@penname.co._

Claire Vohost

Jun 24, 202627mWatch on YouTube ↗

WHAT IT’S REALLY ABOUT

Testing GLM 5.2 as low-cost Opus alternative in coding

GLM 5.2 is presented as an open-weight, text-only model with modern tooling features (reasoning mode, function calling, caching, structured output) and a 1M-token context window.
Benchmark positioning suggests GLM 5.2 competes near Claude Opus and GPT-class models on coding suites, motivating hands-on validation rather than relying on leaderboard claims.
Claire shows practical setup paths using OpenRouter as a hosted provider, including a Cursor-specific base URL nuance and Claude Code environment/config changes.
Live tests show strong codebase understanding and surprisingly solid HTML/communication output, plus generally acceptable design iteration when anchored to an existing design system.
The longest autonomous run successfully pulled Sentry/Vercel signals and produced a prioritized fix plan, though the model temporarily struggled with TypeScript/React generation speed and reliability; total spend was about $3.36 for ~6M tokens with high cache hit rate.

IDEAS WORTH REMEMBERING

5 ideas

Open-weight can mean control and flexibility, not necessarily “free.”

Because model weights are downloadable, you can self-host, fine-tune, and avoid vendor lock-in, but licensing and inference costs still matter depending on provider and usage.

GLM 5.2’s feature parity makes it viable in modern coding stacks.

Despite being text-to-text only, it supports the workflow primitives developers rely on (reasoning mode, tool/function calls, streaming, caching, structured outputs, MCP), plus a large context window for repo-scale tasks.

Setup friction is real—small URL details can block adoption.

In Cursor, Claire had to place the OpenRouter key into the “OpenAI API key” field and use the specific base URL `openrouter.ai/api/v1/cursor`; missing `/cursor` was the undocumented gotcha.

Repo orientation and architecture narration were strong early signals.

When asked to explore the ChatPRD codebase and summarize architecture/recent work, the model responded quickly with an accurate, useful overview—suggesting good context use and software-engineering intuition.

GLM 5.2 is notably good at HTML “explainers,” not just code fixes.

Turning an architecture/roadmap summary into a presentable HTML page produced a credible artifact with sensible structure, product pillars, and roadmap items—useful for bridging agent output to human review.

WORDS WORTH SAVING

5 quotes

What if I told you you could get Opus-level reasoning at a fraction of the cost?

— Claire Vo

This is our first of many reviews of open-weight and open source models to see if we should all be paying the tax to Anthropic and OpenAI, or if we can run these models locally and get the same results.

— Claire Vo

I could not find anything for a really long time that told me it had to be /cursor, but it is /cursor, and you need to toggle that change on.

— Claire Vo

Oh my God, guys, it really is having trouble writing JavaScript right now.

— Claire Vo

I spent $3.36 on about 6 million tokens.

— Claire Vo

Open-weight models vs closed frontier APIsGLM 5.2 capabilities and limitations (text-only, 1M context)Benchmarks: SWE-style evaluations and competitive positioningOpenRouter as an inference gateway and vendor flexibilityCursor setup details (OpenAI key slot + /cursor base URL)Claude Code setup (shell env vars + .claude settings)Agentic tests: codebase audit, HTML artifact generation, design iteration, Sentry/Vercel triageCost and caching economics vs Opus/GPT-tier pricing

High quality AI-generated summary created from speaker-labeled transcript.

Get more out of YouTube videos.

High quality summaries for YouTube videos. Accurate transcripts to search & find moments. Powered by ChatGPT & Claude AI.