Skip to content
How I AIHow I AI

GLM 5.2 is SO GOOD (and almost free)

I put GLM 5.2, the open-weight coding model from Z.AI, through four real tasks inside my actual codebase: a codebase architecture audit, a UI redesign, and a 45-minute autonomous bug-hunting session pulling from Sentry and Vercel logs. Total cost: $3.36 for roughly 6 million tokens, a prioritized bug-fix dashboard I’m actually shipping from, and a landing page redesign that matched Chat PRD’s design system on the first try. *What you’ll learn:* 1. What “open-weight” actually means and why it matters for cost and vendor independence 2. How to connect GLM 5.2 to Cursor and Claude Code 3. How it performs on codebase exploration and autonomous architecture summarization in a real production Next.js app 4. Whether GLM 5.2 can match an existing design system 5. How the model handles a 45-minute long-running autonomous task 6. Where GLM 5.2 stumbled 7. The actual cost breakdown *Brought to you by:* Mercury—Radically different banking loved by over 300K entrepreneurs: https://mercury.com/ *In this episode, we cover:* (00:00) What open-weight models are and why GLM 5.2 is worth testing (01:38) GLM 5.2 model overview (04:02) Capabilities and benchmark results (06:02) How to set up GLM 5.2 in Cursor (08:37) How to set up GLM 5.2 in Claude Code (11:04) Live test 1: codebase exploration and architecture audit on ChatPRD (12:43) Live test 2: generating an HTML architecture and roadmap page (16:37) Live test 3: redesigning the How I AI landing page in Cursor (20:57) Live test 4: 45-minute autonomous task, pulling Sentry errors and Vercel logs (22:35) Where it struggled (23:49) My verdict on the output (25:23) Cost breakdown *Tools referenced:* • z.ai: https://z.ai • GLM 5.2: https://z.ai/blog/glm-5.2 • OpenRouter: https://openrouter.ai • Cursor: https://cursor.com • Claude Code: https://docs.anthropic.com/en/docs/claude-code • Sentry: https://sentry.io • Vercel: https://vercel.com Other references: • SWE-Bench Pro leaderboard (coding benchmark scores referenced in episode): https://www.swebench.com • Frontier Suite and Post-Train Bench (additional benchmarks cited): https://scale.com/leaderboard • Use Claude Code with OpenRouter: https://openrouter.ai/docs/cookbook/coding-agents/claude-code-integration *Where to find Claire Vo:* ChatPRD: https://www.chatprd.ai/ Website: https://clairevo.com/ LinkedIn: https://www.linkedin.com/in/clairevo/ X: https://x.com/clairevo _Production and marketing by https://penname.co/_ _For inquiries about sponsoring the podcast, email jordan@penname.co._

Claire Vohost
Jun 24, 202627mWatch on YouTube ↗

CHAPTERS

  1. Why open-weight models matter—and why GLM 5.2 might replace Claude Opus

    Claire frames the central question: can an open-weight model deliver “Opus-level” coding and reasoning without the premium API tax. She sets the goal of the episode as a hands-on evaluation of GLM 5.2 in real coding workflows rather than just trusting hype or benchmarks.

  2. Sponsor: Mercury Command (conversational banking workflows)

    A sponsored segment describing Mercury’s Command feature and the value proposition of completing banking tasks via conversation instead of dashboards. The emphasis is on speed, simplicity, and using real account data with existing permissions and controls.

  3. GLM 5.2 explained: what “open-weight” means and why it changes the tradeoffs

    Claire explains GLM (General Language Model) and introduces Z.AI as the Beijing-based creator. She clarifies open-weight: downloadable weights enabling self-hosting and fine-tuning, with licensing caveats, plus the practical benefits of cost and flexibility.

  4. Model overview: context window, interfaces, and limitations

    The episode outlines GLM 5.2’s core specs and ergonomics. Claire highlights a major constraint—text-only I/O—while noting it still supports modern developer features like tool use and structured outputs.

  5. Benchmarks & positioning: does it really compete with Opus/GPT-class models?

    Claire reviews external benchmark claims suggesting GLM 5.2 is in the same arena as top frontier models on coding-oriented tests. The takeaway is that it’s credible enough to justify hands-on testing in real projects.

  6. Choosing an inference provider: using OpenRouter to access GLM 5.2

    Instead of running locally, Claire uses a hosted route via OpenRouter due to laptop constraints. She explains the practical setup steps: account, billing limits, and generating an API key for tooling integration.

  7. Cursor setup (including the undocumented base-URL nuance)

    Claire walks through configuring Cursor to use GLM 5.2 via OpenRouter, noting the key “gotcha” that took time to discover. Once configured, the model appears as an available option in Cursor chat.

  8. Claude Code setup: environment variables + model selection in settings.json

    Claude Code configuration is presented as more documented: set OpenRouter environment variables in the shell profile and update the Claude settings to point at the GLM model string. Claire also explains what a shell profile is for less terminal-native viewers.

  9. Live test #1: exploring a real codebase (ChatPRD) and auditing architecture

    Claire tests GLM 5.2’s ability to orient in an unfamiliar repository by asking it to explore ChatPRD and summarize architecture plus recent shipping work. The model responds quickly with a largely accurate picture, indicating strong baseline “software engineer” sense.

  10. Live test #2: generating an HTML architecture + roadmap page (communication + taste)

    She asks the model to turn the audit into a presentable HTML page that communicates architecture and roadmap. The result is “slop-adjacent” but genuinely useful: attractive enough, structurally clear, and surprisingly aligned with brand cues and real roadmap themes.

  11. Live test #3: redesigning the How I AI landing page hero in Cursor (design system fit)

    Next, Claire evaluates whether GLM 5.2 can improve a high-traffic marketing hero while respecting an existing design system. The first pass is promising (better CTA, helpful metadata, player-like sidebar), and iteration improves the sidebar styling, though layout balance remains imperfect.

  12. Live test #4: a 45-minute autonomous task (Sentry + Vercel log triage to a fix plan)

    Claire runs a long, agentic workflow: pull 72 hours of Sentry errors and Vercel logs, then produce a prioritized bug-fix plan. The model performs tool/MCP calls, requests Vercel auth when needed, and outputs a well-structured plan—despite temporary struggles compiling TypeScript/React.

  13. Where it struggled, final verdict, and cost breakdown vs Opus

    Claire summarizes strengths and weaknesses: strong HTML/CSS, solid tool-based investigation, and useful long-running planning, with weaker moments in React/TypeScript authoring. She closes with cost results from OpenRouter usage and her decision to keep GLM 5.2 in rotation as a practical Opus alternative.

Get more out of YouTube videos.

High quality summaries for YouTube videos. Accurate transcripts to search & find moments. Powered by ChatGPT & Claude AI.