a16z This Week in AI: GPT-5 Ships, 4o Pulled Back, Grok Imagine Goes Social
CHAPTERS
What’s on deck this week: creative tools, GPT-5, and vibecoding
Justine and Olivia set the agenda for the episode, spanning new consumer-facing creative models, major OpenAI model changes, and their evolving “vibecoding” thesis. They frame the discussion around what’s actually changing for everyday users versus what’s impressive on paper.
- Topics preview: Grok Imagine, Genie 3, ElevenLabs music, GPT-5 and the GPT-4o deprecation, the vibecoding market
- Consumer lens: usability and workflow matter as much as model quality
- Emphasis on creative tooling as a fast-moving frontier
Grok Imagine goes social: image/video generation inside X
They break down xAI’s Grok Imagine and why its biggest differentiator isn’t raw quality—it’s distribution and native social integration. The ability to turn images into videos directly from X changes how casual users create and share AI content.
- Imagine is available in the Grok app, is coming to the web, and is embedded directly in X
- Long-press an image in X to animate or edit it via Grok, creating a low-friction creation loop
- Social-native creative tooling is rare among major platforms today
- Positioning: not the top model compared with Veo 3, but uniquely integrated and consumer-friendly
Speed + mobile workflow as the killer feature
They argue Grok Imagine’s near-instant image generation and fast video creation unlocks iteration in a way slower tools don’t. For non-professional creators, removing multi-step export/import workflows—especially on mobile—is a major leap for mainstream adoption.
- Instant or very fast generations enable rapid iteration (vs. 30–90+ seconds, or even minutes, elsewhere)
- Mobile-first creation matters because many video tools are not mobile-native
- One-tap access to the camera roll plus quick animation supports memes, old photos, and everyday content
- Consumer creative tools win by reducing friction more than by maximizing fidelity
Real-person generation and moderation tradeoffs
Grok’s “uncensored” feel (relative to other tools) enables generating real people and meme-y content with fewer blocks. They discuss how other models often restrict “prominent person” outputs, and why fewer constraints can make a product feel more playful—while also implying different safety choices.
- Grok can generate real people and celebrity-like outputs more readily than many competitors
- Other tools often block outputs under “prominent person” policies, even for images of non-celebrity users
- Looser moderation contributes to meme generation and ‘fun’ experimentation
- Open question: how broadly social platforms will embrace AI-native creative content
GPT-5 ships, and GPT-4o disappears: why users got upset
The conversation shifts to OpenAI’s GPT-5 release and the surprise removal of GPT-4o from ChatGPT. They explain why consumers reacted strongly: losing a familiar “friend” model felt like a downgrade even if benchmarks improved.
- The GPT-5 release is a major LLM moment; the GPT-4o deprecation became the consumer flashpoint
- Users noticed GPT-4o was gone as soon as they tried side-by-side comparisons
- Community backlash surfaced heavily on Reddit and other social platforms
- Sam Altman indicated GPT-4o would return for paid users after the feedback
GPT-5 vs. GPT-4o: better at coding, worse at ‘vibes’
They separate two issues: removing excessive validation (“glazing”) is good, but GPT-5 also seems less expressive and less fun for casual chat. The net result is a model that may be objectively stronger yet less engaging for companionship-style usage.
- GPT-5 appears notably stronger at front-end code generation and debugging
- GPT-5 is less emotive and expressive: fewer exclamation points, emojis, and all-caps, and a less playful tone
- Fixing over-validation increases trust, but may also reduce the “personality” users liked
- Takeaway: the ‘smartest’ model may not be the best consumer chat or companion model
AI for health and mental health: OpenAI leans in as regulation tightens
They discuss OpenAI’s increasing emphasis on medical use cases (including HealthBench) alongside Illinois’ new law restricting AI-driven therapy without licensed supervision. The tension: users already use general chatbots for mental health support, while enforcement and definitional boundaries remain unclear.
- Illinois law bans AI mental health or therapy support without licensed professional oversight
- Some AI mental health companies reportedly paused Illinois sign-ups or operations
- Regulation is hard to enforce given private chats and general-purpose chatbots
- OpenAI highlighted medical capability through physician-involved benchmarking (HealthBench) and public anecdotes
- Liability vs. product direction: OpenAI appears to endorse health usage more explicitly
Genie 3 from Google: interactive world models you can ‘walk through’
They explain why Genie 3 demos went viral: it turns prompts/images (and even videos) into navigable, real-time environments. Unlike standard video generation, the user can control movement and see the scene regenerate dynamically as if exploring a virtual world.
- Genie 3 generates interactive scenes and worlds in real time (like a personal video game)
- Inputs can include text prompts, images, and possibly other generated video content
- User controls (move left or right, navigate) drive on-the-fly scene regeneration
- Not publicly released yet; early-access demos and office trials drove online buzz
So what do you do with a world model? Video control, gaming, and RL environments
They outline practical paths for Genie 3: recording controlled traversal to make more editable video, accelerating game development, and enabling fully personalized mini-games. They also highlight a less consumer-centric but important use: generating scalable RL environments for training agents and robots.
- Controlled ‘screen capture’ of navigation could yield more controllable video than standard video models
- Game dev acceleration: generate worlds quickly from prompts and potentially “freeze” them for others to play
- Personal gaming: each user generates their own bespoke world or experience
- Agent training: dynamic worlds could supply unlimited RL environments, versus manually built simulators
ElevenLabs’ licensed music model: why licensing changes the market
They cover ElevenLabs’ move into AI music generation with a key differentiator: training on fully licensed music. While many consumers may not care, licensing is crucial for enterprises and media buyers who need legal clarity for commercial use.
- Model trained on fully licensed music, addressing a highly litigious rights landscape
- Music rights are complex (labels, artists, ownership splits), making scraping risky
- Consumers may use unlicensed tools for casual content, but businesses require licensing assurances
- Enables safer use in ads, films, TV, games, and other monetized contexts
Vibecoding in practice: Olivia ships a ‘selfie with Jensen’ app overnight
Olivia shares a hands-on vibecoding case study: she built and published a meme app that lets users generate a Jensen Huang selfie using image model APIs. The project demonstrates how non-technical creators can ship real, viral software quickly—while also revealing cost and reliability pitfalls.
- Built with Lovable, connected to Fal to use Flux Kontext; generated downloadable “selfie with Jensen” images
- Launched publicly and reached ~3,000 users overnight
- Hit a self-funded $100 API budget cap, leading to degraded or placeholder outputs
- Proof point: rapid creation and distribution are now accessible to non-engineers
Early vibecoding platforms assume technical knowledge: security and privacy gaps
They discuss the hidden sharp edges: the app accidentally exposed an API key and lacked proper private storage for uploaded photos. The key critique is that today’s vibecoding tools don’t proactively prevent common security/privacy mistakes, which blocks true mainstream adoption.
- Two major issues: an exposed API key and unprotected storage for uploaded user photos
- The platforms didn’t flag these risks before publishing; the community alerted her after launch
- Non-technical builders need guardrails, warnings, and safe defaults
- Signals an early-market gap between “demo-able” and “production-safe” vibecoding
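To make the “guardrails and safe defaults” point concrete, here is a minimal, hypothetical sketch of the kind of pre-publish check the hosts say was missing: scan client-side files for strings that look like hardcoded API keys and warn before the app goes live. The names `SECRET_PATTERNS`, `scan_client_files`, and `check_before_publish` and the regex rules are illustrative assumptions, not how Lovable or any other platform actually works; real secret scanners use far larger, provider-specific rule sets.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Illustrative rules only (hypothetical): real scanners ship many
# provider-specific patterns plus entropy checks.
SECRET_PATTERNS: dict[str, re.Pattern[str]] = {
    # e.g. const FAL_KEY = "..." or apiSecret: '...' (16+ key-like characters)
    "hardcoded_key_assignment": re.compile(
        r"""(?i)\b\w*(?:key|secret|token)\w*\s*[:=]\s*["'][A-Za-z0-9_\-:]{16,}["']"""
    ),
    # e.g. Authorization: "Bearer <token>" written directly into client code
    "bearer_token": re.compile(r"(?i)\bbearer\s+[A-Za-z0-9_\-.]{16,}"),
}


@dataclass
class Finding:
    path: str
    line_no: int
    rule: str


def scan_client_files(files: dict[str, str]) -> list[Finding]:
    """Return one Finding per (line, rule) match across client-side source files."""
    findings: list[Finding] = []
    for path, text in files.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            for rule, pattern in SECRET_PATTERNS.items():
                if pattern.search(line):
                    findings.append(Finding(path, line_no, rule))
    return findings


def check_before_publish(files: dict[str, str]) -> bool:
    """Print a warning for each finding; return True only if nothing was flagged."""
    findings = scan_client_files(files)
    for f in findings:
        print(f"WARNING: possible secret in {f.path}:{f.line_no} ({f.rule})")
    return not findings
```

Pattern matching like this only catches obvious cases and will raise some false positives. The sturdier safe default is for a platform to keep provider keys server-side (for example, behind a backend proxy) so they never ship to the browser at all.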
The vibecoding market will fragment: ‘training wheels’ consumers vs. stack-control pros
Justine summarizes the broader thesis from her post with Anish Acharya: vibecoding is trying to serve incompatible user segments with one product. The market likely splits into specialized platforms optimized for consumers (fast, safe, mobile, constrained) versus developers/enterprises (flexible, integrated, controllable).
- One-size-fits-all platforms face tension across consumer, indie SaaS, and enterprise/internal-tool use cases
- Consumer vibecoding needs guardrails and simplicity, even at the cost of flexibility
- Enterprise and developer users demand control over their stack, integrations, and system constraints
- Different go-to-market paths: viral consumer distribution vs. PLG within businesses or sales-led motions
- Analogy to other AI markets: multiple winners emerge by specializing in segments and priorities