CHAPTERS
Consumer AI recap kickoff: what’s on deck this week
Justine and Olivia Moore launch their first “This Week in Consumer AI,” outlining the biggest consumer-facing AI developments they’ll cover. They preview a fast-moving week across AI video, voice, Apple’s AI posture, monetization data, and a creative demo.
- New series focused on consumer AI trends and products
- Hosts are a16z investing partners (and identical twins)
- Roadmap: Veo 3, ChatGPT voice, Apple AI, ElevenLabs V3, revenue ramp data, and a brand-building demo
- Thesis vibe: consumer AI is accelerating rapidly and showing up in everyday feeds
Veo 3 breakout: the ‘ChatGPT moment’ for AI video
They discuss why Google DeepMind’s Veo 3 suddenly pushed AI video into mainstream social feeds. The key leap is native audio generation alongside video, enabling “one prompt → full talking clip” content that looks like vlogs, podcasts, and street interviews.
- Veo 3 compared to a ‘ChatGPT moment’ for AI video adoption
- Major differentiator: generates audio natively with video from text prompts
- Enables talking-head/podcast-style clips from a single prompt
- Viral wave: channels and short-form accounts built entirely on Veo 3 generations
Why the viral formats look the way they do: 8‑second limit and ‘faceless’ characters
The hosts explain the constraints that shape today’s Veo 3 content: generations are limited to ~8 seconds, and audio generation works reliably only in text-to-video mode. Creators work around consistency issues by using masked or well-known characters (stormtroopers, a yeti, a capybara), where small identity drift between clips is less noticeable.
- Current limitation: ~8-second clips, making long-form continuity difficult
- Audio generation works for text-to-video, not image-to-video workflows
- Character consistency is easier with ‘known’ or face-covered characters
- Rise of ‘faceless channels’ where creators don’t need to appear on camera
How to access Veo 3 (and what it costs)
They clarify early confusion about availability and pricing. Veo 3 debuted behind Google’s expensive AI Ultra plan via Flow, but is now accessible through APIs and third-party tools—still with meaningful per-generation costs that change how creators prompt and iterate.
- Initial access required Google AI Ultra/Flow (~$250/month), fueling hype/FOMO
- Now available via API through consumer tools (e.g., Hedra, Krea) and dev platforms (e.g., Replicate)
- Pricing remains high (discussed as on the order of cents per second of video), so prompting discipline matters
- Expectation: pressure for distillation/optimization to bring costs down
What’s next for AI video: creators, coherence, and model economics
They speculate on where AI video goes from here: more creator formats and storytelling, plus a push from model providers toward longer clips. The tradeoffs will be coherence and pricing—driving demand for smaller, cheaper models that preserve quality.
- Creators experimenting with narrative series and comedic characters
- Longer generations are the obvious next frontier, but coherence may degrade
- Compute cost is a core limiter for broad consumer access
- Likely direction: distilled/optimized models that deliver similar magic at lower cost
ChatGPT Advanced Voice Mode catches up: more human, more expressive
OpenAI’s Advanced Voice Mode receives a subtle but meaningful upgrade: speech feels more natural via inflections, fillers, and conversational timing. They frame it as OpenAI re-entering a category where competitors (Sesame, Gemini, Grok, open-source) had recently felt more lifelike.
- Update rolled out to paid users first, then more broadly
- Improvements: more natural prosody, ‘ums/uhs,’ and expressive cadence
- Voice space got more competitive while OpenAI’s voice felt stagnant
- They do a quick live demo to illustrate realism gains
Why OpenAI may have moved slowly on voice
They discuss possible reasons for the delayed voice upgrades. One factor is product prioritization across many frontiers (reasoning, image, video), and another is caution after public controversy around highly human-sounding assistants (the ‘Her’ discussion).
- Tradeoffs in frontier labs: many parallel priorities competing for attention
- Public sensitivity to ‘too-human’ voice companions may have increased caution
- OpenAI’s workload spans voice, image, video (Sora), and core LLM capabilities
- Result: voice felt like it lagged until this recent catch-up
Apple Intelligence at WWDC: useful features, but Siri still underwhelms
They react to Apple’s AI announcements with a focus on what’s missing: a truly capable AI Siri. Apple appears to rely heavily on ChatGPT for “real AI” tasks, while shipping safer, incremental features like Genmoji improvements, transcription, and real-time translation.
- Perceived disappointment: the ‘AI Siri’ people want still isn’t here
- Example: Siri failing a basic calendar question and offering to ‘search ChatGPT’
- Apple previously got pushback on notification summaries, possibly slowing rollout
- Notable highlight: real-time call/FaceTime translation across languages
ElevenLabs Eleven V3: controllable emotion, interruptions, and SFX via text tags
They cover ElevenLabs’ new Eleven V3 voice model, emphasizing a workflow shift: expressive delivery can be prompted in text rather than recorded and transferred. Tags enable emotions, accents, whispers, interruptions, and sound effects—unlocking more natural multi-character scenes.
- Key upgrade: emotion/inflection control through text prompting (tags)
- Reduces need for speech-to-speech workflows to ‘act’ a line first
- Supports multi-speaker interactions and realistic interruptions
- Adds sound effects prompting, improving narrative and ad realism
AI storytelling convergence: Veo 3 + Eleven V3 raises the creative ceiling
They connect the dots between breakthroughs in AI video and AI voice: creators can now generate end-to-end scenes with dialogue and performance. The result is a ‘world of possibilities’ for storytelling—but also an overwhelming pace of new tools to test.
- Voice and video advances combine into full-stack AI narrative creation
- Better conversational dynamics (interruptions, emotion) increase believability
- New creative formats for marketing, sketches, and serial content
- ‘Exciting but exhausting’ pace for creators as tooling rapidly improves
a16z data: consumer AI startups are monetizing faster than ever
Olivia shares findings from a dataset of gen-AI-era companies a16z met over ~22–24 months. Consumer AI businesses are reaching surprisingly high revenue run-rates quickly, largely driven by direct-to-consumer subscriptions and willingness to pay for powerful AI-native capabilities.
- Method: analyze revenue ramp after monetization across gen-AI-era companies
- Consumer median ARR run-rate at ~12 months discussed as ~$4.2M (top quartile ~$8.7M)
- Consumer ramp appears faster than B2B benchmarks in the AI era
- Business model shift: consumer subscriptions are now common (not years-later ads)
Why the monetization shift works: inference costs, higher ARPU, and new value props
They explain structural reasons consumer AI charges more: AI has real marginal costs (inference), pushing companies toward subscriptions. At the same time, AI products replace expensive human services (tutoring, coaching, language learning, creative production), making $20–$30/month feel like a bargain.
- Unlike classic software, AI features have non-trivial per-user marginal costs
- Average pricing discussed around ~$22/month, higher than many pre-AI subscriptions
- AI enables new users (non-creatives) and accelerates pros (workflow supercharge)
- Categories mentioned: companions, language learning, kids’ reading help, nutrition/coaching
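To put the discussed figures in proportion, here is a rough back-of-the-envelope sketch of how many paying subscribers a ~$22/month product would need to reach the ~$4.2M median and ~$8.7M top-quartile run-rates mentioned in the episode. It assumes, as a simplification not stated by the hosts, that all revenue comes from subscriptions at a flat average price; the function name `implied_subscribers` is purely illustrative.

```python
MONTHS_PER_YEAR = 12


def implied_subscribers(arr_usd: float, price_per_month_usd: float) -> int:
    """Paying subscribers needed to reach an ARR run-rate at a flat monthly price.

    Assumption (ours, not the hosts'): every dollar of revenue is subscription
    revenue at the same average price, with no credits, overages, or discounts.
    """
    annual_revenue_per_subscriber = price_per_month_usd * MONTHS_PER_YEAR
    return round(arr_usd / annual_revenue_per_subscriber)


avg_price = 22.0  # ~$22/month average price discussed in the episode
for label, arr in [("median", 4_200_000), ("top quartile", 8_700_000)]:
    print(f"{label}: ~{implied_subscribers(arr, avg_price):,} paying subscribers")
```

Under these assumptions the run-rates correspond to roughly 16,000 and 33,000 paying subscribers, respectively. Real products would deviate from this once credits, add-on packs, or annual discounts are in the mix.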
Retention and expansion: ‘AI tourism’ vs paid durability—and consumer upsell mechanics
They separate curiosity-driven usage from real subscription behavior. Free-user churn is high (‘AI tourism’), but paid retention looks comparable to pre-AI consumer norms, and revenue expansion is emerging through credits/overages—bringing enterprise-like expansion (and even ‘whale’ dynamics) to consumer apps.
- Free traffic is spiky; many users try and leave (‘tourism’)
- Paid retention at the median looks similar to pre-AI consumer products
- Credits and add-on packs create revenue expansion within consumer subscriptions
- Bottom-up adoption can convert into enterprise deals faster than before
Demo: building a froyo brand with AI (ChatGPT → Ideogram → Krea/Flux Kontext)
Justine walks through an end-to-end workflow to create a modern frozen yogurt brand (“Melt”) and generate consistent product/store imagery. The centerpiece is Flux Kontext (via Krea), which enables Photoshop-like edits with natural language while preserving logo/product consistency better than many general image models.
- Workflow: brand ideation in ChatGPT, logo/typography in Ideogram, final edits and scenes in Krea
- Flux Kontext (Black Forest Labs) enables ‘edit with words’ while maintaining consistency
- Use cases: product photos in different environments, packaging color changes, flavor variants
- Broader implication: future entrepreneurs can build ‘full-stack AI brands’ (design, ads, influencers, web) quickly
