a16z

What You Missed in AI This Week (Google, Apple, ChatGPT)

Things in consumer AI are moving fast. In this episode, Justine and Olivia Moore, investing partners (and identical twins!) at a16z, break down what’s real, what’s overhyped, and what’s next across the consumer AI space.

They cover:
- Veo 3: how Google's video model unlocked a new genre of content
- OpenAI’s Advanced Voice Mode: upgrades, realism, and... um, human-like hesitation
- Apple's AI announcements
- ElevenLabs' V3: expressive voice tags, real-time interruptions, and narrative tools for creators
- New data from a16z: AI consumer startups are ramping revenue faster than ever, and they show you how
- Justine walks through how she used ChatGPT, Ideogram, and Krea to launch a fully AI-assisted brand prototype (store photos and all)

It’s exhausting (in the best way) to be a creative in the age of AI.

Timecodes:
00:00 Introduction
00:28 Meet the Hosts: Justine and Olivia
00:45 Veo 3: The Game-Changer in AI Video
06:34 ChatGPT's Advanced Voice Mode Updates
10:22 Apple's AI Announcements and Siri's Shortcomings
12:18 ElevenLabs' New Voice Model: Eleven V3
15:50 Report from a16z: AI Revenue Growth
23:14 Demo of the Week: AI in Brand Creation

Resources:
Read ‘What “Working” Means in the Era of AI Apps’: https://a16z.com/revenue-benchmarks-ai-apps/
Find Justine on X: https://x.com/venturetwins
Find Olivia on X: https://x.com/omooretweets

Tools Discussed:
Veo 3: https://gemini.google/overview/video-generation
OpenAI: https://openai.com/chatgpt
ElevenLabs (V3 voice model): https://elevenlabs.io/
Ideogram (logo/image generation): https://ideogram.ai/
Black Forest Labs/Flux Kontext (image editing via Krea): https://www.krea.ai/
Flux Kontext demo (Krea launch post): https://www.krea.ai/blog/flux-context
Hedra: https://www.hedra.com/

Stay Updated:
Let us know what you think: https://ratethispodcast.com/a16z
Find a16z on Twitter: https://twitter.com/a16z
Find a16z on LinkedIn: https://www.linkedin.com/company/a16z
Subscribe on your favorite podcast app: https://a16z.simplecast.com/
Follow our host: https://x.com/eriktorenberg

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.

Olivia Moore (host), Justine Moore (host)
Jun 13, 2025 · 29m

CHAPTERS

  1. Consumer AI recap kickoff: what’s on deck this week

    Justine and Olivia Moore launch their first “This Week in Consumer AI,” outlining the biggest consumer-facing AI developments they’ll cover. They preview a fast-moving week across AI video, voice, Apple’s AI posture, monetization data, and a creative demo.

    • New series focused on consumer AI trends and products
    • Hosts are a16z investing partners (and identical twins)
    • Roadmap: Veo 3, ChatGPT voice, Apple AI, ElevenLabs V3, revenue ramp data, and a brand-building demo
    • Thesis vibe: consumer AI is accelerating rapidly and showing up in everyday feeds
  2. Veo 3 breakout: the ‘ChatGPT moment’ for AI video

    They discuss why Google DeepMind’s Veo 3 suddenly pushed AI video into mainstream social feeds. The key leap is native audio generation alongside video, enabling “one prompt → full talking clip” content that looks like vlogs, podcasts, and street interviews.

    • Veo 3 compared to a ‘ChatGPT moment’ for AI video adoption
    • Major differentiator: generates audio natively with video from text prompts
    • Enables talking-head/podcast-style clips from a single prompt
    • Viral wave: channels and short-form accounts built entirely on Veo 3 generations
  3. Why the viral formats look the way they do: 8‑second limit and ‘faceless’ characters

    The hosts explain constraints that shape today’s Veo 3 content: generations are limited to ~8 seconds and audio only works reliably from text-to-video. Creators work around consistency issues by using masked/known characters (stormtroopers, yeti, capybara) where small identity drift is less noticeable.

    • Current limitation: ~8-second clips, making long-form continuity difficult
    • Audio generation works for text-to-video, not image-to-video workflows
    • Character consistency is easier with ‘known’ or face-covered characters
    • Rise of ‘faceless channels’ where creators don’t need to appear on camera
  4. How to access Veo 3 (and what it costs)

    They clarify early confusion about availability and pricing. Veo 3 debuted behind Google’s expensive AI Ultra plan via Flow, but is now accessible through APIs and third-party tools—still with meaningful per-generation costs that change how creators prompt and iterate.

    • Initial access required Google AI Ultra/Flow (~$250/month), fueling hype/FOMO
    • Now available via API through consumer tools (e.g., Hedra, Krea) and dev platforms (e.g., Replicate)
    • Pricing remains high (discussed as ~cents-per-second scale), so prompting discipline matters
    • Expectation: pressure for distillation/optimization to bring costs down
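
    To make the "prompting discipline" point concrete, here is a rough sketch of how iteration costs add up for ~8-second clips. The per-second price below is a placeholder assumption, not a figure from the episode, which only describes pricing as roughly cents-per-second scale; substitute the actual rate of whichever tool or API you use.

```python
# Rough cost sketch for iterating on short AI video clips.
# ASSUMED_PRICE_PER_SECOND_USD is a hypothetical placeholder, not a quoted price.
ASSUMED_PRICE_PER_SECOND_USD = 0.50
CLIP_SECONDS = 8  # approximate clip length discussed in the episode


def iteration_cost(
    attempts: int,
    price_per_second: float = ASSUMED_PRICE_PER_SECOND_USD,
    clip_seconds: int = CLIP_SECONDS,
) -> float:
    """Total spend in USD for a number of generation attempts."""
    return round(attempts * clip_seconds * price_per_second, 2)


if __name__ == "__main__":
    for attempts in (1, 10, 50):
        print(f"{attempts} attempts: ${iteration_cost(attempts):.2f}")
```

    Under this assumed rate, every discarded attempt costs the same as a keeper, which is why a carefully written prompt is cheaper than trial and error.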
  5. What’s next for AI video: creators, coherence, and model economics

    They speculate on where AI video goes from here: more creator formats and storytelling, plus a push from model providers toward longer clips. The tradeoffs will be coherence and pricing—driving demand for smaller, cheaper models that preserve quality.

    • Creators experimenting with narrative series and comedic characters
    • Longer generations are the obvious next frontier, but coherence may degrade
    • Compute cost is a core limiter for broad consumer access
    • Likely direction: distilled/optimized models that deliver similar magic at lower cost
  6. ChatGPT Advanced Voice Mode catches up: more human, more expressive

    OpenAI’s Advanced Voice Mode receives a subtle but meaningful upgrade: speech feels more natural via inflections, fillers, and conversational timing. They frame it as OpenAI re-entering a category where competitors (Sesame, Gemini, Grok, open-source) had recently felt more lifelike.

    • Update rolled out to paid users first, then broader availability
    • Improvements: more natural prosody, ‘ums/uhs,’ and expressive cadence
    • Voice space got more competitive while OpenAI’s voice felt stagnant
    • They do a quick live demo to illustrate realism gains
  7. Why OpenAI may have moved slowly on voice

    They discuss possible reasons for the delayed voice upgrades. One factor is product prioritization across many frontiers (reasoning, image, video), and another is caution after public controversy around highly human-sounding assistants (the ‘Her’ discussion).

    • Tradeoffs in frontier labs: many parallel priorities competing for attention
    • Public sensitivity to ‘too-human’ voice companions may have increased caution
    • OpenAI’s workload spans voice, image, video (Sora), and core LLM capabilities
    • Result: voice felt like it lagged until this recent catch-up
  8. Apple Intelligence at WWDC: useful features, but Siri still underwhelms

    They react to Apple’s AI announcements with a focus on what’s missing: a truly capable AI Siri. Apple appears to rely heavily on ChatGPT for “real AI” tasks, while shipping safer, incremental features like Genmoji improvements, transcription, and real-time translation.

    • Perceived disappointment: the ‘AI Siri’ people want still isn’t here
    • Example: Siri failing a basic calendar question and offering to ‘search ChatGPT’
    • Apple previously got pushback on notification summaries, possibly slowing rollout
    • Notable highlight: real-time call/FaceTime translation across languages
  9. ElevenLabs Eleven V3: controllable emotion, interruptions, and SFX via text tags

    They cover ElevenLabs’ new Eleven V3 voice model, emphasizing a workflow shift: expressive delivery can be prompted in text rather than recorded and transferred. Tags enable emotions, accents, whispers, interruptions, and sound effects—unlocking more natural multi-character scenes.

    • Key upgrade: emotion/inflection control through text prompting (tags)
    • Reduces need for speech-to-speech workflows to ‘act’ a line first
    • Supports multi-speaker interactions and realistic interruptions
    • Adds sound effects prompting, improving narrative and ad realism
  10. AI storytelling convergence: Veo 3 + Eleven V3 raises the creative ceiling

    They connect the dots between breakthroughs in AI video and AI voice: creators can now generate end-to-end scenes with dialogue and performance. The result is a ‘world of possibilities’ for storytelling—but also an overwhelming pace of new tools to test.

    • Voice and video advances combine into full-stack AI narrative creation
    • Better conversational dynamics (interruptions, emotion) increase believability
    • New creative formats for marketing, sketches, and serial content
    • ‘Exciting but exhausting’ pace for creators as tooling rapidly improves
  11. a16z data: consumer AI startups are monetizing faster than ever

    Olivia shares findings from a dataset of gen-AI-era companies a16z met over ~22–24 months. Consumer AI businesses are reaching surprisingly high revenue run-rates quickly, largely driven by direct-to-consumer subscriptions and willingness to pay for powerful AI-native capabilities.

    • Method: analyze revenue ramp after monetization across gen-AI-era companies
    • Consumer median ARR run-rate at ~12 months discussed as ~$4.2M (top quartile ~$8.7M)
    • Consumer ramp appears faster than B2B benchmarks in the AI era
    • Business model shift: consumer subscriptions are now common (not years-later ads)
  12. Why the monetization shift works: inference costs, higher ARPU, and new value props

    They explain structural reasons consumer AI charges more: AI has real marginal costs (inference), pushing companies toward subscriptions. At the same time, AI products replace expensive human services (tutoring, coaching, language learning, creative production), making $20–$30/month feel like a bargain.

    • Unlike classic software, AI features have non-trivial per-user marginal costs
    • Average pricing discussed around ~$22/month, higher than many pre-AI subscriptions
    • AI enables new users (non-creatives) and accelerates pros (workflow supercharge)
    • Categories mentioned: companions, language learning, kids’ reading help, nutrition/coaching
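
    As a back-of-the-envelope illustration, the figures above can be combined to estimate how many paying subscribers a given run-rate implies. This is only a sketch: it assumes all revenue comes from a flat monthly subscription at the ~$22 average price, ignoring annual plans, credits, and overages, and the function name is made up for this example.

```python
import math


def subscribers_needed(arr_usd: float, monthly_price_usd: float) -> int:
    """Paying subscribers required to reach an annualized run-rate,
    assuming every subscriber pays the same flat monthly price."""
    return math.ceil(arr_usd / (monthly_price_usd * 12))


if __name__ == "__main__":
    # ~$4.2M median and ~$8.7M top-quartile ARR at ~12 months, ~$22/month average price
    print(subscribers_needed(4_200_000, 22))  # 15910
    print(subscribers_needed(8_700_000, 22))  # 32955
```

    Under these simplifying assumptions, the median consumer company in the dataset would need on the order of 16,000 paying subscribers, a far smaller base than an ad-supported model would typically require for comparable revenue.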
  13. Retention and expansion: ‘AI tourism’ vs paid durability—and consumer upsell mechanics

    They separate curiosity-driven usage from real subscription behavior. Free-user churn is high (‘AI tourism’), but paid retention looks comparable to pre-AI consumer norms, and revenue expansion is emerging through credits/overages—bringing enterprise-like expansion (and even ‘whale’ dynamics) to consumer apps.

    • Free traffic is spiky; many users try and leave (‘tourism’)
    • Paid retention at the median looks similar to pre-AI consumer products
    • Credits and add-on packs create revenue expansion within consumer subscriptions
    • Bottoms-up adoption can convert into enterprise deals faster than before
  14. Demo: building a froyo brand with AI (ChatGPT → Ideogram → Krea/Flux Kontext)

    Justine walks through an end-to-end workflow to create a modern frozen yogurt brand (“Melt”) and generate consistent product/store imagery. The centerpiece is Flux Kontext (via Krea), which enables Photoshop-like edits with natural language while preserving logo/product consistency better than many general image models.

    • Workflow: brand ideation in ChatGPT, logo/typography in Ideogram, final edits and scenes in Krea
    • Flux Kontext (Black Forest Labs) enables ‘edit with words’ while maintaining consistency
    • Use cases: product photos in different environments, packaging color changes, flavor variants
    • Broader implication: future entrepreneurs can build ‘full-stack AI brands’ (design, ads, influencers, web) quickly
