How I AI

How to digest 36 weekly podcasts without spending 36 hours listening | Tomasz Tunguz

Tomasz Tunguz is the founder of Theory Ventures, which invests in early-stage enterprise AI, data, and blockchain companies. In this episode, Tomasz reveals his custom-built “Parakeet Podcast Processor,” which helps him extract value from 36 podcasts weekly without spending 36 hours listening. He walks through his terminal-based workflow that downloads, transcribes, and summarizes podcast content, extracting key insights, investment theses, and even generating blog post drafts. We explore how AI enables hyper-personalized software experiences that weren’t feasible before recent advances in language models.

*What you’ll learn:*
1. How to build a terminal-based podcast processing system that downloads, transcribes, and extracts key insights from multiple podcasts daily
2. A workflow for using Nvidia’s Parakeet and other AI tools to clean transcripts and generate structured summaries of podcast content
3. How to extract actionable investment theses and company mentions from podcast transcripts using AI prompting techniques
4. A systematic approach to generating blog post drafts with AI that maintains your personal writing style through iterative feedback
5. Why using an “AP English teacher” grading system can help improve AI-generated content through multiple revision cycles
6. How to leverage Claude Code for maintaining and updating personal productivity tools with minimal friction

*Brought to you by:*
• Notion: The best AI tools for work: https://www.notion.com/howiai
• Miro: A collaborative visual platform where your best work comes to life: http://miro.com/

*25k giveaway:*
To celebrate 25,000 YouTube followers, we’re doing a giveaway. Win a free year of my favorite AI products, including v0, Replit, Lovable, Bolt, Cursor, and, of course, ChatPRD, by leaving a rating and review on your favorite podcast app and subscribing to the podcast on YouTube. To enter: https://www.howiaipod.com/giveaway

*Where to find Tomasz Tunguz:*
• Blog: https://tomtunguz.com/
• Theory Ventures: https://theory.ventures/
• LinkedIn: https://www.linkedin.com/in/tomasztunguz/
• X: https://x.com/ttunguz

*Where to find Claire Vo:*
• ChatPRD: https://www.chatprd.ai/
• Website: https://clairevo.com/
• LinkedIn: https://www.linkedin.com/in/clairevo/
• X: https://x.com/clairevo

*In this episode, we cover:*
(00:00) Introduction to Tomasz Tunguz
(03:32) Overview of the podcast ripper system and its components
(05:06) Demonstration of the transcript cleaning process
(06:59) Extracting quotes, investment theses, and company mentions
(10:20) Why Tomasz prefers terminal-based tools
(12:38) The benefits of personalized software versus off-the-shelf solutions
(15:31) A workflow for generating blog posts from podcast insights
(17:34) Using the “AP English teacher” grading system for blog posts
(18:25) Challenges with matching personal writing style using AI
(22:00) Tomasz’s three-iteration process for improving blog posts
(26:13) The grading prompt and evaluation criteria
(28:16) AI’s role in writing education
(30:28) Final thoughts

*Tools referenced:*
• Whisper (OpenAI): https://openai.com/research/whisper
• Parakeet: https://build.nvidia.com/nvidia/parakeet-ctc-0_6b-asr
• Ollama: https://ollama.com/
• Gemma 3: https://deepmind.google/models/gemma/gemma-3/
• Claude: https://claude.ai/
• Claude Code: https://claude.ai/code
• Gemini: https://gemini.google.com/
• FFmpeg: https://ffmpeg.org/
• DuckDB: https://duckdb.org/
• LanceDB: https://lancedb.com/

*Other references:*
• 35 years of product design wisdom from Apple, Disney, Pinterest, and beyond | Bob Baxley: https://www.lennysnewsletter.com/p/35-years-of-product-design-wisdom-bob-baxley
• Dan Luu’s blog post on latency: https://danluu.com/input-lag/
• GitHub CEO: The AI Coding Gold Rush, Vibe Coding & Cursor: https://www.readtobuild.com/p/github-ceo-the-ai-coding-gold-rush
• Stanford Named Entity Recognition library: https://nlp.stanford.edu/software/CRF-NER.html

_Production and marketing by https://penname.co/._
_For inquiries about sponsoring the podcast, email jordan@penname.co._

Tomasz Tunguz (guest), Claire Vo (host)
Aug 25, 2025 · 35m

CHAPTERS

  1. 0:00 – 3:32

    Why Tomasz built a “podcast ripper” to keep up with 36 shows

    Tomasz explains the core problem: he wants insights from dozens of weekly podcasts but doesn’t have time to listen. His solution is an automated pipeline that downloads episodes, transcribes them, and produces skimmable outputs he can read quickly.

    • Maintains a list of ~36 must-follow podcasts
    • Automation runs daily to fetch new episodes
    • Prefers reading over listening for speed and control (easy to skim and skip ahead)
    • Transcription + prompting turns long audio into actionable text
  2. 3:32 – 5:06

    Architecture overview: downloading feeds, converting audio, transcribing locally

    Tomasz walks through the system he built (Parakeet Podcast Processor) and the main plumbing that turns audio files into text. The toolchain emphasizes local processing and modular steps that can be swapped as models improve.

    • Podcast processor reads a feed/file and downloads episodes
    • FFmpeg converts audio formats for consistent processing
    • Uses Whisper initially; highlights NVIDIA’s Parakeet as a newer option that runs well on Mac
    • Pipeline approach: ingest → convert → transcribe → post-process
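
The ingest → convert → transcribe → post-process flow above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not Tomasz's actual code: the function names, the 16 kHz mono WAV target, and the pluggable `transcribe` and `post_process` callables are all hypothetical. Only the FFmpeg flags used (`-y`, `-i`, `-ar`, `-ac`) are standard.

```python
import subprocess
from pathlib import Path
from typing import Callable


def ffmpeg_command(src: Path, dst: Path) -> list[str]:
    """Build an FFmpeg command that converts an audio file to 16 kHz mono WAV.

    The 16 kHz mono target is an assumption; it is a common input format
    for speech-recognition models.
    """
    return ["ffmpeg", "-y", "-i", str(src), "-ar", "16000", "-ac", "1", str(dst)]


def convert(src: Path) -> Path:
    """Run FFmpeg (must be installed and on PATH) and return the WAV path.

    Assumes the source is not already a .wav file.
    """
    dst = src.with_suffix(".wav")
    subprocess.run(ffmpeg_command(src, dst), check=True)
    return dst


def process_episode(
    audio: Path,
    transcribe: Callable[[Path], str],
    post_process: Callable[[str], str],
    convert_fn: Callable[[Path], Path] = convert,
) -> str:
    """Run convert -> transcribe -> post-process on one downloaded episode."""
    wav = convert_fn(audio)
    raw_text = transcribe(wav)
    return post_process(raw_text)
```

Keeping each stage behind a plain callable is one way to get the swappability described above: replacing Whisper with Parakeet would mean passing a different `transcribe` function and leaving the rest untouched.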
  3. 5:06 – 6:59

    Transcript cleanup: using an LLM as a transcript editor

    After transcription, Tomasz cleans transcripts by removing filler words while preserving technical content and length. He demonstrates a “transcript editor” prompt and describes why cleaning mattered more earlier in the project than it does now.

    • Uses Gemma 3 (via Ollama) to clean transcripts
    • Prompt goals: remove ums/ahs, keep technical details, preserve length
    • Improves readability for fast scanning
    • Cleaning’s importance decreased as he moved more tasks to stronger LLMs
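
A transcript-cleaning call to a local Ollama server might look like the sketch below. The prompt wording is a paraphrase of the goals listed above, not the prompt shown in the episode, and the `gemma3` model tag and default host are assumptions about a typical local install. The `/api/generate` endpoint with `stream: false` and the `response` field follow Ollama's REST API.

```python
import json
import urllib.request

# Hypothetical prompt that restates the three goals: drop fillers,
# keep technical details, preserve length.
CLEAN_PROMPT = (
    "You are a transcript editor. Remove filler words such as 'um' and 'ah'. "
    "Keep every technical detail and do not shorten the transcript.\n\n"
    "Transcript:\n{transcript}"
)


def build_request(transcript: str, model: str = "gemma3") -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": CLEAN_PROMPT.format(transcript=transcript),
        "stream": False,  # ask for one JSON object instead of a token stream
    }


def clean_transcript(transcript: str, host: str = "http://localhost:11434") -> str:
    """Send the transcript to a locally running Ollama server and return its reply."""
    body = json.dumps(build_request(transcript)).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Long transcripts may exceed a local model's context window, so a real pipeline would likely split the text into chunks before calling `clean_transcript`.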
  4. 6:59 – 10:20

    Orchestration + storage: tracking processed episodes with DuckDB

    To make the workflow reliable, Tomasz stores processing metadata locally so episodes aren’t reprocessed unnecessarily. He describes an orchestrator that pulls daily transcripts from the database and runs summarization prompts in batches.

    • Local DuckDB logs what was processed and when
    • Orchestrator fetches all transcripts for the day
    • Batch processing yields a single daily summary artifact
    • Designed for repeatable, low-friction daily use
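
The bookkeeping described above can be illustrated with a small processing log. To stay dependency-free, this sketch swaps in Python's built-in `sqlite3` as a stand-in for DuckDB; the table layout, column names, and function names are all assumptions, and the same SQL would need only minor changes to run against DuckDB.

```python
import sqlite3
from datetime import date


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the processing log. Schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS episodes ("
        "guid TEXT PRIMARY KEY, podcast TEXT, processed_on TEXT, transcript TEXT)"
    )
    return conn


def already_processed(conn: sqlite3.Connection, guid: str) -> bool:
    """True if this episode was logged before, so it can be skipped."""
    row = conn.execute("SELECT 1 FROM episodes WHERE guid = ?", (guid,)).fetchone()
    return row is not None


def record(conn: sqlite3.Connection, guid: str, podcast: str,
           transcript: str, day: date) -> None:
    """Log a processed episode; a repeated guid is ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO episodes VALUES (?, ?, ?, ?)",
        (guid, podcast, day.isoformat(), transcript),
    )
    conn.commit()


def transcripts_for(conn: sqlite3.Connection, day: date) -> list[tuple[str, str]]:
    """Fetch (podcast, transcript) pairs processed on a given day."""
    return conn.execute(
        "SELECT podcast, transcript FROM episodes "
        "WHERE processed_on = ? ORDER BY podcast",
        (day.isoformat(),),
    ).fetchall()
```

An orchestrator in this style would call `already_processed` before downloading, `record` after transcription, and `transcripts_for(conn, date.today())` to gather the batch that feeds the daily summary.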
  5. 10:20 – 12:38

    Daily digest outputs: summaries, key topics/themes, and the most valuable quotes

    Tomasz shows what the daily generated document looks like: each podcast gets host/guest context, a comprehensive summary, key topics, and key themes. He emphasizes that curated quotes are the highest-signal output for his workflow.

    • Per-episode structured summary: host/guest + comprehensive recap
    • Key topics and key themes extracted for quick scanning
    • Quote extraction highlighted as the most useful component
    • Digest is positioned as a daily briefing document
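
One way to picture the per-episode structure is as a small record that renders to plain text. The field names and output layout below are assumptions chosen to mirror the sections listed above, not the format shown in the episode.

```python
from dataclasses import dataclass, field


@dataclass
class EpisodeDigest:
    """One podcast's entry in the daily briefing (hypothetical layout)."""
    podcast: str
    host: str
    guest: str
    summary: str
    key_topics: list[str] = field(default_factory=list)
    key_themes: list[str] = field(default_factory=list)
    quotes: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the entry as plain text, skipping empty sections."""
        lines = [
            f"{self.podcast} (host: {self.host}, guest: {self.guest})",
            "",
            self.summary,
        ]
        sections = (
            ("Key topics", self.key_topics),
            ("Key themes", self.key_themes),
            ("Quotes", self.quotes),
        )
        for title, items in sections:
            if items:
                lines += ["", f"{title}:"] + [f"  - {item}" for item in items]
        return "\n".join(lines)


def render_daily(digests: list[EpisodeDigest]) -> str:
    """Join all episode entries into a single daily document."""
    return "\n\n".join(d.render() for d in digests)
```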
  6. 12:38 – 15:31

    From content to action: investment theses, tweets, and company discovery

    Beyond summarization, the pipeline generates venture-style “investment theses,” draft tweets, and lists of companies mentioned in episodes. These outputs connect podcast listening to concrete next steps like market maps, CRM enrichment, and outreach.

    • Actionable investment theses suggested from episodes
    • Noteworthy observations formatted as potential tweets (still iterating)
    • Entity extraction surfaces startups/companies mentioned
    • Unknown companies can be pushed into a CRM for enrichment/research
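
The "unknown companies" step amounts to filtering extracted names against what the CRM already holds. The sketch below assumes entity extraction has already produced a list of names; the function names and the simple case-insensitive matching are illustrative choices, and a real system would likely need fuzzier matching.

```python
def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so 'Acme  AI' and 'acme ai' match."""
    return " ".join(name.lower().split())


def unknown_companies(mentioned: list[str], crm_names: set[str]) -> list[str]:
    """Return mentioned companies not already in the CRM.

    Results keep first-mention order and drop duplicates.
    """
    known = {normalize(n) for n in crm_names}
    seen: set[str] = set()
    result = []
    for name in mentioned:
        key = normalize(name)
        if key and key not in known and key not in seen:
            seen.add(key)
            result.append(name.strip())
    return result
```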
  7. 15:31 – 17:34

    Why the terminal: speed, low latency, and scriptability

    Claire probes why Tomasz stays in the terminal instead of building a UI. Tomasz argues the terminal offers the lowest interaction latency, reduces frustration, and makes it easy to script automations across email, CRM actions, and AI tools.

    • Terminal offers minimal keyboard-to-computer latency (citing Dan Luu’s post on input lag)
    • Adopted terminal usage during COVID and now lives in it
    • Scriptable workflows: bulk actions, AI-assisted email replies, CRM updates
    • Sees terminal as a legitimate product surface (e.g., Claude Code)
  8. 17:34 – 18:25

    Hyper-personal tools vs off-the-shelf apps: “glove-like fit” with modern AI

    They discuss why personalized internal tools are now practical: AI reduces the cost of building and modifying bespoke software. Tomasz highlights how quickly he can tweak workflows (like reordering sections or emailing digests) using tools like Claude Code.

    • Off-the-shelf apps won’t match niche preferences (e.g., terminal-first)
    • AI makes bespoke utilities cheap to create and maintain
    • Claude Code enables fast repo-wide changes and feature additions
    • Workflow can evolve quickly as needs change
  9. 18:25 – 22:00

    Podcast insights to blog drafts: the blog post generator pipeline

    Tomasz introduces a second system that turns a specific podcast quote/topic into a blog post draft. It uses the podcast transcript as context and pulls relevant prior writing to shape content and tone, though a demo bug appears during search.

    • Selects a quote/topic from the podcast digest to anchor a post
    • Uses transcript context to ground the draft
    • Retrieves relevant prior blog posts via local vector search (LanceDB)
    • Generates a draft blog post file; demo includes a search bug
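
Retrieving related prior posts is a nearest-neighbor search over embeddings. The sketch below is a pure-Python stand-in for what a vector store such as LanceDB does internally, not LanceDB's API; it assumes each post has already been embedded into a vector, and the function names are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], posts: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the titles of the k posts whose embeddings are closest to the query."""
    ranked = sorted(
        posts,
        key=lambda title: cosine(query_vec, posts[title]),
        reverse=True,
    )
    return ranked[:k]
```

A brute-force scan like this is adequate for a few thousand posts; a dedicated vector store mainly adds persistence and indexing on top of the same idea.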
  10. 22:00 – 26:13

    The “AP English teacher” grader: setting a quality bar and iterating

    To improve drafts, Tomasz uses an evaluation prompt that grades the post like an AP English teacher, then revises until it reaches roughly an A−. He describes why hooks and conclusions matter most and how he runs multiple improvement passes.

    • Grades output with letter grade + numeric score
    • Emphasis areas: hook/lead and conclusion cohesion
    • Three-iteration loop: draft → grade → improve → re-grade
    • Stops around A− as “good enough” for a working draft
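
The grade-and-revise loop can be sketched as below. The `grade` and `improve` callables stand in for LLM calls and are hypothetical, as is the mapping of "roughly an A−" to a numeric score of 90; the episode does not specify the threshold.

```python
from typing import Callable

# Assumed mapping: treat a numeric score of 90 or more as roughly an A-.
A_MINUS = 90


def revise(
    draft: str,
    grade: Callable[[str], tuple[int, str]],
    improve: Callable[[str, str], str],
    target: int = A_MINUS,
    max_iterations: int = 3,
) -> tuple[str, int, int]:
    """Grade a draft and improve it until it reaches the target or runs out of passes.

    grade(draft) returns (score, feedback); improve(draft, feedback) returns a new draft.
    Returns (final_draft, final_score, improvement_passes_used).
    """
    score, feedback = grade(draft)
    passes = 0
    while score < target and passes < max_iterations:
        draft = improve(draft, feedback)
        score, feedback = grade(draft)
        passes += 1
    return draft, score, passes
```

Capping the loop at a fixed number of passes keeps cost bounded even when the grader never awards the target score.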
  11. 26:13 – 28:16

    Style matching is hard: model voices, personal quirks, and linking limitations

    Both agree AI struggles to capture a writer’s authentic rhythm—especially in short-form. Tomasz compares model “personalities,” notes AI’s tendency toward grammatical perfection, and shares unresolved challenges like automatically linking to related posts.

    • Different models produce different tones (clinical vs warm/verbose)
    • AI often removes intentional stylistic quirks (punctuation, fragments, ampersands)
    • Hard to make AI link to other relevant posts in a coherent way
    • Short-form (tweets) is even harder than long-form style matching
  12. 28:16 – 30:28

    Prompt + rubric details: what his generator optimizes for (brevity, no headers, flow)

    Tomasz reveals the blog generator prompt design and the structural constraints he derives from his own analytics. He explains how the generator extracts style dynamically from related posts and why he avoids headers, which he found hurt dwell time.

    • Uses past posts (2,000+) as style context; summarizes style patterns dynamically
    • Targets ~500 words or less and fast reader time-to-value
    • No section headers (found they hurt dwell time)
    • Paragraph constraints: smooth transitions, often two long sentences max
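
Constraints like these can also be checked mechanically after generation. The checker below is an illustrative sketch, not part of the workflow shown in the episode: it assumes drafts are Markdown (so headers start with `#`), uses a naive punctuation-based sentence splitter, and treats the 500-word and two-sentence limits as hard caps.

```python
import re


def check_draft(text: str, max_words: int = 500, max_sentences: int = 2) -> list[str]:
    """Return a list of constraint violations; an empty list means the draft passes."""
    problems = []
    words = len(text.split())
    if words > max_words:
        problems.append(f"too long: {words} words (limit {max_words})")

    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    for i, para in enumerate(paragraphs, start=1):
        if para.startswith("#"):  # assumed Markdown-style section header
            problems.append(f"paragraph {i} looks like a section header")
            continue
        # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", para) if s]
        if len(sentences) > max_sentences:
            problems.append(
                f"paragraph {i} has {len(sentences)} sentences (limit {max_sentences})"
            )
    return problems
```

The returned messages could be fed back to the model as revision feedback alongside the grader's comments.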
  13. 30:28 – 35:14

    AI for writing education + wrap-up: future tiny teams and “AI model cage matches”

    They broaden to education: AI can handle first-pass grammar/structure feedback, freeing teachers to focus on creativity. Tomasz then answers rapid questions: his vision of a 30-person $100M company and his tactic of using multiple models to critique each other when outputs degrade.

    • AI as a first-pass writing evaluator (grammar, structure, logic checks)
    • Teachers can focus more on creativity and higher-order feedback
    • Vision: lean, engineering-heavy teams with PLG-driven growth and internal automation
    • When stuck: have multiple models compete/critique to improve outputs; switching models increases robustness
