Dwarkesh Podcast

@Asianometry & Dylan Patel — How the semiconductor industry actually works

Dylan Patel runs SemiAnalysis, the leading publication and research firm on AI hardware: https://www.semianalysis.com/. Jon Y runs @Asianometry, the world’s best YouTube channel on semiconductors and business history.

EPISODE LINKS

* Transcript: https://www.dwarkeshpatel.com/p/dylan-jon
* Apple Podcasts: https://podcasts.apple.com/us/podcast/dylan-patel-jon-asianometry-how-the-semiconductor/id1516093381?i=1000671564456
* Spotify: https://open.spotify.com/episode/6q1XODE2L5bqqBwe7434S7?si=seXQ6K_LQZeAV6776H6MhQ
* Me on Twitter: https://twitter.com/dwarkesh_sp

SPONSORS

* Jane Street is looking to hire their next generation of leaders. Their deep learning team is looking for FPGA programmers, CUDA programmers, and ML researchers. To learn more about their full time roles, internship, tech podcast, and upcoming Kaggle competition, go here: https://jane-st.co/dwarkesh
* Stripe builds financial infrastructure for the internet. Millions of companies from Anthropic to Amazon use Stripe to accept payments, automate financial processes and grow their revenue. Learn more here: https://stripe.com/

If you’re interested in advertising on the podcast: https://www.dwarkeshpatel.com/p/advertise

TIMESTAMPS

00:00:00 – Xi’s path to AGI
00:05:05 – Liang Mong Song
00:09:10 – How semiconductors get better
00:12:01 – China can centralize compute
00:19:35 – Export controls & sanctions
00:33:36 – Huawei’s intense culture
00:39:36 – Why the semiconductor industry is so stratified
00:41:43 – N2 should not exist
00:46:38 – Taiwan invasion hypothetical
00:50:06 – Mind-boggling complexity of semiconductors
00:59:58 – Chip architecture design
01:05:21 – Architectures lead to different AI models? China vs. US
01:10:57 – Being head of compute at an AI lab
01:17:09 – Scaling costs and power demand
01:37:50 – Are we financing an AI bubble?
01:51:05 – Starting Asianometry and SemiAnalysis
02:06:55 – Opportunities in the semiconductor stack

Jon Y (Asianometry), guest · Dylan Patel, guest · Dwarkesh Patel, host
Oct 2, 2024 · 2h 10m

CHAPTERS

  1. 0:00 – 5:02

    If Xi Jinping were “scale‑pilled”: espionage, talent poaching, and what to steal first

    The conversation opens with a thought experiment: if China’s top leadership fully prioritized scaling to AGI, what would it do first? The guests discuss the practical value of stealing know-how across the semiconductor and AI stack, and why simply possessing blueprints is different from being able to manufacture at scale.

    • Espionage vs. execution: stolen files don’t automatically translate to working manufacturing
    • China’s ability to poach experienced fab talent (especially from Taiwan)
    • ASML hacking as an example of trying to steal at the most critical bottlenecks
    • The stack view: chips, fab recipes, models, and organizational capacity all matter
  2. 5:02 – 9:10

    The Liang Mong‑Song saga: TSMC → Samsung’s leap → SMIC’s catch‑up

    Jon tells the story of Liang Mong‑Song and earlier Taiwan-to-China talent flows, illustrating how individuals can shift competitive balance. The chapter highlights Samsung’s brief edge, TSMC’s aggressive response, and how SMIC later benefited from the same kind of leadership and know-how transfer.

    • Richard Chang and early Taiwanese talent migration helped bootstrap China’s industry
    • Liang Mong‑Song’s leadership style and technical impact
    • Samsung’s push to leading edge and winning major business for a period
    • TSMC’s “nightingale army” culture and trade-secret lawsuits
    • Liang’s move to SMIC and accelerated Chinese progress
  3. 9:10 – 12:01

    How process nodes improve in practice: recipes, knobs, yields, and apprenticeship knowledge

    They zoom into what ‘making the next node’ actually involves: thousands of steps, multivariable tuning, and specialized engineers who understand only their slice. The discussion emphasizes tacit knowledge, master-apprentice training, and why yield collapses multiplicatively.

    • Node development as iterative experimentation across many tools and parameters
    • Optimization targets are multi-objective: yield, performance, and power
    • Extreme specialization: engineers may only master one process (e.g., etch)
    • Tacit knowledge and apprenticeship pipelines as a durable competitive moat
    • Yield math: many steps with high per-step success still compound to low total yield
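
The yield bullet above can be illustrated with a short sketch. It assumes, as a simplification not stated in the episode summary, that every process step succeeds independently with the same probability; the step count of 1000 and the per-step values are hypothetical and chosen only to show the compounding effect.

```python
def compound_yield(per_step_yield: float, num_steps: int) -> float:
    """Total yield when every step must succeed.

    Assumes steps are independent and share one success probability,
    which is a simplification for illustration only.
    """
    return per_step_yield ** num_steps


if __name__ == "__main__":
    # Hypothetical per-step yields across a hypothetical 1000-step flow.
    for p in (0.99, 0.999, 0.9999):
        total = compound_yield(p, 1000)
        print(f"per-step {p:.4f} over 1000 steps -> total {total:.4f}")
```

Under these assumptions, even a per-step success rate that looks excellent in isolation leaves most dies defective once it is multiplied across the whole flow, which is why small per-step improvements matter so much.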
  4. 12:01 – 19:35

    China’s compute advantage case: centralizing GPUs + power buildout (Three Gorges example)

    Dylan argues China could concentrate compute faster than the U.S. if leadership forced centralization, because China can build power and substations rapidly. They debate the tradeoffs: centralization improves scaling but risks bureaucracy, fragility, and unwanted security attention.

    • U.S. efforts are decentralized across many labs; China could theoretically centralize
    • China’s ability to add power capacity and build dense infrastructure quickly
    • Three Gorges/industrial load as an existence proof for multi‑GW concentration
    • Risks of centralization: single point of failure, security attention, bureaucratic control
    • Scaling-law incentives push toward larger unified training runs
  5. 19:35 – 28:49

    Export controls and sanctions: what’s working, what’s backfiring, and the ‘jigsaw puzzle’ problem

    They examine export controls as an ‘art’ that requires deep technical understanding, and critique incremental restrictions that reveal where to invest domestically. The key tension: restricting shipped chips while allowing equipment/tool flows can unintentionally accelerate China’s ability to build competitive domestic chips.

    • Sanctions are hard because officials must define technical lines with incomplete expertise
    • Incremental restrictions can act as a lesson plan, showing China which missing pieces it needs to replace domestically
    • Equipment flows vs. chip flows: different leverage and different enforcement difficulty
    • Advanced packaging is hard to sanction due to geography and corporate structures
    • Strategic goal ambiguity: stop better chips, stop better AIs, or slow scaling?
    • SMIC sanctions loopholes (site-specific vs. whole-company) and tool imports
  6. 28:49 – 34:02

    SMIC capacity, DUV constraints, and why Huawei/China still get usable chips despite low yield

    Dylan provides rough capacity math for SMIC’s advanced nodes and explains why yields are challenged without EUV and with immature processes. They discuss the dilemma: capacity can go to phones/telecom rather than AI accelerators, unless priorities shift.

    • Estimated immersion tool counts and resulting 7nm wafer capacity ranges
    • Chip-per-wafer math under low-yield conditions and why it’s still meaningful at scale
    • DUV multi-patterning and process immaturity as yield/power constraints
    • Allocation problem: wafers used for consumer products vs. AI accelerators
    • Why ‘export controls succeeded’ is ambiguous if the target is stopping progress entirely
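
The chip-per-wafer reasoning above can be sketched as follows. The gross-die formula is a commonly used approximation (wafer area divided by die area, minus an edge-loss term), not a method attributed to the guests, and every number in the example (wafer starts, die area, yield) is a hypothetical placeholder rather than an estimate from the episode.

```python
import math


def gross_dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Approximate die count per wafer before yield loss.

    Uses the common approximation:
        pi * (d/2)^2 / A  -  pi * d / sqrt(2 * A)
    where the second term accounts for partial dies at the wafer edge.
    """
    radius = wafer_diameter_mm / 2
    area_term = math.pi * radius ** 2 / die_area_mm2
    edge_term = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return max(0, math.floor(area_term - edge_term))


def good_dies_per_month(wafers_per_month: int, wafer_diameter_mm: float,
                        die_area_mm2: float, yield_fraction: float) -> float:
    """Working dies per month at a given yield (fraction between 0 and 1)."""
    gross = gross_dies_per_wafer(wafer_diameter_mm, die_area_mm2)
    return wafers_per_month * gross * yield_fraction


if __name__ == "__main__":
    # Hypothetical inputs: 300 mm wafers, a large 600 mm^2 accelerator die,
    # 10,000 wafer starts per month, and a low 30% yield.
    print(gross_dies_per_wafer(300, 600))
    print(good_dies_per_month(10_000, 300, 600, 0.30))
```

The point of the sketch is the shape of the argument: a low yield cuts output proportionally, but multiplied by a large number of wafer starts it can still produce a meaningful number of working chips.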
  7. 34:02 – 39:35

    Huawei’s “cracked” engineering culture: paranoia, struggle, 996, and propaganda uncertainty

    The discussion turns to Huawei’s performance under constraints and the cultural/organizational reasons it might out-execute rivals. They also touch on information uncertainty around China and the possibility that visible incompetence narratives may be misleading or strategic.

    • Huawei competing effectively despite sanctions and older nodes
    • Culture themes: struggle, paranoia (‘only the paranoid survive’), and 996 intensity
    • How propaganda/opacity complicates outside assessment of competence
    • Espionage exists, but doesn’t fully explain architecture and execution differences
    • Huawei as an ecosystem force: beneficial or stifling for broader Chinese tech?
  8. 39:35 – 41:59

    Why semiconductors are so stratified: end of vertical integration and the foundry shakeout

    They explain how vertical integration gave way to specialized suppliers and why most layers now have only a few dominant firms. Foundries consolidated as leading-edge R&D costs exploded, forcing massive demand aggregation to fund each new node.

    • Spin-offs created the modern equipment industry; specialists beat in-house tools
    • Market structure: 1–3 players per layer with extreme share concentration
    • Foundry proliferation in early era vs. consolidation at leading edge
    • Rising node costs require aggregating customer demand and profits
    • Economic barriers drive fewer surviving foundries over time
  9. 41:59 – 46:38

    ‘N2 should not exist’: Moore’s Law economics, AI demand revival, and what nodes really buy you

    N2 becomes a symbol for the economic strain of continuing node shrinks absent a killer demand driver. They argue AI changes the viability of aggressive leading-edge investment, and clarify that node improvements now come as much from system/data locality effects as from transistor scaling alone.

    • Without AI, leading-edge node spending looked increasingly unjustified
    • Recent nodes bring smaller density/power gains; SRAM scaling is limited
    • System-level gains: better data locality reduces off-chip movement and saves power
    • AI workloads strongly reward process leadership (power, density, perf per watt)
    • The industry’s ability to pay for nodes depends on concentrated buyers and demand
  10. 46:38 – 50:06

    Taiwan invasion/earthquake hypothetical: immediate market shock and cascading ‘tech reset’

    They walk through a grim scenario where Taiwanese chip exports stop. The impacts extend far beyond cutting-edge AI chips to everyday products and industrial supply chains, causing a rapid economic shock and long multi-year rebuild efforts.

    • Day-one: massive market crash driven by chip dependence of major tech firms
    • Medium term: car production and consumer electronics disrupted due to trailing nodes too
    • Taiwan dominance at key nodes (e.g., large share of 28nm) creates hidden choke points
    • Chip count per vehicle and embedded electronics amplify systemic fragility
    • Reallocation toward military/critical needs and emergency industrial policy responses
  11. 50:06 – 59:33

    Mind-boggling complexity and coordination: abstraction layers, secrecy, and the ‘miracle’ stack

    Dwarkesh probes how anyone learns semiconductors given the secrecy and specialization. Jon and Dylan describe an industry that coordinates through standards bodies, ‘gossip,’ and shared roadmaps while remaining profoundly siloed and under-documented.

    • Semiconductor knowledge is not publicly legible like AI; documentation is limited
    • Siloing: tool makers and fabs don’t fully understand each other’s optimizations
    • Coordination via roadmaps (IRDS), standards, argumentation, and consensus-building
    • Legacy software stacks (XP/CentOS) persist because systems are fragile and risky to change
    • AI can accelerate design/search, but manufacturing adoption is slower and data-limited
  12. 59:33 – 1:04:38

    Chip architecture, data movement, and where 100× gains might come from

    They pivot from manufacturing to architecture: most power is spent moving data, not computing. Dylan argues huge efficiency gains remain possible through better architectures, packaging, networking, and AI-assisted design tools—potentially dwarfing node-only gains.

    • Semiconductors as an enormous ‘search space’ with near-infinite design permutations
    • Primary optimization: useful intelligence per joule; data movement dominates power
    • Potential 100× gains from architecture even if node scaling slowed
    • EDA has increased designer output; AI could further amplify design productivity
    • Innovation vectors span memory, interconnect, packaging, power delivery, and system design
  13. 1:04:38 – 1:10:37

    Different hardware leads to different models: TPU vs GPU, and China’s constrained design paths

    They explore how hardware constraints shape optimal model architectures and might create divergence between U.S. and Chinese AI systems. The discussion spans sparsity/MoE tradeoffs, attention variants, width-vs-depth choices, and use-case/data differences (e.g., vision/surveillance).

    • Optimal model architecture depends on hardware (compute/memory/network topology)
    • Constraints can push divergence: sparsity, experts, attention tweaks, width/depth
    • China-specific constraints could drive different accelerator designs and earlier optics integration
    • Use-case/data differences (video/image) reinforce architectural divergence
    • Self-fulfilling loop: hardware favors transformers; research follows what runs well
  14. 1:10:37 – 1:26:20

    Being head of compute at a new AI lab: clusters, data centers, ESG, and multi-site scaling

    Dwarkesh asks Dylan what he’d do with $1B as ‘head of compute’ for a new lab. Dylan explains why small leased clusters don’t matter at frontier scale, why power price is less important than power availability, and how multi-site training depends on fiber and logistics.

    • Frontier research needs very large clusters; small reseller lots are ‘cope’ for top labs
    • Compute rental prices fell; TCO dominated by GPUs, not electricity
    • Why labs can’t ‘turn off at night’ to match solar: GPU capex requires high utilization
    • Data center bottlenecks: substations, transformers, cooling, generators, and permitting
    • Multi-site scaling signals: fiber deals and regional interconnect strategies
    • ESG constraints shape how big companies pursue aggressive power solutions
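
The claims above that TCO is dominated by GPUs rather than electricity, and that labs cannot idle at night to follow solar output, can be illustrated with a rough cost model. This is a minimal sketch: the function and all input values (GPU price, amortization period, power draw, electricity price) are hypothetical assumptions, not figures from the episode, and the model ignores networking, cooling, and facility costs.

```python
HOURS_PER_YEAR = 24 * 365


def cost_per_useful_gpu_hour(capex_per_gpu: float, amortization_years: float,
                             power_kw: float, price_per_kwh: float,
                             utilization: float) -> float:
    """Cost per hour of actual GPU work under a simplified model.

    Capex is paid for every wall-clock hour whether or not the GPU runs;
    electricity is assumed to be paid only while it runs.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    capex_per_hour = capex_per_gpu / (amortization_years * HOURS_PER_YEAR)
    energy_per_hour = power_kw * price_per_kwh * utilization
    return (capex_per_hour + energy_per_hour) / utilization


if __name__ == "__main__":
    # Hypothetical inputs: $30,000 per GPU over 4 years, 1 kW, $0.08/kWh.
    always_on = cost_per_useful_gpu_hour(30_000, 4, 1.0, 0.08, 1.0)
    daytime_only = cost_per_useful_gpu_hour(30_000, 4, 1.0, 0.08, 0.5)
    print(f"24/7 operation:   ${always_on:.2f} per useful GPU-hour")
    print(f"50% utilization:  ${daytime_only:.2f} per useful GPU-hour")
```

With these placeholder numbers, electricity is a small share of the hourly cost, so halving utilization nearly doubles the cost of each useful GPU-hour while saving very little on power.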
  15. 1:26:20 – 1:51:05

    Scaling to 2028–2029: gigawatt sites, 1e30 ‘total flops’, and financing/bubble dynamics

    They project cluster sizes and power needs into the late decade, with Dylan suggesting 1e30 total compute delivered (including synthetic data and post-training) is plausible. The conversation then becomes a debate about financing: whether AI is a bubble, why CEOs ‘must’ invest (Pascal’s wager), and what happens if a major model release disappoints.

    • Projected largest clusters: 100K GPUs now, 300–700K next, multi‑GW sites by 2026
    • Power per GPU rising; local delivery and grid buildout are the hard constraints
    • Training FLOP accounting is changing: synthetic data, search, and post-training dominate
    • OpenAI potentially needing $50–$100B raises for planned clusters
    • Pascal’s wager framing: under-investing risk > over-investing risk for incumbents
    • Bubble comparison to dot‑com and the idea that bubbles build durable infrastructure
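
The link between GPU counts and site power in the projections above comes down to simple arithmetic, sketched below. The per-GPU power figure (meant to include the GPU's share of CPUs, networking, and storage) and the PUE overhead factor are hypothetical assumptions chosen for illustration, not numbers quoted in the episode.

```python
def site_power_mw(num_gpus: int, kw_per_gpu: float, pue: float) -> float:
    """Total facility power in megawatts.

    kw_per_gpu: all-in IT power attributed to each GPU (assumed value).
    pue: power usage effectiveness, facility power / IT power (assumed value).
    """
    return num_gpus * kw_per_gpu * pue / 1000


def gpus_supported(site_mw: float, kw_per_gpu: float, pue: float) -> int:
    """How many GPUs a site of a given size can power, rounded down."""
    return int(site_mw * 1000 / (kw_per_gpu * pue))


if __name__ == "__main__":
    # Hypothetical inputs: 1.4 kW all-in per GPU, PUE of 1.25.
    print(f"100K GPUs need about {site_power_mw(100_000, 1.4, 1.25):.0f} MW")
    print(f"A 1 GW site supports about {gpus_supported(1000, 1.4, 1.25):,} GPUs")
```

Because per-GPU power is rising, the same gigawatt buys fewer accelerators each generation, which is one way to read the summary's point that power delivery, not chip count alone, becomes the hard constraint.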
  16. 1:51:05 – 2:10:53

    Origin stories and opportunity map: Asianometry + SemiAnalysis + where to build next

    They close with how Jon built Asianometry from a tourist channel into a research-heavy history/industry channel, and how Dylan built SemiAnalysis from deep online hobbyist obsession into a data/product business. The final question asks where entrepreneurs should enter the semiconductor stack; answers emphasize memory’s importance and broad opportunities across many ‘unoptimized’ layers.

    • Asianometry’s evolution: Taiwan life videos → history/business/semiconductors focus
    • Work cadence and topic selection: parallel scripts, idea lists, interest-driven research
    • Dylan’s path: forums/Reddit → investing → consulting/newsletter → conference network → team and data products
    • SemiAnalysis model: tracking components, supply chain, and data center buildouts in detail
    • Entrepreneur opportunities: memory is pivotal, but innovation is available across the whole stack
