No Priors Ep. 53 | With AMD CTO Mark Papermaster
CHAPTERS
- 0:00 – 0:52
Conviction’s Embed Accelerator plug + setting up the episode with AMD CTO Mark Papermaster
Sarah opens with a brief announcement about Conviction’s Embed Accelerator, including funding and compute credits. She then introduces Mark Papermaster, highlighting his leadership background across major hardware companies and teeing up a conversation about GPUs and industry competition.
- Embed Accelerator: $150k uncapped SAFE plus compute/API credits
- Mark Papermaster introduced as AMD CTO
- Mentions Mark’s prior roles at IBM, Apple, and Cisco
- Episode focus: GPUs, competition, and the evolving compute landscape
- 0:52 – 2:46
Mark’s career arc: from early CMOS at IBM to Apple, then AMD as Moore’s Law slows
Mark recounts entering chip design during the early shift to CMOS at IBM, where he worked across microprocessors and large systems. He describes moving to Apple to lead iPhone/iPod engineering before joining AMD in 2011 at a moment when Moore’s Law was clearly decelerating and innovation needed to come from system-level design.
- Early hands-on chip design during the CMOS transition at IBM
- Work spanning PowerPC, mainframes, and RISC servers
- Recruited to Apple to run iPhone/iPod engineering
- Joined AMD in 2011 as Moore’s Law began to slow, raising the innovation bar
- 2:46 – 4:42
AMD’s portfolio and market footprint: CPUs, gaming, cloud, embedded, and acquisitions
Mark gives a high-level overview of AMD’s evolution from second-source roots to a broad compute portfolio. He emphasizes AMD’s competitiveness mandate over the last decade and how acquisitions like Xilinx and Pensando expanded AMD into embedded and networking/interconnect needs for modern scale-out workloads.
- AMD’s shift from second-source heritage to broad compute portfolio
- Turnaround mandate under CEO Lisa Su and leadership team
- EPYC CPUs in cloud deployments; major presence in gaming consoles
- Xilinx acquisition expands embedded/FPGA footprint
- Pensando adds networking/interconnect for scaled workloads
- 4:42 – 9:13
Why AMD leaned into AI: heterogeneous compute strategy and the road to MI300
Elad asks how AMD’s AI focus emerged; Mark traces the GPU-driven breakthrough era (e.g., image recognition and NLP acceleration) and explains AMD’s deliberate sequencing. AMD first rebuilt CPU competitiveness with Zen, then scaled heterogeneous CPU+GPU systems for HPC and AI, culminating in the MI300 platform launch.
- AI acceleration via GPUs recognized as pivotal early on
- AMD prioritized rebuilding CPU leadership first (Zen in 2017)
- Long-term bet on heterogeneous compute (CPU + GPU) enabled by ATI acquisition
- HPC supercomputer wins as a proving ground for hardware + software co-design
- MI300 announced as flagship for HPC and AI training/inference
- 9:13 – 11:00
MI300 workloads and performance claims: training, inference, efficiency, and memory bandwidth
Mark details where AMD is most bullish: large-model training and especially inference at scale. He argues MI300 is competitive in training and leads in inference, attributing the gains to optimized math engines plus high memory bandwidth and capacity, which improve performance per watt and density per rack.
- Primary demand driver: LLM training and inference
- MI300 positioned as “halo” product to compete head-on at the top end
- Emphasis on inference leadership and FP16/vLLM-style throughput metrics
- Performance-per-watt and rack-space efficiency highlighted
- Memory bandwidth/capacity framed as critical to real-world efficiency
- 11:00 – 15:39
Software stack and developer adoption: ROCm, frameworks, and real deployment feedback loops
Sarah asks about competing beyond hardware—especially software ecosystems like CUDA vs ROCm. Mark explains AMD’s approach: support GPU semantics and key frameworks (PyTorch, ONNX, TensorFlow), partner with platforms like Hugging Face for continuous testing, and use early customer deployments to harden “easy to deploy” experiences.
- Competition is multi-dimensional: hardware, efficiency, and software ecosystem
- ROCm positioned as the enabling stack; focus on ease of deployment
- Deep involvement with PyTorch/ONNX/TensorFlow; PyTorch Foundation participation
- Hugging Face model testing on AMD hardware as part of release process
- Customer deployments (e.g., Lamini) used to refine usability and reliability
- 15:39 – 17:16
Open-source philosophy: avoiding lock-in and winning on merit
Mark explains why AMD emphasizes open-source tooling and stacks, from LLVM to ROCm. He frames it as a customer-choice strategy—eschewing proprietary “walled gardens”—and argues open ecosystems accelerate collaboration and keep the industry from stagnating due to lack of competition.
- Open source as a cultural commitment and collaboration lever
- LLVM and ROCm highlighted as core open components
- Goal: customer choice rather than proprietary lock-in
- Competition viewed as necessary to avoid industry stagnation
- Xilinx acquisition further strengthened AMD’s open-source posture
- 17:16 – 20:46
How AI compute markets expand: from hyperscalers to specialized clouds, edge, and AI PCs
Elad asks about the cloud compute market and how it changes as GPU supply constraints ease. Mark predicts constraints will eventually abate while demand continues to balloon, driving a proliferation of model sizes and tailored clusters and pushing inference outward—from hyperscalers to tier-2 providers, edge deployments, and AI-enabled PCs and embedded devices.
- GPU supply constraints expected to ease over time, while demand expands rapidly
- Shift from hyperscaler-only clusters to more diverse data center operators
- Rise of smaller/fine-tuned models enables more varied compute configurations
- Inference moving to the edge for latency and data locality (factory floor, etc.)
- AI accelerators integrated into PCs; embedded pull via Xilinx portfolio
- 20:46 – 24:28
What constrains GPU supply: wafers, substrates, advanced packaging, and looming power limits
The discussion turns to what’s actually limiting GPU availability. Mark explains that it’s not just fab capacity: substrates and advanced packaging (chiplets, HBM integration, vertical/lateral interconnect) are key bottlenecks, and he flags data center power availability as a major longer-term constraint—driving a stronger focus on energy efficiency each generation.
- Supply-demand management lessons from the pandemic era
- Constraints include substrates and advanced packaging capacity, not only wafers
- MI300 exemplifies complex chiplet + HBM packaging and interconnect requirements
- AMD’s fabless model and partnerships (e.g., TSMC) emphasized
- Power availability in data centers described as a key future constraint; efficiency becomes top priority
- 24:28 – 28:05
Post–Moore’s Law innovation: chiplets, heterogeneous engines, packaging, and end-to-end co-optimization
Sarah asks what innovation looks like when transistor scaling no longer does “most of the heavy lifting.” Mark describes a holistic design approach: mix specialized compute engines, use chiplets on the most suitable process nodes, advance packaging/interconnect (including 3D concepts), and optimize up through the software stack and application requirements.
- Moore’s Law slowing increases cost and reduces automatic power/perf gains
- Holistic design replaces node-shrink-only progress
- Heterogeneous compute: right engine for the right workload (CPU/GPU/dedicated accelerators)
- Chiplets enable mixing nodes and functions for better economics/performance
- Packaging/interconnect plus software-stack awareness are core to next-gen gains
- 28:05 – 30:41
Supply chain and geopolitics: geographic diversification of fabs and packaging ecosystems
Sarah raises strategic concerns around supply chain concentration and geopolitical risk. Mark outlines AMD’s approach: collaborate with governments and partners to expand and diversify manufacturing and packaging footprints across regions, noting that semiconductor supply chain shifts take years compared to software product cycles.
- Semiconductor supply continuity framed as national-security relevant
- Support for fab expansion and geographic diversification (e.g., US and global builds)
- Packaging and broader ecosystem diversification required, not just foundries
- Acknowledges historical global “pockets of expertise” and need to rebalance
- Long lead times: supply chain changes take years, unlike software iteration
- 30:41 – 36:03
New AI-first hardware wave: what predicts success in devices like AR/VR and assistants
Elad asks about the resurgence of consumer hardware experimentation (Vision Pro, Rabbit, Humane, robotics). Mark credits shrinking, low-power compute as an enabler but says success depends on product-market pull—devices must deliver a genuinely loved capability and often create a new category rather than incremental improvement.
- Compute and power efficiency advancements enable new device categories
- AR/VR success requires low latency to avoid discomfort and deliver presence
- Technology is necessary but not sufficient for product success
- Successful devices solve real user needs and deliver “loveable” experiences
- AI in PCs could feel like a new category due to local, low-latency capabilities (e.g., live translation)
- 36:03 – 39:03
AMD’s 2024 focus: AI across the full portfolio and a major deployment year
Sarah closes by asking what AMD wants engineers and founders to know about 2024. Mark describes 2024 as a deployment year: AMD has AI-enabled its portfolio from cloud to edge to PCs and gaming, and now aims to be recognized in AI based on delivered performance, ecosystem readiness, and breadth of real-world deployments.
- 2024 positioned as a major rollout/deployment year for AMD’s AI capabilities
- AI enablement across cloud, edge, PCs, embedded, and gaming (e.g., AI upscaling)
- Goal: broader recognition beyond the incumbent leader via results and value
- Emphasis on partner ecosystem and end-to-end experiences spanning cloud + client
- Episode wrap-up and thanks