
No Priors Ep. 53 | With AMD CTO Mark Papermaster
Sarah Guo (host), Mark Papermaster (guest), Elad Gil (host)
In this episode of No Priors, hosts Sarah Guo and Elad Gil speak with AMD CTO Mark Papermaster.
AMD CTO Reveals Strategy Powering Next-Generation AI Chips and Computing
AMD CTO Mark Papermaster discusses AMD’s decade-long transformation from a struggling PC-focused company into a central player in high-performance computing and AI. He explains how AMD built a competitive CPU and GPU portfolio, culminating in the MI300 accelerator for large-scale AI training and inference. The conversation covers open-source software strategy, supply-chain and packaging constraints, energy efficiency, and the impact of Moore’s Law slowing. Papermaster also outlines how AI will increasingly span cloud, edge, and end-user devices, making 2024 a major deployment year for AMD’s AI-enabled portfolio.
Key Takeaways
AMD’s AI strategy is built on a long-term CPU and GPU roadmap.
Before attacking AI head-on, AMD rebuilt its Zen CPU line and strengthened GPUs, enabling it to offer competitive heterogeneous systems that pair high-performance CPUs with massively parallel GPUs.
The MI300 targets leading-edge LLM training and inference workloads.
MI300 variants are designed for both high-performance computing and AI, pairing strong training performance with leading FP16 inference efficiency on large language models, achieved by tightly coupling math engines with high-bandwidth memory through advanced packaging.
Software ecosystem and openness are critical competitive levers against incumbents.
AMD’s ROCm stack is open source, tightly integrated with PyTorch, ONNX, TensorFlow, and platforms like Hugging Face, aiming to make porting workloads from CUDA-like environments straightforward and avoid vendor lock-in.
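To make the porting claim concrete: ROCm builds of PyTorch expose the familiar torch.cuda API on top of AMD's HIP runtime, so device-agnostic code typically runs unchanged. The following is a minimal sketch, assuming a ROCm build of PyTorch and a supported AMD GPU; the toy model and tensor shapes are illustrative, not from the episode.

```python
import torch
import torch.nn as nn

# On ROCm builds of PyTorch the CUDA API surface is mapped onto HIP,
# so torch.cuda.is_available() returns True on supported AMD GPUs.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# An arbitrary toy model; nothing in the source is vendor-specific.
model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.GELU(),
    nn.Linear(4096, 1024),
).to(device)

x = torch.randn(8, 1024, device=device)
with torch.no_grad():
    y = model(x)

# torch.version.hip is a version string on ROCm builds and None on
# CUDA builds, one way to confirm which backend is actually in use.
print(y.shape, "HIP runtime:", torch.version.hip)
```

The same script runs on an NVIDIA build without modification; friction tends to appear lower in the stack, in hand-written CUDA kernels, which ROCm's HIPIFY tools help translate.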
GPU supply constraints are easing, but power and packaging are rising bottlenecks.
Wafer capacity, advanced packaging, and substrates have been key constraints, but industry expansion—especially via partners like TSMC—is addressing them; long term, data center power availability and energy efficiency become the dominant challenges.
Innovation beyond Moore’s Law requires holistic, system-level design.
With node shrinks delivering less automatic benefit and higher cost, AMD leans on chiplets, heterogeneous compute engines, advanced 2D/3D packaging, and co-designed software stacks to keep performance-per-watt and capability improving.
AI will be distributed across cloud, edge, and end devices for latency and cost.
Massive LLMs will remain in hyperscale clouds, but fine-tuned and smaller models will increasingly live in tier-two data centers, at the edge, and on PCs/embedded devices to meet low-latency, application-specific needs.
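As a rough illustration of the orchestration problem this split creates, here is a hedged sketch of a request router; the tier names, thresholds, and token estimate are hypothetical, not anything described in the episode.

```python
# A toy dispatcher choosing where a request should run. All names and
# thresholds are hypothetical illustrations, not values from the episode.

LOCAL_MAX_TOKENS = 256        # rough size a small on-device model handles well
EDGE_LATENCY_BUDGET_MS = 150  # below this, a round trip to a far cloud is too slow

def route(prompt: str, latency_budget_ms: float) -> str:
    """Pick an execution tier for a single request (illustrative policy only)."""
    est_tokens = len(prompt.split()) * 2  # crude token-count proxy
    if est_tokens <= LOCAL_MAX_TOKENS:
        return "device"  # e.g., an AI-enabled PC or embedded NPU
    if latency_budget_ms <= EDGE_LATENCY_BUDGET_MS:
        return "edge"    # fine-tuned model in a nearby tier-two data center
    return "cloud"       # large LLM in a hyperscale data center

if __name__ == "__main__":
    long_prompt = "draft a detailed report " * 200
    print(route("summarize this paragraph", 50))   # -> device
    print(route(long_prompt, 100))                 # -> edge
    print(route(long_prompt, 2000))                # -> cloud
```

A production policy would also weigh cost, privacy, and model availability, which is the kind of missing abstraction the questions below allude to.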
Geographic diversification of semiconductor manufacturing is now a strategic necessity.
Given national security and geopolitical risks, AMD works with foundry partners like TSMC and Samsung as they build fabs and packaging capacity in the US, Europe, and additional regions to ensure more resilient supply chains.
Notable Quotes
“We’re not about locking in someone with a proprietary walled garden software stack. We want to win with the best solution.”
— Mark Papermaster
“It was clear that the industry needed that powerful combination of the serial computing of CPUs and the massive parallelization you get from a GPU.”
— Mark Papermaster
“With Moore’s Law slowing, it demands what I call holistic design—from transistor design all the way up through packaging and the software stack.”
— Mark Papermaster
“This is a huge year for us because we’ve just completed AI-enabling our entire portfolio—cloud, edge, PCs, embedded, and gaming.”
— Mark Papermaster
“The devices that are successful really serve a need… it’s got to be something that you love, and that creates a new category.”
— Mark Papermaster
Questions Answered in This Episode
How difficult is it in practice for large AI teams to migrate existing CUDA-based workloads to ROCm and AMD GPUs, and where are the main friction points?
What specific architectural bets (e.g., chiplet configurations, memory hierarchies) is AMD making for post-LLM AI workloads beyond today’s transformer-heavy landscape?
How will rising data center power constraints reshape GPU and system design over the next 5–10 years, and could this favor more specialized accelerators over general-purpose GPUs?
In a world of distributed AI between cloud, edge, and device, what are the missing abstractions or developer tools to orchestrate where different parts of a model run?
Given growing geopolitical risk, how far can geographic diversification of fabs and packaging realistically go without significantly increasing cost for customers?