Stanford CS153 Frontier Systems | Andreas Blattmann from Black Forest Labs on Visual Intelligence

For more information about Stanford's online Artificial Intelligence programs, visit: https://stanford.io/ai

Follow along with the course schedule and syllabus, visit: https://cs153.stanford.edu/

In this CS153 “Frontier Systems” session, Anjney Midha welcomes Andreas Blattmann, co-founder of Black Forest Labs and co-creator of Stable Diffusion, for a discussion on the visual intelligence frontier and how frontier AI “factories” scale. Blattmann recounts his path from mechanical engineering to a Heidelberg PhD lab, developing latent diffusion to train image generators efficiently and enabling Stable Diffusion’s 2022 release. They contrast earlier unimodal content-creation models with today’s push toward unified multimodal systems spanning images, video, and audio, plus action prediction for computer use and robotics, emphasizing observation and interaction loops. Using Flux as a case study, they cover pre-training, mid-training, post-training, distillation for speed, customer feedback driving image editing and character consistency, and why open weights enable customization. They also discuss Self Flow for multimodal alignment, safety guardrails, EU compliance, data labeling strategies, diffusion vs autoregressive tradeoffs, and skepticism about explicit 3D representations.

Guest Speaker: Andreas Blattmann is the co-founder of Black Forest Labs (BFL), the German generative AI startup behind the FLUX text-to-image foundation model, backed by Andreessen Horowitz and other major venture firms. Before founding BFL, he was a generative AI researcher at LMU Munich, NVIDIA, and Stability AI, where he made significant contributions to image and video generation. He is a co-inventor of Latent Diffusion, the generative modeling technique that produced the open-source text-to-image system Stable Diffusion (which he co-developed) and now powers cutting-edge models, including FLUX, Midjourney, and OpenAI's DALL-E 3, with applications extending into audio generation and medical imaging. His academic publications have amassed over 22,000 citations. He was named to Capital Magazin's Top 40 Under 40 in Germany in 2024.

Follow the playlist: https://youtube.com/playlist?list=PLoROMvodv4rN447WKQ5oz_YdYbS74M5IA&si=DOJ5amlyRdyMJBhG

Anjney Midha (host), Andreas Blattmann (guest)

May 4, 2026 | 1h 1m

EPISODE INFO

Released: May 4, 2026
Duration: 1h 1m
Channel: Stanford Online


SPEAKERS

Anjney Midha
host
Host/instructor for Stanford CS153 Frontier Systems; investor and former operator (e.g., Discord) with startup experience including founding Ubiquity6.
Andreas Blattmann
guest
Co-founder of Black Forest Labs and generative vision researcher known for work on Stable Diffusion and BFL’s Flux models.

EPISODE SUMMARY

In this Stanford Online episode, "Stanford CS153 Frontier Systems | Andreas Blattmann from Black Forest Labs on Visual Intelligence," Anjney Midha and Andreas Blattmann explore Black Forest Labs' path from Stable Diffusion to multimodal visual intelligence. Blattmann traces his journey from a small Heidelberg lab to co-creating latent diffusion, the technique that enabled Stable Diffusion: it moves generation from pixel space into lower-dimensional latent representations, which drastically reduces compute requirements.
