Zico Kolter: OpenAI's Newest Board Member on The Biggest Questions and Concerns in AI Safety | E1197

Zico Kolter: OpenAI's Newest Board Member on The Biggest Questions and Concerns in AI Safety | E1197

The Twenty Minute VCSep 4, 20241h 3m

Zico Kolter (guest), Harry Stebbings (host), Harry Stebbings (host), Narrator

How LLMs work and why next-word prediction yields intelligenceData availability, multimodal data, and synthetic data as future fuelModel size, commoditization, and the role of small vs. large modelsCompute scaling, economic tradeoffs, and perceived performance plateausAI safety priorities: jailbreaks, specification-following, and harmful capabilitiesMisinformation, erosion of objective reality, and trust in institutionsRegulation, open vs. closed models, and global coordination on AI safety

In this episode of The Twenty Minute VC, featuring Zico Kolter and Harry Stebbings, Zico Kolter: OpenAI's Newest Board Member on The Biggest Questions and Concerns in AI Safety | E1197 explores openAI Board Member Zico Kolter Dissects Data, Safety, and AGI Futures Zico Kolter, head of CMU’s Machine Learning Department and new OpenAI board member, explains why next-word prediction LLMs are a profound scientific discovery and why we are far from hitting hard limits on data, models, or compute.

OpenAI Board Member Zico Kolter Dissects Data, Safety, and AGI Futures

Zico Kolter, head of CMU’s Machine Learning Department and new OpenAI board member, explains why next-word prediction LLMs are a profound scientific discovery and why we are far from hitting hard limits on data, models, or compute.

He argues that data is not the core bottleneck, model architectures are increasingly commoditized, and that larger models still deliver meaningful gains—especially in complex tasks like coding—despite benchmark plateaus.

Kolter’s central AI safety concern is that current models cannot reliably follow specifications, making them vulnerable to prompt injection and jailbreaks, which becomes dangerous as we embed them into critical systems and agents.

He favors a pragmatic focus on near-term risks like cyberattacks, misinformation, and infrastructure failures, is cautious but not absolutist on open-weight releases, and remains broadly optimistic that society can adapt if safety is treated as a prerequisite for deployment.

Key Takeaways

Data is not the near-term bottleneck for AI progress.

Despite having used much of the highest-quality public text, Kolter notes that models are currently trained on surprisingly small datasets (tens of terabytes) relative to what exists, and vast untapped multimodal and private data—constrained more by compute and methods than sheer availability—remain.

Get the full analysis with uListen AI

Model architectures matter less than scale, data, and training strategy.

Kolter believes we are in a “post-architecture” phase: transformers are useful but not uniquely magical, and many architectures could work if scaled and trained similarly; capabilities are driven more by data, size, and optimization than clever structural tweaks.

Get the full analysis with uListen AI

Larger frontier models still provide meaningful real-world gains.

While benchmarks show diminishing improvements (e. ...

Get the full analysis with uListen AI

The most urgent safety problem is unreliable adherence to specifications.

Because models can be prompt-injected or jailbroken, they often override developer instructions in favor of user prompts; this is tolerable in chatbots but becomes critical when LLMs are embedded in agents and infrastructure, effectively creating an unpatchable ‘buffer overflow’ style vulnerability.

Get the full analysis with uListen AI

AI will drastically lower the skill bar for serious cyber and other attacks.

Kolter highlights cyber risk as especially acute: models that can find software vulnerabilities or craft exploits could put powerful attack capabilities into the hands of many low-skill actors, making even known risks far more scalable and dangerous.

Get the full analysis with uListen AI

Misinformation’s main danger is undermining belief in any evidence, not just spreading lies.

He argues AI is an accelerant in a longer trend: as convincing deepfakes and synthetic media proliferate, the likely outcome is not that people believe everything but that they trust almost nothing outside their close circles, eroding shared objective reality.

Get the full analysis with uListen AI

Open-weight models are valuable but should be constrained at higher capability levels.

Kolter strongly values open-weight releases for research and ecosystem health today, but says there will be a point—e. ...

Get the full analysis with uListen AI

Notable Quotes

You can train word predictors and they produce intelligent, coherent, long-form responses; that is one of the most notable scientific discoveries of the past 10 or 20 years.

Zico Kolter

We are nowhere close to hitting the limits of available data in these models.

Zico Kolter

Right now the AI models we have are not able to reliably follow specifications.

Zico Kolter

This is sort of like these models have a buffer overflow in all of them that we know about and that we don’t know how to patch and fix.

Zico Kolter

I want to develop and improve safety of these tools because I want to use them. To reach that point, they have to be safe.

Zico Kolter

Questions Answered in This Episode

If specification-following is the core safety problem, what technical or governance mechanisms could most realistically ensure models obey fixed constraints even under adversarial prompting?

Zico Kolter, head of CMU’s Machine Learning Department and new OpenAI board member, explains why next-word prediction LLMs are a profound scientific discovery and why we are far from hitting hard limits on data, models, or compute.

Get the full analysis with uListen AI

How should policymakers decide where to draw the line for when an open-weight release becomes too dangerous—what specific capability thresholds or evaluation regimes should be used?

He argues that data is not the core bottleneck, model architectures are increasingly commoditized, and that larger models still deliver meaningful gains—especially in complex tasks like coding—despite benchmark plateaus.

Get the full analysis with uListen AI

In a world where people increasingly distrust all digital evidence, what new institutions or verification infrastructures could restore a workable sense of shared reality?

Kolter’s central AI safety concern is that current models cannot reliably follow specifications, making them vulnerable to prompt injection and jailbreaks, which becomes dangerous as we embed them into critical systems and agents.

Get the full analysis with uListen AI

Given Kolter’s view that architectures are commoditized, where will competitive and national advantage in AI primarily come from over the next decade—data, compute, safety, integration, or something else?

He favors a pragmatic focus on near-term risks like cyberattacks, misinformation, and infrastructure failures, is cautious but not absolutist on open-weight releases, and remains broadly optimistic that society can adapt if safety is treated as a prerequisite for deployment.

Get the full analysis with uListen AI

How can enterprises redesign workflows and roles so that humans and AI systems complement each other, rather than simply using AI for short-term cost-cutting and job elimination?

Get the full analysis with uListen AI

Transcript Preview

Zico Kolter

The real negative outcome is that people are not gonna believe anything that they see. It didn't even need AI to get there, but AI is absolutely an accelerant for this process. It is a relatively new phenomenon that we have sort of a record of objective fact in the world. Humans evolved at a time during a- an environment where all we could do was trust our close associates. That's how we believed things.

Harry Stebbings

Ready to go? (upbeat music) Ziko, I am so excited for this, dude. I've been looking forward to his one for a while. So thank you so much for joining me today.

Zico Kolter

Great. Thanks. Wonderful to be here.

Harry Stebbings

Now, we're gonna discuss some pretty meaty topics. Before we do dive in, can you just give me the 60-second context on why you're so well-versed to discuss them and your roles today?

Zico Kolter

So I- I seem to have- be collecting jobs here. I have a number of different roles. Um, but I am- I'm first and foremost a professor and the head of the Machine Learning Department at Carnegie Mellon. Uh, I've been here for about 12 years. And here, the- the Machine Learning Department is really kind of unique because it's a whole department just for machine learning. And I've been heading that up actually as of quite recently. Kinda get to immerse myself in the business and- and, uh, the thought of machine learning all day, every day. Also, I am, uh, recently on the board of OpenAI, uh, which w- I joined, uh, at this point a couple weeks ago, and it's been extremely exciting as well.

Harry Stebbings

Now, I wanna start with some foundations and mechanics. When we look at kind of the basic techniques that underpin current AI systems, can you help me understand, what are the basic techniques today behind current AI systems?

Zico Kolter

Let's talk about AI as LLMs, but with, of course, the context that AI is a much, much broader topic than this.

Harry Stebbings

Mm-hmm.

Zico Kolter

Um, LLMs are- are amazing. Um, the way they work, at the most basic level, is that you take a lot of data from the internet. You train a model, and I know that's a very sort of colloquial term that we use here. But basically what you do is you build a great big set of kind of mathematical equations that will learn to predict the words in the sequence that- that- that's given to them. So, you know, if you see "the quick brown fox" as your starting phrase of a sentence, it will predict the word "jumped." Uh, this is a common phrase we use to, I think, use every letter in the English language, uh, (laughs) in a single phrase. People often use that... A- and that's what it does. To be clear, we train a big model on predicting words on the internet. Um, and then, when it comes time to actually speak with an AI system, all we do is we use that model to predict what's the next word in a response. This is, to put it bluntly, a little bit absurd that this works. And I think people often- often sort of... So there's sort of two- two chains- two- two philosophies of thought here. People often use this sort of mechanism of how these models work as- as a way to dismiss them o- o- oftentimes. I know. People say, "Oh, well, AI is- it's just predicting words. That's all it's doing, therefore it can't be intelligent. It can't be..." And I- I think that's just demonstrably wrong. What I think is amazing though is the scientific fact that when you build a model like this, when you build a model that predicts words and then just turn this model loose, have it predict words one after the other and then chain them all together, what comes out of that process is intelligent. And I think it's demonstrably intelligent, right? I- I really believe these systems are intelligent, definitely. And I would say that this fact, you can train word predictors and they produce intelligent, coherent, long-form responses, is one of the most notable, if not the most notable, scientific discovery of the past 10, 20 years. Maybe much longer than that, right? May- maybe it was much deeper than that, in fact. And so, I really think that, um, this is not oftentimes given its due as a scientific discovery because it is a scientific discovery.

Install uListen to search the full transcript and get AI-powered insights

Get Full Transcript

Get more from every podcast

AI summaries, searchable transcripts, and fact-checking. Free forever.

Add to Chrome