
No Priors Ep. 40 | With Arthur Mensch, CEO Mistral AI
Sarah Guo (host), Arthur Mensch (guest), Elad Gil (host), Narrator
In this episode of No Priors, hosts Sarah Guo and Elad Gil talk with Arthur Mensch, CEO and co-founder of Mistral AI.
Mistral CEO Arthur Mensch Champions Efficient, Open-Source Frontier AI Models
Arthur Mensch, CEO and co-founder of Mistral AI, explains how his team leverages a decade of optimization and scaling-law research to build highly efficient, small open-source language models like Mistral 7B. He argues that careful data curation, compression, and attention to inference cost can deliver models that run cheaply on commodity hardware while remaining surprisingly capable. Mensch strongly defends open source as essential for scientific progress and safety, criticizing current regulatory narratives around AI risk—especially bioweapons and arbitrary compute thresholds—as largely unsubstantiated and prone to regulatory capture. He outlines Mistral’s modular approach to safety and guardrails, its plans for larger models and agents, and why Europe, particularly France, is well-positioned to host a major global AI company.
Key Takeaways
Optimize both training and inference to make AI economically usable at scale.
Mensch stresses that frontier models must be designed not only for raw benchmark performance but for low inference cost, enabling agents and ubiquitous deployment without prohibitive runtime expenses.
Small, well-trained models can be far more capable than expected.
By applying improved scaling laws and compression insights, Mistral 7B shows that a 7B-parameter model can be both fast and useful, running on devices like a MacBook Pro while matching or surpassing larger models on many tasks.
High-quality data curation is as critical as algorithmic innovation.
Mistral invests heavily in selecting and cleaning open web data for pre-training, treating data quality as a primary driver of model performance, distinct from later-stage instruction tuning.
Open sourcing current LLMs likely does not materially increase misuse risk.
Mensch argues there is no solid evidence that LLMs provide more dangerous capabilities than search engines for tasks like bioweapons, nor that knowledge access is the bottleneck for such misuse; thus blanket restrictions on open source are scientifically unfounded.
Safety should be implemented as modular guardrails, not baked-in censorship.
He advocates shipping raw models together with configurable filters applied to inputs and outputs, rather than baking moderation into the model weights themselves.
Capability regulation should focus on observable model behavior, not raw compute.
Mensch criticizes arbitrary FLOP thresholds as a poor proxy for danger, emphasizing that risk depends heavily on data, domain, and demonstrated capabilities rather than pre-training budgets.
Europe, especially France, has the talent and ecosystem to build global AI leaders.
He cites deep mathematical talent, growing startup and investor ecosystems in places like Paris and London, and a desire among European researchers to stay local as foundations for a major European AI company.
Notable Quotes
“We realized that there was also a lot of opportunity in actually compressing models more… with Mistral 7B we were definitely far away from the limit of compression.”
— Arthur Mensch
“By doing what we do, by being much more open about the technology we create, we want to steer the community into a regime where things just work better, where things are safer because of more scrutiny.”
— Arthur Mensch
“Nothing is showing that a LLM is actually marginally better than a search engine to find knowledge on topics that would enable bad use.”
— Arthur Mensch
“Assuming that the model should be well behaved is, I think, a wrong assumption. You need to make the assumption that the model should know everything and then on top of that have some modules that moderate and guardrail the model.”
— Arthur Mensch
“I’m not too worried about existential risk… There’s no evidence whatsoever that we are on the way of making that happen.”
— Arthur Mensch
Questions Answered in This Episode
How far can model compression and architectural innovation go before larger models become strictly necessary for advanced reasoning?
What empirical studies would be needed to convincingly demonstrate that LLMs do or do not increase real-world physical risk, such as in biology or cybersecurity?
How should regulators practically measure and categorize “dangerous capabilities” in AI systems instead of relying on compute thresholds?
In a modular-guardrail world, who should set the norms or standards for what outputs and inputs must be filtered across different jurisdictions?
What unique governance or business models could a European open-source AI champion like Mistral adopt that differ from U.S.-based incumbents such as OpenAI or Google?
Transcript Preview
(electronic music) Open source AI models have completely changed the landscape of technology over the past year. One tiny team of ex-DeepMind and Meta researchers in France has made a huge splash recently, Mistral. This week, Elad and I are joined by Arthur Mensch, the CEO and co-founder of Mistral, who recently released Mistral 7B, an Apache 2.0 licensed open source model that has changed people's mental models about what can be done with small models. Arthur, welcome to No Priors.
Thank you for inviting me. I'm very glad to be here.
Okay. So, just six months ago, when we met, you were leaving DeepMind to start Mistral. It takes real guts to look at the scale of dollars and compute that OpenAI, and Google, and others have amassed and say like, "We want to play in this game too, and it's important we do." Uh, tell us about the inspiration to start.
Guillaume, Timothée, and I were, I guess, pretty early in the field; we had been doing machine learning for 10 years. And we did know where to start from and how to make a good model with a limited amount of compute and money. Well, not so limited, but at least more limited than where we were coming from. I think that's what got us started. The various companies we were in moved in directions that we hadn't anticipated when we joined. And we decided there was a very good opportunity to create a standalone company in Europe, focusing on making AI better, focusing on making frontier AI, and focusing on open source AI as a core value.
Maybe we can talk about each of those pieces. So, 10 years in machine learning before: you were a co-author on the Chinchilla scaling-laws paper, and you worked on the mixture-of-experts ideas early. Can you talk a little bit about what your research directions were at DeepMind?
Yeah. So, I come from an optimization background, so my focus for the last 10 years has always been to make algorithms more efficient and to make better use of the data we have, to build models with good prediction performance. When I arrived at DeepMind, I joined the LLM team, which was 10 people at the time. Very quickly, I started to work on retrieval-augmented models, with a paper called RETRO that I co-led with my friend Seb Borgeaud, who is still at DeepMind. The point there was to use very large databases during pre-training, so that we didn't force knowledge into the model itself; we would tell the model that it would have access to an external memory anyway. And it was working quite well: we could actually lower the perplexity, let's say, which is what you work on when you make LLMs. There were some limitations that I think the community has started to address quite well. That was at a time when retrieval methods weren't really mainstream; now they've become completely mainstream. So that's the first project I did. I also worked on sparse mixture of experts quite early, because that was related to the topic of my post-doc, which was optimal transport. Optimal transport is a setting where you have, I guess, tokens, and you need to assign them to devices, and you need to make sure there's a good assignment between the two so that the devices don't see too many tokens. As it turns out, optimal transport is a mathematical framework for doing this correctly. So I started to work on introducing it to sparse mixture of experts. And very quickly, we started to move on to scaling laws.
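The balanced token-to-expert assignment Mensch describes can be sketched with entropic optimal transport via Sinkhorn iterations. This is an illustrative toy under my own assumptions (function name, sizes, and iteration count are invented here), not Mistral's or DeepMind's actual routing code:

```python
import numpy as np

def sinkhorn_route(scores, n_iters=50):
    """Turn router logits of shape (tokens, experts) into a soft
    assignment whose rows sum to 1 (each token fully routed) and whose
    columns sum to tokens/experts (each expert gets an equal load)."""
    n_tok, n_exp = scores.shape
    p = np.exp(scores)                      # positive kernel from logits
    for _ in range(n_iters):
        p /= p.sum(axis=1, keepdims=True)   # rows: each token routes 1 unit
        p /= p.sum(axis=0, keepdims=True)   # columns: rescale experts...
        p *= n_tok / n_exp                  # ...to equal per-expert capacity
    return p

# 8 tokens routed across 4 experts: each expert ends up with ~2 tokens of load.
rng = np.random.default_rng(0)
plan = sinkhorn_route(rng.normal(size=(8, 4)))
print(plan.sum(axis=0))
```

The alternating row/column normalization is the Sinkhorn fixed-point scheme; it converges to an assignment that follows the router's preferences while enforcing the capacity constraint, which is the balance property Mensch highlights.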
So, how do you take a method that works at a certain scale and predict how it will evolve with scale, the number of experts, and the amount of data you see? That's work I did with many colleagues as well: how you adapt the scaling laws for dense models to a setting where you want to predict performance not only as a function of model size but also the number of experts, because that was the thing I was working on. And then, relatedly, I worked on Chinchilla, which is, I think, a major paper in the history of LLMs, also with Seb, Jordan, Laurent, and many other people. Basically, the story was that everybody was training models on too few tokens, because of the paper from 2020 that happened to be not very well executed. What we observed is that you could actually correct that. So instead of training very large models on very few tokens, you should grow the number of tokens as you grow the size of the model, which, if you think about it, makes a lot of sense: you don't want an infinite-size model looking at a finite number of tokens, and similarly, you don't want a finite-size model looking at an infinite number of tokens.
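The Chinchilla prescription Mensch summarizes (grow tokens in proportion to parameters) can be sketched numerically. This is a hedged illustration using the commonly cited approximations, training compute C ≈ 6·N·D and roughly 20 training tokens per parameter, rather than the paper's exact fitted exponents:

```python
def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Split a training-compute budget C ~= 6 * N * D between parameters N
    and tokens D, holding D/N fixed at tokens_per_param.

    With C = 6*N*D and D = r*N, solving gives N = sqrt(C / (6*r))."""
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# A ~5.9e23-FLOP budget recovers roughly Chinchilla's shape:
# about 70B parameters trained on about 1.4T tokens.
n, d = chinchilla_optimal(5.88e23)
print(f"params ~ {n/1e9:.0f}B, tokens ~ {d/1e9:.0f}B")
```

Because both N and D scale as the square root of compute under this rule, doubling the budget grows the model and the dataset together, which is exactly the correction to "very large models on very few tokens" that Mensch describes.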