Lenny's Podcast

Kevin Weil: Why evals are the new core skill in AI products

Through fine-tuning runs and evals written against fuzzy outputs, OpenAI builds at the edge of model capabilities, betting on better models every two months.

Kevin Weil (guest), Lenny Rachitsky (host)

Apr 10, 2025 · 1h 31m

WHAT IT’S REALLY ABOUT

OpenAI’s CPO on building products atop rapidly evolving AI foundations

Kevin Weil, Chief Product Officer at OpenAI, explains how building on AI is fundamentally different from past tech shifts because the underlying capabilities improve dramatically every few months. This forces product teams to plan loosely, ship quickly, and design around fuzzy, probabilistic model behavior instead of deterministic software. He highlights the rising importance of evals, fine-tuning, and ensembles of models, and argues that every serious product team will eventually embed ML researchers as core members. Weil also reflects on missed opportunities like Facebook’s Libra, the transformative potential of AI tutoring, and the skills he’s encouraging his kids (and future builders) to develop in an AI-first world.

IDEAS WORTH REMEMBERING

5 ideas

Treat today’s AI as the worst you’ll ever use—build for where capabilities are going, not where they are.

Weil stresses a “model maximalist” mindset: if your product idea is barely feasible with current models, you’re probably in the right spot, because models are improving so quickly that what barely works now will feel magical in a few months.

Mastering evals is becoming a core product-building skill.

Because models are fuzzy and probabilistic, you must measure how well they perform on specific tasks (e.g., 60% vs. 95% vs. 99.5% accuracy) to decide what product experiences are viable, how to design them, and when fine-tuning or additional data are needed.
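As a rough illustration of the idea above, an eval can be as simple as a set of prompts with expected answers and a score. The sketch below is not OpenAI's tooling: `run_eval`, `product_posture`, `toy_model`, and the threshold-to-posture mapping are all hypothetical names and values chosen for illustration, loosely echoing the 60% / 95% / 99.5% figures mentioned in the summary.

```python
from typing import Callable


def run_eval(model: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    """Return the fraction of cases where the model's answer matches the expected one."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = 0
    for prompt, expected in cases:
        # Normalize whitespace and case so trivial formatting differences do not count as failures.
        if model(prompt).strip().lower() == expected.strip().lower():
            passed += 1
    return passed / len(cases)


def product_posture(accuracy: float) -> str:
    """Map measured accuracy to a product design choice (hypothetical thresholds)."""
    if accuracy >= 0.995:
        return "automate"
    if accuracy >= 0.95:
        return "automate_with_review"
    if accuracy >= 0.60:
        return "suggest_only"
    return "not_viable_yet"


def toy_model(prompt: str) -> str:
    """Stand-in for a real model call; one answer is deliberately wrong."""
    answers = {
        "2+2": "4",
        "capital of France": "paris",
        "3*3": "6",
        "opposite of hot": "Cold ",
    }
    return answers.get(prompt, "unknown")


CASES = [
    ("2+2", "4"),
    ("capital of France", "Paris"),
    ("3*3", "9"),
    ("opposite of hot", "cold"),
]

accuracy = run_eval(toy_model, CASES)
posture = product_posture(accuracy)
```

In practice the exact-match check would usually be replaced by a task-specific grader, and the case set would be far larger; the point is only that the measured number, not intuition, drives the design decision.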

Future products will rely on fine-tuned, task-specific ensembles of models, not just a single general model call.

OpenAI internally uses multiple models of different sizes, prompts, and fine-tunes in concert (an ensemble) to solve problems like support, demonstrating that breaking workflows into specialized sub-tasks yields better results and should become standard practice.
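A minimal sketch of what breaking a workflow into specialized sub-tasks might look like, assuming each "model" is just a callable and that routing is done with simple keyword checks. The names `Specialist`, `route`, and the three handlers are hypothetical; the summary does not describe how OpenAI's internal ensemble is actually wired.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Specialist:
    """One member of the ensemble: a name, a test for whether it applies, and a handler."""
    name: str
    handles: Callable[[str], bool]
    run: Callable[[str], str]


def route(ticket: str, specialists: list[Specialist], fallback: Specialist) -> tuple[str, str]:
    """Send the ticket to the first matching specialist, else to the general fallback."""
    for specialist in specialists:
        if specialist.handles(ticket):
            return specialist.name, specialist.run(ticket)
    return fallback.name, fallback.run(ticket)


# Hypothetical stand-ins for a small fine-tuned model, a mid-size model, and a general model.
billing = Specialist(
    name="billing_small_finetune",
    handles=lambda t: "refund" in t.lower() or "invoice" in t.lower(),
    run=lambda t: "Routing to billing workflow.",
)
outage = Specialist(
    name="incident_mid_model",
    handles=lambda t: "down" in t.lower() or "error" in t.lower(),
    run=lambda t: "Checking service status.",
)
general = Specialist(
    name="general_large_model",
    handles=lambda t: True,
    run=lambda t: "Handing off to the general assistant.",
)

chosen, reply = route("I need a refund for my last invoice", [billing, outage], general)
```

A real system would likely use a classifier model rather than keyword checks for routing, but the shape is the same: cheap, specialized components handle the cases they are good at, and a larger general model catches the rest.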

Org design matters: small, high-agency, PM-light teams move fastest in AI.

OpenAI keeps relatively few PMs, leans heavily on product-minded engineers and researchers, and uses lightweight quarterly planning, empowering teams to ship without waiting on top-down approvals, then iterate in public.

Think about AI like people: human analogies often guide better product and UX decisions.

Weil repeatedly reasons about models as if they were humans or teams (e.g., letting a model “think” like a person, using multiple models like a brainstorming group), which helps design interfaces (like showing partial chain-of-thought) and system architectures that feel natural.

WORDS WORTH SAVING

5 quotes

The AI model that you’re using today is the worst AI model you will ever use for the rest of your life.

— Kevin Weil

Every two months, computers can do something they’ve never been able to do before and you need to completely think differently about what you’re doing.

— Kevin Weil

If you’re building and the product you’re building is right on the edge of the capabilities of the models, keep going because you’re doing something right.

— Kevin Weil

Sometimes it’s not any one thing, it’s just good work consistently over a long period of time.

— Mark Zuckerberg (as quoted by Kevin Weil)

Libra is probably the biggest disappointment of my career… the world would be a better place if we’d been able to ship that product.

— Kevin Weil

How OpenAI builds products on top of rapidly improving AI models

The role and rising importance of evals, fine-tuning, and ensembles

Org design at OpenAI: bottoms-up teams, PM-light culture, research–product integration

Implications of AI for coding, product work, and future skills

Startup opportunities in the AI ecosystem and where OpenAI won’t go

The future of interfaces: why chat and natural language remain central

Reflections on Facebook’s Libra project and AI’s broader societal impact

High quality AI-generated summary created from speaker-labeled transcript.
