No Priors Ep. 135 | With Humans& Founder Eric Zelikman
Sarah Guo and Eric Zelikman on From IQ to EQ: Building Human-Centric AI That Truly Collaborates.
In this episode of No Priors, Sarah Guo talks with Eric Zelikman, former Stanford researcher and xAI lead, about his work on advancing AI reasoning through methods like STaR and Q* and his shift toward building more human-centric systems.
At a glance
WHAT IT’S REALLY ABOUT
From IQ to EQ: Building Human-Centric AI That Truly Collaborates
- Eric Zelikman, former Stanford researcher and xAI lead, discusses his work on advancing AI reasoning through methods like STaR and Q* and his shift toward building more human-centric systems.
- He explains how reinforcement learning and scalable reasoning have dramatically improved model 'IQ', yet current models still lack deep understanding of human goals, context, and long-term outcomes.
- Zelikman argues that the industry’s task-centric, single-turn benchmarks and automation mindset limit AI’s ability to genuinely empower people rather than replace them.
- His new company, Humans&, aims to build models that understand users over time, remember their context, and act as long-horizon collaborators that expand human potential instead of just automating existing GDP slices.
IDEAS WORTH REMEMBERING
7 ideas
Scaling reasoning via RL can continuously extend model capabilities.
STaR showed that by reinforcing successful chains of thought, models can progressively solve harder problems (e.g., arithmetic with increasing digit lengths) without a clear performance plateau, indicating strong scalability of reasoning-focused RL; a minimal sketch of this loop appears after this list.
Model performance is highly sensitive to context and problem framing.
Today’s models do best when given rich, precise context and tasks with clearly verifiable answers; users and product designers should structure interactions to include as much relevant information and clear evaluation criteria as possible.
Verification and training distribution still bound what models can reliably do.
In areas like code, success depends heavily on how close a task is to training distributions and how verifiable outputs are; out-of-domain, poorly verifiable problems still reliably expose model weaknesses.
Single-turn, task-centric training creates shallow, brittle AI behavior.
Optimizing for one-off responses leads to models that avoid asking clarifying questions, rarely model long-term consequences, and can exhibit issues like sycophancy and harmful advice without grasping downstream impact on users’ lives.
Long-term, human-in-the-loop collaboration can grow the economic pie.
Instead of just automating existing work segments, models that deeply understand people's goals, constraints, and aspirations can help them pursue entirely new, out-of-distribution projects, driving net new value and innovation.
Memory and persistent user modeling are underexploited but crucial.
Because current paradigms treat interactions as independent tasks, there has been little pressure to build strong long-term memory; Zelikman argues future systems must continuously learn about users and use that knowledge across sessions.
Building EQ in AI is both a capabilities and values choice.
Labs can design scaling paths that either sideline humans (fully autonomous long-horizon agents) or keep them central (cooperative systems that model human goals and agency); choosing the latter is an explicit design and research decision.
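To make the STaR idea above concrete, here is a minimal sketch of one bootstrapping iteration. The stubs `generate_rationale` and `finetune` are hypothetical stand-ins for a real model's sampling and training steps; this illustrates the reinforce-correct-rationales loop, not Zelikman's actual implementation.

```python
# Minimal sketch of a STaR-style (Self-Taught Reasoner) bootstrapping iteration.
# Hypothetical stubs: `generate_rationale` and `finetune` stand in for a real
# model's chain-of-thought sampling and fine-tuning steps.

from typing import Callable, List, Tuple

Problem = Tuple[str, str]  # (question, verifiable ground-truth answer)

def star_iteration(
    problems: List[Problem],
    generate_rationale: Callable[[str], Tuple[str, str]],  # question -> (rationale, answer)
    finetune: Callable[[List[Tuple[str, str]]], None],     # train on (question, rationale) pairs
) -> int:
    """One STaR round: keep only rationales that reach the correct answer,
    then fine-tune on them so harder problems become solvable next round."""
    kept: List[Tuple[str, str]] = []
    for question, answer in problems:
        rationale, predicted = generate_rationale(question)
        if predicted == answer:  # verifiable answers make the reward signal cheap
            kept.append((question, rationale))
    if kept:
        finetune(kept)
    return len(kept)  # how many chains of thought were reinforced this round

# Toy usage: a fake "model" that answers small sums, checked against ground truth.
if __name__ == "__main__":
    probs = [("12+34", "46"), ("9+8", "17")]
    fake_gen = lambda q: ("add the numbers column by column", str(eval(q)))
    reinforced = star_iteration(probs, fake_gen, lambda pairs: None)
    print(f"reinforced {reinforced} rationale(s)")
```

The key design point, per the episode, is that the filter-then-finetune loop needs no plateau in principle: each round's reinforced rationales shift the model's distribution toward problems one notch harder than before.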
WORDS WORTH SAVING
5 quotes
The role that [models] play in people's lives is a lot less deep, a lot less positive than it could be.
— Eric Zelikman
If you have a model that goes off and does its own thing for eight hours, people will probably feel less real agency over the things that they're building.
— Eric Zelikman
Fundamentally these models don't really understand people. They don't understand people's goals.
— Eric Zelikman
It's really remarkable that the field is kind of so stuck in this task‑centric regime.
— Eric Zelikman
We’re much more likely to solve a lot of these fundamental human problems by building models that are really good at collaborating with large groups of people.
— Eric Zelikman
QUESTIONS ANSWERED IN THIS EPISODE
5 questions
How do you practically collect and evaluate data for long-horizon, human-in-the-loop interactions without waiting months or years for outcomes?
What technical architecture or training changes are needed to give models robust, privacy-preserving memory and persistent user models?
How would you design new benchmarks that measure “life impact” or long-term user empowerment instead of single-task accuracy?
In what concrete scenarios do you expect human–AI collaboration to create net new value, rather than just replacing existing jobs or workflows?
What are the main safety, alignment, and consent challenges when building models that deeply model individuals’ goals, preferences, and weaknesses?