No Priors Ep. 74 | With Google DeepMind VP of Research Oriol Vinyals

In this episode of No Priors, hosts Sarah and Elad are joined by Oriol Vinyals, VP of Research, Deep Learning Team Lead, at Google DeepMind and Technical Co-lead of the Gemini project. Oriol shares insights from his career in machine learning, including leading the AlphaStar team and building competitive StarCraft agents. We talk about Google DeepMind, forming the Gemini project, and integrating AI technology throughout Google products. Oriol also discusses the advancements and challenges in long context LLMs, reasoning capabilities of models, and the future direction of AI research and applications. The episode concludes with a reflection on AGI timelines, the importance of specialized research, and advice for future generations in navigating the evolving landscape of AI.

Sign up for new podcasts every week. Email feedback to show@no-priors.com

Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @oriolvinyalsml

Show Notes:
00:00 Introduction to Oriol Vinyals
00:55 The Gemini Project and Its Impact
02:04 AI in Google Search and Chat Models
08:29 Infinite Context Length and Its Applications
14:42 Scaling AI and Reward Functions
31:55 The Future of General Models and Specialization
38:14 Reflections on AGI and Personal Insights
43:09 Will the Next Generation Study Computer Science?
45:37 Closing thoughts

Sarah Guo (host) · Oriol Vinyals (guest) · Elad Gil (host)
Aug 1, 2024 · 46m

CHAPTERS

  1. 0:05 – 2:05

    Oriol Vinyals’ role at DeepMind and the post-merger research org

    Sarah opens by introducing Oriol’s background (from AlphaStar to Gemini) and asks how Google’s AI research is organized after a turbulent year. Oriol explains the two major structural shifts: forming the Gemini project and unifying research into Google DeepMind.

    • Oriol’s career arc: AlphaStar leadership to Gemini technical co-lead
    • Gemini formed by combining parallel LLM efforts (Google Brain + Legacy DeepMind)
    • Leadership alignment and team integration to ship the first Gemini model
    • Creation of a single AI/AGI research org: Google DeepMind
    • Gemini’s mission: a strong “core model” powering many products
  2. 2:05 – 4:29

    How Gemini interfaces with Google’s products—and the question of “AI replacing search”

    Oriol describes how a global research effort operates across California and London, including the practical benefit of “24/7” development. He outlines how model decisions flow into major Google product surfaces, culminating in Search integration.

    • Two-hub execution (California + London) and continuous training/development cadence
    • Research goals: push state-of-the-art and anticipate what comes next
    • Product pull: incorporating requirements from many Google product areas
    • Early focus surfaces: Cloud/enterprise, developers, Gemini app (chat)
    • Search as a critical downstream integration point with massive user impact
  3. 4:29 – 8:30

    Chat vs. traditional search: complementary UIs and how they may converge

    Elad probes which query types belong in chat versus classic search. Oriol frames the current state as experimentation, where chat can use Search as a tool for grounding while Search gains summarization and reasoning capabilities from LLMs.

    • LLM-first experiences (chatbots) benefit from tools like Search for citations/grounding
    • Hallucinations make retrieval/citations important for trust
    • Search is adding AI summaries and broader LLM-powered experiences
    • No clear “winner” yet; both interfaces serve different intents
    • At Google scale, product teams iterate with users while model teams stay open-minded
  4. 8:30 – 11:15

    What’s surprised Oriol: long context, video understanding, and new interfaces

    Oriol highlights “infinite context” as unexpectedly transformative once people started using it, especially for video and large-document Q&A. He connects this to emerging interface ideas like camera-based debugging and agentic phone experiences (Project Astra).

    • Long context feels “superhuman” when you can upload an hour-long video and ask anything
    • Contrast with RNN/LSTM “infinite memory” that didn’t work well in practice
    • Developers discovering novel workflows (e.g., pointing a camera at a screen to debug)
    • Project Astra as a vision: memory + agentic interaction on-device
    • Still early: magical demos exist, but mainstream product fit is evolving
  5. 11:15 – 13:09

    When will very long context be widely deployed? Practical timelines and constraints

    Elad asks when long context windows become common in enterprise and consumer settings. Oriol suggests the technology is close and expects a near-term jump in “commodity” context lengths, with further gains driven by compelling use cases and hardware progress.

    • Deployment depends on compelling use cases as much as raw capability
    • Hardware and systems co-evolve with research breakthroughs
    • Expectation: 10× increases in common context windows within ~1–2 years
    • Extremely long context will continue as a research driver
    • Techniques will mix: hierarchical memory, RAG, and other hybrids
  6. 13:09 – 14:42

    Retrieval and hierarchical memory in an “infinite context” world

    Sarah asks whether retrieval remains relevant if context becomes effectively unbounded. Oriol argues hierarchical memory and retrieval stay valuable for efficiency and quality, but notes the tradeoff: retrieval compresses information while full-context reasoning preserves fine detail.

    • Hierarchical memory improves efficiency of learning and recall
    • Retrieval systems risk oversimplification (e.g., a whole book as one vector)
    • Full-context models can reason over “every word,” improving fidelity
    • Likely future is hybrid: long context plus selective retrieval
    • Google’s deep experience combining retrieval with neural methods
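The tradeoff described above, where compressing a text into a single vector can discard detail that reading every word preserves, can be illustrated with a toy sketch. This is an assumption-laden illustration, not how any production retrieval system or Gemini works: `single_vector` is a hypothetical bag-of-words stand-in for an embedding, and `word_before` is a hypothetical stand-in for reasoning over the full text.

```python
from collections import Counter


def single_vector(text):
    """Toy 'embedding' (hypothetical): collapse a text into one bag-of-words vector."""
    return Counter(text.lower().split())


def word_before(text, target):
    """Toy 'full-context' lookup (hypothetical): return the word just before target."""
    words = text.lower().split()
    for i in range(1, len(words)):
        if words[i] == target:
            return words[i - 1]
    return None


doc_a = "the dog bit the man"
doc_b = "the man bit the dog"

# Both texts collapse to the same vector, so the vector cannot say who bit whom.
print(single_vector(doc_a) == single_vector(doc_b))

# Reading every word in order keeps that detail.
print(word_before(doc_a, "bit"), word_before(doc_b, "bit"))
```

Real embeddings are learned and far richer than word counts, but the same pressure applies: the more text one vector has to summarize, the more fine detail it has to drop.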
  7. 14:42 – 19:28

    Where LLMs still fall short: crisp reasoning, reliability, and hallucinations

    Oriol places today’s moment in the arc from general algorithms (the deep learning era) to general models (LLMs), now extended by multimodality. He stresses that models can reason, but not consistently: they can solve very hard problems yet fail easy ones, which makes robustness a key frontier.

    • Field expansion: more researchers, open source, and accessibility driving progress
    • 2010–2020: general algorithms applied to specialized domains (vision, speech, games)
    • LLM era: language modeling as a powerful abstraction enabling general models
    • Multimodality extends generality across vision/audio/video
    • Open problem: making reasoning accurate and dependable (less contradiction/hallucination)
  8. 19:28 – 22:03

    What “crisp and accurate” reasoning means: probability, errors at scale, and verification loops

    Sarah asks for the difference between having reasoning and having crisp reasoning. Oriol explains that models assign probabilities over many possible outputs, so errors never fully disappear; improving “crispness” means driving error probability down via scale and via extra computation/verification processes around the model.

    • LLMs maintain a probability distribution that gives non-zero mass to wrong outputs
    • Even small error rates become visible when deployed globally
    • Scaling may reduce errors, but may not be sufficient alone
    • Add redundancy/logic/verification—e.g., wrapping LLMs with small programs or procedures
    • Goal: use more compute at inference to approach “as good as humans” reliability
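The points above (a non-zero chance of a wrong output, errors becoming visible at scale, and extra inference-time computation wrapped around the model) can be sketched with a small simulation. Everything here is an illustrative assumption rather than a method described in the episode: `noisy_model` is a hypothetical stand-in for an LLM that answers an addition question incorrectly with a fixed probability, and majority voting is just one simple form of redundancy.

```python
import random
from collections import Counter


def noisy_model(a, b, error_rate, rng):
    """Hypothetical stand-in for an LLM: adds a and b, but errs with probability error_rate."""
    correct = a + b
    if rng.random() < error_rate:
        return correct + rng.choice([-1, 1])  # a plausible-looking wrong answer
    return correct


def majority_vote(a, b, error_rate, rng, samples=5):
    """Wrap the model in a small program: sample several answers, keep the most common."""
    answers = [noisy_model(a, b, error_rate, rng) for _ in range(samples)]
    return Counter(answers).most_common(1)[0][0]


def measure_error(answer_fn, trials, rng):
    """Fraction of random addition questions that answer_fn gets wrong."""
    wrong = 0
    for _ in range(trials):
        a, b = rng.randint(0, 99), rng.randint(0, 99)
        if answer_fn(a, b) != a + b:
            wrong += 1
    return wrong / trials


def expected_failures(error_rate, queries):
    """Expected number of wrong answers when the same error rate meets many queries."""
    return error_rate * queries


if __name__ == "__main__":
    rng = random.Random(0)
    single = measure_error(lambda a, b: noisy_model(a, b, 0.1, rng), 5000, rng)
    voted = measure_error(lambda a, b: majority_vote(a, b, 0.1, rng), 5000, rng)
    print(f"single-sample error rate: {single:.4f}")
    print(f"5-sample majority error rate: {voted:.4f}")
    # A 0.1% error rate over a billion queries is still about a million failures.
    print(f"expected failures: {expected_failures(0.001, 1_000_000_000):,.0f}")
```

The voted answer costs five model calls instead of one, which is the trade being described: spending more compute at inference to push the error probability down, without ever reaching exactly zero.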
  9. 22:03 – 26:31

    Training vs inference compute: Bitter Lesson, RL in language, and imperfect rewards

    Sarah asks how compute will be allocated between training and inference in a few years. Oriol uses AlphaGo as an analogy and argues the biggest missing piece for language is robust reward functions. Today RL can overfit to flawed rewards, so the ‘middle’ stage between pretraining and inference (post-training/RL) needs breakthroughs.

    • Bitter Lesson framing: scale learning and scale search
    • In language, reward is hard—truth isn’t binary the way games are
    • AlphaGo recipe: pretrain → RL self-play → inference-time search; RL dominated compute
    • Today’s LLMs: pretraining dominates; RL can overfit to imperfect human preference rewards
    • Future expectation: more compute moves into RL/post-training once better rewards exist
  10. 26:31 – 28:47

    Scaling reward functions past human experts: model-as-judge and bootstrapping

    Elad asks how reward modeling scales once models outperform human evaluators (e.g., medical domains). Oriol proposes a bootstrapping path: models become better at judging outputs than generating them, enabling reinforcement loops where the model supplies increasingly capable evaluation signals.

    • Human annotation scales, but hits limits when models surpass experts
    • Key hypothesis: verification can be easier than generation
    • Model-as-reward/model-as-judge enables self-improvement loops
    • Emerging area: generative reward models and reward-model leaderboards
    • Long interactions and user feedback could become lightweight supervision signals
  11. 28:47 – 31:55

    Nyquist analogy, super-resolution ‘violations,’ and emergent self-correction

    Elad introduces Nyquist-Shannon as a metaphor for whether something less intelligent can evaluate something more intelligent. Oriol extends the analogy via super-resolution models that ‘hallucinate’ plausible detail, then ties it back to emergence: once self-correction appears in pockets, algorithms can exploit it for compounding gains.

    • Nyquist framing: sampling limits what can be reconstructed/understood
    • Generative super-resolution appears to ‘break’ Nyquist by learning world structure (but can hallucinate)
    • Emergence: self-correction may appear as capabilities cross thresholds
    • Not uniform across domains; capabilities emerge unevenly
    • Need algorithms that harness model-based evaluation and correction effectively
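For readers unfamiliar with the theorem behind the analogy, its standard textbook form is given below. This statement is not quoted from the episode; the symbols $x(t)$, $B$, and $f_s$ are introduced here for illustration.

```latex
% Nyquist-Shannon sampling theorem, standard form (not quoted from the episode).
% x(t): a signal with no frequency content above B hertz; f_s: the sampling rate.
f_s > 2B
\quad\Longrightarrow\quad
x(t) = \sum_{n=-\infty}^{\infty} x\!\left(\tfrac{n}{f_s}\right)\,
       \operatorname{sinc}\!\left(f_s t - n\right),
\qquad \operatorname{sinc}(u) = \frac{\sin(\pi u)}{\pi u}.
```

Below that sampling rate, exact reconstruction from the samples alone is no longer guaranteed, which is why a super-resolution model that fills in detail is relying on learned prior structure rather than on the samples themselves.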
  12. 31:55 – 35:08

    General models vs specialization: when to build domain-specific systems

    Sarah asks whether general multimodal models eliminate the need for specialized research (e.g., proteins, materials). Oriol argues general models may be “20% good at everything,” but important world problems justify specialization to reach high performance sooner, with spillover benefits back into the general model.

    • General models offer broad competence, but may be shallow in critical domains
    • Specialization is justified when the problem impact is high (fusion, weather, proteins)
    • General-to-specialized direction: fine-tune or build narrow systems atop Gemini
    • Specialized advances can feed back (data, rewards, techniques) into the core model
    • Expectation: an enduring hybrid portfolio of general models plus targeted systems
  13. 35:08 – 38:14

    Is focusing on math/CS a dead end? Using ‘crisp’ domains to learn fuzzier truth

    Sarah raises criticism that constrained domains (math, games) may not translate to general reasoning. Oriol responds that even “crisp” domains get messy once language explanations and proof checking enter, and that working through these complexities can lead to more general reward modeling and truth/verification methods.

    • Perfect reward is rare; even “survival” is hard to formalize as reward
    • Math correctness is harder than it seems (proofs, formalization errors, language overlays)
    • If approached thoughtfully, math/CS can push progress on broader truth/reward questions
    • Reward models need to assess ‘rough correctness,’ not just win/lose signals
    • Potential outcome: more general self-improvement loops than exist today
  14. 38:14 – 46:08

    AGI skepticism and personal reflections: timelines, what matters, and advice for kids

    Oriol offers a contrarian stance: the exact moment of “AGI” may not matter because capability will be a distribution with uneven strengths and weaknesses. He then reflects on personal adaptation (parenting, scaling oneself with tools) and closes with advice: follow passion, but learn to leverage AI and find underexplored corners for impact.

    • AGI timelines (e.g., 2028) depend heavily on definitions and tests
    • AGI may not be a single threshold; expect uneven capability distributions
    • Focus should remain on fixing egregious failures and enabling science/product impact
    • Personal adaptation: using AI to manage information overload and scale effectiveness
    • Education advice: pursue passion, embrace AI tools, and explore domains where AI hasn’t fully penetrated
