Huberman Lab

The Neuroscience of Speech, Language & Music | Dr. Erich Jarvis

My guest this episode is Dr. Erich Jarvis, PhD, Professor and Head of the Laboratory of Neurogenetics of Language at Rockefeller University and Investigator with the Howard Hughes Medical Institute (HHMI). Dr. Jarvis' research spans the molecular and genetic mechanisms of vocal communication, comparative genomics of speech and language across species, and the relationship between speech, language and movement. We discuss the unique ability of humans (and certain animal species) to learn and communicate using complex language, including verbal speech production and the ability to interpret both written and spoken language. We also discuss the connections between language, singing and dance, and why song may have evolved before language. Dr. Jarvis also explains some of the underlying biological and genetic components of stuttering and other speech disorders, non-verbal communication, why it's easiest to learn a language as a child, and how individuals can learn multiple languages at any age. This episode should interest anyone curious about the origins of human speech, language, music and culture, and about how newer technology, such as social media and texting, changes our brains.

Thank you to our sponsors
AG1 (Athletic Greens): https://athleticgreens.com/huberman
InsideTracker: https://insidetracker.com/huberman
ROKA: https://roka.com/huberman
LMNT: https://drinklmnt.com/huberman

Supplements from Momentous
https://www.livemomentous.com/huberman

Social & Website
Instagram: https://www.instagram.com/hubermanlab
Twitter: https://twitter.com/hubermanlab
Facebook: https://www.facebook.com/hubermanlab
TikTok: https://www.tiktok.com/@hubermanlab
LinkedIn: https://www.linkedin.com/in/andrew-hu...
Website: https://hubermanlab.com
Newsletter: https://hubermanlab.com/neural-network

Dr. Erich Jarvis
Dr. Erich Jarvis' Lab: https://www.jarvislab.net
Rockefeller University: https://www.rockefeller.edu/our-scientists/heads-of-laboratories/1159-erich-d-jarvis
Google Scholar: https://scholar.google.com/citations?user=cI-fi9MAAAAJ
Twitter: https://twitter.com/erichjarvis
Instagram: https://www.instagram.com/erich_d_jarvis
LinkedIn: https://www.linkedin.com/in/erich-jarvis-ba73624

Other Resources
Earth Biogenome Project: https://www.earthbiogenome.org
GenomeArk: https://vgp.github.io/genomeark

Timestamps
00:00:00 Dr. Erich Jarvis & Vocal Communication
00:03:43 Momentous Supplements
00:04:36 InsideTracker, ROKA, LMNT
00:08:01 Speech vs. Language, Is There a Difference?
00:10:55 Animal Communication, Hand Gestures & Language
00:15:25 Vocalization & Innate Language, Evolution of Modern Language
00:21:10 Humans & Songbirds, Critical Periods, Genetics, Speech Disorders
00:27:11 Innate Predisposition to Learn Language, Cultural Hybridization
00:31:34 Genes for Speech & Language
00:35:49 Learning New or Multiple Languages, Critical Periods, Phonemes
00:41:39 AG1 (Athletic Greens)
00:42:52 Semantic vs. Affective Communication, Emotion, Singing
00:47:32 Singing, Link Between Dancing & Vocal Learning
00:52:55 Motor Theory of Vocal Learning, Dance
00:55:03 Music & Dance, Emotional Bonding, Genetic Predispositions
01:04:11 Facial Expressions & Language, Innate Expressions
01:09:35 Reading & Writing
01:15:13 Writing by Hand vs. Typing, Thoughts & Writing
01:20:58 Stutter, Neurogenetics, Overcome Stutter, Conversations
01:26:58 Modern Language Evolution: Texting, Social Media & the Future
01:36:26 Movement: The Link to Cognitive Growth
01:40:21 Comparative Genomics, Earth Biogenome Project, Genome Ark, Conservation
01:48:24 Evolution of Skin & Fur Color
01:51:22 Dr. Erich Jarvis, Zero-Cost Support, YouTube Feedback, Spotify & Apple Reviews, Momentous Supplements, AG1 (Athletic Greens), Instagram, Twitter, Neural Network Newsletter, Huberman Lab Clips

The Huberman Lab Podcast is for general informational purposes only and does not constitute the practice of medicine, nursing or other professional health care services, including the giving of medical advice, and no doctor/patient relationship is formed. The use of information on this podcast or materials linked from this podcast is at the user's own risk. The content of this podcast is not intended to be a substitute for professional medical advice, diagnosis, or treatment. Users should not disregard or delay in obtaining medical advice for any medical condition they may have and should seek the assistance of their health care professionals for any such conditions.

Andrew Huberman (host), Erich Jarvis (guest)
Aug 29, 2022 · 1h 54m

CHAPTERS

  1. 0:00 – 10:00

    Intro, Guest Background, and Episode Framing

    Huberman introduces Erich Jarvis, outlining his work on vocal learning, speech disorders, genetics, and the links between language, music, and movement across species. He also briefly explains the podcast’s sponsorships and mission before beginning the main conversation.

    • Jarvis studies genes-to-circuits-to-cognition for speech, language, song, and dance in humans and animals.
    • His lab focuses on vocal learning species like songbirds and parrots to understand human speech.
    • Huberman emphasizes the relevance of Jarvis's work for speech disorders, multilingualism, and everyday communication.
    • Administrative segments: sponsorships, podcast mission, and zero-cost education framing.
  2. 10:00 – 18:00

    Defining Speech vs. Language and the Myth of a Language Module

    Jarvis challenges the traditional separation of speech and language and the concept of a distinct 'language module'. He posits that language computations are embedded directly in speech production and auditory perception circuits, with vocal learning as the key specialization.

    • Psychological/behavioral terms like “speech” and “language” don’t map cleanly onto brain organization.
    • No strong evidence for a separate, encapsulated language module in the brain.
    • Speech production circuits (larynx, jaw, tongue) contain the algorithms for spoken language.
    • Auditory pathways contain the algorithms for understanding speech.
    • These production circuits are highly specialized in humans and a few birds; auditory comprehension is widespread across species.
  3. 18:00 – 29:00

    Animal Communication, Gestures, and Evolutionary Parallels

    The discussion broadens to animal communication systems, gestural language, and the evolutionary relationship between hand/body movements and speech. Jarvis explains why some species excel at gestures more than vocal imitation and how humans unconsciously gesture while speaking.

    • Many animals use complex body and aerial displays as communicative “languages.”
    • Brain areas for speech and for hand gestures are adjacent, suggesting evolutionary linkage.
    • Speech circuits likely evolved out of preexisting motor pathways controlling body movement.
    • Humans are extremely advanced at spoken language but only moderately more advanced at gestural communication compared to some primates.
    • Example: Koko the gorilla could learn sign-like gestures and understand speech but couldn’t vocally imitate words.
  4. 29:00 – 38:00

    Innate vs. Learned Vocalizations and the Evolutionary Origins of Speech

    Jarvis distinguishes innate vocalizations from learned vocal communication, highlighting vocal learning as the rare trait underlying spoken language. He then uses genomic data from Neanderthals and other hominins to argue that spoken language is likely at least hundreds of thousands of years old.

    • Most vertebrate vocalizations (e.g., crying, barking) are innate and brainstem-driven.
    • Learned vocal communication—imitation of novel sounds—is rare and defines spoken language.
    • Forebrain motor circuits take over brainstem vocal control in humans, parrots, and songbirds.
    • Genomic comparisons with Neanderthals and Denisovans show shared variants in speech-related genes (e.g., FOXP2), suggesting they likely had some form of spoken language.
    • Estimated timeline: advanced vocal learning in hominins probably evolved 500,000–1,000,000 years ago.
  5. 38:00 – 42:00

    Songbirds, Brain Homologies, and Genetic Convergence

    Huberman and Jarvis explore the songbird model of vocal learning, including critical periods, deafening effects, and specialized song nuclei. Jarvis describes strong behavioral and genetic convergence between human speech circuits and bird song circuits despite deep evolutionary divergence.

    • Songbirds, parrots, and hummingbirds independently evolved vocal learning; their behaviors mirror human speech properties (critical periods, deafening-induced deterioration).
    • Unique song system nuclei (e.g., Area X, RA) are absent in non–vocal-learning relatives.
    • Connectivity and gene expression patterns in bird song nuclei parallel Broca’s area, laryngeal motor cortex, etc., in humans.
    • Convergent changes in genes like FOXP2 produce similar speech/song deficits when disrupted across species.
    • Convergence extends to circuits, genes, and even specific mutations, implying strong evolutionary constraints on how complex vocal learning can evolve.
  6. 42:00 – 54:00

    Hummingbirds, Multimodal Signaling, and Motor Demands of Speech

    Hummingbirds are presented as extreme examples of small-brained but highly specialized vocal learners that integrate wing-produced sounds with vocal song. Jarvis then pivots to what speech-related genes actually do in the brain, revealing unexpected roles in connectivity, protection, and plasticity.

    • Some hummingbirds coordinate wing “claps” with vocal song, effectively adding percussive elements to their vocalizations.
    • Vocal-learning species often evolve multiple complex traits (e.g., hummingbirds: rapid flight, song, coordinated wing sounds).
    • Key gene categories in speech/song circuits: axon-guidance genes (often turned off locally to allow atypical connections), calcium-buffering and neuroprotective genes, and plasticity-related genes.
    • Laryngeal muscles are the fastest-firing muscles in the body; circuits controlling them require high-frequency firing and robust protection from excitotoxicity.
    • Speech learning demands greater plasticity than many other motor skills (e.g., walking).
  7. 54:00 – 1:06:00

    Critical Periods, Bilingualism, and What Makes Language Easier for Kids

    The conversation turns to why children more easily acquire languages and accents, how critical periods shape phoneme repertoires, and whether early bilingualism eases later language learning. Jarvis emphasizes universal critical period dynamics across brain systems with especially marked effects in speech.

    • The entire brain undergoes critical periods; speech and language show especially strong age-related learning effects.
    • Brains must stabilize to avoid constant overwriting and to cope with limited storage capacity, reducing plasticity after early development.
    • Humans remain unusually juvenile-like (more plastic) across life due to genes like SRGAP2 but still experience critical period closure.
    • Children exposed to multiple languages retain broader phoneme repertoires and can later learn additional languages faster. The advantage comes not from greater plasticity but from already controlling more sound units.
    • Preference for native-species song in birds parallels human bias toward native language input; social bonding influences which models are learned.
  8. 1:06:00 – 1:16:00

    Emotion, Semantics, Music, and Lateralization of Communication

    Jarvis distinguishes semantic (meaning-based) and affective (emotion-based) communication, noting that both often use the same speech and auditory circuits but with differing emotional inputs and lateralization. He discusses how singing and music likely predate and scaffolded semantic speech.

    • Semantic communication conveys symbolic/abstract meaning; affective communication conveys emotional tone (e.g., courtship song, territorial calls).
    • The same vocal circuits can serve both functions, modulated by emotional centers (hypothalamus, cingulate, etc.).
    • The right hemisphere is relatively more engaged in music and affective prosody; the left hemisphere is relatively dominant for speech and syntax.
    • All vocal-learning species use learned sounds for affective purposes; only a subset (e.g., humans, some parrots, dolphins) use them for rich semantic content.
    • This supports the hypothesis that singing and emotionally charged vocalizations evolved first, with semantic, propositional language emerging later.
  9. 1:16:00 – 1:35:00

    Jarvis’s Path from Dance to Neuroscience and the Motor Theory of Vocal Learning

    Jarvis recounts his personal history as a serious dancer and musician in a musically gifted family, and how his interest in dance led him to study the brain. He then integrates this with modern findings that vocal-learning circuits are embedded in motor circuits, forming the basis of his motor theory of vocal learning origin and its implications for dance.

    • Jarvis comes from a multigenerational family of singers; he pursued dance (including Alvin Ailey training) when he realized his singing wasn’t as strong as his relatives'.
    • He chose science over a professional dance career, partly from a desire to positively impact society, and was drawn to neuroscience because the brain controls movement and dance.
    • His lab discovered that vocal-learning circuits (for speech/song) are nested within more ancient motor circuits controlling limb and body movement.
    • Only vocal-learning species reliably dance to a beat; famous example: Snowball the dancing cockatoo.
    • He proposes that speech circuits evolved via duplication of motor circuits, and the tight auditory–vocal integration subsequently spread to neighboring motor regions, enabling dance as “speaking with the body.”
  10. 1:35:00 – 1:47:00

    Motor Control, Genetics of Talent, and the Role of Movement in Cognition

    The discussion explores inter-individual differences in dance and singing ability, possible genetic contributions, and how motor activity supports cognition. Jarvis highlights that continuous movement (e.g., dance, walking, running) can help maintain brain health and thinking capacity into old age.

    • Jarvis grew up dancing informally in a Motown-influenced household and later discovered formal dance talent.
    • Direct-to-consumer genomics (e.g., 23andMe) suggests he has genotypes associated with fast-twitch athleticism and with difficulty singing on pitch, paralleling his own strengths and weaknesses.
    • Motor and cognitive functions are deeply intertwined; circuits for movement and for speech are anatomically adjacent and functionally interrelated.
    • Maintaining movement (dance, walking, running) appears to help keep brain circuits, including cognitive ones, in good functional shape.
    • Jarvis personally experiences dance as supporting his scientific thinking, not just his physical fitness.
  11. 1:47:00 – 1:57:00

    Facial Expressions, Gestures, and Multimodal Communication

    Huberman and Jarvis examine how facial expressions, hand gestures, and speech interact in communication. Jarvis explains shared and distinct motor pathways for face and voice, the innate and learned components of expressions, and why gestures are so hard to suppress when we speak.

    • Non-human primates have rich facial expression repertoires; they have strong cortical control over facial muscles but weak/absent direct cortical control of laryngeal motor neurons.
    • Humans inherited an already complex facial-expression system and layered learned vocal control on top of it.
    • Facial and manual gestures have both innate and learned components; they often align with voice for clarity but can also be dissociated (e.g., acting, deception).
    • Suppressing hand gestures while speaking is effortful, indicating tight coupling between gesture and speech circuits.
    • In text or email, the absence of visible facial and bodily signals increases ambiguity, making misinterpretation more likely.
  12. 1:57:00 – 2:10:00

    From Thought to Speech to Reading and Writing

    Jarvis lays out a concrete neural model of how we go from visual symbols to internal speech to comprehension and then to writing. He explains why reading silently still activates laryngeal muscles and why writing speed must align with internal speech for fluent composition.

    • When reading: visual cortex decodes letters → speech motor areas silently ‘speak’ the words → auditory cortex ‘hears’ this internal speech.
    • EMG recordings show low-level laryngeal muscle activity during silent reading or inner speech.
    • Writing recruits hand-motor regions adjacent to speech areas, which translate internal speech into pen or keyboard movements.
    • People often must stop speaking aloud in order to write because overlapping circuits compete for conscious control.
    • Fluent writing feels easiest when the rate of internal speech matches the motor system’s capacity to produce written output (handwriting or typing).
  13. 2:10:00 – 2:21:00

    Texting, Shorthand, and the Evolution of Modern Language Use

    The conversation shifts to texting, shorthand (OMG, WTF), and social media. Jarvis frames these not as degradations of language but as new forms of rapid written communication with different cognitive demands and social risks, especially given the very short delay between thought and public distribution.

    • Texting is the first large-scale instance of thumb-based writing; it favors brevity and rapid back-and-forth.
    • Shorthand and abbreviations are partly generational/cultural markers rather than intrinsic evidence of reduced intelligence.
    • Short-form writing increases communication speed but constrains nuance and can easily be misinterpreted.
    • Tweets and texts reduce the buffer between internal speech and external publication, amplifying impulsive or poorly filtered thoughts.
    • Jarvis views these technologies as part of ongoing language evolution but acknowledges “casualties” where rapid, contextless communication has serious consequences.
  14. 2:21:00 – 2:30:00

    Stuttering, Sensory–Motor Integration, and Therapeutic Approaches

    Jarvis describes how his lab unintentionally modeled stuttering in songbirds via basal ganglia damage and recovery, and how these findings parallel human neurogenic stuttering. He emphasizes the role of basal ganglia and sensory–motor integration and touches on behavioral therapies.

    • Lesions in the songbird basal ganglia (analogous to the human striatum) induce stuttering-like patterns as circuits recover.
    • Birds can recover more fully due to adult neurogenesis; mammals show more limited neuronal replacement.
    • In humans, many forms of stuttering—especially neurogenic—are linked to basal ganglia dysfunction affecting timing and movement coordination in speech circuits.
    • Behavioral therapies often work by slowing speech, using rhythmic cues, or enhancing sensory–motor control to reduce disfluency.
    • Sensory feedback (auditory and sometimes tactile) is central to stabilizing speech output in both birds and humans.
  15. 2:30:00 – 2:38:00

    Brain–Computer Interfaces and Reading Out Internal Speech

    Huberman raises emerging work translating brain activity directly into text or synthesized speech in paralyzed patients. Jarvis notes that these approaches leverage the same speech circuits used for internal speech and raises ethical and practical questions about future mind-to-text systems.

    • BCI work (e.g., Eddie Chang’s group) decodes activity in speech-related cortex to produce words for people who cannot speak.
    • Because people often think in internal speech, speech circuits provide a natural substrate for decoding ‘thoughts’.
    • Jarvis warns that direct translation of internal speech to external signals could be ethically fraught if used without consent.
    • Similar methods applied to vocal-learning birds could reveal covert songs, e.g., during sleep or rehearsal.
    • The boundary between internal thought and public communication is likely to blur further as BCI technologies advance.
  16. 2:38:00 – 2:48:00

    Practical Implications: Improving Speech, Language, and Brain Health

    Huberman asks about practical tools for enhancing speech and language. Jarvis emphasizes the importance of continued movement and vocal use—dance, walking, running, singing, or speaking—to maintain healthy motor and cognitive circuits, suggesting that these activities likely support overall brain function.

    • Using the circuits you care about (speech, movement, cognition) is key to preserving them—“use it or lose it.”
    • Dance and other complex motor skills engage extensive brain resources and appear to support cognitive vitality.
    • Practicing speech (public speaking, singing, reading aloud) keeps vocal and associated cognitive circuits active.
    • Jarvis’s own experience: sustained dance practice helps him think more clearly and maintain mental agility.
    • Huberman notes that long slow runs serve a similar ‘loosening and priming’ role for his own extended speaking tasks.
  17. 2:48:00 – 3:06:00

    Global Genome Projects, Vocal Learning, and Conservation

    The final major segment focuses on Jarvis’s leadership in large-scale genome projects. He explains why complete, error-free genomes across species are crucial for understanding trait evolution (including vocal learning) and outlines their use in conservation and potential species resurrection.

    • Comparative genomics across many species enables GWAS-like analyses that link genetic changes to traits such as vocal learning or flight.
    • Earlier genome assemblies often contained errors (false duplications, missing regulatory regions); new telomere-to-telomere methods greatly improve accuracy.
    • Previously “dark” genome regions now revealed include regulatory elements that are specialized in vocal-learning species and likely help wire speech circuits.
    • The Vertebrate Genomes Project and Earth BioGenome Project aim to sequence all vertebrates and all eukaryotes, respectively.
    • The “Genome Ark” concept: store high-quality genomes of all species, especially endangered ones, for basic science, conservation, and possible future de-extinction efforts.
    • Examples: projects to support resurrection attempts for the passenger pigeon and woolly mammoth rely on high-quality genomes produced by Jarvis’s teams.
  18. 3:06:00

    Closing Reflections and Future Directions

    Huberman closes by reflecting on the surprising continuity between human language and animal song and movement. He thanks Jarvis for his work and hints at rapidly unfolding advances in genomics, speech neuroscience, and conservation that will likely merit future discussion.

    • Humans share deep structural and genetic commonalities with songbirds and parrots in speech-related circuits.
    • Language, music, and movement are tightly interwoven in brain organization and evolution.
    • Large-scale genome and connectome efforts are now yielding concrete insights into brain specialization and behavior.
    • Jarvis has additional work in progress that will likely further reshape understanding of speech, language, and their evolution.
    • Huberman reiterates the value of bringing cutting-edge basic science to the public in an accessible format.
