No Priors

No Priors Ep. 57 | With LangChain CEO and Co-Founder Harrison Chase

Companies are employing AI agents and copilots to help their teams increase efficiency and accuracy, but building LLM applications that work reliably can require a skill set many enterprise teams don't have. This week on No Priors, Sarah and Elad are joined by Harrison Chase, the CEO and co-founder of LangChain, an open-source framework and developer toolkit for building LLM applications. In this conversation they talk about the gaps in open-source app development, what it will take to keep up with private companies, the importance of writing prompts that work across multiple model APIs, and why memory is still so underdeveloped in this space.

Show Notes:
0:00 Introduction to LangChain
1:45 Managing an open-source environment
4:30 Developing useful AI agents
10:03 Sophistication and limitations of AI app development
14:17 Switching between model APIs
17:10 Context windows, fine-tuning, and functionality
21:37 Evolution of the AI open-source environment
23:53 The next big breakthroughs

Sarah Guo (host), Harrison Chase (guest), Elad Gil (host)
Mar 28, 2024 · 27 min

CHAPTERS

  1. 0:05 – 1:42

    LangChain’s origin story and what the company ships today

    Harrison Chase explains how LangChain began as a personal Python side project in late 2022, sparked by common patterns he saw at hackathons just before ChatGPT’s release. He then clarifies what LangChain has become as a company: open-source packages plus LangSmith for evaluation, testing, and monitoring.

    • Started as a side project while exploring early LLM app patterns (Sept/Oct 2022)
    • Timing tailwind: ChatGPT launched shortly afterward (Nov 2022)
    • LangChain today spans open-source libraries and a commercial platform
    • LangSmith focuses on testing, evaluation, and monitoring of LLM systems
  2. 1:42 – 3:07

    Stability vs churn in fast-moving open source: integrations, abstractions, and runtimes

    Sarah asks how to maintain an open-source framework while the ecosystem changes rapidly. Harrison describes what has stayed stable (lots of integrations) versus what evolved most (how developers compose systems), and why LangChain invested in lower-level composition primitives.

    • Early LangChain: a few high-level implementations + a small set of model integrations
    • Now: hundreds of integrations (models, vector stores, etc.) and higher-level building blocks
    • Big shift: teams need customization when moving from prototype to production
    • Investment in lower-level runtime/composition: streaming, structured outputs, and composability
  3. 3:07 – 4:26

    From chains to agents to graphs: why LangGraph and cyclic workflows matter

    Harrison explains the evolution of LLM app patterns: simple chains, more complex chains, autonomous agents, and now controlled graphs/state machines. He introduces LangChain Expression Language (LCEL) for composing chains and LangGraph for the cyclic workflows (not just DAGs) that production-grade agent loops require.

    • Upper-layer patterns (how to assemble components) have churned the most
    • LangChain Expression Language: lower-level chaining protocol for composition
    • LangGraph targets cyclic graphs needed for agent loops (LLM-in-a-loop)
    • Trend toward “controlled state machines” instead of fully autonomous for-loops
  4. 4:26 – 7:20

    What’s missing for truly performant agents: UX, planning, reflection, and testing

    Elad probes the blockers to production-ready agents. Harrison highlights open questions in user experience and transparency, plus core model limitations around planning and self-correction, and the need for better testing workflows to build confidence in agent behavior.

    • Agent UX is unresolved: communicate strengths/limits and enable user course-correction
    • Observability of intermediate steps is critical for trust and debugging
    • Planning ability of current LLMs is a major limitation; apps often hardcode domain flows
    • Research themes: explicit planning steps and post-action reflection loops
    • Testing individual nodes/steps is essential to gain confidence in multi-step systems
  5. 7:20 – 10:00

    Agent memory: procedural tool-use vs personalized long-term memory

    Elad asks about memory; Harrison splits it into two categories: system/procedural memory (how to do tasks and use tools) and personalization memory (user preferences). He outlines practical approaches—few-shot prompting and fine-tuning for tool use, plus explicit or passive mechanisms for remembering user facts.

    • Two memory types: procedural (system-level) vs personalization (user-specific)
    • Procedural memory often improved via few-shot prompting; sometimes fine-tuning
    • Data flywheel: monitor app → collect good examples → feed back as few-shot examples
    • Personalization memory approaches: explicit “remember/delete” tool calls vs passive extraction
    • Memory remains nascent; no clear dominant solution yet
  6. 10:00 – 12:22

    How real-world apps are getting more sophisticated: multi-step RAG and controlled agents

    Sarah asks whether the ecosystem’s application sophistication has changed. Harrison says LangChain has always targeted multi-step systems, but notes newer progress: richer query analysis for RAG and a shift from rigid chains or fully autonomous agents toward controlled state-machine/graph designs.

    • LangChain/LangSmith emphasize multi-step apps with testing and observability
    • RAG is becoming multi-stage: query analysis, retriever routing, and metadata filtering
    • Agent improvements increasingly driven by few-shot/data flywheels
    • Production systems often land between chains and autonomous agents: controlled graphs
    • LangGraph reflects updated beliefs about what works reliably in production
  7. 12:22 – 14:07

    Where agentic apps work first: customer support, coding, and recommendation-style assistants

    The discussion turns to which agentic use cases are proving viable. Harrison points to customer support and coding as standout areas due to clear feedback loops and measurable outcomes, with emerging patterns in preference-driven recommendation chatbots.

    • Customer support as an obvious near-term win; mentions Sierra and Klarna examples
    • Coding agents benefit from execution-based feedback loops (does the code run?)
    • Great UX can be a differentiator (noted in Cognition's agent demo)
    • Recommendation-style assistants differ from support: preference- and taste-driven
    • Developers building for themselves can iterate quickly on coding assistants
  8. 14:07 – 15:42

    Can teams really switch between model providers? The hard parts of portability

    Sarah questions whether plug-and-play model switching is realistic. Harrison argues switching is harder than it seems because prompts and capabilities differ across providers, though convergence may improve as models become more capable and feature parity increases.

    • Model outputs differ; prompts often need to be rewritten per model
    • Hope for convergence as models get smarter and idiosyncrasies matter less
    • Feature parity matters: function calling historically locked prompts to OpenAI
    • Portability improves as more providers add common primitives (e.g., function calling)
  9. 15:42 – 17:02

    What differentiates model APIs: function calling, modalities, context length, safety, cost, and latency

    The group enumerates key API-level differences that affect application architecture and portability. Harrison emphasizes context windows and modalities, while Elad adds enterprise concerns like prompt injection; Harrison notes switching pressure grows when scaling due to cost and latency constraints.

    • Differentiators: context window size, modalities (e.g., vision), function calling
    • Safety concerns like prompt injection become important in enterprise deployments
    • Scaling changes priorities: moving from GPT-4 prototypes to cheaper/faster models
    • Switching models is most urgent during prototype-to-production rollout
    • Cost and latency often drive architecture and provider decisions
  10. 17:02 – 19:53

    Long context windows vs RAG: what changes, what doesn’t, and where benchmarks mislead

    Elad asks whether million-token contexts make RAG obsolete. Harrison says long context enables more one-shot tasks, but iterative interaction (agents/chaining) still matters; for RAG, needle-in-a-haystack tests don't reflect the many real use cases that require multi-fact reasoning and robust ingestion/indexing strategies.

    • Long context enables one-shot extraction and Q&A over a few long documents
    • Chaining/agents remain necessary for iterative environment interaction and feedback
    • RAG isn’t just retrieval; many tasks require reasoning over multiple evidence points
    • Needle-in-haystack benchmarks can overstate real-world RAG performance
    • Ingestion/indexing (chunking, parent document retrieval, hybrid search) remains uncertain
  11. 19:53 – 21:21

    Fine-tuning in practice: mostly experimentation until teams hit serious scale

    Elad asks about fine-tuning adoption. Harrison reports most teams experiment, but few deploy fine-tunes unless they have critical scale, due to challenges in dataset curation, evaluation, and slower iteration compared to prompts and few-shot examples.

    • Fine-tuning is rare in production except at critical scale
    • Hard parts: collecting/curating training data and maintaining visibility into it
    • Evaluation/testing of fine-tuned models is a major pain point
    • Iteration speed: prompts and few-shot examples change faster than fine-tuned weights
    • LangSmith targets parts of this workflow (data, eval, testing)
  12. 21:21 – 22:56

    Open-source model evolution and a contrarian take: reasoning still lags frontier closed models

    Sarah asks whether Llama/Mistral and other open-source advances changed what developers build. Harrison notes rising interest in open source, but argues that for reasoning-heavy LangChain-style apps, open-source models still trail GPT-4/Claude 3 despite hype; he also notes how quickly provider deprecations can break older code.

    • Rapid ecosystem churn: original OpenAI models used early on have since been deprecated
    • Interest in open-source models is increasing among developers
    • Reasoning capability is crucial for many LangChain applications
    • Claim: open-source reasoning still lags frontier closed models (GPT-4/Claude 3)
    • Mismatch between social-media hype and real production requirements (for now)
  13. 22:56 – 27:31

    Next breakthroughs: memory, continual learning, and few-shot datasets as a practical flywheel

    Zooming out, Harrison predicts the most exciting progress will be at the application/UX layer, especially around long-term memory. He describes a path toward continual learning via feedback loops and curated few-shot example datasets—faster and cheaper than fine-tuning—and connects it to optimization ideas like DSPy.

    • Big opportunity: application/UX innovations that leverage long-term memory
    • Personalized memory could enable new consumer and business experiences
    • Continual learning framing: systems improve automatically from interaction feedback
    • Few-shot dataset accumulation as an under-discussed, practical approach vs fine-tuning
    • DSPy as a related optimization paradigm for multi-step LLM programs
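Chapters 3 and 6 describe a shift from fully autonomous agent for-loops toward controlled state machines that still permit cycles. The following is a minimal sketch of that idea in plain Python, not LangGraph's API: the node names (`plan`, `act`, `reflect`, `finish`) and the `fake_llm` stub are hypothetical stand-ins, and a real system would call a model where the stub returns a canned result.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class State:
    """Shared state passed between nodes of the graph."""
    task: str
    attempts: int = 0
    history: List[str] = field(default_factory=list)
    done: bool = False


def fake_llm(prompt: str, attempts: int) -> str:
    # Hypothetical stub for a model call: reports failure until the third attempt.
    return "PASS" if attempts >= 3 else "FAIL"


def plan(state: State) -> str:
    state.history.append(f"plan: {state.task}")
    return "act"


def act(state: State) -> str:
    state.attempts += 1
    result = fake_llm(state.task, state.attempts)
    state.history.append(f"act #{state.attempts}: {result}")
    # Conditional edge: the model's output picks between two fixed successors.
    return "finish" if result == "PASS" else "reflect"


def reflect(state: State) -> str:
    state.history.append("reflect: revise approach")
    return "act"  # back-edge that creates the cycle


def finish(state: State) -> str:
    state.done = True
    return "END"


NODES: Dict[str, Callable[[State], str]] = {
    "plan": plan,
    "act": act,
    "reflect": reflect,
    "finish": finish,
}


def run(state: State, entry: str = "plan", max_steps: int = 20) -> State:
    """Walk the graph from `entry` until a node returns "END"."""
    node, steps = entry, 0
    while node != "END":
        if steps >= max_steps:
            raise RuntimeError("step budget exhausted; loop may be stuck")
        node = NODES[node](state)
        steps += 1
    return state


final = run(State(task="fix failing test"))
print(final.attempts)  # 3
for line in final.history:
    print(line)
```

The developer, not the model, fixes which transitions are possible; the model's output only selects among them, and `max_steps` bounds the cycle so a stuck loop fails loudly instead of running forever.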

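Chapters 5 and 13 describe a data flywheel: monitor the app, keep the runs that went well, and feed them back as few-shot examples. A rough sketch under stated assumptions: `FewShotStore`, `log_run`, and `build_prompt` are hypothetical names (this is not LangSmith's API), feedback is assumed to be a numeric score, and word-overlap (Jaccard) similarity stands in for the embedding-based selection a production system would more likely use.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Example:
    input: str
    output: str


def _tokens(text: str) -> Set[str]:
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    # Word-overlap similarity; a stand-in for embedding similarity.
    ta, tb = _tokens(a), _tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


class FewShotStore:
    """Hypothetical store that keeps only runs with positive feedback."""

    def __init__(self, min_score: float = 1.0) -> None:
        self.min_score = min_score
        self.examples: List[Example] = []

    def log_run(self, inp: str, out: str, feedback: float) -> bool:
        # Monitoring step: a run becomes an example only if feedback clears the bar.
        if feedback >= self.min_score:
            self.examples.append(Example(inp, out))
            return True
        return False

    def select(self, query: str, k: int = 2) -> List[Example]:
        # Return the k stored examples most similar to the incoming query.
        ranked = sorted(self.examples, key=lambda ex: jaccard(query, ex.input), reverse=True)
        return ranked[:k]


def build_prompt(store: FewShotStore, query: str, k: int = 2) -> str:
    # Format the selected examples as few-shot demonstrations ahead of the new query.
    shots = "\n\n".join(
        f"Input: {ex.input}\nOutput: {ex.output}" for ex in store.select(query, k)
    )
    return f"{shots}\n\nInput: {query}\nOutput:"


store = FewShotStore()
store.log_run("refund order 123", "call refund_tool(order_id=123)", feedback=1.0)
store.log_run("track order 456", "call tracking_tool(order_id=456)", feedback=1.0)
store.log_run("refund order 789", "sorry, I can't help", feedback=0.0)  # rejected
print(build_prompt(store, "refund order 999", k=1))
```

Because the examples live in plain data rather than model weights, changing behavior means appending to the store, which is the iteration-speed advantage over fine-tuning raised in chapters 11 and 13.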

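Chapter 5 contrasts two ways to capture personalization memory: explicit tool calls that ask the system to remember or delete a fact, and passive extraction from the conversation. The sketch below shows both under loud assumptions: `UserMemory`, `handle_tool_call`, and `passive_extract` are hypothetical names, the tool-call dictionary shape is invented for illustration and does not match any specific provider's schema, and the regex extractor is a toy heuristic where a real system would more likely use an LLM.

```python
import re
from typing import Dict, List


class UserMemory:
    """Hypothetical per-user memory store for personalization."""

    def __init__(self) -> None:
        self.facts: List[str] = []

    def remember(self, fact: str) -> str:
        if fact not in self.facts:
            self.facts.append(fact)
        return f"stored: {fact}"

    def delete(self, fact: str) -> str:
        if fact in self.facts:
            self.facts.remove(fact)
            return f"deleted: {fact}"
        return f"not found: {fact}"

    def as_system_prompt(self) -> str:
        # Render stored facts so they can be prepended to the model's context.
        if not self.facts:
            return "No stored preferences."
        return "Known user preferences:\n" + "\n".join(f"- {f}" for f in self.facts)


def handle_tool_call(memory: UserMemory, call: Dict) -> str:
    # Explicit approach: the model emits a call such as
    # {"name": "remember", "args": {"fact": "..."}} (shape assumed for illustration).
    tools = {"remember": memory.remember, "delete": memory.delete}
    if call["name"] not in tools:
        raise ValueError(f"unknown tool: {call['name']}")
    return tools[call["name"]](**call["args"])


# Toy pattern for stated preferences, e.g. "I prefer window seats".
PREF_PATTERN = re.compile(r"\bI (?:prefer|like|love) ([^.!?]+)", re.IGNORECASE)


def passive_extract(memory: UserMemory, message: str) -> None:
    # Passive approach: scan every user message without an explicit request.
    for match in PREF_PATTERN.finditer(message):
        memory.remember(f"likes {match.group(1).strip()}")


mem = UserMemory()
handle_tool_call(mem, {"name": "remember", "args": {"fact": "vegetarian"}})
passive_extract(mem, "Thanks! I prefer window seats. Also I like jazz.")
handle_tool_call(mem, {"name": "delete", "args": {"fact": "vegetarian"}})
print(mem.as_system_prompt())
```

The explicit route gives users a visible way to correct what is stored, while the passive route captures more but risks remembering things the user never meant to share; as chapter 5 notes, neither has emerged as the dominant solution.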