No Priors Ep. 57 | With LangChain CEO and Co-Founder Harrison Chase
CHAPTERS
- 0:05 – 1:42
LangChain’s origin story and what the company ships today
Harrison Chase explains how LangChain began as a personal Python side project in late 2022, sparked by common patterns he saw at hackathons just before ChatGPT’s release. He then clarifies what LangChain has become as a company: open-source packages plus LangSmith for evaluation, testing, and monitoring.
- Started as a side project while exploring early LLM app patterns (Sept/Oct 2022)
- Early timing tailwind: ChatGPT launched shortly after
- LangChain today spans open-source libraries and a commercial platform
- LangSmith focuses on testing, evaluation, and monitoring of LLM systems
- 1:42 – 3:07
Stability vs churn in fast-moving open source: integrations, abstractions, and runtimes
Sarah asks how to maintain an open-source framework while the ecosystem changes rapidly. Harrison describes what has stayed stable (lots of integrations) versus what evolved most (how developers compose systems), and why LangChain invested in lower-level composition primitives.
- Early LangChain: a few high-level implementations + a small set of model integrations
- Now: hundreds of integrations (models, vector stores, etc.) and higher-level building blocks
- Big shift: teams need customization when moving from prototype to production
- Investment in lower-level runtime/composition: streaming, structured outputs, and composability
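To make "lower-level composition" concrete, here is a minimal sketch of pipe-style chaining with streaming and a structured output. The `Step` class and the `prompt`, `fake_model`, and `parser` components are hypothetical stand-ins written for illustration; this is not LangChain's `Runnable` API.

```python
from typing import Any, Callable, Iterator


class Step:
    """Hypothetical composable unit; illustrates the idea, not LangChain's Runnable API."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def stream(self, value: Any) -> Iterator[str]:
        # Naive streaming: compute the final result, then yield it word by word.
        # A real runtime would stream tokens as the model produces them.
        for token in str(self.invoke(value)).split():
            yield token


# Hypothetical components standing in for a prompt template, a model, and a parser.
prompt = Step(lambda topic: f"Write one sentence about {topic}.")
fake_model = Step(lambda text: text.upper())
parser = Step(lambda text: {"answer": text})  # structured (dict) output

chain = prompt | fake_model | parser
```

Because every piece shares one `invoke` interface, swapping the fake model for a real client or inserting an extra step is a one-line change, which is the kind of customization the chapter says teams need in production.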
- 3:07 – 4:26
From chains to agents to graphs: why LangGraph and cyclic workflows matter
Harrison explains the evolution of LLM app patterns: simple chains, more complex chains, autonomous agents, and now controlled graphs/state machines. He introduces LangChain Expression Language and LangGraph as tools to support production-grade agent loops that require cycles (not just DAGs).
- Upper-layer patterns (how to assemble components) have churned the most
- LangChain Expression Language: lower-level chaining protocol for composition
- LangGraph targets cyclic graphs needed for agent loops (LLM-in-a-loop)
- Trend toward “controlled state machines” instead of fully autonomous for-loops
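A "controlled state machine" with a cycle can be sketched as named nodes plus conditional edges, run until an `END` marker. This is a hand-rolled illustration under stated assumptions, not LangGraph's API: the `generate` node fakes an LLM call, and the routing rule and the iteration cap of 5 are hypothetical.

```python
from typing import Callable, Dict, TypedDict


class State(TypedDict):
    question: str
    draft: str
    attempts: int


def generate(state: State) -> State:
    # Stand-in for an LLM call: each attempt produces a longer draft.
    attempts = state["attempts"] + 1
    return {**state, "draft": "answer " * attempts, "attempts": attempts}


def route(state: State) -> str:
    # Conditional edge: loop back to "generate" until the draft is long enough,
    # but cap iterations so the cycle always terminates.
    if len(state["draft"].split()) >= 3 or state["attempts"] >= 5:
        return "END"
    return "generate"


NODES: Dict[str, Callable[[State], State]] = {"generate": generate}
EDGES: Dict[str, Callable[[State], str]] = {"generate": route}


def run(state: State, start: str = "generate") -> State:
    node = start
    while node != "END":
        state = NODES[node](state)
        node = EDGES[node](state)
    return state
```

Unlike a DAG, the `generate -> generate` edge forms a loop; unlike a free-running agent, the developer decides which transitions exist and when the loop stops.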
- 4:26 – 7:20
What’s missing for truly performant agents: UX, planning, reflection, and testing
Elad probes the blockers to production-ready agents. Harrison highlights open questions in user experience and transparency, plus core model limitations around planning and self-correction, and the need for better testing workflows to build confidence in agent behavior.
- Agent UX is unresolved: communicate strengths/limits and enable user course-correction
- Observability of intermediate steps is critical for trust and debugging
- Planning ability of current LLMs is a major limitation; apps often hardcode domain flows
- Research themes: explicit planning steps and post-action reflection loops
- Testing individual nodes/steps is essential to gain confidence in multi-step systems
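A post-action reflection loop, with every attempt recorded for observability, might look like the sketch below. `act` stands in for a model or tool call, and the "must be upper-case" rule in `reflect` is a hypothetical requirement chosen so the example is deterministic. Keeping `reflect` a plain function means it can be unit-tested on its own, apart from the rest of the loop.

```python
from typing import List, Optional


def act(task: str, feedback: Optional[str]) -> str:
    # Stand-in for an LLM tool call; it only fixes its output once given feedback.
    return task.upper() if feedback else task


def reflect(task: str, output: str) -> Optional[str]:
    # Post-action reflection: return a critique, or None if the output passes.
    # Here the (hypothetical) requirement is that output must be upper-case.
    if output != output.upper():
        return "Output must be upper-case."
    return None


def run_with_reflection(task: str, max_rounds: int = 3) -> List[str]:
    # Records every attempt so intermediate steps stay observable.
    history: List[str] = []
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        output = act(task, feedback)
        history.append(output)
        feedback = reflect(task, output)
        if feedback is None:
            break
    return history
```

Testing `reflect` directly (for example, that it flags `"abc"` and passes `"ABC"`) builds confidence in one node before judging the whole multi-step run.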
- 7:20 – 10:00
Agent memory: procedural tool-use vs personalized long-term memory
Elad asks about memory; Harrison splits it into two categories: system/procedural memory (how to do tasks and use tools) and personalization memory (user preferences). He outlines practical approaches—few-shot prompting and fine-tuning for tool use, plus explicit or passive mechanisms for remembering user facts.
- Two memory types: procedural (system-level) vs personalization (user-specific)
- Procedural memory often improved via few-shot prompting; sometimes fine-tuning
- Data flywheel: monitor app → collect good examples → feed back as few-shot examples
- Personalization memory approaches: explicit “remember/delete” tool calls vs passive extraction
- Memory remains nascent; no clear dominant solution yet
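The two personalization approaches can be contrasted in one small store: explicit `remember`/`delete` methods an agent would expose as tools, and a passive pass that mines each message for preferences. The `UserMemory` class is hypothetical, and the regex is a deliberately crude stand-in for what would usually be an LLM extraction call.

```python
import re
from typing import Dict, List


class UserMemory:
    """Hypothetical per-user fact store; not a LangChain class."""

    def __init__(self) -> None:
        self.facts: Dict[str, List[str]] = {}

    # Explicit approach: the agent calls these as tools when the user asks.
    def remember(self, user_id: str, fact: str) -> None:
        self.facts.setdefault(user_id, []).append(fact)

    def delete(self, user_id: str, fact: str) -> None:
        if fact in self.facts.get(user_id, []):
            self.facts[user_id].remove(fact)

    # Passive approach: scan each message for preference statements.
    # A regex stands in for what would usually be an LLM extraction call.
    def extract_passively(self, user_id: str, message: str) -> None:
        for match in re.findall(r"I (?:like|prefer) ([^.]+)", message):
            self.remember(user_id, f"likes {match.strip()}")

    def as_context(self, user_id: str) -> str:
        # Rendered into the system prompt on the next turn.
        return "\n".join(f"- {fact}" for fact in self.facts.get(user_id, []))
```

The explicit route gives users control (they can ask to forget), while the passive route captures more but risks storing facts the user never meant to share.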
- 10:00 – 12:22
How real-world apps are getting more sophisticated: multi-step RAG and controlled agents
Sarah asks whether the ecosystem’s application sophistication has changed. Harrison says LangChain has always targeted multi-step systems, but notes newer progress: richer query analysis for RAG and a shift from rigid chains or fully autonomous agents toward controlled state-machine/graph designs.
- LangChain/LangSmith emphasize multi-step apps with testing and observability
- RAG is becoming multi-stage: query analysis, retriever routing, and metadata filtering
- Agent improvements increasingly driven by few-shot/data flywheels
- Production systems often land between chains and autonomous agents: controlled graphs
- LangGraph reflects updated beliefs about what works reliably in production
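The multi-stage RAG pattern (query analysis, then retriever routing, then metadata filtering) can be sketched as follows. In practice the analysis step would ask an LLM for structured output; the keyword rules, the `code`/`docs` index names, and the word-overlap relevance score here are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Doc:
    text: str
    source: str
    year: int


@dataclass
class AnalyzedQuery:
    text: str
    retriever: str
    min_year: Optional[int]


def analyze(query: str) -> AnalyzedQuery:
    # Query analysis step. A real system would ask an LLM for this structured
    # output; simple keyword rules keep the sketch deterministic.
    retriever = "code" if "function" in query.lower() else "docs"
    year_tokens = [int(t) for t in query.split() if t.isdigit() and len(t) == 4]
    return AnalyzedQuery(query, retriever, min(year_tokens) if year_tokens else None)


def retrieve(q: AnalyzedQuery, indexes: Dict[str, List[Doc]]) -> List[Doc]:
    candidates = indexes[q.retriever]  # retriever routing
    if q.min_year is not None:  # metadata filtering
        candidates = [d for d in candidates if d.year >= q.min_year]
    # Toy relevance score: count shared lower-cased words.
    words = set(q.text.lower().split())
    return sorted(candidates, key=lambda d: -len(words & set(d.text.lower().split())))
```

Separating analysis from retrieval lets each stage be inspected and tested on its own, which fits the chapter's emphasis on observability.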
- 12:22 – 14:07
Where agentic apps work first: customer support, coding, and recommendation-style assistants
The discussion turns to which agentic use cases are proving viable. Harrison points to customer support and coding as standout areas due to clear feedback loops and measurable outcomes, with emerging patterns in preference-driven recommendation chatbots.
- Customer support is an obvious near-term win (Sierra and Klarna cited as examples)
- Coding agents benefit from execution-based feedback loops (does the code run?)
- Great UX can be a differentiator (noted in Cognition's demo)
- Recommendation-style assistants differ from support: preference- and taste-driven
- Developers building for themselves can iterate quickly on coding assistants
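The execution-based feedback loop for coding agents can be sketched as follows: run each candidate against tests and hand any error back to the generator. The `generate` callback is a hypothetical stand-in for a model call, and by convention (an assumption of this sketch) the generated code must define `solution`. Using `exec` in-process is only acceptable in a toy; real agents run untrusted code in a sandbox.

```python
from typing import Callable, List, Optional, Tuple

TestCase = Tuple[tuple, object]  # (positional args, expected return value)


def run_candidate(source: str, tests: List[TestCase]) -> Optional[str]:
    """Execute generated code and return an error message, or None if all tests pass."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # sketch only: real agents sandbox untrusted code
        fn = namespace["solution"]
        for args, expected in tests:
            got = fn(*args)
            if got != expected:
                return f"solution{args} returned {got!r}, expected {expected!r}"
    except Exception as exc:  # syntax errors, missing names, runtime errors
        return f"{type(exc).__name__}: {exc}"
    return None


def coding_loop(
    generate: Callable[[Optional[str]], str], tests: List[TestCase], max_tries: int = 3
) -> Optional[str]:
    feedback: Optional[str] = None
    for _ in range(max_tries):
        source = generate(feedback)  # the model sees the previous error, if any
        feedback = run_candidate(source, tests)
        if feedback is None:
            return source
    return None
```

The loop's signal is objective (tests pass or they don't), which is why the chapter singles out coding as an area where agents improve quickly.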
- 14:07 – 15:42
Can teams really switch between model providers? The hard parts of portability
Sarah questions whether plug-and-play model switching is real. Harrison argues that switching is harder than it seems because prompts and capabilities differ across providers, though portability may improve as models become more capable and feature parity increases.
- Model outputs differ; prompts often need to be rewritten per model
- Hope for convergence as models get smarter and idiosyncrasies matter less
- Feature parity matters: function calling historically locked prompts to OpenAI
- Portability improves as more providers add common primitives (e.g., function calling)
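One way to cope with per-model prompts and uneven function-calling support is to keep a profile for each model. In the sketch below, the model names, the `lookup_order` tool, and the JSON-in-prompt fallback are hypothetical; no real provider's request format is implied.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ModelProfile:
    name: str
    supports_function_calling: bool
    prompt_template: str


# Hypothetical profiles: each model gets its own tuned prompt rather than one shared prompt.
PROFILES: Dict[str, ModelProfile] = {
    "model-a": ModelProfile("model-a", True, "You are a concise assistant.\n{question}"),
    "model-b": ModelProfile(
        "model-b",
        False,
        "Answer the question below. If a tool is needed, reply with JSON "
        '{{"tool": <name>, "args": <object>}}.\n{question}',
    ),
}


def build_request(model: str, question: str) -> Dict[str, object]:
    profile = PROFILES[model]
    request: Dict[str, object] = {
        "model": profile.name,
        "prompt": profile.prompt_template.format(question=question),
    }
    if profile.supports_function_calling:
        # Native tool calling: tools travel as structured data, not prompt text.
        request["tools"] = ["lookup_order"]
    return request
```

When a provider lacks native function calling, the tool contract has to live in the prompt text, which is exactly why such prompts do not transfer cleanly between providers.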
- 15:42 – 17:02
What differentiates model APIs: function calling, modalities, context length, safety, cost, and latency
The group enumerates key API-level differences that affect application architecture and portability. Harrison emphasizes context windows and modalities, while Elad adds enterprise concerns like prompt injection; Harrison notes switching pressure grows when scaling due to cost and latency constraints.
- Differentiators: context window size, modalities (e.g., vision), function calling
- Safety concerns like prompt injection become important in enterprise deployments
- Scaling changes priorities: moving from GPT-4 prototypes to cheaper/faster models
- Switching models is most urgent during prototype-to-production rollout
- Cost and latency often drive architecture and provider decisions
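When scaling pushes teams off an expensive prototype model, the choice can be framed as a constrained selection: keep the models that still clear a quality bar on your own eval set and meet the latency budget, then take the cheapest. The `Candidate` fields and any numbers you plug in are placeholders, not real provider pricing or benchmarks.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    usd_per_1k_tokens: float  # placeholder numbers, not real pricing
    p95_latency_ms: int
    eval_score: float  # e.g. pass rate on your own test dataset


def pick_model(
    candidates: List[Candidate], min_score: float, max_latency_ms: int
) -> Optional[Candidate]:
    # Keep only models that still pass the quality bar and latency budget,
    # then take the cheapest of those (None if nothing qualifies).
    eligible = [
        c for c in candidates
        if c.eval_score >= min_score and c.p95_latency_ms <= max_latency_ms
    ]
    return min(eligible, key=lambda c: c.usd_per_1k_tokens, default=None)
```

The `eval_score` input is the crux: without a trustworthy test set, there is no safe way to know a cheaper model is still good enough.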
- 17:02 – 19:53
Long context windows vs RAG: what changes, what doesn’t, and where benchmarks mislead
Elad asks whether million-token contexts make RAG obsolete. Harrison says long context enables more one-shot tasks, but iterative interaction (agents/chaining) still matters; for RAG, “needle in a haystack” doesn’t reflect many real use cases that require multi-fact reasoning and robust ingestion/indexing strategies.
- Long context enables one-shot extraction and Q&A over a few long documents
- Chaining/agents remain necessary for iterative environment interaction and feedback
- RAG isn’t just retrieval; many tasks require reasoning over multiple evidence points
- Needle-in-haystack benchmarks can overstate real-world RAG performance
- Ingestion/indexing (chunking, parent document retrieval, hybrid search) remains uncertain
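Parent document retrieval pairs two of the ingestion choices mentioned here: index small overlapping chunks for precise matching, but return the whole parent document for context. This sketch uses word-window chunking and word-overlap scoring as simplifying assumptions; production systems typically use embeddings or hybrid search.

```python
from typing import Dict, List, Tuple


def chunk(text: str, size: int, overlap: int) -> List[str]:
    """Split text into windows of `size` words, consecutive windows sharing `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def build_index(parents: Dict[str, str], size: int = 4, overlap: int = 1) -> List[Tuple[str, str]]:
    # Index small chunks (precise matching) but remember which parent each came from.
    return [(piece, pid) for pid, text in parents.items() for piece in chunk(text, size, overlap)]


def retrieve_parents(
    query: str, index: List[Tuple[str, str]], parents: Dict[str, str], k: int = 1
) -> List[str]:
    words = set(query.lower().split())
    ranked = sorted(index, key=lambda item: -len(words & set(item[0].lower().split())))
    seen: List[str] = []
    for _, pid in ranked:
        if pid not in seen:
            seen.append(pid)
        if len(seen) == k:
            break
    # Return whole parent documents so the model sees the surrounding context.
    return [parents[pid] for pid in seen]
```

Chunk size, overlap, and what counts as a "parent" are exactly the open ingestion questions the chapter describes; the sketch just makes those knobs explicit.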
- 19:53 – 21:21
Fine-tuning in practice: mostly experimentation until teams hit serious scale
Elad asks about fine-tuning adoption. Harrison reports most teams experiment, but few deploy fine-tunes unless they have critical scale, due to challenges in dataset curation, evaluation, and slower iteration compared to prompts and few-shot examples.
- Fine-tuning is rare in production except at critical scale
- Hard parts: collecting/curating training data and maintaining visibility into it
- Evaluation/testing of fine-tuned models is a major pain point
- Iteration speed: prompts and few-shot examples change faster than fine-tuned weights
- LangSmith targets parts of this workflow (data, eval, testing)
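The evaluation pain point is easier to see with a minimal harness: score every variant (a prompt tweak, a few-shot set, a fine-tuned model) on one fixed dataset so results are comparable across iterations. The exact-match metric and the function names are assumptions for illustration; this is not LangSmith's API.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (input, expected output)


def evaluate(system: Callable[[str], str], dataset: List[Example]) -> float:
    """Exact-match accuracy; real evals often need fuzzier or LLM-graded scoring."""
    if not dataset:
        return 0.0
    correct = sum(1 for inp, expected in dataset if system(inp).strip() == expected)
    return correct / len(dataset)


def compare(variants: Dict[str, Callable[[str], str]], dataset: List[Example]) -> Dict[str, float]:
    # Score every variant on the same fixed dataset so the numbers are
    # comparable between iterations.
    return {name: evaluate(fn, dataset) for name, fn in variants.items()}
```

Changing a prompt and re-running `compare` is fast, while a fine-tune must be retrained before it can be scored, which reflects the iteration-speed gap the chapter describes.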
- 21:21 – 22:56
Open-source model evolution and a contrarian take: reasoning still lags frontier closed models
Sarah asks whether Llama/Mistral and other open-source advances changed what developers build. Harrison notes rising interest in open source, but argues that for reasoning-heavy LangChain-style apps, open-source models still trail GPT-4/Claude 3 despite hype; he also notes how quickly provider deprecations can break older code.
- Rapid ecosystem churn: original OpenAI models used early on have since been deprecated
- Interest in open-source models is increasing among developers
- Reasoning capability is crucial for many LangChain applications
- Claim: open-source reasoning still lags frontier closed models (GPT-4/Claude 3)
- Mismatch between social-media hype and real production requirements (for now)
- 22:56 – 27:31
Next breakthroughs: memory, continual learning, and few-shot datasets as a practical flywheel
Zooming out, Harrison predicts the most exciting progress will be at the application/UX layer, especially around long-term memory. He describes a path toward continual learning via feedback loops and curated few-shot example datasets—faster and cheaper than fine-tuning—and connects it to optimization ideas like DSPy.
- Big opportunity: application/UX innovations that leverage long-term memory
- Personalized memory could enable new consumer and business experiences
- Continual learning framing: systems improve automatically from interaction feedback
- Few-shot dataset accumulation as an under-discussed, practical approach vs fine-tuning
- DSPy as a related optimization paradigm for multi-step LLM programs
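The few-shot flywheel can be sketched as a store that keeps only interactions with positive feedback and later selects the most similar ones as examples in the prompt. `FewShotStore` is a hypothetical class; word overlap stands in for embedding similarity, and the thumbs-up signal is an assumed feedback format. This is a hand-rolled illustration, not DSPy.

```python
from typing import List, Tuple


class FewShotStore:
    """Hypothetical flywheel: keep examples users rated well, reuse them as few-shot prompts."""

    def __init__(self) -> None:
        self.examples: List[Tuple[str, str]] = []

    def log(self, question: str, answer: str, thumbs_up: bool) -> None:
        # Only interactions with positive feedback become examples.
        if thumbs_up:
            self.examples.append((question, answer))

    def select(self, question: str, k: int = 2) -> List[Tuple[str, str]]:
        # Word overlap stands in for embedding similarity.
        words = set(question.lower().split())
        ranked = sorted(self.examples, key=lambda ex: -len(words & set(ex[0].lower().split())))
        return ranked[:k]

    def build_prompt(self, question: str, k: int = 2) -> str:
        shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in self.select(question, k))
        return f"{shots}\n\nQ: {question}\nA:"
```

Each logged success immediately changes future prompts with no retraining step, which is the "faster and cheaper than fine-tuning" property the chapter highlights.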