No Priors

No Priors Ep. 52 | With Pinecone CEO Edo Liberty

Accurate, customizable search is one of the most immediate AI use cases for companies and general users. Today on No Priors, Elad and Sarah are joined by Pinecone CEO Edo Liberty to talk about how retrieval-augmented generation (RAG) is improving semantic search and making LLMs more accessible. With a RAG architecture, Pinecone makes it possible for companies to vectorize their data and query it for the most accurate responses. In this episode, they talk about how Pinecone Serverless and its open-source Canopy framework make RAG over larger datasets more efficient and cost-effective, which was almost impossible before serverless options existed. They also get into how RAG increases accuracy across different models, how it can bring “operational sanity” to customers’ data, and hybrid search that combines keywords with embeddings.

Sign up for new podcasts every week. Email feedback to show@no-priors.com

Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @EdoLiberty

Show Notes:
0:00 Introduction to Edo and Pinecone
2:01 Use cases for Pinecone and RAG models
6:02 Corporate internal uses for semantic search
10:13 Removing the limits of RAG with Canopy
14:02 Hybrid search
16:51 Why keep Pinecone closed source
22:29 Infinite context
23:11 Embeddings and data leakage
25:35 Fine-tuning the dataset
27:33 What’s next for Pinecone
28:58 Separating reasoning and knowledge in AI

Sarah Guo (host) · Edo Liberty (guest) · Elad Gil (host)
Feb 22, 2024 · 31m

CHAPTERS

  1. 0:00 – 1:49

    What Pinecone is: vector databases and embeddings as AI “memory”

    Sarah introduces Edo Liberty and Pinecone, framing the company as infrastructure for long-term memory in AI apps. Edo explains how models convert text/images into numeric embeddings and why searching those vectors efficiently becomes a new database problem.

    • Pinecone as a purpose-built vector database for AI applications
    • Embeddings/vectors as the numeric representations models actually use
    • Why vector search differs from traditional data storage and retrieval
    • Early skepticism about the “vector database” category, which is now mainstream
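    To make “searching those vectors” concrete, here is a minimal sketch of exact nearest-neighbour search by cosine similarity, assuming toy hand-written 3-dimensional vectors in place of real model embeddings. The names `cosine_similarity`, `top_k`, and the document ids are hypothetical, not Pinecone’s API; a vector database’s job is to answer the same question efficiently at far larger scale, typically with approximate indexes rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Exact (brute-force) search: score every stored vector against the query.

    `vectors` maps a record id to its embedding. This is O(n) per query,
    which is what purpose-built vector indexes are designed to avoid.
    """
    scored = [(vid, cosine_similarity(query, vec)) for vid, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings"; real models emit hundreds of dimensions.
store = {
    "doc-cats": [0.9, 0.1, 0.0],
    "doc-dogs": [0.8, 0.3, 0.1],
    "doc-tax": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.2, 0.0], store, k=2))
```

    The query vector points in nearly the same direction as the two pet documents, so they outrank the tax document even though no keywords are compared at all.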
  2. 1:49 – 3:07

    Why start Pinecone in 2019: seeing the “earthquake” before the AI “tsunami”

    Elad asks how Edo formed the thesis before the generative AI boom. Edo describes how transformers (e.g., BERT) signaled the direction of travel and how the timing of starting the company felt both too early and too late, which suggested it was about right.

    • Transformers and deep learning adoption were already accelerating by 2019
    • Connecting the dots from embeddings to large-scale retrieval needs
    • Founder timing anxiety: too early vs. too late
    • Betting that vector-native infrastructure would be required
  3. 3:07 – 6:01

    Core RAG value: retrieving the right context to reduce hallucinations

    Edo lays out the main embeddings use case: retrieval-augmented generation (RAG). By pulling relevant documents into the model’s context at query time, AI systems answer more accurately and hallucinate less.

    • RAG brings relevant information into the prompt/context at interaction time
    • Similarity search over embeddings to find “relevant” content
    • Experiment: augmenting multiple LLMs with retrieval over Common Crawl reduces hallucinations (in some cases by ~50%)
    • RAG can level performance differences across base models
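    Retrieval only helps once the retrieved text reaches the model’s context. Below is a hypothetical `build_rag_prompt` helper, a sketch assuming the chunks arrive already ranked by similarity: it packs the most relevant chunks under a rough word budget (a stand-in for a real token count) and tells the model to answer from that context. No model is called, and the instruction wording is an assumption, not Pinecone’s or Canopy’s template.

```python
def build_rag_prompt(question, retrieved_chunks, max_context_words=200):
    """Assemble a prompt that grounds the model in retrieved text.

    `retrieved_chunks` is assumed to be ordered from most to least relevant
    (e.g. the output of a similarity search). Chunks are added until the
    next one would exceed the word budget, so the most relevant text is kept.
    """
    context_parts = []
    used = 0
    for i, chunk in enumerate(retrieved_chunks, start=1):
        n_words = len(chunk.split())
        if used + n_words > max_context_words:
            break
        context_parts.append(f"[{i}] {chunk}")
        used += n_words
    context = "\n".join(context_parts)
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


chunks = [
    "Pinecone was founded in 2019.",
    "Canopy is an open-source RAG framework.",
]
print(build_rag_prompt("What is Canopy?", chunks))
```

    Numbering the chunks is a design choice that lets the model (or a later check) point back to the source passage it used.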
  4. 6:01 – 7:42

    Where customers use Pinecone: proprietary enterprise corpora and AI Q&A products

    The conversation shifts from web-scale demos to real customer deployments. Most usage is retrieval over private, often text-heavy company data, with examples such as Notion Q&A and Gong’s sales-call intelligence.

    • Common data types: internal wikis, legal/medical docs, sales calls, knowledge bases
    • Text remains the dominant modality in production workloads
    • Pattern: companies hosting lots of user/customer text build AI features atop it
    • Examples: Notion Q&A and Gong leveraging Pinecone for large-scale Q&A/search
  5. 7:42 – 10:13

    Serverless Pinecone and Canopy: removing scale, cost, and ops limits for RAG

    Elad asks about the “serverless direction” and Canopy. Edo clarifies Canopy is an open-source RAG framework, while Pinecone Serverless is a re-architecture aimed at massive scale, simpler operations, and better unit economics.

    • Distinction: Canopy (open-source RAG framework) vs. Pinecone Serverless (managed product)
    • Shift from early recommendation and anomaly-detection workloads to RAG’s massive knowledge requirements
    • Production RAG needs hundreds of millions to tens of billions of vectors
    • Serverless goals: easy ops, elastic scaling, and cost-efficient querying
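    A back-of-the-envelope calculation shows why vector counts from hundreds of millions to tens of billions make cost and elastic scaling central. This sketch assumes 1,536-dimensional float32 embeddings (a common size for text embedding models, chosen here only for illustration) and counts raw vector bytes, ignoring index structures, metadata, and replication. `raw_vector_bytes` is a hypothetical helper name.

```python
def raw_vector_bytes(num_vectors, dims, bytes_per_value=4):
    """Raw storage for dense float32 vectors, excluding index overhead and metadata."""
    return num_vectors * dims * bytes_per_value


ASSUMED_DIMS = 1536  # assumption for illustration; real dimensionality depends on the model

for n in (100_000_000, 10_000_000_000):
    terabytes = raw_vector_bytes(n, ASSUMED_DIMS) / 1e12
    print(f"{n:,} vectors x {ASSUMED_DIMS} dims ~ {terabytes:.2f} TB raw")
```

    Under these assumptions, 100 million vectors come to about 0.61 TB and 10 billion to about 61 TB before any index overhead.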
  6. 10:13 – 11:49

    Why RAG is hard in practice: chunking, embedding, retrieval, and re-ranking

    Sarah probes where developers struggle. Edo details the end-to-end system complexity beyond “just store vectors,” explaining why Canopy serves as a working reference implementation teams can adapt to their data type and product.

    • RAG pipeline steps: ingest → chunk → embed → index → query transform → search → rerank
    • System design varies by domain (medical vs. Jira vs. Slack)
    • Canopy as an “end-to-end cookbook” to get a baseline working system
    • Encourages iterative tuning after establishing a functional prototype
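    The pipeline steps above can be sketched end to end in a few dozen lines. This is a toy under loudly stated assumptions, not Canopy’s implementation: `embed` is a hashed bag-of-words stand-in for a real embedding model, the “index” is a Python list searched by brute force, the query-transform step is omitted, and `rerank` uses simple term overlap where production systems often use a dedicated reranking model. All function names and sample documents are hypothetical.

```python
import math
import re
import zlib

DIM = 1024  # hypothetical width of the toy hashed embedding


def chunk(text, size=8, overlap=2):
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def tokenize(text):
    """Lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Stand-in for an embedding model: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for tok in tokenize(text):
        vec[zlib.crc32(tok.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(docs):
    """Ingest -> chunk -> embed -> index (the 'index' here is just a list)."""
    index = []
    for doc_id, text in docs.items():
        for piece in chunk(text):
            index.append((doc_id, piece, embed(piece)))
    return index


def search(index, query, k=3):
    """Dense retrieval: the dot product of unit vectors is their cosine similarity."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, vec)), doc_id, piece) for doc_id, piece, vec in index]
    scored.sort(key=lambda hit: hit[0], reverse=True)
    return scored[:k]


def rerank(query, hits):
    """Second-stage rerank by the fraction of query terms each chunk contains."""
    q_terms = set(tokenize(query))

    def overlap(hit):
        return len(q_terms & set(tokenize(hit[2]))) / max(len(q_terms), 1)

    return sorted(hits, key=overlap, reverse=True)


docs = {
    "jira-1": "Login page returns a 500 error after the password reset deploy",
    "slack-7": "Lunch order for the team offsite is pizza and salad",
}
index = build_index(docs)
query = "password reset error"
hits = rerank(query, search(index, query, k=2))
print(hits[0][1], "->", hits[0][2])
```

    Each stage is a separate function so it can be swapped independently, which mirrors the episode’s advice to get a working baseline first and then tune chunking, embedding, or reranking for the domain.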
  7. 11:49 – 13:54

    Why not Postgres/pgvector or Elastic: purpose-built scaling and retrieval mode mismatch

    Sarah asks about alternatives: retrofitting vectors into traditional databases or relying on classic keyword search. Edo argues bolt-on vector indexes hit production limits, and keyword search often fails because semantic relevance lives in high-dimensional vector space.

    • Traditional DB vector features can work for small experiments but struggle in production
    • Key constraints: scale, latency/throughput, and especially cost/unit economics
    • Keyword search retrieves in the wrong space; embeddings capture contextual similarity
    • Vector databases are designed for efficient large-scale vector operations
  8. 13:54 – 16:51

    Hybrid search: combining dense embeddings with sparse (keyword) signals is useful but transitional

    Sarah explores when embeddings alone aren’t enough. Edo explains Pinecone’s hybrid approach, which scores dense and sparse vectors together, and predicts that keywords will become more of a control/boosting lever as embedding quality improves.

    • Hybrid search used when perfect embedding performance is hard to achieve
    • Keywords can be represented as sparse vectors; combine sparse + dense scoring
    • Benefits: boosting and controllable retrieval behavior
    • Prediction: keyword-heavy hybrid usage diminishes over time
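    One common way to score dense and sparse vectors together is a convex combination weighted by a parameter `alpha`. The sketch below assumes that scheme and represents sparse (keyword) vectors as `{dimension_index: weight}` dicts; it illustrates the idea and is not a description of Pinecone’s internal scoring. The function names, the keyword dimension `7`, and the sample vectors are hypothetical.

```python
def dense_dot(a, b):
    """Dot product of two dense vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def sparse_dot(a, b):
    """Dot product of sparse vectors stored as {dimension_index: weight}."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller dict
    return sum(w * b.get(i, 0.0) for i, w in a.items())


def hybrid_score(q_dense, q_sparse, d_dense, d_sparse, alpha=0.5):
    """alpha=1.0 is pure dense (semantic) scoring; alpha=0.0 is pure sparse (keyword)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * dense_dot(q_dense, d_dense) + (1 - alpha) * sparse_dot(q_sparse, d_sparse)


def rank(q_dense, q_sparse, docs, alpha):
    """Order document ids by hybrid score, highest first."""
    return sorted(docs, key=lambda name: hybrid_score(q_dense, q_sparse, *docs[name], alpha=alpha), reverse=True)


Q_DENSE, Q_SPARSE = [1.0, 0.0], {7: 1.0}  # dimension 7 stands for a hypothetical keyword
docs = {
    "A": ([0.9, 0.1], {}),          # semantically close, keyword absent
    "B": ([0.6, 0.4], {7: 1.0}),    # less close semantically, keyword present
}
print(rank(Q_DENSE, Q_SPARSE, docs, alpha=1.0), rank(Q_DENSE, Q_SPARSE, docs, alpha=0.5))
```

    At `alpha=1.0` the purely semantic score ranks `A` first; at `alpha=0.5` the keyword match lifts `B` above it. Lowering `alpha` is one concrete form of the boosting lever described in this chapter.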
  9. 16:51 – 20:23

    Closed-source strategy: managed service complexity, trust, and the open-source tradeoff

    Sarah asks why Pinecone is closed source in a database market where many projects are open. Edo argues cloud-native distribution reduces the need for open source, external contributions are often limited, and operating Pinecone’s multi-tenant system is too complex to self-host easily.

    • Historical role of open source: distribution before cloud maturity
    • Trust via reliability and service quality vs. code transparency
    • Community contributions often dominated by the originating company anyway
    • Multi-tenant, fully managed architecture is operationally complex to run standalone
  10. 20:23 – 22:36

    Long-context and “infinite context”: why stuffing prompts doesn’t replace retrieval

    Elad asks whether long context windows reduce the need for vector databases. Edo is skeptical: bigger contexts can raise costs without improving results, and practical limits remain, so retrieval is still necessary to select the right information.

    • Token-based pricing gives model providers an incentive to sell larger context windows
    • Evidence: naive context stuffing often doesn’t improve quality
    • Analogy: sending “the internet” with every query isn’t feasible
    • Vector DBs can hold far more knowledge than fits in any context window
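    The cost side of context stuffing is simple arithmetic: with token-based pricing, input cost grows linearly with the tokens sent on every query. The price and token counts below are hypothetical placeholders, chosen only to compare stuffing a large block of documents into the prompt against sending a handful of retrieved chunks; `input_cost` is an illustrative helper name.

```python
def input_cost(prompt_tokens, price_per_1k_tokens):
    """Input-token cost of one request under linear per-token pricing."""
    return prompt_tokens / 1000 * price_per_1k_tokens


PRICE_PER_1K = 0.01        # hypothetical price in dollars per 1K input tokens
STUFFED_TOKENS = 100_000   # hypothetical: a large block of documents pasted into the prompt
RAG_TOKENS = 5 * 500       # hypothetical: top-5 retrieved chunks of ~500 tokens each

stuffed = input_cost(STUFFED_TOKENS, PRICE_PER_1K)
rag = input_cost(RAG_TOKENS, PRICE_PER_1K)
print(f"stuffed: ${stuffed:.3f}/query, RAG: ${rag:.3f}/query, ratio: {stuffed / rag:.0f}x")
```

    Under these made-up numbers the stuffed prompt costs 40 times as much per query ($1.00 vs. $0.025), and the point made in this chapter is that the extra context often does not buy better answers.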
  11. 22:36 – 25:28

    Privacy and data leakage: RAG as an operationally safer alternative to fine-tuning

    Elad raises enterprise concerns about leakage and personalization. Edo explains two leakage paths (model providers retaining data; fine-tuning causing memorization) and why retrieval-time augmentation with no training/logging can reduce exposure and support deletion requirements.

    • Leakage risk #1: third-party model services storing/retraining on your data
    • Leakage risk #2: fine-tuning can cause proprietary data to surface unexpectedly
    • RAG approach: don’t train on proprietary data; retrieve only what’s needed at inference
    • Operational benefits: easier GDPR-style deletion; remove a record from the DB and it’s gone
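    The deletion argument is easy to see in code: when proprietary data lives only in the retrieval store and is never trained into model weights, removing a record removes it from every future answer. The `InMemoryVectorStore` class and its `upsert`/`delete_where`/`query` methods are hypothetical stand-ins for a vector database, sketched under that assumption; real services expose their own delete operations.

```python
class InMemoryVectorStore:
    """Toy retrieval store keyed by record id (hypothetical API, not a real client)."""

    def __init__(self):
        self._records = {}  # record_id -> (vector, metadata)

    def upsert(self, record_id, vector, metadata):
        """Insert or overwrite a record."""
        self._records[record_id] = (vector, metadata)

    def delete_where(self, key, value):
        """Delete every record whose metadata[key] == value; return how many were removed."""
        doomed = [rid for rid, (_, md) in self._records.items() if md.get(key) == value]
        for rid in doomed:
            del self._records[rid]
        return len(doomed)

    def query(self, vector, k=3):
        """Return up to k record ids ranked by dot-product similarity."""
        def dot(a, b):
            return sum(x * y for x, y in zip(a, b))

        ranked = sorted(self._records.items(), key=lambda item: dot(vector, item[1][0]), reverse=True)
        return [rid for rid, _ in ranked[:k]]


store = InMemoryVectorStore()
store.upsert("u1-note", [1.0, 0.0], {"user_id": "u1"})
store.upsert("u1-email", [0.9, 0.1], {"user_id": "u1"})
store.upsert("u2-note", [0.0, 1.0], {"user_id": "u2"})
before = store.query([1.0, 0.0])
removed = store.delete_where("user_id", "u1")  # e.g. a user's deletion request
after = store.query([1.0, 0.0])
print(before, removed, after)
```

    After the delete, no query can surface `u1`’s records. Removing information that a fine-tuned model has memorized is much harder, which is the contrast drawn in this chapter.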
  12. 25:28 – 27:33

    Fine-tuning vs prompt engineering vs RAG: what works, what’s risky, and why

    Elad asks when fine-tuning makes sense relative to prompts and RAG. Edo supports fine-tuning scientifically but warns it’s easy to do poorly without expert teams; prompt engineering is necessary but limited; RAG is powerful yet still complex to implement well.

    • Fine-tuning can help a lot, but can also regress performance if done wrong
    • Prompt engineering shapes format/behavior but has diminishing returns
    • RAG provides strong knowledge injection but requires careful system work
    • Market maturity sign: even the “simplest” approach still feels complex
  13. 27:33 – 31:28

    What’s next: scaling serverless infrastructure and a future split between reasoning and knowledge

    Closing out, Elad asks about Pinecone’s roadmap and broader AI trends. Edo emphasizes infrastructure improvements (security, stability, cost, performance) and shares excitement about separating reasoning engines from knowledge stores to reduce inefficiency and enable better products.

    • Near-term focus: faster, larger, more secure, easier-to-use serverless Pinecone
    • RAG is surfacing classic IR challenges the community has studied for decades
    • Big idea: separate reasoning from knowledge instead of cramming “half the internet” into models
    • Expectation: experimentation will translate into more production AI products for consumers
