CHAPTERS
Reducto’s mission: turning complex documents into LLM-ready structured data
The founders explain that Reducto converts messy, high-stakes enterprise documents (claims, health records, financial statements) into clean structured outputs. The primary goal is to make downstream LLM use cases—like RAG and summarization—work reliably with real-world inputs.
Early enterprise adoption and why this became a bottleneck worth solving
Reducto quickly found traction with sophisticated customers, including named startups and unnamed trillion-dollar enterprises. The team frames document ingestion as a core bottleneck preventing AI applications from reaching production quality.
From long-term LLM memory to a surprise “marketing stunt” that became the product
The company didn’t set out to build a document processing platform; the product emerged while the team was building a different LLM product. A quick-and-dirty segmentation model, shared publicly as a blog post and experiment, drew strong inbound interest and revealed unmet demand.
The schlep-blindness insight: PDFs are hard, everyone suffers, nobody wants to own it
They discuss how PDF/document extraction is a known pain point but avoided because it’s perceived as unglamorous. Reducto leaned into this “boring but essential” infrastructure gap—similar to the early opportunity Stripe exploited in payments.
A vision-first approach: reading documents like humans instead of rules and heuristics
Reducto’s technical pivot was to treat document understanding as a computer vision problem rather than a pile of PDF-specific rules. Layout cues—spacing, indentation, hierarchy—become first-class signals for building robust, general parsing.
Customer impact: big jumps in downstream LLM accuracy and less post-processing work
Customers often see large improvements in end-to-end LLM performance simply by switching ingestion providers. Reducto also reduces the need for manual cleanup steps like chunking and post-processing, letting teams focus on product logic and reasoning.
New capabilities unlocked: scanned docs, handwriting, checkboxes, and messy real-world artifacts
They highlight cases where previously impossible workflows become feasible, including scanned documents without metadata and forms with handwriting. By combining deterministic CV and modern VLM/OCR strengths, Reducto handles edge cases like highlights, circled numbers, and instruction-guided extraction.
How they closed a Fortune 25 enterprise deal rapidly: credibility through performance
The founders recount an intense evaluation process that began during YC and culminated in a full enterprise contract. Their main competitor was the customer’s internal document processing team, which set a high bar, yet they won by demonstrating gains in both speed and quality.
Reaching “SOTA” in document extraction: data engine, benchmarks, and iteration speed
Reducto found that public datasets/benchmarks weren’t sufficient for measuring real enterprise complexity. They invested in high-quality data pipelines and diverse sampling to evaluate rigorously, iterate quickly, and set a higher standard for extraction quality.
Expanding beyond PDFs: a unified ingestion layer across file types
Customer demand pulled Reducto into supporting more formats to avoid forcing teams to maintain multiple pipelines. They now handle spreadsheets, images, documents, and slides—while aiming to preserve the same accuracy bar.
Why Reducto is becoming core AI infrastructure for agents and enterprise apps
They argue that teams build ingestion in-house only because quality demands force them to—until a better external option exists. Reducto aims to be that foundational layer so teams can adopt new models faster and focus on higher-level reasoning and product improvements.
Hiring and culture: scrappy builders obsessed with data quality details
Following their Series A, the team discusses hiring across engineering and ML roles. They emphasize scrappiness and a deep care for detail—because data quality directly determines customer outcomes, and building SOTA requires hands-on, meticulous validation.