CHAPTERS
Reducto’s mission: turning complex documents into LLM-ready structured data
The founders explain that Reducto converts messy, high-stakes enterprise documents (claims, health records, financial statements) into clean structured outputs. The primary goal is to make downstream LLM use cases—like RAG and summarization—work reliably with real-world inputs.
- Transforms complex documents into structured data
- Targets LLM workflows: RAG, summarization, and more
- Handles enterprise-grade document types (insurance, healthcare, finance)
- Positions ingestion/structuring as the critical first step for AI apps
Early enterprise adoption and why this became a bottleneck worth solving
Reducto quickly found traction with sophisticated customers, including named startups and unnamed trillion-dollar enterprises. The team frames document ingestion as a core bottleneck preventing AI applications from reaching production quality.
- Adoption by teams like Vanta and large unnamed enterprises
- Rapid growth within ~1 year post-YC
- Document ingestion is a fundamental blocker for AI app quality
- Customers want reliability across a wide variety of documents
From long-term LLM memory to a surprise “marketing stunt” that became the product
The company didn’t set out to build a document processing platform; it emerged while building a different LLM product. A quick-and-dirty segmentation model shared as a blog/experiment drew strong inbound interest and revealed unmet demand.
- Original work: long-term memory for LLMs
- User request: manage uploaded files alongside chat history
- Off-the-shelf tools failed to meet needs
- Weekend segmentation demo outperformed incumbents and drove demand
- Teams asked for an API and paid product immediately
The schlep-blindness insight: PDFs are hard, everyone suffers, nobody wants to own it
They discuss how PDF/document extraction is a known pain point but avoided because it’s perceived as unglamorous. Reducto leaned into this “boring but essential” infrastructure gap—similar to the early opportunity Stripe exploited in payments.
- Founders were surprised the problem wasn’t already solved
- Community signal: many AI founders struggled with ingestion
- “Boring” infrastructure work can create massive leverage
- Analogy to Paul Graham’s “schlep blindness” and the Stripe-like opportunity
A vision-first approach: reading documents like humans instead of rules and heuristics
Reducto’s technical pivot was to treat document understanding as a computer vision problem rather than a pile of PDF-specific rules. Layout cues—spacing, indentation, hierarchy—become first-class signals for building robust, general parsing.
- Shift from rule-based parsing to computer-vision-driven understanding
- Uses human-like interpretation of layout semantics
- Background in CV/ML research influenced the approach
- Focus on generality: not just invoices, but the “long tail” of documents
Customer impact: big jumps in downstream LLM accuracy and less post-processing work
Customers often see large improvements in end-to-end LLM performance simply by switching ingestion providers. Reducto also reduces the need for manual cleanup steps like chunking and post-processing, letting teams focus on product logic and reasoning.
- Swapping ingestion can materially boost LLM accuracy
- Reported improvements can reach roughly 30% or more on hard documents
- Reduces customer burden: chunking and post-processing handled upstream
- Reframes the problem: once data is clean, new applications become possible
New capabilities unlocked: scanned docs, handwriting, checkboxes, and messy real-world artifacts
They highlight cases where previously impossible workflows become feasible, including scanned documents without metadata and forms with handwriting. By combining deterministic CV and modern VLM/OCR strengths, Reducto handles edge cases like highlights, circled numbers, and instruction-guided extraction.
- Enables features that previously weren’t feasible for customers
- Handles scanned docs lacking metadata
- Handwriting extraction improved via VLM-era techniques
- Robust to real-world artifacts (checkboxes, highlighting, circling)
- Supports plain-text instructions to guide extraction behavior
How they closed a Fortune 25 enterprise deal rapidly: credibility through performance
The founders recount an intense evaluation process that began during YC and culminated in a full enterprise contract. Their main competitor was the customer’s internal document processing team, making it a high bar—yet they won by proving speed and quality gains.
- Enterprise journey began with a successful Launch YC demo
- Competitor was the company’s internal document team (strong benchmark)
- Long, multi-stakeholder evaluation with deep technical scrutiny
- Onsite grilling session with many stakeholders, including competitors
- Outcome: fully signed deal and ongoing usage
Reaching “SOTA” in document extraction: data engine, benchmarks, and iteration speed
Reducto found that public datasets/benchmarks weren’t sufficient for measuring real enterprise complexity. They invested in high-quality data pipelines and diverse sampling to evaluate rigorously, iterate quickly, and set a higher standard for extraction quality.
- Document AI lacked enough high-quality public datasets
- Built in-house data pipelines and a “data engine”
- Prioritized diversity of documents to cover edge cases
- Vision-first perspective enabled novel methods over legacy heuristics
- Benchmarking and iteration became a core competence
Expanding beyond PDFs: a unified ingestion layer across file types
Customer demand pulled Reducto into supporting more formats to avoid forcing teams to maintain multiple pipelines. They now handle spreadsheets, images, documents, and slides—while aiming to preserve the same accuracy bar.
- Support expanded to spreadsheets, images, documents, slides
- Customer pain: maintaining separate ingestion pipelines
- Strategy: broaden endpoints while maintaining accuracy standards
- Positions product as a unified ingestion layer for enterprises
Why Reducto is becoming core AI infrastructure for agents and enterprise apps
They argue that teams build ingestion in-house only because quality demands force them to—until a better external option exists. Reducto aims to be that foundational layer so teams can adopt new models faster and focus on higher-level reasoning and product improvements.
- Core value: unblock product quality without building ingestion in-house
- Helps teams move faster with new models and workflows
- Applicable to startups, scale-ups, and enterprises
- Acts as foundational infra for AI apps and agentic systems
Hiring and culture: scrappy builders obsessed with data quality details
Following their Series A, the team discusses hiring across engineering and ML roles. They emphasize scrappiness and a deep care for detail—because data quality directly determines customer outcomes, and building SOTA requires hands-on, meticulous validation.
- Hiring across product engineering and ML roles
- Seeks scrappy startup-minded engineers and founders
- Strong emphasis on detail-oriented craftsmanship
- Example: researchers manually reviewing thousands of pages
- Core mantra: data quality determines end results and customer success
