YC Root Access
Artie: Real Time Data Streaming For The AI Age
At a glance
WHAT IT’S REALLY ABOUT
Artie builds mission-critical real-time data streaming for AI workloads
- Artie is a real-time data streaming platform that moves production data (e.g., Postgres) into destinations like Snowflake with low latency and strong reliability guarantees.
- The founders built Artie after repeatedly seeing teams fail to productionize in-house CDC pipelines, an effort that can take 6–24+ months and requires specialized distributed-systems expertise.
- Early go-to-market was high-friction and trust-based: Substack became the first major customer via a cold email and a rigorous proof-of-concept that pushed billions of rows under strict constraints.
- The core technical challenge is handling messy real-world data and failure modes (backfills, ordering guarantees, sharding, Kafka rebalancing bugs, undocumented SQL Server CDC paths) where customers expect the vendor to own the entire outcome.
- After processing 700B+ rows in the last year and reaching $1M ARR with a tiny team, Artie raised a $12M Series A and plans to triple headcount while expanding from database CDC into an events API and more real-time destinations.
IDEAS WORTH REMEMBERING
5 ideas
Real-time pipelines become mandatory once companies hit a certain scale.
The founders describe a repeat pattern: teams want fresher data for experimentation and operational workflows, but internal data teams can’t justify or staff the effort unless it’s a top company priority.
Building CDC connectors in-house is deceptively expensive and failure-prone.
They cite year-long (or multi-year) attempts that still weren’t production-ready, because correctness requires handling edge cases like ordering, integrity, and failure recovery—not just moving rows.
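One way to see why "just moving rows" isn't enough: a hypothetical sketch (names and structure are illustrative, not Artie's code) of the ordering edge case, where change events can arrive duplicated or out of order after retries. Each event carries a log sequence number (LSN), and an event is applied only if its LSN is newer than the last one applied for that row.

```python
def apply_in_order(events):
    """events: iterable of (lsn, pk, value) change events, possibly
    duplicated or out of order. Returns the final per-key state."""
    state = {}      # pk -> latest value
    last_lsn = {}   # pk -> highest LSN applied so far
    for lsn, pk, value in events:
        if lsn <= last_lsn.get(pk, -1):
            continue  # stale or duplicate event: skip, keeping idempotency
        state[pk] = value
        last_lsn[pk] = lsn
    return state
```

A correct connector has to get this right for every table, under crashes and retries, which is part of why in-house builds drag on.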
Enterprise adoption hinges on proving reliability, not just features.
Substack’s initial hesitancy was overcome through a tight one-month POC that stress-tested constraints and volume; for infrastructure, “it works in prod” is the sales argument.
Backfill+stream correctness is a central design constraint.
Artie implemented an approach similar to “DB Log” patterns: backfill while capturing ongoing changes (buffering in Kafka) so long backfills don’t create data gaps or inconsistencies.
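A minimal sketch of that pattern under illustrative assumptions (function and variable names are ours; Artie's actual implementation is not shown in the source): copy a snapshot of the table, then replay the change events that were buffered (e.g. in Kafka) while the backfill ran, so later changes win and no gap opens between snapshot and stream.

```python
def backfill_with_buffered_changes(snapshot_rows, change_events):
    """snapshot_rows: dict pk -> row as of snapshot start (chunked
    reads elided). change_events: log-ordered list of (op, pk, row)
    captured while the backfill was running."""
    destination = {}
    # Phase 1: the backfill copies the snapshot into the destination.
    for pk, row in snapshot_rows.items():
        destination[pk] = row
    # Phase 2: drain the buffered CDC events in log order; later events
    # win, so updates and deletes that happened mid-backfill are not lost.
    for op, pk, row in change_events:
        if op == "delete":
            destination.pop(pk, None)
        else:  # insert or update
            destination[pk] = row
    return destination
```

The key property is that a multi-hour backfill and the live stream converge to the same end state as if the copy had been instantaneous.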
At-scale data streaming is dominated by ‘unknown unknowns.’
They highlight surprising real-world issues—like invalid timestamps in MongoDB or massive single-table backfills—that only appear in production and force continuous hardening.
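As a hypothetical illustration of hardening against the invalid-timestamp class of problem (this is our sketch, not Artie's code): documents can carry epoch-millisecond values far outside the range a destination or language runtime accepts, so a defensive converter returns a sentinel instead of letting one bad document crash the stream.

```python
from datetime import datetime, timezone

def safe_datetime_from_millis(ms):
    """Convert epoch milliseconds to a UTC datetime; return None for
    values outside the representable range instead of raising, so a
    single malformed document can't take down the whole pipeline."""
    try:
        return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
    except (OverflowError, OSError, ValueError):
        return None
```

In production the sentinel would typically be logged and routed to a dead-letter queue rather than silently dropped.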
WORDS WORTH SAVING
5 quotes
Artie is a real-time data streaming platform… we help companies move data across their systems in real time.
— Jacqueline Cheong
Spending a year to two years building a Postgres to Snowflake connector just seems weird. It seems nonsensical.
— Robin Tang
It was a cold email.
— Jacqueline Cheong
Data processing… is like a series of accumulated battle scars.
— Robin Tang
Their bug is your bug.
— Jared Friedman
AI-generated summary created from a speaker-labeled transcript.