YC Root Access
Artie: Real Time Data Streaming For The AI Age
At a glance
WHAT IT’S REALLY ABOUT
Artie builds mission-critical real-time data streaming for AI workloads
- Artie is a real-time data streaming platform that moves production data (e.g., Postgres) into destinations like Snowflake with low latency and strong reliability guarantees.
- The founders built Artie after repeatedly seeing teams fail to productionize in-house CDC pipelines, an effort that can take 6–24+ months and requires specialized distributed-systems expertise.
- Early go-to-market was high-friction and trust-based: Substack became the first major customer via a cold email and a rigorous proof-of-concept that pushed billions of rows under strict constraints.
- The core technical challenge is handling messy real-world data and failure modes (backfills, ordering guarantees, sharding, Kafka rebalancing bugs, undocumented SQL Server CDC paths) where customers expect the vendor to own the entire outcome.
- After processing 700B+ rows in the last year and reaching $1M ARR with a tiny team, Artie raised a $12M Series A and plans to triple headcount while expanding from database CDC into an events API and more real-time destinations.
IDEAS WORTH REMEMBERING
5 ideas
Real-time pipelines become mandatory once companies hit a certain scale.
The founders describe a repeat pattern: teams want fresher data for experimentation and operational workflows, but internal data teams can’t justify or staff the effort unless it’s a top company priority.
Building CDC connectors in-house is deceptively expensive and failure-prone.
They cite year-long (or multi-year) attempts that still weren’t production-ready, because correctness requires handling edge cases like ordering, integrity, and failure recovery—not just moving rows.
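One way to see why "just moving rows" isn't enough: a hypothetical sketch (names and structure are illustrative, not Artie's code) of the ordering edge case, where change events can arrive duplicated or out of order after retries. Each event carries a log sequence number (LSN), and an event is applied only if its LSN is newer than the last one applied for that row.

```python
def apply_in_order(events):
    """events: iterable of (lsn, pk, value) change events, possibly
    duplicated or out of order. Returns the final per-key state."""
    state = {}      # pk -> latest value
    last_lsn = {}   # pk -> highest LSN applied so far
    for lsn, pk, value in events:
        if lsn <= last_lsn.get(pk, -1):
            continue  # stale or duplicate event: skip, keeping idempotency
        state[pk] = value
        last_lsn[pk] = lsn
    return state
```

A correct connector has to get this right for every table, under crashes and retries, which is part of why in-house builds drag on.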
Enterprise adoption hinges on proving reliability, not just features.
Substack’s initial hesitancy was overcome through a tight one-month POC that stress-tested constraints and volume; for infrastructure, “it works in prod” is the sales argument.
Backfill+stream correctness is a central design constraint.
Artie implemented an approach similar to “DB Log” patterns: backfill while capturing ongoing changes (buffering in Kafka) so long backfills don’t create data gaps or inconsistencies.
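A minimal sketch of that pattern under illustrative assumptions (function and variable names are ours; Artie's actual implementation is not shown in the source): copy a snapshot of the table, then replay the change events that were buffered (e.g. in Kafka) while the backfill ran, so later changes win and no gap opens between snapshot and stream.

```python
def backfill_with_buffered_changes(snapshot_rows, change_events):
    """snapshot_rows: dict pk -> row as of snapshot start (chunked
    reads elided). change_events: log-ordered list of (op, pk, row)
    captured while the backfill was running."""
    destination = {}
    # Phase 1: the backfill copies the snapshot into the destination.
    for pk, row in snapshot_rows.items():
        destination[pk] = row
    # Phase 2: drain the buffered CDC events in log order; later events
    # win, so updates and deletes that happened mid-backfill are not lost.
    for op, pk, row in change_events:
        if op == "delete":
            destination.pop(pk, None)
        else:  # insert or update
            destination[pk] = row
    return destination
```

The key property is that a multi-hour backfill and the live stream converge to the same end state as if the copy had been instantaneous.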
At-scale data streaming is dominated by ‘unknown unknowns.’
They highlight surprising real-world issues—like invalid timestamps in MongoDB or massive single-table backfills—that only appear in production and force continuous hardening.
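As a hypothetical illustration of hardening against the invalid-timestamp class of problem (this is our sketch, not Artie's code): documents can carry epoch-millisecond values far outside the range a destination or language runtime accepts, so a defensive converter returns a sentinel instead of letting one bad document crash the stream.

```python
from datetime import datetime, timezone

def safe_datetime_from_millis(ms):
    """Convert epoch milliseconds to a UTC datetime; return None for
    values outside the representable range instead of raising, so a
    single malformed document can't take down the whole pipeline."""
    try:
        return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
    except (OverflowError, OSError, ValueError):
        return None
```

In production the sentinel would typically be logged and routed to a dead-letter queue rather than silently dropped.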
WORDS WORTH SAVING
5 quotes
Artie is a real-time data streaming platform… we help companies move data across their systems in real time.
— Jacqueline Cheong
Spending a year to two years building a Postgres to Snowflake connector just seems weird. It seems nonsensical.
— Robin Tang
It was a cold email.
— Jacqueline Cheong
Data processing… is like a series of accumulated battle scars.
— Robin Tang
Their bug is your bug.
— Jared Friedman
AI-generated summary created from a speaker-labeled transcript.