
Building the best agentic analytics harness: Powered by Claude, built with Claude Code

Omni built the best agentic harness for analytics, tailored to and powered by Claude models, with 99% of the platform's code written using Claude Code. Cofounder & CTO Chris Merrick shows how the team architects its multi-agent system, designs and sizes its tools, and evaluates effectiveness.

Chris Merrick (guest)
May 21, 2026, 26 min

At a glance

WHAT IT’S REALLY ABOUT

How Omni built Blobby, an agentic analytics harness with Claude

  1. Omni uses Claude to translate natural-language business questions into semantic queries that compile into SQL against a customer’s warehouse, with a semantic layer providing curated definitions, context, and permissions.
  2. The team’s development velocity increased substantially after adopting Claude Code (especially with Opus), helping engineers—and even the CTO—ship more code and iterate faster.
  3. Early Blobby iterations improved by adding LLM-targeted metadata (AI context, sample queries, and field values) to better ground the model in business terminology and real data distributions.
  4. A custom agentic harness (tasks, tool loops, checkpointing) boosted robustness, especially once Blobby could recover from errors using good, descriptive error messages and an explicit “retry budget.”
  5. Deep trace observability via evals enabled “blobotomies”: major architecture fixes like avoiding a split-brain outer-agent/sub-agent design and shifting to SQL-generation plus parsing for more efficient one-shot queries.

IDEAS WORTH REMEMBERING

5 ideas

Grounding quality depends on business-aware semantics, not just a strong LLM.

Claude can reason well, but accurate analytics requires encoding company-specific definitions (e.g., what “last quarter” means) and mapping terminology to the right datasets and joins.

Curate and localize context close to the fields it describes.

Omni’s semantic layer acts like “Claude.md near the code”: putting AI context next to metric/field definitions improves retrieval and reduces misinterpretation versus dumping guidance elsewhere.
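One way to picture "context next to the field" is a field definition that carries its own guidance, so whatever is retrieved for a question arrives with the relevant notes attached. This is a minimal sketch, not Omni's actual schema: `SemanticField`, `render_prompt_context`, and the example fields and context strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SemanticField:
    """One field in a hypothetical semantic layer, with its AI context attached."""
    name: str
    sql: str
    ai_context: str = ""  # guidance written for the model, kept next to the definition
    sample_values: List[str] = field(default_factory=list)


def render_prompt_context(fields: List[SemanticField]) -> str:
    """Render only the given fields, each followed by its own context."""
    blocks = []
    for f in fields:
        lines = [f"field {f.name} = {f.sql}"]
        if f.ai_context:
            lines.append(f"  context: {f.ai_context}")
        if f.sample_values:
            lines.append("  example values: " + ", ".join(f.sample_values))
        blocks.append("\n".join(lines))
    return "\n".join(blocks)


# Hypothetical fields; the names, SQL, and context text are illustrative only.
FIELDS = [
    SemanticField(
        name="orders.total_revenue",
        sql="SUM(orders.amount)",
        ai_context="Use for 'sales' or 'revenue'; excludes refunds.",
    ),
    SemanticField(
        name="orders.region",
        sql="orders.region",
        sample_values=["EMEA", "APAC"],
    ),
]

print(render_prompt_context(FIELDS))
```

Because the guidance travels with the field, selecting a subset of fields for a question also selects exactly the context that applies to them.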

Add LLM-specific metadata: AI context, sample queries, and example values.

Sample queries show “typical use,” while example values help with filter selection and fuzzy matching (e.g., recognizing region abbreviations or correcting user typos).
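The value-matching idea can be illustrated with a small sketch that resolves free-form user text against known example values. It assumes a plain string-similarity approach using Python's standard `difflib`; the talk summary does not say how Omni does the matching, and `resolve_filter_value`, `REGION_VALUES`, and `REGION_ALIASES` are hypothetical names.

```python
from difflib import get_close_matches

# Hypothetical example values sampled from a "region" field.
REGION_VALUES = ["North America", "EMEA", "APAC", "Latin America"]

# Hypothetical abbreviation map that could live in the field's AI context.
REGION_ALIASES = {"na": "North America", "latam": "Latin America"}


def resolve_filter_value(user_text, known_values, aliases=None, cutoff=0.6):
    """Map free-form user text to a known field value, or None if nothing is close."""
    lowered = user_text.strip().lower()
    # 1. Known abbreviations win first.
    if aliases and lowered in aliases:
        return aliases[lowered]
    # 2. Case-insensitive exact match.
    by_lower = {value.lower(): value for value in known_values}
    if lowered in by_lower:
        return by_lower[lowered]
    # 3. Fuzzy match to absorb typos.
    matches = get_close_matches(lowered, list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None


print(resolve_filter_value("north amrica", REGION_VALUES))
print(resolve_filter_value("LATAM", REGION_VALUES, REGION_ALIASES))
```

In an agentic setting the same example values could simply be placed in the prompt and the model left to do the matching; the explicit function here only makes the intent concrete.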

Agentic loops raise reliability most when they can recover from failures.

A large quality jump came from explicitly teaching Blobby how to handle errors, giving it retries, and investing in error messages that explain what broke and how to fix it.
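The recovery loop described here can be sketched as a bounded retry in which each failure's message is fed back into the next attempt. This is a minimal illustration under stated assumptions, not Blobby's implementation: `run_with_retry_budget`, `QueryError`, and the stub `fake_generate` and `fake_execute` functions are hypothetical stand-ins for the model call and the query engine.

```python
class QueryError(Exception):
    """Error whose message says what broke and how to fix it."""


def run_with_retry_budget(generate, execute, question, budget=3):
    """Generate a query, run it, and feed errors back until the budget is spent."""
    feedback = []  # error messages from earlier attempts, shown to the generator
    for attempt in range(1, budget + 1):
        query = generate(question, feedback)
        try:
            return {"result": execute(query), "attempts": attempt}
        except QueryError as err:
            feedback.append(f"Attempt {attempt} failed: {err}")
    raise RuntimeError(f"Retry budget of {budget} exhausted: " + " | ".join(feedback))


def fake_generate(question, feedback):
    # Stand-in for a model call: corrects the field name once it has seen an error.
    return "orders.total_revenue" if feedback else "orders.revenue"


def fake_execute(query):
    # Stand-in for the query engine, with a descriptive, actionable error message.
    if query == "orders.revenue":
        raise QueryError(
            "Unknown field 'orders.revenue'. Did you mean 'orders.total_revenue'?"
        )
    return 42


print(run_with_retry_budget(fake_generate, fake_execute, "What was revenue?"))
```

The budget keeps a stuck session from looping indefinitely, and the quality of the error text determines how useful each retry is, which matches the emphasis on investing in error messages.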

Avoid “split-brain” agent architectures where planning and execution disagree.

Omni found the outer agent could request tasks that a query sub-agent couldn’t satisfy in one query, causing inconsistent behavior; consolidating tools/knowledge into a single “brain” improved complex-case performance.

WORDS WORTH SAVING

5 quotes

I sort of assumed that at some point I'd have to stop writing code. Uh, and I thank Claude very much for making me, uh, still able to do some software engineering from time to time.

Chris Merrick

Earlier in twenty-twenty-five we said to the team, "I don't know when and I don't know how, but I know our jobs are changing, so let's just start experimenting, start using these tools, figure out what works."

Chris Merrick

Claude is incredible at answering questions, but you need to tell it more about your business if you want it to answer questions about your business, right?

Chris Merrick

The agentic loop was really great at recovering from errors. So, like, one of the earliest, like, massive, uh, quality increases we made was to just, A, tell Blobby how to recover from errors and give, give it some budget to go do that.

Chris Merrick

He said, "Not good enough. Go fix it."

Chris Merrick

Claude-powered natural language to SQL via semantic layer
Semantic layer: curation, context localization, permissions
Agentic harness design: tasks, tools, inner/outer loops
Error recovery budgets and improving error messages
Model selection tradeoffs (Haiku → Sonnet) and token usage
Trace-based evals and debugging “bad sessions”
Switching from structured JSON queries to SQL + parsing (CTEs)

High quality AI-generated summary created from speaker-labeled transcript.
