How Datadog built a universal machine tool for Claude Code

At Datadog, 90% of engineers adopted AI coding tools for production work in the last four months, with Claude Code driving two-thirds of that usage. As sessions grew more ambitious, the reusable tools they produced (for verification, debugging, and orchestration) sprawled into unmaintainable one-offs. Sesh Nalla, VP of Engineering, shares how Datadog built Temper, a constrained framework that instead produces secure, reusable tools that compound across sessions and teams.

May 5, 2026 · 30 min

At a glance

WHAT IT’S REALLY ABOUT

Datadog’s Temper: machine-tool discipline for agent-built software factories

  1. The speaker traces Datadog’s progression from human-built distributed systems to agent-assisted construction, culminating in a need for industrial-grade repeatability rather than one-off agent improvisation.
  2. Datadog’s experiments (Courier, Bits Evolve, Helix) reveal shifting bottlenecks: from building systems, to feedback/verification harness quality, to operationalization and shared tooling for production readiness.
  3. Temper is introduced as a “machine tool” that forces intent into explicit, declarative operational blueprints (states, transitions, policies, effects) that compile into a runtime rather than ad-hoc code.
  4. By turning implicit control logic into formal transition tables with policy gates and typed effects, Temper aims to let agents iterate quickly (even via hot-reload) while maintaining safety and auditability.
  5. A multi-layer verifier (model checking, fault injection, property testing) becomes the key scaling mechanism for trust, compounding over time as new gaps are discovered and encoded into tests/models.
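To make points 4 and 5 more concrete, here is a minimal Python sketch under stated assumptions: the control plane is a hypothetical deploy workflow written as a transition table, and the verifier has two layers. An exhaustive reachability search stands in for model checking, and a seeded randomized run with injected `fail` triggers stands in for fault injection plus property testing. `TABLE`, `check_no_dead_ends`, `check_faulty_runs`, and `CHECKS` are illustrative names, not Temper's API, and a real model checker would cover far richer properties.

```python
import random
from collections import deque

# Hypothetical control plane as a finite transition table:
# (state, trigger) -> next state. Not Temper's actual format.
TABLE = {
    ("pending", "start"): "running",
    ("running", "succeed"): "done",
    ("running", "fail"): "rolling_back",
    ("rolling_back", "restored"): "pending",
}
INITIAL, TERMINAL = "pending", {"done"}


def reachable_from(start):
    """Breadth-first search over every transition in the table."""
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        for (src, _), nxt in TABLE.items():
            if src == state and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def check_no_dead_ends():
    """Exhaustive layer: every reachable non-terminal state can still finish."""
    return [f"{s!r} cannot reach a terminal state"
            for s in sorted(reachable_from(INITIAL) - TERMINAL)
            if not (reachable_from(s) & TERMINAL)]


def check_faulty_runs(runs=200, fault_rate=0.3, max_steps=50, seed=7):
    """Randomized layer: inject 'fail' often and check a trace property:
    'done' is never entered while a rollback is still in progress."""
    rng = random.Random(seed)
    problems = []
    for _ in range(runs):
        state, rolling_back = INITIAL, False
        for _ in range(max_steps):
            if state in TERMINAL:
                break
            triggers = sorted(t for (src, t) in TABLE if src == state)
            if not triggers:
                problems.append(f"stuck in {state!r}")
                break
            if "fail" in triggers and rng.random() < fault_rate:
                trigger = "fail"  # injected fault
            else:
                trigger = rng.choice([t for t in triggers if t != "fail"] or triggers)
            state = TABLE[(state, trigger)]
            if state == "done" and rolling_back:
                problems.append("reached 'done' mid-rollback")
            # A rollback is in progress from 'rolling_back' until 'pending'.
            rolling_back = state == "rolling_back" or (rolling_back and state != "pending")
    return problems


# Each newly discovered gap becomes another check appended here.
CHECKS = [check_no_dead_ends, check_faulty_runs]


def verify():
    return [p for check in CHECKS for p in check()]


print(verify())  # []
```

Deleting the `("rolling_back", "restored")` row makes `check_no_dead_ends` report that `'rolling_back'` can no longer finish. Appending a check to `CHECKS` whenever such a gap is found is one simple way to get the compounding effect described in point 5.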

IDEAS WORTH REMEMBERING


Agent speed exposes human-shaped tooling as the new bottleneck.

As Claude Code enables one person to drive multiple fast agent sessions, the limiting factor becomes coordination, operational knowledge transfer, and production hardening—workflows originally designed around human attention.

Scaling agent-built systems requires industrialization, not just mechanization.

Mechanization is “agents write more code,” while industrialization is making work repeatable, controllable, and verifiable so multiple people (and agents) can operate what was built.

Closed-loop evolution only works as well as its harness.

Bits Evolve demonstrated that benchmarks, tests, and observability define what improvements survive; weak feedback loops lead to shallow or misguided optimization.
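A toy illustration of why the harness decides what survives, under a hypothetical setup (Bits Evolve's actual internals are not described here): candidate implementations of "find the maximum" are scored by a deterministic comparison count, and a candidate is kept only if it passes the tests and beats the current best. All function names are made up for this sketch.

```python
from typing import Callable, List, Sequence, Tuple

Less = Callable[[int, int], bool]
Impl = Callable[[Sequence[int], Less], int]


def max_pairwise(xs: Sequence[int], less: Less) -> int:
    """Baseline: x is the max if no element is larger (quadratic)."""
    for x in xs:
        if not any(less(x, y) for y in xs):
            return x
    raise ValueError("empty input")


def max_scan(xs: Sequence[int], less: Less) -> int:
    """Proposed improvement: a single linear pass."""
    best = xs[0]
    for x in xs[1:]:
        if less(best, x):
            best = x
    return best


def max_first(xs: Sequence[int], less: Less) -> int:
    """Proposed 'improvement' that is fast but wrong."""
    return xs[0]


def passes(impl: Impl, cases: List[List[int]]) -> bool:
    return all(impl(c, lambda a, b: a < b) == max(c) for c in cases)


def cost(impl: Impl) -> int:
    """Deterministic benchmark: number of comparisons on a fixed input."""
    calls = 0

    def counting_less(a: int, b: int) -> bool:
        nonlocal calls
        calls += 1
        return a < b

    impl(list(range(200)), counting_less)
    return calls


def evolve(baseline: Impl, candidates: List[Impl], cases: List[List[int]]) -> Tuple[str, int]:
    """Keep a candidate only if it passes the tests AND beats the current best."""
    best, best_cost = baseline, cost(baseline)
    for cand in candidates:
        if passes(cand, cases) and cost(cand) < best_cost:
            best, best_cost = cand, cost(cand)
    return best.__name__, best_cost


STRONG = [[1], [3, 1, 2], [2, 9, 9, 4], [-5, -1, -3]]
WEAK = [[1]]
print(evolve(max_pairwise, [max_first, max_scan], STRONG))  # ('max_scan', 199)
print(evolve(max_pairwise, [max_first, max_scan], WEAK))    # ('max_first', 0)
```

With the strong test set the linear scan wins; with the weak one, the fast but wrong `max_first` survives because nothing in the feedback loop catches it.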

Temper shifts output from code to explicit operational state machines.

Instead of generating scattered CRUD/control logic across services and docs, agents produce a declarative blueprint that compiles into a formal transition system, improving reasoning and review for both humans and agents.
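A minimal sketch of the blueprint-to-runtime idea, assuming a made-up blueprint schema (`initial`, `states`, and `transitions` with `from`/`on`/`to`) rather than Temper's real format. The compile step rejects undeclared states and ambiguous transitions, and the runtime can only fire transitions the blueprint declares.

```python
from typing import Any, Dict, Tuple

# Hypothetical blueprint: plain data an agent could emit as JSON or YAML.
BLUEPRINT: Dict[str, Any] = {
    "entity": "Order",
    "initial": "created",
    "states": ["created", "paid", "shipped", "cancelled"],
    "transitions": [
        {"from": "created", "on": "pay", "to": "paid"},
        {"from": "created", "on": "cancel", "to": "cancelled"},
        {"from": "paid", "on": "ship", "to": "shipped"},
    ],
}


def compile_blueprint(bp: Dict[str, Any]) -> Dict[Tuple[str, str], str]:
    """Validate the blueprint and turn it into a (state, trigger) -> state table."""
    states = set(bp["states"])
    if bp["initial"] not in states:
        raise ValueError(f"initial state {bp['initial']!r} is not declared")
    table: Dict[Tuple[str, str], str] = {}
    for t in bp["transitions"]:
        for end in (t["from"], t["to"]):
            if end not in states:
                raise ValueError(f"undeclared state {end!r} in {t}")
        key = (t["from"], t["on"])
        if key in table:
            raise ValueError(f"ambiguous transition for {key}")
        table[key] = t["to"]
    return table


class Machine:
    """Tiny runtime: only transitions listed in the compiled table can fire."""

    def __init__(self, bp: Dict[str, Any]) -> None:
        self.table = compile_blueprint(bp)
        self.state = bp["initial"]

    def fire(self, trigger: str) -> str:
        key = (self.state, trigger)
        if key not in self.table:
            raise ValueError(f"{trigger!r} is not allowed in state {self.state!r}")
        self.state = self.table[key]
        return self.state


order = Machine(BLUEPRINT)
print(order.fire("pay"))  # paid
```

Calling `order.fire("cancel")` at this point raises `ValueError`, since no `cancel` transition leaves `paid`; that rule lives in one reviewable row instead of in scattered conditionals.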

Data-like transition tables make change safer and more targeted.

When the control plane is represented as explicit transitions plus policies, agents can propose small, localized spec edits (e.g., add rollback path) without rewriting large imperative code paths.
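As an illustration of a localized spec edit, assume transitions are immutable rows with an optional policy name (`Row`, `validate`, `diff`, and `fire` are hypothetical, not Temper's API). Adding a rollback path is then a one-row change whose diff is easy to review, and the runtime refuses to fire a gated row unless the named approval is present.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Row:
    """One transition row: the unit an agent proposes to add or change."""
    source: str
    trigger: str
    target: str
    policy: Optional[str] = None  # approval the runtime must see before firing


def row_key(r: Row) -> Tuple[str, str, str]:
    return (r.source, r.trigger, r.target)


def validate(rows: FrozenSet[Row]) -> List[str]:
    """Flag rows that would make the same (source, trigger) ambiguous."""
    seen: Set[Tuple[str, str]] = set()
    errors: List[str] = []
    for r in sorted(rows, key=row_key):
        if (r.source, r.trigger) in seen:
            errors.append(f"conflicting rows for {r.source} --{r.trigger}-->")
        seen.add((r.source, r.trigger))
    return errors


def diff(old: FrozenSet[Row], new: FrozenSet[Row]) -> Tuple[List[Row], List[Row]]:
    """Return (added, removed) rows in a stable order for review."""
    return sorted(new - old, key=row_key), sorted(old - new, key=row_key)


def fire(rows: FrozenSet[Row], state: str, trigger: str, approvals: Set[str]) -> str:
    """Look up the row for (state, trigger) and enforce its policy gate."""
    for r in rows:
        if (r.source, r.trigger) == (state, trigger):
            if r.policy is not None and r.policy not in approvals:
                raise PermissionError(f"{r.policy!r} required for {trigger!r}")
            return r.target
    raise ValueError(f"no transition for {trigger!r} in state {state!r}")


BEFORE = frozenset({
    Row("deploying", "healthy", "live"),
    Row("deploying", "degraded", "paused"),
    Row("paused", "resume", "deploying"),
})

# The proposed change: the rollback path is one new row behind a policy gate.
AFTER = BEFORE | {Row("paused", "rollback", "previous_version", policy="oncall_approval")}

added, removed = diff(BEFORE, AFTER)
print(len(added), removed)  # 1 []
print(validate(AFTER))      # []
print(fire(AFTER, "paused", "rollback", {"oncall_approval"}))  # previous_version
```

The reviewer sees exactly one added row and no removals, which is the kind of small, targeted change described above; without the `oncall_approval` entry, `fire` raises `PermissionError`.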

WORDS WORTH SAVING


In a few days, to our disbelief, we had a full functional Kafka, Kafka-compatible system working, and we called it Helix.

Sesh Nalla

So mechanization means agents are doing more of the work now. And industrialization, if we were to borrow the metaphor, means work becomes repeatable, verifiable, controllable, and scalable.

Sesh Nalla

So the human role now becomes like designing the factory and the constraints and the outcomes and the verification loop so this thing can run for hours and days and weeks producing what you want it to produce.

Sesh Nalla

You're no longer writing the code. You're shaping the work.

Sesh Nalla

The win was not that one artisan could build a brilliant machine. The win was that a machine built with machine tools made parts composable and inspectable and replaceable that we could build larger machines.

Sesh Nalla

Manufacturing machine-tool metaphor for software
Shifting bottlenecks and Amdahl’s law in agent workflows
Courier, Bits Evolve, Helix (Kafka-compatible system) lineage
Dark factory: autonomous agent software production
Temper blueprints: states, transitions, triggers, invariants
Policy gates, typed effects, Wasm for bounded arbitrary code
Verification stack: model checking, simulations, property tests

High-quality AI-generated summary created from a speaker-labeled transcript.
