Content-Addressed Steps: How opentine Guarantees Bit-Exact Replays
Every step in an opentine run is hashed from its inputs into a content-addressed ID. Identical inputs always produce the same step, making replays and caching free.
What content-addressing means in opentine
Every step in an opentine run has an ID derived from the hash of its inputs. In the current core (see opentine.core.step_id), the hash is computed over three canonical components:
- kind — the step type (prompt, tool-call, tool-result, etc.)
- inputs — the step's payload dict (the prompt text, the tool name + arguments, the fetched bytes, etc.)
- parent_id — the ID of the step this one follows
```python
import hashlib

import msgspec


def step_id(kind, inputs, parent_id):
    blob = msgspec.json.encode({"k": kind.value, "i": inputs, "p": parent_id})
    return hashlib.sha256(blob).hexdigest()[:12]
```

The ID is a 12-character prefix of the SHA-256 — short enough to reference in a terminal, long enough to avoid collisions at agent-run scale. Two steps built from the same (kind, inputs, parent_id) triple produce the same ID every time.
This is the same primitive git uses for commits and Nix uses for packages. opentine applies it to agent execution.
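To see the determinism property in isolation, here is a self-contained sketch. It swaps msgspec for stdlib json with sorted keys (an assumption made for portability; the core uses msgspec.json.encode) and passes kind as a plain string rather than an enum:

```python
import hashlib
import json

def step_id(kind, inputs, parent_id):
    # Stand-in encoder: stdlib json with sorted keys yields a
    # deterministic byte stream, mirroring the core's msgspec call.
    blob = json.dumps({"k": kind, "i": inputs, "p": parent_id},
                      sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()[:12]

a = step_id("prompt", {"text": "summarize the report"}, None)
b = step_id("prompt", {"text": "summarize the report"}, None)
c = step_id("prompt", {"text": "summarize the memo"}, None)

assert a == b   # identical (kind, inputs, parent_id) -> identical ID
assert a != c   # any change to the inputs -> a new ID
```

Changing the parent_id has the same effect as changing the inputs: the hash covers the whole triple, so a step's identity pins down its entire ancestry.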
What the ARP spec formalizes beyond the current core
The Agent Run Protocol specification — the framework-agnostic standard opentine also implements — extends content-addressing in ways the reference core will pull in over time:
- Richer hash inputs: system prompt, tool schemas, model identifier + version, sampling parameters — all intended to be part of the canonical hash so two runs with different models produce different IDs even when the user text matches. The shipping core today hashes the inputs payload; deeper inputs are recorded in step metadata but not yet in the ID itself.
- Canonical serialization rules: lexicographic JSON key ordering, float normalization, NFC Unicode, trailing-zero stripping. The ARP spec documents these; opentine today relies on msgspec's default deterministic ordering, which is stable in practice across msgspec versions but is not yet a formally specified canonical form.
- Object-store layout: the spec defines a content-addressed .arp/objects/&lt;prefix&gt;/&lt;suffix&gt;/ layout for cross-run deduplication. Today opentine serializes whole runs to .tine files; the object-layer CAS is a roadmap item.
The posts in this silo describe both the shipping core and the direction the ARP spec is pulling it. Where a feature is aspirational, we flag it.
Why it matters — the three properties it unlocks
1. Replay is bit-exact when the model is deterministic
If you replay a step whose inputs include a deterministic-sampling model call (temperature 0, fixed seed, or a local model with fixed weights), the output is bit-identical to the original. No "re-run and hope it matches" — the semantics are precisely reproducible.
For debugging, regression tests, and multi-agent evaluations, this is the difference between "I think it was working yesterday" and "the step ID is identical, here's the diff between then and now."
For stochastic-sampling runs, the step ID still uniquely identifies the inputs that were requested — so a replay tells you whether the model's behavior has drifted between the original run and now. If the model is newer (e.g., Anthropic updated the underlying weights), the output will differ, and the diff is the drift signal.
2. Resume and fork reuse work without re-execution
Since step IDs are stable, a run that re-executes a subtree doesn't need to recompute ancestors that are already serialized in the .tine file. tine resume and tine fork re-materialize the parent chain by reading it from disk — no model round-trips for steps that have already happened.
A note on caching vs. resume: content-addressing is the foundation a transparent step cache can be built on, and the ARP spec describes that object-layer cache. The shipping opentine core today does not yet implement automatic cache-lookup on new runs (each tine run starts fresh); what it does implement is resume and fork from persisted step trees, which gets you most of the practical benefit for iteration workflows. Treat cross-run caching as a near-term addition, not a current feature.
Practical effect today: iterating on an agent workflow where only the later steps change costs near-zero tokens to re-run the early steps, because you resume or fork from the saved tree rather than starting a new run.
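A minimal sketch of what resume does with the persisted tree, using a plain dict as a hypothetical in-memory stand-in for a .tine file (the record shape and IDs here are illustrative, not the on-disk format):

```python
# Hypothetical stand-in for a persisted step tree:
# step_id -> record holding its parent pointer and saved output.
saved = {
    "aaa111": {"parent": None,     "output": "plan"},
    "bbb222": {"parent": "aaa111", "output": "search results"},
    "ccc333": {"parent": "bbb222", "output": "draft answer"},
}

def parent_chain(step_id, store):
    """Rematerialize ancestors by reading the store -- no model calls."""
    chain = []
    while step_id is not None:
        chain.append(step_id)
        step_id = store[step_id]["parent"]
    return list(reversed(chain))  # root first

print(parent_chain("ccc333", saved))
```

Everything resume needs is a chain walk over saved records; the expensive part of the original run (the model round-trips that produced each output) never repeats.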
3. Forking is a pointer operation
When you fork a run from step 36 — changing the next prompt, tool definition, or model — you don't need to copy the preceding 35 steps. The fork is a new run that shares the step IDs 1–35 with the parent and diverges at 36. Disk and network overhead is a handful of bytes.
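The sharing falls out of the hash construction. A sketch under the same stdlib-json stand-in as before (msgspec in the real core): two runs that agree on their early steps compute identical ancestor IDs, so nothing needs copying.

```python
import hashlib
import json

def step_id(kind, inputs, parent_id):
    blob = json.dumps({"k": kind, "i": inputs, "p": parent_id},
                      sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()[:12]

def build_chain(prompts):
    """Return the step-ID chain for a sequence of prompt steps."""
    parent, ids = None, []
    for text in prompts:
        parent = step_id("prompt", {"text": text}, parent)
        ids.append(parent)
    return ids

base   = build_chain(["plan", "search", "draft"])
forked = build_chain(["plan", "search", "rewrite"])  # diverge at step 3

assert base[:2] == forked[:2]   # shared ancestors: identical IDs, zero copies
assert base[2]  != forked[2]    # the divergence point gets a fresh ID
```

The fork only needs to record its new tail plus a pointer into the shared prefix, which is why the overhead is bytes, not megabytes.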
If you later want to compare the two runs, opentine can diff them: identical subtrees are elided; divergence points are highlighted. The fork primitive post covers the user-facing ergonomics of this.
How the hashing stays stable
Naive content-addressing breaks on serialization-order ambiguity. If today {"a": 1, "b": 2} and tomorrow {"b": 2, "a": 1} represent the same logical input, a naive hash assigns them different IDs.
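The ambiguity is easy to reproduce with stdlib json, which preserves dict insertion order by default; sorting keys before hashing restores a single canonical byte stream:

```python
import hashlib
import json

a = {"a": 1, "b": 2}
b = {"b": 2, "a": 1}   # same logical mapping, different insertion order
assert a == b          # equal as Python dicts

# Naive serialization keeps insertion order: different bytes, different hash.
assert json.dumps(a) != json.dumps(b)

# Canonical serialization: sorted keys, no incidental whitespace.
def canon(d):
    return json.dumps(d, sort_keys=True, separators=(",", ":")).encode()

assert hashlib.sha256(canon(a)).digest() == hashlib.sha256(canon(b)).digest()
```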
opentine's shipping core uses msgspec.json.encode, which produces a deterministic byte order for equivalent inputs on the same msgspec version and platform — stable enough that the same (kind, inputs, parent_id) triple reliably yields the same step ID.
The ARP specification goes further and defines a formal canonical form (lexicographic JSON key ordering, NFC Unicode normalization, normalized floats, trailing-zero stripping, fixed enum/boolean representations) so that step IDs are portable across runtimes and implementations. That canonical form is the direction the reference core is moving — today it is the spec, tomorrow it is also the enforcement layer.
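To make the Unicode rule concrete, here is an illustrative reading of the canonical form using only the stdlib (this is a sketch of the idea, not the ARP spec's exact algorithm; float and enum normalization are omitted for brevity):

```python
import json
import unicodedata

def canonicalize(value):
    """Recursively NFC-normalize all strings in a JSON-like value."""
    if isinstance(value, str):
        return unicodedata.normalize("NFC", value)
    if isinstance(value, dict):
        return {canonicalize(k): canonicalize(v) for k, v in value.items()}
    if isinstance(value, list):
        return [canonicalize(v) for v in value]
    return value

def canonical_bytes(value):
    return json.dumps(canonicalize(value), sort_keys=True,
                      separators=(",", ":"), ensure_ascii=False).encode("utf-8")

# "é" as one precomposed codepoint vs. "e" + combining accent:
# visually identical inputs hash to the same canonical bytes.
assert canonical_bytes({"name": "caf\u00e9"}) == canonical_bytes({"name": "cafe\u0301"})
```

Without NFC normalization, two runtimes that happen to produce different Unicode compositions of the same visible text would mint different step IDs, which is exactly the cross-implementation portability the spec is guarding against.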
The run tree (with DAG-shaped composition)
Each run's step graph today is a tree — every step has a single parent_id. Forks create new runs that share ancestor step IDs with the parent run but diverge from the fork point forward. When you look at the full picture across a family of forked runs, the composition is DAG-shaped: shared ancestor subtrees, divergent branches.
Runs are serialized to portable .tine files — a single archive containing every step's metadata (ID, kind, inputs, outputs) and the parent-chain edges. A .tine file is self-contained: ship it to a colleague, they can replay, inspect, or fork it locally.
Step IDs are globally consistent because the hash inputs are. Two .tine files that both contain a step whose ID is fe3a767307a4 describe the same step, and import tooling can deduplicate on that property — though the object-level cross-run deduplication store is part of the ARP-spec roadmap (the current core stores whole-run .tine files).
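A toy sketch of what dedup-on-import can look like once the object layer lands, with two runs' step tables keyed by ID (fe3a767307a4 is the ID from the text; the other IDs and record shapes are hypothetical):

```python
# Two runs that share one step. Keying storage by step ID means the
# shared step is stored once, because identical ID implies identical step.
run_a = {"fe3a767307a4": {"kind": "prompt"},
         "0b1c2d3e4f5a": {"kind": "tool-call"}}
run_b = {"fe3a767307a4": {"kind": "prompt"},
         "9f8e7d6c5b4a": {"kind": "tool-result"}}

merged = {**run_a, **run_b}   # dict union deduplicates on the ID key
assert len(merged) == 3       # four entries collapse to three unique steps
```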
What this gives you in the three consoles
opentine exposes three first-party consoles over the same content-addressed kernel:
| Console | Purpose | What content-addressing unlocks |
|---|---|---|
| opentine-tui | Terminal dashboard (Textual) | Live filtering on step IDs, keyboard-driven fork / resume / diff against any parent step |
| opentine-gui | Native desktop (Dear PyGui) | Interactive node-editor of the DAG, minimap navigation, fork from any visual node |
| opentine-web | Browser (Starlette + Mermaid.js) | Shareable DAG views via URL, REST API endpoints keyed on step ID, embedding in internal tooling |
Each console reads the same .tine runs directory (configurable via OPENTINE_RUNS_DIR / TINE_RUNS_DIR). Switching surfaces mid-workflow is a no-op — your state is in the on-disk run files, identified by the same content-derived step IDs in every console.
Comparison: how this differs from other agent frameworks
| Framework | Step identity model | Replay | Fork cost |
|---|---|---|---|
| opentine | Content-addressed hash of inputs | Bit-exact (deterministic sampling) | Pointer operation |
| LangGraph | State checkpointing with custom serializer | Replay from checkpoint, not bit-exact | Deep-copy of state |
| DSPy | Program compilation + prompt caching | Cache keyed on compiled program | Recompile to fork |
| Raw agent loops (no framework) | None | Impossible without re-running everything | Re-run from scratch |
The distinctive property is that content-addressing makes replay, caching, and forking the same mechanism — all three reduce to "look up this ID in the content-addressed store." Other frameworks treat them as separate concerns.
When content-addressing doesn't help
A realistic post about any primitive should cover its limits:
- Non-deterministic model sampling. If you run at high temperature with no seed, two executions of the same inputs produce different outputs. The step ID is stable (keyed on inputs), but the realized output in the .tine file is whatever the model returned that time. Resume/fork reuses the saved output; a fresh run from the same inputs would get a different one. Callers can rely on the saved tree for reproducibility or override it by deleting the stored output — explicit invalidation rather than automatic.
- External mutable state. If a step calls an external API whose response depends on wall-clock time or an external database's state, the outputs will differ between runs. opentine records this by including the external fetch response in the step's content-addressed inputs for downstream dependencies, but it cannot retroactively make the external world deterministic.
- Model provider drift. If Anthropic silently updates claude-opus-4-7 weights between runs, the content-derived step ID remains the same (under the current hash scope) but the output drifts. Under the ARP spec's richer hash inputs this changes — the model identifier and version become part of the ID, so drift shows up as a different step ID. Until that lands, "same ID" does not imply "same output" across time when a model identifier is coarse-grained.
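Until model identity is part of the hash, detecting provider drift is a comparison you make yourself: replay the step, then diff the fresh output against the saved one. A minimal sketch, with hypothetical output strings standing in for real model completions:

```python
import hashlib

def output_digest(text):
    """Short fingerprint of a realized output, for cheap comparison."""
    return hashlib.sha256(text.encode()).hexdigest()[:12]

# Saved output from the original run vs. a fresh completion obtained by
# replaying the same step ID today (values are illustrative).
saved_output = "The capital of France is Paris."
fresh_output = "Paris is the capital of France."

drifted = output_digest(saved_output) != output_digest(fresh_output)
assert drifted  # same inputs, same step ID, different realized output
```

A digest mismatch at an unchanged step ID is exactly the drift signal described above: the inputs you requested are provably identical, so the divergence sits on the provider's side.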
In practice most agent workflows are tolerant of all three caveats, and the 90% case where content-addressing works cleanly is where most of the value lives.
Where to go next
- opentine: Why the Fork Primitive Matters — the user-level story
- opentine vs LangGraph vs Temporal vs DSPy — runtime comparison
- The opentine product page for installation, CLI reference, and the three consoles
Or install the CLI and inspect a run yourself: tine run <your-agent.py> and then tine show <run_id> to see the content-addressed step tree that gets built.
More from opentine
opentine vs LangGraph vs Temporal vs DSPy: Choosing an Agent Runtime
LangGraph fits LangChain users who don't need forkable runs. Temporal fits durable-workflow problems where agents are incidental. DSPy fits prompt-compilation research. opentine fits when run forkability, bit-exact replay, and model-agnosticism are the primitive you're building on.
opentine: Why the Fork Primitive Matters
opentine treats every agent step as a content-addressed node in a DAG so you can fork, replay, and diff runs the way git lets you branch and rebase code.