Guide

Agent tests that hit the live model are flaky, paid, and nondeterministic — a temperature wobble or a provider hiccup turns a green suite red, and every CI run burns tokens. tapedeck records that call once against the real API, commits the result as a cassette, and replays it on every run after: deterministic, offline, free, and stream-accurate.

tapedeck is a Vercel AI SDK companion. It plugs in at the wrapLanguageModel middleware layer (model spec v3), so it is provider-agnostic and stream-aware by construction — no HTTP proxy, no mock to hand-write, no infra. Switch behaviour with one env var; nothing else in your code changes.

Install

npm install -D @nkwib/tapedeck
# or
pnpm add -D @nkwib/tapedeck

Requires the ai peer (>=6.0.0 <7). tapedeck has zero runtime dependencies beyond that peer — @ai-sdk/provider is a type-only dev dependency.

Quickstart

Wrap your model once and read the mode from an env var. That is the whole integration:

import { openai } from '@ai-sdk/openai';
import { generateText, wrapLanguageModel } from 'ai';
import { cassetteMiddleware } from '@nkwib/tapedeck';

const model = wrapLanguageModel({
  model: openai('gpt-4o'),
  middleware: cassetteMiddleware({
    mode: process.env.CASSETTE_MODE ?? 'live', // record | replay | live
    cassetteDir: './cassettes',
    redact: ['apiKey', 'authorization', /token/i],
  }),
});

// CASSETTE_MODE=record → hits the live API, writes a cassette.
// CASSETTE_MODE=replay → offline, deterministic, free.
const { text } = await generateText({ model, prompt: 'Say hi' });

The recommended workflow: live in development, record to capture a fixture once, replay in CI. Run the test once with CASSETTE_MODE=record, commit the resulting cassette, and set CASSETTE_MODE=replay in CI.

Modes

| Mode | Behaviour |
| --- | --- |
| record | Calls the real model, serializes request + response to a cassette, returns the live result. |
| replay | Looks up the cassette by hash, serves it. A miss throws: a changed prompt or tool schema fails the test, forcing a re-record. |
| live | Passthrough. No recording, no lookup. |

In record mode the call flows out to the provider and the response is captured on the way back. In replay mode the cassette is resolved by hash and served straight back, so the provider is never touched.

One env var, no code change
An invalid mode string fails fast with CassetteModeError the moment the middleware is constructed, so a typo'd CASSETTE_MODE never silently falls through to a live call.
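
To make the fail-fast behaviour concrete, here is a minimal standalone sketch of eager mode validation. It is not tapedeck's source: `parseMode` and `ModeError` are hypothetical names standing in for whatever the middleware does internally before it throws CassetteModeError.

```typescript
// Sketch only: `parseMode` and `ModeError` are hypothetical stand-ins,
// not tapedeck exports.
const MODES = ['record', 'replay', 'live'] as const;
type Mode = (typeof MODES)[number];

class ModeError extends Error {
  constructor(value: string) {
    super(`Invalid cassette mode "${value}"; expected one of: ${MODES.join(', ')}`);
    this.name = 'ModeError';
  }
}

// Validate eagerly so a typo fails at construction time, not on the first model call.
function parseMode(raw: string | undefined): Mode {
  if (raw === undefined) return 'live'; // the documented default
  if ((MODES as readonly string[]).includes(raw)) return raw as Mode;
  throw new ModeError(raw);
}
```

With this shape, `parseMode('repaly')` throws immediately, while an unset variable falls back to `'live'`.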

cassetteMiddleware

cassetteMiddleware(options?) returns an AI SDK LanguageModelV3Middleware that intercepts both doGenerate (one-shot) and doStream (streaming).

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| mode | 'record' \| 'replay' \| 'live' | 'live' | Operating mode. |
| cassetteDir | string | './cassettes' | Directory cassettes are read from / written to. |
| redact | (string \| RegExp)[] | [] | Extra key matchers, merged with the built-in defaults. |
| cassetteName | string | | Force a specific filename instead of hash-addressing. Named cassettes are multi-interaction (keyed by hash). Mostly used internally by withCassette. |
| store | CassetteStore | filesystem | Storage backend. Use memoryCassetteStore() on edge runtimes. |
| tracer | TapedeckTracer | | OTel-compatible tracer; emits tapedeck.generate / tapedeck.stream spans. |

tapedeck also exports lower-level primitives for direct use — hashing (computeCassetteHash, stableStringify, normalizeTools), cassette I/O (loadCassette / saveCassette, parseCassette / serializeCassette, isMultiCassette), diff/merge (diffCassettes, diffCassetteFiles, mergeCassetteDirs), storage (fileCassetteStore, memoryCassetteStore), telemetry (withSpan), and the constants CASSETTE_VERSION, MULTI_CASSETTE_VERSION, cassetteFilename(hash), REDACTED, DEFAULT_REDACT. See the API reference for all of them.

Streaming

Streaming is first-class, not a non-goal. In record mode tapedeck drains the live stream, captures the ordered stream parts, and re-serves them so your code still receives the response. In replay mode the recorded parts are replayed as a genuine ReadableStream via the SDK's own simulateReadableStream, so streamText, UI message streams, and tool-call streaming all see the same surface they would see against the live model.

import { streamText } from 'ai';

const { textStream } = await streamText({ model, prompt: 'Tell me a story' });
for await (const delta of textStream) process.stdout.write(delta);
// Identical output whether the model is live or replayed from a cassette.
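
For intuition about what replaying chunks as a stream means, the sketch below re-emits a recorded chunk array through a web ReadableStream and drains it again. It deliberately uses Node's built-in `node:stream/web` rather than the SDK's simulateReadableStream, and `replayChunks`, `drain`, and the `Chunk` type are hypothetical names for illustration, not tapedeck internals.

```typescript
import { ReadableStream } from 'node:stream/web';

// Hypothetical chunk shape, loosely modelled on the cassette example in this guide.
type Chunk = { type: string; [key: string]: unknown };

// Re-emit recorded chunks, in their recorded order, as a real ReadableStream.
function replayChunks(chunks: readonly Chunk[]): ReadableStream<Chunk> {
  return new ReadableStream<Chunk>({
    start(controller) {
      for (const chunk of chunks) controller.enqueue(chunk);
      controller.close();
    },
  });
}

// Drain a stream into an array, which is roughly what recording has to do
// before the parts can be serialized.
async function drain<T>(stream: ReadableStream<T>): Promise<T[]> {
  const out: T[] = [];
  const reader = stream.getReader();
  for (;;) {
    const result = await reader.read();
    if (result.done) return out;
    out.push(result.value);
  }
}
```

Draining `replayChunks(recorded)` yields the recorded parts in the same order, which is the property that lets consumers treat a replayed stream like a live one.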

Cassette format

Cassettes are pretty-printed JSON, keyed by a stable hash, designed to diff cleanly in PRs:

{
  "version": "tapedeck@0.1.0",
  "hash": "sha256:abc123…",
  "recordedAt": "2026-06-10T12:00:00Z",
  "request": {
    "modelProvider": "openai",
    "modelId": "gpt-4o",
    "prompt": [ ],
    "tools": [ ],
    "temperature": 0.7
  },
  "response": {
    "type": "stream",
    "chunks": [
      { "type": "text-delta", "id": "0", "delta": "I'll" },
      { "type": "text-delta", "id": "0", "delta": " help" },
      { "type": "tool-call", "toolCallId": "call_123", "toolName": "search", "input": "{\"query\":\"t-shirts\"}" }
    ]
  }
}

A one-shot generateText produces a "type": "generate" response holding the recorded content array, finish reason, and usage instead of chunks.

Named cassettes (from withCassette / cassetteName) use the v2 multi-interaction format: one file holding every call the test makes, keyed by hash — generate and stream interactions can mix freely:

{
  "version": "tapedeck@0.3.0",
  "recordedAt": "2026-06-10T12:00:00Z",
  "interactions": [
    { "hash": "sha256:abc…", "request": { }, "response": { "type": "generate" } },
    { "hash": "sha256:def…", "request": { }, "response": { "type": "stream", "chunks": [ ] } }
  ]
}

Legacy v1 single-interaction named cassettes still replay (served as-is); hash-addressed cassettes always use the single format.
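
As a rough illustration of the multi-interaction layout, the sketch below types the JSON shown above and looks an interaction up by hash. The type names, `findInteraction`, and `looksMulti` are assumptions inferred from the examples in this guide. They are not tapedeck's exported types, and `looksMulti` is only a guess at the kind of check isMultiCassette performs.

```typescript
// Hypothetical types, inferred from the JSON examples in this guide.
type CassetteResponse =
  | { type: 'generate' }
  | { type: 'stream'; chunks: unknown[] };

interface Interaction {
  hash: string;
  request: Record<string, unknown>;
  response: CassetteResponse;
}

interface MultiCassette {
  version: string;
  recordedAt: string;
  interactions: Interaction[];
}

// Look up one interaction by request hash; its position in the file does not matter.
function findInteraction(cassette: MultiCassette, hash: string): Interaction | undefined {
  return cassette.interactions.find((interaction) => interaction.hash === hash);
}

// Treat a parsed value as multi-interaction if it carries an `interactions` array.
function looksMulti(value: unknown): value is MultiCassette {
  return (
    typeof value === 'object' &&
    value !== null &&
    Array.isArray((value as { interactions?: unknown }).interactions)
  );
}
```

Because lookup is by hash rather than by position, calls can replay in any order. In this sketch an unknown hash returns `undefined`; in replay mode the guide describes that case as a CassetteMissError.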

Hash algorithm

The hash is a SHA-256 of the canonicalized, sorted JSON of:

{ modelProvider, modelId, prompt, toolSchemas, maxOutputTokens, temperature, topP }

Tool schemas are normalized (descriptions stripped, keys sorted) so cosmetic doc changes don't invalidate a cassette — but a changed prompt, tool input schema, or sampling param does. That is the point: a behavioural change fails CI loudly instead of replaying stale data.

A changed prompt fails CI on purpose
When the inputs to the hash change, replay misses with CassetteMissError and the test fails. Re-record, eyeball the cassette diff in the PR, and commit the new fixture.
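
The idea behind the hash can be sketched in a few lines. This is an illustration under assumptions, not the library's computeCassetteHash: `canonicalJson`, `stripDescriptions`, and `sketchHash` are hypothetical helpers, the exact canonical form tapedeck uses is not specified here, and the digests will not match real cassette hashes.

```typescript
import { createHash } from 'node:crypto';

// Serialize with object keys sorted at every level so key order never affects the digest.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(',')}]`;
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .filter((key) => obj[key] !== undefined) // absent and undefined hash the same
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalJson(obj[key])}`)
      .join(',');
    return `{${body}}`;
  }
  return JSON.stringify(value) ?? 'null';
}

// Drop every `description` key so doc-only edits do not change the hash.
// Simplification: this would also drop a schema property literally named "description".
function stripDescriptions(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(stripDescriptions);
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      if (key !== 'description') out[key] = stripDescriptions(inner);
    }
    return out;
  }
  return value;
}

// The fields listed above as hash inputs.
interface HashInput {
  modelProvider: string;
  modelId: string;
  prompt: unknown;
  toolSchemas?: unknown[];
  maxOutputTokens?: number;
  temperature?: number;
  topP?: number;
}

function sketchHash(input: HashInput): string {
  const normalized = { ...input, toolSchemas: stripDescriptions(input.toolSchemas ?? []) };
  return 'sha256:' + createHash('sha256').update(canonicalJson(normalized)).digest('hex');
}
```

Under this sketch, rewording a tool description or reordering keys leaves the hash unchanged, while editing the prompt or the temperature produces a different hash and therefore a replay miss.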

Secret redaction

Redaction is key-name based and runs at record time, so secrets never reach disk:

  • Default matchers: apiKey, authorization, x-api-key, bearer, token (case-insensitive).
  • Configurable via redact: (string | RegExp)[] — strings match field / header names case-insensitively; RegExps test the raw key. Your matchers are merged with the built-in defaults.
  • Replaying a cassette that still contains a value a matcher would strip throws CassetteSecretError, so a committed secret fails the build instead of leaking.

cassetteMiddleware({
  mode: 'record',
  redact: ['apiKey', 'authorization', /secret/i],
});
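
The key-name matching described in this section can be sketched as a recursive walk. This is a hedged illustration rather than tapedeck's redactor: `redactKeys`, `keyMatches`, and the `PLACEHOLDER` value are hypothetical (the library exports its own REDACTED constant), and the sketch assumes string matchers compare whole key names.

```typescript
type Matcher = string | RegExp;

// Placeholder value is an assumption; tapedeck exports its own REDACTED constant.
const PLACEHOLDER = '[REDACTED]';

// Strings compare whole key names case-insensitively; RegExps test the raw key.
function keyMatches(key: string, matchers: readonly Matcher[]): boolean {
  return matchers.some((matcher) =>
    typeof matcher === 'string'
      ? matcher.toLowerCase() === key.toLowerCase()
      : matcher.test(key),
  );
}

// Return a copy with every matching key's value replaced; the input is left untouched.
function redactKeys(value: unknown, matchers: readonly Matcher[]): unknown {
  if (Array.isArray(value)) return value.map((item) => redactKeys(item, matchers));
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = keyMatches(key, matchers) ? PLACEHOLDER : redactKeys(inner, matchers);
    }
    return out;
  }
  return value;
}
```

Matching on key names rather than values means a secret is caught wherever it is nested, as long as the field holding it has a recognisable name.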

Vitest helper

@nkwib/tapedeck/vitest exports withCassette(name, testFn, options?), which pins a test to a named cassette and forces replay mode for its duration:

import { describe, it, expect } from 'vitest';
import { withCassette } from '@nkwib/tapedeck/vitest';

describe('checkout agent', () => {
  it('runs the checkout flow', async () => {
    await withCassette('checkout-flow.json', async () => {
      const result = await runAgent({ prompt: 'buy a t-shirt' });
      expect(result.steps).toHaveLength(3);
    });
  });
});

Any cassetteMiddleware instance active inside the callback picks up the named cassette automatically — via an AsyncLocalStorage context — and tears down on exit, so there is no global setup/teardown to wire up. Pass options.mode to override the forced replay, or options.cassetteDir to point at a different directory.
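
The AsyncLocalStorage pattern behind that behaviour looks roughly like the sketch below. It only demonstrates the scoping mechanism using Node's `node:async_hooks`; `withPinnedCassette`, `activeCassette`, and the `CassetteContext` shape are hypothetical names, not the helper's real internals.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Hypothetical context shape, for illustration only.
interface CassetteContext {
  cassetteName: string;
  mode: 'record' | 'replay' | 'live';
}

const cassetteContext = new AsyncLocalStorage<CassetteContext>();

// Run `fn` with a cassette pinned. Code outside the callback never sees the context,
// and async work started inside it keeps seeing it.
function withPinnedCassette<T>(cassetteName: string, fn: () => T): T {
  return cassetteContext.run({ cassetteName, mode: 'replay' }, fn);
}

// What a middleware instance could call on each request to discover the pinned cassette.
function activeCassette(): CassetteContext | undefined {
  return cassetteContext.getStore();
}
```

Because the store is scoped to the callback's call chain rather than held in a global, concurrent tests can each pin a different cassette without interfering with one another.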

The named cassette is multi-interaction: if the agent above makes three model calls, all three are recorded into checkout-flow.json keyed by request hash, and each call replays its own response — in any order. Re-recording a test starts the file fresh, so stale interactions never linger.

@nkwib/tapedeck/vitest also exports the toFollowRoute() matcher. Pair it with toolroute to assert that the replayed trajectory makes only the transitions your router allows: register it with expect.extend({ toFollowRoute }), then call expect(result.steps).toFollowRoute(router).

CLI

The package ships a tapedeck bin for the record/replay workflow:

npx tapedeck record ./scripts/demo.mjs    # run with CASSETTE_MODE=record
npx tapedeck replay ./scripts/demo.mjs    # run with CASSETTE_MODE=replay
npx tapedeck record pnpm test             # non-file args run as commands on PATH

npx tapedeck ls ./cassettes               # kind, model, recordedAt per cassette
npx tapedeck diff a.json b.json           # semantic field-level diff (exit 1 on difference)
npx tapedeck merge ./from-ci ./cassettes  # merge directories; --force overwrites conflicts

diff pinpoints which fields diverged and ignores recordedAt; merge skips identical files and fails on conflicts unless --force is passed.
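
A field-level diff that ignores recordedAt can be approximated as follows. This is a simplified sketch of the idea, not the CLI's algorithm or the diffCassettes API: `diffPaths` is a hypothetical helper, and its path format is an assumption.

```typescript
// Collect dotted paths where two JSON values differ, skipping ignored keys at any depth.
function diffPaths(
  a: unknown,
  b: unknown,
  ignore: ReadonlySet<string> = new Set(['recordedAt']),
  path = '',
): string[] {
  if (a === b) return [];
  const bothObjects =
    typeof a === 'object' && a !== null && typeof b === 'object' && b !== null;
  // Primitives that differ, or an array compared with a plain object: report this path.
  if (!bothObjects || Array.isArray(a) !== Array.isArray(b)) return [path || '(root)'];

  const left = a as Record<string, unknown>;
  const right = b as Record<string, unknown>;
  const keys = Array.from(new Set([...Object.keys(left), ...Object.keys(right)]));
  const out: string[] = [];
  for (const key of keys) {
    if (ignore.has(key)) continue;
    out.push(...diffPaths(left[key], right[key], ignore, path ? `${path}.${key}` : key));
  }
  return out;
}
```

An empty result means the two cassettes are semantically identical. A script built on this could set a non-zero exit code when the list is non-empty, mirroring the exit-1 behaviour noted above.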

Telemetry

Pass any OTel-compatible tracer and every record/replay emits a span — typed structurally, so tapedeck keeps zero runtime dependencies:

import { trace } from '@opentelemetry/api';

cassetteMiddleware({ mode: 'replay', tracer: trace.getTracer('tapedeck') });

Spans (tapedeck.generate / tapedeck.stream) carry mode, hash, cassette path, model, hit/miss, and chunk-count attributes; a miss records the exception with an error status, so a failing CI replay shows up in traces.

Storage & edge runtimes

Cassette I/O goes through a CassetteStore (read/write/list). The default is the filesystem (loaded lazily); pass memoryCassetteStore() — or a KV/R2-backed store — on edge runtimes. The core never imports node:fs, node:path, or node:crypto statically; the one remaining Node builtin is node:async_hooks, which Cloudflare Workers provides under the nodejs_compat flag. See Compatibility for the caveats.
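
If you are weighing a custom backend, the sketch below shows the general shape of a store with read, write, and list operations backed by a Map. The method signatures are assumptions, since this guide names the three operations but not their types; `SketchStore` and `mapBackedStore` are hypothetical. Check the API reference for the real CassetteStore interface before implementing one.

```typescript
// Assumed shape: this guide only says a store exposes read / write / list.
// Methods are async here because KV- or R2-style backends usually are.
interface SketchStore {
  read(key: string): Promise<string | undefined>;
  write(key: string, contents: string): Promise<void>;
  list(): Promise<string[]>;
}

// In-memory implementation: no filesystem access, so nothing Node-specific is needed.
function mapBackedStore(): SketchStore {
  const files = new Map<string, string>();
  return {
    async read(key) {
      return files.get(key);
    },
    async write(key, contents) {
      files.set(key, contents);
    },
    async list() {
      return Array.from(files.keys()).sort();
    },
  };
}
```

A KV-backed variant would keep the same three methods and swap the Map for calls to the platform's binding.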

Errors

| Error | When |
| --- | --- |
| CassetteMissError | replay mode, no cassette matches the hash. Message includes the hash and the path searched. |
| CassetteSecretError | A replayed cassette still contains unredacted secrets. Lists the offending field paths. |
| CassetteCorruptError | Invalid JSON, unknown version, or a malformed / mismatched response shape. |
| CassetteModeError | An invalid mode string was supplied. |

All extend CassetteError, so you can catch the whole family with one instanceof CassetteError.

Roadmap

Everything deferred from the first cut — OTel spans, the CLI, diff/merge tooling, the edge-safe core, the toFollowRoute() matcher, and multi-interaction named cassettes — has shipped as of 0.3.0. Still ahead:

  • Deployed Cloudflare Workers smoke test in CI — edge support is designed-for, not yet CI-verified.
  • Interaction-level merge for multi-cassettes (merge is file-level today).

See the changelog for the full release history.