← Back to home

Before / after — Vercel AI SDK

A real agent test on the Vercel AI SDK. Same test, same model call. The only difference is whether CI hits the live API — or replays a committed cassette: record once, replay forever.

Before Vercel AI SDK · live model in CI
import { describe, it, expect } from 'vitest';
import { openai } from '@ai-sdk/openai';
import { runCheckoutAgent } from '../src/agent';

// Option A: let the test call the live model in CI.
describe('checkout agent', () => {
  it('runs the checkout flow', async () => {
    // Needs OPENAI_API_KEY wired into CI secrets — auth hygiene on
    // every runner. Each run bills real tokens: $ per push, per retry.
    const result = await runCheckoutAgent({
      model: openai('gpt-4o'),
      prompt: 'buy a t-shirt',
    });

    // The model is nondeterministic. It picked three steps on your laptop;
    // tonight it adds a clarifying turn and this assertion flips red.
    expect(result.steps).toHaveLength(3);

    // 3am: an upstream latency spike times the request out. The build
    // goes red, nobody touched the code. "Works on my machine." Re-run.
  });
});

// Option B: hand-write a MockLanguageModelV3 for every turn instead.
// No network — but you author each chunk by hand, it doesn't replay a
// real stream, and the next `ai` SDK bump rewrites the part shapes out
// from under you. Brittle boilerplate that rots the day you stop looking.

The test calls the real model on every run: flaky on upstream latency, billed per push, and asserting against nondeterministic output. The only escape — hand-writing a MockLanguageModelV3 per turn — is brittle boilerplate that collapses on the next SDK bump.

After tapedeck · cassetteMiddleware + withCassette
import { describe, it, expect } from 'vitest';
import { openai } from '@ai-sdk/openai';
import { wrapLanguageModel } from 'ai';
import { withCassette } from '@nkwib/tapedeck/vitest';
import { cassetteMiddleware } from '@nkwib/tapedeck';

// Wrap the model once. Behaviour switches on one env var — nothing else.
const model = wrapLanguageModel({
  model: openai('gpt-4o'),
  middleware: cassetteMiddleware({
    mode: process.env.CASSETTE_MODE ?? 'live', // record | replay | live
    cassetteDir: './cassettes',
    redact: ['apiKey', 'authorization', /token/i],
  }),
});

describe('checkout agent', () => {
  it('runs the checkout flow', async () => {
    // Recorded once with CASSETTE_MODE=record against the live API.
    // withCassette pins this fixture and forces replay for its duration.
    await withCassette('checkout-flow.json', async () => {
      const result = await runCheckoutAgent({ model, prompt: 'buy a t-shirt' });

      // Deterministic, offline, free — and stream-accurate: the recorded
      // parts replay as a genuine ReadableStream, so streamText sees what
      // it would live, down to the chunk boundaries.
      expect(result.steps).toHaveLength(3);
    });
  });
});

// Change the prompt or a tool schema and the hash changes — replay
// misses and the test fails loudly:
//   CassetteMissError: no cassette for sha256:abc123… in ./cassettes
// Re-record, commit the new cassette, move on.

Wrap the model once and record a cassette once. Every run after is deterministic, offline, free, and stream-accurate — replayed as a real ReadableStream. Change a prompt or tool schema and the hash changes: replay misses and CI fails loudly with a CassetteMissError.

The wedge, in a few lines of diff

- const result = await runCheckoutAgent({
-   model: openai('gpt-4o'),
-   prompt: 'buy a t-shirt',
- });
+ const model = wrapLanguageModel({
+   model: openai('gpt-4o'),
+   middleware: cassetteMiddleware({ mode: process.env.CASSETTE_MODE ?? 'live' }),
+ });
+ await withCassette('checkout-flow.json', async () => {
+   const result = await runCheckoutAgent({ model, prompt: 'buy a t-shirt' });
+   expect(result.steps).toHaveLength(3);
+ });

Try it

Wrap your model, run the test once with CASSETTE_MODE=record against the live API, and commit the cassette. Set CASSETTE_MODE=replay in CI — your tests are now offline, deterministic, and free.