@vevee/sdk

Drop-in usage metering and limits for AI-powered apps. Track LLM tokens, image generations, video seconds, agent steps - anything you sell. Provider-agnostic, strict-mode enforcement, zero-runtime-deps SDK.

For AI agents and coding assistants: a flat-text version of these docs is available at /docs.md and the full SDK reference at /llms.txt. Both are designed for ingestion by LLMs.

30-second snippet

The minimum viable integration - install, init, track:

pnpm add @vevee/sdk

import { createClient } from '@vevee/sdk';

const vevee = createClient({ apiKey: process.env.VEVEE_KEY! });

// After your AI call succeeds:
await vevee.track('user_abc123', 'image.render', 1, {
  model: 'flux-pro',
  resolution: '1024x1024',
});

What is Vevee?

Vevee is the metering and quota layer for products that resell AI capacity. You define plans and limits in the dashboard, install the SDK in your backend, and we handle per-user counters, period rollovers, atomic reservations, and analytics. You never write another if (user.imagesUsed >= plan.imagesLimit) branch.

The problem we solve

  • Every AI app needs per-user quotas, but rolling your own means counters, period resets, race conditions, and a usage table that grows forever.
  • Stripe meters dollars, not tokens or images. PostHog tracks events but doesn't enforce limits.
  • Naive if (used < limit) checks break under concurrency - two parallel requests both pass the check and both consume.
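The race described in the last bullet can be reproduced in a few lines. The sketch below is a self-contained in-memory simulation, not Vevee code: `Counter`, `naiveConsume`, `reservedConsume`, and the simulated 10 ms "AI call" are hypothetical names chosen for illustration.

```typescript
// In-memory illustration of the check-then-increment race.
// Nothing here is part of @vevee/sdk; all names are hypothetical.

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

interface Counter {
  used: number;
  limit: number;
}

// Naive pattern: check, do the slow work (the AI call), then increment.
async function naiveConsume(counter: Counter): Promise<boolean> {
  if (counter.used >= counter.limit) return false; // check
  await sleep(10); // simulated AI call; other requests run during this gap
  counter.used += 1; // increment happens after the gap
  return true;
}

// Reservation pattern: check and increment in one uninterrupted step,
// then refund if the work fails.
async function reservedConsume(counter: Counter): Promise<boolean> {
  if (counter.used >= counter.limit) return false;
  counter.used += 1; // quota is held before any await
  try {
    await sleep(10); // simulated AI call
    return true;
  } catch (err) {
    counter.used -= 1; // refund on failure
    throw err;
  }
}

// Fires two parallel requests at a limit of 1 with each pattern.
async function demo(): Promise<{ naiveUsed: number; reservedUsed: number }> {
  const naive: Counter = { used: 0, limit: 1 };
  const reserved: Counter = { used: 0, limit: 1 };
  await Promise.all([naiveConsume(naive), naiveConsume(naive)]);
  await Promise.all([reservedConsume(reserved), reservedConsume(reserved)]);
  return { naiveUsed: naive.used, reservedUsed: reserved.used };
}
```

With a limit of 1, the naive pattern ends at 2 used and the reservation pattern ends at 1. In a real deployment the check-and-hold step has to happen in the shared store rather than in process memory, since separate server processes do not share a counter.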

What you get

  • Atomic reservations - concurrent requests cannot bypass a quota.
  • Limit groups - one event can count against multiple quotas (e.g. premium-images AND total-images).
  • Period rollovers - daily / weekly / monthly / lifetime, calendar- or subscription-anchored.
  • Provider-agnostic - works with OpenAI, Anthropic, Replicate, Fal, your own models. The SDK only sees event names and quantities.
  • Zero runtime deps - the SDK uses native fetch. Tiny bundle, dual ESM/CJS, full .d.ts.
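To make the limit-group idea concrete, here is a minimal in-memory sketch of one event counting against several quotas. It does not reflect how Vevee stores or matches groups internally: the `LimitGroup` shape, the `matches` predicate, and the all-or-nothing rule are assumptions made for illustration.

```typescript
// Hypothetical model of limit groups; not the Vevee data model.
interface LimitGroup {
  id: string;
  quota: number;
  count: number;
  // Decides whether an event counts against this group.
  matches: (event: string, meta: Record<string, string>) => boolean;
}

interface ApplyResult {
  allowed: boolean;
  reasons: string[]; // ids of the groups that would be exceeded
}

// Applies an event to every matching group, all-or-nothing:
// if any matching group lacks room, no counter is changed.
function applyEvent(
  groups: LimitGroup[],
  event: string,
  quantity: number,
  meta: Record<string, string> = {},
): ApplyResult {
  const matching = groups.filter((g) => g.matches(event, meta));
  const exceeded = matching.filter((g) => g.count + quantity > g.quota);
  if (exceeded.length > 0) {
    return { allowed: false, reasons: exceeded.map((g) => g.id) };
  }
  for (const g of matching) g.count += quantity;
  return { allowed: true, reasons: [] };
}

// Two groups: every render counts toward total-images,
// and flux-pro renders also count toward premium-images.
const groups: LimitGroup[] = [
  { id: 'total-images', quota: 500, count: 0, matches: (e) => e === 'image.render' },
  {
    id: 'premium-images',
    quota: 1,
    count: 0,
    matches: (e, m) => e === 'image.render' && m['model'] === 'flux-pro',
  },
];
```

Calling `applyEvent(groups, 'image.render', 1, { model: 'flux-pro' })` twice increments both groups the first time and is rejected the second time with `premium-images` as the reason, while a render with another model still counts against `total-images` only.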

Mental model - three primitives

The SDK exposes three primitives in increasing order of safety:

Method | What it does | Atomic? | When to use
track() | Records consumption. Increments every matching limit group. | No | After-the-fact metering when you don't need pre-flight enforcement.
canUse() | Read-only check. Does NOT increment. | No | UI gating - disable a button, show "upgrade" banners.
reserve() / commit() / release() | Atomically holds quota for 60s, then confirms or refunds. | Yes | Every paid AI call. The only safe pattern under concurrency.

Warning: the naive canUse → call → track sequence is broken under concurrency. Two parallel requests can both pass canUse, both call your AI provider, and both call track, pushing the user past the limit. Use reserve / commit for anything that costs money.
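One way to avoid repeating the reserve / commit / release boilerplate is a small wrapper. The sketch below is not part of the SDK: `withReservation`, `QuotaExceededError`, and the `ReservationClient` interface are hypothetical, and the interface only mirrors the fields used in the examples on this page (`allowed`, `reservationId`, `reasons`). The real SDK types may differ.

```typescript
// Structural stand-in for the parts of the client used on this page.
// The real SDK types may differ; treat these shapes as assumptions.
interface Reservation {
  allowed: boolean;
  reservationId?: string;
  reasons?: string[];
}

interface ReservationClient {
  reserve(
    userId: string,
    event: string,
    quantity: number,
    meta?: Record<string, unknown>,
  ): Promise<Reservation>;
  commit(reservationId: string): Promise<unknown>;
  release(reservationId: string): Promise<unknown>;
}

// Hypothetical error type thrown when the reservation is denied.
class QuotaExceededError extends Error {
  readonly reasons: string[];
  constructor(reasons: string[]) {
    super(`Quota exceeded: ${reasons.join(', ') || 'no reason given'}`);
    this.name = 'QuotaExceededError';
    this.reasons = reasons;
  }
}

// Reserves quota, runs the paid work, then commits on success or releases on failure.
async function withReservation<T>(
  client: ReservationClient,
  userId: string,
  event: string,
  quantity: number,
  work: () => Promise<T>,
): Promise<T> {
  const r = await client.reserve(userId, event, quantity);
  if (!r.allowed || !r.reservationId) {
    throw new QuotaExceededError(r.reasons ?? []);
  }
  let result: T;
  try {
    result = await work();
  } catch (err) {
    await client.release(r.reservationId); // refund the held quota
    throw err;
  }
  await client.commit(r.reservationId); // confirm only after the work succeeded
  return result;
}
```

The commit sits outside the `try` block on purpose: if the commit itself fails, the wrapper does not then try to release a reservation whose state is unknown. A call would look like `await withReservation(vevee, userId, 'image.render', 1, () => runModel(prompt))`, where `runModel` stands for whatever provider call you make.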

Use cases

1. Freemium image generator (Flux, DALL·E, SDXL)

Free users get 10 images/month. Pro users get 500. The limit applies to the monthly image count; spend in cents is tracked alongside it.

async function generateImage(userId: string, prompt: string) {
  const r = await vevee.reserve(userId, 'image.render', 1, { model: 'flux-pro' });
  if (!r.allowed) {
    throw new Error(`Out of images: ${r.reasons?.join(', ')}`);
  }
  try {
    const image = await fal.run('fal-ai/flux-pro', { prompt });
    await vevee.commit(r.reservationId!);
    return image;
  } catch (err) {
    await vevee.release(r.reservationId!);
    throw err;
  }
}

2. LLM token metering (OpenAI, Anthropic streaming)

Reserve an upper bound before the call (e.g. max_tokens), commit on success. Optionally track a refund event for the unused tokens.

async function chat(userId: string, messages: Message[]) {
  const maxTokens = 4096;
  const r = await vevee.reserve(userId, 'llm.tokens', maxTokens, {
    model: 'gpt-4o',
    direction: 'output',
  });
  if (!r.allowed) throw new Error('Token budget exceeded');

  try {
    const res = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages,
      max_tokens: maxTokens,
    });
    const used = res.usage?.completion_tokens ?? maxTokens;
    await vevee.commit(r.reservationId!);

    // Refund unused tokens (optional but accurate):
    if (used < maxTokens) {
      await vevee.track(userId, 'llm.tokens.refund', maxTokens - used, {
        model: 'gpt-4o',
      });
    }
    return res;
  } catch (err) {
    await vevee.release(r.reservationId!);
    throw err;
  }
}

3. Video generation (seconds-based metering)

Limit by total seconds rendered per month - different from per-call billing.

async function renderVideo(userId: string, durationSec: number) {
  const r = await vevee.reserve(userId, 'video.render', durationSec, {
    resolution: '1080p',
  });
  if (!r.allowed) {
    return { error: 'monthly_video_quota_reached', reasons: r.reasons };
  }
  try {
    const video = await runway.generate({ duration: durationSec });
    await vevee.commit(r.reservationId!);
    return { video };
  } catch (err) {
    await vevee.release(r.reservationId!);
    throw err;
  }
}

4. Agent step counting (LangGraph, multi-step workflows)

Cap how many tool-calls / agent steps a user can run. Use track() per step because steps are cheap and after-the-fact metering is fine.

async function runAgent(userId: string, task: string) {
  const result = await graph.invoke({ task }, {
    callbacks: [{
      onStep: async (step) => {
        await vevee.track(userId, 'agent.step', 1, { tool: step.tool });
      },
    }],
  });
  return result;
}

Common patterns

Express / Hono middleware

import { vevee } from './vevee';

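// Express-style signature. This is a fast pre-check only: it does not hold quota,
// so the handler should still reserve() before making a paid AI call.
export const requireQuota = (event: string) => async (req, res, next) => {
  const ok = await vevee.can(req.user.id, event);
  if (!ok) return res.status(429).json({ error: 'limit_reached' });
  next();
};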

app.post('/api/render', requireQuota('image.render'), handler);

Next.js route handler with reserve/commit

// app/api/render/route.ts
import { NextResponse } from 'next/server';
import { vevee } from '@/lib/vevee';
import { auth } from '@/lib/auth';

export async function POST(req: Request) {
  const session = await auth();
  if (!session) return NextResponse.json({ error: 'unauthorized' }, { status: 401 });

  const { prompt } = await req.json();
  const r = await vevee.reserve(session.user.id, 'image.render', 1);
  if (!r.allowed) {
    return NextResponse.json({ error: 'limit_reached', reasons: r.reasons }, { status: 429 });
  }

  try {
    const image = await runFluxPro(prompt);
    await vevee.commit(r.reservationId!);
    return NextResponse.json({ image });
  } catch (err) {
    await vevee.release(r.reservationId!);
    throw err;
  }
}

Stripe webhook → upsert subscription

// app/api/stripe/webhook/route.ts
import { vevee } from '@/lib/vevee';

const STRIPE_TO_PLAN: Record<string, string> = {
  prod_pro_monthly: 'plan_01HXY...PRO',
  prod_team:        'plan_01HXY...TEAM',
};

export async function POST(req: Request) {
  const event = parseStripeEvent(await req.text());

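  if (event.type === 'checkout.session.completed') {
    // Caveat: Stripe omits line_items from webhook payloads unless the session is
    // retrieved with expand: ['line_items']. This assumes parseStripeEvent (a helper
    // not shown here) returns the expanded session; items then live under .data.
    const session = event.data.object;
    const userId = session.client_reference_id;
    const productId = session.line_items.data[0].price.product;
    const planId = STRIPE_TO_PLAN[productId];
    // Skip products that are not mapped to a Vevee plan.
    if (planId) await vevee.upsertSubscription({ userId, planId });
  }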

  if (event.type === 'customer.subscription.deleted') {
    const userId = event.data.object.metadata.userId;
    await vevee.upsertSubscription({ userId, planId: 'plan_free' });
  }

  return new Response('ok');
}

Client-side usage display (with pk_live_ key)

'use client';
import { createClient } from '@vevee/sdk';

const vevee = createClient({ apiKey: process.env.NEXT_PUBLIC_VEVEE_KEY! });

export async function getMyUsage(userId: string) {
  const usage = await vevee.usage(userId);
  // usage.counters -> [{ groupId, label, unit, quota, count, remaining, costCents, filters }, ...]
  // usage.period   -> { start, end }
  // Includes every group on the user's plan (zero-filled). 'remaining' is
  // pre-clamped to never go negative; 'filters' distinguishes "overall"
  // buckets from per-source / per-variant splits.
  return usage;
}
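For the display side, a small formatter over one counter might look like the sketch below. The field names come from the `usage.counters` comment above; the field types (all numeric, `quota` always present) and the `describeCounter` helper itself are assumptions, so adjust them to the real response shape.

```typescript
// Field names come from the usage() comment above; the field types are assumptions.
interface UsageCounter {
  label: string;
  unit: string;
  quota: number;
  count: number;
  remaining: number;
}

interface CounterView {
  text: string;
  percentUsed: number; // 0-100, clamped
  exhausted: boolean;
}

// Turns one counter into display-ready values for a usage bar.
function describeCounter(c: UsageCounter): CounterView {
  // A quota of 0 is treated as fully used to avoid dividing by zero.
  const percentUsed =
    c.quota > 0 ? Math.min(100, Math.round((c.count / c.quota) * 100)) : 100;
  return {
    text: `${c.label}: ${c.count} of ${c.quota} ${c.unit} used`,
    percentUsed,
    exhausted: c.remaining <= 0,
  };
}
```

A component could then map `usage.counters` through `describeCounter` and show an upgrade prompt for any entry where `exhausted` is true.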

Decision tree - which method should I call?

Are you about to spend money on an AI provider?
├── YES → reserve() → AI call → commit() / release()
└── NO
    ├── Showing a button / quota in the UI?
    │   └── canUse() or can()
    ├── Just recording usage after the fact?
    │   └── track()
    └── Reading a user's current counters?
        └── usage()
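The same tree can be written as a function, which is handy in code review checklists or lint-style tests. This is only an encoding of the diagram above; `chooseMethod`, `CallContext`, and the purpose labels are hypothetical names, not SDK exports.

```typescript
// Direct encoding of the decision tree; names are illustrative, not SDK exports.
type VeveeMethod = 'reserve' | 'canUse' | 'track' | 'usage';

type Purpose = 'ui-gating' | 'record-after-the-fact' | 'read-counters';

interface CallContext {
  spendsMoneyOnProvider: boolean;
  purpose?: Purpose; // only consulted when no money is being spent
}

function chooseMethod(ctx: CallContext): VeveeMethod {
  // Anything that costs money goes through reserve() → commit() / release().
  if (ctx.spendsMoneyOnProvider) return 'reserve';
  switch (ctx.purpose) {
    case 'ui-gating':
      return 'canUse'; // or can()
    case 'record-after-the-fact':
      return 'track';
    case 'read-counters':
      return 'usage';
    default:
      throw new Error('purpose is required when no provider spend is involved');
  }
}
```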

API key types

Prefix | Where | Can do
sk_live_… | Backend only. Never ship to a client bundle. | Every endpoint: track, canUse, reserve/commit/release, usage, upsertSubscription.
pk_live_… | Safe in browser, mobile, public repos. | Read-only - only the caller's own usage().
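A cheap safeguard against shipping a secret key to the browser is to check the prefix before creating a client-side client. The helpers below are hypothetical, not SDK functions, and they only know the two prefixes listed in the table; any other prefix is treated as unknown and rejected.

```typescript
// Hypothetical guard based on the two prefixes in the table above.
type KeyKind = 'secret' | 'publishable' | 'unknown';

function classifyKey(apiKey: string): KeyKind {
  if (apiKey.startsWith('sk_live_')) return 'secret';
  if (apiKey.startsWith('pk_live_')) return 'publishable';
  return 'unknown';
}

// Returns the key unchanged if it is publishable; throws otherwise.
// The error message never includes the key itself.
function assertBrowserSafeKey(apiKey: string): string {
  const kind = classifyKey(apiKey);
  if (kind !== 'publishable') {
    throw new Error(`Refusing to use a ${kind} key in client-side code; expected a pk_live_ key`);
  }
  return apiKey;
}
```

In a client module this could wrap the environment lookup, for example `createClient({ apiKey: assertBrowserSafeKey(process.env.NEXT_PUBLIC_VEVEE_KEY!) })`.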

Errors at a glance

Every failure throws a VeveeError with { code, status, message }. The codes you'll handle most often:

  • limit_reached (429) - your end-user is at quota.
  • workspace_limit_reached (429) - YOUR Vevee account is at quota.
  • invalid_key (401) - bad or revoked API key.
  • requires_secret_key (403) - used a pk_live_ on a write endpoint.
  • reservation_expired (400) - committed/released after the 60s TTL.

Full list: Errors reference.
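As a sketch of how these codes might be handled at an API boundary, the function below maps a caught error to an HTTP reply. It checks the `{ code, status, message }` shape structurally instead of importing an error class, because whether the SDK exports `VeveeError` for `instanceof` checks is not stated here. The reply statuses chosen for the non-`limit_reached` cases are a design assumption, not Vevee guidance.

```typescript
// Shape taken from the sentence above: { code, status, message }.
interface VeveeErrorLike {
  code: string;
  status: number;
  message: string;
}

function isVeveeErrorLike(err: unknown): err is VeveeErrorLike {
  if (typeof err !== 'object' || err === null) return false;
  const e = err as Record<string, unknown>;
  return (
    typeof e['code'] === 'string' &&
    typeof e['status'] === 'number' &&
    typeof e['message'] === 'string'
  );
}

interface HttpReply {
  status: number;
  body: { error: string };
}

// Maps a caught error to what the end user should see.
function toHttpReply(err: unknown): HttpReply {
  if (!isVeveeErrorLike(err)) {
    return { status: 500, body: { error: 'internal_error' } };
  }
  switch (err.code) {
    case 'limit_reached':
      // The end user is out of quota: tell them so.
      return { status: 429, body: { error: 'limit_reached' } };
    case 'workspace_limit_reached':
    case 'invalid_key':
    case 'requires_secret_key':
      // Operator-side problems: alert yourself, don't leak details to the user.
      return { status: 503, body: { error: 'service_unavailable' } };
    case 'reservation_expired':
      // The paid call outlived the 60s reservation window.
      return { status: 504, body: { error: 'upstream_timeout' } };
    default:
      return { status: 502, body: { error: 'metering_error' } };
  }
}
```

Only `limit_reached` is passed through to the caller as-is; the other codes usually indicate a configuration or account problem on your side, so they are collapsed into generic replies.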

Where to next

For AI agents

If you're a coding assistant integrating this SDK, prefer these flat-text resources - they include the full surface in a single fetch:

  • /docs.md - this page in raw markdown.
  • /llms.txt - the entire SDK reference, every method, every parameter, every response shape, every error code.