Use case

Token-based pricing: charge users for actual AI consumption

Charge AI app users by tokens, requests, or compute seconds. Pre-paid credits, post-paid invoicing, hybrid models - implementation patterns and trade-offs from someone who has shipped all three.

Last updated: 2026-05-10

The problem

Pricing AI features is awkward. Users do not all consume the same amount; a power user can cost you 10x what a casual user does. Flat-rate pricing either makes casual users subsidise power users (margin pain) or undercharges heavy enterprise accounts (revenue pain).

Token-based pricing - bill what users actually consume - solves this, but introduces new questions: do you sell pre-paid credits, post-bill via invoices, or both? What's the right token-to-dollar margin? How do you communicate the price to users?

And underneath all of that: you need a real-time per-user token counter that holds under concurrency, a way to refund unused reservations, and a way to surface "you have X tokens left" in the UI.

The solution

Configure a limit group with the unit "tokens" or "cents". For each call, reserve an upper bound, call the AI provider, commit the actual usage, and refund the unused difference. Surface the user's remaining balance via vevee.usage(userId).
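The reserve/commit/refund cycle is easiest to see with a local counter. The sketch below is a hypothetical in-memory `TokenLedger`, not the Vevee SDK (Vevee keeps this counter server-side); it only illustrates the order of operations and why the reservation has to be an upper bound.

```typescript
// Hypothetical in-memory ledger illustrating reserve -> commit/refund.
// Not the Vevee SDK: Vevee keeps this counter server-side for you.
interface Reservation {
  id: number;
  userId: string;
  amount: number; // upper bound reserved, e.g. prompt tokens + max_tokens
}

class TokenLedger {
  private quotas: Map<string, number>;
  private used = new Map<string, number>();
  private open = new Map<number, Reservation>();
  private nextId = 1;

  constructor(quotas: Map<string, number>) {
    this.quotas = quotas;
  }

  remaining(userId: string): number {
    return (this.quotas.get(userId) ?? 0) - (this.used.get(userId) ?? 0);
  }

  // Check and increment in one synchronous step: within a single Node.js
  // process no other request can run in between, so two callers cannot
  // both claim the user's last tokens.
  reserve(userId: string, upperBound: number): Reservation | null {
    if (upperBound <= 0 || this.remaining(userId) < upperBound) return null;
    this.used.set(userId, (this.used.get(userId) ?? 0) + upperBound);
    const r: Reservation = { id: this.nextId++, userId, amount: upperBound };
    this.open.set(r.id, r);
    return r;
  }

  // Replace the reserved upper bound with what was actually consumed.
  // Returns the number of tokens refunded.
  commit(reservationId: number, actualUsed: number): number {
    const r = this.open.get(reservationId);
    if (!r) throw new Error(`unknown or already settled reservation ${reservationId}`);
    this.open.delete(reservationId);
    this.used.set(r.userId, (this.used.get(r.userId) ?? 0) - r.amount + actualUsed);
    return Math.max(0, r.amount - actualUsed);
  }
}
```

With a 10,000-token quota, reserving 4,000 and committing 800 refunds 3,200 and leaves 9,200 remaining. The single-process guarantee is the catch: once you run several server instances, the check-and-increment has to happen in one shared, atomic store, which is the job a hosted counter does for you.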

For pre-paid credits: when the user buys credits, raise their custom limit by the purchased amount. For post-paid: don't enforce a hard cap; just track usage and reconcile at month-end via your billing system.

For hybrid: let users buy credit packs that supplement a baseline plan. Vevee handles the math; you handle the checkout.
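One way the hybrid math can work, as a hypothetical sketch (Vevee's own accounting may differ): spend the plan's monthly allowance first, then fall back to purchased credit packs, which carry over between periods. `HybridBalance`, `spend`, `addPack`, and `resetPeriod` are illustrative names.

```typescript
// Hypothetical balance model for a hybrid plan: a monthly allowance that
// resets each period, plus purchased credit packs that carry over.
interface HybridBalance {
  planRemaining: number;    // tokens left in this period's allowance
  creditsRemaining: number; // tokens left from purchased packs
}

// Spend the plan allowance first, then credit packs, so purchased credits
// are only touched once the included allowance is exhausted.
// Returns null when the request would exceed both balances (blocked).
function spend(b: HybridBalance, tokens: number): HybridBalance | null {
  if (tokens > b.planRemaining + b.creditsRemaining) return null;
  const fromPlan = Math.min(tokens, b.planRemaining);
  return {
    planRemaining: b.planRemaining - fromPlan,
    creditsRemaining: b.creditsRemaining - (tokens - fromPlan),
  };
}

// Buying a pack only adds to the carry-over balance.
function addPack(b: HybridBalance, packTokens: number): HybridBalance {
  return { ...b, creditsRemaining: b.creditsRemaining + packTokens };
}

// At the start of a new period the allowance refills; credits are untouched.
function resetPeriod(b: HybridBalance, allowance: number): HybridBalance {
  return { ...b, planRemaining: allowance };
}
```

For example, a user with a 50,000-token allowance who buys a 100,000-token pack and then spends 60,000 ends the period with 0 allowance and 90,000 credits left.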

Example

Pre-paid: when a user buys 100k tokens, you raise their custom limit by 100k and every call deducts from it. The real-time balance is available in the SDK.

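```typescript
import { createClient } from "@vevee/sdk";

// Server-side only: this client uses your secret key.
const vevee = createClient({ apiKey: process.env.VEVEE_KEY! });

// On credit purchase: raise the user's token quota by the purchased amount.
export async function onCreditsPurchased(userId: string, addTokens: number) {
  // Read the current quota
  const usage = await vevee.usage(userId);
  const tokensGroup = usage.counters.find(c => c.label === "Tokens");
  if (!tokensGroup) throw new Error(`No "Tokens" counter for user ${userId}`);
  const newQuota = tokensGroup.quota + addTokens;

  // This is a read-modify-write: process purchases for the same user one at
  // a time (e.g. in your payment webhook handler) so two concurrent purchases
  // cannot both read the same old quota and drop one top-up.
  await vevee.upsertSubscription({
    userId,
    planId: "plan_paygo",
    customLimits: { tokens: { quota: newQuota } },
  });
}
```

In the UI (browser), create a separate client with your publishable key; never ship VEVEE_KEY to the browser:

```typescript
import { createClient } from "@vevee/sdk";

declare const PK_LIVE_KEY: string; // publishable key, injected at build time

const vevee = createClient({ apiKey: PK_LIVE_KEY });

export async function getRemainingTokens(userId: string) {
  const usage = await vevee.usage(userId);
  const tokens = usage.counters.find(c => c.label === "Tokens");
  return tokens?.remaining ?? 0;
}
```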

Token markup: where margin comes from

You almost certainly want to mark up your token cost. If your model costs you, say, $5/1M input tokens, charge users $7-$10/1M to cover infrastructure, overhead, and your gross margin. Vevee does not enforce a markup - you set the conversion in your own code or the dashboard.
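The conversion itself is only a few lines. In this hypothetical sketch, rates are your provider cost in USD per 1M tokens, split into input and output because providers usually price them differently, and `markup` is a multiplier, so 1.6 turns a $5/1M cost into an $8/1M price. The $15/1M output rate used in the example is illustrative, not any provider's price list.

```typescript
// Your provider cost in USD per 1M tokens (illustrative values only).
interface ModelRates {
  inputUsdPerMillion: number;
  outputUsdPerMillion: number;
}

// What to charge the user for one call, in fractional cents.
// markup is a multiplier: 1.6 turns a $5/1M cost into $8/1M.
function chargeCents(
  rates: ModelRates,
  inputTokens: number,
  outputTokens: number,
  markup: number,
): number {
  const costUsd =
    (inputTokens * rates.inputUsdPerMillion +
      outputTokens * rates.outputUsdPerMillion) /
    1_000_000;
  return costUsd * markup * 100;
}

// Round once per billing period rather than per call: rounding every small
// call up to a whole cent would noticeably inflate the bill.
function periodTotalCents(callCharges: number[]): number {
  return Math.round(callCharges.reduce((sum, c) => sum + c, 0));
}
```

With input at $5/1M and a 1.6 markup, 1M input tokens come to 800 cents ($8), while a 1,000-token call is worth 0.8 cents. That small per-call figure is why the sketch keeps fractions until the period total.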

Pre-paid vs post-paid trade-offs

Pre-paid (buy credits, deplete) means users do not get surprise bills, but they do get blocked when they run out. Post-paid (use, get billed at month-end) is more enterprise-friendly but means you carry the credit risk. Most consumer AI apps go pre-paid; most B2B apps go post-paid.

Refunds for unused reservations

When you reserve 4k tokens for a streaming response and the model only generates 800, you should refund the 3.2k difference. Use a refund event after commit; it decrements the counter without affecting the audit trail.
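The settlement arithmetic is worth pinning down, including the two edge cases: the call fails outright (settle with zero usage so the whole reservation comes back) and the response overshoots the reservation (for example, when a pre-call estimate was low). A hypothetical sketch; `settle` and its fields are illustrative names:

```typescript
// Hypothetical settlement step run after a streamed response finishes.
// reserved: what was held up front; actualUsed: what the provider reports.
interface Settlement {
  commit: number;  // tokens to record as consumed
  refund: number;  // unused reservation to give back
  overage: number; // extra consumed beyond the reservation
}

function settle(reserved: number, actualUsed: number): Settlement {
  if (actualUsed <= reserved) {
    return { commit: actualUsed, refund: reserved - actualUsed, overage: 0 };
  }
  // Capping max_tokens at the reservation normally prevents this, but if it
  // happens, charge the extra rather than silently dropping it.
  return { commit: actualUsed, refund: 0, overage: actualUsed - reserved };
}
```

`settle(4000, 800)` commits 800 and refunds 3,200, matching the example above; `settle(4000, 0)` after a failed call refunds all 4,000.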

Showing the price upfront

Users hate variable bills. The best products show "this prompt will cost approximately X tokens / $Y" before the call runs. Vevee does not estimate token counts - but you can run OpenAI's tiktoken locally to get the prompt token count and add a fixed buffer for the completion.
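Once you have the prompt token count, the preview is simple arithmetic. In this hypothetical sketch, `promptTokens` is assumed to come from a local tokenizer such as a tiktoken port, `completionBuffer` is the fixed allowance (using your `max_tokens` makes the estimate an upper bound), and the rates are your marked-up user prices, not provider costs.

```typescript
// Hypothetical pre-call estimate shown to the user before the call runs.
interface PromptEstimate {
  tokens: number;
  usd: number;
}

function estimatePrompt(
  promptTokens: number,        // from a local tokenizer
  completionBuffer: number,    // fixed allowance, e.g. your max_tokens
  inputUsdPerMillion: number,  // your user-facing input price
  outputUsdPerMillion: number, // your user-facing output price
): PromptEstimate {
  const usd =
    (promptTokens * inputUsdPerMillion + completionBuffer * outputUsdPerMillion) /
    1_000_000;
  return { tokens: promptTokens + completionBuffer, usd };
}

// Four decimals, because a single prompt usually costs a fraction of a cent.
function formatEstimate(e: PromptEstimate): string {
  return `~${e.tokens.toLocaleString("en-US")} tokens / $${e.usd.toFixed(4)}`;
}
```

For a 1,000-token prompt with a 500-token buffer at $8/1M input and $24/1M output, `formatEstimate(estimatePrompt(1000, 500, 8, 24))` produces `~1,500 tokens / $0.0200`.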

Frequently asked questions

Should I price per-token or per-request?

Per-token is more honest but harder to communicate. Per-request is simpler but penalizes you when users send giant prompts. Many AI apps use a hybrid: per-request pricing for short prompts, with a token-based overage above 8k tokens.
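A sketch of that hybrid, assuming a flat fee per request that covers the first 8k tokens and an overage billed per started block of 1k tokens. The fee values and the per-1k rounding are hypothetical pricing choices, not a recommendation.

```typescript
// Tokens covered by the flat per-request fee.
const INCLUDED_TOKENS = 8_000;

// Hypothetical hybrid price: flat fee, plus overage for every started
// block of 1k tokens above the included amount.
function requestPriceCents(
  totalTokens: number,
  flatFeeCents: number,
  overageCentsPer1k: number,
): number {
  const overTokens = Math.max(0, totalTokens - INCLUDED_TOKENS);
  return flatFeeCents + Math.ceil(overTokens / 1_000) * overageCentsPer1k;
}
```

With a 2-cent fee and 1 cent per 1k overage, any request up to 8,000 tokens costs 2 cents and a 20,000-token request costs 14 cents. Billing per started block keeps the number on the invoice easy to explain, at the price of rounding up partial blocks.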

How do I handle Stripe metered billing on top of this?

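At period close, read the user's consumed tokens (or cents) from Vevee and report them to Stripe as a meter event on the user's billing meter (on Stripe's legacy metered billing, as a usage record on their subscription item). Stripe invoices; Vevee tracks.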
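A minimal sketch of the period-close push, assuming Stripe's Billing Meters API as exposed by stripe-node (`stripe.billing.meterEvents.create`); check the parameter shape against your stripe-node version. The call signature is declared locally so the sketch has no dependencies, and the event name `ai_tokens` and period id format are hypothetical.

```typescript
// Minimal local shape of the meter-event create params (Billing Meters API).
// In production, pass stripe.billing.meterEvents as `api`.
interface MeterEventParams {
  event_name: string; // must match the event name configured on your meter
  payload: { stripe_customer_id: string; value: string };
  identifier?: string; // used by Stripe to deduplicate repeated submissions
}

interface MeterEventsApi {
  create(params: MeterEventParams): Promise<unknown>;
}

// Build one event carrying a user's total for the period. The identifier is
// derived from customer + period, so a retried job resends the same
// identifier instead of a second, double-counted event.
function periodUsageEvent(
  customerId: string,
  periodId: string,        // hypothetical, e.g. "2026-05"
  tokensUsed: number,      // the period total read from Vevee
  eventName = "ai_tokens", // hypothetical meter event name
): MeterEventParams {
  return {
    event_name: eventName,
    identifier: `${customerId}:${periodId}`,
    payload: { stripe_customer_id: customerId, value: String(Math.round(tokensUsed)) },
  };
}

async function reportPeriodUsage(
  api: MeterEventsApi,
  customerId: string,
  periodId: string,
  tokensUsed: number,
): Promise<void> {
  await api.create(periodUsageEvent(customerId, periodId, tokensUsed));
}
```

Stripe only deduplicates on `identifier` within a limited window, so the job should still record which periods it has already reported.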

What if my model costs change?

Update the conversion rate in your tracking code. Existing counters retain their historical values; new events use the new rate. Vevee does not lock you to a price.
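One way to keep historical values stable when rates change is a rate history with effective dates, so each event is priced at the rate in force when it happened. A hypothetical sketch; `RateVersion` and `rateAt` are illustrative names, and dates are ISO `YYYY-MM-DD` strings, which compare correctly as plain strings.

```typescript
// Hypothetical rate history. Keep entries sorted by effectiveFrom ascending;
// each rate applies until the next entry starts.
interface RateVersion {
  effectiveFrom: string;    // ISO date, e.g. "2026-04-01"
  centsPer1kTokens: number;
}

// Find the rate in effect on the day an event happened.
function rateAt(history: RateVersion[], eventDate: string): number {
  let rate: number | undefined;
  for (const v of history) {
    if (v.effectiveFrom <= eventDate) rate = v.centsPer1kTokens;
    else break; // sorted, so no later entry can apply
  }
  if (rate === undefined) throw new Error(`no rate in effect on ${eventDate}`);
  return rate;
}
```

With entries for 2026-01-01 (1.0 cents per 1k) and 2026-04-01 (0.8 cents per 1k), an event on 2026-03-31 still prices at 1.0 while one on 2026-04-01 prices at 0.8.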

Can I show users a "what would this prompt cost?" preview?

Yes - use a tokenizer to estimate (tiktoken locally for OpenAI; for Claude, Anthropic's token-counting API, since its older local tokenizer package does not match current Claude models), then call vevee.canUse(userId, "llm.tokens", estimate) to check if they have enough budget without consuming any.
