AIPricingLab · Guide · 8 min read

How to charge AI app users by token usage (with refunds and live balance)

Step-by-step: charge users for actual token consumption with pre-paid credits, post-paid invoicing, and a live balance display. Atomic reservations, accurate refunds, and Stripe meter sync.

Last updated: 2026-05-10

Charging by tokens means you have to count tokens correctly, hold the right amount of quota while a (possibly streaming) call is in flight, and refund whatever was not used. Get any of these wrong and you either overcharge users (churn) or undercharge them (lost margin).

Here is an implementation pattern that handles all three.

Step-by-step

1. Pick a unit: tokens, cents, or both

For pure token billing, use the unit "tokens". For dollar-cost billing (where different models cost different amounts), use the unit "cents" and convert tokens to cents at track time. Both work; cents is more flexible if you support multiple models with different costs.

2. Define a limit group with the right unit

In the dashboard, create a limit group on the user's plan. Unit: tokens or cents. Quota: how much they get this period. Period: usually monthly. Anchor: subscription_start (cleaner) or calendar (simpler).
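The two anchors differ in how the monthly window is computed. Below is a minimal sketch, assuming UTC dates and assuming that a subscription_start anchor on the 29th to 31st falls back to the last day of shorter months; AIPricingLab's exact rollover rule may differ. The names Anchor, periodWindow, and anchoredDate are hypothetical.

```typescript
type Anchor = "calendar" | "subscription_start";

interface PeriodWindow {
  start: Date; // inclusive
  end: Date; // exclusive: the moment the quota resets
}

// Number of days in a UTC month (month is 0-based, as in Date.getUTCMonth).
function daysInMonth(year: number, month: number): number {
  return new Date(Date.UTC(year, month + 1, 0)).getUTCDate();
}

// Place the anchor's day and time of day in (year, month), clamped to the
// month's length (an anchor on the 31st lands on Feb 28/29). An out-of-range
// month is normalised first, e.g. -1 means December of the previous year.
function anchoredDate(year: number, month: number, anchor: Date): Date {
  const y = year + Math.floor(month / 12);
  const m = ((month % 12) + 12) % 12;
  const day = Math.min(anchor.getUTCDate(), daysInMonth(y, m));
  return new Date(
    Date.UTC(y, m, day, anchor.getUTCHours(), anchor.getUTCMinutes(), anchor.getUTCSeconds()),
  );
}

// Monthly quota window that contains `now` (UTC only, sketch).
function periodWindow(anchor: Anchor, now: Date, subscriptionStart: Date): PeriodWindow {
  const y = now.getUTCFullYear();
  let m = now.getUTCMonth();
  if (anchor === "calendar") {
    return { start: new Date(Date.UTC(y, m, 1)), end: new Date(Date.UTC(y, m + 1, 1)) };
  }
  // subscription_start: this month's anchor date, or last month's if it is still ahead.
  let start = anchoredDate(y, m, subscriptionStart);
  if (start.getTime() > now.getTime()) {
    m -= 1;
    start = anchoredDate(y, m, subscriptionStart);
  }
  return { start, end: anchoredDate(y, m + 1, subscriptionStart) };
}
```

A calendar window needs only the current date, while subscription_start also needs the start date and a month-end rule, which is the trade-off behind "cleaner" versus "simpler".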

3. Reserve an upper bound before the call

You do not know the exact token count until the response arrives, so reserve a safe upper bound first. For chat, that is the prompt token count plus max_tokens.

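import { encoding_for_model } from "tiktoken";

// Estimate prompt tokens. Serializing to JSON adds a few tokens of punctuation
// per message, which generally errs on the high side for an upper bound.
const enc = encoding_for_model("gpt-4o");
const promptTokens = enc.encode(JSON.stringify(messages)).length;
enc.free(); // tiktoken's WASM encoder must be freed explicitly

const maxOutTokens = 1500; // also passed as max_tokens in step 4
const upperBound = promptTokens + maxOutTokens;

// Hold the upper bound against the user's quota before calling the model.
const r = await vevee.reserve(userId, "llm.tokens", upperBound, { model: "gpt-4o" });
if (!r.allowed) throw new LimitError(); // LimitError: your own error type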

4. Call the AI provider, then commit and refund

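Once the response arrives, you know the actual token count. Commit the reservation, then refund the difference between the upper bound and the actual count. If the provider call fails, release the reservation instead so nothing is charged.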

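// Call the provider. If the call fails, release the hold so the user pays nothing.
const res = await openai.chat.completions
  .create({ model: "gpt-4o", messages, max_tokens: maxOutTokens })
  .catch(async (err) => {
    await vevee.release(r.reservationId!);
    throw err;
  });

// Actual usage. If the provider omits usage, bill the full reservation
// rather than refunding all of it, which would make the call free.
const actual = res.usage
  ? res.usage.prompt_tokens + res.usage.completion_tokens
  : upperBound;

// Commit outside the error handler, so a failed commit or refund never
// triggers release() on an already-committed reservation.
await vevee.commit(r.reservationId!);

if (actual < upperBound) {
  await vevee.track(userId, "llm.tokens.refund", upperBound - actual, {
    reservationId: r.reservationId!,
  });
}
return res;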
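The bookkeeping behind steps 3 and 4 fits in a few lines. Below is a minimal in-memory model of one user's counter with reserve, commit, release, and refund. It is a hypothetical illustration, not the vevee client or AIPricingLab's implementation: a single-threaded object gets atomic check-and-hold for free, while a real service needs the same guarantee across processes, for example from a database transaction or an atomic update.

```typescript
// Hypothetical in-memory model of one user's quota counter; not the vevee SDK.
class QuotaLedger {
  private readonly quota: number;
  private used = 0; // committed usage this period
  private readonly held = new Map<string, number>(); // open reservations: id -> amount
  private nextId = 1;

  constructor(quota: number) {
    this.quota = quota;
  }

  // Quota minus committed usage minus everything currently on hold.
  remaining(): number {
    let heldTotal = 0;
    this.held.forEach((amount) => {
      heldTotal += amount;
    });
    return this.quota - this.used - heldTotal;
  }

  // Check and hold in one synchronous step, so two requests cannot both pass
  // the check against the same remaining balance.
  reserve(amount: number): { allowed: boolean; reservationId?: string } {
    if (amount > this.remaining()) return { allowed: false };
    const reservationId = `res_${this.nextId++}`;
    this.held.set(reservationId, amount);
    return { allowed: true, reservationId };
  }

  // Turn the whole hold (the upper bound) into usage.
  commit(reservationId: string): void {
    this.used += this.take(reservationId);
  }

  // Drop the hold without charging anything (e.g. the provider call failed).
  release(reservationId: string): void {
    this.take(reservationId);
  }

  // Give back the unused part of a committed reservation.
  refund(amount: number): void {
    this.used = Math.max(0, this.used - amount);
  }

  // Remove a hold; settling the same reservation twice is an error.
  private take(reservationId: string): number {
    const amount = this.held.get(reservationId);
    if (amount === undefined) throw new Error(`unknown or already settled: ${reservationId}`);
    this.held.delete(reservationId);
    return amount;
  }
}
```

For example, with a 10,000-token quota, reserving 2,000 leaves 8,000, so a concurrent request for 9,000 is denied. Committing and then refunding 1,200 (an actual count of 800) leaves 9,200.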

5. Show the live balance to the user

In the browser, use a pk_live_ public key, never a secret key. vevee.usage(userId) returns the user's counters, each with its remaining quota.

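import { useEffect, useState } from "react";

// Inside a React client component; `vevee` is a client created with the pk_live_ key
const [remaining, setRemaining] = useState<number | null>(null);
useEffect(() => {
  vevee.usage(userId).then((u) => setRemaining(u.counters.find((c) => c.label === "Tokens")?.remaining ?? null));
}, [userId]);
return <div>{remaining === null ? "Loading..." : `${remaining.toLocaleString()} tokens left this month`}</div>;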

6. For pre-paid credits: bump the quota on purchase

When the user buys 100k more tokens, raise their custom limit by that amount. AIPricingLab counters keep ticking: usage already recorded this period is not reset, only the quota grows.

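// currentQuota: the user's quota before this purchase; addedTokens: the pack size.
// Apply purchases for one user one at a time: two concurrent updates that read the
// same currentQuota would otherwise overwrite each other and lose a pack.
await vevee.upsertSubscription({
  userId,
  planId: "plan_paygo",
  customLimits: {
    tokens: { quota: currentQuota + addedTokens },
  },
});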

Streaming chat: same pattern

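For streaming responses, reserve up front, stream the result, read the token counts at the end, then commit and refund. Most providers report usage at the end of the stream; with OpenAI's Chat Completions API you have to set stream_options: { include_usage: true } to get that final usage chunk. The reservation holds quota for the entire stream, so a parallel request cannot race past the limit.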
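The end-of-stream accounting can be kept separate from the transport. The sketch below assumes chunks shaped like OpenAI's streamed chat completion chunks: a choices array whose delta may carry content, and a usage field that is null on every chunk except the last, whose choices array is empty, when include_usage is on. StreamChunk, StreamTally, onChunk, and tokensToBill are hypothetical names. If the stream ends without usage, the sketch bills the full upper bound, so a stream that breaks off early does not refund the whole reservation.

```typescript
// Minimal subset of an OpenAI-style streamed chunk (assumed shape).
interface StreamChunk {
  choices: { delta: { content?: string | null } }[];
  usage?: { prompt_tokens: number; completion_tokens: number } | null;
}

interface StreamTally {
  text: string;
  usage: { prompt_tokens: number; completion_tokens: number } | null;
}

// Fold one chunk into the running tally: append any text, keep the last usage seen.
function onChunk(tally: StreamTally, chunk: StreamChunk): StreamTally {
  const piece = chunk.choices.map((c) => c.delta.content ?? "").join("");
  return {
    text: tally.text + piece,
    usage: chunk.usage ?? tally.usage,
  };
}

// Tokens to bill once the stream is done: reported usage if it arrived,
// otherwise the full reservation (so nothing is refunded).
function tokensToBill(tally: StreamTally, upperBound: number): number {
  if (!tally.usage) return upperBound;
  return tally.usage.prompt_tokens + tally.usage.completion_tokens;
}
```

In a handler this would run as `for await (const chunk of stream) tally = onChunk(tally, chunk);` starting from `{ text: "", usage: null }`, followed by the same commit and refund of `upperBound - tokensToBill(tally, upperBound)` as in step 4.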

Multiple model tiers

GPT-4o-mini at $0.15/M input tokens is much cheaper than GPT-4o at $2.50/M. If you charge users per token, the per-token price should depend on the model. The easiest pattern: track in cents (not tokens) and compute each call's cost in cents at track time from a model price table.
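A minimal sketch of such a price table, in cents per million tokens with separate input and output rates. The input prices match the figures above; the output prices ($10.00/M for GPT-4o, $0.60/M for GPT-4o-mini) are OpenAI's list prices for these models and may have changed, so treat the table as illustrative and check the provider's current pricing. PRICES_CENTS_PER_M and costCents are hypothetical names.

```typescript
// Illustrative prices in cents per 1M tokens (verify against current provider pricing).
const PRICES_CENTS_PER_M: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 250, output: 1000 }, // $2.50 / $10.00 per 1M tokens
  "gpt-4o-mini": { input: 15, output: 60 }, // $0.15 / $0.60 per 1M tokens
};

// Cost of one call in cents; usually a fraction of a cent.
function costCents(model: string, promptTokens: number, completionTokens: number): number {
  const price = PRICES_CENTS_PER_M[model];
  if (!price) throw new Error(`no price configured for model ${model}`);
  return (promptTokens * price.input + completionTokens * price.output) / 1_000_000;
}
```

A GPT-4o call with 1,000 prompt and 500 completion tokens costs 0.75 cents here, and the same call on GPT-4o-mini about 0.045 cents. Because single calls rarely reach a whole cent, rounding each one to an integer adds up quickly; if your counter only accepts whole numbers, a finer unit such as hundredths of a cent keeps the error small.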

Stripe meter sync

For post-paid billing, push the user's monthly token total to a Stripe meter at period close. AIPricingLab tracks; Stripe bills. The /guides/usage-based-pricing-ai guide covers this in detail.
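As a rough sketch of the sync step, the function below only builds the request body for a Stripe billing meter event; sending it would go through stripe.billing.meterEvents.create in the stripe Node library. The event name "llm_tokens", the identifier format, and the helper name are hypothetical, and the payload keys assume a meter left on Stripe's defaults (stripe_customer_id and value). The timestamp is placed one second before the period's end, on the assumption that Stripe assigns meter events to a billing period by their timestamp.

```typescript
// Body for a Stripe billing meter event (shape of meterEvents.create params).
interface MeterEventParams {
  event_name: string;
  identifier: string; // lets Stripe deduplicate retries of the same push
  timestamp: number; // Unix seconds
  payload: { stripe_customer_id: string; value: string };
}

// Build the end-of-period push for one user. Hypothetical helper.
function periodCloseMeterEvent(
  stripeCustomerId: string,
  userId: string,
  periodStart: Date,
  periodEnd: Date, // exclusive
  totalTokens: number,
): MeterEventParams {
  if (!Number.isInteger(totalTokens) || totalTokens < 0) {
    throw new Error("totalTokens must be a non-negative integer");
  }
  return {
    event_name: "llm_tokens",
    identifier: `${userId}:${periodStart.toISOString()}`, // one push per user per period
    // One second before the exclusive end, so the event falls inside the period.
    timestamp: Math.floor(periodEnd.getTime() / 1000) - 1,
    payload: { stripe_customer_id: stripeCustomerId, value: String(totalTokens) },
  };
}
```

Stripe rejects meter events timestamped too far in the past, so run the close-out job promptly after each period ends.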

Refund accuracy matters

Users notice when their balance does not match what they actually consumed. Always issue refund events for unused reservations. The worst feedback you can get is "I sent one short prompt and you charged me 4000 tokens." Auto-refunding the unused portion keeps balances in line with what users expect.

Frequently asked questions

Should I count input + output tokens or just output?

Count both. Input tokens are usually cheaper than output tokens, but they still cost you money. Most providers (OpenAI, Anthropic) bill you for both, so you should bill users for both.

How do I handle different prompt-token vs completion-token prices?

Track in cents (not tokens) and compute cost at track time. Or stack two limit groups - "input tokens" and "output tokens" - each with its own conversion. Either works; cents is simpler.

What if my token estimate (upper bound) is wrong?

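If actual < upper bound, refund the difference. If actual > upper bound, you can track an additional event for the overage. max_tokens caps the completion, so this only happens when your prompt estimate comes in under the provider's own count, for example because of tool definitions, images, or message formatting your estimate did not include.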
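Both cases come down to one comparison after the call. A minimal sketch with a hypothetical settle helper that returns how much to refund or to charge as overage; the refund maps to the llm.tokens.refund event used in step 4, while the name of an overage event is up to you.

```typescript
interface Settlement {
  refund: number; // unused part of the reservation to give back
  overage: number; // usage beyond the reservation to charge separately
}

// Compare the reserved upper bound with actual usage (both in the same unit).
function settle(upperBound: number, actual: number): Settlement {
  if (actual <= upperBound) return { refund: upperBound - actual, overage: 0 };
  return { refund: 0, overage: actual - upperBound };
}
```

For example, a 20-token prompt with max_tokens of 4000 reserves 4,020; if the reply is 150 tokens, settle(4020, 170) returns a refund of 3,850 and no overage.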

How do I let users buy credit packs?

Create a one-time Stripe payment (a PaymentIntent, or a Checkout Session in payment mode) for the pack ($10 = 200k tokens). When checkout succeeds, raise their custom limit by 200k. Done.
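Stripe may deliver the same webhook event more than once, so the credit should be applied idempotently. A minimal sketch keyed on the payment ID: the in-memory Set and Map stand in for database tables (a unique key on the payment ID, updated in the same transaction as the quota), and PACK_TOKENS, applyCreditPack, and quotaByUser are hypothetical names.

```typescript
const PACK_TOKENS = 200_000; // the $10 pack from the answer above

// Stand-ins for durable storage; use a table with a unique key in production.
const appliedPayments = new Set<string>();
const quotaByUser = new Map<string, number>();

// Apply a purchased pack exactly once per payment, even if the webhook repeats.
function applyCreditPack(userId: string, paymentId: string): number {
  const current = quotaByUser.get(userId) ?? 0;
  if (appliedPayments.has(paymentId)) return current; // duplicate delivery: no-op
  appliedPayments.add(paymentId);
  const next = current + PACK_TOKENS;
  quotaByUser.set(userId, next);
  // Here you would call vevee.upsertSubscription with customLimits.tokens.quota = next.
  return next;
}
```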
