Token-based pricing: charge users for actual AI consumption
Charge AI app users by tokens, requests, or compute seconds. Pre-paid credits, post-paid invoicing, hybrid models - implementation patterns and trade-offs from someone who has shipped all three.
Last updated: 2026-05-10
The problem
Pricing AI features is awkward. Users do not all consume the same amount; a power user can cost you 10x what a casual user does. Flat-rate pricing either makes casual users subsidise power users (margin pain) or underprices heavy enterprise accounts (revenue pain).
Token-based pricing - bill what users actually consume - solves this, but introduces new questions: do you sell pre-paid credits, post-bill via invoices, or both? What's the right token-to-dollar margin? How do you communicate the price to users?
And underneath all of that: you need a real-time per-user token counter that holds under concurrency, a way to refund unused reservations, and a way to surface "you have X tokens left" in the UI.
The solution
Set the unit to "tokens" or "cents" on a limit group. Reserve an upper bound, call the AI provider, commit the actual usage, then refund the unused difference. Surface the user's remaining balance via vevee.usage(userId).
For pre-paid credits: assign a high custom limit when the user buys credits. For post-paid: don't enforce a hard cap; just track usage and reconcile at month-end via your billing system.
For hybrid: let users buy credit packs that supplement a baseline plan. AIPricingLab handles the math; you handle the checkout.
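One way to picture the hybrid model is as two buckets per user: a baseline allowance from the plan and a pool of purchased credits. The sketch below is plain TypeScript with no SDK calls. The `HybridBalance` shape and the `consumeTokens` helper are hypothetical names for illustration, and the drain order (baseline first, then credits) is an assumption rather than documented AIPricingLab behaviour.

```typescript
// Hypothetical two-bucket balance for a hybrid plan (illustration only).
interface HybridBalance {
  baselineRemaining: number; // plan allowance left this period
  creditsRemaining: number;  // purchased credit-pack tokens left
}

interface ConsumeResult {
  ok: boolean;            // false when both buckets together are too small
  balance: HybridBalance; // balance after the attempt (unchanged if !ok)
}

// Drain the baseline allowance first, then fall back to purchased credits.
function consumeTokens(balance: HybridBalance, tokens: number): ConsumeResult {
  if (!Number.isInteger(tokens) || tokens < 0) {
    throw new RangeError("tokens must be a non-negative integer");
  }
  const total = balance.baselineRemaining + balance.creditsRemaining;
  if (tokens > total) {
    return { ok: false, balance }; // pre-paid behaviour: block, do not overdraw
  }
  const fromBaseline = Math.min(tokens, balance.baselineRemaining);
  const fromCredits = tokens - fromBaseline;
  return {
    ok: true,
    balance: {
      baselineRemaining: balance.baselineRemaining - fromBaseline,
      creditsRemaining: balance.creditsRemaining - fromCredits,
    },
  };
}
```

Draining the baseline first means purchased credits are only touched once the plan allowance is exhausted, which is usually what a user who paid extra expects.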
Example
Pre-paid: the user buys 100k tokens, you set their custom limit to 100k, and every call deducts from it. The real-time balance is available through the SDK.
import { createClient } from "@vevee/sdk";

const vevee = createClient({ apiKey: process.env.VEVEE_KEY! });

// On credit purchase:
export async function onCreditsPurchased(userId: string, addTokens: number) {
  // Read current quota
  const usage = await vevee.usage(userId);
  const tokensGroup = usage.counters.find(c => c.label === "Tokens");
  if (!tokensGroup) {
    throw new Error(`No "Tokens" counter found for user ${userId}`);
  }
  // Raise the quota by the purchased amount
  const newQuota = tokensGroup.quota + addTokens;
  await vevee.upsertSubscription({
    userId,
    planId: "plan_paygo",
    customLimits: { tokens: { quota: newQuota } },
  });
}

// In the UI (browser): a separate module, so `vevee` is redeclared here.
// PK_LIVE_KEY is assumed to be defined elsewhere in the client bundle.
import { createClient as createPkClient } from "@vevee/sdk";

const vevee = createPkClient({ apiKey: PK_LIVE_KEY });

export async function getRemainingTokens(userId: string) {
  const u = await vevee.usage(userId);
  const tokens = u.counters.find(c => c.label === "Tokens");
  // Treat a missing counter as an empty balance
  return tokens?.remaining ?? 0;
}

Token markup: where margin comes from
You almost certainly want to mark up your token cost. If GPT-4o costs you $5/1M input tokens, charge users $7-$10/1M to cover GPU latency, your gross margin, and overhead. AIPricingLab does not enforce a markup - you set the conversion in your own code or the dashboard.
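The arithmetic behind a markup is small enough to keep in one place. The helpers below are a minimal sketch: the function names and the per-million convention are assumptions for illustration, not part of any SDK.

```typescript
// Illustrative helpers; names and the per-million convention are assumptions.
const TOKENS_PER_MILLION = 1_000_000;

// Price charged to the user for `tokens`, given your cost per 1M tokens
// and a markup multiplier (1.4 means "cost plus 40%").
function userPriceUsd(tokens: number, costPerMillionUsd: number, markup: number): number {
  if (markup < 1) throw new RangeError("a markup below 1 sells at a loss");
  return (tokens / TOKENS_PER_MILLION) * costPerMillionUsd * markup;
}

// Gross margin as a fraction of the price: (price - cost) / price.
function grossMargin(costPerMillionUsd: number, pricePerMillionUsd: number): number {
  return (pricePerMillionUsd - costPerMillionUsd) / pricePerMillionUsd;
}
```

With a $5/1M cost, charging $7/1M is a 1.4x markup and roughly a 29% gross margin; charging $10/1M is a 2x markup and a 50% margin.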
Pre-paid vs post-paid trade-offs
Pre-paid (buy credits, deplete) means users do not get surprise bills, but they do get blocked when they run out. Post-paid (use, get billed at month-end) is more enterprise-friendly but means you carry the credit risk. Most consumer AI apps go pre-paid; most B2B go post-paid.
Refunds for unused reservations
When you reserve 4k tokens for a streaming response and the model only generates 800, you should refund the 3.2k difference. Use a refund event after commit; it decrements the counter without affecting the audit trail.
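To make the reserve, commit, refund sequence concrete, here is an in-memory sketch. It is not the vevee SDK: `TokenLedger` and its methods are hypothetical names, and a production counter would live in a shared store with atomic operations rather than in process memory.

```typescript
// In-memory sketch of reserve -> commit -> refund. All names are hypothetical.
type LedgerEvent =
  | { kind: "reserve"; id: number; tokens: number }
  | { kind: "commit"; id: number; used: number }
  | { kind: "refund"; id: number; tokens: number };

class TokenLedger {
  private readonly quota: number;
  private consumed = 0;                      // tokens counted against the quota
  private open = new Map<number, number>();  // reservation id -> reserved tokens
  private nextId = 1;
  readonly audit: LedgerEvent[] = [];        // append-only event history

  constructor(quota: number) {
    this.quota = quota;
  }

  get remaining(): number {
    return this.quota - this.consumed;
  }

  // Hold an upper bound up front so parallel calls cannot overshoot the quota.
  reserve(tokens: number): number | null {
    if (tokens > this.remaining) return null;
    const id = this.nextId++;
    this.consumed += tokens;
    this.open.set(id, tokens);
    this.audit.push({ kind: "reserve", id, tokens });
    return id;
  }

  // Record actual usage, then refund the unused part of the reservation.
  // Returns the number of tokens refunded.
  commit(id: number, used: number): number {
    const reserved = this.open.get(id);
    if (reserved === undefined) throw new Error(`unknown reservation ${id}`);
    if (used < 0 || used > reserved) {
      throw new RangeError("used must be within the reservation");
    }
    this.open.delete(id);
    this.audit.push({ kind: "commit", id, used });
    const unused = reserved - used;
    if (unused > 0) {
      this.consumed -= unused; // the counter goes down...
      this.audit.push({ kind: "refund", id, tokens: unused }); // ...the history only grows
    }
    return unused;
  }
}
```

Reserving 4,000 tokens against a 100,000 quota and committing 800 refunds 3,200, leaving 99,200 remaining and three entries in the audit history.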
Showing the price upfront
Users hate variable bills. The best products show "this prompt will cost approximately X tokens / $Y" before the call runs. AIPricingLab does not estimate token counts - but you can call OpenAI's tiktoken locally to get the prompt token count and add a fixed buffer for the completion.
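A preview can be sketched as prompt tokens plus a fixed completion buffer, multiplied by your price. In the example below the tokenizer is injected as a function so the code stays dependency-free; `previewCost`, `CostPreview`, and the 4-characters-per-token fallback are assumptions for illustration, and the fallback is only a crude stand-in for a real tokenizer such as tiktoken.

```typescript
// Tokenizer is injected; in practice pass a function backed by a real tokenizer.
type CountTokens = (text: string) => number;

interface CostPreview {
  promptTokens: number;
  estimatedTokens: number; // prompt + fixed completion buffer
  estimatedUsd: number;
}

// Crude stand-in: roughly 4 characters per token for English text.
// This is an assumption, not a substitute for a real tokenizer.
const roughCount: CountTokens = (text) => Math.ceil(text.length / 4);

function previewCost(
  prompt: string,
  pricePerMillionUsd: number,
  completionBuffer: number,
  countTokens: CountTokens = roughCount,
): CostPreview {
  const promptTokens = countTokens(prompt);
  const estimatedTokens = promptTokens + completionBuffer;
  const estimatedUsd = (estimatedTokens / 1_000_000) * pricePerMillionUsd;
  return { promptTokens, estimatedTokens, estimatedUsd };
}
```

The `estimatedTokens` value is also a natural upper bound to reserve before the call, since the preview and the reservation then agree on what the user was told.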
Frequently asked questions
Should I price per-token or per-request?
Per-token is more honest but harder to communicate. Per-request is simpler but penalizes you when users send giant prompts. Many AI apps use a hybrid: a flat per-request price for short prompts, plus a token-based overage on anything above 8k tokens.
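The hybrid scheme reduces to a flat fee plus a charge on tokens above the threshold. In this sketch only the 8k threshold comes from the text above; the `HybridPricing` shape, the function name, and any dollar amounts you plug in are placeholders.

```typescript
// Hypothetical pricing shape; the prices you pass in are your own.
interface HybridPricing {
  perRequestUsd: number;         // flat fee covering requests up to the threshold
  includedTokens: number;        // tokens covered by the flat fee, e.g. 8000
  overagePerThousandUsd: number; // charged on tokens above the threshold
}

function requestPriceUsd(tokens: number, p: HybridPricing): number {
  const overageTokens = Math.max(0, tokens - p.includedTokens);
  return p.perRequestUsd + (overageTokens / 1000) * p.overagePerThousandUsd;
}
```

For example, with a $0.02 flat fee and $0.01 per 1k tokens of overage, a 2,000-token request costs $0.02 and a 12,000-token request costs $0.06.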
How do I handle Stripe metered billing on top of this?
At period close, push the user's consumed tokens (or cents) from AIPricingLab to a Stripe usage record on their meter. Stripe invoices; AIPricingLab tracks.
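The period-close push can be sketched as a pure mapping step plus an injected sender. Everything here is hypothetical: `PeriodUsage`, `toMeterEvents`, and `pushPeriodUsage` are illustrative names, and the event layout is modelled on Stripe's billing meter events (an event name plus a payload carrying the customer ID and a string value), so verify it against the current Stripe API reference before relying on it.

```typescript
// Hypothetical shapes; the event layout should be checked against Stripe's docs.
interface PeriodUsage {
  stripeCustomerId: string;
  tokensConsumed: number; // read from your usage tracking at period close
}

interface MeterEvent {
  event_name: string;
  payload: { stripe_customer_id: string; value: string };
}

// Injected so the sketch does not depend on a specific Stripe client version.
type SendMeterEvent = (event: MeterEvent) => Promise<void>;

function toMeterEvents(usages: PeriodUsage[], eventName: string): MeterEvent[] {
  return usages
    .filter((u) => u.tokensConsumed > 0) // nothing to bill for idle users
    .map((u) => ({
      event_name: eventName,
      payload: {
        stripe_customer_id: u.stripeCustomerId,
        value: String(u.tokensConsumed),
      },
    }));
}

// Sends one event per active user and returns how many were sent.
async function pushPeriodUsage(
  usages: PeriodUsage[],
  eventName: string,
  send: SendMeterEvent,
): Promise<number> {
  const events = toMeterEvents(usages, eventName);
  for (const event of events) {
    await send(event); // sequential to keep the sketch simple
  }
  return events.length;
}
```

In a real integration, `send` would wrap whichever Stripe call your account uses for metered usage, and the job would need to be idempotent so a retried run does not bill the same period twice.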
What if my model costs change?
Update the conversion rate in your tracking code. Existing counters retain their historical values; new events use the new rate. AIPricingLab does not lock you to a price.
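One way to keep historical values stable when rates change is to store rates with an effective date and price each event at the rate in force when it happened. This is a sketch of that idea in your own tracking code; `RateVersion`, `rateAt`, and `eventCostUsd` are hypothetical names, not SDK features.

```typescript
// Hypothetical rate table: each entry applies from `effectiveFrom` until the next one.
interface RateVersion {
  effectiveFrom: Date;
  usdPerMillionTokens: number;
}

// Pick the latest rate whose effective date is not after the event time.
function rateAt(rates: RateVersion[], at: Date): RateVersion {
  let best: RateVersion | undefined;
  for (const r of rates) {
    if (r.effectiveFrom.getTime() > at.getTime()) continue; // not in force yet
    if (!best || r.effectiveFrom.getTime() > best.effectiveFrom.getTime()) {
      best = r;
    }
  }
  if (!best) throw new Error("no rate in effect at that time");
  return best;
}

// Cost of a single usage event, priced at the rate in force when it occurred.
function eventCostUsd(tokens: number, at: Date, rates: RateVersion[]): number {
  return (tokens / 1_000_000) * rateAt(rates, at).usdPerMillionTokens;
}
```

Because old events are always priced by their own timestamp, adding a new entry to the table changes nothing retroactively.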
Can I show users a "what would this prompt cost?" preview?
Yes - use a local tokenizer (tiktoken for OpenAI, Anthropic-tokenizer for Claude) to estimate, then call vevee.canUse(userId, "llm.tokens", estimate) to check if they have enough budget without consuming any.
Other use cases
LLM usage metering: track tokens per end-user, across providers
Meter LLM token usage per end-user across OpenAI, Anthropic, Gemini, Mistral, and any other provider. Composite events for prompt + completion tokens, real-time per-user limits, atomic enforcement. The drop-in pattern for AI apps.
Image generation quotas: per-user limits for DALL·E, Flux, Stable Diffusion
Enforce per-user quotas on image generation across DALL·E, Flux, Stable Diffusion, Midjourney API, and Replicate. Atomic reservation pattern stops parallel renders from overshooting. Free tier, premium tier, hard caps - drop in.
AI agent billing: meter multi-step agents and tool calls
Metering AI agents is harder than metering single LLM calls. One "agent run" can fan out into 20 tool calls and 50 LLM calls. AIPricingLab handles agent-level and step-level metering with composite events and atomic reservations.
Freemium AI SaaS: ship a free → paid funnel without a backend
Build a freemium AI product where the free plan has hard quotas, the paid plan unlocks more, and "you have used 80% of your free renders" nudges drive upgrades. Drop-in implementation, ten minutes from zero to live.