Vevee Blog · 7 min read

Tokens, credits, or requests: choosing the unit you meter (and price)

Last updated: 2026-06-09

The decision that outlives your pricing page

The pricing doc has three columns: $9, $19, and $49. The numbers took twenty minutes. The row above them - what a user actually gets for the money - has been blank for a week, because nobody can decide whether the plan says 200 generations, 2 million tokens, or 500 credits. Here is why that row deserves more sweat than the prices: you will change the prices five times in the next two years and almost nobody will notice. The unit is different. It gets baked into your events schema, your usage bar, your upgrade nudges ("you have 12 generations left"), your support macros, and - hardest to undo - your customers’ mental model of what your product costs to use. Repricing is an edit. Changing the unit is a migration plus a re-education campaign for every existing user. Spend the hour now.

Three properties, and no unit gets all of them

Every candidate unit can be scored on three axes. Cost-accuracy: does the unit track your provider bill, so that a user consuming more units actually costs you proportionally more? User-legibility: can a customer predict, before clicking the button, what an action will cost them - and explain their plan to a colleague without opening a calculator? Gaming-resistance: can a determined user extract 10x the intended value while staying inside the unit’s letter of the law? No unit wins all three. Requests are legible and inaccurate. Tokens are accurate and illegible. Credits buy back legibility by making you the central banker of a tiny economy. Pricing design for AI products is not finding the unit without a weakness - it is deciding which weakness your product can afford, then building a guardrail against the one you picked.

Requests: the unit a user can count on their fingers

A request - one generation, one summary, one render - is maximally legible. "100 generations a month" requires no explanation, fits on a pricing card, and can be repeated verbatim in a Slack recommendation. That legibility is why it converts, and it is also where the accuracy goes to die: for any LLM-backed action, one "request" can be 200 tokens or 200,000. A user who pastes a tweet and a user who pastes a 300-page contract each consume one unit of their allowance while differing in cost to you by three orders of magnitude. And the spread is not random - it is adversarial, because the moment users learn that requests are the scarce resource, they stuff each one: longer prompts, batched questions, "also do these nine other things" appended to every message. Requests are still the right call when your actions have naturally bounded cost - a fixed-size image, a summary with a hard output cap, a 15-second video clip - or when UX simplicity is itself the product. Most consumer AI apps should start here, with eyes open.
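
To put numbers on that spread, here is a minimal sketch of the arithmetic. The blended price per million tokens and the 100-request allowance are assumptions picked for illustration, not real provider rates; only the 200 and 200,000 token figures come from the paragraph above.

```javascript
// Hypothetical blended price, for illustration only; use your provider's real rate.
const USD_PER_MILLION_TOKENS = 3;
// Hypothetical allowance printed on the pricing card.
const REQUESTS_PER_MONTH = 100;

// What one request costs you, given the tokens it actually consumed.
function requestCostUsd(tokens) {
  return (tokens / 1_000_000) * USD_PER_MILLION_TOKENS;
}

const tweetCost = requestCostUsd(200);        // the pasted tweet
const contractCost = requestCostUsd(200_000); // the pasted 300-page contract

// Both requests decrement the user's allowance by exactly 1.
const costRatio = contractCost / tweetCost;

// Your exposure if a user fills every request with contract-sized prompts.
const worstCaseMonthlyCost = REQUESTS_PER_MONTH * contractCost;

console.log({ tweetCost, contractCost, costRatio, worstCaseMonthlyCost });
```

At this assumed rate, a user who fills all 100 requests with contract-sized prompts costs you roughly $60, whatever their plan price says. The ratio between the two requests stays at about 1,000x no matter which price you plug in.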

Tokens and raw units: the unit your provider bills you in

Metering tokens (or seconds of video, or pixels rendered) gives you perfect cost-accuracy by definition - it is the unit on your provider invoice, so your margin per user is arithmetic instead of a distribution you pray over. Gaming-resistance is equally strong: there is no stuffing attack against a unit that measures the stuffing. What you give up is legibility, completely. Nobody outside your engineering team knows what 50,000 tokens buys them; users cannot predict whether the next click costs 400 tokens or 40,000, so the usage bar becomes a slot machine and your support inbox fills with screenshots of a meter the customer does not understand. Raw units are the right choice in exactly two situations: you sell to developers or API customers who already think in tokens, or you run a thin-margin B2B product where cost tracking is the point and your buyer reads the meter like an electricity bill. Selling raw tokens to consumers is how you get one-star reviews that say the pricing is confusing - because it is.
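
As a rough sketch of what "margin is arithmetic" looks like in practice: the per-million prices below are made-up placeholders, and the function names are hypothetical. The one real-world wrinkle worth modeling is that providers often price input and output tokens differently, so the arithmetic needs both counts.

```javascript
// Hypothetical prices in USD per million tokens; replace with your provider's rates.
const PRICE_USD_PER_MILLION = { input: 3, output: 15 };

// Provider cost for one user's metered usage over the billing period.
function providerCostUsd(usage) {
  return (
    (usage.inputTokens / 1_000_000) * PRICE_USD_PER_MILLION.input +
    (usage.outputTokens / 1_000_000) * PRICE_USD_PER_MILLION.output
  );
}

// Margin is the plan price minus what the metered tokens cost you.
function monthlyMarginUsd(planPriceUsd, usage) {
  return planPriceUsd - providerCostUsd(usage);
}

// Example: a user on the $19 plan with an assumed month of usage.
const exampleUsage = { inputTokens: 2_000_000, outputTokens: 400_000 };
const exampleMargin = monthlyMarginUsd(19, exampleUsage);
console.log({ cost: providerCostUsd(exampleUsage), margin: exampleMargin });
```

Because the meter and the invoice share a unit, this calculation has no distribution to estimate: each user's cost is a sum over their own events.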

Credits: legibility you have to govern

Credits are the configurable middle. You define an exchange rate per action - one standard image costs 5 credits, a thousand tokens of chat costs 1, a second of video costs 20 - and suddenly a heterogeneous product has one number on the pricing page again. The user regains predictability (the button says what it costs before they click), and you regain cost-correlation, because expensive actions can simply cost more credits. The price you pay is that you now own a small economy. Somebody has to set the rates, defend them when a user notices that video is "overpriced" relative to images, and rebalance them when your provider cuts inference prices by 40% - which triggers the inevitable "why did the credit cost of X change" email thread, the closest thing SaaS has to a central bank press conference. Credits are worth that overhead when your actions are genuinely heterogeneous - text plus images plus video in one product - which describes almost every AI app eventually, even the ones that launched with a single model and a single button.
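
A minimal sketch of such a rate table, using the three exchange rates from the paragraph above. The action names, the `creditCost` helper, and the choice to round partial blocks up are assumptions for illustration; your own rounding policy may differ.

```javascript
// Exchange rates from the example above; the action names are hypothetical.
const CREDIT_RATES = new Map([
  ['image', { credits: 5, per: 1 }],          // 5 credits per standard image
  ['chat_tokens', { credits: 1, per: 1000 }], // 1 credit per 1,000 chat tokens
  ['video_seconds', { credits: 20, per: 1 }], // 20 credits per second of video
]);

// Credits charged for `quantity` units of an action.
// Assumption: partial blocks round up, so 2,500 chat tokens cost 3 credits.
function creditCost(action, quantity) {
  const rate = CREDIT_RATES.get(action);
  if (!rate) throw new Error(`No credit rate defined for action: ${action}`);
  return Math.ceil((quantity / rate.per) * rate.credits);
}

console.log(creditCost('image', 2));          // two standard images
console.log(creditCost('chat_tokens', 2500)); // a short chat exchange
console.log(creditCost('video_seconds', 15)); // a 15-second clip
```

Keeping every rate in one table is what makes the central-banker job tractable: when a provider price changes, the rebalance is an edit to this table, and the same table can drive the cost label on each button.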

The hybrid most real apps converge on

After enough support tickets, most products land on the same compromise: a legible unit on the pricing page, with a raw unit underneath as a guardrail. The plan says "100 generations a month" - that is what the user sees, counts, and repeats to colleagues. Invisibly, every generation also counts toward a token budget (or a monthly cents cap), sized so that normal usage never touches it but a stuffed mega-prompt does. One event, two limits: the visible count group increments by one, the hidden cost group increments by the real token usage carried in the event’s metadata. The user who sends ordinary prompts never learns the second limit exists; the user running a 100k-token context through every "generation" hits the budget wall instead of your margin. The trick is that this must be one tracked event matching two limit groups - not two tracking calls, not an if-statement at the call site - or the two meters drift and the guardrail becomes a second source of bugs.

// One tracked event, two limit groups
await vevee.track(userId, 'image_generation', 1, {
  model: 'flux',
  tokens: 8400, // real usage from the provider response
});
// -> "monthly generations"  unit: count,  quota: 100      (on the pricing page)
// -> "generation budget"    unit: tokens, quota: 1500000  (invisible guardrail)
//
// canUse checks BOTH groups - the stuffed prompt is stopped
// by the budget even though the count says 41/100
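
To make the "one event, two limit groups" mechanics concrete, here is a small in-memory sketch. It is not Vevee's implementation: `amountFor`, `wouldFit`, and `applyEvent` are hypothetical local stand-ins, the group shapes are assumed, and the sketch ignores concurrency entirely.

```javascript
// Hypothetical limit groups mirroring the two quotas in the snippet above.
const limitGroups = [
  { name: 'monthly generations', unit: 'count', quota: 100, used: 0 },
  { name: 'generation budget', unit: 'tokens', quota: 1_500_000, used: 0 },
];

// How much a single event adds to a group, based on the group's unit.
function amountFor(group, event) {
  switch (group.unit) {
    case 'count':
      return event.quantity;
    case 'tokens':
      return event.metadata.tokens;
    default:
      throw new Error(`Unit not handled in this sketch: ${group.unit}`);
  }
}

// The event is allowed only if EVERY group still has room for it.
function wouldFit(groups, event) {
  return groups.every((g) => g.used + amountFor(g, event) <= g.quota);
}

// One code path updates all groups from the same event, so the meters cannot drift.
function applyEvent(groups, event) {
  if (!wouldFit(groups, event)) return false;
  for (const g of groups) g.used += amountFor(g, event);
  return true;
}

// A user stuffing 100k tokens into every "generation".
const stuffed = { quantity: 1, metadata: { model: 'flux', tokens: 100_000 } };
let accepted = 0;
while (applyEvent(limitGroups, stuffed)) accepted += 1;
console.log(accepted, limitGroups.map((g) => `${g.name}: ${g.used}/${g.quota}`));
```

With these assumed numbers the loop stops after 15 events: the visible count reads 15/100, but the token budget is exhausted. A production version would also need the check and the increment to be atomic, which this sketch does not attempt.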

The decision checklist

Compressed to a card you can pin next to the pricing doc - find your product’s row and start there.

  • One action type with bounded cost → requests. Take the legibility win; it is real.
  • Selling to developers or via API → tokens (or the raw provider unit). Your buyer already thinks in it.
  • Heterogeneous actions - text, image, video in one product → credits, and budget the ongoing rate-governance work honestly.
  • Variable-cost actions behind a consumer-friendly unit → hybrid: a visible count, a hidden token or cents guardrail.
  • Tiebreaker for everything above: pick the unit your customer can say out loud when recommending your plan to a colleague. If the sentence needs a footnote, the unit is wrong.
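
If it helps to see the card as logic, the first four rows can be sketched as a small function. The field names and the order in which the rows are checked are assumptions, and the tiebreaker is deliberately left out because it is a judgment call, not a branch.

```javascript
// Hypothetical product description:
//   sellsToDevelopers: boolean - buyers are developers or API customers
//   actionTypes: string[]      - e.g. ['text'] or ['text', 'image', 'video']
//   costIsBounded: boolean     - each action has a naturally capped cost
function suggestUnit(product) {
  if (product.sellsToDevelopers) return 'tokens';        // buyer already thinks in it
  if (product.actionTypes.length > 1) return 'credits';  // heterogeneous actions
  if (product.costIsBounded) return 'requests';          // one action, bounded cost
  return 'hybrid';                                       // visible count + hidden guardrail
}

console.log(suggestUnit({ sellsToDevelopers: false, actionTypes: ['image'], costIsBounded: true }));
console.log(suggestUnit({ sellsToDevelopers: false, actionTypes: ['chat'], costIsBounded: false }));
```

Treat the output as a starting row on the card, then apply the say-it-out-loud test before committing.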

If you change your mind later

You probably will - launching on requests and migrating to credits at the first multi-model release is the most common pricing migration in AI products. Whether it is survivable depends entirely on what you logged, not on what you charged. If every event in your history carries a quantity and metadata - which model, how many tokens, what variant - then the old usage can be re-priced under the new unit: you can compute what every existing user would have consumed in credits, grandfather them onto an equivalent allowance, and show them a usage history that makes sense in the new currency. If all you stored was a counter - "this user did 73 things" - there is nothing to re-price, and the migration becomes a guess dressed up as an email announcement. The lesson costs nothing today: log rich events from day one, with real quantities and real metadata, even if your current unit ignores most of it. The unit on the pricing page is a view over the data. Make sure the data can support a different view.
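
Here is a minimal sketch of that re-pricing step, assuming each logged event carries a user, a quantity, and a model in its metadata. The event shape, the `flux-pro` model name, and the per-model credit rates are all hypothetical.

```javascript
// Hypothetical event log: rich events, recorded back when the plan only counted requests.
const eventHistory = [
  { userId: 'u1', action: 'image_generation', quantity: 1, metadata: { model: 'flux', tokens: 8400 } },
  { userId: 'u1', action: 'image_generation', quantity: 1, metadata: { model: 'flux', tokens: 9100 } },
  { userId: 'u1', action: 'image_generation', quantity: 1, metadata: { model: 'flux-pro', tokens: 21000 } },
  { userId: 'u2', action: 'image_generation', quantity: 1, metadata: { model: 'flux', tokens: 7600 } },
];

// Hypothetical new pricing: credits per generation, by model.
const CREDITS_PER_GENERATION = new Map([
  ['flux', 5],
  ['flux-pro', 12],
]);

// Replay the old events under the new unit to get per-user credit consumption.
function repriceInCredits(events) {
  const totals = new Map();
  for (const event of events) {
    const rate = CREDITS_PER_GENERATION.get(event.metadata.model);
    if (rate === undefined) {
      throw new Error(`No credit rate for model: ${event.metadata.model}`);
    }
    const previous = totals.get(event.userId) ?? 0;
    totals.set(event.userId, previous + rate * event.quantity);
  }
  return totals;
}

const creditsByUser = repriceInCredits(eventHistory);
console.log(Object.fromEntries(creditsByUser));
```

The same replay works for any new unit as long as the metadata it depends on was logged. With only a counter per user, the loop above has nothing to iterate over.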

Whatever you pick, the meter should not care

The unit decision should live in your plan configuration, not in your application code - which is exactly how Vevee models it: limit groups carry a unit of count, tokens, seconds, or cents, and one tracked event can count toward several groups at once, so the hybrid pattern above is configuration, not a rewrite. Pick the unit your customers can say out loud, log everything underneath it, and keep the option to change your mind.
