How to build a credits system for your AI app (ledger design, rollover, refunds)
A user emails: "I had 40 credits this morning and now I have 12, and I only made one image." If your balance is a column you mutated, you cannot answer that email. Here is the ledger design that can.
Last updated: 2026-06-09
Why credits at all
Your app does three expensive things: it generates images, it answers chat with an LLM, and it renders short video clips. Price each on its own axis and your pricing page becomes a spreadsheet - 50 images, 200k tokens, 90 video-seconds per month - that nobody can compare to a competitor or to their own usage. Credits collapse the mess into one currency: an image costs 5 credits, a thousand tokens cost 1, a video-second costs 20, and the plan is simply "500 credits a month." Two things make this more than cosmetics. First, it decouples your packaging from provider pricing - when you swap one image model for one that costs a third as much, you adjust an internal cost map and your plans, your Stripe products, and your pricing page do not move. Second, users already have a mental model for wallets: a number that goes down when they do things and tops up when they pay. The catch is that "a number that goes down" is exactly the part most teams build wrong, and the wrongness only shows up under concurrency and in support tickets.
The cardinal rule: a balance is not a column
The instinct is a credits column on the users table: read it, subtract, write it back. Resist it. A credit balance is a derived value - the SUM over an append-only ledger - and the ledger row is the unit of truth. Every grant is a positive row, every spend a negative row, every refund a positive row that points at the spend it reverses, and nothing is ever UPDATEd or DELETEd. The reasons compound. Auditability: when a user asks where their credits went, you read their rows back to them instead of shrugging at a mutated integer. Refund safety: undoing a charge is inserting a row, not racing an UPDATE against concurrent spends. Correctness: read-modify-write on a column loses updates the moment two requests interleave; an insert cannot clobber another insert. And analytics fall out for free - revenue per feature is a GROUP BY on reason.
CREATE TABLE credit_ledger (
  id         TEXT PRIMARY KEY,   -- lg_<nanoid>
  user_id    TEXT NOT NULL,
  delta      INTEGER NOT NULL,   -- +grant / -spend / +refund / -expiry
  reason     TEXT NOT NULL,      -- 'monthly_grant' | 'pack_purchase' | 'spend'
                                 -- | 'refund' | 'expiry' | 'admin_adjust'
  ref_id     TEXT,               -- idempotency key, or the debit a refund reverses
  expires_at TEXT,               -- grants only; NULL = never
  created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX idx_ledger_user ON credit_ledger (user_id, created_at);
-- the balance is a query, not a column
SELECT COALESCE(SUM(delta), 0) AS balance
FROM credit_ledger WHERE user_id = ?;
Integer math only
Store credits as integers, and if money enters the picture anywhere, store it as integer cents. Floats accumulate representation error - after a few thousand debits of 0.1 credits your wallet reads 117.99999999999883 and your "balance >= cost" check starts failing for users who visibly have enough. There is no debugging that; there is only never doing it. If your pricing genuinely needs fractional credits - say a chat message costs 0.25 - do not reach for REAL or float. Multiply the unit by 100 and meter centi-credits: the message costs 25, the monthly grant is 50000, and the UI divides by 100 at render time. Display is the only layer allowed to know about decimals. The same rule applies when you map credits to money for refunds or proration: cents in the ledger, dollars only in the template. Every arithmetic operation in the spend path - cost lookup, multiplication by token count or seconds, the SUM itself - should be closed over integers, which also means rounding is an explicit, documented decision (Math.ceil on token costs, usually) instead of an accident of IEEE 754.
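A minimal sketch of the centi-credit idea, assuming 1 credit = 100 centi-credits. The helper names (toCentiCredits, formatCredits) are made up for illustration; the point is that parsing and display are the only places a decimal point appears, and everything in between is integer arithmetic.

```typescript
// Assumption: 1 credit = 100 centi-credits. All names here are illustrative.
const CENTI_PER_CREDIT = 100;

// Parse a catalog price written in credits (e.g. "0.25") into integer
// centi-credits without ever passing through floating-point arithmetic.
function toCentiCredits(credits: string): number {
  const match = /^(\d+)(?:\.(\d{1,2}))?$/.exec(credits);
  if (!match) throw new Error(`invalid credit amount: ${credits}`);
  const [, wholePart = "0", fractionPart = ""] = match;
  const whole = parseInt(wholePart, 10);
  const fraction = parseInt(fractionPart.padEnd(2, "0"), 10);
  return whole * CENTI_PER_CREDIT + fraction;
}

// Display is the only layer that turns centi-credits back into decimals.
function formatCredits(centi: number): string {
  const sign = centi < 0 ? "-" : "";
  const abs = Math.abs(centi);
  const whole = Math.floor(abs / CENTI_PER_CREDIT);
  const fraction = String(abs % CENTI_PER_CREDIT).padStart(2, "0");
  return `${sign}${whole}.${fraction}`;
}

// The float trap in one line: this comparison is false under IEEE 754.
const floatIsExact = 0.1 + 0.2 === 0.3;

// The integer version: ten debits of 0.1 credits sum to exactly 1 credit.
let centiTotal = 0;
for (let i = 0; i < 10; i++) centiTotal += toCentiCredits("0.1");
```

With this in place the 0.25-credit chat message is stored as 25 and the 500-credit monthly grant as 50000, matching the numbers above.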
Grants, expiry, and what "rollover" actually means
Not all credits are equal. The 500 credits from a monthly allowance should expire when the cycle ends; the 1,000-credit pack a user bought should not, or should last a year. Model this as different grant rows with different expires_at values - never as separate balances. Two rules follow. Spend order: debits consume the soonest-expiring grant first (expiring-first FIFO), so a user with 100 allowance credits expiring Friday and 1,000 purchased credits burns the allowance first - anything else quietly steals value from them. Expiry: do not filter expired grants out of the balance query, because the spends that consumed them have no expiry. Filter out a fully spent 100-credit grant and its 100 credits of debits are still in the sum, so the balance reads 100 too low. Instead run a daily job that, for each grant past its expires_at, computes the unspent remainder under your FIFO rule and writes a negative expiry row for it. The balance stays a plain SUM filtered only by user_id, and the expiry itself is visible in the user's history. This framing also dissolves the rollover debate: rollover is a grant policy, not a balance mutation. "No rollover" means this month's grant expires at cycle end; "rollover up to 2x" means at renewal you grant the new allowance and extend or re-issue the unspent remainder, capped. Either way you only ever insert rows.
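The remainder math for the expiry job can be sketched in memory. This is an illustration under stated assumptions, not a drop-in job: the Grant, SpendEvent, and ExpiryRow shapes are hypothetical, refunds are ignored, timestamps are assumed to be in the sortable 'YYYY-MM-DD HH:MM:SS' form, and a real job would also skip grants that already have an expiry row.

```typescript
interface Grant {
  id: string;
  amount: number;            // positive integer credits
  createdAt: string;         // 'YYYY-MM-DD HH:MM:SS', sorts lexicographically
  expiresAt: string | null;  // null = never expires
}
interface SpendEvent {
  amount: number;            // positive integer credits spent
  createdAt: string;
}
interface ExpiryRow {
  delta: number;             // always negative
  reason: "expiry";
  refId: string;             // the grant being expired
}

function cmp(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Soonest-expiring first; never-expiring grants go last; ties by age.
function byExpiry(a: Grant, b: Grant): number {
  if (a.expiresAt === null && b.expiresAt === null) return cmp(a.createdAt, b.createdAt);
  if (a.expiresAt === null) return 1;
  if (b.expiresAt === null) return -1;
  return cmp(a.expiresAt, b.expiresAt) || cmp(a.createdAt, b.createdAt);
}

// Replay spends in time order and return the unspent remainder per grant.
function remainders(grants: Grant[], spends: SpendEvent[]): Map<string, number> {
  const left = new Map<string, number>();
  for (const g of grants) left.set(g.id, g.amount);
  const ordered = [...grants].sort(byExpiry);
  const timeline = [...spends].sort((a, b) => cmp(a.createdAt, b.createdAt));
  for (const s of timeline) {
    let owed = s.amount;
    for (const g of ordered) {
      if (owed === 0) break;
      // A grant is usable only if it existed and had not expired at spend time.
      const usable =
        g.createdAt <= s.createdAt && (g.expiresAt === null || g.expiresAt > s.createdAt);
      if (!usable) continue;
      const available = left.get(g.id) ?? 0;
      const take = Math.min(owed, available);
      left.set(g.id, available - take);
      owed -= take;
    }
  }
  return left;
}

// One negative row per expired grant that still has an unspent remainder.
function expiryRows(grants: Grant[], spends: SpendEvent[], now: string): ExpiryRow[] {
  const left = remainders(grants, spends);
  const rows: ExpiryRow[] = [];
  for (const g of grants) {
    const unspent = left.get(g.id) ?? 0;
    if (g.expiresAt !== null && g.expiresAt <= now && unspent > 0) {
      rows.push({ delta: -unspent, reason: "expiry", refId: g.id });
    }
  }
  return rows;
}
```

For the user above with a 100-credit allowance and a 1,000-credit pack, a 30-credit spend comes out of the allowance, and the job later writes a single expiry row of -70 for it. The pack is untouched and the balance is still a plain SUM.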
The spend path is the race-condition path
Here is where the column-based design actually loses money. A user with 5 credits fires two image requests at once; both handlers read balance 5, both decide 5 >= 5, both subtract, and you have done 10 credits of work for 5 - or written -5 into the column. The check and the debit must be one atomic operation: a conditional insert whose WHERE clause re-evaluates the balance inside the same statement (or, in Postgres, a transaction with the user's rows locked). If the condition fails, zero rows are written and you reject the request before touching the AI provider. The second half is just as important: the AI call happens after the debit, and AI calls fail - timeouts, content filters, provider 500s. A failed call must trigger an automatic refund row referencing the debit, or your users pay for errors. If this shape feels familiar, it should: it is reserve/commit/release wearing a different hat. The conditional insert is reserve (a provisional debit), letting it stand after the call succeeds is commit, and the refund row is release. You also want a sweeper for the crash case - a debit whose request died before refunding needs a timeout-driven release, which is exactly why hosted versions of this pattern put a TTL on reservations.
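The reserve/commit/release shape can be sketched as a small wrapper, independent of any database. Everything here is an assumption made for illustration: runMetered, DebitFn, and RefundFn are hypothetical names, the debit function is assumed to be atomic and to throw when the balance is too low, and the crash-case sweeper is out of scope.

```typescript
// Hypothetical signatures: the caller supplies the atomic debit and the refund.
type DebitFn = (userId: string, cost: number, refId: string) => Promise<string>; // resolves to the debit id
type RefundFn = (userId: string, debitId: string, amount: number) => Promise<void>;

async function runMetered<T>(
  deps: { debit: DebitFn; refund: RefundFn },
  userId: string,
  cost: number,
  refId: string,
  work: () => Promise<T>, // the AI provider call
): Promise<T> {
  // reserve: if this throws, the provider is never called
  const debitId = await deps.debit(userId, cost, refId);
  try {
    // commit: on success the debit simply stands
    return await work();
  } catch (err) {
    // release: a failed call is refunded against the debit it reverses.
    // If the refund itself fails, the debit is orphaned and the sweeper
    // (not shown) has to pick it up.
    await deps.refund(userId, debitId, cost);
    throw err;
  }
}
```

The ordering is the design choice: debit first, provider call second, refund only on failure. Calling the provider first and charging afterwards reopens the race the conditional insert was meant to close.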
async function spendCredits(userId: string, cost: number, refId: string): Promise<string> {
  // costs are positive integers; a negative cost would act as a grant
  if (!Number.isInteger(cost) || cost <= 0) throw new RangeError('cost must be a positive integer');
  const debitId = newId('lg');
  // check + debit in ONE statement - no read-then-write gap
  const res = await db.execute({
    sql:
      "INSERT INTO credit_ledger (id, user_id, delta, reason, ref_id) " +
      "SELECT ?, ?, ?, 'spend', ? " +
      "WHERE (SELECT COALESCE(SUM(delta), 0) FROM credit_ledger " +
      " WHERE user_id = ?) >= ?",
    args: [debitId, userId, -cost, refId, userId, cost],
  });
  if (res.rowsAffected === 0) throw new InsufficientCreditsError();
  return debitId; // the refund path needs this id
}
async function refund(userId: string, debitId: string, amount: number) {
  // a refund is a new row pointing at the debit it reverses
  await db.execute({
    sql:
      "INSERT INTO credit_ledger (id, user_id, delta, reason, ref_id) " +
      "VALUES (?, ?, ?, 'refund', ?)",
    args: [newId('lg'), userId, amount, debitId],
  });
}
Idempotency: retries must not double-charge
Networks retry. Your client retries on timeout, your queue redelivers, your user double-clicks the generate button. If each attempt inserts a fresh debit, a flaky connection charges someone three times for one image - and unlike a duplicate row in an analytics table, this one shows up as missing money. The fix costs one column and one index: every debit carries a client-generated idempotency key in ref_id, and a unique index on (user_id, ref_id) WHERE reason = 'spend' makes the second insert a no-op (INSERT ... ON CONFLICT DO NOTHING, then treat conflict as success and return the original result). The key should be generated where the user action originates - one key per generate-click, reused across retries of that click - not per HTTP attempt, or it protects nothing. Refunds get the same discipline from the other direction: a refund row stores the id of the debit it reverses in its own ref_id, and a unique index on refunded debit ids guarantees a debit is reversed at most once even if the failure handler runs twice. Between the two rules, every credit movement in the system is traceable to exactly one cause and can happen exactly once.
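Both rules can be illustrated with an in-memory stand-in for the ledger. This is a sketch of the semantics only: InMemoryLedger and SpendOutcome are hypothetical names, and the find-then-insert lookups here play the role that the unique indexes play in a real database, where the index (not application code) must do the enforcing.

```typescript
interface LedgerRow {
  id: string;
  userId: string;
  delta: number;
  reason: "monthly_grant" | "spend" | "refund";
  refId: string | null;
}

type SpendOutcome =
  | { kind: "charged"; debitId: string }
  | { kind: "duplicate"; debitId: string } // retry of an action that was already charged
  | { kind: "insufficient" };

class InMemoryLedger {
  private rows: LedgerRow[] = [];
  private seq = 0;

  private insert(row: Omit<LedgerRow, "id">): LedgerRow {
    const full: LedgerRow = { id: `lg_${++this.seq}`, ...row };
    this.rows.push(full);
    return full;
  }

  balance(userId: string): number {
    return this.rows
      .filter((r) => r.userId === userId)
      .reduce((sum, r) => sum + r.delta, 0);
  }

  grant(userId: string, amount: number): void {
    this.insert({ userId, delta: amount, reason: "monthly_grant", refId: null });
  }

  // One idempotency key per user action, reused across retries of that action.
  spend(userId: string, cost: number, idempotencyKey: string): SpendOutcome {
    const existing = this.rows.find(
      (r) => r.userId === userId && r.reason === "spend" && r.refId === idempotencyKey,
    );
    if (existing) return { kind: "duplicate", debitId: existing.id };
    if (this.balance(userId) < cost) return { kind: "insufficient" };
    const row = this.insert({ userId, delta: -cost, reason: "spend", refId: idempotencyKey });
    return { kind: "charged", debitId: row.id };
  }

  // A debit can be reversed at most once; returns false if nothing was written.
  refund(userId: string, debitId: string): boolean {
    const debit = this.rows.find(
      (r) => r.id === debitId && r.userId === userId && r.reason === "spend",
    );
    if (!debit) return false;
    const alreadyRefunded = this.rows.some(
      (r) => r.reason === "refund" && r.refId === debitId,
    );
    if (alreadyRefunded) return false;
    this.insert({ userId, delta: -debit.delta, reason: "refund", refId: debitId });
    return true;
  }
}
```

Returning a distinct "duplicate" outcome matters once ON CONFLICT DO NOTHING is in play: zero rows written can mean either "already charged" or "not enough credits", and the caller has to treat the first as success and the second as a rejection.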
Pricing the actions
Somewhere a function has to answer "what does this event cost in credits," and that function must live on the server - a cost shipped in the client request is a cost the user sets. Keep it a small, boring, static map from event type to integer cost, with metadata as the only input that modulates it: the premium image model costs 12 where the default costs 5, token-based events charge per 1k tokens rounded up, video charges per second. The part teams get wrong is time. Provider prices change, your margins change, and you will revise this map - but the ledger rows already written were priced under the old map, and they must stay true. So version the pricing: tag each spend row (or the map itself) with a version, add v4 as a new file when prices change, and never retro-edit v3. When a user disputes a January charge, you can reproduce it with January's prices.
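Reproducing an old charge could look like the sketch below: a registry of frozen cost functions keyed by version, where every new spend records the version it was priced under. The names (PricedEvent, PRICING_BY_VERSION, priceNow, reprice) are hypothetical, and the v2 numbers are invented purely to show two versions side by side.

```typescript
type PricedEvent =
  | { type: "image.generate"; model: string }
  | { type: "chat.completion"; totalTokens: number };

type CostFn = (e: PricedEvent) => number;

// Each entry is frozen once shipped. The v2 prices are made up for this example.
const PRICING_BY_VERSION = new Map<number, CostFn>([
  [2, (e) => (e.type === "image.generate" ? 8 : Math.ceil(e.totalTokens / 500))],
  [3, (e) =>
    e.type === "image.generate"
      ? e.model === "flux-pro" ? 12 : 5
      : Math.ceil(e.totalTokens / 1000)],
]);
const CURRENT_PRICING_VERSION = 3;

function reprice(e: PricedEvent, version: number): number {
  const costFn = PRICING_BY_VERSION.get(version);
  if (!costFn) throw new Error(`unknown pricing version: ${version}`);
  return costFn(e);
}

// New spends are priced under the current version, and the version is
// returned so it can be stored alongside the ledger row.
function priceNow(e: PricedEvent): { cost: number; pricingVersion: number } {
  return { cost: reprice(e, CURRENT_PRICING_VERSION), pricingVersion: CURRENT_PRICING_VERSION };
}
```

A dispute then becomes a lookup: read the version stored with the spend, call reprice with it, and compare the result to the row's delta.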
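// pricing/v3.ts - frozen once shipped; price changes become pricing/v4.ts
export const PRICING_VERSION = 3;
// assumed event shape; adapt to however your app describes metered events
type MeteredEvent =
  | { type: 'image.generate'; meta: { model: string } }
  | { type: 'chat.completion'; meta: { totalTokens: number } }
  | { type: 'video.render'; meta: { seconds: number } };
export function creditCost(e: MeteredEvent): number {
  switch (e.type) {
    case 'image.generate':
      return e.meta.model === 'flux-pro' ? 12 : 5;
    case 'chat.completion':
      return Math.ceil(e.meta.totalTokens / 1000); // 1 credit per 1k tokens, rounded up
    case 'video.render':
      return Math.ceil(e.meta.seconds) * 20; // round partial seconds up so the cost stays an integer
  }
}
What this costs to own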
None of the pieces above is hard in isolation, which is why every team underestimates the whole. Budget a real week for the first working version, and then accept that you have adopted a small financial system with a forever-maintenance contract - because unlike most internal tools, this one corrupts money when it drifts. The standing inventory:
- The ledger itself: schema, indexes, and the balance query everyone must use instead of caching a number somewhere convenient.
- The atomic spend path with automatic refunds on AI failure, plus the sweeper for debits orphaned by crashed requests.
- The expiry job: FIFO remainder math, a daily run, and alerting for when it silently stops.
- Spend-order rules that stay correct as you add grant types (allowance, pack, promo, referral bonus).
- Admin tooling: support will need "adjust balance" within the first month, and it must write admin_adjust rows with an operator id - not poke the database.
- A per-user history UI, because "where did my credits go" is the top billing ticket and the ledger is only useful if someone can read it.
If you only want the enforcement half
Everything above is buildable with a week, a database, and discipline - and if credits are core to your product, build it. But notice that the dangerous parts are the enforcement mechanics, not your business rules. Vevee's metering covers exactly that half: limit groups with quotas in count or cents stand in for the wallet, reserve(userId, eventType) is the atomic provisional debit, commit and release are the finalize and the refund, unconfirmed reservations auto-release after 60 seconds so crashed requests cannot leak, and idempotency is handled for you. You keep your own grant logic and pricing map; the race conditions stop being your problem.