Vevee Blog · 9 min read

The race condition in "if (usage < limit)" that is costing your AI app money

A user at nine of ten images opens six tabs and clicks generate in all of them. Every tab passes your limit check, every tab gets an image, and you pay for all six. The bug is one read-then-write - and your unit tests will never catch it.

Last updated: 2026-06-09

The five lines every AI app ships first

Limit: ten images a month. A user sitting at nine opens your app in six tabs and clicks generate in all of them inside the same second. Every tab gets an image, and you pay your provider for all six, because your limit check looks like the code below: read the user's usage from the database, compare it to the plan limit, call the model, increment the counter on the way out. This is the pattern every AI app ships first, and not because anyone was careless - it is because it looks correct. It compiles, the types check, the unit tests pass, because unit tests run requests one at a time. The bug only exists between two requests, which means it only exists in production, and it costs you provider dollars every time it fires.

// The version every AI app ships first
async function generateImage(userId: string, prompt: string) {
  const row = await db.get(
    'SELECT used FROM usage WHERE user_id = ?', [userId],
  );

  if (row.used >= PLAN_LIMIT) {        // check
    throw new Error('limit_reached');
  }

  const image = await openai.images.generate({ prompt }); // 2-30s

  await db.run(                        // act
    'UPDATE usage SET used = used + 1 WHERE user_id = ?', [userId],
  );
  return image;
}

The interleaving, step by step

Walk it with limit = 10 and used = 9. Request A reads usage and gets 9. Request B arrives a few milliseconds later and reads usage - A has not written anything yet, so B also gets 9. Both compare 9 < 10, both pass the check, both call the model. A finishes and writes used = 10. B finishes and writes used = 11 - or worse, if the write is computed from the value read earlier (SET used = 10), B silently overwrites A and the counter records one event when two happened. This is a textbook TOCTOU race: time-of-check to time-of-use. The check produced a snapshot of the counter, and the snapshot was stale by the time the code acted on it. Re-checking does not help - a second read just before the increment is another snapshot with the same flaw, and shrinking the window only changes how often you lose, not whether you can. The only real fixes make the check and the increment a single indivisible operation.
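The interleaving is easy to reproduce without a database. What follows is a minimal simulation, not production code: the counter is a plain in-memory variable, the provider call is a 20 ms timer, and the names naiveGenerate and simulateRace are made up for this sketch.

```typescript
// In-memory stand-in for the usage table (hypothetical; no real database involved).
const PLAN_LIMIT = 10;
let used = 9;

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Same shape as the naive handler: read, compare, slow call, increment.
async function naiveGenerate(): Promise<boolean> {
  const snapshot = used;                      // check: read a snapshot
  if (snapshot >= PLAN_LIMIT) return false;   // refuse only if already at the limit
  await sleep(20);                            // stands in for the 2-30s provider call
  used += 1;                                  // act: increment on the way out
  return true;
}

// Six "tabs" fire before any of them has written anything back.
async function simulateRace(): Promise<{ served: number; used: number }> {
  const results = await Promise.all(
    Array.from({ length: 6 }, () => naiveGenerate()),
  );
  return { served: results.filter(Boolean).length, used };
}
```

Calling simulateRace() resolves to { served: 6, used: 15 }: every request read 9 before any of them reached the increment, so all six passed a check that only one of them should have.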

Why AI apps lose more to this than rate limiters ever did

Classic rate limiting has the same race and mostly nobody cares, because the window between check and increment is microseconds and the cost of an overshoot is a little extra server load. AI inference changes both of those variables and adds a third. First, the window: an LLM stream or an image render takes 2 to 30 seconds, so the vulnerable gap is not a lucky microsecond collision - it is a barn door that any two requests in the same half-minute walk through together. Second, the cost: every request past the limit is a real line on your provider invoice, not amortized CPU. At fifty cents per video render, every lost race is a measurable loss. Third, the incentive: your users benefit from the bug. A free-tier user who notices that parallel tabs beat the limit will use parallel tabs, and the determined ones write a script that fires twenty requests through Promise.all and collects twenty results on a ten-image plan. You are not defending against bad luck; you are defending against arbitrage.

Fix #1: push the check into the database

The database can do the check and the increment in one statement, which closes the race completely. Issue a conditional UPDATE that increments only while the counter is still under quota, then read the affected-row count: one row means the quota was held, zero means the user is at the limit. When two requests race, the database serializes them on the row lock and exactly one wins. In Postgres you can get the same guarantee with SELECT ... FOR UPDATE inside a transaction when you need to read the row before deciding. The trade-offs are real but honest. Every request from the same user now serializes on one hot row, which is fine at human speeds and noticeable for high-frequency API traffic. And you now own the schema around the statement: a counters table keyed by user and period, the rollover logic that decides when a fresh period row starts, the quota lookup that joins the user's current plan, and the migration when pricing changes. The race fix is one line of SQL; the bookkeeping around it is the actual project.

// One statement - the check and the increment cannot be separated
const result = await db.execute({
  sql:
    'UPDATE counters SET used = used + 1 ' +
    'WHERE user_id = ? AND period_start = ? AND used < quota',
  args: [userId, periodStart],
});

if (result.rowsAffected === 0) {
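  // No row matched: the user is at quota, or no counter row exists yet
  // for this period (the rollover logic has to create it first).
  throw new Error('limit_reached');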
}

const image = await openai.images.generate({ prompt });
// If the call fails, you already counted it. The refund
// decrement in your catch block is your problem now - see fix #3.
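For the read-before-deciding variant mentioned above (SELECT ... FOR UPDATE inside a transaction), here is a minimal sketch rather than a definitive implementation. It assumes the same counters table as the statement above (user_id, period_start, used, quota) and a client whose query(text, values) method resembles node-postgres. The SqlClient interface and the takeOneUnit name are hypothetical and declared here only so the sketch stands alone.

```typescript
// Minimal structural type for a transaction-capable client. The shape resembles
// node-postgres's client.query(text, values), but it is an assumption of this sketch.
interface SqlClient {
  query(
    text: string,
    values?: unknown[],
  ): Promise<{ rows: Array<{ used: number; quota: number }> }>;
}

// Returns true if one unit of quota was taken, false if the user is at the limit
// (or has no counter row for this period).
async function takeOneUnit(
  client: SqlClient,
  userId: string,
  periodStart: string,
): Promise<boolean> {
  await client.query('BEGIN');
  try {
    // FOR UPDATE locks the row until COMMIT or ROLLBACK, so a racing
    // request blocks on this statement instead of reading a stale value.
    const { rows } = await client.query(
      'SELECT used, quota FROM counters ' +
        'WHERE user_id = $1 AND period_start = $2 FOR UPDATE',
      [userId, periodStart],
    );
    const row = rows[0];
    const allowed = row !== undefined && row.used < row.quota;
    if (allowed) {
      await client.query(
        'UPDATE counters SET used = used + 1 ' +
          'WHERE user_id = $1 AND period_start = $2',
        [userId, periodStart],
      );
    }
    await client.query('COMMIT'); // releases the row lock either way
    return allowed;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  }
}
```

With a connection pool, every statement in this function has to run on the same checked-out connection; otherwise the lock and the update land in different sessions and the guarantee is gone. The single conditional UPDATE stays the simpler choice whenever the decision needs nothing beyond used and quota.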

Fix #2: Redis atomic increment

Redis INCR is atomic, so the classic move is to increment first and inspect the returned value: if it came back over the limit, decrement and refuse. That works, and a small Lua script makes it tidier by doing the check and the conditional increment in one round trip - Redis executes scripts atomically, so nothing interleaves. The cost is operational rather than logical. You have added a second piece of infrastructure whose durability now matters: with default persistence settings, a restart or an eviction under memory pressure erases counters, and an erased counter means every affected user silently gets a fresh quota. You also still own the period problem - the key needs the billing period baked into it, or a TTL that expires exactly at the cycle boundary, and TTLs that drift from the billing date reintroduce the misalignment you were trying to engineer away. Fast and correct, yes. Free of bookkeeping, no.

// Lua: check + increment as one atomic unit inside Redis
const GATE =
  'local used = redis.call("INCRBY", KEYS[1], ARGV[1]) ' +
  'if used > tonumber(ARGV[2]) then ' +
  '  redis.call("DECRBY", KEYS[1], ARGV[1]) return 0 ' +
  'end return 1';

const key = 'usage:' + userId + ':2026-06'; // period in the key
const allowed = await redis.eval(GATE, 1, key, '1', '10');
if (allowed === 0) throw new Error('limit_reached');
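The key above hardcodes 2026-06; in real code the period has to come from the clock. A minimal sketch, assuming calendar-month periods in UTC (an assumption of this example, not something the billing provider guarantees). periodKey and secondsUntilNextPeriod are hypothetical helpers.

```typescript
// Build the Redis key for the calendar month (UTC) that contains `now`.
function periodKey(userId: string, now: Date): string {
  const month = ('0' + (now.getUTCMonth() + 1)).slice(-2); // zero-pad to two digits
  return 'usage:' + userId + ':' + now.getUTCFullYear() + '-' + month;
}

// Seconds from `now` until the first instant of the next calendar month (UTC).
// Date.UTC rolls month 12 over into January of the following year.
function secondsUntilNextPeriod(now: Date): number {
  const next = Date.UTC(now.getUTCFullYear(), now.getUTCMonth() + 1, 1);
  return Math.ceil((next - now.getTime()) / 1000);
}
```

The second value is what you would hand to EXPIRE when the key is first created, so the counter disappears at the cycle boundary rather than some fixed number of days later. If billing is anchored to each user's signup date instead of the calendar month, both helpers need that anchor as an input.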

Fix #3: reserve before the call, settle after

Both fixes above share a quieter flaw: they count the event before the AI call runs, and AI calls fail. Provider timeouts, content-filter refusals, malformed responses - in every case the user paid quota for nothing, and the refund decrement in your catch block is precisely the line that never executes when your process dies mid-call. The structurally correct shape for a slow, expensive, failure-prone operation is a reservation. Reserve atomically checks and holds the quota before the call - the same indivisible check-and-increment as fix #1, so the race stays closed. Then run the AI call. On success, commit the reservation and the hold becomes a permanent count. On failure, release it and the quota returns to the user instantly. If your process crashes between reserve and either verdict, a TTL auto-releases the hold - 60 seconds is a good default, long enough to outlast almost any real inference call, short enough that a crashed worker leaks quota for one minute at most. It is the transaction pattern applied to a counter: hold, do the work, then confirm or roll back. Fixes #1 and #2 close the race; only this one also closes the refund.
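To make the moving parts concrete, here is a deliberately small in-memory sketch of the reserve/commit/release shape with a TTL. It is an illustration under stated assumptions, not how any particular service implements it: the ReservationCounter class and its method names are hypothetical, it covers a single user and a single period, and it is only atomic because a single JavaScript process runs each method to completion. A real version would put the same logic behind the database or Redis guarantees from fixes #1 and #2.

```typescript
type Hold = { amount: number; expiresAt: number };

class ReservationCounter {
  private committed = 0;                    // permanent count
  private nextId = 1;
  private holds = new Map<string, Hold>();  // open reservations
  private limit: number;
  private ttlMs: number;
  private now: () => number;

  constructor(limit: number, ttlMs: number = 60000, now: () => number = () => Date.now()) {
    this.limit = limit;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, so expiry can be exercised in tests
  }

  // Drop holds whose TTL has passed, so a crashed worker leaks quota only briefly.
  private sweep(): void {
    const t = this.now();
    this.holds.forEach((hold, id) => {
      if (hold.expiresAt <= t) this.holds.delete(id);
    });
  }

  private heldTotal(): number {
    let total = 0;
    this.holds.forEach((hold) => { total += hold.amount; });
    return total;
  }

  // Check and hold in one step: open holds count against the limit.
  reserve(amount: number = 1): string | null {
    this.sweep();
    if (this.committed + this.heldTotal() + amount > this.limit) return null;
    const id = 'r' + this.nextId++;
    this.holds.set(id, { amount, expiresAt: this.now() + this.ttlMs });
    return id;
  }

  // Success path: the hold becomes a permanent count.
  // Returns false if the hold is unknown or already expired.
  commit(id: string): boolean {
    this.sweep();
    const hold = this.holds.get(id);
    if (hold === undefined) return false;
    this.holds.delete(id);
    this.committed += hold.amount;
    return true;
  }

  // Failure path: hand the quota straight back.
  release(id: string): boolean {
    return this.holds.delete(id);
  }

  // Committed usage plus unexpired holds.
  used(): number {
    this.sweep();
    return this.committed + this.heldTotal();
  }
}
```

One design question the sketch leaves open on purpose: what to do when commit returns false because the call outlived the TTL. The work was done but the hold is gone, so a real system has to choose between counting it anyway and absorbing the cost.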

The pattern in practice

Here is the reservation flow as application code, using @vevee/sdk, where reserve, commit, and release are single methods backed by a hosted atomic counter. reserve returns an allowed flag and a reservation id; commit confirms the hold after the provider call succeeds; release in the catch block hands the quota straight back. Note what the error path costs the user: nothing. A failed render is a released reservation, not a burned credit - and if the release itself never runs because the worker died, the TTL covers that too.

import { createClient } from '@vevee/sdk';

const vevee = createClient({ apiKey: process.env.VEVEE_SECRET_KEY! });

export async function generateImage(userId: string, prompt: string) {
  const r = await vevee.reserve(userId, 'image_generation', 1);
  if (!r.allowed) throw new Error('limit_reached'); // show the paywall

  try {
    const image = await openai.images.generate({ prompt });
    await vevee.commit(r.reservationId!);  // hold becomes a count
    return image;
  } catch (e) {
    await vevee.release(r.reservationId!); // quota back, instantly
    throw e;
  }
}

When check-then-act is honestly fine

Not every limit deserves this machinery. If you are gating a cheap text model at a fraction of a cent per call, the worst-case overshoot from the race is a rounding error on your bill. If your free tier allows fifty generations and a racing user lands on fifty-two, nothing meaningful happened - that limit exists to shape behavior, not to enforce a contract to the cent. In those cases a plain check-then-track pair is easier to read, easier to debug, and one fewer failure mode, and the overshoot is bounded by how many requests the user can genuinely have in flight at once. Reserve where the units are expensive - video seconds, premium image models, credit packs the user paid real money for - and where an overshoot is a broken promise rather than noise. Let the dollar value of one duplicate request pick the pattern, not the elegance of the pattern itself.
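The dollar value of an overshoot is quick to estimate. A rough sketch, assuming the naive check-then-act pattern and working in integer cents to avoid floating-point noise; worstCaseOvershootCents is a hypothetical helper, and the fifty-cent unit price is the video-render figure used earlier in this post.

```typescript
// Worst-case cost of one burst against the naive pattern.
// If the user still has quota left, every in-flight request passes the check,
// so anything beyond the remaining quota is overshoot. At zero remaining,
// the check refuses everyone and there is no overshoot.
function worstCaseOvershootCents(
  maxInFlight: number,     // requests the user can have in flight at once
  remainingQuota: number,  // limit minus current usage
  unitCostCents: number,   // provider cost of one event, in cents
): number {
  if (remainingQuota <= 0) return 0;
  const extra = Math.max(0, maxInFlight - remainingQuota);
  return extra * unitCostCents;
}
```

For the six-tab user sitting at nine of ten, worstCaseOvershootCents(6, 1, 50) is 250: five unpaid renders at fifty cents each, per burst. Run the same numbers with a unit cost well under a cent and the result is the rounding error described above.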

Decide once, per event type

The decision rule fits in a sentence: if one event past the limit costs real money or breaks a paid promise, use reserve/commit/release; otherwise an atomic conditional increment - in your database or in Redis - is enough. The naive read-compare-act pattern is never the right answer for anything you bill on. Whichever fix you choose, the remaining work is the same and it is most of the work: counters keyed by user and period, rollover at each cycle boundary, quota lookups per plan, and a refund path that survives crashes. Vevee implements reserve/commit/release - along with those counters, periods, and plan definitions - as a drop-in API with a free tier, if you would rather wire two method calls than own that schema.
