AIPricingLab Blog · 5 min read

reserve / commit / release: the only correct way to enforce AI quotas

Every team I have seen build per-user AI metering has shipped a version of canUse → call OpenAI → track. It looks correct in single-threaded tests. It breaks in production the moment one user has two requests in flight.

Last updated: 2026-05-10

The naive pattern

Step 1: check if the user has quota (canUse). Step 2: call OpenAI. Step 3: increment the counter (track). It compiles, the unit tests pass, you ship.
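Here is a minimal sketch of the naive pattern. NaiveMeter is a hypothetical in-memory store whose can_use and track stand in for canUse and track, and call_model is a stub in place of the OpenAI request. Run sequentially, it behaves exactly as a unit test expects:

```python
from typing import Callable, Dict, Optional


class NaiveMeter:
    """Hypothetical in-memory meter mirroring canUse / track; not a real SDK."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used: Dict[str, int] = {}  # user id -> requests consumed

    def can_use(self, user_id: str) -> bool:
        # Step 1: a read-only check against the counter.
        return self.used.get(user_id, 0) < self.limit

    def track(self, user_id: str) -> None:
        # Step 3: the increment, which only happens after the call.
        self.used[user_id] = self.used.get(user_id, 0) + 1


def handle(meter: NaiveMeter, user_id: str, call_model: Callable[[], str]) -> Optional[str]:
    if not meter.can_use(user_id):
        return None            # over quota: reject
    result = call_model()      # Step 2: the paid AI call (stubbed below)
    meter.track(user_id)
    return result


meter = NaiveMeter(limit=2)
results = [handle(meter, "alice", lambda: "ok") for _ in range(3)]
print(results)  # ['ok', 'ok', None]
```

Nothing here misbehaves while calls arrive one at a time. The bug only shows up when two calls overlap.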

Why it is broken

While the OpenAI call is in flight, there is a window between step 1 and step 3 in which another request from the same user can also pass step 1, because the counter has not been incremented yet. Now both requests proceed. Both call OpenAI. Both increment. If the user had one request left, they have now used two; you paid for both; your enforcement is decorative.
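That interleaving can be reproduced deterministically. The sketch below uses hypothetical names and two plain threads as stand-ins for two concurrent requests, and it omits the model call. A threading.Barrier holds both requests inside the window until each has passed step 1. Putting a lock around each individual read and write does not help, because the check and the increment are still two separate operations:

```python
import threading
from typing import Dict, List

LIMIT = 1                              # alice has exactly one request left
used: Dict[str, int] = {"alice": 0}
allowed: List[str] = []
lock = threading.Lock()                # guards each single read or write...
in_window = threading.Barrier(2)       # ...yet both checks finish before either increment


def request(name: str) -> None:
    with lock:
        ok = used["alice"] < LIMIT     # step 1: canUse
    in_window.wait()                   # the gap where the model call would run
    if ok:
        with lock:
            used["alice"] += 1         # step 3: track
            allowed.append(name)


threads = [threading.Thread(target=request, args=(n,)) for n in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(allowed), used["alice"])  # ['A', 'B'] 2
```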

reserve closes the race

vevee.reserve() does steps 1 and 3 atomically. The check and the increment happen as one indivisible operation, and the held quota carries a 60-second TTL. If two requests race for the last unit of quota at the same instant, exactly one succeeds.
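Here is a minimal in-memory sketch of an atomic reserve. It assumes a single process: QuotaLedger and its fields are hypothetical, and a threading.Lock stands in for whatever atomic operation a real backend uses. A shared store spread across machines would need its own atomic primitive, because a process-local lock does not span processes. In this run, two threads race for the last unit:

```python
import threading
import time
import uuid
from typing import Dict, List, Optional


class QuotaLedger:
    """Hypothetical in-memory ledger sketching an atomic reserve; not vevee's implementation."""

    def __init__(self, limit: int, ttl: float = 60.0) -> None:
        self.limit = limit
        self.ttl = ttl
        self.committed = 0
        self.held: Dict[str, float] = {}  # reservation id -> expiry (monotonic seconds)
        self._lock = threading.Lock()

    def reserve(self) -> Optional[str]:
        # Check and increment under one lock, so no request can slip in between them.
        with self._lock:
            now = time.monotonic()
            self.held = {rid: exp for rid, exp in self.held.items() if exp > now}  # drop lapsed holds
            if self.committed + len(self.held) >= self.limit:
                return None
            rid = uuid.uuid4().hex
            self.held[rid] = now + self.ttl  # the increment, held for one TTL
            return rid


ledger = QuotaLedger(limit=1)
start = threading.Barrier(2)
results: List[Optional[str]] = []


def racer() -> None:
    start.wait()                        # release both requests at the same instant
    results.append(ledger.reserve())


threads = [threading.Thread(target=racer) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sum(r is not None for r in results))  # 1
```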

commit confirms, release rolls back

After your AI call succeeds, vevee.commit() makes the increment permanent. If it fails, vevee.release() rolls it back. If your code crashes after reserve but before commit or release, the reservation auto-releases after 60 seconds, so no quota is orphaned.
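The sketch below shows the full lifecycle. QuotaLedger and its methods are hypothetical, the clock is injected so the example can move time forward, and one behavior is an explicit assumption: a hold that has already expired cannot be committed, because its quota was already handed back. The post does not say how vevee treats that case.

```python
import threading
import uuid
from typing import Callable, Dict, Optional


class QuotaLedger:
    """Hypothetical reserve / commit / release ledger with a TTL; a sketch, not vevee's API."""

    def __init__(self, limit: int, clock: Callable[[], float], ttl: float = 60.0) -> None:
        self.limit = limit
        self.clock = clock                  # injected so the example can move time forward
        self.ttl = ttl
        self.committed = 0
        self.held: Dict[str, float] = {}    # reservation id -> expiry time
        self._lock = threading.Lock()

    def _expire(self) -> None:
        # A crashed worker never calls commit or release; its hold lapses here instead.
        now = self.clock()
        self.held = {rid: exp for rid, exp in self.held.items() if exp > now}

    def reserve(self) -> Optional[str]:
        with self._lock:
            self._expire()
            if self.committed + len(self.held) >= self.limit:
                return None
            rid = uuid.uuid4().hex
            self.held[rid] = self.clock() + self.ttl
            return rid

    def commit(self, rid: str) -> bool:
        # Make the held unit permanent. Assumption: an expired hold cannot be committed.
        with self._lock:
            self._expire()
            if self.held.pop(rid, None) is None:
                return False
            self.committed += 1
            return True

    def release(self, rid: str) -> None:
        # Roll back: hand the held unit back. Safe to call twice.
        with self._lock:
            self.held.pop(rid, None)


now = [0.0]
ledger = QuotaLedger(limit=1, clock=lambda: now[0])

rid = ledger.reserve()
ledger.release(rid)             # AI call failed: the unit goes back
rid = ledger.reserve()
ok = ledger.commit(rid)         # AI call succeeded: the unit is spent
print(ok, ledger.reserve())     # True None

orphaned = QuotaLedger(limit=1, clock=lambda: now[0])
orphaned.reserve()              # the worker crashes here
now[0] = 61.0                   # ...and a minute passes
print(orphaned.reserve() is not None)  # True
```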

The 60-second number is deliberate

Long enough to outlast almost every real AI call (LLM streams, image rendering, agent loops). Short enough that quota leaked by a crashed worker comes back within one minute. If your AI calls genuinely run longer than 60 seconds, design your reservation as a check-out / check-in pair with periodic heartbeats.
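One way to read the check-out / check-in advice is sketched below. The reservation becomes a hypothetical Lease that the worker renews while it runs. The 20-second heartbeat interval is an assumption: at a third of the TTL, it leaves room for one missed beat. The leak bound does not change, because a dead worker stops heartbeating and its lease lapses within one TTL of its last beat.

```python
from typing import Callable


class Lease:
    """Hypothetical check-out lease kept alive by heartbeats; names are illustrative only."""

    def __init__(self, clock: Callable[[], float], ttl: float = 60.0) -> None:
        self.clock = clock
        self.ttl = ttl
        self.expires_at = clock() + ttl
        self.checked_in = False

    def alive(self) -> bool:
        return not self.checked_in and self.clock() < self.expires_at

    def heartbeat(self) -> bool:
        # Extend by one full TTL from now, but never revive a lease that already lapsed.
        if not self.alive():
            return False
        self.expires_at = self.clock() + self.ttl
        return True

    def check_in(self) -> None:
        # The job finished (commit or release would happen here); stop holding quota.
        self.checked_in = True


now = [0.0]
clock = lambda: now[0]

crashed = Lease(clock)           # worker dies right after check-out
long_job = Lease(clock)          # a 3-minute agent loop, beating every 20 s
for t in range(20, 181, 20):
    now[0] = float(t)
    long_job.heartbeat()

print(crashed.alive(), long_job.alive())  # False True
long_job.check_in()
print(long_job.alive())  # False
```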

It is the same pattern Postgres uses

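reserve / commit / release is just a transaction over a counter, with a TTL. A SELECT … FOR UPDATE held in a 60-second-or-die transaction would enforce the same limit, with one difference: a second request from the same user would wait on the row lock instead of failing fast. We built it as a primitive in AIPricingLab so you do not have to.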
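You don't need Postgres to try the idea. The sketch below uses Python's built-in sqlite3 as a stand-in. SQLite has no SELECT … FOR UPDATE, so BEGIN IMMEDIATE takes its place: it grabs the database write lock at the start of the transaction, which is coarser than a row lock. An expires_at column supplies the TTL. Table names, columns and function signatures are hypothetical, and `now` is passed in explicitly so the example is testable. Unlike holding a transaction open for the whole AI call, this design locks only for the bookkeeping; the hold row carries the TTL.

```python
import sqlite3
import uuid
from contextlib import contextmanager
from typing import Iterator, Optional

TTL = 60.0  # seconds a hold survives without commit or release


def open_db(user_id: str, limit: int) -> sqlite3.Connection:
    # isolation_level=None: sqlite3 issues no implicit BEGIN, so the code controls transactions.
    db = sqlite3.connect(":memory:", isolation_level=None)
    db.execute("CREATE TABLE quota (user_id TEXT PRIMARY KEY, lim INTEGER, used INTEGER)")
    db.execute("CREATE TABLE holds (id TEXT PRIMARY KEY, user_id TEXT, expires_at REAL)")
    db.execute("INSERT INTO quota VALUES (?, ?, 0)", (user_id, limit))
    return db


@contextmanager
def immediate(db: sqlite3.Connection) -> Iterator[None]:
    # BEGIN IMMEDIATE takes the write lock up front: a coarser stand-in for SELECT ... FOR UPDATE.
    db.execute("BEGIN IMMEDIATE")
    try:
        yield
    except BaseException:
        db.execute("ROLLBACK")
        raise
    db.execute("COMMIT")


def reserve(db: sqlite3.Connection, user_id: str, now: float) -> Optional[str]:
    with immediate(db):
        db.execute("DELETE FROM holds WHERE user_id = ? AND expires_at <= ?", (user_id, now))
        lim, used = db.execute("SELECT lim, used FROM quota WHERE user_id = ?", (user_id,)).fetchone()
        (held,) = db.execute("SELECT COUNT(*) FROM holds WHERE user_id = ?", (user_id,)).fetchone()
        if used + held >= lim:
            return None
        hold_id = uuid.uuid4().hex
        db.execute("INSERT INTO holds VALUES (?, ?, ?)", (hold_id, user_id, now + TTL))
        return hold_id


def commit(db: sqlite3.Connection, hold_id: str, now: float) -> bool:
    with immediate(db):
        row = db.execute(
            "SELECT user_id FROM holds WHERE id = ? AND expires_at > ?", (hold_id, now)
        ).fetchone()
        if row is None:
            return False  # assumption: an expired hold cannot be committed
        db.execute("DELETE FROM holds WHERE id = ?", (hold_id,))
        db.execute("UPDATE quota SET used = used + 1 WHERE user_id = ?", row)
        return True


def release(db: sqlite3.Connection, hold_id: str) -> None:
    db.execute("DELETE FROM holds WHERE id = ?", (hold_id,))  # one statement, atomic on its own


db = open_db("alice", limit=1)
hold = reserve(db, "alice", now=0.0)
print(reserve(db, "alice", now=1.0))    # None: the open hold already counts
print(commit(db, hold, now=5.0))        # True
print(reserve(db, "alice", now=120.0))  # None: the unit is spent for good

db2 = open_db("bob", limit=1)
reserve(db2, "bob", now=0.0)            # worker crashes, never commits or releases
fresh = reserve(db2, "bob", now=61.0)
print(fresh is not None)                # True: the abandoned hold expired
```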
