How do I rate-limit OpenAI calls per user?
Call vevee.reserve(userId, "openai.chat", 1) before the OpenAI call. If the result has allowed=false, return HTTP 429. If allowed=true, call OpenAI, then commit the reservation on success or release it on error. The reservation is atomic, plan-aware, and refundable on failure.
Last updated: 2026-05-10
Why standard rate limiters are wrong for OpenAI
Tools like express-rate-limit or @upstash/ratelimit are typically set up to cap requests per IP over short windows. They have no notion of plans, so they cannot express a rule like "this user is on the free plan and has used 100 of 100 monthly renders." For AI apps, what you actually want is a plan-aware, long-window quota tied to a user ID, with atomic enforcement under concurrency and refunds when the AI call fails.
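To make "plan-aware, long-window quota with refunds" concrete, here is a minimal in-memory sketch. It is not the AIPricingLab implementation: the QuotaLedger class, the Plan type, and the injected clock are all hypothetical names invented for illustration. It only behaves atomically because each method runs synchronously in a single process; a real service would need an atomic operation in shared storage.

```typescript
// Hypothetical in-memory model of a plan-aware quota (illustration only).
type Plan = { name: string; monthlyLimit: number };
type Reservation = { userId: string; amount: number; expiresAt: number };
type ReserveResult = { allowed: boolean; reservationId?: string };

class QuotaLedger {
  private used = new Map<string, number>();      // committed usage per user
  private held = new Map<string, Reservation>(); // open reservations by id
  private nextId = 1;
  private planByUser: Map<string, Plan>;
  private ttlMs: number;
  private now: () => number;

  constructor(planByUser: Map<string, Plan>, ttlMs = 60_000, now: () => number = Date.now) {
    this.planByUser = planByUser;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Sum the user's live holds, dropping any that have passed their TTL.
  private heldFor(userId: string): number {
    let total = 0;
    for (const [id, r] of this.held) {
      if (r.expiresAt <= this.now()) {
        this.held.delete(id);
        continue;
      }
      if (r.userId === userId) total += r.amount;
    }
    return total;
  }

  // Hold quota only if committed usage plus live holds stay within the plan limit.
  reserve(userId: string, amount: number): ReserveResult {
    const plan = this.planByUser.get(userId);
    if (!plan) return { allowed: false };
    const inUse = (this.used.get(userId) ?? 0) + this.heldFor(userId);
    if (inUse + amount > plan.monthlyLimit) return { allowed: false };
    const id = `r${this.nextId++}`;
    this.held.set(id, { userId, amount, expiresAt: this.now() + this.ttlMs });
    return { allowed: true, reservationId: id };
  }

  // Turn a live hold into committed usage; an expired or unknown hold is rejected.
  commit(id: string): boolean {
    const r = this.held.get(id);
    this.held.delete(id);
    if (!r || r.expiresAt <= this.now()) return false;
    this.used.set(r.userId, (this.used.get(r.userId) ?? 0) + r.amount);
    return true;
  }

  // Refund a hold, for example after the AI call fails.
  release(id: string): boolean {
    return this.held.delete(id);
  }
}
```

Because open holds count against the limit, two concurrent requests cannot both claim the last unit, and a released or expired hold gives that unit back to the user.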
AIPricingLab pattern
Reserve before the call, commit on success, release on error. If your code crashes before it commits or releases, the reservation auto-releases after 60 seconds.
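// Hold one unit of the user's "openai.chat" quota before calling OpenAI.
const r = await vevee.reserve(userId, "openai.chat", 1);
if (!r.allowed) {
  // Quota exhausted: reject without calling OpenAI.
  return new Response(JSON.stringify({ error: "rate_limited" }), { status: 429 });
}
try {
  const res = await openai.chat.completions.create(/*...*/);
  // The call succeeded, so confirm the reservation.
  await vevee.commit(r.reservationId!);
  return Response.json(res);
} catch (e) {
  // The call failed, so return the reserved quota to the user.
  await vevee.release(r.reservationId!);
  throw e;
}
Stack with edge rate limits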
For sub-second per-IP burst protection, keep a Cloudflare or Vercel rate limit at the edge. Use AIPricingLab for plan-aware long-window quotas. They coexist cleanly.
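The ordering of the two layers can be sketched as follows. This is an illustration under assumptions, not a Cloudflare, Vercel, or AIPricingLab API: BurstLimiter, FixedWindowLimiter, QuotaClient, and guardedCall are hypothetical names, and the fixed-window counter merely stands in for whatever limit runs at your edge. The point is that the cheap per-IP check runs first, so burst traffic never reaches the quota service.

```typescript
// Hypothetical interfaces; the real edge limiter and quota client will differ.
interface BurstLimiter {
  allow(ip: string): boolean;
}
interface QuotaClient {
  reserve(userId: string, feature: string, amount: number): Promise<{ allowed: boolean; reservationId?: string }>;
  commit(reservationId: string): Promise<void>;
  release(reservationId: string): Promise<void>;
}

// Simple fixed-window counter standing in for an edge rate limit.
class FixedWindowLimiter implements BurstLimiter {
  private counts = new Map<string, { windowStart: number; count: number }>();
  private limit: number;
  private windowMs: number;
  private now: () => number;

  constructor(limit: number, windowMs: number, now: () => number = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  allow(ip: string): boolean {
    const t = this.now();
    const entry = this.counts.get(ip);
    // Start a fresh window if none exists or the old one has elapsed.
    if (!entry || t - entry.windowStart >= this.windowMs) {
      this.counts.set(ip, { windowStart: t, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}

type Guarded<T> = { status: 200; body: T } | { status: 429; reason: "burst" | "quota" };

// Burst check first, then reserve / commit / release around the actual work.
async function guardedCall<T>(
  ip: string,
  userId: string,
  burst: BurstLimiter,
  quota: QuotaClient,
  work: () => Promise<T>,
): Promise<Guarded<T>> {
  if (!burst.allow(ip)) return { status: 429, reason: "burst" };
  const r = await quota.reserve(userId, "openai.chat", 1);
  if (!r.allowed || !r.reservationId) return { status: 429, reason: "quota" };
  try {
    const body = await work();
    await quota.commit(r.reservationId);
    return { status: 200, body };
  } catch (e) {
    await quota.release(r.reservationId);
    throw e;
  }
}
```

Returning a distinct reason for each rejection lets the caller tell "slow down" apart from "upgrade your plan", even though both map to HTTP 429.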
Related questions
What is the reserve / commit / release pattern?
reserve atomically holds quota with a 60-second TTL; commit confirms the reservation after the AI call succeeds; release rolls it back on fa…
How do I track LLM usage per user?
Call vevee.track(userId, "llm.tokens", tokenCount) after each LLM call. AIPricingLab counts it against the user's plan limits in real time a…
How do I add quotas to my AI app?
Define limit groups in the AIPricingLab dashboard with the unit and quota you want, attach them to a plan, assign the plan to each user. Gat…