Image generation quotas: per-user limits for DALL·E, Flux, Stable Diffusion
Enforce per-user quotas on image generation across DALL·E, Flux, Stable Diffusion, Midjourney API, and Replicate. An atomic reservation pattern stops parallel renders from overshooting the limit. Free tiers, premium tiers, and hard caps, all drop-in.
Last updated: 2026-05-10
The problem
Image generation is expensive. A single Flux Pro call costs cents; one user spamming a script can burn dollars in seconds. You need hard per-user quotas, and you need them to actually hold under concurrency.
You also probably want tiers - "free users get 20 images / month, premium gets 500, premium-plus gets 2,000" - with optional model-specific sub-quotas (e.g. premium gets unlimited SD but only 100 Flux Pro).
Building this means a per-user counter, a per-model counter, atomic decrement, period reset on the right anchor (calendar vs subscription_start), and an admin UI to bump a customer's limit when they email support.
The solution
Define a plan with limit groups: total_images (count), premium_images (count, matched only when model="flux-pro"), and a tokens or cents group if you also want dollar-based caps. A single render event counts against every group whose filter it matches.
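To make "one render event hits all matching groups" concrete, here is a minimal sketch of the matching step. The `LimitGroup` shape, its `filter` field, the `spend` group name, and the cost value are hypothetical illustrations, not AIPricingLab's actual plan schema.

```typescript
// Hypothetical plan shape for illustration; not the SDK's real schema.
type Unit = "count" | "cents";

interface LimitGroup {
  name: string;
  unit: Unit;
  limit: number;
  // Only events whose metadata matches every listed key are counted.
  filter?: Record<string, string[]>;
}

interface RenderEvent {
  model: string;
  costCents: number; // placeholder cost supplied by the caller
}

const premiumPlan: LimitGroup[] = [
  { name: "total_images", unit: "count", limit: 500 },
  { name: "premium_images", unit: "count", limit: 100, filter: { model: ["flux-pro"] } },
  { name: "spend", unit: "cents", limit: 2_000 },
];

function matches(group: LimitGroup, meta: Record<string, string>): boolean {
  if (!group.filter) return true; // no filter: the group catches every event
  return Object.entries(group.filter).every(([key, allowed]) => {
    const value = meta[key];
    return value !== undefined && allowed.includes(value);
  });
}

// How much an event adds to a group: 1 per render for counts, the cost for cents.
function amountFor(group: LimitGroup, ev: RenderEvent): number {
  return group.unit === "count" ? 1 : ev.costCents;
}

// Every matching group and the amount this event would add to it.
function groupsHit(plan: LimitGroup[], ev: RenderEvent): Array<[string, number]> {
  return plan
    .filter((g) => matches(g, { model: ev.model }))
    .map((g): [string, number] => [g.name, amountFor(g, ev)]);
}
```

A `flux-pro` render lands in all three groups, while an `sd-3.5` render skips `premium_images` and only counts toward `total_images` and `spend`.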
Use reserve / commit / release to gate the call: reserve before the AI provider is called, commit on success, release on error. Reservations auto-release after 60 seconds so a crashed worker can't leak quota.
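The reserve / commit / release lifecycle can be sketched as an in-memory ledger. This is an illustrative single-process model, not how AIPricingLab stores reservations. It relies on JavaScript running the synchronous check-and-hold without interleaving, which is what makes it atomic here. Across several servers you would need an atomic shared store instead. The class name, the `res_` id format, and the choice that committing an expired hold fails are all assumptions.

```typescript
// In-memory sketch of reserve / commit / release with a TTL (single process only).
interface Reservation {
  userId: string;
  amount: number;
  expiresAt: number; // epoch ms after which the hold is auto-released
}

class QuotaLedger {
  private used = new Map<string, number>(); // committed usage per user
  private held = new Map<string, Reservation>(); // open holds by reservation id
  private nextId = 0;

  constructor(private readonly limit: number, private readonly ttlMs = 60_000) {}

  // Drop expired holds so a crashed worker cannot leak quota.
  private sweep(now: number): void {
    this.held.forEach((r, id) => {
      if (r.expiresAt <= now) this.held.delete(id);
    });
  }

  private heldFor(userId: string): number {
    let sum = 0;
    this.held.forEach((r) => {
      if (r.userId === userId) sum += r.amount;
    });
    return sum;
  }

  // Returns a reservation id, or null if committed + in-flight usage would exceed the limit.
  reserve(userId: string, amount: number, now = Date.now()): string | null {
    this.sweep(now);
    const committed = this.used.get(userId) ?? 0;
    if (committed + this.heldFor(userId) + amount > this.limit) return null;
    const id = `res_${this.nextId++}`;
    this.held.set(id, { userId, amount, expiresAt: now + this.ttlMs });
    return id;
  }

  // Turns a live hold into committed usage; false if it was released or expired.
  commit(id: string, now = Date.now()): boolean {
    this.sweep(now);
    const r = this.held.get(id);
    if (!r) return false;
    this.held.delete(id);
    this.used.set(r.userId, (this.used.get(r.userId) ?? 0) + r.amount);
    return true;
  }

  // Gives the held units back immediately.
  release(id: string): void {
    this.held.delete(id);
  }
}
```

Counting in-flight holds against the limit is the key design choice: two parallel renders by a user with one image left cannot both pass `reserve()`.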
Use the dashboard to bump per-user limits, switch a user between plans, or block abusive accounts, all without writing a counter table.
Example
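Reserve before calling Flux. If the user is at quota, reserve() returns allowed: false along with the reasons, so you can respond without ever calling the provider. On success, commit the reservation; on failure, release it so the unit goes back to the user's quota.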
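```typescript
import { createClient } from "@vevee/sdk";

const vevee = createClient({ apiKey: process.env.VEVEE_KEY! });

// Your own wrapper around the image provider (Flux, SD, ...); not part of the SDK.
declare function callImageProvider(prompt: string, model: string): Promise<unknown>;

export async function generateImage(
  userId: string,
  prompt: string,
  model: "flux-pro" | "sd-3.5",
) {
  // Reserve one "image.render" unit before spending money on the provider.
  const r = await vevee.reserve(userId, "image.render", 1, { model });
  if (!r.allowed) {
    // At least one matching limit group is at quota.
    return { error: "limit_reached", reasons: r.reasons };
  }

  try {
    const image = await callImageProvider(prompt, model);
    await vevee.commit(r.reservationId!); // render succeeded: count it
    return { image };
  } catch (err) {
    await vevee.release(r.reservationId!); // render failed: give the unit back
    throw err;
  }
}
```

Hard caps vs soft caps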
For free tiers you almost always want hard caps: block the call, return HTTP 429, and prompt the user to upgrade. For paid tiers, soft caps with overage are sometimes preferable: let the call through and count it for invoicing. AIPricingLab supports both: hard caps via reserve(), soft caps via track() without a canUse() check.
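The difference between the two policies comes down to one branch. This is a minimal sketch of that decision logic; the `decide` function and the `Decision` shape are hypothetical and stand in for whatever your server does around reserve() or track().

```typescript
type CapMode = "hard" | "soft";

interface Decision {
  allowed: boolean;
  overage: number; // units beyond the limit that should be invoiced
}

function decide(mode: CapMode, used: number, limit: number, amount: number): Decision {
  const after = used + amount;
  if (after <= limit) return { allowed: true, overage: 0 };
  // Hard cap: refuse; the caller maps this to HTTP 429 plus an upgrade prompt.
  if (mode === "hard") return { allowed: false, overage: 0 };
  // Soft cap: let it through; only the part past the limit (or past prior usage,
  // if the user is already over) is new overage.
  return { allowed: true, overage: after - Math.max(used, limit) };
}
```

For example, a soft-capped user at 19 of 20 who requests 3 images is allowed, and 2 of those images are billed as overage.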
Model-specific sub-quotas
Limit groups can be filtered by metadata. Define total_images with no filter (catches every render) and premium_models with { model: ["flux-pro", "imagen-3"] } (only catches premium model renders). One event hits both groups; either being over quota blocks the call.
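Here is a small sketch of the "either being over quota blocks the call" rule, using the two groups named above. The `GroupState` shape and the `check` helper are hypothetical. The `reasons` array only loosely mirrors `r.reasons` from the example; its real contents may differ.

```typescript
interface GroupState {
  name: string;
  limit: number;
  used: number;
  models?: string[]; // undefined = the group matches every model
}

interface CheckResult {
  allowed: boolean;
  reasons: string[]; // names of matching groups that would exceed their limit
}

function check(groups: GroupState[], model: string, amount = 1): CheckResult {
  const reasons = groups
    .filter((g) => g.models === undefined || g.models.includes(model)) // groups this event hits
    .filter((g) => g.used + amount > g.limit) // ...that have no room left
    .map((g) => g.name);
  return { allowed: reasons.length === 0, reasons };
}

const groups: GroupState[] = [
  { name: "total_images", limit: 500, used: 120 },
  { name: "premium_models", limit: 100, used: 100, models: ["flux-pro", "imagen-3"] },
];
```

With these counters, an `sd-3.5` render is allowed because it only hits `total_images`. A `flux-pro` render is blocked with `reasons: ["premium_models"]`.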
Period anchor: calendar vs subscription-start
Calendar periods reset on the 1st of the month UTC for everyone - simple, predictable. Subscription-start periods reset on the user's subscription anniversary - friendlier for paid customers. You can mix anchors per limit group.
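The two anchors differ only in how the next reset time is computed. This is a minimal sketch, assuming resets happen at 00:00 UTC and that an anniversary on a day the month lacks (for example, the 31st) is clamped to the month's last day. Both are illustrative assumptions, not documented AIPricingLab behavior.

```typescript
type Anchor = "calendar" | "subscription_start";

// Days in a UTC month (month is 0-based); day 0 of the next month is the last day.
function daysInMonthUTC(year: number, month: number): number {
  return new Date(Date.UTC(year, month + 1, 0)).getUTCDate();
}

function nextReset(anchor: Anchor, now: Date, subscriptionStart: Date): Date {
  if (anchor === "calendar") {
    // 1st of next month, 00:00 UTC (Date.UTC rolls month 12 into January).
    return new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth() + 1, 1));
  }
  const day = subscriptionStart.getUTCDate();
  // Anniversary in a given month, clamped to that month's length.
  const at = (y: number, m: number): Date => {
    const norm = new Date(Date.UTC(y, m, 1)); // normalizes m = 12 to next year
    const yy = norm.getUTCFullYear();
    const mm = norm.getUTCMonth();
    return new Date(Date.UTC(yy, mm, Math.min(day, daysInMonthUTC(yy, mm))));
  };
  const thisMonth = at(now.getUTCFullYear(), now.getUTCMonth());
  return thisMonth.getTime() > now.getTime()
    ? thisMonth
    : at(now.getUTCFullYear(), now.getUTCMonth() + 1);
}
```

A user who subscribed on January 31 would reset on February 28 in 2026 under this clamping rule. A calendar-anchored counter checked on 2026-05-10 resets on 2026-06-01.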
Stopping plan-cycling abuse
Without a guard, a user could start a free trial, upgrade, cancel, and start a free trial again to keep getting fresh quotas. Set onPlanChange: "block" on the relevant limit groups: their counters are pre-filled to quota until the next period, which closes the loophole.
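One way to picture what onPlanChange: "block" does to a counter is the sketch below. Only the `"block"` value comes from this page. The `Counter` shape and the `applyPlanChange` helper are hypothetical, and the real service may apply the rule differently.

```typescript
interface Counter {
  group: string;
  used: number;
  limit: number;
  onPlanChange?: "block"; // only value taken from the docs; absent = counter kept as-is
}

// Hypothetical: apply a plan switch to one user's counters.
// The periodic reset (not shown) clears the pre-fill at the next period.
function applyPlanChange(counters: Counter[], newLimits: Record<string, number>): Counter[] {
  return counters.map((c) => {
    const limit = newLimits[c.group] ?? c.limit;
    // "block": pre-fill to the quota so a re-subscription grants nothing fresh this period.
    const used = c.onPlanChange === "block" ? Math.max(c.used, limit) : c.used;
    return { ...c, limit, used };
  });
}
```

A user who burned 3 free renders and then cycles plans comes back with `free_images` at 20 of 20, while unguarded groups such as `total_images` simply pick up the new limit.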
Frequently asked questions
Does this work with Midjourney?
Yes. Midjourney's API is just one more provider - define an event like image.render with metadata { provider: "midjourney" } and AIPricingLab meters it like any other.
Can I cap users by dollar cost instead of image count?
Yes. Use unit "cents" on the limit group and track the cost of each render in cents. You can stack count and cents limits on the same plan.
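Stacking a count limit and a cents limit means a render must fit under both. The sketch below uses a hypothetical `canRender` helper and a price table whose numbers are placeholders, not real provider pricing. Keeping money in integer cents avoids floating-point drift in the running total.

```typescript
// Placeholder prices in cents per image; NOT real provider pricing.
const PRICE_CENTS: { [model: string]: number | undefined } = { "flux-pro": 5, "sd-3.5": 1 };

interface Usage { images: number; cents: number; }
interface Limits { images: number; cents: number; }

function canRender(u: Usage, l: Limits, model: string): { allowed: boolean; reasons: string[] } {
  const cost = PRICE_CENTS[model];
  if (cost === undefined) throw new Error(`no price configured for model ${model}`);
  const reasons: string[] = [];
  if (u.images + 1 > l.images) reasons.push("images"); // count limit
  if (u.cents + cost > l.cents) reasons.push("cents"); // dollar-cost limit
  return { allowed: reasons.length === 0, reasons };
}
```

With 98 of 100 cents spent, a 5-cent `flux-pro` render is blocked on `cents`, while a 1-cent `sd-3.5` render still fits.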
What happens if my image worker crashes mid-call?
The reservation auto-releases after 60 seconds. Quota is restored to the user. No orphan locks.
How do I let users see their remaining image quota?
Use a pk_live_ public key in the browser and call vevee.usage(userId). It returns the user's counters with remaining quota and reset times. Safe to expose in client code.
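Once you have the counters in the browser, turning them into a remaining-quota message is simple arithmetic. This sketch assumes a counter carries `used`, `limit`, and a reset timestamp. Those field names are assumptions, so check the SDK's actual `usage()` response type before relying on them. The 80% threshold matches the upgrade nudge mentioned under freemium below.

```typescript
// Assumed shape of one counter from a usage response; verify against the real SDK types.
interface CounterView {
  group: string;
  used: number;
  limit: number;
  resetsAt: string;
}

// Banner text for the UI, or null when no nudge is needed.
function quotaBanner(c: CounterView): string | null {
  const remaining = Math.max(0, c.limit - c.used);
  // Integer math (used * 100 first) keeps the percentage exact.
  const pct = c.limit === 0 ? 100 : Math.floor((c.used * 100) / c.limit);
  if (remaining === 0) return `You've used all ${c.limit} images. Resets ${c.resetsAt}.`;
  if (pct >= 80) return `You've used ${pct}% of your ${c.limit} images (${remaining} left).`;
  return null;
}
```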
Other use cases
LLM usage metering: track tokens per end-user, across providers
Meter LLM token usage per end-user across OpenAI, Anthropic, Gemini, Mistral, and any other provider. Composite events for prompt + completion tokens, real-time per-user limits, atomic enforcement. The drop-in pattern for AI apps.
AI agent billing: meter multi-step agents and tool calls
Metering AI agents is harder than metering single LLM calls. One "agent run" can fan out into 20 tool calls and 50 LLM calls. AIPricingLab handles agent-level and step-level metering with composite events and atomic reservations.
Freemium AI SaaS: ship a free → paid funnel without a backend
Build a freemium AI product where the free plan has hard quotas, the paid plan unlocks more, and "you have used 80% of your free renders" nudges drive upgrades. Drop-in implementation, ten minutes from zero to live.
Token-based pricing: charge users for actual AI consumption
Charge AI app users by tokens, requests, or compute seconds. Pre-paid credits, post-paid invoicing, hybrid models - implementation patterns and trade-offs from someone who has shipped all three.