Use case

Image generation quotas: per-user limits for DALL·E, Flux, Stable Diffusion

Enforce per-user quotas on image generation across DALL·E, Flux, Stable Diffusion, Midjourney API, and Replicate. An atomic reservation pattern stops parallel renders from overshooting the limit. Free tiers, premium tiers, and hard caps are supported out of the box.

Last updated: 2026-05-10

The problem

Image generation is expensive. A single Flux Pro call costs cents; one user spamming a script can burn dollars in seconds. You need hard per-user quotas, and you need them to actually hold under concurrency.

You also probably want tiers - "free users get 20 images / month, premium gets 500, premium-plus gets 2,000" - with optional model-specific sub-quotas (e.g. premium gets unlimited SD but only 100 Flux Pro).

Building this yourself means a per-user counter, a per-model counter, an atomic decrement, a period reset on the right anchor (calendar vs subscription_start), and an admin UI to bump a customer's limit when they email support.

The solution

Define a plan with limit groups: total_images (count), premium_images (count, matched only when model="flux-pro"), and a cents group if you also want dollar-based caps. One render event counts against every matching group.

Use reserve / commit / release to gate the call: reserve before the AI provider is called, commit on success, release on error. Reservations auto-release after 60 seconds so a crashed worker can't leak quota.
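
To make the reserve / commit / release lifecycle concrete, here is a minimal in-memory sketch. It is not Vevee's implementation: the QuotaLedger class, its method signatures, and the explicit `now` argument are all hypothetical, chosen so the 60-second expiry can be followed without real timers. The check and the hold happen in one synchronous step, which is what makes the reservation atomic in single-threaded JavaScript; a real service would need the equivalent atomic operation in its datastore.

```typescript
// Illustrative in-memory ledger (hypothetical; not Vevee's implementation).
type Reservation = { userId: string; amount: number; expiresAt: number };

class QuotaLedger {
  private readonly limit: number;
  private readonly ttlMs: number;
  private committed = new Map<string, number>();
  private pending = new Map<string, Reservation>();
  private nextId = 1;

  // ttlMs defaults to 60 seconds, mirroring the auto-release described above.
  constructor(limit: number, ttlMs: number = 60 * 1000) {
    this.limit = limit;
    this.ttlMs = ttlMs;
  }

  // Committed usage plus unexpired reservations for one user.
  private held(userId: string, now: number): number {
    let total = this.committed.get(userId) ?? 0;
    this.pending.forEach((r, id) => {
      if (r.expiresAt <= now) {
        this.pending.delete(id); // expired: quota returns to the user
      } else if (r.userId === userId) {
        total += r.amount;
      }
    });
    return total;
  }

  // Returns a reservation id, or null if the request would exceed the limit.
  reserve(userId: string, amount: number, now: number): string | null {
    if (this.held(userId, now) + amount > this.limit) return null;
    const id = `res_${this.nextId++}`;
    this.pending.set(id, { userId, amount, expiresAt: now + this.ttlMs });
    return id;
  }

  // Converts a live reservation into committed usage; false if unknown or expired.
  commit(id: string, now: number): boolean {
    const r = this.pending.get(id);
    this.pending.delete(id);
    if (!r || r.expiresAt <= now) return false;
    this.committed.set(r.userId, (this.committed.get(r.userId) ?? 0) + r.amount);
    return true;
  }

  // Gives the reserved quota back immediately.
  release(id: string): void {
    this.pending.delete(id);
  }
}

// Two parallel renders take the last two units; a third is refused up front.
const ledger = new QuotaLedger(2);
const first = ledger.reserve("user_1", 1, 0);
const second = ledger.reserve("user_1", 1, 0);
const third = ledger.reserve("user_1", 1, 0); // null while the other two are in flight
```

Because pending reservations count against the limit, the third parallel request is rejected before any provider call is made, rather than after all three have already been billed.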

Use the dashboard to bump per-user limits, switch a user between plans, or block abusive accounts. All without writing a counter table.

Example

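Reserve before calling Flux. If the user is at quota, reserve() returns allowed: false along with the reasons, and no provider call is made. On success, commit; on failure, release so the quota is returned immediately instead of waiting for the 60-second auto-release.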

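import { createClient } from "@vevee/sdk";

const vevee = createClient({ apiKey: process.env.VEVEE_KEY! });

// callImageProvider is your own wrapper around the provider's SDK or HTTP API.
export async function generateImage(
  userId: string,
  prompt: string,
  model: "flux-pro" | "sd-3.5",
) {
  // Reserve one render before any money is spent at the provider.
  const r = await vevee.reserve(userId, "image.render", 1, { model });
  if (!r.allowed) {
    // At quota: no provider call is made.
    return { error: "limit_reached", reasons: r.reasons };
  }

  try {
    const image = await callImageProvider(prompt, model);
    // The render succeeded, so make the reserved unit count.
    await vevee.commit(r.reservationId!);
    return { image };
  } catch (err) {
    // Hand the quota back. If release itself fails, the reservation still
    // auto-releases, so keep the original error instead of masking it.
    await vevee.release(r.reservationId!).catch(() => {});
    throw err;
  }
}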

Hard caps vs soft caps

For free tiers, you almost always want hard caps - block the call, return HTTP 429, prompt to upgrade. For paid tiers, soft caps with overage are sometimes preferable: let the call through and count it for invoicing. Vevee supports both: hard caps via reserve(), soft caps via track() without a preceding canUse() check.
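
The difference between the two modes can be sketched as a small pure function. This is an illustration of the policy only, not SDK code: the CapMode and Decision types and the decide() function are hypothetical names.

```typescript
// Hypothetical policy sketch: how a hard cap and a soft cap treat the same request.
type CapMode = "hard" | "soft";
type Decision = { allowed: boolean; status: 200 | 429; overage: number };

function decide(mode: CapMode, used: number, limit: number, amount: number = 1): Decision {
  const after = used + amount;
  if (after <= limit) {
    return { allowed: true, status: 200, overage: 0 }; // within quota in either mode
  }
  if (mode === "hard") {
    return { allowed: false, status: 429, overage: 0 }; // block and prompt to upgrade
  }
  // Soft cap: let the call through and report the units of this call that
  // fall beyond the limit, so they can be invoiced.
  return { allowed: true, status: 200, overage: Math.min(amount, after - limit) };
}

const freeUserAtLimit = decide("hard", 20, 20); // blocked with 429
const paidUserAtLimit = decide("soft", 500, 500); // allowed, 1 unit of overage
```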

Model-specific sub-quotas

Limit groups can be filtered by metadata. Define total_images with no filter (catches every render) and premium_models with { model: ["flux-pro", "imagen-3"] } (only catches premium model renders). One event hits both groups; either being over quota blocks the call.
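
The matching rule can be sketched as follows. The LimitGroup shape, the matches() and blockingGroups() helpers, and the limits of 500 and 100 are assumptions for illustration (the numbers echo the premium-tier example earlier on this page); the real plan schema may differ.

```typescript
type Metadata = Record<string, string>;

// Hypothetical group shape: a filter maps a metadata key to its allowed values.
// A group without a filter matches every event.
type LimitGroup = {
  name: string;
  limit: number;
  filter?: Record<string, string[]>;
};

function matches(group: LimitGroup, meta: Metadata): boolean {
  if (!group.filter) return true;
  return Object.entries(group.filter).every(([key, allowed]) => {
    const value = meta[key];
    return value !== undefined && allowed.includes(value);
  });
}

// Names of the matching groups that this event would push over quota.
// An empty result means the event is allowed.
function blockingGroups(
  groups: LimitGroup[],
  used: Record<string, number>,
  meta: Metadata,
  amount: number = 1,
): string[] {
  return groups
    .filter((g) => matches(g, meta))
    .filter((g) => (used[g.name] ?? 0) + amount > g.limit)
    .map((g) => g.name);
}

const groups: LimitGroup[] = [
  { name: "total_images", limit: 500 },
  { name: "premium_models", limit: 100, filter: { model: ["flux-pro", "imagen-3"] } },
];

// A user with plenty of total quota left but no premium renders remaining.
const used = { total_images: 120, premium_models: 100 };
const fluxBlockedBy = blockingGroups(groups, used, { model: "flux-pro" }); // premium_models blocks
const sdBlockedBy = blockingGroups(groups, used, { model: "sd-3.5" }); // nothing blocks
```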

Period anchor: calendar vs subscription-start

Calendar periods reset on the 1st of the month UTC for everyone - simple, predictable. Subscription-start periods reset on the user's subscription anniversary - friendlier for paid customers. You can mix anchors per limit group.
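
As a rough sketch of how the two anchors differ, the function below computes the next reset time for a monthly period. The Anchor type and nextReset() are hypothetical, and one behaviour is an assumption on my part: a subscription that started on the 29th to 31st is clamped to the last day of shorter months.

```typescript
// Hypothetical anchor type; monthly periods are assumed throughout.
type Anchor =
  | { kind: "calendar" }
  | { kind: "subscription_start"; startedAt: Date };

function nextReset(anchor: Anchor, now: Date): Date {
  const year = now.getUTCFullYear();
  const month = now.getUTCMonth();

  if (anchor.kind === "calendar") {
    // First day of the following month at 00:00 UTC (month 12 rolls into January).
    return new Date(Date.UTC(year, month + 1, 1));
  }

  const start = anchor.startedAt;
  // The subscription anniversary within a given month, clamped to that
  // month's last day (assumption: a start on the 31st resets on Feb 28 or 29).
  const anniversaryIn = (y: number, m: number): Date => {
    const lastDay = new Date(Date.UTC(y, m + 1, 0)).getUTCDate();
    return new Date(
      Date.UTC(
        y,
        m,
        Math.min(start.getUTCDate(), lastDay),
        start.getUTCHours(),
        start.getUTCMinutes(),
        start.getUTCSeconds(),
      ),
    );
  };

  const thisMonth = anniversaryIn(year, month);
  return thisMonth.getTime() > now.getTime() ? thisMonth : anniversaryIn(year, month + 1);
}

const now = new Date("2026-05-20T12:00:00Z");
const calendarReset = nextReset({ kind: "calendar" }, now);
const subscriptionReset = nextReset(
  { kind: "subscription_start", startedAt: new Date("2026-03-15T09:30:00Z") },
  now,
);
```

With these inputs the calendar user resets on June 1 at midnight UTC, while the subscriber who joined on March 15 resets on June 15 at 09:30 UTC.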

Stopping plan-cycling abuse

Without a guard, a user could cycle free trial → upgrade → cancel → free trial to keep getting fresh quotas. Set onPlanChange: "block" on the relevant limit groups: on a plan change, their counters are pre-filled to the quota until the next period, which closes the loophole.
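
The effect of pre-filling can be sketched in a few lines. Only the "block" value comes from this page; the Counter type, the applyPlanChange() function, and the contrasting "fresh" policy are hypothetical names used to show the difference.

```typescript
// Hypothetical counter shape for one limit group.
type Counter = { used: number; limit: number; resetsAt: Date };

function applyPlanChange(
  current: Counter,
  newLimit: number,
  policy: "block" | "fresh", // "fresh" is an invented name for the naive behaviour
): Counter {
  if (policy === "block") {
    // Pre-fill the counter to the new quota: nothing is usable until resetsAt.
    return { used: newLimit, limit: newLimit, resetsAt: current.resetsAt };
  }
  // Naive behaviour: the user starts the new plan with an empty counter.
  return { used: 0, limit: newLimit, resetsAt: current.resetsAt };
}

// A user who exhausted a 20-image trial, upgraded, cancelled, and is back on free.
const beforeChange: Counter = { used: 500, limit: 500, resetsAt: new Date("2026-06-01T00:00:00Z") };
const blocked = applyPlanChange(beforeChange, 20, "block");
const remainingAfterBlock = blocked.limit - blocked.used; // no renders until the period resets
```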

Frequently asked questions

Does this work with Midjourney?

Yes. Midjourney's API is just one more provider - define an event like image.render with metadata { provider: "midjourney" } and Vevee meters it like any other.

Can I cap users by dollar cost instead of image count?

Yes. Use unit "cents" on the limit group and track the cost of each render in cents. You can stack count and cents limits on the same plan.

What happens if my image worker crashes mid-call?

The reservation auto-releases after 60 seconds. Quota is restored to the user. No orphan locks.

How do I let users see their remaining image quota?

Use a pk_live_ public key in the browser and call vevee.usage(userId). It returns the user's counters with remaining quota and reset times. The public key is safe to expose in client code.
