Vevee Blog · 8 min read

Helicone is in maintenance mode. An honest map of where to go

Somewhere in your codebase a base URL points at Helicone, and it has been quietly doing four jobs at once. The team got acquired - good for them - but now you have to replace each job separately, and the one everyone forgets is the one your users will exploit.

Last updated: 2026-06-09

What happened, without the drama

There is one config line in your codebase that swaps your OpenAI or Anthropic base URL for Helicone’s, and it has been quietly earning its keep since the day you shipped. In 2026 the Helicone team was acquired by Mintlify and the hosted product moved into maintenance mode. To be clear about the tone of this piece: that is a success story, not a cautionary tale. Helicone was open-source, genuinely good, and widely loved - the kind of tool people recommended to each other unprompted - and acquisition is what happens to teams that build something thousands of AI apps depend on. But maintenance mode means no new features, an uncertain horizon, and a migration on your roadmap whether you scheduled one or not. The useful way to spend that migration is not to hunt for "the Helicone alternative." It is to notice how many distinct jobs that single base-URL swap was doing for you, because the honest answer to "what replaces Helicone" is "it depends which Helicone you were using."

Helicone was never one job

Because Helicone sat as a proxy in front of every AI call, it could do unrelated jobs with zero marginal integration cost - everything flowed through it anyway. Unbundle what you were actually getting and there are four: request logging and tracing, so you could debug why a prompt misbehaved last Tuesday; cost visibility, so you could see what you were spending per model and per user; response caching, so repeated prompts didn’t bill you twice; and rate limiting, including per-user limits, so one user couldn’t drain your provider budget. One product, one URL swap, four jobs. The replacements are different tools for different jobs, and that is fine - arguably healthier, since each lane now has specialists that go deeper than a proxy ever could. The mistake to avoid is replacing the job you stare at most (the dashboard) and silently dropping the jobs that ran in the background (the limits). Walk the four lanes one at a time.
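One way to keep the background jobs from silently falling off is to write the inventory down as data and ask which relied-on jobs have no owner yet. A minimal sketch: the `HeliconeJob` union, the `JobStatus` shape, and `droppedJobs()` are hypothetical names for illustration, not anything Helicone or another tool exposes.

```typescript
// The four jobs the Helicone proxy was doing (illustrative names).
type HeliconeJob = "tracing" | "costVisibility" | "caching" | "rateLimits";

// Your own inventory: did you rely on the job, and what replaces it?
// `replacement` stays null until a specific tool or component owns the job.
interface JobStatus {
  reliedOn: boolean;
  replacement: string | null;
}

type Inventory = Record<HeliconeJob, JobStatus>;

// Every job you relied on that nothing has taken over yet.
function droppedJobs(inv: Inventory): HeliconeJob[] {
  return (Object.keys(inv) as HeliconeJob[]).filter(
    (job) => inv[job].reliedOn && inv[job].replacement === null,
  );
}

// Example: the dashboard got replaced, the limits did not.
const inventory: Inventory = {
  tracing: { reliedOn: true, replacement: "observability SDK" },
  costVisibility: { reliedOn: true, replacement: "observability SDK" },
  caching: { reliedOn: false, replacement: null },
  rateLimits: { reliedOn: true, replacement: null },
};

console.log(droppedJobs(inventory)); // → [ 'rateLimits' ]
```

The check only flags jobs you marked as relied on, so a job you never used (caching, in this example) is allowed to disappear.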

Tracing and debugging: the observability lane

If your Helicone habit was opening the request log to figure out why an agent looped or which prompt version produced the bad output, you need an LLM observability tool, and this lane has clear residents. Langfuse is the natural first stop: open-source like Helicone was, self-hostable if that mattered to you, SDK-based rather than proxy-based, with tracing that handles multi-step agent chains well. Lunary plays in the same lane with observability plus analytics. When choosing, check three things honestly. First, integration model: these are SDK tools, so unlike the proxy you instrument your code - more setup, but your provider call stays direct. Second, self-hosting: if part of Helicone’s appeal was running it yourself, confirm the self-hosted tier actually includes what you use, not just the core. Third, framework fit: if you are on LangChain, LlamaIndex, or the Vercel AI SDK, the quality of the official integration will dominate your day-to-day far more than any feature matrix. This is the lane most migration guides cover, so we will not belabor it - it is also the lane you are least likely to forget.
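The practical difference between a proxy and an SDK tool is where the recording happens. The sketch below is not Langfuse's or Lunary's API; it is a hypothetical, dependency-free `traced()` helper that shows the shape of SDK-style instrumentation: your code still makes the provider call itself, and a span (name, duration, outcome) is recorded beside it. Real SDKs add nesting, batching, and export, which this omits.

```typescript
// One recorded step of a chain: a prompt, a tool call, a retrieval.
interface Span {
  traceId: string;
  name: string;
  durationMs: number;
  ok: boolean;
  error?: string;
}

// Hypothetical in-memory sink; a real SDK would batch and export these.
const recordedSpans: Span[] = [];

// Run one step and record a span for it, whether it resolves or rejects.
// The step itself (e.g. your direct OpenAI or Anthropic call) is untouched.
async function traced<T>(
  traceId: string,
  name: string,
  step: () => Promise<T>,
): Promise<T> {
  const start = Date.now();
  try {
    const result = await step();
    recordedSpans.push({ traceId, name, durationMs: Date.now() - start, ok: true });
    return result;
  } catch (err) {
    recordedSpans.push({
      traceId,
      name,
      durationMs: Date.now() - start,
      ok: false,
      error: err instanceof Error ? err.message : String(err),
    });
    throw err; // tracing observes failures, it never swallows them
  }
}

// Usage: a two-step agent turn, with stand-ins for the real provider calls.
async function answer(traceId: string, question: string): Promise<string> {
  const plan = await traced(traceId, "plan", async () => `plan for: ${question}`);
  return traced(traceId, "respond", async () => `answer using ${plan}`);
}
```

Grouping spans by `traceId` is what lets you reconstruct "why the agent looped last Tuesday": each step of the turn shows up in order, with the failing one marked.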

Caching: the job that moved into the stack

Helicone’s response caching was a real money-saver in 2023, when providers billed every token of every identical request at full price. That world is gone. Anthropic and OpenAI both ship provider-side prompt caching now - repeated prefixes (system prompts, few-shot examples, long documents) are billed at a fraction of the normal input rate, handled inside the provider with no middleman. OpenAI applies it automatically to sufficiently long prompts; Anthropic has you mark the cacheable prefix in the request, which is a small code change rather than a new vendor. For exact-duplicate full responses, a thin application-level cache - a hash of the request body in Redis or even your existing database - covers the remainder in an afternoon. This is the rare migration lane where the honest advice is that you probably do not need a vendor anymore: the job did not move to a different tool, it moved into the stack you already run. Check your Helicone cache-hit rate before deciding; if it was low single digits, you can drop the job entirely and nobody will notice.
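For the exact-duplicate case, the thin application-level cache can be this small. A minimal sketch, assuming an in-memory `Map` in place of Redis or a database table; the key is a canonical serialization of the request body with object keys sorted, so two logically identical requests collide, which is the role the hash plays in a shared store. It also counts hits and misses, because the hit rate is the number that tells you whether to keep the job at all. `ResponseCache` and `canonicalKey` are hypothetical names.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with object keys sorted, so logically identical requests share a key.
function canonicalKey(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(canonicalKey).join(",")}]`;
  const entries = Object.keys(value)
    .sort()
    .map((k) => `${JSON.stringify(k)}:${canonicalKey(value[k])}`);
  return `{${entries.join(",")}}`;
}

// Exact-match response cache. A Map stands in for Redis or a DB table here.
class ResponseCache {
  private store = new Map<string, string>();
  hits = 0;
  misses = 0;

  // Return the cached response, or call the provider once and remember it.
  async getOrFetch(body: Json, fetchFromProvider: () => Promise<string>): Promise<string> {
    const key = canonicalKey(body);
    const cached = this.store.get(key);
    if (cached !== undefined) {
      this.hits++;
      return cached;
    }
    this.misses++;
    const response = await fetchFromProvider();
    this.store.set(key, response);
    return response;
  }

  // Fraction of lookups served from cache; low single digits means drop the job.
  hitRate(): number {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
}
```

This only makes sense for requests where returning the same answer twice is acceptable, and a production version would add a TTL or eviction; provider-side prefix caching covers the much more common case of shared system prompts with different user input.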

Cost-per-user and limits: the job teams under-replace

Here is the trap. Observability tools will show you cost per user - Langfuse and Lunary both have the dashboard where you discover that user X spent $80 this month. What they will not do is stop user X at $5. Visibility and enforcement look adjacent on a feature list and are entirely different machinery underneath: one is a query over logs, the other is an atomic check that has to run before the AI call and hold up under concurrent requests. Helicone blurred this line because the proxy could do both - it saw every request, so it could count and it could block. Its observability-lane successors inherit the counting, not the blocking. So audit your old setup honestly: if you had per-user rate limits configured, or if you stared at the per-user cost page before deciding whom to invoice or cut off, you were using Helicone as a metering and enforcement layer, and a tracing dashboard does not replace that. This is the lane Vevee lives in: per-end-user metering, plan limits defined in a dashboard, and an SDK that gates the call before it runs - enforcement as the product, not a side effect of logging. Whatever you pick for this lane, pick something, because this is the job whose absence your users will find before you do.
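The gap between counting and blocking is easiest to see side by side. A minimal sketch with hypothetical names, not Vevee's SDK: `spendByUser` is the visibility job, a query over logs that can only report what already happened; `QuotaGate` is the enforcement job, a reserve before the AI call and a commit or release after it. The reserve here is safe within one Node process because the check and the hold happen with no `await` between them; across several servers the same step has to be a single atomic operation in a shared store (for example a Redis Lua script or a conditional SQL `UPDATE`).

```typescript
// Visibility: sum what each user already spent, from logged calls.
interface LoggedCall {
  userId: string;
  costUsd: number;
}

function spendByUser(log: LoggedCall[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const call of log) {
    totals.set(call.userId, (totals.get(call.userId) ?? 0) + call.costUsd);
  }
  return totals;
}

// Enforcement: reserve budget BEFORE the call, settle it after.
class QuotaGate {
  private readonly limitUsd: number;
  private used = new Map<string, number>();     // committed spend
  private reserved = new Map<string, number>(); // in-flight holds

  constructor(limitUsd: number) {
    this.limitUsd = limitUsd;
  }

  // Check and hold in one synchronous step: no await between read and write,
  // so concurrent requests in this process cannot both pass the same check.
  reserve(userId: string, estimateUsd: number): boolean {
    const committed = this.used.get(userId) ?? 0;
    const held = this.reserved.get(userId) ?? 0;
    if (committed + held + estimateUsd > this.limitUsd) return false;
    this.reserved.set(userId, held + estimateUsd);
    return true;
  }

  // After the call: replace the hold with the actual cost.
  commit(userId: string, estimateUsd: number, actualUsd: number): void {
    this.release(userId, estimateUsd);
    this.used.set(userId, (this.used.get(userId) ?? 0) + actualUsd);
  }

  // The call failed or was cancelled: drop the hold without charging.
  release(userId: string, estimateUsd: number): void {
    const held = this.reserved.get(userId) ?? 0;
    this.reserved.set(userId, Math.max(0, held - estimateUsd));
  }
}
```

With a $5 limit and a $1 estimate per call, six simultaneous requests from one user get five reservations and one refusal; the log query, by contrast, would happily report $80 after the fact.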

Proxy versus SDK: the architectural fork

Helicone’s magic was the integration cost: change one base URL, get everything. It is worth being honest about what that swap cost you in exchange - a vendor in the hot path of every AI call. Every request paid the proxy’s latency, and every request shared the proxy’s availability: if it degraded, your AI features degraded with it, which is precisely the property that makes maintenance mode uncomfortable. The replacements in both lanes - Langfuse for tracing, Vevee for metering - are SDK-based, and that is not an accident. Your provider call stays direct: you call OpenAI or Anthropic yourself, and the tool is consulted beside the call (a gate before, a track after) rather than sitting inside it. If the metering layer has a bad day, your calls still complete; the failure mode is a missed log line or a fail-open gate, not a down product. The honest trade-off in the other direction: the URL swap was genuinely easier. An SDK means two extra calls around each AI request and an afternoon of integration instead of a one-line diff. You are trading a few hours of setup for getting a third party out of your request path - a trade most teams would have taken anyway, had they noticed the one the URL swap was quietly making for them.
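The "gate before, track after" shape, fail-open behavior included, fits in one wrapper. A minimal sketch: `MeteringClient` is a hypothetical interface standing in for whichever metering SDK you choose (it is not Vevee's actual API), and `callProvider` stands for your direct OpenAI or Anthropic call. If the metering service throws, the request proceeds; if the gate answers no, it does not.

```typescript
// Hypothetical metering client: stands in for whichever metering SDK you use.
interface MeteringClient {
  gate(userId: string): Promise<boolean>;              // may this call run?
  track(userId: string, units: number): Promise<void>; // record what it used
}

async function meteredCall<T>(
  metering: MeteringClient,
  userId: string,
  callProvider: () => Promise<{ result: T; units: number }>,
): Promise<T> {
  // Gate BEFORE the call. A metering outage fails open; an explicit "no" does not.
  let allowed: boolean;
  try {
    allowed = await metering.gate(userId);
  } catch {
    allowed = true; // fail open: a missed check, not a down product
  }
  if (!allowed) throw new Error(`usage limit reached for ${userId}`);

  // The provider call stays direct; no third party sits in its request path.
  const { result, units } = await callProvider();

  // Track AFTER the call. A failure here costs a log line, not the response.
  try {
    await metering.track(userId, units);
  } catch {
    // swallowed here; in production, queue the event for retry or log it
  }
  return result;
}
```

Failing open is a policy choice, not a given: it keeps the product up during a metering outage at the cost of briefly unenforced limits, and some teams will prefer to fail closed for expensive models.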

The migration checklist

The migration is mechanical once you know which jobs you were actually using. Do it in this order - the ordering is the part that matters, because one step has a failure mode the others do not.

  • Inventory the four jobs. Open your Helicone config and dashboard usage, and write down which of tracing, cost visibility, caching, and limits you actually relied on - not which features exist.
  • Export your historical data while the hosted product is still up. Request logs and cost history are cheap to export today and impossible to reconstruct later.
  • Pick a replacement per job, not one tool for everything: observability lane for tracing, provider-side caching plus a thin app cache, a metering layer for per-user costs and limits.
  • Stand up the new limits BEFORE removing the proxy. The day the base URL changes back, every rate limit configured in Helicone stops existing - and unlike a missing dashboard, missing limits announce themselves as a provider bill, found by exactly the users who were being limited.
  • Re-point the base URL back to your provider last, once tracing is capturing and limits are enforcing in the new setup.
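The ordering in the checklist can be enforced in code rather than remembered. A minimal sketch, assuming hypothetical readiness flags that you set from your own smoke tests; it returns the provider's base URL only once history is exported, limits are enforcing, and tracing is capturing, and keeps the proxy URL otherwise. Both URLs are passed in, so no real endpoint is assumed.

```typescript
// Readiness of the replacements, set from your own checks (hypothetical flags).
interface CutoverReadiness {
  historyExported: boolean;  // Helicone logs and cost history saved
  limitsEnforcing: boolean;  // new gate has actually refused a test user
  tracingCapturing: boolean; // new observability tool is receiving traces
}

// What still blocks removing the proxy, in checklist order.
function cutoverBlockers(r: CutoverReadiness): string[] {
  const blockers: string[] = [];
  if (!r.historyExported) blockers.push("export history while the hosted product is up");
  if (!r.limitsEnforcing) blockers.push("limits not enforcing: removing the proxy removes them");
  if (!r.tracingCapturing) blockers.push("tracing not capturing yet");
  return blockers;
}

// Flip the base URL only when nothing is blocking; otherwise stay on the proxy.
function chooseBaseUrl(r: CutoverReadiness, proxyUrl: string, providerUrl: string): string {
  return cutoverBlockers(r).length === 0 ? providerUrl : proxyUrl;
}
```

Logging `cutoverBlockers()` at startup turns the migration order into something a deploy can check, instead of a note in someone's head.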

Where this leaves you

Helicone earned its place by making four jobs feel like one, and the right way to honor that is to replace it deliberately rather than nostalgically. Tracing goes to the observability lane - Langfuse if you want open-source and self-hostable, Lunary in the same orbit. Caching goes into your stack, mostly via your provider. And the per-user cost and limits job - the one that was protecting your margin while you looked at the tracing tab - goes to a metering layer built for enforcement. That last lane is what Vevee is for: per-end-user metering, plan limits you define in a dashboard, race-safe reserve/commit gating before the call runs, with a free tier and a $15/mo Pro that you can evaluate against your real traffic in an afternoon. Whichever tools you land on, the unbundling itself is the win: each job now has an owner that treats it as the whole product, instead of one proxy treating all four as features.
