LLM subscriptions
2026-04-17
Another post in my LLM series, this time about LLM inference providers, labs, and available subscriptions, and why a subscription might be a worthwhile purchase at the time of writing. I also sprinkle in some LLM updates you might have missed and round it out with my own perspective.
Inference and how to get it
Inference is the model's prediction of which (output) tokens should follow the (input) tokens under consideration. In simple terms, by LLM inference we mean computation delegated to the model that returns some output tokens. This computation is not cheap and is usually constrained by the provider (the company or datacenter that runs the LLM).
There are a few ways to obtain it:
- using built-in integrations (like the G00gle Search "AI Overview")
- using dedicated tools (like Perplexity or DeepSeek apps)
- buying a subscription from the model provider
- accessing models directly through API calls to the model provider
As we go down the list, these options get more expensive, but they also get more flexible.
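The last option, raw API access, usually means an OpenAI-compatible HTTP endpoint. Here's a minimal sketch of what such a request looks like - the endpoint URL, model name, and API key are placeholders, not any specific provider's values:

```python
import json

# Placeholder values - substitute your provider's actual endpoint,
# model identifier, and API key.
API_URL = "https://api.example-provider.com/v1/chat/completions"
API_KEY = "sk-..."

def build_chat_request(prompt: str, model: str = "some-model") -> dict:
    """Build the JSON body of an OpenAI-style chat completions request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,   # cap on output (inference) tokens
        "temperature": 0.7,  # sampling randomness
    }

body = build_chat_request("Explain LLM inference in one sentence.")
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
# Sending it is a single POST, e.g.:
#   requests.post(API_URL, headers=headers, data=json.dumps(body))
print(json.dumps(body, indent=2))
```

You pay per input and output token, which is why the `max_tokens` cap matters for cost control.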
Integrations and products
With a built-in integration you have no control over which models are in use, from which provider, or what the models' capabilities are. You are handed a ready-made product, sometimes only a fancy name, and are expected to use it as intended. The search summary is a good example of such an integration done right - many people have come to rely on it, and for certain queries it is more than you need. Other times these "AI" features land anywhere between an annoyance and an obstacle that the user has to fight to achieve the result they need. One of many reasons why "AI" gets a well-deserved bad rap.
The situation is a little better with dedicated tools. Usually the choice of a tool is either tied to a provider in the first place or allows switching the underlying model and selecting some preferences. T3 Chat is a very good example: an LLM chat webapp that gives you access to all models for a monthly subscription fee. You get to choose which model you want to use, but the scope of the subscription is limited to the product itself. You are not buying a subscription for model inference; you are buying a subscription to access the product. The inference is just a part of it, required to make the product work. Another example that comes to mind is Perplexity. They charge a monthly fee for LLM-powered web search and let you choose which exact model generates the answers, though they only support a few selected models. The subscription cost is mainly for access to their app, not meant to provide you with raw model inference.
The cost of inference
Buying inference directly can be rather costly depending on the model. The best way to explore this option is through OpenRouter, a common gateway for accessing different models served by different providers. You can clearly see how individual providers differ in service quality: different latency, different uptime statistics, different reliability, and most importantly - different inference throughput (tokens per second) for the same model. If you're building a product on top of LLMs, this is the way to go. You pay exactly for what you use, and picking the right model and provider will make all the difference for your project.
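OpenRouter speaks the same OpenAI-style chat completions API, with an optional provider-routing object layered on top. A hedged sketch of what steering toward the highest-throughput provider might look like - the model slug is illustrative, and you should verify the routing fields against OpenRouter's current documentation:

```python
import json

def build_openrouter_request(prompt: str) -> dict:
    """Request body for OpenRouter's /api/v1/chat/completions endpoint.

    The extra "provider" object steers routing between the providers
    serving the same model; "sort": "throughput" asks for the fastest
    one. Model slug and routing fields here are illustrative.
    """
    return {
        "model": "z-ai/glm-4.7",      # illustrative model slug
        "messages": [{"role": "user", "content": prompt}],
        "provider": {
            "sort": "throughput",      # prefer highest tokens/s
            "allow_fallbacks": True,   # reroute if that provider is down
        },
    }

print(json.dumps(build_openrouter_request("Hello"), indent=2))
```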
For personal use, subscriptions may be the best deal right now. If you're not using LLMs much, a subscription will probably cost you more than paying for inference directly - but then again, if you're not using LLMs much, you may as well rely on the other two options discussed previously and avoid any additional costs. There are still many people who keep their subscriptions even though they get nowhere near the usage they pay for. That's one reason why inference subscriptions can stay so cheap: they are heavily subsidized, and many companies see them as an opportunity to sell people another product that isn't just inference. As long as you don't fall for this marketing trap, heavy LLM usage makes a subscription a really good deal. Many subscriptions live on the edge between being a product price and an inference price. You should know what you're paying for and keep a finger on the pulse, dropping subscriptions that lose their value. The market is rather competitive now, so there are plenty of options to choose from.
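To make the subscription-versus-pay-per-token trade-off concrete, here's a toy break-even calculation. All numbers are made-up illustrative figures, not any provider's actual rates:

```python
def breakeven_tokens(monthly_fee_usd: float,
                     price_per_million_tokens_usd: float) -> float:
    """How many tokens per month you must consume before a flat
    subscription becomes cheaper than paying per token."""
    return monthly_fee_usd / price_per_million_tokens_usd * 1_000_000

# Illustrative numbers only: a 20 USD/month plan vs. inference
# billed at 10 USD per million tokens.
tokens = breakeven_tokens(20.0, 10.0)
print(f"Break-even at {tokens:,.0f} tokens/month")  # 2,000,000
```

Below that volume, pay-as-you-go wins; far above it, the subsidized subscription does - which is exactly where heavy users end up.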
Best deals last year
GitHub Copilot
Last year my favorite option was the GitHub Copilot Pro plan. It's priced at 10 USD/month and gives you a monthly quota of "300 premium request tokens", which can be spent on calling top models from 0penAI, G00gle, and Anthr0pic. The way it used to work, a predefined amount of premium request tokens was consumed for each started session, so after that you could pretty much use the model indefinitely for free. Now each message consumes a predefined amount of premium request tokens (probably as intended from the start). Still, this amount is defined per model, regardless of the selected reasoning effort. For a single run of GPT-5.4 xhigh, which can take hours to finish and burn a ton of inference tokens, it is still a decent price. Obviously it is worse than what we had last year - back then it was probably the most cost-effective option of all. It wasn't as straightforward to use in external tools as it is now, but it was always possible. In the window between the premium-request-token consumption fix and the release of GPT-5.4, the subscription was not worth the price - you could burn through the monthly quota in a matter of minutes trying to use it with Claude Opus.
Perplexity
There was also a brief moment in September last year when you could get a free Perplexity Pro annual subscription. I got on board and I'm getting quite a lot of use out of it, often defaulting to the Perplexity mobile app for search. I kinda like it, but to be honest I would never use it if I had to pay for it at all. When I signed up, I set a reminder to cancel and delete my account before the promotional subscription runs out. I don't even know exactly how much it costs now, but whatever it is, I wouldn't say it's worth the money.
Z.AI
On a different note, another great catch of insane value from last year was the Z.AI GLM Coding Plan. I fell in love with the GLM-4.7 model when I finally tried it in December and I've been preaching it since.
The Lite plan, which is now 18 USD/month, was 3 USD/month back then. I got it, I tried it out, and I signed up for the annual subscription of the Pro plan. If I recall correctly I paid something like 180 USD for it. Now it's priced at 864 USD annually or 72 USD/month. I kinda wish I got the Max plan back then, because it was only slightly more expensive, but I doubt I could push it to the limit. Even on the Lite plan hitting a limit was very difficult.
GLM-4.7 is still a wonderful model, and I knew I'd be very happy with it even if Z.AI didn't deliver another one for a year, but they followed with the even stronger GLM-5 in February, then GLM-5-Turbo, and recently GLM-5.1 - all extremely good and very underrated. In benchmarks, GLM models are second only to GPT-5.4 and Opus 4.6 at high reasoning effort. In practice, GLM models are often way better, depending on the task.
It's also worth noting how much cheaper GLM models are, how much faster, and how much more transparent: they are open-weight and MIT-licensed! The quality of GPT and Claude may degrade over time, and we may lose access to those models eventually, but GLM models will stay available - you can download them yourself right now. They only get better with time as providers figure out how to run them more efficiently at higher throughput. That gives me much more confidence to rely on them, as opposed to constrained, private Western models.
Long story short - Z.AI's GLM models are my favorite and I'm really glad I got in at the right time.
Fierce competition in China
There are many strong competitors coming from China other than Z.AI. The main one is MiniMax, a company with a very wide range of generative tools. They timed their previous two LLM releases to match Z.AI's: M2.1 released a single day after GLM-4.7, and M2.5 released the same day as GLM-5. About a month ago they released the new M2.7. I couldn't find a strong enough use-case for the previous two models, so I wasn't particularly keeping up with them, and I haven't tried M2.7 yet. They have their own subscription offer; it's hard for me to assess its value since I'm not using it myself, but it does look promising.
I have, however, made a lot of use of the very impressive Kimi K2.5 model after it was released in late January this year. The Moonshot AI lab is a world leader when it comes to LLM research; for some time between GLM releases their model was leading the intelligence benchmarks too. Moonshot AI offers subscription plans through the Kimi Code membership, with prices starting at 20 USD/month. Back when I tried their lowest subscription tier, I'd burn through the quota rather fast. I've heard they have increased it since, but I'm not sure how much you can really get out of these plans - there are stronger models out there and better-priced subscriptions. Kimi K2.5 itself is rather cheap even when you pay for inference directly per token. I do recommend trying out the model, just not necessarily through this subscription. Definitely keep an eye on future releases and the research this team drops.
When it comes to research teams the uncontested king is definitely DeepSeek. They have no subscriptions. They don't even charge for the use of their app. The LLM itself is also crazy cheap and very good. Everybody is waiting for the next LLM drop from this lab, because you can bet it's gonna be big. Right now their latest model is the weakest on my list, but this will very likely change with their next highly anticipated release.
You can't talk about LLM research without mentioning Qwen, a research team from Alibaba. Qwen models act as a de facto reference point in LLM research. It's a shame the team didn't receive enough support within their company and their leader had to step down. Fingers crossed they can start up something even better independently. Alibaba does have a Coding Plan subscription, but it seems rather limited and priced at 50 USD/month, so in my eyes not worth considering seriously.
Xiaomi entered the race not too long ago with their impressive MiMo V2 Pro model. On benchmarks it lands very close to MiniMax M2.7, which is very good. Xiaomi also introduced a compelling MiMo Token Plan subscription offer this month, clearly inspired by Z.AI's GLM Coding Plan offer from December - even the website colors are similar. The quota seems very generous judging by their claims. I'm sure MiMo models will only improve with time too, so this may turn out to be one of those rare hidden gems. It can go both ways, though: limits on these plans can change at any time, so an interesting offer now is not guaranteed to stay good a few weeks or months later.
To wrap up this section, here's how a leaderboard of these LLMs would look according to benchmarks:
- GLM-5.1 (Z.AI)
- Qwen 3.6 Plus
- MiniMax M2.7
- MiMo V2 Pro (Xiaomi)
- Kimi K2.5 (Moonshot AI)
- DeepSeek V3.2
The American offer
OpenCode became an inference provider too with its pay-as-you-go service, OpenCode Zen. They have also experimented with a subscription model through the limited-availability OpenCode Black, and have now introduced OpenCode Go, a cheaper 10 USD/month subscription that gives access to the open-weight models discussed above. There are usage limits in place, but the offer seems more than fair. If you were to pick only one cheap subscription out of all of these, this should probably be the one.
All the subscriptions discussed so far can be used directly in OpenCode. Major model labs from the land of the free do not give you such liberty. 0penAI GPT, Anthr0pic Claude, and G00gle Gem1n1 all offer similar types of subscriptions, but only 0penAI encourages people to use theirs however they want. With the other two it's all about working around bullshit restrictions to get any meaningful use out of it. Reportedly, with both GPT and Claude you may get tokens worth up to 20x the subscription price.
I bought ChatGPT Plus (20 USD/month) only to get access to GPT-5.3-Codex on release day, and I've kept the subscription since. I use it only for accessing GPT-5.4 Fast in OpenCode and pretty much nothing else; the other perks and features of this subscription are completely useless to me. Still, I burn through a lot of tokens with GPT-5.4, and no other subscription could really satiate that. It has been the best model available for a while now - especially useful when fixing things or debugging. For quick implementation work or targeted changes I much prefer GLM models, but I find GPT complements them pretty well. If another model comes along I might drop this subscription. I don't see why anybody would pay that much for ChatGPT itself, but if you want to get some of those sweet subsidized GPT tokens into your tools, then it is a pretty good purchase. Of the offers from the three major labs discussed here, I'd pick this as number one. Also, Codex TUI is probably the second best tool after OpenCode.
Claude models are infamous for how expensive they are. Even with a ton of subsidized tokens, those are very expensive tokens, and I don't think they really carry the value that Anthr0pic claims. That's why I'm skeptical about Claude subscription plans - they are overpriced on two levels. Sure, you get a little more Claude Opus use, but if you want the limit to last a whole programming session you have to use Claude Sonnet most of the time anyway - and that model is way worse than GLM. I admit Claude Opus can be fun sometimes, but in my opinion Claude models are simply overrated overall. Sure, they follow instructions well, but the amount of instructions people feed them to get anything half-decent out is insane to me. I can see why people who over-invest in complex instructions, MCP configurations, custom skills, and other bloat would prefer a model like Claude; for me it is rarely the top choice when I pick an LLM for a task. Anthr0pic has a bad vibe. Also, Claude Code is probably the worst TUI program ever, and Anthr0pic really wants you to use that and nothing else. If you like fighting your tools and have money to spend, this might be a good option for you.
There is a way to make Claude subscriptions work with OpenCode anyway. You need two things: 1. the opencode-with-claude plugin for OpenCode and 2. the meridian bridge, which uses the blessed Claude Code SDK to serve models through a standard API. It's a little convoluted, but the end result is a very smooth experience. Definitely worth setting up to avoid dealing with the horrible Claude Code and Claude-specific configurations.
It's easy to forget that G00gle also has its own "AI" subscription plans with Gem1n1. They also offer Claude models through the Antigravity IDE, the descendant of Windsurf and yet another VSCode fork. Apparently these subscriptions can be bundled with other enterprise purchases, so there's a chance your organization already has access to higher quotas with Antigravity. Unfortunately, it's the same story as with Gemini-CLI - their tools are not very good. You can plug this subscription into OpenCode through a custom plugin like opencode-antigravity-auth, but honestly it's not a good experience. This only makes sense if you want to milk some more free inference out of access you already have. If you really want to try Gem1n1 and Claude alongside the GPT models, I believe you'd be better off paying for a GitHub Copilot plan or Curs0r instead.
Curs0r exists too with their monthly subscriptions, but it doesn't exactly appeal to me. I already have my code editor, and I don't use inline auto-complete either. The Curs0r Agent is fine, but for the price it's not much. The good thing is that you can access a variety of models through their service, including their exclusive Composer 2, based on Kimi K2.5.
Let it all burn
Around the beginning of the year, people were infatuated with the idea of agent orchestration. LLMs can write most of the code - why can't they also work autonomously on programming projects? Well, we tried it and failed. Notably, Anthr0pic failed to build a C compiler and Curs0r failed to build a browser. Both went all the way and repainted their experiments as huge successes afterwards. It's quite obviously the opposite, but this kind of trick "can have a strong influence on the weak-minded".
Two other independent attempts ended up as crypto-scams: Geoffrey Huntley's Ralph Loop, and Steve Yegge's Gas Town. Those people are truly the worst charlatans of the AI revolution. It was obvious from all their posts that made the rounds on Hacker News, with claims impossible to reproduce. Unfortunately, the Valley seems to love loud characters like that and somehow they get away with it in the end. At least now there's undeniable proof to show they were full of shit from the very beginning.
So how does one genuinely burn through all these tokens? It turns out good human guidance is crucial for any of this agentic stuff to work well. That's why good tools focus on extending human abilities instead of replacing them. Two great recent examples: OpenClaw and T3 Code. An LLM-assisted loop of planning, implementing, and reviewing burns inference like crazy, but yields good results. Preferably involve a different LLM in each step for a little variance. Custom commands are a good way to streamline this process.
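One way to streamline such a loop is to pin a different model to each phase and generate the phase prompts mechanically. A hypothetical sketch - the phase-to-model mapping and prompt templates are placeholders, not any real tool's configuration:

```python
# Hypothetical phase -> (model, prompt template) assignment; swap in
# whatever models and providers you actually have access to.
PHASES = {
    "plan":      ("glm-5.1",   "Draft a step-by-step plan for: {task}"),
    "implement": ("gpt-5.4",   "Implement this plan for: {task}"),
    "review":    ("kimi-k2.5", "Review the changes made for: {task}"),
}

def phase_messages(phase: str, task: str) -> tuple[str, list[dict]]:
    """Return (model, chat messages) for one step of the loop."""
    model, template = PHASES[phase]
    return model, [{"role": "user", "content": template.format(task=task)}]

for phase in ("plan", "implement", "review"):
    model, messages = phase_messages(phase, "add retry logic to the fetcher")
    # Each call here would go to a chat completions endpoint with `model`,
    # feeding the previous phase's output into the next prompt.
    print(phase, "->", model)
```

A custom command per phase then reduces each step to a single invocation instead of hand-written prompts.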
Subscriptions summary
Here's a list of discussed subscriptions and how I would group them in terms of value at current prices.
Most subsidized:
- 0penAI ChatGPT - 20/100/200 USD
- Anthr0pic Claude - 20/100/200 USD
- MiMo Token Plan - 6/16/50/100 USD
Strong contenders:
- MiniMax Token Plan - 10/20/40/50/80/150 USD
- Z.AI GLM Coding Plan - 18/72/160 USD
- Kimi Code - 20/40/100/200 USD
Most flexible:
- OpenCode Go - 10 USD
- GitHub Copilot - 10/40 USD
- Curs0r - 20/60/200 USD
What do I use?
Personally, I mostly make use of my Z.AI GLM Coding Plan, which is generous enough that I never really run out of inference for my favorite GLM models. I also keep a ChatGPT Plus subscription for certain things - GPT-5.4 is very capable and useful, but I'm riding the limit almost all the time. American models quickly lead you astray; in personal projects it's not as fun. If I were to buy another personal subscription, I would go with OpenCode Go. Every month I also squeeze out whatever little inference I can from the employer-provided Curs0r and GitHub Copilot subscriptions. Unlike the personal subscriptions, these run out fast.
I burn through most of this inference using OpenCode (mainly in a TUI-first workflow, but sometimes through the WebUI as well), running multiple sessions in git worktrees managed with my tool bra, and monitoring each project and branch directory with my tool madgit. I've also built a dedicated web dashboard to monitor all my subscription quotas: llm-usage-monitor.
As you'd probably expect - I mostly rely on open-source software, including a lot of my own. Similarly, I prefer to rely on open-weight models, even if not self-hosted. It would be foolish not to bend this a little to make good use of the deals around, but still for me the core must stay open, and not just "open" in name only.