120 Trillion Tokens: Chinese AI Is "Rolling" the United States to Death

On April 2, 2026, Volcano Engine dropped a bombshell at a stop on its roadshow:

The average daily Token usage of the Doubao model has exceeded 120 trillion!

That is double the level of three months earlier and 1,000 times the level at launch.

At the same event, Volcano Engine President Tan Dai also revealed an important detail: the number of enterprise customers whose cumulative token usage exceeds one trillion rose from about 100 at the end of 2025 to 140, a gain of 40 customers in just three months. This signals that AI has moved from the "free trial" stage to the stage of large-scale enterprise spending.

At almost the same time, Zhipu AI released its first financial report since listing, with some striking figures: full-year 2025 revenue exceeded 724 million yuan, up 132% year on year, and the annual recurring revenue (ARR) of its MaaS API platform exceeded 1.7 billion yuan (approximately US0 million), a 60-fold year-on-year increase. More importantly, registered users on Zhipu's platform now exceed 4 million, spread across more than 218 countries and regions.

Nvidia has not been idle either. In three months it put US billion into Marvell, Lumentum, and Coherent, going all in on silicon photonics and AI optical interconnects.

Taken together, these three developments send an extremely strong signal: China's AI super-dividend period has fully arrived.

120 trillion Tokens sounds abstract. Let’s do the math:

Assuming an average API call consumes 2,000 tokens, 120 trillion ÷ 2,000 = 60 billion calls per day.
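As a quick check on that division, here is a minimal sketch. The 2,000-token average is the article's own assumption, and the per-second figure is an extra derived number added only for scale.

```python
# Back-of-the-envelope check of the daily call volume implied by the article.
DAILY_TOKENS = 120 * 10**12   # 120 trillion tokens per day (figure from the article)
TOKENS_PER_CALL = 2_000       # assumed average tokens per API call (the article's assumption)
SECONDS_PER_DAY = 86_400


def calls_per_day(daily_tokens: int, tokens_per_call: int) -> int:
    """Return the number of API calls per day implied by the token volume."""
    return daily_tokens // tokens_per_call


daily_calls = calls_per_day(DAILY_TOKENS, TOKENS_PER_CALL)
calls_per_second = daily_calls / SECONDS_PER_DAY

print(f"{daily_calls:,} calls per day")
print(f"about {calls_per_second:,.0f} calls per second")
```

Under that assumption the volume works out to 60 billion calls per day, or roughly 690,000 calls every second.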

From personal assistants to enterprise services, from writing copy to generating videos, AI has gone from an "early adopter tool" to infrastructure on a par with water, electricity, and coal.

What matters more is that this is no free-for-all: customers are paying real money.

Data shows that the number of corporate customers with a cumulative Token usage exceeding one trillion has increased from 100 to 140 in three months; the Token usage of individual users has increased by 16 times in the past month.

After Zhipu bucked the trend and raised the API price of its core model by 83% in the first quarter of 2026, platform call volume still grew by about 400%, a textbook case of volume and price rising together.

This means that customers are no longer paying for low prices, but for "getting the job done."

Free trial → high-frequency paid use → price increases without loss of volume: this commercial loop is now fully closed.

Why is token usage rising so fast?

Because the way AI is used has completely changed.

First, video generation, which swallows a million tokens in one bite.

ByteDance's in-house Seedance 2.0 is described by Tan Dai as "China's first undisputed global SOTA video model". The visual effects for the 2026 Spring Festival Gala programs "Congratulations to the Goddess of Flowers" and "Song of Harnessing the Wind" were generated with it.

But the boom came with queues lasting up to seven hours. Queue length at peak hours has long hovered around 90,000; demand far exceeds supply.

Why is it so popular? Because the token consumption of video generation far exceeds that of text: generating a 1-minute 720p video requires more than 1 million tokens, which is more than 100 times that of an ordinary text conversation.

Compared with traditional film and television production, however, Seedance 2.0 is in a different league on cost-effectiveness: production efficiency more than doubles and production cost falls by 70%.

According to industry estimates, the overall production cost is expected to drop from around 10,000 yuan previously to a few thousand yuan.

This is the real driving force behind Doubao’s token doubling in three months.

Second, agents, which turn AI from "answering questions" into "doing the work".

The upgraded ArkClaw (the "lobster" agent) goes well beyond chat. It can connect to Feishu, WeChat, DingTalk, and Weibo, and link with cloud drives to complete information retrieval, data processing, and cross-tool collaboration on its own.

An enterprise-level task can easily cost hundreds of thousands or even millions of Tokens.

CITIC Securities estimates that when an agent performs a task, the overall Token consumption may increase by more than ten times, and the corresponding computing power requirements will increase by more than a hundred times. This increase in "reasoning density" increases the token consumption of a single task exponentially.

On the OpenRouter platform, more than 70% of token consumption comes from the production environments of major internet companies, medium and large enterprises, and professional programmers. In the range of 100K to 1M tokens (the most typical consumption range for agent workflows), Chinese models are far ahead in number of calls.

Application explosion → Token growth → Model optimization → more applications. This positive cycle of business implementation has really started to turn around.

Many people ask: Why can China's large model commercialization be so rapid?

The answer is simple: cost, outrageously low cost!

Electricity accounts for 60%-80% of the operating cost of large models. The price of green electricity in western China (Gansu, Inner Mongolia, and Guizhou) is only 0.13-0.3 yuan/kWh. And in the United States? 0.8-1.2 yuan/kWh.

In terms of photovoltaic power, China is 4-5 times cheaper than the United States.

If we assume that a large-scale inference cluster has an annual power consumption of 100GWh, the annual cost of electricity alone is about US.5 million in China and about US.4 million in the United States - a difference of nearly US million.
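The electricity comparison can be reproduced from the per-kWh ranges quoted above. The sketch below is a rough illustration only: it stays in yuan, because the article gives both countries' prices in yuan and states no exchange rate, and the 100 GWh cluster is the article's own hypothetical.

```python
# Annual electricity cost for a hypothetical 100 GWh inference cluster,
# using the per-kWh price ranges quoted in the article (all in yuan).
ANNUAL_CONSUMPTION_KWH = 100 * 10**6  # 100 GWh = 100 million kWh

PRICE_RANGES_YUAN_PER_KWH = {
    "China (western green power)": (0.13, 0.30),
    "United States": (0.80, 1.20),
}


def annual_cost_range(consumption_kwh, price_range):
    """Return the (low, high) annual electricity cost in yuan."""
    low_price, high_price = price_range
    return consumption_kwh * low_price, consumption_kwh * high_price


costs = {
    region: annual_cost_range(ANNUAL_CONSUMPTION_KWH, price_range)
    for region, price_range in PRICE_RANGES_YUAN_PER_KWH.items()
}

for region, (low, high) in costs.items():
    print(f"{region}: {low / 1e6:.0f}-{high / 1e6:.0f} million yuan per year")
```

At these rates the cluster's annual bill is roughly 13-30 million yuan in western China versus 80-120 million yuan in the United States.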

Add the MoE architecture, aggressive quantization, KV caching, and similar techniques, and inference efficiency is 3-10 times that of US counterparts. With both advantages combined, the inference cost of Chinese models falls to between 1/6 and 1/10 of that of American models.

Ultimately reflected in API pricing (USD/million Tokens):

- MiniMax M2.5: input 0.3, output 1.1;

- GLM-5: input 0.3, output 2.55;

- Tongyi Qwen3.5: input 0.11, output 0.44;

- For comparison, Claude Opus 4.6: input 5, output 25.

This means that the cost of Chinese models is only 1/10 to 1/20 of that of American giants, or even lower.
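To make the pricing gap concrete, the sketch below divides the Claude Opus 4.6 list price by each Chinese model's list price, using only the figures listed above. The function and variable names are illustrative, and real bills depend on each workload's mix of input and output tokens.

```python
# Prices in USD per million tokens, as listed in the article: (input, output).
CHINESE_MODEL_PRICES = {
    "MiniMax M2.5": (0.30, 1.10),
    "GLM-5": (0.30, 2.55),
    "Tongyi Qwen3.5": (0.11, 0.44),
}
BASELINE_NAME = "Claude Opus 4.6"
BASELINE_PRICE = (5.00, 25.00)


def price_multiples(model_price, baseline_price):
    """Return how many times more the baseline charges for input and for output."""
    model_in, model_out = model_price
    base_in, base_out = baseline_price
    return base_in / model_in, base_out / model_out


multiples = {
    name: price_multiples(price, BASELINE_PRICE)
    for name, price in CHINESE_MODEL_PRICES.items()
}

for name, (input_x, output_x) in multiples.items():
    print(f"{BASELINE_NAME} vs {name}: input {input_x:.1f}x, output {output_x:.1f}x")
```

On these list prices the multiples run from roughly 10x (GLM-5 output) to roughly 57x (Tongyi Qwen3.5 output).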

The gap is a chasm.

And what about performance? In 90% of high-frequency scenarios such as text generation, coding, translation, and everyday reasoning, the leading Chinese models reach more than 95% of the performance of GPT-5 and Gemini 3.

For the vast majority of users, "good enough, easy to use, and cheap" is everything. Chinese models hit that sweet spot squarely.

As a result, global developers voted with their feet: OpenRouter platform data shows that 47.17% of users are local developers in the United States, and Chinese developers only account for 6.01%. However, Chinese large models have accounted for 61% of the weekly token calls on the platform, surpassing the United States for three consecutive weeks.

Even more striking is the ranking by call volume: Chinese models hold five of the top nine spots worldwide, with Xiaomi MiMo-V2-Pro, Step 3.5 Flash, MiniMax M2.5, and DeepSeek V3.2 taking the top four, and GLM-5 Turbo ranking sixth.

Models this cheap and this capable are hard to shut out. "Decoupling" from overseas markets? It is not happening. Cost-driven demand has flattened every barrier.

On the other hand, China's token super-dividend translates into pressure on OpenAI, Google, and xAI.

Let’s take a look at some hard-core data:

OpenAI: ChatGPT's share of generative AI web traffic fell from 86.7% in January 2025 to 64.5% in January 2026, a drop of 22.2 percentage points in one year. In February, it had approximately 535 million monthly active users worldwide, down 6.5% from the previous month. Operating losses are expected to reach billion, nearly triple the previous year. More ironic still, 80% of users interact fewer than 1,000 times a year, an open rate lower than that of food-delivery apps.

Google: Although Gemini's share rose from 5.7% to 21.5%, AI Studio lost 25% of its developers, and Google was forced to launch a low-priced Gemini Flash tier (/million Tokens) whose cost is still three times that of Chinese models.

xAI: Grok’s share has increased from less than 1% to 3.4%, but the usage rate of the

In the next few years, we may see that the global AI market is forming a clear stratified pattern:

On one side, American models hold the high-end 20% of the world's users, focusing on professional reasoning, brand premium, and enterprise security in the international market, and take 80% of the revenue;

On the other side, Chinese models win the mass-market 80% of the world's users, focusing on accessibility, cost-effectiveness, and large-scale deployment on thin margins and high turnover, and capture 20% of the revenue.

Small as that 20% revenue share is, spread across the global market and concentrated in a handful of large Chinese model companies, it is still enough to keep them very comfortable.

The Doubao model's 120 trillion average daily tokens is the clearest evidence of an exponential surge in demand for computing power.

With traditional copper interconnects, 70% of the computing power in a 10,000-GPU cluster is wasted on data transmission. Copper simply cannot handle the high bandwidth demands of video and agents.

So silicon photonics technology has become the only solution at present. Replacing electrical signals with optical signals can increase bandwidth by 10-100 times, reduce power consumption by 70%, and reduce latency by half.

Nvidia's recent moves in silicon photonics have been textbook:

Early March: invested US billion each in Lumentum and Coherent, locking in core components for the 1.6T/CPO era such as CW lasers and EML chips;

End of March: invested another US billion in Marvell, opening its interconnect protocol to third-party custom chips for the first time through NVLink Fusion.

A massive US billion investment in 30 days completed a systematic encirclement of the optical-interconnect landscape for AI computing clusters.

Jensen Huang has said so loudly and plainly: "The inference inflection point has arrived, demand for token generation has surged, and the world is racing to build AI factories."

What does that mean?

Optical communications (optical chips, optical modules, optical fiber and cable) have become the most certain and most elastic growth segment in the AI sector.

Global demand for EML optical chips in 2026 is reported to be about 350 million units against production capacity of only about 200 million, a gap of 150 million units, and capacity is booked out through 2028.

That is why, from 2025 to the present, these segments have produced a large number of bull stocks across US stocks, A-shares, and even Hong Kong stocks, with prices doubling or even rising more than tenfold in a year. Even so, money keeps pouring in and pushing valuations higher.

Investors are convinced that the incremental pie here is still big enough to support an even larger valuation narrative in the future.

Some people are worried: How long can this wave of dividends last?

I think it can last at least 3-5 years, because there are three walls that overseas large models cannot get over:

First, the energy barrier. "East Data, West Computing" combined with green power from the west is a coordinated computing-and-power system found nowhere else in the world. Constrained by geography and energy mix, the United States cannot replicate China's green power cost of 0.13-0.3 yuan/kWh.

Second, the scale barrier. The larger the token call volume, the lower the unit cost. Once the positive cycle of scale → cost → more scale takes hold, American models will be squeezed out of the cost-effectiveness market entirely.

Third, the ecosystem barrier. Global downloads of China's open-source large models rank first in the world. Alibaba's Tongyi Qwen has been downloaded more than 700 million times, ranking first in the world; more than 180,000 derivative models are based on Chinese models, far more than Google and Meta combined. Developers around the world have become path-dependent, and migration costs are extremely high.

This means that at least until 2028, China’s large models will be able to enjoy this huge structural dividend.

04 Conclusion

The average daily Token calls in China have increased from less than 100 billion to 180 trillion, and the number of calls has increased by more than a thousand times in two years. Such data is enough to illustrate a trend - "a new set of business logic based on Token billing is accelerating the evolution around the world."
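The "more than a thousand times in two years" claim can be checked directly. The sketch below treats the starting point as exactly 100 billion, which is an assumption: the article says "less than 100 billion", so the true multiple is at least this large. The per-year figure assumes steady compounding, which the article does not claim.

```python
# Growth multiple implied by the article's two endpoints for China's daily token calls.
START_DAILY_TOKENS = 100e9    # assumed upper bound: "less than 100 billion"
END_DAILY_TOKENS = 180e12     # 180 trillion
YEARS = 2

total_multiple = END_DAILY_TOKENS / START_DAILY_TOKENS
# Equivalent constant yearly multiple if growth had compounded evenly (an assumption).
yearly_multiple = total_multiple ** (1 / YEARS)

print(f"total growth: {total_multiple:,.0f}x over {YEARS} years")
print(f"equivalent yearly multiple: about {yearly_multiple:.1f}x")
```

That gives a total multiple of 1,800x, equivalent to growing roughly 42-fold each year for two years.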

Tokens are the "digital oil" of the AI era, and China's large models are becoming the cheapest and most efficient "digital oil producers" in the world.

1/20 the cost, 95% performance, and average daily consumption of 120 trillion.

The stacking of three advantages makes China’s AI unstoppable in the global market.

The next 2-3 years will still be the golden window period for Chinese Tokens to go overseas.

AI applications continue to explode, computing power and optical communications simultaneously deliver results, American giants continue to retreat to the high end, and Chinese large models fully dominate the global inclusive AI market.

When cheap and efficient Chinese Token covers every corner of the world, the inclusive era of AI will truly arrive.

China will be the leader and the biggest beneficiary of this revolution.

Original articles on this site are released under the "Attribution-NonCommercial-ShareAlike 4.0 License (CC BY-NC-SA 4.0)". Please keep the following attribution when sharing or adapting:

Original author: JFC, source: "120 Trillion Tokens, Chinese AI, is "rolling" the United States to death"