China’s regulators have quietly delivered the most decisive blow yet in the AI chip war: new data-center procurement rules effectively ban U.S.-made GPUs for inference workloads across government agencies, state-owned enterprises, and any company receiving state cloud contracts.
Starting immediately, new clusters must run on domestic silicon, primarily Huawei’s Ascend 910B/910C series and Cambricon’s MLU series. The directive is not public, but the effect is absolute: any bid containing Nvidia, AMD, or Intel inference hardware is now automatically disqualified.
The pain is real and immediate. ByteDance, which had stockpiled tens of thousands of Nvidia H20 cards (the only inference-grade chip Nvidia can still legally sell into China under current U.S. export controls) in anticipation of a total Trump-era embargo, is now sitting on inventory it cannot legally deploy in new builds.
Alibaba and Tencent have been forced to rip and replace entire inference layers that were already designed around Nvidia’s TensorRT-LLM stack. Engineers privately describe Huawei’s software ecosystem as “two to three years behind” and Cambricon’s drivers as “barely production-ready,” yet public earnings calls are filled with glowing praise for “sovereign computing achievements.”
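To give a sense of why that gap matters, here is a minimal sketch of the surface-level swap, assuming Huawei's torch_npu adapter for PyTorch is installed and using a placeholder Hugging Face model; the eager-mode path moves over almost mechanically, but TensorRT-LLM's compiled engines, paged KV cache, and in-flight batching have no drop-in equivalent on the Ascend side, which is where the "two to three years behind" complaint comes from.

```python
# Sketch only: moving the same PyTorch model from an Nvidia GPU to a Huawei
# Ascend NPU. Model name and prompt are placeholders, not from the article.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

USE_ASCEND = True  # flip to False on an Nvidia/CUDA host

if USE_ASCEND:
    import torch_npu  # Huawei's Ascend extension; registers the "npu" device with PyTorch
    device = torch.device("npu:0")
else:
    device = torch.device("cuda:0")

# Placeholder model; any causal LM from the Hugging Face hub works the same way.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-7B-Instruct", torch_dtype=torch.float16
).to(device).eval()

inputs = tok("Summarize today's cluster utilization report.", return_tensors="pt").to(device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```

The device swap is the easy part; the production inference layers the article describes were built around engine compilation, quantization, and batching logic specific to Nvidia's stack, and that is what actually has to be rewritten.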
The numbers tell the story. Nvidia’s China revenue (once 20–25% of its total) collapsed 63% year-over-year in its most recent quarter. Sales of the H20, the deliberately crippled “China-special” card that launched at $12,000–$15,000 per unit, have plummeted to near zero as customers cancel orders rather than build on hardware that will soon be illegal to use. AMD’s and Intel’s inference-focused Instinct and Gaudi lines have fared even worse.
Beijing’s calculus is brutally pragmatic. Training frontier models is a prestige race China is unlikely to win in the next 2–3 years; the raw H100/H200/Blackwell density gap is simply too large. But inference is different. Inference accounts for 70–80% of real-world AI compute spend, runs on far less cutting-edge silicon, and is where domestic vendors are already within striking distance.
By ring-fencing the world’s largest inference market (China consumes roughly 40% of global AI inference cycles), regulators lock in hundreds of billions of dollars in demand for Huawei and Cambricon, enough to fund multiple generations of catch-up.
The results are already visible:
- Huawei’s Ascend division reportedly shipped over 1.2 million 910B-equivalent chips in the first three quarters of this year alone.
- Cambricon’s MLU370 and upcoming MLU390 series have secured “preferred vendor” status at every major Chinese cloud provider.
- Domestic cloud prices for inference have paradoxically dropped 15–25% despite lower raw performance, because local vendors are subsidized and desperate to gain share.
For American policymakers, the irony is bitter. The original export controls were designed to starve China of training compute; instead, they have handed Beijing a protected inference monopoly worth far more in economic terms. There is now open discussion in Washington about loosening restrictions on older-generation inference chips (H200, MI300X) to keep at least some market access, an outcome that would have been unthinkable twelve months ago.
Meanwhile, Chinese giants are accelerating the transition whether they like it or not. Alibaba Cloud has already migrated 60% of its Model-as-a-Service inference to Ascend clusters. Tencent claims its mixed Huawei–Cambricon fleet now serves 90% of Hunyuan traffic. Baidu’s Ernie Bot inference layer is reportedly 100% domestic silicon.
The great decoupling of AI infrastructure is no longer theoretical. China has accepted a short-term performance hit in exchange for long-term independence, and the inference market, once Nvidia’s most profitable segment in China, has effectively ceased to exist for American vendors.
The chip war just entered its endgame, and the first major territory has already changed hands.
Author: Slava Vasipenok
Founder and CEO of QUASA (quasa.io) — the world's first remote work platform with payments in cryptocurrency.
Innovative entrepreneur with over 20 years of experience in IT, fintech, and blockchain. Specializes in decentralized solutions for freelancing, helping to overcome the barriers of traditional finance, especially in developing regions.

