Meituan Trains the First Frontier-Scale LLM Entirely on Chinese Domestic Chips: LongCat-2.0

In a landmark achievement for China’s push toward AI self-reliance, Meituan has released LongCat-2.0, a 1.6-trillion-parameter Mixture-of-Experts (MoE) model trained from scratch on a massive cluster of over 50,000 domestic Chinese AI chips.

The announcement, made on June 30, 2026, positions LongCat-2.0 as a direct response to U.S. export controls on advanced semiconductors. While previous Chinese models (including DeepSeek’s V4 Pro) relied on domestic chips primarily for inference, LongCat-2.0 demonstrates that full frontier-scale pre-training is now possible entirely on home-grown silicon.
Massive Scale on Domestic Hardware

The training run consumed more than 35 trillion tokens, including hundreds of billions of tokens at context lengths of roughly 1 million tokens. This level of scale, previously achieved only on NVIDIA GPUs or Google TPUs, required extensive custom engineering in parallelism, fault tolerance, and numerical stability.
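To put the scale in perspective, the common rule of thumb that training costs about 6 FLOPs per active parameter per token (C ≈ 6·N·D) can be applied to the reported figures. This is a back-of-the-envelope sketch. It assumes the ~48 billion active parameters per token listed in the architecture section and 35 trillion tokens, and it ignores attention-score FLOPs, which become significant at 1-million-token contexts. The per-chip throughput and utilization are hypothetical placeholders, not published numbers.

```python
# Back-of-the-envelope training-compute estimate using C ≈ 6 * N_active * D.
# Reported inputs: ~48B active parameters per token, ~35T training tokens.
ACTIVE_PARAMS = 48e9
TRAINING_TOKENS = 35e12


def training_flops(active_params: float, tokens: float) -> float:
    """Approximate FLOPs: ~2 per active parameter per token forward, ~4 backward.

    Ignores attention-score computation, which grows with context length.
    """
    return 6 * active_params * tokens


def wall_clock_days(total_flops: float, num_chips: int,
                    peak_flops_per_chip: float, utilization: float) -> float:
    """Days of training at a sustained fraction (utilization) of peak throughput."""
    sustained = num_chips * peak_flops_per_chip * utilization
    return total_flops / sustained / 86_400


flops = training_flops(ACTIVE_PARAMS, TRAINING_TOKENS)
print(f"{flops:.2e} FLOPs")  # 1.01e+25 FLOPs

# Hypothetical hardware figures, NOT Meituan's: 50,000 chips, 300 TFLOP/s peak each, 35% utilization.
print(f"{wall_clock_days(flops, 50_000, 300e12, 0.35):.0f} days")  # 22 days
```

Only active parameters enter the estimate because each token passes through a small subset of experts. This is why a 1.6-trillion-parameter MoE model costs roughly as much to train as a dense model of about 48 billion parameters on the same data.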
The team implemented 6D parallelism (tensor, context, expert, data, pipeline, and embedding parallelism) to efficiently distribute both the MoE layers and the novel embedding components across the cluster.
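In 6D parallelism, every chip gets one coordinate per parallel dimension, and chips that share all coordinates but one form the communication group for that dimension. The sketch below assumes that all six dimensions are independent axes of a single device mesh. Real systems often overlay expert and embedding groups on the data-parallel axis, and Meituan has not published its layout. The axis order, the degrees and the helper names are hypothetical.

```python
from math import prod

# Hypothetical axis order, slowest-varying first (not Meituan's published layout).
AXES = ("pipeline", "data", "expert", "embedding", "context", "tensor")

# Toy degrees (hypothetical) whose product lands just above 50,000 chips.
DEGREES = {"pipeline": 8, "data": 26, "expert": 8, "embedding": 2, "context": 2, "tensor": 8}


def world_size(degrees: dict[str, int]) -> int:
    """Chips required by the mesh: the product of all six parallel degrees."""
    return prod(degrees[axis] for axis in AXES)


def rank_to_coords(rank: int, degrees: dict[str, int]) -> dict[str, int]:
    """Split a flat global rank into one coordinate per axis (row-major, tensor fastest)."""
    if not 0 <= rank < world_size(degrees):
        raise ValueError(f"rank {rank} is outside a mesh of {world_size(degrees)} chips")
    coords: dict[str, int] = {}
    for axis in reversed(AXES):
        rank, coords[axis] = divmod(rank, degrees[axis])
    return coords


def coords_to_rank(coords: dict[str, int], degrees: dict[str, int]) -> int:
    """Inverse of rank_to_coords."""
    rank = 0
    for axis in AXES:
        rank = rank * degrees[axis] + coords[axis]
    return rank


def group_of(rank: int, axis: str, degrees: dict[str, int]) -> list[int]:
    """Ranks that differ from `rank` only along `axis`, i.e. its communication group there."""
    base = rank_to_coords(rank, degrees)
    return [coords_to_rank({**base, axis: i}, degrees) for i in range(degrees[axis])]


print(world_size(DEGREES))                    # 53248
print(group_of(0, "tensor", DEGREES))         # [0, 1, 2, 3, 4, 5, 6, 7]
print(group_of(0, "pipeline", DEGREES)[:3])   # [0, 6656, 13312]
```

Putting the tensor axis last keeps each tensor-parallel group on consecutive ranks, which usually means the same node. Tensor-parallel traffic is the most frequent and latency-sensitive, so this is a common convention. The pipeline stages, by contrast, end up thousands of ranks apart.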
Innovative Architecture: MoE + Massive N-gram Embeddings + Custom Sparse Attention
LongCat-2.0 builds on Meituan’s earlier LongCat-Flash and LongCat-Flash-Lite models.

- 1.6 trillion total parameters with only ~48 billion active parameters per token thanks to aggressive MoE sparsity.
- Huge n-gram embeddings — a 135-billion-parameter module (under 10% of the total parameter budget) that expands the embedding space roughly 100× using 5-gram tokens. This approach delivers richer local context modeling and proved more parameter-efficient than simply scaling up MoE experts. In the smaller LongCat-Flash-Lite variant, n-gram embeddings consumed nearly half the parameters.
- LongCat Sparse Attention (LSA) — a heavily modified version of DeepSeek Sparse Attention (DSA). Key improvements include Streaming-aware Indexing, Cross-Layer Indexing, and Hierarchical Indexing, enabling efficient handling of ultra-long contexts while extending support to Multi-Token Prediction for speculative decoding.
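The list does not spell out how the n-gram embeddings are computed. A common way to build such a module, assumed here rather than confirmed for LongCat, works as follows: hash each trailing 2- to 5-gram ending at a position into a large bucket table, then add those vectors to the ordinary token embedding. The table sizes, hash function and averaging below are hypothetical choices. The point is that each position reads only a handful of rows, so the parameter count grows with the tables while per-token compute stays almost flat.

```python
import numpy as np

# Hypothetical toy sizes; LongCat's real table sizes and hashing scheme are not described here.
VOCAB, DIM, BUCKETS, MAX_N = 1_000, 16, 50_000, 5
rng = np.random.default_rng(0)

token_table = rng.normal(size=(VOCAB, DIM)).astype(np.float32)
# One hashed embedding table per n-gram order n = 2..MAX_N.
ngram_tables = {n: rng.normal(size=(BUCKETS, DIM)).astype(np.float32)
                for n in range(2, MAX_N + 1)}


def ngram_bucket(gram: tuple[int, ...], n: int) -> int:
    """Polynomial rolling hash of an n-gram into [0, BUCKETS); the seed differs per order n."""
    h = n
    for tok in gram:
        h = (h * 1_000_003 + tok + 1) % BUCKETS
    return h


def embed(token_ids: list[int]) -> np.ndarray:
    """Token embedding plus the mean of the hashed 2..MAX_N-gram embeddings ending at each position."""
    out = token_table[token_ids].copy()
    for t in range(len(token_ids)):
        vecs = [
            ngram_tables[n][ngram_bucket(tuple(token_ids[t - n + 1 : t + 1]), n)]
            for n in range(2, MAX_N + 1)
            if t - n + 1 >= 0  # skip n-grams that would start before the sequence
        ]
        if vecs:
            out[t] += np.mean(vecs, axis=0)
    return out


x = embed([5, 17, 42, 17, 99, 3])
print(x.shape)  # (6, 16)
# The hashed tables dwarf the token table, yet each position reads at most 5 rows.
print(token_table.size, sum(t.size for t in ngram_tables.values()))  # 16000 3200000
```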
These innovations allowed Meituan to push context lengths and training efficiency far beyond what standard dense or basic MoE architectures typically support on alternative hardware.
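DSA, which LSA modifies, rests on a simple idea. A lightweight indexer scores every earlier token for each query, and full attention is computed only over the top-k scoring tokens. The sketch below illustrates that baseline top-k selection only. LSA's Streaming-aware, Cross-Layer and Hierarchical Indexing are not modeled. The function name and shapes are illustrative assumptions, and a plain dot-product indexer stands in for DeepSeek's own scoring function.

```python
import numpy as np


def topk_sparse_attention(q, k, v, q_idx, k_idx, top_k):
    """Causal attention where each query attends only to the top_k keys picked by a cheap indexer.

    q, k, v: (T, d) main attention projections.
    q_idx, k_idx: (T, d_idx) small indexer projections (d_idx << d in practice).
    """
    T, d = q.shape
    out = np.zeros_like(v)
    index_scores = q_idx @ k_idx.T  # (T, T) cheap relevance scores in the small indexer dimension
    for t in range(T):
        causal = index_scores[t, : t + 1]  # only keys at positions <= t
        k_eff = min(top_k, t + 1)
        # Indices of the k_eff highest indexer scores (order within the set does not matter).
        chosen = np.argpartition(-causal, k_eff - 1)[:k_eff]
        logits = q[t] @ k[chosen].T / np.sqrt(d)
        weights = np.exp(logits - logits.max())  # numerically stable softmax
        weights /= weights.sum()
        out[t] = weights @ v[chosen]
    return out


rng = np.random.default_rng(0)
T, d, d_idx = 16, 8, 4
q, k, v = (rng.normal(size=(T, d)) for _ in range(3))
q_idx, k_idx = rng.normal(size=(T, d_idx)), rng.normal(size=(T, d_idx))
out = topk_sparse_attention(q, k, v, q_idx, k_idx, top_k=4)
print(out.shape)  # (16, 8)
```

The indexer still scores all T² query-key pairs, but it does so in a small dimension and without softmax. The expensive attention step touches only `top_k` keys per query, so its cost grows as T·top_k rather than T². This is what makes million-token contexts tractable. With `top_k >= T` the function reduces to ordinary dense causal attention.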
Real-World Testing as “Owl Alpha”

The model is optimized for agentic coding: multi-step software engineering, tool use, self-correction, and long-horizon reasoning. Early benchmarks show results competitive with leading closed models (e.g., a SWE-bench Pro score of 59.5 and strong scores on GPQA Diamond and agentic suites).
On OpenRouter, pricing was set at $0.75 per million input tokens and $3 per million output tokens. That is relatively high for a model at this capability level, though its inference efficiency on domestic hardware has yet to be independently benchmarked by the community.
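Because agentic coding sessions resend the growing context on every turn, input tokens tend to dominate the bill. The following sketch applies the listed prices to a hypothetical session. The turn count and per-turn token sizes are made-up illustrations, and the calculation ignores any prompt-caching discounts a provider may offer.

```python
# Reported OpenRouter list prices (USD per million tokens).
INPUT_PRICE_PER_M = 0.75
OUTPUT_PRICE_PER_M = 3.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the listed per-million-token prices."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Hypothetical agentic session: 40 turns, each sending ~50k tokens of context and generating ~2k tokens.
turns, context_per_turn, output_per_turn = 40, 50_000, 2_000
total = sum(request_cost(context_per_turn, output_per_turn) for _ in range(turns))
print(f"${total:.2f}")  # $1.74
```

In this example, input accounts for $1.50 of the $1.74 total, even though output tokens are four times more expensive per token.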
Open-Source Release
Meituan has a strong track record of open-sourcing its models under permissive licenses (Apache 2.0 / MIT).

- Weights: Coming soon to Hugging Face → https://huggingface.co/meituan-longcat/LongCat-2.0
- Blog post: https://longcat.chat/blog/longcat-2.0/
- GitHub: Expected under the Meituan-LongCat organization
Why This Matters
LongCat-2.0 is more than just another large model: it is a proof of concept that China can now train frontier-scale LLMs without relying on restricted Western hardware. The successful execution of 6D parallelism, custom sparse attention, and massive n-gram embeddings on domestic ASICs shows that the software and systems engineering gap is rapidly closing.
As the weights become available, the global open-source community will be able to evaluate LongCat-2.0’s true capabilities, fine-tune it, and potentially deploy it on the same domestic hardware stack — further accelerating China’s independent AI ecosystem.
This release signals a new phase in the global AI race: one where domestic compute clusters in China are no longer just for inference, but are fully capable of training the next generation of trillion-parameter models.