Meituan Trains the First Frontier-Scale LLM Entirely on Chinese Domestic Chips: LongCat-2.0

In a landmark achievement for China’s push toward AI self-reliance, Meituan has released LongCat-2.0, a 1.6-trillion-parameter Mixture-of-Experts (MoE) model trained from scratch on a massive cluster of over 50,000 domestic Chinese AI chips.

This marks the first time a model of this scale has completed full pre-training and inference end-to-end on non-NVIDIA, non-Google-TPU hardware.

The announcement, made on June 30, 2026, positions LongCat-2.0 as a direct response to U.S. export controls on advanced semiconductors. While previous Chinese models (including DeepSeek’s V4 Pro) relied on domestic chips primarily for inference, LongCat-2.0 demonstrates that full frontier-scale pre-training is now possible entirely on home-grown silicon.

Massive Scale on Domestic Hardware

Meituan trained LongCat-2.0 on over 50,000 unnamed Chinese AI ASICs arranged in superpods with high-bandwidth interconnects. The chips share architectural similarities with Huawei’s Ascend 910C series, though Meituan has not publicly named the exact vendor.

The training run consumed more than 35 trillion tokens, including hundreds of billions of tokens in sequences of roughly 1 million tokens each. This level of scale, previously achieved only on NVIDIA GPUs or Google TPUs, required extensive custom engineering in parallelism, fault tolerance, and numerical stability.

The team implemented 6D parallelism (tensor, context, expert, data, pipeline, and embedding parallelism) to efficiently distribute both the MoE layers and the novel embedding components across the cluster.
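To make the idea of a six-axis layout concrete, here is a minimal sketch of how a flat device rank could be mapped to one coordinate per parallelism axis. The axis degrees below are hypothetical toy values, and the simple row-major mesh is an assumption: Meituan has not published its actual layout, and real systems often fold some axes (such as expert or embedding parallelism) onto others instead of giving each its own mesh dimension.

```python
from math import prod

# Hypothetical per-axis degrees for a toy 512-device mesh.
# These are illustrative only; the real LongCat-2.0 layout is not public.
MESH = {
    "data": 8,
    "pipeline": 4,
    "tensor": 2,
    "context": 2,
    "expert": 2,
    "embedding": 2,
}


def rank_to_coords(rank, mesh=MESH):
    """Map a flat device rank to one coordinate per axis (row-major order)."""
    size = prod(mesh.values())
    if not 0 <= rank < size:
        raise ValueError(f"rank {rank} is outside a mesh of {size} devices")
    coords = {}
    # The last axis varies fastest, so peel coordinates off from the end.
    for axis in reversed(list(mesh)):
        rank, coords[axis] = divmod(rank, mesh[axis])
    # Return the coordinates in the same axis order as the mesh definition.
    return {axis: coords[axis] for axis in mesh}


def coords_to_rank(coords, mesh=MESH):
    """Inverse of rank_to_coords: fold per-axis coordinates into a flat rank."""
    rank = 0
    for axis in mesh:
        rank = rank * mesh[axis] + coords[axis]
    return rank


if __name__ == "__main__":
    print(prod(MESH.values()), "devices in the toy mesh")
    print(rank_to_coords(137))
```

Devices that share every coordinate except one form the communication group for that axis; for example, all ranks that differ only in their "tensor" coordinate would exchange tensor-parallel activations.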

Innovative Architecture: MoE + Massive N-gram Embeddings + Custom Sparse Attention

LongCat-2.0 builds on Meituan’s earlier LongCat-Flash and LongCat-Flash-Lite models.

Its standout technical features include:

1.6 trillion total parameters with only ~48 billion active parameters per token thanks to aggressive MoE sparsity.
Huge n-gram embeddings — a 135-billion-parameter module (under 10% of the total parameter budget) that expands the embedding space roughly 100× using 5-gram tokens. This approach delivers richer local context modeling and proved more parameter-efficient than simply scaling up MoE experts. In the smaller LongCat-Flash-Lite variant, n-gram embeddings consumed nearly half the parameters.
LongCat Sparse Attention (LSA) — a heavily modified version of DeepSeek Sparse Attention (DSA). Key improvements include Streaming-aware Indexing, Cross-Layer Indexing, and Hierarchical Indexing, enabling efficient handling of ultra-long contexts while extending support to Multi-Token Prediction for speculative decoding.
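As a rough illustration of the n-gram embedding idea, the sketch below hashes each window of up to five trailing tokens into a bucket of a second, much larger embedding table and adds that vector to the ordinary token embedding. This is a generic hashed n-gram scheme, not Meituan's confirmed design: the hash function, the additive combination, and all sizes and names are assumptions chosen for a toy example.

```python
import random

VOCAB_SIZE = 100                 # toy vocabulary size (hypothetical)
NUM_BUCKETS = 100 * VOCAB_SIZE   # n-gram table ~100x larger, echoing the article's ratio
DIM = 4                          # toy embedding width (hypothetical)
N = 5                            # 5-grams, as described in the article

rng = random.Random(0)
token_table = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(VOCAB_SIZE)]
ngram_table = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(NUM_BUCKETS)]


def ngram_bucket(window, num_buckets=NUM_BUCKETS, base=1_000_003):
    """Deterministic polynomial hash of a token window into a bucket index."""
    h = 0
    for tok in window:
        h = (h * base + tok + 1) % num_buckets
    return h


def ngram_ids(token_ids, n=N, num_buckets=NUM_BUCKETS):
    """For each position, hash the window of up to n trailing tokens."""
    return [
        ngram_bucket(token_ids[max(0, i - n + 1): i + 1], num_buckets)
        for i in range(len(token_ids))
    ]


def embed(token_ids):
    """Sum the per-token vector and the hashed n-gram vector at each position."""
    out = []
    for tok, bucket in zip(token_ids, ngram_ids(token_ids)):
        out.append([a + b for a, b in zip(token_table[tok], ngram_table[bucket])])
    return out


if __name__ == "__main__":
    tokens = [3, 14, 15, 92, 65, 35]
    print(ngram_ids(tokens))
    print(len(embed(tokens)), "vectors of width", DIM)
```

Because the lookup is a table read rather than a matrix multiply, a module like this adds many parameters while adding little compute per token, which is one plausible reading of why the article calls it parameter-efficient.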

These innovations allowed Meituan to push context lengths and training efficiency far beyond what standard dense or basic MoE architectures typically support on alternative hardware.
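The parameter figures quoted above can be checked with simple arithmetic. The short calculation below uses only the numbers stated in this article (1.6 trillion total, roughly 48 billion active, 135 billion in n-gram embeddings); the variable names are illustrative.

```python
TOTAL_PARAMS = 1.6e12   # total parameters, as reported
ACTIVE_PARAMS = 48e9    # approximate active parameters per token, as reported
NGRAM_PARAMS = 135e9    # n-gram embedding module, as reported


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


active_pct = share(ACTIVE_PARAMS, TOTAL_PARAMS)
ngram_pct = share(NGRAM_PARAMS, TOTAL_PARAMS)

print(f"active per token:  {active_pct:.1f}% of total")   # 3.0%
print(f"n-gram embeddings: {ngram_pct:.1f}% of total")    # 8.4%
```

So about 3% of the weights participate in any single token's forward pass, and the n-gram module's share of about 8.4% is consistent with the "under 10%" figure.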

Real-World Testing as “Owl Alpha”

For the past two months, LongCat-2.0 operated anonymously on OpenRouter under the codename Owl Alpha. It performed strongly in developer leaderboards, particularly on coding and agentic tasks, before Meituan revealed its identity.

The model is optimized for agentic coding: multi-step software engineering, tool use, self-correction, and long-horizon reasoning. Early benchmarks show results competitive with leading closed models (e.g., 59.5 on SWE-bench Pro, plus strong scores on GPQA Diamond and agentic suites).

On OpenRouter, pricing was set at $0.75 per million input tokens and $3 per million output tokens — relatively high given its intelligence level, though exact inference efficiency on domestic hardware remains to be fully benchmarked by the community.
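For a sense of what those rates mean per request, here is a small cost estimate using the two prices quoted above. The function name and the example token counts are hypothetical, and the sketch ignores any caching discounts or other billing details that may apply.

```python
INPUT_USD_PER_MILLION = 0.75   # quoted input price
OUTPUT_USD_PER_MILLION = 3.00  # quoted output price


def request_cost_usd(input_tokens, output_tokens):
    """Estimate the cost of one request from its input and output token counts."""
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical agentic coding step: a large repository context, a short patch.
    cost = request_cost_usd(input_tokens=200_000, output_tokens=10_000)
    print(f"${cost:.2f}")  # $0.18
```

Long-context agentic workloads are dominated by input tokens, so the input rate matters more than the headline output rate for this kind of use.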

Open-Source Release

Meituan has a strong track record of open-sourcing its models under permissive licenses (Apache 2.0 / MIT).

LongCat-2.0 will follow the same path:

Weights: Coming soon to Hugging Face → https://huggingface.co/meituan-longcat/LongCat-2.0
Blog post: https://longcat.chat/blog/longcat-2.0/
GitHub: Expected under the Meituan-LongCat organization

Why This Matters

LongCat-2.0 is more than just another large model: it is a proof of concept that China can now train frontier-scale LLMs without relying on restricted Western hardware. The successful execution of 6D parallelism, custom sparse attention, and massive n-gram embeddings on domestic ASICs shows that the software and systems engineering gap is rapidly closing.

As the weights become available, the global open-source community will be able to evaluate LongCat-2.0’s true capabilities, fine-tune it, and potentially deploy it on the same domestic hardware stack — further accelerating China’s independent AI ecosystem.

This release signals a new phase in the global AI race: one where domestic compute clusters in China are no longer just for inference, but are fully capable of training the next generation of trillion-parameter models.