Zyphra Releases ZAYA1-8B: A Sub-1B Active Parameter MoE Model That Outperforms Much Larger Rivals

Zyphra, the San Francisco-based AI startup, has just dropped one of the most impressive small-model releases of 2026. On May 6, the company unveiled ZAYA1-8B, a Mixture-of-Experts (MoE) language model with under 1 billion active parameters (roughly 760M active out of about 8B total) that delivers frontier-level performance on mathematics, coding, and complex reasoning tasks.

Despite its tiny active-parameter footprint, ZAYA1-8B doesn’t just compete with larger open-weight models — it beats many of them and even edges out proprietary frontier systems on select benchmarks when leveraging its novel test-time compute technique.

A Full-Stack Bet on Intelligence Density

ZAYA1-8B isn’t just another distilled or quantized model.

It’s the result of an ambitious end-to-end stack built from the ground up:

MoE++ Architecture featuring Compressed Convolutional Attention (CCA), an attention mechanism designed to be far more efficient than the standard attention used in transformers.
A brand-new MLP-based router that provides significantly more stable expert selection than traditional linear routers.
Learned residual scaling — a lightweight trick that controls residual-norm growth across depth with almost zero extra cost.
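
To make the last two ideas more concrete, here is a minimal toy sketch in plain Python. It is not Zyphra's implementation: the article does not describe the router's layers or the exact form of the residual scaling, so the two-layer ReLU router, the scalar gate alpha, the tanh stand-in block, and all sizes and helper names below are assumptions chosen only for illustration.

```python
import math
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probs, k):
    """Return (expert_index, renormalized_weight) pairs for the k most likely experts."""
    idx = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in idx)
    return [(i, probs[i] / mass) for i in idx]

def linear_router(x, W, k):
    """Traditional router: one linear map from the token vector to expert logits."""
    return top_k(softmax(matvec(W, x)), k)

def mlp_router(x, W1, W2, k):
    """Assumed MLP router: linear -> ReLU -> linear, then the same top-k selection."""
    hidden = [max(0.0, h) for h in matvec(W1, x)]
    return top_k(softmax(matvec(W2, hidden)), k)

def scaled_residual(x, branch_out, alpha):
    """Residual update x + alpha * f(x); alpha would be a learned parameter in training."""
    return [xi + alpha * bi for xi, bi in zip(x, branch_out)]

def residual_norm_after(depth, alpha, x0):
    """Norm of the residual stream after `depth` toy blocks (tanh stands in for a real block)."""
    x = list(x0)
    for _ in range(depth):
        x = scaled_residual(x, [math.tanh(v) for v in x], alpha)
    return math.sqrt(sum(v * v for v in x))

def random_matrix(rng, rows, cols):
    """Small random weight matrix; real weights would come from training."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d_model, d_hidden, n_experts = 8, 16, 4  # illustrative sizes, not the model's
x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]

routes_linear = linear_router(x, random_matrix(rng, n_experts, d_model), k=2)
routes_mlp = mlp_router(x, random_matrix(rng, d_hidden, d_model),
                        random_matrix(rng, n_experts, d_hidden), k=2)
norm_unscaled = residual_norm_after(depth=24, alpha=1.0, x0=x)
norm_scaled = residual_norm_after(depth=24, alpha=0.1, x0=x)

print("linear router:", routes_linear)
print("MLP router:   ", routes_mlp)
print(f"residual norm after 24 blocks: alpha=1.0 -> {norm_unscaled:.2f}, alpha=0.1 -> {norm_scaled:.2f}")
```

In this toy setting the smaller gate keeps the residual norm lower at every depth, which is the kind of growth control the article attributes to learned residual scaling. Whether the real model uses a scalar, per-channel, or per-layer scale is not stated in the source.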

The entire model was pretrained, midtrained, and post-trained exclusively on AMD Instinct MI300X hardware — a cluster of 1,024 MI300X nodes connected via AMD Pensando Pollara networking on IBM Cloud infrastructure. No NVIDIA GPUs were used at any stage. In an industry still heavily dominated by “NVIDIA or nothing” thinking, this is a quietly political statement: serious frontier training is now possible on AMD silicon.

Post-Training Pipeline That Packs a Punch

The real magic happened during post-training.

Zyphra ran a five-stage pipeline:

Supervised Fine-Tuning (SFT) focused on chat, instruction following, code, math, and test-time compute skills.
Reasoning Warmup — blending math, logic, and puzzles with early test-time compute prompts.
Large-scale RLVE-Gym with dynamically scaled puzzle difficulty.
Dedicated Math & Code Reinforcement Learning.
Lightweight RLHF/RLAIF for polish and behavior.
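
The "dynamically scaled puzzle difficulty" in the third stage can be pictured with a small sketch. The article gives no details of how RLVE-Gym adjusts difficulty, so everything here is hypothetical: the `DifficultyController` class, the target solve rate, the window size, and the `toy_rollout` function that stands in for a real policy rollout plus verifier.

```python
import math
import random

class DifficultyController:
    """Hypothetical curriculum: keep the recent solve rate near a target by moving the level."""

    def __init__(self, level=1, min_level=1, max_level=10,
                 target=0.5, band=0.15, window=20):
        self.level = level
        self.min_level = min_level
        self.max_level = max_level
        self.target = target
        self.band = band
        self.window = window
        self.results = []

    def record(self, solved):
        """Log one verified outcome; adjust the level once a full window is collected."""
        self.results.append(bool(solved))
        if len(self.results) < self.window:
            return
        rate = sum(self.results) / len(self.results)
        if rate > self.target + self.band:
            self.level = min(self.max_level, self.level + 1)  # too easy: harder puzzles
        elif rate < self.target - self.band:
            self.level = max(self.min_level, self.level - 1)  # too hard: easier puzzles
        self.results.clear()

def toy_rollout(level, skill, rng):
    """Stand-in for a policy rollout plus verifier: harder levels are solved less often."""
    p_solve = 1.0 / (1.0 + math.exp(level - skill))
    return rng.random() < p_solve

rng = random.Random(0)
controller = DifficultyController()
for _ in range(2000):
    controller.record(toy_rollout(controller.level, skill=6.0, rng=rng))
print("final difficulty level:", controller.level)
```

The design idea is that verifiable rewards carry the most signal when the policy succeeds some of the time but not always, so the controller raises difficulty when puzzles become too easy and lowers it when they become too hard.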

This heavy emphasis on reasoning paid off: the model shows large gains in verifiable domains (math and code) while still delivering strong instruction-following and general capabilities.

The Star Innovation: Markovian RSA Test-Time Compute

The most intriguing part of ZAYA1-8B is its Markovian RSA method, a new test-time compute (TTC) technique that combines two powerful ideas:

Parallel generation of multiple reasoning traces (inspired by RSA).
Markovian chunking: the model recursively aggregates the best parts of previous traces in fixed-length “chunks,” carrying forward only the tail of the previous reasoning step. Because each step conditions on a bounded amount of text, reasoning depth can grow essentially without limit while the context window stays fixed in size.
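
The two ideas above can be sketched as a short loop. This is a rough illustration under stated assumptions, not the published algorithm: the function `markovian_rsa`, its parameters, the random choice of which traces to aggregate, and the `toy_generate` stand-in for the model are all hypothetical, and tokens are approximated by whitespace-separated words.

```python
import random
from typing import Callable, List

# (context, max_new_tokens) -> generated text; a real system would call the model here.
Generate = Callable[[str, int], str]

def tail(text: str, n_tokens: int) -> str:
    """Keep only the last n whitespace-separated tokens (a crude stand-in for real tokens)."""
    return " ".join(text.split()[-n_tokens:])

def markovian_rsa(problem: str, generate: Generate, n_traces: int = 4, n_agg: int = 2,
                  n_rounds: int = 3, chunk_tokens: int = 64, tail_tokens: int = 16,
                  seed: int = 0) -> List[str]:
    """Toy loop: parallel traces, then repeated aggregation over bounded-size contexts."""
    rng = random.Random(seed)
    # Round 0: sample several independent reasoning chunks in parallel.
    population = [generate(problem, chunk_tokens) for _ in range(n_traces)]
    for _ in range(n_rounds):
        next_population = []
        for _ in range(n_traces):
            # Aggregation: show the model a small subset of the previous round's traces...
            picked = rng.sample(population, n_agg)
            # ...but, Markov-style, carry forward only the tail of each one.
            carried = "\n".join(tail(t, tail_tokens) for t in picked)
            next_population.append(generate(f"{problem}\n{carried}", chunk_tokens))
        population = next_population
    return population

context_sizes = []  # token counts of every context the toy generator was given

def toy_generate(context: str, max_new_tokens: int) -> str:
    """Hypothetical generator: records the context size and emits filler tokens."""
    context_sizes.append(len(context.split()))
    return " ".join(f"t{i}" for i in range(max_new_tokens))

problem = "Find the sum of all positive integers n below 100 with n squared plus 1 divisible by 5"
final_traces = markovian_rsa(problem, toy_generate)
bound = len(problem.split()) + 2 * 16  # problem tokens + n_agg * tail_tokens
print(f"{len(final_traces)} traces, largest context {max(context_sizes)} tokens (bound {bound})")
```

The point of the sketch is the bound: no matter how many rounds run, each call sees at most the problem plus `n_agg * tail_tokens` carried tokens, so total reasoning can scale with the number of rounds while the per-call context stays constant. In the real method the model itself is trained to pick out and combine the useful parts of earlier traces; the random subset here only marks where that step would happen.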

Trained into the model from SFT onward and reinforced during RL, Markovian RSA turns the small model into a reasoning powerhouse. With a modest 40k-token budget it already approaches much larger models; at “extra-high” compute (5.5M tokens per problem) it surpasses DeepSeek-V3.2 and GPT-OSS-120B High on the challenging APEX-shortlist mathematics benchmark.

Benchmark Highlights

ZAYA1-8B punches dramatically above its weight class:

HMMT'25 (with Markovian RSA): 89.6% — beats Claude 4.5 Sonnet (88.3%) and GPT-5-High.
AIME'26: 89.1%
LiveCodeBench-v6: 65.8%
GPQA-Diamond: 71.0%

It consistently outperforms open-weight models many times its size, including Mistral-Small-4-119B (6B active / 119B total) on math and coding benchmarks, while staying competitive with first-generation frontier reasoning models such as DeepSeek-R1-0528, Gemini-2.5-Pro, and Claude 4.5 Sonnet.
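
For a sense of scale, the parameter figures quoted in this article can be compared directly. The short script below uses only those quoted counts (in billions); the dictionary layout and the helper `active_fraction` are just illustrative names.

```python
# Parameter counts in billions, taken from the figures quoted in this article.
models = {
    "ZAYA1-8B": {"active_b": 0.76, "total_b": 8.0},
    "Mistral-Small-4-119B": {"active_b": 6.0, "total_b": 119.0},
}

def active_fraction(model):
    """Share of the model's parameters that are active for each token."""
    return model["active_b"] / model["total_b"]

for name, m in models.items():
    print(f"{name}: {m['active_b']}B active of {m['total_b']}B total "
          f"({active_fraction(m):.1%} active per token)")

# How much larger the bigger model is, by active and by total parameters.
active_ratio = models["Mistral-Small-4-119B"]["active_b"] / models["ZAYA1-8B"]["active_b"]
total_ratio = models["Mistral-Small-4-119B"]["total_b"] / models["ZAYA1-8B"]["total_b"]
print(f"Mistral-Small-4-119B uses about {active_ratio:.1f}x the active parameters "
      f"and about {total_ratio:.1f}x the total parameters")
```

By these numbers ZAYA1-8B activates roughly 9.5% of its weights per token, and the Mistral model it is compared against uses nearly 8 times as many active parameters.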

Availability and Openness

True to the spirit of open AI, ZAYA1-8B is fully open:

Model weights on Hugging Face (Zyphra/ZAYA1-8B);
Serverless endpoint on Zyphra Cloud;
Released under the permissive Apache-2.0 license.

A detailed technical report is also available on arXiv.

Why This Matters

In an era of ever-larger models chasing marginal gains, ZAYA1-8B proves that intelligence density — raw performance per active parameter — still has huge room for improvement. By combining architectural innovation, a sophisticated post-training stack, and clever test-time scaling, Zyphra has built a model that delivers frontier math and reasoning capabilities in a package small enough to run efficiently even on modest hardware.

For developers, researchers, and enterprises looking for high-performance reasoning without massive inference costs, ZAYA1-8B is an instant must-try. And for the broader industry, it’s yet another sign that the NVIDIA monopoly on serious AI training is cracking.

The era of tiny-but-mighty models is here — and Zyphra just fired the starting gun.