Feat: Zyphra/ZAYA1-8B by nreHieW · Pull Request #2529 · PrimeIntellect-ai/prime-rl

nreHieW · 2026-05-17T23:20:44Z

Note: This PR is a WIP and uses unofficial versions of transformers and vLLM, since neither dependency officially supports Zyphra/ZAYA1-8B yet. The changes to uv.lock and pyproject.toml are included for demonstration purposes only and will be removed once upstream officially supports the model. The code in this PR will also need cleanup once model support is upstreamed in both libraries.

[WIP until both vLLM and Transformers merge upstream support for ZAYA1-8B]

Adds PrimeRL custom model support for ZAYA1-8B.

Summary

- Adds the ZayaConfig and ZayaForCausalLM custom model implementations.
- Adds vLLM weight postprocessing for Zaya's original alternating-layer layout.
- Adds ZayaMoE, router, and expert-parallel support.
- Adds Zaya context-parallel support for CCA attention.
- Adds unit/parity tests for:
  - HF vs PrimeRL forward/backward parity
  - expert-parallel MoE parity
  - context-parallel CCA attention parity
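To illustrate what converting between a fused layout and an alternating-layer layout can look like, here is a minimal, hypothetical sketch. It assumes, purely for illustration, that each trainer decoder layer `i` holds both a `self_attn` and an `mlp` (MoE) submodule, and that the original layout stores them as separate, alternating entries (attention at index `2*i`, MoE at `2*i + 1`). The key names, the even/odd convention, and the helper `to_alternating_layout` are assumptions, not the contents of the PR's `vllm_postprocessing.py`.

```python
import re

# Hypothetical key pattern for a fused decoder layer in the trainer's layout,
# e.g. "model.layers.3.self_attn.q_proj.weight" or "model.layers.3.mlp.router.weight".
_FUSED_KEY = re.compile(r"^model\.layers\.(\d+)\.(self_attn|mlp)\.(.+)$")


def to_alternating_layout(state_dict: dict) -> dict:
    """Sketch: split fused layer i into alternating layers 2*i (attention) and 2*i + 1 (MoE).

    The key names and the even/odd convention are illustrative assumptions,
    not the actual ZAYA1 checkpoint format.
    """
    out = {}
    for key, tensor in state_dict.items():
        match = _FUSED_KEY.match(key)
        if match is None:
            # This sketch only remaps self_attn and mlp keys; every other key
            # (embeddings, per-layer norms, final norm, lm_head, ...) is copied unchanged.
            out[key] = tensor
            continue
        layer_idx, submodule, rest = int(match.group(1)), match.group(2), match.group(3)
        new_idx = 2 * layer_idx if submodule == "self_attn" else 2 * layer_idx + 1
        out[f"model.layers.{new_idx}.{submodule}.{rest}"] = tensor
    return out
```

Only the keys are renamed; the tensor objects are passed through by reference, so the remap itself does not copy any weights.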

Sanity checks

Formatting checks: uv run ruff format
Linting checks: uv run ruff check --fix

I ran RL sanity checks on Zaya with:

reverse-text
hendrycks-math

Reverse Text

Hendrycks Math

Tests

uv run scripts/mini_moe.py --arch zaya --output-dir ./mini_zaya

uv run pytest tests/unit/train/models/test_zaya.py

Notes

- Once vLLM officially ports the logic from the HF PR, much of the code in this PR can be simplified. Right now the custom conversion and vLLM weight broadcast step is slow because the conversion is expensive; it can be removed once vLLM and HF share the same implementation. For example, the changes in the following files can all be removed:
  - src/prime_rl/trainer/models/zaya/vllm_postprocessing.py
  - src/prime_rl/trainer/rl/broadcast/filesystem.py
  - src/prime_rl/trainer/rl/broadcast/nccl.py
  - Environment reverts: uv.lock, pyproject.toml
- This PR supports the 8B ZAYA1 model specifically, so sliding-window attention (SWA) is not supported yet.
- tests/unit/train/models/test_qwen3_5_moe.py::test_qwen3_5_moe_cp_patching was changed so that it resets the flash attention method. Previously it was the last model test to run, so the leftover patch did not affect anything. Zaya also calls FlashAttention, so without this change the Zaya tests would call the patched FlashAttention. (Resolved)
- ~~Changes made to src/prime_rl/inference/patches.py reflect latest changes with VLLM v0.21 which implements dual phase pausing, implementing it in this PR as well.~~ ([codex] chore: bump vllm to 0.21.0 #2519 Resolved)
- tests/unit/train/models/test_zaya.py::test_zaya runs full round-trip forward and backward tests against the actual HF model. This is slow, and since other models don't have such a round-trip test, it can be removed if needed.
- In the CCA module, we cannot simply use all-to-all for value_delayed: in this config, value_delayed = num_key_value_heads / 2 = 1, so it cannot be sharded across cp_size.
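The divisibility constraint behind the value_delayed note can be sketched as follows. In a Ulysses-style all-to-all, each of the `cp_size` ranks trades its sequence shard for a shard of `num_heads / cp_size` heads, so the head count must divide evenly by `cp_size`. The helper `exchange_strategy`, the all-gather fallback, and `cp_size = 2` are illustrative assumptions; the PR does not say which fallback its CCA implementation uses. The head counts follow the note's own arithmetic (value_delayed = num_key_value_heads / 2 = 1 implies num_key_value_heads = 2).

```python
def exchange_strategy(num_heads: int, cp_size: int) -> str:
    """Pick a context-parallel exchange for one attention tensor (hypothetical sketch).

    Ulysses-style all-to-all swaps the sequence shard for a head shard, so each
    of the cp_size ranks must receive a whole number of heads.
    """
    if num_heads <= 0 or cp_size <= 0:
        raise ValueError("num_heads and cp_size must be positive")
    if num_heads % cp_size == 0:
        return "all_to_all"
    # Assumed fallback: gather the full sequence so every rank keeps all heads.
    return "all_gather"


num_key_value_heads = 2  # implied by value_delayed = num_key_value_heads / 2 = 1
value_delayed_heads = num_key_value_heads // 2
cp_size = 2  # illustrative context-parallel degree

print(exchange_strategy(num_key_value_heads, cp_size))  # all_to_all
print(exchange_strategy(value_delayed_heads, cp_size))  # all_gather
```

The fallback trades memory (each rank holds the full sequence for that one head) for not needing the head count to be divisible by `cp_size`.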


nreHieW and others added 25 commits May 14, 2026 15:07

- initial zaya commit (efd0cc0)
- vllm broadcast (b9dfa0c)
- hf parity grouped gemm spoilt (f3139d9)
- working but cleaning required (25aab19)
- add context parallel (67c7ff1)
- correctness (5e5fe63)
- stash (d3eec20)
- correct HF (338236d)
- correctenss both residual_in_fp32 (7890a68)
- comprehensive tests (1ae4501)
- match b315ae0 (a54ccd4)
- tests utils (375748b)
- Merge upstream main into feat/zaya (bd506c4)
- working (2b9d0e5)
- merge (511673d)
- Revert verifiers submodule update (e24096c)
- .toml (1279ddd)
- vectorize packing cca (37a0970)
- format (88b6449)
- pin revisions for transformers (3896e58)
- Merge branch 'main' into feat/zaya (304e47d)
- Merge branch 'main' into feat/zaya (be2fdad)
- revert flash attn after Qwen3.5 tests (8945d4a)
- Merge upstream/main into feat/zaya (e627037)
  Keeps fork PR branch current with PrimeIntellect-ai/prime-rl main.
- Merge branch 'feat/zaya' of https://github.com/nreHieW/prime-rl into …feat/zaya (0b96b29)

nreHieW mentioned this pull request May 18, 2026 in PrimeIntellect-ai/renderers#51, "Feat: Add renderer for Zyphra/ZAYA1-8B" (Draft).

nreHieW and others added 2 commits May 19, 2026 09:36

- Merge branch 'main' into feat/zaya (5d6ad54)
  Signed-off-by: nrehiew <81154837+nreHieW@users.noreply.github.com>
- chore(inference): align patches.py with upstream main (44b88a6)
  Co-authored-by: Cursor <cursoragent@cursor.com>

nreHieW wants to merge 27 commits into PrimeIntellect-ai:main from nreHieW:feat/zaya
