Feat: Zyphra/ZAYA1-8B #2529
Draft
nreHieW wants to merge 27 commits into
Keeps fork PR branch current with PrimeIntellect-ai/prime-rl main.
Signed-off-by: nrehiew <81154837+nreHieW@users.noreply.github.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
[WIP until both VLLM and Transformers merge support for Zaya1-8B upstream]
Adds PrimeRL custom model support for Zaya1-8B.
Summary
Sanity checks
- `uv run ruff format`
- `uv run ruff check --fix`

I ran RL sanity checks on Zaya with:
- Reverse Text
- Hendrycks Math
Tests
- `uv run scripts/mini_moe.py --arch zaya --output-dir ./mini_zaya`
- `uv run pytest tests/unit/train/models/test_zaya.py`

Notes
- `src/prime_rl/trainer/models/zaya/vllm_postprocessing.py`
- `src/prime_rl/trainer/rl/broadcast/filesystem.py`
- `src/prime_rl/trainer/rl/broadcast/nccl.py`
- `uv.lock`, `pyproject.toml`
- (Resolved) Changes made to `tests/unit/train/models/test_qwen3_5_moe.py::test_qwen3_5_moe_cp_patching` to ensure that it resets the flash attention method. Previously, it was the last model test to run, so the leftover patch did not affect anything. Zaya calls FlashAttention, so without this change Zaya would be calling the patched FlashAttention.
- (Resolved; [codex] chore: bump vllm to 0.21.0 #2519) Changes made to `src/prime_rl/inference/patches.py` reflect the latest changes in vLLM v0.21, which implements dual-phase pausing; this PR implements it as well.
- `tests/unit/train/models/test_zaya.py::test_zaya` does full roundtrip forward and backward tests with the actual HF model. This is slow, and since other models don't do such a roundtrip test, we can remove it if necessary.
- `CCA` module: we cannot just do all-to-all for `value_delayed`. In this config, `value_delayed = num_key_value_heads / 2 = 1`, so it cannot be sharded over `cp_size`.
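
To illustrate the `value_delayed` constraint in the last note, here is a minimal sketch of the divisibility check behind it. It assumes the all-to-all in question shards attention heads across context-parallel ranks, so each rank must receive a whole number of heads. `can_shard_heads` is a hypothetical helper, not a function from this PR, and `num_key_value_heads = 2` is inferred from the arithmetic in the note rather than read from the actual config.

```python
def can_shard_heads(num_heads: int, cp_size: int) -> bool:
    """Return True if num_heads can be split evenly across cp_size ranks.

    Assumption: head-sharding all-to-all needs at least one whole head
    per context-parallel rank.
    """
    return num_heads > 0 and num_heads % cp_size == 0


# Inferred from the note: value_delayed = num_key_value_heads / 2 = 1
num_key_value_heads = 2
value_delayed_heads = num_key_value_heads // 2

for cp_size in (1, 2, 4):
    ok = can_shard_heads(value_delayed_heads, cp_size)
    print(f"cp_size={cp_size}: value_delayed shardable = {ok}")
```

Under these assumptions, a single `value_delayed` head passes the check only for `cp_size = 1`, which matches the note that it cannot go through the same all-to-all path once context parallelism is enabled.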