Add indexCache training support #2541

Open
faresobeid wants to merge 4 commits into main from indexcache-train

@faresobeid faresobeid commented May 18, 2026

IndexCache can be enabled with the following config:

[trainer.model]
impl = "custom"
use_index_cache = true
index_topk_freq = 4

[inference.vllm_extra.hf_overrides]
use_index_cache = true
index_topk_freq = 4

Note

Medium Risk
Touches core attention forward paths and introduces cross-layer state (cached indices), which could affect correctness/performance if misconfigured or if assumptions about layer scheduling break.

Overview
Enables DSA IndexCache in training by adding use_index_cache, index_topk_freq, and optional index_topk_pattern to the trainer ModelConfig and propagating them into the loaded HF model_config.
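The propagation step described above could look roughly like the sketch below. It is only an illustration: `TrainerModelConfig`, `apply_index_cache_overrides`, the default values, and the use of `SimpleNamespace` as a stand-in for the loaded HF config are all assumptions, not code from this PR. Only the three field names come from the description.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Optional


@dataclass
class TrainerModelConfig:
    """Hypothetical stand-in for the trainer ModelConfig (defaults are assumed)."""
    use_index_cache: bool = False
    index_topk_freq: int = 1
    # The PR does not state the type of this field, so it is left as Any here.
    index_topk_pattern: Optional[Any] = None


def apply_index_cache_overrides(trainer_cfg: TrainerModelConfig, hf_config: Any) -> Any:
    """Copy the IndexCache fields onto an already loaded model config object."""
    hf_config.use_index_cache = trainer_cfg.use_index_cache
    hf_config.index_topk_freq = trainer_cfg.index_topk_freq
    # Only set the optional pattern when the user supplied one.
    if trainer_cfg.index_topk_pattern is not None:
        hf_config.index_topk_pattern = trainer_cfg.index_topk_pattern
    return hf_config


# SimpleNamespace stands in for the HF model_config in this sketch.
hf_config = apply_index_cache_overrides(
    TrainerModelConfig(use_index_cache=True, index_topk_freq=4),
    SimpleNamespace(),
)
```

With the values from the config in the PR description, `hf_config` ends up with `use_index_cache = True` and `index_topk_freq = 4`, and no `index_topk_pattern` attribute.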

Updates the custom glm_moe_dsa implementation so decoder layers can reuse sparse attention top-k indices across layers: attention/decoder forwards now thread a cached_indices tensor through the stack, and a new per-layer skip policy (_index_cache_skip_topk) controls when indices are recomputed vs reused.
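As a rough illustration of the reuse scheme described above, the sketch below threads a cached index list through a stack of layers and recomputes it only on some of them. The rule "recompute on every `freq`-th layer, starting at layer 0" is an assumption made for this example; the actual `_index_cache_skip_topk` policy in the PR may differ, and `skip_topk_sketch`, `run_layers`, and `compute_topk` are hypothetical names. Plain Python lists stand in for index tensors.

```python
from typing import Callable, List, Optional, Tuple


def skip_topk_sketch(layer_idx: int, freq: int) -> bool:
    """Return True when a layer should reuse cached indices (assumed policy)."""
    # Assumption: layers 0, freq, 2*freq, ... recompute; all others reuse.
    return layer_idx % freq != 0


def run_layers(
    num_layers: int,
    freq: int,
    compute_topk: Callable[[int], List[int]],
) -> List[Tuple[int, str, List[int]]]:
    """Thread cached top-k indices through the stack and record what each layer did."""
    cached_indices: Optional[List[int]] = None
    trace: List[Tuple[int, str, List[int]]] = []
    for layer_idx in range(num_layers):
        # Always compute when nothing is cached yet, even if the policy says skip.
        if cached_indices is None or not skip_topk_sketch(layer_idx, freq):
            cached_indices = compute_topk(layer_idx)
            action = "recompute"
        else:
            action = "reuse"
        trace.append((layer_idx, action, cached_indices))
    return trace


# Toy "indexer" that tags the indices with the layer that produced them.
trace = run_layers(num_layers=8, freq=4, compute_topk=lambda i: [i])
```

Under this assumed policy with `index_topk_freq = 4`, layers 0 and 4 run the indexer and layers 1 to 3 and 5 to 7 reuse the most recent result, which is where the cross-layer state mentioned in the risk note comes from.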

Reviewed by Cursor Bugbot for commit 563b77f.

Comment thread src/prime_rl/trainer/models/glm_moe_dsa/sparse_mla_attention.py Outdated
cursoragent and others added 2 commits May 18, 2026 14:31
Co-authored-by: faresobeid <faresobeid@users.noreply.github.com>
Signed-off-by: faresobeid <111092724+faresobeid@users.noreply.github.com>
Comment thread src/prime_rl/trainer/models/glm_moe_dsa/modeling_glm_moe_dsa.py Outdated
@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Comment thread src/prime_rl/trainer/models/glm_moe_dsa/sparse_mla_attention.py

2 participants