
Add GLM5.2 DP1/EP8 load-weight slice #476

Merged
xiaguan merged 1 commit into main from feat/glm52-load-weights-dp1-ep8 on Jun 30, 2026

@xiaguan xiaguan commented Jun 30, 2026

Summary

  • Add a GLM5.2 load-weight-only crate and binary for the DP1/EP8 shape.
  • Rank 0 loads the non-expert tensors plus experts 0..31; ranks 1..7 each load 32 routed experts.
  • Keep generation fail-closed until forward/decode lands.
  • Wire the optional glm52 server feature and model detection without pulling in PP/decode/MTP runtime or new kernels.
  • Optimize load time with rank-local GPU slabs, coalesced H2D copies, and CUDA-event mmap lifetime guards including error-path cleanup.
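
The rank-to-expert split described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the names `EP_SIZE`, `EXPERTS_PER_RANK`, `expert_range_for_rank`, and `loads_non_expert_tensors` are hypothetical, and the contiguous assignment for ranks 1..7 is an assumption extrapolated from rank 0 owning experts 0..31.

```rust
use std::ops::Range;

/// Hypothetical constants for the DP1/EP8 shape: 8 expert-parallel ranks,
/// 32 routed experts per rank.
const EP_SIZE: usize = 8;
const EXPERTS_PER_RANK: usize = 32;

/// Returns the half-open range of routed expert indices owned by `rank`,
/// assuming experts are assigned to ranks in contiguous blocks.
/// Returns `None` for a rank outside the EP group.
fn expert_range_for_rank(rank: usize) -> Option<Range<usize>> {
    if rank >= EP_SIZE {
        return None;
    }
    let start = rank * EXPERTS_PER_RANK;
    Some(start..start + EXPERTS_PER_RANK)
}

/// In this shape only rank 0 also loads the non-expert tensors.
fn loads_non_expert_tensors(rank: usize) -> bool {
    rank == 0
}
```

Under these assumptions rank 0 maps to `0..32` (experts 0 through 31) and rank 7 to `224..256`, so the eight ranks together cover 256 routed experts.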

Performance

Measured on jz-38, 8x H200, real /data/models/GLM-5.2-FP8 checkpoint:

  • Baseline per-tensor CudaSlice load: 81622 ms
  • Current implementation, first run: 63420 ms (about 22% less than baseline)
  • Current implementation, immediate repeat: 50803 ms (about 38% less than baseline)

Validation

  • cargo fmt --all --check
  • git diff --check
  • cargo check -p openinfer-glm52
  • cargo check -p openinfer-server --features glm52
  • cargo check -p openinfer-server --no-default-features --features glm52
  • cargo test -p openinfer-glm52 --lib
  • cargo test -p openinfer-server --no-default-features --features glm52 glm52_
  • cargo build --release -p openinfer-glm52 --bin glm52_load_weights on jz-38
  • ./target/release/glm52_load_weights --model-path /data/models/GLM-5.2-FP8 --tp-size 1 --dp-size 1 on jz-38, twice

Notes

  • This PR intentionally excludes forward/decode/PP/MTP runtime and kernel additions.
  • The load path stores raw resident weight bytes plus tensor offsets; runtime layout packing belongs to a later forward PR.
  • Error-path mmap lifetime is guarded by RAII plus stream sync; happy path uses per-shard CUDA events and a bounded live mapping ring.
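
The bounded live mapping ring mentioned above could look roughly like the host-only sketch below. It is an illustration under stated assumptions, not the implementation in this PR: `CopyDone` is a hypothetical stand-in for a CUDA event recorded after a shard's H2D copies, `M` stands in for an mmap guard, and `LiveMappingRing` is an invented name. The `Drop` impl here simply waits on every remaining event, which simplifies the RAII plus stream sync error path described in the notes.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a CUDA event recorded after a shard's copies.
trait CopyDone {
    /// Blocks until the copies that read from the mapping have finished.
    fn wait(&self);
}

/// Keeps at most `capacity` shard mappings alive at once.
struct LiveMappingRing<M, E: CopyDone> {
    capacity: usize,
    live: VecDeque<(M, E)>,
}

impl<M, E: CopyDone> LiveMappingRing<M, E> {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "ring must hold at least one mapping");
        Self { capacity, live: VecDeque::with_capacity(capacity) }
    }

    /// Admits a new mapping; if the ring is full, the oldest mapping is
    /// retired first so the number of live mappings stays bounded.
    fn push(&mut self, mapping: M, done: E) {
        if self.live.len() == self.capacity {
            self.retire_oldest();
        }
        self.live.push_back((mapping, done));
    }

    /// Waits for the oldest shard's copies, then drops (unmaps) its mapping.
    fn retire_oldest(&mut self) {
        if let Some((mapping, done)) = self.live.pop_front() {
            done.wait(); // the device no longer reads these host pages
            drop(mapping); // only now is it safe to release the mapping
        }
    }

    fn len(&self) -> usize {
        self.live.len()
    }
}

impl<M, E: CopyDone> Drop for LiveMappingRing<M, E> {
    /// Retires every remaining mapping in order, waiting on each event.
    fn drop(&mut self) {
        while !self.live.is_empty() {
            self.retire_oldest();
        }
    }
}
```

The point of the design is ordering: a mapping is released only after the event covering its copies has completed, and the capacity bound caps how much of the checkpoint is mapped at any moment.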

@xiaguan xiaguan merged commit 70c55a5 into main Jun 30, 2026
1 check passed