feat(kernels): gate MoE third-party substrates by xiaguan · Pull Request #474 · openinfer-project/openinfer

xiaguan · 2026-06-30T09:48:07Z

Summary

add DeepGEMM and FlashMLA as kernel submodules and introduce the shared moe feature for DeepEP/DeepGEMM/FlashMLA substrates
add the narrow glm52 kernel surface for DeepGEMM scale layout/grouped FP8 contracts and FlashMLA SM90 sparse decode wrappers
update kernel docs and keep non-GLM52/router/indexer/PP/TRTLLM surfaces out until the model crate has stable callers
align bench_serving with the current Qwen3 engine signature so the existing pre-commit clippy gate passes

Verification

cargo fmt
OPENINFER_CUDA_SM=90 cargo check -p openinfer-kernels --no-default-features
OPENINFER_CUDA_SM=90 OPENINFER_NCCL_ROOT=/data/code/workspace-rustllm/ep-moe-demo/.venv/lib/python3.12/site-packages/nvidia/nccl cargo check -p openinfer-kernels --no-default-features --features moe
OPENINFER_CUDA_SM=90 OPENINFER_NCCL_ROOT=/data/code/workspace-rustllm/ep-moe-demo/.venv/lib/python3.12/site-packages/nvidia/nccl cargo check -p openinfer-kernels --no-default-features --features glm52
OPENINFER_CUDA_SM=90 OPENINFER_NCCL_ROOT=/data/code/workspace-rustllm/ep-moe-demo/.venv/lib/python3.12/site-packages/nvidia/nccl cargo check -p openinfer-kernels --no-default-features --features kimi-k2
OPENINFER_CUDA_SM=90 cargo check --release -p openinfer-server --bin bench_serving
pre-commit hooks via OPENINFER_CUDA_SM=90 OPENINFER_NCCL_ROOT=/data/code/workspace-rustllm/ep-moe-demo/.venv/lib/python3.12/site-packages/nvidia/nccl git commit ...

Notes

GLM5.2 FlashMLA sparse decode is SM90-only, pins the V32 topk to 2048, and intentionally does not expose a dynamic topk_length.
The grouped DeepGEMM compute entry is fail-closed with CUDA_ERROR_NOT_SUPPORTED until a real DeepGEMM runner is wired.
Upstream FlashMLA CHECK_CUDA / FLASH_ASSERT paths can still terminate the process on internal CUDA/launch failures; this PR only converts C++ assertion exceptions at the FFI boundary into CUresult.
No performance win is claimed; this is feature-gated substrate/API bring-up, so no A/B benchmark is attached.
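The fail-closed behaviour described in the notes above can be illustrated with a minimal Rust sketch. This is not the code from this PR: `CuResult`, `GroupedFp8Args`, and `grouped_fp8_gemm` are hypothetical names chosen for illustration, and the argument validation is an assumption. Only the numeric values of `CUDA_SUCCESS` (0), `CUDA_ERROR_INVALID_VALUE` (1), and `CUDA_ERROR_NOT_SUPPORTED` (801) are taken from the CUDA driver API.

```rust
/// Stand-in for the CUDA driver's `CUresult` (hypothetical type; only the
/// values this sketch needs, with their driver API numeric codes).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
pub enum CuResult {
    Success = 0,        // CUDA_SUCCESS
    InvalidValue = 1,   // CUDA_ERROR_INVALID_VALUE
    NotSupported = 801, // CUDA_ERROR_NOT_SUPPORTED
}

/// Hypothetical argument bundle for a grouped FP8 GEMM call.
pub struct GroupedFp8Args {
    pub num_groups: usize,
    pub m: usize,
    pub n: usize,
    pub k: usize,
}

/// Fail-closed entry point: arguments are validated, but no kernel is ever
/// launched. Until a real runner is wired in, every well-formed call reports
/// "not supported" instead of silently returning success.
pub fn grouped_fp8_gemm(args: &GroupedFp8Args) -> CuResult {
    // Assumed validation step: reject degenerate shapes up front so that
    // shape bugs in callers surface even while the runner is missing.
    if args.num_groups == 0 || args.m == 0 || args.n == 0 || args.k == 0 {
        return CuResult::InvalidValue;
    }
    // Fail closed: there is no compute path behind this entry yet.
    CuResult::NotSupported
}
```

With this shape, a caller that checks the result for `CuResult::Success` cannot mistake the unwired entry for a completed GEMM, which is the point of failing closed.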

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7ee9da3705


chatgpt-codex-connector · 2026-06-30T09:55:06Z

+# Shared MoE/MLA third-party substrate: DeepEP, DeepGEMM, and FlashMLA.
+moe = []
+glm52 = ["moe"]
+kimi-k2 = ["moe"]


Avoid making Kimi require unused GLM submodules

When building with --features kimi-k2, this feature edge now enables moe, and build.rs::require_moe_submodules asserts that both DeepGEMM and FlashMLA are checked out before any Kimi CUDA is compiled. The Kimi/DeepEP paths do not include those repositories, so existing Kimi-only checkouts that initialized only DeepEP will fail even though the model does not use the GLM5.2 wrappers. Please split the DeepEP substrate from the GLM5.2 DeepGEMM/FlashMLA checks, or require those checks only under glm52.
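One way the suggested split could look in a build script is sketched below. It is an illustration under assumptions, not the repository's build.rs: the function names, the submodule directory names, and the mapping (glm52 needs DeepEP, DeepGEMM, and FlashMLA; kimi-k2 needs only DeepEP) are hypothetical. The one piece taken from Cargo itself is that build scripts see each enabled feature as a `CARGO_FEATURE_<NAME>` environment variable, with the name uppercased and `-` replaced by `_`.

```rust
use std::path::Path;

/// True if Cargo enabled `feature` for this build. Cargo exposes features to
/// build scripts as CARGO_FEATURE_<NAME> (uppercased, '-' becomes '_').
pub fn feature_enabled(feature: &str) -> bool {
    let key = format!("CARGO_FEATURE_{}", feature.to_uppercase().replace('-', "_"));
    std::env::var_os(key).is_some()
}

/// Assumed mapping from model features to third-party checkouts:
/// both models share DeepEP, only glm52 needs DeepGEMM and FlashMLA.
pub fn required_submodules(glm52: bool, kimi_k2: bool) -> Vec<&'static str> {
    let mut required = Vec::new();
    if glm52 || kimi_k2 {
        required.push("DeepEP");
    }
    if glm52 {
        required.push("DeepGEMM");
        required.push("FlashMLA");
    }
    required
}

/// Names from `required` whose directory under `root` is missing or empty.
/// An uninitialized git submodule is typically an empty directory, so an
/// existence check alone would not be enough.
pub fn missing_submodules<'a>(root: &Path, required: &[&'a str]) -> Vec<&'a str> {
    required
        .iter()
        .copied()
        .filter(|name| {
            let populated = std::fs::read_dir(root.join(*name))
                .map(|mut entries| entries.next().is_some())
                .unwrap_or(false);
            !populated
        })
        .collect()
}
```

A build script could then call `required_submodules(feature_enabled("glm52"), feature_enabled("kimi-k2"))` and panic with the list returned by `missing_submodules`, so a Kimi-only checkout is never asked for DeepGEMM or FlashMLA.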


feat(kernels): gate moe third-party substrates

7ee9da3

xiaguan merged commit 1c71fee into main Jun 30, 2026
1 check passed

chatgpt-codex-connector Bot reviewed Jun 30, 2026

xiaguan merged 1 commit into main from feat/moe-third-party-gate
