docs: update DeepSeek README status section #348
Open
mvanhorn wants to merge 1 commit into
CAICAIIs
requested changes
Jun 11, 2026
DeepSeek V4 support is intentionally narrower than the Qwen paths in the initial PR: it requires `--features deepseek-v4`, uses CUDA devices `0..7`, serves greedy requests only, terminates unsupported logprobs and non-greedy sampling requests with an explicit `stop_reason`, and does not use CUDA Graph yet.
DeepSeek support is intentionally narrower than the Qwen paths:

- **DeepSeek-V4-Flash** requires `--features deepseek-v4`, the 8-GPU MP8 checkpoint, and TileLang at build time. The current OpenAI-compatible path is a single-request greedy smoke/direct regression path: unsupported logprobs and non-greedy sampling requests terminate with an explicit `stop_reason`; bs>1 serving, continuous batching, service-level KV management, and CUDA Graph remain follow-up. Evidence: [`support.md`](docs/models/deepseek-v4/support.md), [`serving-baseline.md`](docs/models/deepseek-v4/serving-baseline.md), and [`decode-performance.md`](docs/models/deepseek-v4/decode-performance.md).
Collaborator
README.md:161 still describes DeepSeek-V4 as a single-request path and says bs>1 serving remains follow-up. That is stale for the current tree: the DSV4 scheduler now has the HTTP active-set decode path wired, and docs/models/deepseek-v4/online-throughput.md records active set 2 / decode batch 2 evidence with caveats. Please update this bullet to describe the limited active-set decode state and link online-throughput.md or http-serving-benchmark.md, while keeping continuous batching, multi-request prefill, service-level KV, and CUDA Graph as follow-up work.
Summary
Updates the README's DeepSeek coverage so both supported lines are accurately described: the Supported Models table rows are refreshed, and the status paragraph now covers DeepSeek-V4-Flash (`--features deepseek-v4`, 8-GPU MP8 checkpoint, TileLang build requirement, greedy-only serving with explicit `stop_reason` for unsupported parameters, no CUDA Graph yet) and DeepSeek-V2-Lite (`--features deepseek-v2-lite`, 2-GPU EP path, current correctness-gate status). Each claim links to the measured-evidence docs under `docs/models/deepseek-v4/` and `docs/models/deepseek-v2-lite/`.
Why this matters
#329 (split from the README rework tracker #122) asks that the DeepSeek section let users understand the supported model lines, feature flags, hardware expectations, serving status, and where the performance evidence lives. The current README covers only the V4 line in its status paragraph, has no V2-Lite status text, and links none of the evidence docs. Every claim in this update is sourced from the in-repo status/gate docs, and the V2-Lite text deliberately stays within the status ledger's claim boundaries (host-staged vs NCCL) rather than overclaiming production continuous batching.
Testing
Docs-only change. All referenced doc paths (`docs/models/deepseek-v4/support.md`, `serving-baseline.md`, `decode-performance.md`, `docs/models/deepseek-v2-lite/status.md`, `hf-accuracy-gate.md`) exist in the tree; feature-flag names match the per-crate `Cargo.toml` definitions.
Fixes #329
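For reviewers who want to re-check the path claim under Testing, a minimal sketch of such a check might look like the following. It is not part of this PR. The helper name `missing_docs` is hypothetical, and the directory prefixes for the bare filenames (in particular `hf-accuracy-gate.md` under `docs/models/deepseek-v2-lite/`) are assumed from the order in which the paths are listed.

```python
from pathlib import Path

# Doc paths named in the Testing section. Directory prefixes for the bare
# filenames are an assumption based on the directory listed just before them.
DOC_PATHS = [
    "docs/models/deepseek-v4/support.md",
    "docs/models/deepseek-v4/serving-baseline.md",
    "docs/models/deepseek-v4/decode-performance.md",
    "docs/models/deepseek-v2-lite/status.md",
    "docs/models/deepseek-v2-lite/hf-accuracy-gate.md",
]


def missing_docs(repo_root, paths=DOC_PATHS):
    """Return the entries of `paths` that are not regular files under repo_root."""
    root = Path(repo_root)
    return [p for p in paths if not (root / p).is_file()]
```

Calling `missing_docs(".")` from the repository root should return an empty list if every referenced doc is present; any returned entry is a path the README would link to but that does not exist in the checkout.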