
docs: update DeepSeek README status section #348

Open
mvanhorn wants to merge 1 commit into openinfer-project:main from mvanhorn:fix/329-readme-deepseek-status


@mvanhorn

Contributor

Summary

Updates the README's DeepSeek coverage so both supported model lines are described accurately. The Supported Models table rows are refreshed, and the status paragraph now covers:

- DeepSeek-V4-Flash: --features deepseek-v4, the 8-GPU MP8 checkpoint, the TileLang build requirement, greedy-only serving with an explicit stop_reason for unsupported parameters, and no CUDA Graph yet.
- DeepSeek-V2-Lite: --features deepseek-v2-lite, the 2-GPU EP path, and the current correctness-gate status.

Each claim links to the measured-evidence docs under docs/models/deepseek-v4/ and docs/models/deepseek-v2-lite/.

Why this matters

#329 (split from the README rework tracker #122) asks that the DeepSeek section let users understand the supported model lines, feature flags, hardware expectations, serving status, and where the performance evidence lives. The current README covers only the V4 line in its status paragraph, has no V2-Lite status text, and links none of the evidence docs. Every claim in this update is sourced from the in-repo status/gate docs, and the V2-Lite text deliberately stays within the status ledger's claim boundaries (host-staged vs NCCL) rather than overclaiming production continuous batching.

Testing

Docs-only change. All referenced doc paths (docs/models/deepseek-v4/support.md, serving-baseline.md, decode-performance.md, docs/models/deepseek-v2-lite/status.md, hf-accuracy-gate.md) exist in the tree; feature-flag names match the per-crate Cargo.toml definitions.

Fixes #329

Review thread on README.md (diff hunk: the old status line, followed by its proposed replacement):
DeepSeek V4 support is intentionally narrower than the Qwen paths in the initial PR: it requires `--features deepseek-v4`, uses CUDA devices `0..7`, serves greedy requests only, terminates unsupported logprobs and non-greedy sampling requests with an explicit `stop_reason`, and does not use CUDA Graph yet.
DeepSeek support is intentionally narrower than the Qwen paths:

- **DeepSeek-V4-Flash** requires `--features deepseek-v4`, the 8-GPU MP8 checkpoint, and TileLang at build time. The current OpenAI-compatible path is a single-request greedy smoke/direct regression path: unsupported logprobs and non-greedy sampling requests terminate with an explicit `stop_reason`; bs>1 serving, continuous batching, service-level KV management, and CUDA Graph remain follow-up. Evidence: [`support.md`](docs/models/deepseek-v4/support.md), [`serving-baseline.md`](docs/models/deepseek-v4/serving-baseline.md), and [`decode-performance.md`](docs/models/deepseek-v4/decode-performance.md).

Collaborator


README.md:161 still describes DeepSeek-V4 as a single-request path and says bs>1 serving remains follow-up. That is stale for the current tree: the DSV4 scheduler now has the HTTP active-set decode path wired, and docs/models/deepseek-v4/online-throughput.md records evidence, with caveats, for an active set of 2 and a decode batch of 2. Please update this bullet to describe the limited active-set decode state and link online-throughput.md or http-serving-benchmark.md, while keeping continuous batching, multi-request prefill, service-level KV, and CUDA Graph as follow-up work.


Labels: None yet

Projects: None yet

Development: successfully merging this pull request may close these issues:

docs(readme): update DeepSeek README status

2 participants