[Feature] enable guided decoding ENABLE_V1_KVCACHE_SCHEDULER = 1 #5140
base: develop
Pull request overview
This PR enables guided decoding support when ENABLE_V1_KVCACHE_SCHEDULER=1. Previously, enabling guided decoding would automatically force the V1 KVCache scheduler to be disabled. This change removes that limitation and adds the necessary logic to support guided decoding with the V1 scheduler.
Key changes:
- Removed automatic fallback logic that disabled V1 scheduler when guided decoding was enabled
- Added guided decoding backend initialization in V1 scheduler's prefill phase
- Enhanced V1 scheduler to support chunked prefill with guided decoding and improved prefill/decode phase separation logic
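The effect of removing the fallback can be pictured with a small before/after comparison. This is only a sketch under stated assumptions, not the code from worker_process.py or args_utils.py: the function names and the two boolean parameters are hypothetical and stand in for "the user requested ENABLE_V1_KVCACHE_SCHEDULER=1" and "guided decoding is enabled".

```python
def effective_v1_before(v1_requested: bool, guided_decoding_enabled: bool) -> bool:
    """Illustrative old behaviour: guided decoding silently forced the V1 scheduler off."""
    if guided_decoding_enabled:
        return False
    return v1_requested


def effective_v1_after(v1_requested: bool, guided_decoding_enabled: bool) -> bool:
    """Illustrative new behaviour: the user's setting is respected in both cases."""
    return v1_requested


# Hypothetical scenario: the user sets ENABLE_V1_KVCACHE_SCHEDULER=1 and enables guided decoding.
before = effective_v1_before(v1_requested=True, guided_decoding_enabled=True)
after = effective_v1_after(v1_requested=True, guided_decoding_enabled=True)
```

With both options switched on, `before` is `False` (the scheduler was downgraded) and `after` is `True`.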
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| fastdeploy/worker/worker_process.py | Removes the check that forced ENABLE_V1_KVCACHE_SCHEDULER to 0 when guided decoding was enabled |
| fastdeploy/engine/args_utils.py | Removes the duplicate check that disabled V1 scheduler for guided decoding |
| fastdeploy/worker/gpu_model_runner.py | Adds guided decoding initialization in insert_tasks_v1, implements prefill token extraction for decode phase in PD disaggregation, and enhances _get_p_done_idxs_gd to identify completed prefill phases with chunked prefill support |
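The idea behind identifying completed prefill phases under chunked prefill can be sketched as follows. This is not the body of _get_p_done_idxs_gd; the `Slot` dataclass, its fields, and `prefill_done_indices` are hypothetical names chosen for illustration. The assumption is that a prompt split into chunks has finished prefill only once the last chunk has been processed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    """Hypothetical per-request bookkeeping for one batch slot."""
    prompt_len: int    # total number of prompt tokens
    prefill_end: int   # prompt tokens processed so far, including the current chunk


def prefill_done_indices(batch: List[Optional[Slot]]) -> List[int]:
    """Return the batch indices whose prefill has fully completed.

    Empty slots (None) and requests that still have prompt chunks left are skipped.
    """
    return [
        idx
        for idx, slot in enumerate(batch)
        if slot is not None and slot.prefill_end >= slot.prompt_len
    ]


batch = [
    Slot(prompt_len=10, prefill_end=4),    # first chunk only, prefill still running
    Slot(prompt_len=10, prefill_end=10),   # last chunk processed
    None,                                  # unused slot
    Slot(prompt_len=3, prefill_end=3),     # short prompt, finished in a single chunk
]
done = prefill_done_indices(batch)
```

Here `done` contains indices 1 and 3; the request in slot 0 is still in the middle of its chunked prefill.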
Co-authored-by: Copilot <[email protected]>
Codecov Report
❌ Patch coverage is …

Additional details and impacted files

| Coverage Diff | develop | #5140 | +/- |
|---|---|---|---|
| Coverage | ? | 57.16% | |
| Files | ? | 317 | |
| Lines | ? | 38471 | |
| Branches | ? | 5774 | |
| Hits | ? | 21991 | |
| Misses | ? | 14705 | |
| Partials | ? | 1775 | |
Motivation
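Remove the previous logic that fell back to ENABLE_V1_KVCACHE_SCHEDULER=0 whenever guided decoding was enabled.
Support guided decoding together with the V1 scheduler.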
Modifications
Usage or Command
ENABLE_V1_KVCACHE_SCHEDULER=1
Usage is unchanged from before; see structured_outputs.md for details.
Accuracy Tests
The JSON format validation success-rate test for guided decoding passed.
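The PR does not describe its test harness. As a rough idea of what a JSON format validation success rate measures, the sketch below counts how many generated outputs parse as JSON. The function name `json_success_rate` and the sample strings are hypothetical and are not results from this PR.

```python
import json
from typing import Iterable


def json_success_rate(outputs: Iterable[str]) -> float:
    """Fraction of outputs that parse as valid JSON (0.0 for an empty input)."""
    outputs = list(outputs)
    if not outputs:
        return 0.0
    valid = 0
    for text in outputs:
        try:
            json.loads(text)
        except (json.JSONDecodeError, TypeError):
            continue  # malformed or non-string output counts as a failure
        valid += 1
    return valid / len(outputs)


# Made-up sample outputs: two valid JSON documents and two invalid ones.
samples = ['{"name": "a", "age": 1}', '[1, 2]', '{"name": ', 'not json']
rate = json_success_rate(samples)
```

A stricter check would also validate each parsed object against the schema supplied in the request; this sketch only tests that the output is well-formed.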
Checklist
- Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run pre-commit before commit.
- If the PR targets the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.