[Feature] enable guided decoding ENABLE_V1_KVCACHE_SCHEDULER = 1 #5140

ST-XX · 2025-11-20T06:30:23Z

Motivation

去掉以前的打开受限解码，就会回退ENABLE_V1_KVCACHE_SCHEDULER=0 的逻辑。
支持受限解码+ V1 调度

Modifications

去掉回退逻辑
V1调度添加受限解码后端初始化
V1调度逻辑的 P阶段完成识别逻辑、支持Chunk Prefill
V1调度 + PD 分离，D 阶段获取 Prefill token逻辑升级

Usage or Command

ENABLE_V1_KVCACHE_SCHEDULER=1
使用方式与以前一致，详见：structured_outputs.md

Accuracy Tests

受限解码 json 格式校验成功率测试通过

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2025-11-20T06:30:28Z

Thanks for your contribution!

fastdeploy/worker/gpu_model_runner.py

Copilot

Pull request overview

This PR enables guided decoding support when ENABLE_V1_KVCACHE_SCHEDULER=1. Previously, enabling guided decoding would automatically force the V1 KVCache scheduler to be disabled. This change removes that limitation and adds the necessary logic to support guided decoding with the V1 scheduler.

Key changes:

Removed automatic fallback logic that disabled V1 scheduler when guided decoding was enabled
Added guided decoding backend initialization in V1 scheduler's prefill phase
Enhanced V1 scheduler to support chunked prefill with guided decoding and improved prefill/decode phase separation logic

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File	Description
fastdeploy/worker/worker_process.py	Removes the check that forced `ENABLE_V1_KVCACHE_SCHEDULER` to 0 when guided decoding was enabled
fastdeploy/engine/args_utils.py	Removes the duplicate check that disabled V1 scheduler for guided decoding
fastdeploy/worker/gpu_model_runner.py	Adds guided decoding initialization in `insert_tasks_v1`, implements prefill token extraction for decode phase in PD disaggregation, and enhances `_get_p_done_idxs_gd` to identify completed prefill phases with chunked prefill support

fastdeploy/worker/gpu_model_runner.py

Co-authored-by: Copilot <[email protected]>

codecov-commenter · 2025-11-24T10:46:46Z

Codecov Report

❌ Patch coverage is 14.28571% with 18 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@5ff93d4). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
fastdeploy/worker/gpu_model_runner.py	14.28%	17 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #5140   +/-   ##
==========================================
  Coverage           ?   57.16%           
==========================================
  Files              ?      317           
  Lines              ?    38471           
  Branches           ?     5774           
==========================================
  Hits               ?    21991           
  Misses             ?    14705           
  Partials           ?     1775

Flag	Coverage Δ
GPU	`57.16% <14.28%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

enable guided decoding ENABLE_V1_KVCACHE_SCHEDULER = 1

9641069

ST-XX requested review from Jiang-Jia-Jun and kevincheng2 and removed request for kevincheng2 November 20, 2025 07:16

kevincheng2 reviewed Nov 20, 2025

View reviewed changes

fastdeploy/worker/gpu_model_runner.py Show resolved Hide resolved

ST-XX requested a review from kevincheng2 November 20, 2025 10:20

Jiang-Jia-Jun requested a review from Copilot November 24, 2025 08:45

Copilot started reviewing on behalf of Jiang-Jia-Jun November 24, 2025 08:46 View session

Copilot finished reviewing on behalf of Jiang-Jia-Jun November 24, 2025 08:47

Copilot AI reviewed Nov 24, 2025

View reviewed changes

fastdeploy/worker/gpu_model_runner.py Outdated Show resolved Hide resolved

ST-XX and others added 2 commits November 24, 2025 17:15

Apply suggestions from code review

295916b

Co-authored-by: Copilot <[email protected]>

Merge branch 'develop' into feature/guided_decoding_v1

cc4fe3f

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Feature] enable guided decoding ENABLE_V1_KVCACHE_SCHEDULER = 1 #5140

[Feature] enable guided decoding ENABLE_V1_KVCACHE_SCHEDULER = 1 #5140

ST-XX commented Nov 20, 2025 •

edited

Loading

Uh oh!

paddle-bot bot commented Nov 20, 2025

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

codecov-commenter commented Nov 24, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[Feature] enable guided decoding ENABLE_V1_KVCACHE_SCHEDULER = 1 #5140

Are you sure you want to change the base?

[Feature] enable guided decoding ENABLE_V1_KVCACHE_SCHEDULER = 1 #5140

Conversation

ST-XX commented Nov 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Modifications

Usage or Command

Accuracy Tests

Checklist

Uh oh!

paddle-bot bot commented Nov 20, 2025

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

codecov-commenter commented Nov 24, 2025

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ST-XX commented Nov 20, 2025 •

edited

Loading