@JohnLiu97Huawei
Contributor

This pull request updates disagg_worker.py so that distributed (disaggregated) inference requests carry key-value (KV) cache transfer parameters. When the worker is configured for KV transfer, both prefill and generation requests now include the parameters needed to enable remote decoding.

Enhancements for distributed inference:

  • In the _handle_request method of disagg_worker.py, adds logic that injects kv_transfer_params into sampling_params for both prefill and generation requests whenever kv_transfer_config is set, which enables remote decoding.
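For illustration, the injection step might look roughly like the sketch below. It is not the code from this PR: the SamplingParams stand-in, the inject_kv_transfer_params helper, the upstream_kv_params argument and the key names do_remote_decode / do_remote_prefill are all hypothetical, chosen only to show the pattern of "attach parameters only when a KV transfer config exists".

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class SamplingParams:
    # Hypothetical stand-in for the engine's sampling parameters.
    max_tokens: int = 16
    extra_args: Dict[str, Any] = field(default_factory=dict)


def inject_kv_transfer_params(
    sampling_params: SamplingParams,
    kv_transfer_config: Optional[Dict[str, Any]],
    is_prefill: bool,
    upstream_kv_params: Optional[Dict[str, Any]] = None,
) -> SamplingParams:
    """Attach kv_transfer_params only when the worker has a KV transfer config."""
    if kv_transfer_config is None:
        # No KV connector configured: leave the request untouched.
        return sampling_params
    if is_prefill:
        # Prefill (P) side: keep the KV cache available for a remote decoder.
        params: Dict[str, Any] = {"do_remote_decode": True, "do_remote_prefill": False}
    else:
        # Generation (D) side: start from whatever the P instance returned
        # (hypothetical fields such as a remote address), then mark that the
        # KV cache should be pulled from the remote prefill instance.
        params = dict(upstream_kv_params or {})
        params.update({"do_remote_decode": False, "do_remote_prefill": True})
    sampling_params.extra_args["kv_transfer_params"] = params
    return sampling_params


if __name__ == "__main__":
    config = {"connector": "p2p-mooncake"}  # hypothetical config shape
    prefill = inject_kv_transfer_params(SamplingParams(), config, is_prefill=True)
    print(prefill.extra_args["kv_transfer_params"])
```

Keeping the check on kv_transfer_config at the top means workers without a connector behave exactly as before, which is the property the PR description emphasizes.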

@JohnLiu97Huawei
Contributor Author

Self Test Result

Using the P2P Mooncake KV connector, inference runs successfully in a 1E 1P 1D deployment.

The node log contains the following line, which is evidence that the KV transfer succeeded:

"I20251219 15:17:36.256863 281473336216192 ascend_direct_transport.cpp:358] AscendDirectTransport register mem addr:0x12c607200000, length:290258944, location:*, mem type:0"

Adds a kv_transfer_params field to GenerationResponse and sets it for the P_INSTANCE server type. Request handling and response processing are updated to propagate kv_transfer_params, enabling remote decode and prefill control.
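A rough sketch of how such a field could be carried from the prefill response into the follow-up decode request is shown below. Everything here is hypothetical and for illustration only: the ServerType enum values, the GenerationResponse fields other than kv_transfer_params, and the build_response / make_decode_request helpers are not taken from the PR.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Optional


class ServerType(Enum):
    # Hypothetical roles for a 1E 1P 1D deployment.
    E_INSTANCE = "E"
    P_INSTANCE = "P"
    D_INSTANCE = "D"


@dataclass
class GenerationResponse:
    request_id: str
    text: str
    # Set only by prefill (P) instances so the caller can hand the
    # KV cache location over to the decode (D) instance.
    kv_transfer_params: Optional[Dict[str, Any]] = None


def build_response(
    request_id: str,
    text: str,
    server_type: ServerType,
    engine_kv_params: Optional[Dict[str, Any]],
) -> GenerationResponse:
    """Attach the engine's kv_transfer_params only on P instances."""
    kv_params = engine_kv_params if server_type is ServerType.P_INSTANCE else None
    return GenerationResponse(request_id=request_id, text=text, kv_transfer_params=kv_params)


def make_decode_request(request_id: str, prefill_response: GenerationResponse) -> Dict[str, Any]:
    """Build the follow-up D request, carrying over the P instance's params."""
    request: Dict[str, Any] = {"request_id": request_id}
    if prefill_response.kv_transfer_params is not None:
        # Copy so later changes on the D side do not mutate the P response.
        request["kv_transfer_params"] = dict(prefill_response.kv_transfer_params)
    return request
```

Restricting the field to P_INSTANCE responses keeps decode and encode responses unchanged, so only the prefill-to-decode hop pays for the extra metadata.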

It also fixes a bug where the D instance initialization ran twice.

Signed-off-by: John Liu BUAA <[email protected]>
zengchuang-hw merged commit a4b5ac8 into JiusiServe:main on Dec 19, 2025.
10 checks passed
JohnLiu97Huawei deleted the bugfix-p2p branch on December 20, 2025 02:11.