feat: add BLS decoupled response iterator's cancel() method for the request cancellation #398


Merged · 7 commits · Apr 8, 2025

Conversation

@richardhuo-nv (Contributor) commented Mar 24, 2025

What does the PR do?

This PR adds a cancel() method to the BLS decoupled response iterator, so that the stub process can cancel the Triton server inference request corresponding to that iterator once it has received enough responses from it.

Because each stub InferenceRequest object can create multiple BLS Triton server inference requests, putting cancel() on the response iterator makes it possible to cancel an individual request, rather than cancelling all requests generated from the same stub InferenceRequest object.

More details can be found in the changes to README.md.
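As an illustration of the intended semantics, here is a minimal, hypothetical sketch (not the Triton implementation): a mock iterator that models how cancel() stops further responses from a single decoupled request. In a real Triton Python model, the iterator would come from `pb_utils.InferenceRequest.exec(decoupled=True)`; the class and data below are invented for demonstration only.

```python
# Hypothetical mock of a BLS decoupled response iterator. In Triton, the real
# iterator is returned by pb_utils.InferenceRequest.exec(decoupled=True);
# cancel() cancels only the server-side request backing this iterator, not
# other requests created from the same stub InferenceRequest object.
class MockDecoupledResponseIterator:
    def __init__(self, responses):
        self._responses = iter(responses)
        self._cancelled = False

    def cancel(self):
        # Mark the underlying request as cancelled; no further responses
        # are delivered after this point.
        self._cancelled = True

    def __iter__(self):
        return self

    def __next__(self):
        if self._cancelled:
            raise StopIteration
        return next(self._responses)


# Consume responses until we have enough, then cancel the request.
consumed = []
it = MockDecoupledResponseIterator(range(100))
for response in it:
    consumed.append(response)
    if len(consumed) == 3:  # the stub got enough responses
        it.cancel()
print(len(consumed))  # → 3
```

The design point this models is the one stated above: cancellation is scoped to one response iterator (one server-side request), leaving sibling requests from the same stub InferenceRequest untouched.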

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type box below and add the matching label to the GitHub PR.

  • build
  • ci
  • docs
  • feat
  • fix
  • perf
  • refactor
  • revert
  • style
  • test

Related PRs:

triton-inference-server/server#8097

Where should the reviewer start?

The major changes are in the Python backend stub process, essentially the request executor.

Test plan:

  • CI Pipeline ID:
    25879687

Caveats:

Background

This feature is useful for stopping long inference requests, such as those from auto-generative large language models, which may run for an indeterminate amount of time and consume significant server resources.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

https://jirasw.nvidia.com/browse/DLIS-7831

Commits:

  • adjust the comments
  • title
  • add finished check
  • rename function
  • adjust
  • comment
  • comment
  • comment
  • fix
  • fix
  • fix
  • readme
  • fix comments
  • fix leak
@kthui (Contributor) left a comment:

Nice work! Left some minor comments on docs and a few questions.

@Tabrizian (Member) left a comment:

LGTM. Thanks!

@richardhuo-nv richardhuo-nv merged commit 7f21b67 into main Apr 8, 2025
3 checks passed
Labels
None yet

3 participants