forked from kubernetes-sigs/gateway-api-inference-extension
Promote Dev #122
Open: vMaroon wants to merge 68 commits into main from dev
+1,715 −130
- added configuration
Signed-off-by: Maroon Ayoub <[email protected]>
Signed-off-by: Maroon Ayoub <[email protected]>
Signed-off-by: Maroon Ayoub <[email protected]>
- configured maxscorepicker as default
Signed-off-by: Maroon Ayoub <[email protected]>
Signed-off-by: Maroon Ayoub <[email protected]>
Signed-off-by: Maroon Ayoub <[email protected]>
Implemented KVCacheAwareScorer
Signed-off-by: Jing Chen <[email protected]>
Update pod labels to match those supplied by ModelService
add log lines
- added tokenizer lib linking
- go package pulling from neuralmagic internal repo
feat: Add the invocation of the Post response plugins
Update the vLLM P2P deployment to support KV-cache and load scorers.
Signed-off-by: Kfir Toledo <[email protected]>
* Add P/D scheduler: it uses two schedulers, one for prefill and one for decode. The P/D scheduler is enabled by an environment variable value; the list of scorers and their weights is defined by environment variables. Delete pd-filter
* Remove unused variable
* Update README with environment variables relevant to the P/D scheduler
* Fix problem caused by merge
* Add documentation for PDScheduler.Schedule function
* Update names of prefill and decode filters to avoid spaces
* Update comment for prefill/decode filters
* Change IsPDEnabled to PDEnabled
* Fix typo in README
* Fix P/D scheduler behavior for short prompts
* Fix prefill/decode related text in README
* Remove redundant creation of prefill/decode filters and make promptLengthThreshold local
* Add function for schedulerContext creation
* Fixes in README
* Fix compilation problem
* Add P/D scheduler test
* Add postResponse plugins array to prefill and decode config
* Fix comment in test
* Fix P/D scheduler test
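The P/D scheduler commit describes two inner schedulers (prefill and decode), each driven by a weighted list of scorers, with a max-score picker configured as the default and a prompt-length threshold that changes behavior for short prompts. The sketch below illustrates that flow under stated assumptions: `Pod`, `Scorer`, `WeightedScorer`, `pickMax`, and `schedulePD` are hypothetical names, not the fork's API; weights are passed directly instead of read from environment variables; and prompts shorter than the threshold are assumed to skip prefill and go to a decode pod only.

```go
package main

import (
	"fmt"
	"math"
)

// Pod is a hypothetical stand-in for a candidate model-server pod.
type Pod struct {
	Name       string
	KVCacheHit float64 // hypothetical: fraction of the prompt already cached on the pod
	QueueLen   int     // hypothetical: requests waiting on the pod
}

// Scorer rates a pod; higher is better.
type Scorer func(p Pod) float64

// WeightedScorer pairs a scorer with its weight. Per the commit, real weights
// come from environment variables; here they are passed in directly.
type WeightedScorer struct {
	Score  Scorer
	Weight float64
}

// pickMax sums the weighted scores for each pod and returns the pod with the
// highest total (a max-score picker). Ties keep the earliest pod; ok is false
// when pods is empty.
func pickMax(pods []Pod, scorers []WeightedScorer) (best Pod, ok bool) {
	bestScore := math.Inf(-1)
	for _, p := range pods {
		total := 0.0
		for _, s := range scorers {
			total += s.Weight * s.Score(p)
		}
		if total > bestScore {
			best, bestScore, ok = p, total, true
		}
	}
	return best, ok
}

// schedulePD always picks a decode pod and, for prompts at or above the
// threshold, also a prefill pod. Assumption: short prompts skip prefill.
// Scheduling fails (ok == false) if a required pool is empty.
func schedulePD(promptLen, threshold int, prefillPods, decodePods []Pod,
	prefillScorers, decodeScorers []WeightedScorer) (prefill *Pod, decode Pod, ok bool) {
	decode, ok = pickMax(decodePods, decodeScorers)
	if !ok || promptLen < threshold {
		return nil, decode, ok
	}
	p, pok := pickMax(prefillPods, prefillScorers)
	if !pok {
		return nil, decode, false
	}
	return &p, decode, true
}

func main() {
	kvScorer := func(p Pod) float64 { return p.KVCacheHit }
	loadScorer := func(p Pod) float64 { return 1 / float64(1+p.QueueLen) }
	scorers := []WeightedScorer{{kvScorer, 2}, {loadScorer, 1}}
	pods := []Pod{{"pod-a", 0.2, 0}, {"pod-b", 0.9, 3}}

	// pod-a: 2*0.2 + 1/1 = 1.4; pod-b: 2*0.9 + 1/4 = 2.05, so pod-b wins.
	prefill, decode, _ := schedulePD(500, 100, pods, pods, scorers, scorers)
	fmt.Println(prefill.Name, decode.Name) // pod-b pod-b
	prefill, decode, _ = schedulePD(10, 100, pods, pods, scorers, scorers)
	fmt.Println(prefill == nil, decode.Name) // true pod-b
}
```

Keeping the scorer list and weights as data (rather than hard-coding a formula) is what lets a deployment tune prefill and decode scheduling independently, which matches the commit's choice of configuring them via environment variables.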
Signed-off-by: Ricardo Noriega De Soto <[email protected]>
Signed-off-by: Ricardo Noriega De Soto <[email protected]>
Signed-off-by: Ricardo Noriega De Soto <[email protected]>
Signed-off-by: Ricardo Noriega De Soto <[email protected]>
Signed-off-by: Ricardo Noriega <[email protected]>
Signed-off-by: Maroon Ayoub <[email protected]>
Prefix Aware Scorer
* 'session affinity scorer' partial implementation (without headers in response)
* Fix in filling request headers
* Encoded value of the namespaced pod name is sent in the response to the client
* Add support for configuring the session affinity scorer via environment variables
* Rename the Go file for the session affinity scorer
* Remove redundant 'sessions' field
* Remove redundant 'ScorerWithPostResponse' struct
* Rename SessionID to sessionToken; use a map lookup instead of a loop
* Change the session token name to 'x-session-token'
* Minor fixes in README
* Small fix after merge

---------

Co-authored-by: Shmuel Kallner <[email protected]>
Signed-off-by: Kfir Toledo <[email protected]>
[docs]: Add prefix flags to the README file
Signed-off-by: Maroon Ayoub <[email protected]>
Temporarily Switch to PostSchedule for Prefix-Store Updates
No description provided.