Promote Dev #122

Open
wants to merge 68 commits into base: main

Conversation

vMaroon
Member

@vMaroon vMaroon commented May 5, 2025

No description provided.

lionelvillard and others added 30 commits May 1, 2025 19:02
Update pod labels to match those supplied by ModelService
- added tokenizer lib linking
- go package pulling from neuralmagic internal repo
feat: Add the invocation of the Post response plugins
Update the vLLM P2P deployment to support KV-cache and load scorers.

Signed-off-by: Kfir Toledo <[email protected]>
* Add P/D scheduler, which uses two schedulers: one for prefill and one for decode. The P/D scheduler is enabled by an environment variable; the list of scorers and their weights is also defined by environment variables
+ delete pd-filter

* Remove unused variable

* Update readme file with environment variables relevant to P/D scheduler

* Fix problem caused by merge

* Add documentation for PDScheduler.Schedule function

* Update names of prefill and decode filters to avoid spaces

* Update comment for prefill/decode filters

* Change IsPDEnabled to PDEnabled

* Fix typo in readme

* Fix pd scheduler behavior for short prompts

* Fix prefill/decode related text in readme

* Remove redundant creation of prefill/decode filters + make promptLengthThreshold local
Add function for schedulerContext creation

* Fixes in readme

* fix compilation problem

* add pd scheduler test

* add postResponse plugins array to prefill and decode config

* fix comment in test

* fix pd-scheduler test
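
The P/D scheduler commits above describe two schedulers (prefill and decode), an environment-variable switch, and a prompt-length threshold for short prompts. A minimal Go sketch of that shape follows. It is an illustration under stated assumptions, not the code in this PR: the environment variable names, the `Scheduler` interface, `staticScheduler`, `Result`, and the rule that short prompts skip prefill are all hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Hypothetical environment variable names, chosen for this sketch only.
const (
	envPDEnabled       = "EXAMPLE_PD_ENABLED"
	envPromptThreshold = "EXAMPLE_PD_PROMPT_LEN_THRESHOLD"
)

// Scheduler picks a pod name for a prompt (hypothetical interface).
type Scheduler interface {
	Schedule(prompt string) string
}

// staticScheduler always returns the same pod; it stands in for a
// scorer-based scheduler.
type staticScheduler struct{ pod string }

func (s staticScheduler) Schedule(string) string { return s.pod }

// Result holds the chosen pods; PrefillPod is empty when prefill is skipped.
type Result struct {
	PrefillPod string
	DecodePod  string
}

// PDScheduler wraps one scheduler for prefill and one for decode.
type PDScheduler struct {
	prefill, decode       Scheduler
	pdEnabled             bool
	promptLengthThreshold int
}

// NewPDSchedulerFromEnv reads the switch and threshold from the environment.
// An unset or malformed threshold falls back to 0.
func NewPDSchedulerFromEnv(prefill, decode Scheduler) *PDScheduler {
	threshold, err := strconv.Atoi(os.Getenv(envPromptThreshold))
	if err != nil {
		threshold = 0
	}
	return &PDScheduler{
		prefill:               prefill,
		decode:                decode,
		pdEnabled:             os.Getenv(envPDEnabled) == "true",
		promptLengthThreshold: threshold,
	}
}

// Schedule always picks a decode pod. It picks a prefill pod only when P/D
// is enabled and the prompt reaches the threshold (an assumed policy).
func (s *PDScheduler) Schedule(prompt string) Result {
	res := Result{DecodePod: s.decode.Schedule(prompt)}
	if s.pdEnabled && len(prompt) >= s.promptLengthThreshold {
		res.PrefillPod = s.prefill.Schedule(prompt)
	}
	return res
}

func main() {
	os.Setenv(envPDEnabled, "true")
	os.Setenv(envPromptThreshold, "10")
	s := NewPDSchedulerFromEnv(staticScheduler{"prefill-0"}, staticScheduler{"decode-0"})
	fmt.Printf("%+v\n", s.Schedule("hi"))
	fmt.Printf("%+v\n", s.Schedule("a much longer prompt"))
}
```

The actual variable names and the short-prompt behavior are documented in the README updated by these commits; the sketch only shows how the pieces could fit together.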
Signed-off-by: Ricardo Noriega De Soto <[email protected]>
Signed-off-by: Ricardo Noriega De Soto <[email protected]>
Signed-off-by: Ricardo Noriega De Soto <[email protected]>
Signed-off-by: Ricardo Noriega De Soto <[email protected]>
Signed-off-by: Maroon Ayoub <[email protected]>
* 'session affinity scorer' partial implementation (without headers in response)

* Fix in filling request headers

* Encoded value of namespaced pod name is sent in response to client

* Support for configuring the session affinity scorer via environment variables is added

* Go file for session affinity scorer is renamed

* Redundant 'sessions' field is removed

* Redundant 'ScorerWithPostResponse' struct is removed

* - SessionID is renamed to sessionToken
- Map fetch is done instead of loop

* Session token name is changed to 'x-session-token'

* Minor fixes are made in README

* Small fix after merge

---------

Co-authored-by: Shmuel Kallner <[email protected]>
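
The session affinity commits above mention an encoded namespaced pod name returned to the client, an `x-session-token` header, and a map fetch in place of a loop. The following Go sketch shows one way such a scorer could work. It is a hedged illustration only: base64 as the encoding, the `namespace/name` format, the 0/1 scores, and every function name here are assumptions, not taken from the PR.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net/http"
)

// Header name taken from the commit message above.
const sessionTokenHeader = "x-session-token"

// encodeSessionToken encodes "namespace/name". Base64 is an assumption;
// the commits only say the value is encoded.
func encodeSessionToken(namespace, name string) string {
	return base64.StdEncoding.EncodeToString([]byte(namespace + "/" + name))
}

// setSessionToken writes the token for the serving pod into the response headers.
func setSessionToken(resp http.Header, namespace, name string) {
	resp.Set(sessionTokenHeader, encodeSessionToken(namespace, name))
}

// scoreBySession gives 1 to the pod named by the request's token and 0 to
// all others. The pod is found with a single map lookup, not a loop.
func scoreBySession(req http.Header, pods []string) map[string]float64 {
	scores := make(map[string]float64, len(pods))
	for _, p := range pods {
		scores[p] = 0
	}
	raw, err := base64.StdEncoding.DecodeString(req.Get(sessionTokenHeader))
	if err != nil || len(raw) == 0 {
		return scores // missing or malformed token: no affinity
	}
	if _, ok := scores[string(raw)]; ok {
		scores[string(raw)] = 1
	}
	return scores
}

// nextRequestScores simulates a client echoing the response token back
// on its next request (hypothetical helper for the demo).
func nextRequestScores(namespace, name string, pods []string) map[string]float64 {
	resp := http.Header{}
	setSessionToken(resp, namespace, name)
	req := http.Header{}
	req.Set(sessionTokenHeader, resp.Get(sessionTokenHeader))
	return scoreBySession(req, pods)
}

func main() {
	pods := []string{"default/pod-a", "default/pod-b"}
	fmt.Println(nextRequestScores("default", "pod-b", pods))
}
```

A token that names a pod no longer in the candidate list leaves all scores at 0, so in this sketch the other scorers would decide the placement.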
[docs]: Add prefix flags to the README file
Temporarily Switch to PostSchedule for Prefix-Store Updates
9 participants