-
Notifications
You must be signed in to change notification settings - Fork 74
remove Model field from LLMRequest #782
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Nir Rozenbaum <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: nirrozenbaum The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
cc @kfswain |
✅ Deploy Preview for gateway-api-inference-extension ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
I am working on the Flow Controller POC. We currently enforce the dispatch policy (for fairness) by model, not resolved target model. I will either need to use a different type when enqueuing a request than LLMRequest or will need model to remain as a field on this struct. The Flow Controller is not tightly coupled to the Scheduler, so either approach is fine. |
@LukeAVanDrie thanks for bringing this up. Ideally, each layer should get the data it needs for its mission, and only that. |
To my understanding, @LukeAVanDrie needs the
|
|
||
reqCtx.Model = llmReq.Model | ||
reqCtx.ResolvedTargetModel = llmReq.ResolvedTargetModel | ||
reqCtx.Model = model |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to hold off on any changes to this file until: #781 merges, I do quite a bit of this same refactoring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/hold until #781 is merged
A subset (model, criticality) and some new fields: a reference to the request prompt size in bytes and a reference to the request context.
This is a good point. I have removed my dependency on scheduling.LLMRequest and am defining my own type for the FlowController input. My POC's fate is no longer tied to this PR. |
This PR removes the
Model
field fromScheduling.LLMRequest
struct.from the scheduler point of view, it doesn't care about the original requested model name, only about the resolved model after traffic splitting that was done in a higher level.
we can see that
Model
field was used in unit-tests only and that it's always set to be identical to ResolvedTargetModel. Scheduler doesn't use this field.In addition to removing
Model
field fromLLMRequest
struct, this PR renamesResolvedTargetModel
toTargetModel
inLLMRequest
from the same reasons. scheduler plugins don't care about traffic splitting and "resolved" model, only about what is target model.unit-tests and other usage places were updated accordingly.