Implicit constraint on base model #42

Description

@ShubyM

There is currently a mismatch between what our configurations allow and what our backends support regarding dynamic model loading:

  • VLLM Backend: Model weights are loaded to the GPU at startup. It is not possible for a client to connect and specify a new base model on the fly.
  • Engine Backend: It is technically possible to dynamically load a new base model.

Our code currently allows clients to specify a new base model regardless of the backend. If VLLM is running, the code will still attempt to sample against the requested model, even though VLLM cannot load it. I think we should state explicitly what we expect from our architecture; having made that choice, we can reduce complexity elsewhere.
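One way to make the constraint explicit is to have each backend declare whether it can load a base model on demand, and to reject mismatched requests before sampling. The sketch below is only an illustration of that idea: `BackendInfo`, `supports_dynamic_base_model`, `resolve_base_model`, and `BaseModelMismatchError` are hypothetical names, not existing code in this repository.

```python
from dataclasses import dataclass
from typing import Optional


class BaseModelMismatchError(ValueError):
    """Raised when a client requests a base model the backend cannot serve."""


@dataclass(frozen=True)
class BackendInfo:
    # All fields are hypothetical; they stand in for whatever the real
    # backend configuration exposes.
    name: str
    loaded_base_model: str
    supports_dynamic_base_model: bool


def resolve_base_model(backend: BackendInfo, requested: Optional[str]) -> str:
    """Return the base model to sample against, or fail loudly.

    - No request, or a request for the already loaded model: use the loaded model.
    - A different model on a backend that can load dynamically: use the request.
    - A different model on a backend that cannot: raise instead of sampling.
    """
    if requested is None or requested == backend.loaded_base_model:
        return backend.loaded_base_model
    if backend.supports_dynamic_base_model:
        return requested
    raise BaseModelMismatchError(
        f"backend {backend.name!r} has {backend.loaded_base_model!r} loaded "
        f"and cannot switch to {requested!r}"
    )
```

With this shape, a VLLM-style backend would be described with `supports_dynamic_base_model=False` and an Engine-style backend with `True`, so the decision lives in one place rather than being implied by the sampling path.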

Status: Backlog
