Implicit constraint on base model

There is currently a mismatch between what our configurations allow and what our backends support for dynamic model loading:

- vLLM backend: Model weights are loaded onto the GPU at startup. A client cannot connect and specify a new base model on the fly.
- Engine backend: It is technically possible to load a new base model dynamically.

Our code currently allows clients to specify a new base model regardless of the backend. If vLLM is running, the code still attempts to sample against the requested model, even though vLLM cannot load it. I think we should state explicitly what we expect from our architecture. Once that choice is made, we can reduce complexity elsewhere.
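
To make the constraint concrete, here is a minimal sketch of one way to enforce it explicitly: reject a mismatched base model up front instead of attempting to sample. Every name here (`Backend`, `BackendState`, `resolve_base_model`, `BaseModelMismatchError`) is hypothetical and not taken from our codebase, and it assumes the vLLM backend serves exactly one base model fixed at startup.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Backend(Enum):
    # Hypothetical backend identifiers, for illustration only.
    VLLM = "vllm"
    ENGINE = "engine"


class BaseModelMismatchError(ValueError):
    """Raised when a client requests a base model the backend cannot serve."""


@dataclass(frozen=True)
class BackendState:
    backend: Backend
    loaded_base_model: str  # model whose weights were loaded at startup


def resolve_base_model(state: BackendState, requested: Optional[str]) -> str:
    """Return the base model to sample against, or fail fast on a mismatch."""
    # No explicit request, or a request for the already-loaded model: always fine.
    if requested is None or requested == state.loaded_base_model:
        return state.loaded_base_model
    # Assumption: only the engine backend can load a new base model on demand.
    if state.backend is Backend.ENGINE:
        return requested
    raise BaseModelMismatchError(
        f"backend {state.backend.value!r} serves {state.loaded_base_model!r} "
        f"and cannot load {requested!r} dynamically"
    )
```

With a check like this at the request boundary, a client asking the vLLM backend for a different base model gets an immediate, descriptive error, and code further down can assume the base model is valid for the active backend.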

Implicit constraint on base model #42
