Ability to get a list of loaded models and unload a model by request #3378

Open
Nyralei opened this issue Aug 25, 2024 · 4 comments
Assignees
Labels
enhancement New feature or request

Comments

@Nyralei
Contributor

Nyralei commented Aug 25, 2024

No description provided.

Nyralei added the enhancement label Aug 25, 2024
@dave-gray101
Collaborator

https://github.com/mudler/LocalAI/blob/master/core/http/endpoints/localai/backend_monitor.go

These endpoints already show which backends are loaded and allow them to be unloaded?

@Nyralei
Contributor Author

Nyralei commented Aug 25, 2024

https://github.com/mudler/LocalAI/blob/master/core/http/endpoints/localai/backend_monitor.go

These endpoints already show which backends are loaded and allow them to be unloaded?

Thanks for pointing out /backend/shutdown, but it only works when the model name ends with ".bin" (https://github.com/mudler/LocalAI/blob/master/core/services/backend_monitor.go#L42); otherwise it appends ".bin" to the model name. In my case the model name ends with ".gguf".

As for the /backend/monitor endpoint: it doesn't show which models are currently loaded. It only returns some metrics, and only for the single model set in the request (which also has to end with ".bin").
I tried calling both with

{
    "model": "ggml-whisper-large-v3.bin"
}

1. /backend/monitor responds with
{
    "state": 1,
    "memory": {
        "total": 53627682816,
        "breakdown": {
            "gopsutil-RSS": 681861120
        }
    }
}
2. /backend/shutdown shuts the backend down properly.

With "model": "gemma-2-27b-it-Q5_K_S.gguf", both calls fail:

{
    "error": {
        "code": 500,
        "message": "backend gemma-2-27b-it-Q5_K_S.gguf.bin is not currently loaded",
        "type": ""
    }
}
{
    "error": {
        "code": 500,
        "message": "model gemma-2-27b-it-Q5_K_S.gguf.bin not found",
        "type": ""
    }
}
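For illustration, the errors above ("gemma-2-27b-it-Q5_K_S.gguf.bin") are consistent with the lookup unconditionally appending ".bin" to the requested name. A minimal sketch of how the name resolution could instead behave, assuming a fix along these lines (modelKey and the extension list are hypothetical, not LocalAI's actual API):

```go
package main

import (
	"fmt"
	"strings"
)

// modelKey returns the key used to look up a loaded backend.
// Instead of always appending ".bin", it leaves names with a
// recognized model extension untouched, so ".gguf" models resolve.
func modelKey(model string) string {
	for _, ext := range []string{".bin", ".gguf"} {
		if strings.HasSuffix(model, ext) {
			return model
		}
	}
	// Preserve the legacy behavior for extensionless names.
	return model + ".bin"
}

func main() {
	fmt.Println(modelKey("ggml-whisper-large-v3.bin"))  // unchanged
	fmt.Println(modelKey("gemma-2-27b-it-Q5_K_S.gguf")) // unchanged
	fmt.Println(modelKey("legacy-model"))               // gets ".bin" appended
}
```

This is only a sketch of the suffix handling; the real fix would live in core/services/backend_monitor.go and would need to match however LocalAI actually keys loaded backends.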

@dave-gray101 dave-gray101 self-assigned this Aug 25, 2024
@dave-gray101
Collaborator

Thanks for the updated comment!

That sounds like a big bug to me - I'll see if I can investigate this soon.

@jokerosky

It seems the endpoint only reports the status of a single requested model (https://github.com/mudler/LocalAI/blob/master/core/http/endpoints/localai/backend_monitor.go#L23C34-L23C39).
Maybe listing all models should be a separate endpoint, or the existing one could accept something like '*' instead of a model name?

3 participants