
How does applying a model from URL work? #4

Open
braheezy opened this issue May 28, 2023 · 1 comment

braheezy commented May 28, 2023

Hello! I am an absolute LLM noob so I apologize if these are rather basic questions. I am loving LocalAI so far and it's been incredibly easy to get running with models from the gallery.

I wanted to try a model whose definition does not contain a URL, like Vicuna or Koala. The instructions indicate that a POST request should be sent, referencing the koala.yaml configuration file from this repository and supplying URI(s) to the actual model files to use, probably from Hugging Face:

curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
     "url": "github:go-skynet/model-gallery/koala.yaml",
     "name": "koala",
     "overrides": { "parameters": {"model": "koala.bin" } },
     "files": [
        {
            "uri": "https://huggingface.co/xxxx",
            "sha256": "xxx",
            "filename": "koala.bin"
        }
     ]
   }'

So I went to Hugging Face, searched for "koala", and reviewed one of the top results. It appears to have the model split into multiple files:

  • pytorch_model-00001-of-000002.bin
  • pytorch_model-00002-of-000002.bin

Presumably both of these files are needed. I couldn't find examples of how to handle model bin files that are split across multiple parts, and some light research indicates I can't just cat the parts together.
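
Since "files" in the request schema is an array, I assume both parts could in principle be listed, something like this (URIs and hashes are placeholders, and I don't know whether the backend can actually load a split pytorch model, or what the "model" override should point at in that case):

     "files": [
        {
            "uri": "https://huggingface.co/xxxx/pytorch_model-00001-of-000002.bin",
            "sha256": "xxx",
            "filename": "pytorch_model-00001-of-000002.bin"
        },
        {
            "uri": "https://huggingface.co/xxxx/pytorch_model-00002-of-000002.bin",
            "sha256": "xxx",
            "filename": "pytorch_model-00002-of-000002.bin"
        }
     ]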

I found this repository that seems to host a single koala model file. So I tried that:

curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
     "url": "github:go-skynet/model-gallery/koala.yaml",
     "name": "koala",
     "overrides": { "parameters": {"model": "koala.bin" } },
     "files": [
        {
            "uri": "https://huggingface.co/4bit/koala-13B-GPTQ-4bit-128g/resolve/main/koala-13B-4bit-128g.safetensors",
            "sha256": "${SHA}",
            "filename": "koala.bin"
        }
     ]
   }'

(I downloaded the file first to calculate the SHA256, then ran this command, so LocalAI downloaded the model a second time. Is that right?)
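
For reference, I computed the checksum roughly like this before plugging it in:

# download the file once just to hash it (LocalAI fetches it again on apply)
wget https://huggingface.co/4bit/koala-13B-GPTQ-4bit-128g/resolve/main/koala-13B-4bit-128g.safetensors
SHA=$(sha256sum koala-13B-4bit-128g.safetensors | awk '{print $1}')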

After the job finished processing, I was able to see the new model defined:

$ curl -q $LOCALAI/v1/models | jq '.'
{
  "object": "list",
  "data": [
    {
      "id": "ggml-gpt4all-j",
      "object": "model"
    },
    {
      "id": "koala.bin",
      "object": "model"
    }
  ]
}

I proceeded to place prompt-templates/koala.tmpl into the models/ directory. I then tried to call the model and got a 500 error:

$ curl $LOCALAI/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "koala.bin",
     "messages": [{"role": "user", "content": "How are you?"}],
     "temperature": 0.9 
   }'
{"error":{"code":500,"message":"could not load model - all backends returned error: 12 errors occurred:\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\n","type":""}}

I am sure I took a wrong turn at some point. Any advice? Thanks!

mudler (Member) commented May 29, 2023

Hey!

The files you picked are for pytorch - you should pick ggml files instead. A tip: I usually search for "ggml" on Hugging Face.

The author (TheBloke) in the Hugging Face link you referred to has uploaded quite a bunch of them!
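
For illustration, the apply request itself stays the same; only the "uri" (and matching "sha256") should point at a ggml file. The uri and sha256 below are placeholders, not a real link:

curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
     "url": "github:go-skynet/model-gallery/koala.yaml",
     "name": "koala",
     "overrides": { "parameters": {"model": "koala.bin" } },
     "files": [
        {
            "uri": "https://huggingface.co/xxxx/koala-13b-ggml-q4_0.bin",
            "sha256": "xxx",
            "filename": "koala.bin"
        }
     ]
   }'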

Edit: I had to update your comment and remove the links (as the license of those models is unclear).
