Environment, CPU architecture, OS, and Version:
Lenovo Legion laptop, AMD 5800H CPU, 40GB RAM, NVIDIA 3060 with 6G Memory
Describe the bug
I am running local AI using this container: docker run -p 8090:8080 --rm --gpus all --name local-ai-llava -e DEBUG=true -e MODELS_PATH=/models -v /home/msameer/local-ai-models:/models -ti localai/localai:v2.20.1-cublas-cuda12-core https://gist.githubusercontent.com/msameer/dec4efaf7b1674fbd5be38d8d2b83484/raw/f4943915546ae4013eb6c0220b9ea35783bc2fbd/llava.yaml
The content of the yaml gist is as follows:
name: llava-1.6-mistral
context_size: 4096
f16: true
threads: 11
gpu_layers: 32
mmap: true
parameters:
  # Reference any HF model or a local file here
  model: llava-v1.6-mistral-7b.gguf
template:
  chat: &template |
    Instruct: {{.Input}}
    Output:
  # Modify the prompt template here ^^^ as per your requirements
  completion: *template
When I execute this request, it responds successfully after 19 seconds:
curl -X POST --location "http://localhost:8090/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "llava-1.6-mistral",
"messages": [
{
"role": "user",
"content": "How many pyramids are there in Giza?"
}
],
"temperature": 0.7
}'
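The same request body as the curl call above can also be built programmatically; this is a minimal sketch using only the Python standard library, where `build_chat_request` is a hypothetical helper (not part of LocalAI) that mirrors the JSON payload shown:

```python
import json

def build_chat_request(model, prompt, temperature=0.7):
    """Return the JSON body for the OpenAI-compatible /v1/chat/completions endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

# Payload matching the curl request above; POST it to
# http://localhost:8090/v1/chat/completions (the -p 8090:8080 mapping).
body = build_chat_request("llava-1.6-mistral", "How many pyramids are there in Giza?")
print(json.dumps(body, indent=2))
```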
LocalAI version:
localai/localai:v2.20.1-cublas-cuda12-core

I see this in the logs regardless of the gpu_layers value I set in the gist, and the response time is always the same, so the setting appears to have no effect:
Here are some parts of the debug log, as it is too long to include in full: