This application can be used to run LLMs (Large Language Models) in docker containers.
The main motivation for starting this project was to be able to use different LLMs, running on a local machine or a remote server, with [langchain](https://github.com/hwchase17/langchain) via [langchain-llm-api](https://github.com/1b5d/langchain-llm-api).
Tested with the following models:

- Llama 7b - ggml
- Llama 13b - ggml
- Llama 30b - ggml
- Alpaca 7b - ggml
- Alpaca 13b - ggml
- Alpaca 30b - ggml
- Vicuna 13b - ggml
- Koala 7b - ggml
- Vicuna GPTQ 7B-4bit-128g
- Vicuna GPTQ 13B-4bit-128g
- Koala GPTQ 7B-4bit-128g
- wizardLM GPTQ 7B-4bit-128g
Contributions to support more models are welcome.
To configure the application, edit `config.yaml`, which is mounted into the docker container:

```
models_dir: /models # dir inside the container
model_family: alpaca
setup_params:
  key: value
model_params:
  key: value
```
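For reference, the same config file can also be supplied to a plain `docker run`. This is a sketch only: it reuses the volume-mount and port pattern from the GPU example later in this README, and the bare `1b5d/llm-api` image tag (without a version suffix) is an assumption:

```
# mount the local models dir and config.yaml, expose the API on port 8000
# (unversioned image tag is an assumption; pin a specific tag in practice)
docker run -v $PWD/models/:/models:rw \
  -v $PWD/config.yaml:/llm-api/config.yaml:ro \
  -p 8000:8000 1b5d/llm-api
```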
```
POST /embeddings
```
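As a quick illustration of calling this endpoint, here is a sketch of an embeddings request. The `text` field in the JSON body is an assumption (verify it against the endpoint documentation above); `localhost:8000` matches the port mapping used by the docker commands in this README:

```
# assumption: /embeddings accepts {"text": ...}; check the endpoint docs
curl -X POST http://localhost:8000/embeddings \
  -H "Content-Type: application/json" \
  -d '{"text": "What is the capital of France?"}'
```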
## Llama on CPU - using llama.cpp
Llama and models based on it, such as Alpaca and Vicuna, are intended only for academic research; any commercial use is prohibited. This project doesn't provide any links to download these models.
You can configure the model usage in a local `config.yaml` file; here is an example of the configs:

```
models_dir: /models # dir inside the container
model_family: alpaca
setup_params:
  repo_id: user/repo_id
  filename: ggml-model-q4_0.bin
```
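Once the container is running with this config, you can smoke-test the model over HTTP. A minimal sketch, assuming the service exposes a `POST /generate` endpoint that takes a JSON `prompt` field (an assumption in the style of the `/embeddings` endpoint above; verify against the API section) and listens on port 8000 as in the docker commands in this README:

```
# assumption: POST /generate accepts {"prompt": ...}
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is the capital of France?"}'
```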
You should see a table showing you the current nvidia driver version and some other information.

You can also run the Llama model using GPTQ-for-LLaMa 4-bit quantization; a docker image specially built for that purpose, `1b5d/llm-api:0.0.4-gptq-llama-triton`, can be used instead of the default image.
A separate docker-compose file is also available to run this mode:

```
docker compose -f docker-compose.gptq-llama-triton.yaml up
```
or by directly running the container:

```
docker run --gpus all -v $PWD/models/:/models:rw -v $PWD/config.yaml:/llm-api/config.yaml:ro -p 8000:8000 1b5d/llm-api:0.0.4-gptq-llama-triton
```
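For this mode, `config.yaml` needs to point at a GPTQ-quantized checkpoint rather than a ggml file. The sketch below is hypothetical: the `model_family` value, the filename, and the `model_params` field names are assumptions not confirmed by this README, with only the 4-bit / 128-group values taken from the model variants listed above:

```
models_dir: /models # dir inside the container
model_family: gptq_llama # assumed family name for this mode
setup_params:
  repo_id: user/repo_id # placeholder, as in the earlier examples
  filename: model-4bit-128g.safetensors # placeholder filename
model_params:
  wbits: 4 # 4-bit quantization
  group_size: 128 # matches the -128g variants listed above
```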
**Note**: the `llm-api:0.0.x-gptq-llama-cuda` image has been deprecated; please switch to the triton image, as it seems more reliable.