
Commit 001cc0d

Document Sync by Tina
1 parent 61e7efa commit 001cc0d

2 files changed: +25 −2 lines changed

docs/stable/cli/cli_api.md

+23 −2

````diff
@@ -106,16 +106,37 @@ This file can be incomplete, and missing sections will be filled in by the default configuration.
     "metric": "concurrency",
     "target": 1,
     "min_instances": 0,
-    "max_instances": 10
+    "max_instances": 10,
+    "keep_alive": 0
   },
   "backend_config": {
     "pretrained_model_name_or_path": "facebook/opt-1.3b",
     "device_map": "auto",
-    "torch_dtype": "float16"
+    "torch_dtype": "float16",
+    "hf_model_class": "AutoModelForCausalLM"
   }
 }
 ```
 
+Below is a description of all the fields in `config.json`.
+
+| Field | Description |
+| ----- | ----------- |
+| model | This should be a HuggingFace model name, used to identify the model instance. |
+| backend | Inference engine; `transformers` and `vllm` are currently supported. |
+| num_gpus | Number of GPUs used to deploy a model instance. |
+| auto_scaling_config | Configuration for auto-scaling. |
+| auto_scaling_config.metric | Metric used to decide whether to scale up or down. |
+| auto_scaling_config.target | Target value of the metric. |
+| auto_scaling_config.min_instances | The minimum number of model instances. |
+| auto_scaling_config.max_instances | The maximum number of model instances. |
+| auto_scaling_config.keep_alive | How long (in seconds) a model instance stays alive after inference ends. For example, if `keep_alive` is set to 30, the instance waits 30 seconds after inference ends for another request before shutting down. |
+| backend_config | Configuration for the inference backend. |
+| backend_config.pretrained_model_name_or_path | The path to load the model from; this can be a HuggingFace model name or a local path. |
+| backend_config.device_map | Device map config used to load the model; `auto` is suitable for most scenarios. |
+| backend_config.torch_dtype | Torch dtype of the model. |
+| backend_config.hf_model_class | HuggingFace model class. |
+
 ### sllm-cli delete
 Delete deployed models by name.
````

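For reference, the fields described in the table above compose into a full `config.json` like the sketch below. This is illustrative only: the model names, `auto_scaling_config` values, and `backend_config` entries echo the example in the diff, `num_gpus: 1` is an assumed value, and `keep_alive: 30` mirrors the 30-second example from the field table.

```json
{
  "model": "facebook/opt-1.3b",
  "backend": "transformers",
  "num_gpus": 1,
  "auto_scaling_config": {
    "metric": "concurrency",
    "target": 1,
    "min_instances": 0,
    "max_instances": 10,
    "keep_alive": 30
  },
  "backend_config": {
    "pretrained_model_name_or_path": "facebook/opt-1.3b",
    "device_map": "auto",
    "torch_dtype": "float16",
    "hf_model_class": "AutoModelForCausalLM"
  }
}
```

Assuming the deploy command accepts a configuration file via a `--config` flag (an assumption here; the CLI docs this commit edits describe the file itself), the file would be passed along the lines of `sllm-cli deploy --config config.json`.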
docs/stable/getting_started/quickstart.md

+2

````diff
@@ -67,6 +67,8 @@ conda activate sllm
 sllm-cli deploy --model facebook/opt-1.3b
 ```
 
+This will download the model from HuggingFace. If you want to load the model from a local path, you can use `config.json`; see [here](../cli/cli_api.md#example-configuration-file-configjson) for details.
+
 Now, you can query the model by any OpenAI API client. For example, you can use the following command to query the model:
 ```bash
 curl http://127.0.0.1:8343/v1/chat/completions \
````
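Since the quickstart notes that the model can be queried by any OpenAI API client, here is a minimal Python sketch using the `openai` package (an assumed, separately installed dependency); the endpoint and model name come from the quickstart, and the API key is a placeholder on the assumption that the local server does not validate it:

```python
# Minimal sketch: query the locally deployed model through the
# OpenAI-compatible endpoint shown in the quickstart.
from openai import OpenAI

# Base URL from the quickstart; api_key is a placeholder (assumption:
# the local endpoint does not check it).
client = OpenAI(base_url="http://127.0.0.1:8343/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="facebook/opt-1.3b",  # model name used in `sllm-cli deploy` above
    messages=[{"role": "user", "content": "What is your name?"}],
)
print(response.choices[0].message.content)
```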
