Below is a description of all the fields in config.json.
| Field | Description |
| ----- | ----------- |
| model | A HuggingFace model name, used to identify the model instance. |
| backend | Inference engine; currently supports `transformers` and `vllm`. |
| num_gpus | Number of GPUs used to deploy a model instance. |
| auto_scaling_config | Configuration for auto-scaling. |
| auto_scaling_config.metric | Metric used to decide whether to scale up or down. |
| auto_scaling_config.target | Target value of the metric. |
| auto_scaling_config.min_instances | Minimum number of model instances. |
| auto_scaling_config.max_instances | Maximum number of model instances. |
| auto_scaling_config.keep_alive | How long a model instance stays alive after inference ends. For example, if `keep_alive` is set to 30, the instance waits 30 seconds after inference ends to see if another request arrives. |
| backend_config | Configuration for the inference backend. |
| backend_config.pretrained_model_name_or_path | The path to load the model from; this can be a HuggingFace model name or a local path. |
| backend_config.device_map | Device map used to load the model; `auto` is suitable for most scenarios. |
| backend_config.torch_dtype | Torch dtype of the model. |
| backend_config.hf_model_class | HuggingFace model class. |
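Putting these fields together, a complete `config.json` might look like the following sketch. The concrete values (the `concurrency` metric name, the dtype, the model class, the local path) are illustrative assumptions rather than documented defaults; adjust them for your deployment.

```json
{
  "model": "facebook/opt-1.3b",
  "backend": "transformers",
  "num_gpus": 1,
  "auto_scaling_config": {
    "metric": "concurrency",
    "target": 1,
    "min_instances": 0,
    "max_instances": 10,
    "keep_alive": 30
  },
  "backend_config": {
    "pretrained_model_name_or_path": "/path/to/local/model",
    "device_map": "auto",
    "torch_dtype": "float16",
    "hf_model_class": "AutoModelForCausalLM"
  }
}
```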
From `docs/stable/getting_started/quickstart.md`:
```bash
conda activate sllm
sllm-cli deploy --model facebook/opt-1.3b
```
This will download the model from HuggingFace. If you want to load the model from a local path, you can use `config.json`; see [here](../cli/cli_api.md#example-configuration-file-configjson) for details, and the sketch below.
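As a sketch, a config-based deployment would be invoked along these lines. The `--config` flag is an assumption based on the linked CLI documentation; verify it against your installed version of `sllm-cli`.

```bash
# Hedged sketch: deploy from a local config.json instead of a HuggingFace name.
# The --config flag is assumed from the CLI docs linked above; verify locally.
sllm-cli deploy --config config.json
```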
Now, you can query the model with any OpenAI-compatible API client. For example, you can use Python code along the following lines:
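Below is a minimal sketch using the official `openai` Python client. The server URL, port, and API key are assumptions, not values confirmed by this document; point `base_url` at the endpoint exposed by your ServerlessLLM deployment.

```python
# Minimal sketch: query the deployed model through an OpenAI-compatible API.
# The base_url (host/port) and api_key are assumptions -- substitute the
# endpoint exposed by your ServerlessLLM deployment.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8343/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="facebook/opt-1.3b",
    messages=[{"role": "user", "content": "What is your name?"}],
)
print(response.choices[0].message.content)
```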