Please find our documentation at [ServerlessLLM](https://serverlessllm.github.io/docs/stable/getting_started/quickstart).
## How to build ServerlessLLM Docs
This website is built using Docusaurus, a modern static website generator.
### Installation
To install the necessary dependencies, use the following command:
```bash
npm install
```
### Local Development
To start a local development server and open up a browser window, use the following command:
```bash
npm run start
```
Most changes are reflected live without having to restart the server.
### Build
To generate static content into the build directory, use the following command:
```bash
npm run build
```
This command generates static content into the `build` directory, which can be served using any static content hosting service.
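If the standard Docusaurus scripts are in place (an assumption based on the default Docusaurus setup, not confirmed by this repo's `package.json`), you can preview the production build locally before publishing:

```bash
npm run serve
```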
### About image paths
Images are stored in the `images` directory. For example, if an image called `a.jpg` lives in `images`, reference it anywhere in the documents as `/img/a.jpg`. (The document sync bot copies the `images` directory into the `img` folder of the `serverlessllm.github.io` repo.)
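For instance, a document page could embed that image as follows (the alt text is a placeholder):

```markdown
![Example screenshot](/img/a.jpg)
```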
---
sidebar_position: 2
---
# CLI API
## ServerlessLLM CLI Documentation
### Overview
`sllm-cli` is a command-line interface (CLI) tool designed to manage and interact with ServerlessLLM models. This document provides an overview of the available commands and their usage.
### Installation
```bash
# Create a new environment
conda create -n sllm python=3.10 -y
conda activate sllm
# Install ServerlessLLM
pip install serverless-llm
```
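To verify the installation, check that the CLI is on your PATH (assuming the package installs an `sllm-cli` entry point with standard `--help` behavior, as the commands below imply):

```bash
sllm-cli --help
```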
### Getting Started
Before using the `sllm-cli` commands, you need to start the ServerlessLLM cluster; follow the setup guides to get a cluster running.

After setting up the ServerlessLLM cluster, you can use the commands listed below to manage and interact with your models.
### Example Workflow
1. **Deploy a Model**
> Deploy a model using the model name, which must be a HuggingFace pretrained model name, e.g. `facebook/opt-1.3b` rather than `opt-1.3b`.
```bash
sllm-cli deploy --model facebook/opt-1.3b
```
### sllm-cli deploy
Deploy a model using a configuration file or model name, with options to overwrite default configurations. The configuration file requires minimal specifications, as sensible defaults are provided for advanced configuration options.
This command also supports [PEFT LoRA (Low-Rank Adaptation)](https://huggingface.co/docs/peft/main/en/index), allowing you to deploy adapters on top of a base model, either via CLI flags or directly in the configuration file.
For more details on the advanced configuration options and their default values, please refer to the [Example Configuration File](#example-configuration-file-configjson) section.
##### Usage
```bash
sllm-cli deploy [OPTIONS]
```

##### Options
- `--max-instances <number>`
  - Overwrite the maximum instances in the default configuration.
- `--enable-lora`
  - Enable LoRA adapter support for the transformers backend. Overwrites `enable_lora` in the default configuration.
- `--lora-adapters`
  - Add one or more LoRA adapters in the format `<name>=<path>`. Overwrites any existing `lora_adapters` in the default configuration.
##### Examples
Deploy using a model name with default configuration:
```bash
sllm-cli deploy --model facebook/opt-1.3b
```
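Deploy using a model name and overwrite multiple configurations (the flag values here are illustrative, using only the options documented above):

```bash
sllm-cli deploy --model facebook/opt-1.3b --max-instances 5
```

Deploy a base model together with LoRA adapters (the adapter name and path are hypothetical placeholders):

```bash
sllm-cli deploy --model facebook/opt-1.3b --enable-lora --lora-adapters demo_lora=crumb/FLAN-OPT-1.3b-LoRA
```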
Below is a description of all the fields in `config.json`.

| Field | Description |
| --- | --- |
| backend_config.device_map | Device map config used to load the model; `auto` is suitable for most scenarios. |
| backend_config.torch_dtype | Torch dtype of the model. |
| backend_config.hf_model_class | HuggingFace model class. |
| backend_config.enable_lora | Set to `true` to enable loading LoRA adapters during inference. |
| backend_config.lora_adapters | A dictionary of LoRA adapters in the format `{name: path}`, where each path is a local or Hugging Face-hosted LoRA adapter directory. |
### sllm-cli delete
Delete deployed models by name, or delete specific LoRA adapters associated with a base model.

This command supports:
- Removing deployed models
- Removing specific LoRA adapters while preserving the base model
##### Usage
```bash
sllm-cli delete [MODELS] [OPTIONS]
```
##### Arguments
- `MODELS`
  - Space-separated list of model names to delete.
##### Options
- `--lora-adapters <adapter_names>`
  - Space-separated list of LoRA adapter names to delete from the given model. If provided, only the specified adapters are removed; the base model is not deleted.
##### Example
Delete multiple base models (and all their adapters):
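```bash
sllm-cli delete facebook/opt-1.3b facebook/opt-2.7b
```

(The model names above are illustrative; any deployed model names work.) To delete only specific LoRA adapters while keeping the base model (the adapter name is a placeholder):

```bash
sllm-cli delete facebook/opt-1.3b --lora-adapters demo_lora
```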
Below is a description of all the fields in the fine-tuning `config.json`.

| Field | Description |
| --- | --- |
| model | A deployed model name, used to identify the backend instance. |
| ft_backend | Fine-tuning engine; only `peft` is supported at the moment. |
| dataset_config | Configuration for the fine-tuning dataset |
| dataset_config.dataset_source | Whether the dataset comes from `hf_hub` (huggingface_hub) or a `local` file |
| dataset_config.hf_dataset_name | Dataset name on huggingface_hub |
| dataset_config.tokenization_field | The field to tokenize |
| dataset_config.split | Partitioning of the dataset (`train`, `validation`, and `test`). You can also slice the selected split, e.g. take only the first 10% of the training data: `train[:10%]` |
| dataset_config.data_files | Data files to load when the dataset source is `local` |
| dataset_config.extension_type | Extension type of the data files (`csv`, `json`, `parquet`, `arrow`) |
| lora_config | Configuration for LoRA fine-tuning |
| lora_config.r | `r` determines how many parameters will be trained. |
| lora_config.lora_alpha | A multiplier controlling the overall strength of the LoRA update, typically set to 1 |
| lora_config.target_modules | A list of target modules, as listed in the [Hugging Face documentation](https://github.com/huggingface/peft/blob/39ef2546d5d9b8f5f8a7016ec10657887a867041/src/peft/utils/other.py#L220) |
| lora_config.lora_dropout | Dropout probability for the LoRA layers, used to avoid overfitting |
| lora_config.bias | Use `none` or `lora_only` |
| lora_config.task_type | Indicates the task the model is being trained for |
| training_config | Configuration for training parameters |
| training_config.auto_find_batch_size | Automatically find a batch size that fits in memory |
| training_config.num_train_epochs | Total number of training epochs |
| training_config.learning_rate | Learning rate |
| training_config.optim | Select an optimizer |
| training_config.use_cpu | Whether to use the CPU for training |
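Putting these fields together, a minimal fine-tuning `config.json` might look like the sketch below. All values are illustrative assumptions based on the field descriptions above, not verified defaults; the dataset name and target modules in particular are placeholders you should replace with your own.

```json
{
  "model": "facebook/opt-1.3b",
  "ft_backend": "peft",
  "dataset_config": {
    "dataset_source": "hf_hub",
    "hf_dataset_name": "fka/awesome-chatgpt-prompts",
    "tokenization_field": "prompt",
    "split": "train[:10%]"
  },
  "lora_config": {
    "r": 4,
    "lora_alpha": 1,
    "target_modules": ["q_proj", "v_proj"],
    "lora_dropout": 0.05,
    "bias": "none",
    "task_type": "CAUSAL_LM"
  },
  "training_config": {
    "auto_find_batch_size": true,
    "num_train_epochs": 2,
    "learning_rate": 0.0001,
    "optim": "adamw_torch",
    "use_cpu": false
  }
}
```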