Please find our documentation at [ServerlessLLM](https://serverlessllm.github.io/docs/stable/getting_started/quickstart).
## How to build ServerlessLLM Docs
This website is built using Docusaurus, a modern static website generator.
### Installation
To install the necessary dependencies, use the following command:
```bash
npm install
```
### Local Development
To start a local development server and open up a browser window, use the following command:
```bash
npm run start
```
Most changes are reflected live without having to restart the server.
### Build
To generate static content into the build directory, use the following command:
```bash
npm run build
```
This command generates static content into the `build` directory, which can be served using any static content hosting service.
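If the standard Docusaurus scripts are in place (an assumption based on the default Docusaurus setup, not confirmed by this repo's `package.json`), you can preview the production build locally before publishing:

```bash
npm run serve
```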
### About image paths
Images are stored in the `images` directory. For example, if an image called `a.jpg` lives in `images`, reference it anywhere in the documents as `/img/a.jpg`. (The document sync bot copies the `images` directory into the `img` folder of the `serverlessllm.github.io` repo.)
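For instance, a document page could embed that image as follows (the alt text is a placeholder):

```markdown
![Example screenshot](/img/a.jpg)
```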
---
sidebar_position: 2
---
# CLI API
## ServerlessLLM CLI Documentation
### Overview
`sllm-cli` is a command-line interface (CLI) tool designed to manage and interact with ServerlessLLM models. This document provides an overview of the available commands and their usage.
### Installation
```bash
# Create a new environment
conda create -n sllm python=3.10 -y
conda activate sllm
# Install ServerlessLLM
pip install serverless-llm
```
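To verify the installation, check that the CLI is on your PATH (assuming the package installs an `sllm-cli` entry point with standard `--help` behavior, as the commands below imply):

```bash
sllm-cli --help
```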
### Getting Started
Before using the `sllm-cli` commands, you need to start the ServerlessLLM cluster; follow the setup guides to get a cluster running.

After setting up the ServerlessLLM cluster, you can use the commands listed below to manage and interact with your models.
### Example Workflow
1. **Deploy a Model**
> Deploy a model using the model name, which must be a HuggingFace pretrained model name, e.g. `facebook/opt-1.3b` rather than `opt-1.3b`.
```bash
sllm-cli deploy --model facebook/opt-1.3b
```
### sllm-cli deploy
Deploy a model using a configuration file or model name, with options to overwrite default configurations. The configuration file requires minimal specifications, as sensible defaults are provided for advanced configuration options.
This command also supports [PEFT LoRA (Low-Rank Adaptation)](https://huggingface.co/docs/peft/main/en/index), allowing you to deploy adapters on top of a base model, either via CLI flags or directly in the configuration file.
For more details on the advanced configuration options and their default values, please refer to the [Example Configuration File](#example-configuration-file-configjson) section.
##### Usage
```bash
sllm-cli deploy [OPTIONS]
```

##### Options
- `--max-instances <number>`
  - Overwrite the maximum instances in the default configuration.
- `--enable-lora`
  - Enable LoRA adapter support for the transformers backend. Overwrites `enable_lora` in the default configuration.
- `--lora-adapters`
  - Add one or more LoRA adapters in the format `<name>=<path>`. Overwrites any existing `lora_adapters` in the default configuration.
##### Examples
Deploy using a model name with default configuration:
```bash
sllm-cli deploy --model facebook/opt-1.3b
```
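Deploy using a model name and overwrite multiple configurations (the flag values here are illustrative, using only the options documented above):

```bash
sllm-cli deploy --model facebook/opt-1.3b --max-instances 5
```

Deploy a base model together with LoRA adapters (the adapter name and path are hypothetical placeholders):

```bash
sllm-cli deploy --model facebook/opt-1.3b --enable-lora --lora-adapters demo_lora=crumb/FLAN-OPT-1.3b-LoRA
```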
Below is a description of all the fields in `config.json`.

| Field | Description |
| --- | --- |
| backend_config.device_map | Device map config used to load the model; `auto` is suitable for most scenarios. |
| backend_config.torch_dtype | Torch dtype of the model. |
| backend_config.hf_model_class | HuggingFace model class. |
| backend_config.enable_lora | Set to `true` to enable loading LoRA adapters during inference. |
| backend_config.lora_adapters | A dictionary of LoRA adapters in the format `{name: path}`, where each path is a local or Hugging Face-hosted LoRA adapter directory. |
### sllm-cli delete
Delete deployed models by name, or delete specific LoRA adapters associated with a base model.

This command supports:
- Removing deployed models
- Removing specific LoRA adapters while preserving the base model
##### Usage
```bash
sllm-cli delete [MODELS] [OPTIONS]
```
##### Arguments
- `MODELS`
  - Space-separated list of model names to delete.
##### Options
- `--lora-adapters <adapter_names>`
  - Space-separated list of LoRA adapter names to delete from the given model. If provided, only the specified adapters are removed; the base model is not deleted.
##### Example
Delete multiple base models (and all their adapters):
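```bash
sllm-cli delete facebook/opt-1.3b facebook/opt-2.7b
```

(The model names above are illustrative; any deployed model names work.) To delete only specific LoRA adapters while keeping the base model (the adapter name is a placeholder):

```bash
sllm-cli delete facebook/opt-1.3b --lora-adapters demo_lora
```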
Below is a description of all the fields in the fine-tuning `config.json`.

| Field | Description |
| --- | --- |
| model | A deployed model name, used to identify the backend instance. |
| ft_backend | Fine-tuning engine; only `peft` is supported at the moment. |
| dataset_config | Configuration for the fine-tuning dataset |
| dataset_config.dataset_source | Whether the dataset comes from `hf_hub` (huggingface_hub) or a `local` file |
| dataset_config.hf_dataset_name | Dataset name on huggingface_hub |
| dataset_config.tokenization_field | The field to tokenize |
| dataset_config.split | Partitioning of the dataset (`train`, `validation`, and `test`). You can also slice the selected split, e.g. take only the first 10% of the training data: `train[:10%]` |
| dataset_config.data_files | Data files to load when the dataset source is `local` |
| dataset_config.extension_type | Extension type of the data files (`csv`, `json`, `parquet`, `arrow`) |
| lora_config | Configuration for LoRA fine-tuning |
| lora_config.r | `r` determines how many parameters will be trained. |
| lora_config.lora_alpha | A multiplier controlling the overall strength of the LoRA update, typically set to 1 |
| lora_config.target_modules | A list of target modules, as listed in the [Hugging Face documentation](https://github.com/huggingface/peft/blob/39ef2546d5d9b8f5f8a7016ec10657887a867041/src/peft/utils/other.py#L220) |
| lora_config.lora_dropout | Dropout probability for the LoRA layers, used to avoid overfitting |
| lora_config.bias | Use `none` or `lora_only` |
| lora_config.task_type | Indicates the task the model is being trained for |
| training_config | Configuration for training parameters |
| training_config.auto_find_batch_size | Automatically find a batch size that fits in memory |
| training_config.num_train_epochs | Total number of training epochs |
| training_config.learning_rate | Learning rate |
| training_config.optim | Select an optimizer |
| training_config.use_cpu | Whether to use the CPU for training |
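Putting these fields together, a minimal fine-tuning `config.json` might look like the sketch below. All values are illustrative assumptions based on the field descriptions above, not verified defaults; the dataset name and target modules in particular are placeholders you should replace with your own.

```json
{
  "model": "facebook/opt-1.3b",
  "ft_backend": "peft",
  "dataset_config": {
    "dataset_source": "hf_hub",
    "hf_dataset_name": "fka/awesome-chatgpt-prompts",
    "tokenization_field": "prompt",
    "split": "train[:10%]"
  },
  "lora_config": {
    "r": 4,
    "lora_alpha": 1,
    "target_modules": ["q_proj", "v_proj"],
    "lora_dropout": 0.05,
    "bias": "none",
    "task_type": "CAUSAL_LM"
  },
  "training_config": {
    "auto_find_batch_size": true,
    "num_train_epochs": 2,
    "learning_rate": 0.0001,
    "optim": "adamw_torch",
    "use_cpu": false
  }
}
```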