---
sidebar_position: 4
---
# PEFT LoRA Fine-tuning

This feature introduces a dedicated fine-tuning backend (`ft_backend`) for handling LoRA (Low-Rank Adaptation) fine-tuning jobs in ServerlessLLM. This implementation provides isolated fine-tuning instances with specialized resource management and lifecycle control.

## Prerequisites

Before using the fine-tuning feature, ensure you have:

1. **Base Model**: A base model saved using the transformers backend (see the sketch below)
2. **Docker Setup**: A ServerlessLLM cluster running via Docker Compose
3. **Storage**: Adequate storage space for fine-tuned adapters
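
For the first prerequisite, the base model can be saved ahead of time with the `sllm-store` CLI, as in the complete workflow at the end of this page:

```bash
# Save the base model with the transformers backend so fine-tuning jobs can load it
sllm-store save --model facebook/opt-125m --backend transformers
```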

## Usage

### Step 1: Start the ServerlessLLM Services Using Docker Compose

```bash
docker compose up -d
```

This command will start the Ray head node and two worker nodes defined in the `docker-compose.yml` file.

:::tip
Use the following command to monitor the logs of the head node:

```bash
docker logs -f sllm_head
```
:::

### Step 2: Submit a Fine-tuning Job

Submit a fine-tuning job using the REST API:

```bash
curl -X POST $LLM_SERVER_URL/v1/fine-tuning/jobs \
  -H "Content-Type: application/json" \
  -d @examples/fine_tuning/fine_tuning_config.json
```

#### Fine-tuning Configuration

Create a configuration file (`fine_tuning_config.json`) with the following structure:

```json
{
  "model": "facebook/opt-125m",
  "ft_backend": "peft_lora",
  "num_gpus": 1,
  "num_cpus": 1,
  "timeout": 3600,
  "backend_config": {
    "output_dir": "facebook/adapters/opt-125m_adapter_test",
    "dataset_config": {
      "dataset_source": "hf_hub",
      "hf_dataset_name": "fka/awesome-chatgpt-prompts",
      "tokenization_field": "prompt",
      "split": "train",
      "data_files": "",
      "extension_type": ""
    },
    "lora_config": {
      "r": 4,
      "lora_alpha": 32,
      "lora_dropout": 0.05,
      "bias": "none",
      "task_type": "CAUSAL_LM"
    },
    "training_config": {
      "auto_find_batch_size": true,
      "save_strategy": "no",
      "num_train_epochs": 2,
      "learning_rate": 0.0001,
      "use_cpu": false
    }
  }
}
```

#### Configuration Parameters

**Job Configuration:**
- `model`: Base model name
- `ft_backend`: Fine-tuning backend type (currently only `"peft_lora"` is supported)
- `num_cpus`: Number of CPU cores required
- `num_gpus`: Number of GPUs required
- `timeout`: Maximum execution time in seconds

**Dataset Configuration:**
- `dataset_source`: Source type (`"hf_hub"` or `"local"`; a local-source sketch follows this list)
- `hf_dataset_name`: Hugging Face dataset name (for `hf_hub`)
- `data_files`: Local file paths (for `local`)
- `extension_type`: File extension type (for `local`)
- `tokenization_field`: Field name for tokenization
- `split`: Dataset split to use
- Additional dataset configuration parameters can be found in the [Hugging Face datasets documentation](https://huggingface.co/docs/datasets/en/loading#load)
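
For local datasets, `data_files` and `extension_type` replace the Hub dataset name. A hedged sketch of the `dataset_config` block; the file path is hypothetical, and the accepted `extension_type` values are assumed to follow Hugging Face `load_dataset` conventions (e.g. `json`, `csv`):

```json
"dataset_config": {
  "dataset_source": "local",
  "hf_dataset_name": "",
  "tokenization_field": "prompt",
  "split": "train",
  "data_files": "examples/fine_tuning/my_dataset.json",
  "extension_type": "json"
}
```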

**LoRA Configuration:**
- `r`: LoRA rank
- `lora_alpha`: LoRA alpha parameter
- `target_modules`: Target modules for LoRA adaptation (not set in the example above; illustrated below)
- `lora_dropout`: Dropout rate
- `bias`: Bias handling strategy
- `task_type`: Task type for PEFT
- Additional LoraConfig parameters can be found in the [Hugging Face PEFT documentation](https://huggingface.co/docs/peft/main/en/package_reference/lora#peft.LoraConfig)
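
Since the example above omits `target_modules` (leaving PEFT's architecture defaults in effect), here is a hedged sketch that sets it explicitly; `q_proj` and `v_proj` are the attention projections commonly targeted in OPT-style models, and the names should be adapted to your base model's architecture:

```json
"lora_config": {
  "r": 4,
  "lora_alpha": 32,
  "target_modules": ["q_proj", "v_proj"],
  "lora_dropout": 0.05,
  "bias": "none",
  "task_type": "CAUSAL_LM"
}
```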

**Training Configuration:**
- `num_train_epochs`: Number of training epochs
- `per_device_train_batch_size`: Batch size per device
- `gradient_accumulation_steps`: Gradient accumulation steps
- `learning_rate`: Learning rate
- `warmup_steps`: Number of warmup steps
- `logging_steps`: Logging frequency
- `save_steps`: Model saving frequency
- `eval_steps`: Evaluation frequency
- Additional training arguments can be found in the [Hugging Face Transformers documentation](https://huggingface.co/docs/transformers/v4.53.3/en/main_classes/trainer#transformers.TrainingArguments); an illustrative expanded example follows
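
An illustrative `training_config` exercising the parameters above; every field maps to a Hugging Face `TrainingArguments` field, and the values are placeholders rather than tuned recommendations:

```json
"training_config": {
  "num_train_epochs": 2,
  "per_device_train_batch_size": 8,
  "gradient_accumulation_steps": 4,
  "learning_rate": 0.0001,
  "warmup_steps": 100,
  "logging_steps": 10,
  "save_strategy": "no",
  "use_cpu": false
}
```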

### Step 3: Expected Response

Upon successful job submission, you'll receive a response with the job ID:

```json
{
  "job_id": "job-123"
}
```
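
In scripts, the returned ID can be captured directly into a shell variable. A small sketch assuming `jq` is installed:

```bash
# Submit the job and keep the returned job ID for later status checks
JOB_ID=$(curl -s -X POST $LLM_SERVER_URL/v1/fine-tuning/jobs \
  -H "Content-Type: application/json" \
  -d @examples/fine_tuning/fine_tuning_config.json | jq -r '.job_id')
echo "Submitted fine-tuning job: $JOB_ID"
```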

### Step 4: Monitor Job Status

Check the status of your fine-tuning job:

```bash
curl -X GET "$LLM_SERVER_URL/v1/fine_tuning/jobs/job-123"
```
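
For unattended runs, you can poll until the job reaches a terminal state. A sketch assuming `jq` and the nested `status.status` field shown in the response below:

```bash
# Poll every 30 seconds until the job completes, fails, or is cancelled
while true; do
  STATE=$(curl -s "$LLM_SERVER_URL/v1/fine_tuning/jobs/job-123" | jq -r '.status.status')
  echo "Job state: $STATE"
  case "$STATE" in
    completed|failed|cancelled) break ;;
  esac
  sleep 30
done
```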

#### Status Response

```json
{
  "id": "job-123",
  "object": "fine_tuning.job",
  "status": {
    "config": {
      "model": "facebook/opt-125m",
      "ft_backend": "peft_lora",
      "num_gpus": 1,
      "num_cpus": 1,
      "timeout": 3600,
      "backend_config": {
        "output_dir": "facebook/adapters/opt-125m_adapter_test",
        "dataset_config": {
          "dataset_source": "hf_hub",
          "hf_dataset_name": "fka/awesome-chatgpt-prompts",
          "tokenization_field": "prompt",
          "split": "train",
          "data_files": "",
          "extension_type": ""
        },
        "lora_config": {
          "r": 4,
          "lora_alpha": 32,
          "lora_dropout": 0.05,
          "bias": "none",
          "task_type": "CAUSAL_LM"
        },
        "training_config": {
          "auto_find_batch_size": true,
          "save_strategy": "no",
          "num_train_epochs": 2,
          "learning_rate": 0.0001,
          "use_cpu": false
        }
      }
    },
    "status": "running",
    "created_time": "2025-08-26T04:18:11.155785",
    "updated_time": "2025-08-26T04:18:11.155791",
    "priority": 0
  }
}
```

**Possible Status Values:**
- `pending`: Job is waiting for resources
- `running`: Job is currently executing
- `completed`: Job completed successfully
- `failed`: Job failed with an error
- `cancelled`: Job was cancelled by the user

### Step 5: Cancel a Job (Optional)

If needed, you can cancel a running job:

```bash
curl -X POST "$LLM_SERVER_URL/v1/fine_tuning/jobs/job-123/cancel"
```

## Job Management

### Resource Allocation

Fine-tuning jobs are allocated resources based on the specified requirements:

- **CPU**: Number of CPU cores specified in `num_cpus`
- **GPU**: Number of GPUs specified in `num_gpus`
- **Memory**: Automatically managed based on model size and batch size

### Priority System

Jobs are processed based on priority and creation time:

1. **Higher Priority**: Jobs with higher priority values are processed first
2. **FIFO**: Jobs with the same priority are processed in order of creation
3. **Resource Availability**: Jobs wait until sufficient resources are available

### Timeout Handling

Jobs have configurable timeout limits:

- **Default Timeout**: 3600 seconds (1 hour)
- **Configurable**: Set via the `timeout` parameter in the job configuration
- **Automatic Cleanup**: Jobs are automatically marked as failed if they exceed the timeout

## Output and Storage

### LoRA Adapter Storage

Fine-tuned LoRA adapters are automatically saved to the `output_dir` path you configure in `fine_tuning_config.json`, for example:

```
{STORAGE_PATH}/transformers/facebook/adapters/opt-125m_adapter_test
```

### Adapter Contents

The saved adapter includes (a directory listing sketch follows):

- **LoRA Weights**: Fine-tuned LoRA parameters
- **Configuration**: LoRA configuration file
- **Metadata**: Training metadata and statistics
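
As a rough illustration, a freshly saved adapter directory usually contains the standard PEFT output files; the exact listing depends on your `peft` version:

```bash
ls ${STORAGE_PATH}/transformers/facebook/adapters/opt-125m_adapter_test
# adapter_config.json        <- LoRA configuration
# adapter_model.safetensors  <- fine-tuned LoRA weights
# README.md                  <- training metadata
```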

## Integration with Serving

### Using Fine-tuned Adapters

After successful fine-tuning, the LoRA adapter can be used for inference:

```bash
# Deploy model with fine-tuned adapter
sllm deploy --model facebook/opt-125m --backend transformers --enable-lora --lora-adapters "my_adapter=ft_facebook/opt-125m_adapter"

# Use the adapter for inference
curl $LLM_SERVER_URL/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
  "model": "facebook/opt-125m",
  "messages": [
    {"role": "user", "content": "Hello, how are you?"}
  ],
  "lora_adapter_name": "my_adapter"
}'
```

For more details about PEFT LoRA serving, see the [PEFT LoRA Serving documentation](./peft_lora_serving.md).

## Troubleshooting

### Common Issues

1. **Job Stuck in Pending**: Check resource availability and job priority
2. **Dataset Loading Failures**: Verify dataset configuration and accessibility
3. **Training Failures**: Check GPU memory and batch size settings
4. **Timeout Errors**: Increase the timeout or optimize the training configuration
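
When diagnosing any of the issues above, the head-node logs (see the tip in Step 1) are usually the fastest starting point:

```bash
# Inspect recent head-node logs for scheduling and training errors
docker logs --tail 100 sllm_head
```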

## API Reference

### Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/fine-tuning/jobs` | POST | Submit a fine-tuning job |
| `/v1/fine_tuning/jobs/{fine_tuning_job_id}` | GET | Get job status |
| `/v1/fine_tuning/jobs/{fine_tuning_job_id}/cancel` | POST | Cancel a running job |

### Response Codes

| Code | Description |
|------|-------------|
| 200 | Success |
| 400 | Bad Request |
| 404 | Job not found |
| 500 | Internal Server Error |
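
In automation, the HTTP code can be checked explicitly with curl's `--write-out` option. A small sketch:

```bash
# Capture only the HTTP status code of a job-status request
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" \
  "$LLM_SERVER_URL/v1/fine_tuning/jobs/job-123")
if [ "$HTTP_CODE" -eq 404 ]; then
  echo "Job not found"
fi
```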

## Examples

### Complete Fine-tuning Workflow

```bash
# 1. Save base model
sllm-store save --model facebook/opt-125m --backend transformers

# 2. Start the ServerlessLLM cluster with Docker Compose
cd examples/docker
docker compose up -d --build

# 3. Submit fine-tuning job
cd ../..
curl -X POST $LLM_SERVER_URL/v1/fine-tuning/jobs \
  -H "Content-Type: application/json" \
  -d @examples/fine_tuning/fine_tuning_config.json

# 4. Monitor job status
curl -X GET "$LLM_SERVER_URL/v1/fine_tuning/jobs/job-123"

# 5. Deploy base model with fine-tuned adapter
sllm deploy --model facebook/opt-125m --backend transformers --enable-lora --lora-adapters "my_adapter=ft_facebook/opt-125m_adapter"

# 6. Use for inference
curl $LLM_SERVER_URL/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
  "model": "facebook/opt-125m",
  "messages": [{"role": "user", "content": "Hello"}],
  "lora_adapter_name": "my_adapter"
}'
```