---
sidebar_position: 4
---

# PEFT LoRA Fine-tuning

This feature introduces a dedicated fine-tuning backend (`ft_backend`) for handling LoRA (Low-Rank Adaptation) fine-tuning jobs in ServerlessLLM. This implementation provides isolated fine-tuning instances with specialized resource management and lifecycle control.

## Prerequisites

Before using the fine-tuning feature, ensure you have:

1. **Base Model**: A base model saved using the `transformers` backend (see the sketch after this list)
2. **Docker Setup**: A ServerlessLLM cluster running via Docker Compose
3. **Storage**: Adequate storage space for fine-tuned adapters

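For reference, the base model can be saved with the `sllm-store` CLI (the same command appears in the complete workflow at the end of this page):

```bash
sllm-store save --model facebook/opt-125m --backend transformers
```
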
## Usage

### Step 1: Start the ServerlessLLM Services Using Docker Compose

```bash
docker compose up -d
```

This command will start the Ray head node and two worker nodes defined in the `docker-compose.yml` file.

:::tip
Use the following command to monitor the logs of the head node:

```bash
docker logs -f sllm_head
```
:::
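
To verify that the head and worker containers are up, you can also list the Compose services (service names depend on your `docker-compose.yml`):

```bash
docker compose ps
```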

### Step 2: Submit Fine-tuning Job

Submit a fine-tuning job using the REST API:

```bash
curl -X POST $LLM_SERVER_URL/v1/fine-tuning/jobs \
  -H "Content-Type: application/json" \
  -d @examples/fine_tuning/fine_tuning_config.json
```

#### Fine-tuning Configuration

Create a configuration file (`fine_tuning_config.json`) with the following structure:

```json
{
  "model": "facebook/opt-125m",
  "ft_backend": "peft_lora",
  "num_gpus": 1,
  "num_cpus": 1,
  "timeout": 3600,
  "backend_config": {
    "output_dir": "facebook/adapters/opt-125m_adapter_test",
    "dataset_config": {
      "dataset_source": "hf_hub",
      "hf_dataset_name": "fka/awesome-chatgpt-prompts",
      "tokenization_field": "prompt",
      "split": "train",
      "data_files": "",
      "extension_type": ""
    },
    "lora_config": {
      "r": 4,
      "lora_alpha": 32,
      "lora_dropout": 0.05,
      "bias": "none",
      "task_type": "CAUSAL_LM"
    },
    "training_config": {
      "auto_find_batch_size": true,
      "save_strategy": "no",
      "num_train_epochs": 2,
      "learning_rate": 0.0001,
      "use_cpu": false
    }
  }
}
```

#### Configuration Parameters

**Job Configuration:**
- `model`: Base model name
- `ft_backend`: Fine-tuning backend type (currently supports "peft_lora")
- `num_cpus`: Number of CPU cores required
- `num_gpus`: Number of GPUs required
- `timeout`: Maximum execution time in seconds

**Dataset Configuration:**
- `dataset_source`: Source type ("hf_hub" or "local")
- `hf_dataset_name`: Hugging Face dataset name (for "hf_hub")
- `data_files`: Local file paths (for "local")
- `extension_type`: File extension type (for "local")
- `tokenization_field`: Field name for tokenization
- `split`: Dataset split to use
- More dataset configuration parameters can be found in the [Hugging Face datasets documentation](https://huggingface.co/docs/datasets/en/loading#load)
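
For local datasets, `data_files` and `extension_type` replace `hf_dataset_name`. A hypothetical `dataset_config` for a local JSON file (the path and field values are illustrative):

```json
"dataset_config": {
  "dataset_source": "local",
  "data_files": "examples/fine_tuning/data/train.json",
  "extension_type": "json",
  "tokenization_field": "prompt",
  "split": "train",
  "hf_dataset_name": ""
}
```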

**LoRA Configuration:**
- `r`: LoRA rank
- `lora_alpha`: LoRA alpha (scaling) parameter
- `target_modules`: Target modules for LoRA adaptation
- `lora_dropout`: Dropout rate
- `bias`: Bias handling strategy
- `task_type`: Task type for PEFT
- More `LoraConfig` parameters can be found in the [Hugging Face PEFT documentation](https://huggingface.co/docs/peft/main/en/package_reference/lora#peft.LoraConfig)
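
The example configuration omits `target_modules`, in which case PEFT typically falls back to its built-in defaults for known architectures. A hypothetical variant that targets the attention projections explicitly (the module names shown match OPT-style models and are illustrative):

```json
"lora_config": {
  "r": 4,
  "lora_alpha": 32,
  "target_modules": ["q_proj", "v_proj"],
  "lora_dropout": 0.05,
  "bias": "none",
  "task_type": "CAUSAL_LM"
}
```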

**Training Configuration:**
- `num_train_epochs`: Number of training epochs
- `per_device_train_batch_size`: Batch size per device
- `gradient_accumulation_steps`: Gradient accumulation steps
- `learning_rate`: Learning rate
- `warmup_steps`: Number of warmup steps
- `logging_steps`: Logging frequency
- `save_steps`: Model saving frequency
- `eval_steps`: Evaluation frequency
- More training arguments can be found in the [Hugging Face Transformers documentation](https://huggingface.co/docs/transformers/v4.53.3/en/main_classes/trainer#transformers.TrainingArguments)
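
The example configuration above relies on `auto_find_batch_size`; the additional arguments listed here can be set the same way. A hypothetical variant with explicit batch and logging settings (all values are illustrative):

```json
"training_config": {
  "per_device_train_batch_size": 8,
  "gradient_accumulation_steps": 4,
  "num_train_epochs": 2,
  "learning_rate": 0.0001,
  "warmup_steps": 100,
  "logging_steps": 10,
  "use_cpu": false
}
```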

### Step 3: Expected Response

Upon successful job submission, you'll receive a response with the job ID:

```json
{
  "job_id": "job-123"
}
```

### Step 4: Monitor Job Status

Check the status of your fine-tuning job:

```bash
curl -X GET "$LLM_SERVER_URL/v1/fine_tuning/jobs/job-123"
```

#### Status Response

```json
{
  "id": "job-123",
  "object": "fine_tuning.job",
  "status": {
    "config": {
      "model": "facebook/opt-125m",
      "ft_backend": "peft_lora",
      "num_gpus": 1,
      "num_cpus": 1,
      "timeout": 3600,
      "backend_config": {
        "output_dir": "facebook/adapters/opt-125m_adapter_test",
        "dataset_config": {
          "dataset_source": "hf_hub",
          "hf_dataset_name": "fka/awesome-chatgpt-prompts",
          "tokenization_field": "prompt",
          "split": "train",
          "data_files": "",
          "extension_type": ""
        },
        "lora_config": {
          "r": 4,
          "lora_alpha": 32,
          "lora_dropout": 0.05,
          "bias": "none",
          "task_type": "CAUSAL_LM"
        },
        "training_config": {
          "auto_find_batch_size": true,
          "save_strategy": "no",
          "num_train_epochs": 2,
          "learning_rate": 0.0001,
          "use_cpu": false
        }
      }
    },
    "status": "running",
    "created_time": "2025-08-26T04:18:11.155785",
    "updated_time": "2025-08-26T04:18:11.155791",
    "priority": 0
  }
}
```

**Possible Status Values:**
- `pending`: Job is waiting for resources
- `running`: Job is currently executing
- `completed`: Job completed successfully
- `failed`: Job failed with an error
- `cancelled`: Job was cancelled by the user

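A simple way to wait for completion is to poll the status endpoint until the job reaches a terminal state. A minimal sketch using `curl` and `jq` (`jq` is an assumption; any JSON parser works):

```bash
# Poll the job status every 30 seconds until it leaves pending/running.
JOB_ID="job-123"
while true; do
  # The job state is nested under .status.status in the response shown above.
  STATUS=$(curl -s "$LLM_SERVER_URL/v1/fine_tuning/jobs/$JOB_ID" | jq -r '.status.status')
  echo "Job $JOB_ID status: $STATUS"
  case "$STATUS" in
    completed|failed|cancelled) break ;;
  esac
  sleep 30
done
```
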
### Step 5: Cancel Job (Optional)

If needed, you can cancel a running job:

```bash
curl -X POST "$LLM_SERVER_URL/v1/fine_tuning/jobs/job-123/cancel"
```

## Job Management

### Resource Allocation

Fine-tuning jobs are allocated resources based on the specified requirements:

- **CPU**: Number of CPU cores specified in `num_cpus`
- **GPU**: Number of GPUs specified in `num_gpus`
- **Memory**: Automatically managed based on model size and batch size

### Priority System

Jobs are processed based on priority and creation time:

1. **Higher Priority**: Jobs with higher priority values are processed first
2. **FIFO**: Jobs with the same priority are processed in order of creation
3. **Resource Availability**: Jobs wait until sufficient resources are available

### Timeout Handling

Jobs have configurable timeout limits:

- **Default Timeout**: 3600 seconds (1 hour)
- **Configurable**: Set via the `timeout` parameter in the job configuration
- **Automatic Cleanup**: Jobs are automatically marked as failed if they exceed the timeout

## Output and Storage

### LoRA Adapter Storage

Fine-tuned LoRA adapters are automatically saved under the `output_dir` path you configure in `fine_tuning_config.json`, for example:

```
{STORAGE_PATH}/transformers/facebook/adapters/opt-125m_adapter_test
```

### Adapter Contents

The saved adapter includes:

- **LoRA Weights**: Fine-tuned LoRA parameters
- **Configuration**: LoRA configuration file
- **Metadata**: Training metadata and statistics

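With PEFT's default save format, the adapter directory typically looks like this (exact file names depend on the PEFT version; older versions write `adapter_model.bin` instead of `.safetensors`):

```
{STORAGE_PATH}/transformers/facebook/adapters/opt-125m_adapter_test
├── adapter_config.json        # saved LoRA configuration
└── adapter_model.safetensors  # fine-tuned LoRA weights
```
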
## Integration with Serving

### Using Fine-tuned Adapters

After successful fine-tuning, the LoRA adapter can be used for inference:

```bash
# Deploy model with fine-tuned adapter
sllm deploy --model facebook/opt-125m --backend transformers --enable-lora --lora-adapters "my_adapter=ft_facebook/opt-125m_adapter"

# Use the adapter for inference
curl $LLM_SERVER_URL/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "facebook/opt-125m",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "lora_adapter_name": "my_adapter"
  }'
```

For more details about PEFT LoRA serving, please see the [documentation](./peft_lora_serving.md).

## Troubleshooting

### Common Issues

1. **Job Stuck in Pending**: Check resource availability and job priority
2. **Dataset Loading Failures**: Verify the dataset configuration and accessibility
3. **Training Failures**: Check GPU memory and batch size settings
4. **Timeout Errors**: Increase the timeout or optimize the training configuration

## API Reference

### Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/fine-tuning/jobs` | POST | Submit a fine-tuning job |
| `/v1/fine_tuning/jobs/{fine_tuning_job_id}` | GET | Get job status |
| `/v1/fine_tuning/jobs/{fine_tuning_job_id}/cancel` | POST | Cancel a running job |

### Response Codes

| Code | Description |
|------|-------------|
| 200 | Success |
| 400 | Bad Request |
| 404 | Job not found |
| 500 | Internal Server Error |

## Examples

### Complete Fine-tuning Workflow

```bash
# 1. Save the base model
sllm-store save --model facebook/opt-125m --backend transformers

# 2. Start the ServerlessLLM cluster with Docker Compose
cd examples/docker
docker compose up -d --build

# 3. Submit the fine-tuning job (from the repository root)
cd ../..
curl -X POST $LLM_SERVER_URL/v1/fine-tuning/jobs \
  -H "Content-Type: application/json" \
  -d @examples/fine_tuning/fine_tuning_config.json

# 4. Monitor job status
curl -X GET "$LLM_SERVER_URL/v1/fine_tuning/jobs/job-123"

# 5. Deploy the base model with the fine-tuned adapter
sllm deploy --model facebook/opt-125m --backend transformers --enable-lora --lora-adapters "my_adapter=ft_facebook/opt-125m_adapter"

# 6. Use the adapter for inference
curl $LLM_SERVER_URL/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "facebook/opt-125m",
    "messages": [{"role": "user", "content": "Hello"}],
    "lora_adapter_name": "my_adapter"
  }'
```
