Commit 51bb18e

Document Sync by Tina
committed Nov 17, 2024
1 parent: 8534e4b

1 file changed: docs/stable/serve/storage_aware_scheduling.md (+46, -121 lines)

@@ -2,71 +2,31 @@
 sidebar_position: 0
 ---
 
-# Storage Aware Scheduling
+# Storage Aware Scheduling with Docker Compose
 
 ## Pre-requisites
-To enable storage aware model loading scheduling, a hardware configuration file is required.
-For example, the following is a sample configuration file for two servers:
-```bash
-echo '{
-  "0": {
-    "host_size": "32GB",
-    "host_bandwidth": "24GB/s",
-    "disk_size": "128GB",
-    "disk_bandwidth": "5GB/s",
-    "network_bandwidth": "10Gbps"
-  },
-  "1": {
-    "host_size": "32GB",
-    "host_bandwidth": "24GB/s",
-    "disk_size": "128GB",
-    "disk_bandwidth": "5GB/s",
-    "network_bandwidth": "10Gbps"
-  }
-}' > hardware_config.json
-```
 
-We will use Docker to run a ServerlessLLM cluster in this example. Therefore, please make sure you have read the [Docker Quickstart Guide](../getting_started/docker_quickstart.md) before proceeding.
+We will use Docker Compose to run a ServerlessLLM cluster in this example. Therefore, please make sure you have read the [Docker Quickstart Guide](../getting_started/docker_quickstart.md) before proceeding.
 
 ## Usage
-Start a local Docker-based ray cluster.
 
-### Step 1: Start Ray Head Node and Worker Nodes
+Start a local Docker-based Ray cluster using Docker Compose.
 
-1. Start the Ray head node.
+### Step 1: Clone the ServerlessLLM Repository
+
+If you haven't already, clone the ServerlessLLM repository:
 
 ```bash
-docker run -d --name ray_head \
-  --runtime nvidia \
-  --network sllm \
-  -p 6379:6379 \
-  -p 8343:8343 \
-  --gpus '"device=none"' \
-  serverlessllm/sllm-serve
+git clone https://github.com/ServerlessLLM/ServerlessLLM.git
+cd ServerlessLLM/examples/storage_aware_scheduling
 ```
 
-2. Start the Ray worker nodes.
+### Step 2: Configuration
 
-Ensure that you have a directory for storing your models and set the `MODEL_FOLDER` environment variable to this directory:
+Set the model directory: create a directory on your host machine where models will be stored, and set the `MODEL_FOLDER` environment variable to point to it:
 
 ```bash
-export MODEL_FOLDER=path/to/models
+export MODEL_FOLDER=/path/to/your/models
 ```
 
-```bash
-docker run -d --name ray_worker_0 \
-  --runtime nvidia \
-  --network sllm \
-  --gpus '"device=0"' \
-  --env WORKER_ID=0 \
-  --mount type=bind,source=$MODEL_FOLDER,target=/models \
-  serverlessllm/sllm-serve-worker
-
-docker run -d --name ray_worker_1 \
-  --runtime nvidia \
-  --network sllm \
-  --gpus '"device=1"' \
-  --env WORKER_ID=1 \
-  --mount type=bind,source=$MODEL_FOLDER,target=/models \
-  serverlessllm/sllm-serve-worker
-```
+Replace `/path/to/your/models` with the actual path where you want to store the models.
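For concreteness, a minimal sketch of this step; the path `/data/sllm-models` is purely illustrative, and the assumption (carried over from the previous Docker instructions on this page) is that the compose file bind-mounts this directory to `/models` inside the worker containers:

```bash
# Create the host directory that will hold downloaded model checkpoints
mkdir -p /data/sllm-models

# Point the Docker Compose setup at it (assumed to be bind-mounted to
# /models inside the worker containers)
export MODEL_FOLDER=/data/sllm-models
```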
@@ -73,4 +33,8 @@
 
-### Step 2: Start ServerlessLLM Serve with Storage Aware Scheduler
+### Step 3: Enable Storage Aware Scheduling in Docker Compose
 
-1. Copy the hardware configuration file to the Ray head node.
+The Docker Compose configuration is already located in the `examples/storage_aware_scheduling` directory. To activate storage-aware scheduling, ensure the `docker-compose.yml` file includes the necessary configuration (the `sllm_head` service should include the `--enable_storage_aware` command).
+
+:::tip
+We recommend adjusting the number of GPUs and `mem_pool_size` based on the resources available on your machine.
+:::
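For orientation, here is a rough sketch of the shape such a `docker-compose.yml` can take. Apart from the `sllm_head` service name and the `--enable_storage_aware` flag, everything is illustrative (the images, ports, and `/models` mount are carried over from the previous Docker instructions on this page; the worker service name is hypothetical), so defer to the file shipped in `examples/storage_aware_scheduling`:

```yaml
# Illustrative sketch only; use the docker-compose.yml from the repository.
services:
  sllm_head:
    image: serverlessllm/sllm-serve
    # Storage-aware scheduling is switched on via this flag on the head.
    command: ["sllm-serve", "start", "--enable_storage_aware"]
    ports:
      - "6379:6379"   # Ray head
      - "8343:8343"   # ServerlessLLM API endpoint

  sllm_worker_0:      # hypothetical worker service name
    image: serverlessllm/sllm-serve-worker
    environment:
      - WORKER_ID=0
    volumes:
      - ${MODEL_FOLDER}:/models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1              # adjust to the GPUs on your machine
              capabilities: ["gpu"]
```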
@@ -77,55 +41,20 @@
 
-```bash
-docker cp hardware_config.json ray_head:/app/hardware_config.json
-```
 
-2. Start the ServerlessLLM serve with the storage aware scheduler.
+### Step 4: Start the Services
+
+Start the ServerlessLLM services using Docker Compose:
 
 ```bash
-docker exec ray_head sh -c "/opt/conda/bin/sllm-serve start --hardware-config /app/hardware_config.json"
+docker compose up -d --build
 ```
 
-### Step 3: Deploy Models with Placement Spec
+This command will start the Ray head node and two worker nodes defined in the `docker-compose.yml` file.
+
+:::tip
+Use the following command to monitor the logs of the head node:
 
-1. Create model deployment spec files.
-In this example, model "opt-2.7b" will be placed on server 0; while model "opt-1.3b" will be placed on server 1.
 ```bash
-echo '{
-  "model": "opt-2.7b",
-  "backend": "transformers",
-  "num_gpus": 1,
-  "auto_scaling_config": {
-    "metric": "concurrency",
-    "target": 1,
-    "min_instances": 0,
-    "max_instances": 10
-  },
-  "placement_config": {
-    "target_nodes": ["0"]
-  },
-  "backend_config": {
-    "pretrained_model_name_or_path": "facebook/opt-2.7b",
-    "device_map": "auto",
-    "torch_dtype": "float16"
-  }
-}' > config-opt-2.7b.json
-echo '{
-  "model": "opt-1.3b",
-  "backend": "transformers",
-  "num_gpus": 1,
-  "auto_scaling_config": {
-    "metric": "concurrency",
-    "target": 1,
-    "min_instances": 0,
-    "max_instances": 10
-  },
-  "placement_config": {
-    "target_nodes": ["1"]
-  },
-  "backend_config": {
-    "pretrained_model_name_or_path": "facebook/opt-1.3b",
-    "device_map": "auto",
-    "torch_dtype": "float16"
-  }
-}' > config-opt-1.3b.json
+docker logs -f sllm_head
 ```
+:::
+
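Before deploying models, it can also help to confirm that the head and both workers are actually up; this is a standard Docker Compose command, nothing ServerlessLLM-specific:

```bash
# List the compose services and their state; all of them should be running
docker compose ps
```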
@@ -132,4 +61,7 @@
+### Step 5: Deploy Models with Placement Spec
+
+In the `examples/storage_aware_scheduling` directory, the example configuration files (`config-opt-2.7b.json` and `config-opt-1.3b.json`) are already provided.
 
-> Note: Storage aware scheduling currently only supports "transformers" backend. Support for other backends will come soon.
+> Note: Storage aware scheduling currently only supports the "transformers" backend. Support for other backends will come soon.
 
 2. Deploy models with the placement spec files.
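For reference, the previous revision of this page built equivalent spec files inline, and the field that drives storage-aware placement is `placement_config.target_nodes`. The files shipped in the repository should look roughly like this (reproduced from the old inline example, so details may differ slightly):

```json
{
  "model": "opt-2.7b",
  "backend": "transformers",
  "num_gpus": 1,
  "auto_scaling_config": {
    "metric": "concurrency",
    "target": 1,
    "min_instances": 0,
    "max_instances": 10
  },
  "placement_config": {
    "target_nodes": ["0"]
  },
  "backend_config": {
    "pretrained_model_name_or_path": "facebook/opt-2.7b",
    "device_map": "auto",
    "torch_dtype": "float16"
  }
}
```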

@@ -148,7 +80,7 @@ sllm-cli deploy --config config-opt-1.3b.json
 curl http://127.0.0.1:8343/v1/chat/completions \
   -H "Content-Type: application/json" \
   -d '{
-    "model": "opt-2.7b",
+    "model": "facebook/opt-2.7b",
     "messages": [
       {"role": "system", "content": "You are a helpful assistant."},
       {"role": "user", "content": "What is your name?"}
@@ -158,10 +90,10 @@ curl http://127.0.0.1:8343/v1/chat/completions \
 curl http://127.0.0.1:8343/v1/chat/completions \
   -H "Content-Type: application/json" \
   -d '{
-    "model": "opt-1.3b",
+    "model": "facebook/opt-1.3b",
     "messages": [
       {"role": "system", "content": "You are a helpful assistant."},
       {"role": "user", "content": "What is your name?"}
     ]
   }'
 ```
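On success, each request returns a chat-completions style JSON body roughly like the sketch below; this assumes an OpenAI-compatible response format, and the exact fields may differ:

```json
{
  "object": "chat.completion",
  "model": "facebook/opt-1.3b",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "..."},
      "finish_reason": "stop"
    }
  ]
}
```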
@@ -168,16 +100,14 @@
 
-As shown in the log message, the model "opt-2.7b" is scheduled on server 0, while the model "opt-1.3b" is scheduled on server 1.
-```plaintext
-...
-(StorageAwareScheduler pid=1584) INFO 07-30 12:08:40 storage_aware_scheduler.py:138] Sorted scheduling options: [('0', 0.9877967834472656)]
-(StorageAwareScheduler pid=1584) INFO 07-30 12:08:40 storage_aware_scheduler.py:145] Allocated node 0 for model opt-2.7b
-...
-(StorageAwareScheduler pid=1584) INFO 07-30 12:08:51 storage_aware_scheduler.py:138] Sorted scheduling options: [('1', 0.4901580810546875)]
-(StorageAwareScheduler pid=1584) INFO 07-30 12:08:51 storage_aware_scheduler.py:145] Allocated node 1 for model opt-1.3b
-...
+As shown in the log messages, the model "facebook/opt-2.7b" is scheduled on server 0, while the model "facebook/opt-1.3b" is scheduled on server 1.
+
+```log
+(StorageAwareScheduler pid=1543) INFO 11-12 23:48:27 storage_aware_scheduler.py:137] Sorted scheduling options: [('0', 4.583079601378258)]
+(StorageAwareScheduler pid=1543) INFO 11-12 23:48:27 storage_aware_scheduler.py:144] Allocated node 0 for model facebook/opt-2.7b
+(StorageAwareScheduler pid=1543) INFO 11-12 23:48:38 storage_aware_scheduler.py:137] Sorted scheduling options: [('1', 2.266678696047572)]
+(StorageAwareScheduler pid=1543) INFO 11-12 23:48:38 storage_aware_scheduler.py:144] Allocated node 1 for model facebook/opt-1.3b
 ```
 
-### Step 4: Clean Up
+### Step 6: Clean Up
 
 Delete the model deployment by running the following command:
 
@@ -188,11 +118,6 @@ sllm-cli delete facebook/opt-1.3b facebook/opt-2.7b
 If you need to stop and remove the containers, you can use the following commands:
 
 ```bash
-docker exec ray_head sh -c "ray stop"
-docker exec ray_worker_0 sh -c "ray stop"
-docker exec ray_worker_1 sh -c "ray stop"
-
-docker stop ray_head ray_worker_0 ray_worker_1
-docker rm ray_head ray_worker_0 ray_worker_1
-docker network rm sllm
-```
+docker compose down
+```
+
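If the compose file also declares named volumes that you want removed, the standard variant below does that too; check the file first, since this deletes any data stored in those volumes:

```bash
# Stop and remove containers, and additionally remove named volumes (destructive)
docker compose down --volumes
```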
