docs/stable/getting_started/slurm_setup.md (+22 −20)

@@ -9,7 +9,6 @@ This guide will help you get started with running ServerlessLLM on SLURM cluster
## Pre-requisites
Before you begin, make sure you have checked the following:
### Some Tips about Installation
- Both installation and building from source require an internet connection, so please make sure that port 443 is accessible on the job node where you want to install.
- If 'not enough disk space' is reported when running `pip install` on the login node, you can submit the installation to a job node for execution (a complete job-script sketch is shown after these tips):
```shell
#!/bin/bash
@@ -37,19 +36,9 @@ Before you begin, make sure you have checked the following:
pip install serverless-llm[worker]
pip install serverless-llm-store
```
- Importantly, we recommend [installing with pip](https://serverlessllm.github.io/docs/stable/getting_started/installation#installing-with-pip) if there is no CUDA driver on the node where you want to install. If you want to [install from source](https://serverlessllm.github.io/docs/stable/getting_started/installation#installing-from-source), please make sure the CUDA driver is available on that node. Here are some commands to check it:
```shell
module avail cuda      # if you can see some CUDA options, CUDA is available; then load the cuda module
module load cuda-12.x  # load a specific CUDA version
# or
nvidia-smi             # if you can see the GPU information, the CUDA driver is available
# or
which nvcc             # if you can see the CUDA compiler's path, the CUDA toolkit is available
```
However, we **strongly recommend that you read the documentation for the HPC you are using** to find out how to check if the CUDA driver is available.
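
As referenced in the disk-space tip above, here is a minimal end-to-end sketch of such an installation job script; the partition, node, and file names are illustrative, and the environment-activation line depends on your setup:

```shell
#!/bin/bash
#SBATCH --partition=compute        # illustrative partition name
#SBATCH --nodelist=JobNode01       # a job node with internet access and enough disk space
#SBATCH --job-name=sllm-install
#SBATCH --output=sllm_install.out

# Activate your Python environment first if you use conda or virtualenv, e.g.:
# source ~/venvs/sllm/bin/activate

pip install serverless-llm[worker]
pip install serverless-llm-store
```
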
### Command for Querying GPU Resource Information

Consult the cluster documentation/administrator, or run the following commands in the cluster to check GPU resource information and find a node with sufficient computing power (Compute Capability > 7.0; [check here whether your GPU qualifies](https://developer.nvidia.com/cuda-gpus#compute)).
```shell
sinfo -O partition,nodelist,gres
```
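
If you need per-node detail, `scontrol` can also show the GRES configured on a specific node (the node name below is illustrative):

```shell
scontrol show node JobNode01 | grep -iE "Gres|Partitions"
```
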
@@ -75,7 +64,7 @@ compute up 2 down infinite JobNode[16-17]
### Job Nodes Setup
**`srun` Node Selection**
Only one JobNode (with sufficient compute capability) is enough. This command requests a session on the specified node and provides an interactive shell; `--gres <DEVICE>:1` specifies the GPU device you will use, for example `--gres gpu:gtx_1060:1`.
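
The full `srun` command is not reproduced in this excerpt; a typical invocation, with an illustrative partition, node, and GPU device, looks like this:

```shell
srun --partition=compute --nodelist=JobNode01 --gres=gpu:gtx_1060:1 --pty bash -i
```
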
### Step 2: Install from source
First, please make sure the CUDA driver is available on the node. Here are some commands to check it:
```shell
nvidia-smi
which nvcc
```
If `nvidia-smi` lists GPU information but `which nvcc` produces no output, use the following commands to load `nvcc`. Here is an example where CUDA is located at `/opt/cuda-12.2.0`:
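
The exact commands are not shown in this excerpt; assuming CUDA is installed under `/opt/cuda-12.2.0`, adding its `bin` and `lib64` directories to your environment is one typical way to make `nvcc` visible:

```shell
export PATH=/opt/cuda-12.2.0/bin:$PATH
export LD_LIBRARY_PATH=/opt/cuda-12.2.0/lib64:$LD_LIBRARY_PATH
nvcc --version   # verify that the compiler is now on PATH
```
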
Then follow the [installation guide](./installation.md) to install from source.
### Step 3: Prepare multiple windows with `tmux`
Since `srun` provides a single interactive shell, you can use `tmux` to create multiple windows. Start a tmux session:
```shell
tmux
@@ -127,7 +129,7 @@ Once multiple windows are created, you can switch between them using:
`Ctrl + B` → `W` (List all windows and select)
`Ctrl + B` → [Number] (Switch to a specific window, e.g., Ctrl + B → 1)
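
For reference, new windows can be created interactively with `Ctrl + B` → `C`; from inside the session you can also create and name them from the shell (the names below are only a suggestion):

```shell
tmux rename-window head      # window 0: head node process
tmux new-window -n worker    # window 1: worker / store process
tmux new-window -n client    # window 2: sllm-cli commands and test requests
```
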
### Step 4: Run ServerlessLLM on the JobNode
First, find the ports that are already occupied. Then pick any free port to replace the placeholder `<PORT>` below, for example `6379`.
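
To see which ports are already in use on the node, a standard check is (either tool may be available, depending on the system):

```shell
ss -tuln               # or: netstat -tuln
ss -tuln | grep 6379   # no output means 6379 is currently free
```
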
Note that some SLURM systems are a bit slow, **so please be patient and wait for the system to produce output**.
@@ -184,7 +186,7 @@ Expected output:
{"id":"chatcmpl-9f812a40-6b96-4ef9-8584-0b8149892cb9","object":"chat.completion","created":1720021153,"model":"facebook/opt-1.3b","choices":[{"index":0,"message":{"role":"assistant","content":"system: You are a helpful assistant.\nuser: What is your name?\nsystem: I am a helpful assistant.\n"},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":16,"completion_tokens":26,"total_tokens":42}}
```
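
For reference, output like the above comes from an OpenAI-compatible chat-completion request; the sketch below uses `<API_HOST>` and `<API_PORT>` as placeholders for wherever the ServerlessLLM API server is listening in your deployment:

```shell
curl http://<API_HOST>:<API_PORT>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "facebook/opt-1.3b",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "What is your name?"}
        ]
      }'
```
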
### Step 5: Clean up
To delete a deployed model, use the following command:
```shell
sllm-cli delete facebook/opt-1.3b
@@ -265,7 +267,7 @@ Since the head node does not require a gpu, you can find a low-computing capacit
Finding available port on JobNode01
Available port: <avail_port>
```
Remember this `<avail_port>`; you will use it in Step 4.
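
The port-discovery script itself is not reproduced in this excerpt; one common way to ask the operating system for a free port (not necessarily what the original script does) is:

```shell
python -c 'import socket; s = socket.socket(); s.bind(("", 0)); print("Available port:", s.getsockname()[1]); s.close()'
```
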
### Step 2: Start the Worker Node & Store
We will start the worker node and the store in the same script, because the server loads the model weights onto the GPU and uses shared GPU memory to pass the pointer to the client. If you submit another script with `#SBATCH --gres=gpu:1`, it may be assigned a different GPU (that is, a different `CUDA_VISIBLE_DEVICES` setting), in which case the model weights cannot be passed between them.
@@ -275,7 +277,7 @@ We will start the worker node and store in the same script. Because the server l
```shell
#!/bin/sh
#SBATCH --partition=your_partition
#SBATCH --nodelist=JobNode02         # Note: JobNode02 should have sufficient compute capability
#SBATCH --gres=gpu:a6000:1 # Specify device on JobNode02
#SBATCH --job-name=sllm-worker-store
#SBATCH --output=sllm_worker.out
@@ -373,7 +375,7 @@ We will start the worker node and store in the same script. Because the server l