
Commit f475615

docs: add missing file

1 parent 718629b

File tree

1 file changed: +136 -0 lines changed

docs/stable/getting_started.md

Lines changed: 136 additions & 0 deletions
---
sidebar_position: 1
---

# Getting Started

This guide demonstrates how to quickly set up a local ServerlessLLM cluster using Docker Compose on a single machine. We will initialize a minimal cluster, consisting of a head node and a single worker node. Then, we'll deploy a model using the `sllm-cli` and query the deployment through an OpenAI-compatible API.

:::note
We strongly recommend using Docker (Compose) to manage your ServerlessLLM cluster, whether you are using ServerlessLLM for testing or development. However, if Docker is not a viable option for you, please refer to the [deploy from scratch guide](./deployment/single_machine.md).
:::

## Prerequisites

Before you begin, ensure you have the following installed and configured:

1. **Docker**: Installed on your system. You can download it from [here](https://docs.docker.com/get-docker/).
2. **ServerlessLLM CLI**: Installed on your system. Install it using `pip install serverless-llm`.
3. **GPUs**: At least one NVIDIA GPU is required. If you have multiple GPUs, you can adjust the `docker-compose.yml` file accordingly.
4. **NVIDIA Docker Toolkit**: This enables Docker to utilize NVIDIA GPUs. Follow the installation guide [here](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html).
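
Before moving on, you can optionally sanity-check these prerequisites from a terminal. The commands below are a minimal sketch using standard tooling; the CUDA image tag is only an illustrative example, and the output will vary between systems.

```bash
# 1. Docker is installed and the daemon is reachable.
docker --version
docker info > /dev/null && echo "Docker daemon is running"

# 2. The ServerlessLLM CLI is installed (from `pip install serverless-llm`).
pip show serverless-llm

# 3. At least one NVIDIA GPU is visible to the driver.
nvidia-smi -L

# 4. Docker can access the GPUs through the NVIDIA Container Toolkit
#    (the CUDA image tag below is just an example).
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```
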
## Start the ServerlessLLM Cluster

We will use Docker Compose to simplify the ServerlessLLM setup process.

### Step 1: Download the Docker Compose File

Download the `docker-compose.yml` file from the ServerlessLLM repository:

```bash
# Create a directory for the ServerlessLLM Docker setup
mkdir serverless-llm-docker && cd serverless-llm-docker

# Download the docker-compose.yml file
curl -O https://raw.githubusercontent.com/ServerlessLLM/ServerlessLLM/main/examples/docker/docker-compose.yml

# Alternatively, you can use wget:
# wget https://raw.githubusercontent.com/ServerlessLLM/ServerlessLLM/main/examples/docker/docker-compose.yml
```

### Step 2: Configuration

Create a directory on your host machine to store models. Then, set the `MODEL_FOLDER` environment variable to point to this directory:

```bash
export MODEL_FOLDER=/path/to/your/models
```

Replace `/path/to/your/models` with the actual path where you intend to store the models. This directory will be mounted into the Docker containers.
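
For example, a minimal setup might keep models under your home directory; the path below is only a placeholder, so substitute your own:

```bash
# Example only: choose any host path, then make sure it exists.
export MODEL_FOLDER=$HOME/sllm-models
mkdir -p "$MODEL_FOLDER"
```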

### Step 3: Start the Services

Start the ServerlessLLM services using Docker Compose:

```bash
docker compose up -d
```

This command will start the Ray head node and a worker node as defined in the `docker-compose.yml` file.

Verify that the services are ready:

```bash
docker logs sllm_head
```

Ensure the services are ready before proceeding. You should see output similar to the following:

```bash
...
(SllmController pid=1435) INFO 05-26 15:40:49 controller.py:68] Starting scheduler
INFO: Started server process [1]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8343 (Press CTRL+C to quit)
(FcfsScheduler pid=1604) INFO 05-26 15:40:49 fcfs_scheduler.py:54] Starting FCFS scheduler
(FcfsScheduler pid=1604) INFO 05-26 15:40:49 fcfs_scheduler.py:111] Starting control loop
```
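
If you are scripting this setup, one way to wait for readiness is to poll the port shown in the log above (8343 in this configuration) rather than reading logs by hand. This is just a convenience sketch and assumes `curl` is available on the host:

```bash
# Block until the head node's API server accepts HTTP connections on port 8343.
until curl -s -o /dev/null http://127.0.0.1:8343; do
  echo "Waiting for the ServerlessLLM head node..."
  sleep 5
done
echo "ServerlessLLM head node is up."
```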

## Deploy a Model Using sllm-cli

Set the `LLM_SERVER_URL` environment variable:

```bash
export LLM_SERVER_URL=http://127.0.0.1:8343
```

Deploy a model to the ServerlessLLM cluster using the `sllm-cli`:

```bash
sllm-cli deploy --model facebook/opt-1.3b
```
> Note: This command will take some time to download the model from the Hugging Face Model Hub.
> You can use any model from the [Hugging Face Model Hub](https://huggingface.co/models) by specifying its name in the `--model` argument.

Expected output:

```plaintext
INFO 08-01 07:38:12 deploy.py:36] Deploying model facebook/opt-1.3b with default configuration.
INFO 08-01 07:39:00 deploy.py:49] Model registered successfully.
```
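
To confirm the registration programmatically, and assuming the server exposes the standard OpenAI-compatible `/v1/models` listing route (an assumption on our part, not something this guide verifies), a quick check could look like this:

```bash
# Assumed endpoint: list the models the server currently knows about.
curl $LLM_SERVER_URL/v1/models
```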

## Query the Model

You can now query the model using any OpenAI API client. For example, use the following `curl` command:

```bash
curl $LLM_SERVER_URL/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "facebook/opt-1.3b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is your name?"}
    ]
  }'
```

Expected output:

```plaintext
{"id":"chatcmpl-8b4773e9-a98b-41db-8163-018ed3dc65e2","object":"chat.completion","created":1720183759,"model":"facebook/opt-1.3b","choices":[{"index":0,"message":{"role":"assistant","content":"system: You are a helpful assistant.\nuser: What is your name?\nsystem: I am a helpful assistant.\n"},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":16,"completion_tokens":26,"total_tokens":42}}%
```
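
The raw JSON response is verbose. If you have `jq` installed (an optional extra, not required by this guide), you can extract just the assistant's reply:

```bash
# Send a request and print only the assistant message content.
curl -s $LLM_SERVER_URL/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "facebook/opt-1.3b",
    "messages": [
      {"role": "user", "content": "What is your name?"}
    ]
  }' | jq -r '.choices[0].message.content'
```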

## Clean Up

To delete a deployed model, execute the following command:

```bash
sllm-cli delete facebook/opt-1.3b
```

This command removes the specified model from the ServerlessLLM server.

To stop the ServerlessLLM services, use the following command:

```bash
docker compose down
```
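
To double-check that the cluster is fully stopped, standard Docker Compose tooling works here; for example:

```bash
# After `docker compose down`, this should list no running ServerlessLLM services.
docker compose ps
```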
