---
sidebar_position: 1
---

# Getting Started

This guide demonstrates how to quickly set up a local ServerlessLLM cluster using Docker Compose on a single machine. We will initialize a minimal cluster, consisting of a head node and a single worker node. Then, we'll deploy a model using the `sllm-cli` and query the deployment through an OpenAI-compatible API.

:::note
We strongly recommend using Docker (Compose) to manage your ServerlessLLM cluster, whether you are using it for testing or development. However, if Docker is not a viable option for you, please refer to the [deploy from scratch guide](./deployment/single_machine.md).
:::

## Prerequisites

Before you begin, ensure you have the following installed and configured:

1. **Docker**: Installed on your system. You can download it from [here](https://docs.docker.com/get-docker/).
2. **ServerlessLLM CLI**: Install it with `pip install serverless-llm`.
3. **GPUs**: At least one NVIDIA GPU is required. If you have multiple GPUs, you can adjust the `docker-compose.yml` file accordingly.
4. **NVIDIA Container Toolkit**: This enables Docker to utilize NVIDIA GPUs. Follow the installation guide [here](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). A quick sanity check of this setup is sketched right after this list.
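
If you want to confirm the GPU and Docker setup before continuing, a minimal sanity check might look like the following (the container image used below is just a convenient example):

```bash
# Check that the NVIDIA driver can see the GPU(s) on the host
nvidia-smi

# Check that Docker containers can access the GPU(s) via the NVIDIA Container Toolkit
docker run --rm --gpus all ubuntu nvidia-smi

# Confirm the ServerlessLLM CLI is installed
pip show serverless-llm
```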

## Start the ServerlessLLM Cluster

We will use Docker Compose to simplify the ServerlessLLM setup process.

### Step 1: Download the Docker Compose File

Download the `docker-compose.yml` file from the ServerlessLLM repository:

```bash
# Create a directory for the ServerlessLLM Docker setup
mkdir serverless-llm-docker && cd serverless-llm-docker

# Download the docker-compose.yml file
curl -O https://raw.githubusercontent.com/ServerlessLLM/ServerlessLLM/main/examples/docker/docker-compose.yml

# Alternatively, you can use wget:
# wget https://raw.githubusercontent.com/ServerlessLLM/ServerlessLLM/main/examples/docker/docker-compose.yml
```

### Step 2: Configuration

Create a directory on your host machine to store models. Then, set the `MODEL_FOLDER` environment variable to point to this directory:

```bash
export MODEL_FOLDER=/path/to/your/models
```

Replace `/path/to/your/models` with the actual path where you intend to store the models. This directory will be mounted into the Docker containers.
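
For example, once `MODEL_FOLDER` is set you can create the directory up front so the containers have something to mount (the path itself is only a placeholder):

```bash
# Create the host directory that will hold downloaded model files
mkdir -p "$MODEL_FOLDER"
```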

### Step 3: Start the Services

Start the ServerlessLLM services using Docker Compose:

```bash
docker compose up -d
```

This command will start the Ray head node and a worker node as defined in the `docker-compose.yml` file.

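For a quick overview of what was started, `docker compose ps` lists the containers in this Compose project and their status (the exact service names depend on the `docker-compose.yml` you downloaded):

```bash
# List the head and worker containers and confirm they are running
docker compose ps
```
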
Verify that the services are ready:

```bash
docker logs sllm_head
```

Ensure the services are ready before proceeding. You should see output similar to the following:

```plaintext
...
(SllmController pid=1435) INFO 05-26 15:40:49 controller.py:68] Starting scheduler
INFO: Started server process [1]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8343 (Press CTRL+C to quit)
(FcfsScheduler pid=1604) INFO 05-26 15:40:49 fcfs_scheduler.py:54] Starting FCFS scheduler
(FcfsScheduler pid=1604) INFO 05-26 15:40:49 fcfs_scheduler.py:111] Starting control loop
```
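
If you would rather follow the logs of every service in the project (head and worker together), Docker Compose can stream them directly:

```bash
# Stream logs from all services defined in the compose file; press CTRL+C to stop
docker compose logs -f
```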

## Deploy a Model Using sllm-cli

Set the `LLM_SERVER_URL` environment variable:

```bash
export LLM_SERVER_URL=http://127.0.0.1:8343
```

Deploy a model to the ServerlessLLM cluster using the `sllm-cli`:

```bash
sllm-cli deploy --model facebook/opt-1.3b
```

> Note: This command will take some time to download the model from the Hugging Face Model Hub.
> You can use any model from the [Hugging Face Model Hub](https://huggingface.co/models) by specifying its name in the `--model` argument.

Expected output:

```plaintext
INFO 08-01 07:38:12 deploy.py:36] Deploying model facebook/opt-1.3b with default configuration.
INFO 08-01 07:39:00 deploy.py:49] Model registered successfully.
```
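
The same command works for any other model on the Hub; for example, to deploy a larger OPT checkpoint instead (shown purely as an illustration, the rest of this guide keeps using `facebook/opt-1.3b`):

```bash
# Deploy a different Hugging Face model by passing its Hub ID to --model
sllm-cli deploy --model facebook/opt-2.7b
```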

## Query the Model

You can now query the model using any OpenAI API client. For example, use the following `curl` command:

```bash
curl $LLM_SERVER_URL/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "facebook/opt-1.3b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is your name?"}
    ]
  }'
```

Expected output:

```plaintext
{"id":"chatcmpl-8b4773e9-a98b-41db-8163-018ed3dc65e2","object":"chat.completion","created":1720183759,"model":"facebook/opt-1.3b","choices":[{"index":0,"message":{"role":"assistant","content":"system: You are a helpful assistant.\nuser: What is your name?\nsystem: I am a helpful assistant.\n"},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":16,"completion_tokens":26,"total_tokens":42}}
```
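
If you only want the assistant's reply rather than the full JSON response, you can pipe the same request through `jq` (assuming `jq` is installed; the field path follows the OpenAI chat completion schema shown above):

```bash
# Extract just the generated message content from the response
curl -s $LLM_SERVER_URL/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "facebook/opt-1.3b",
    "messages": [{"role": "user", "content": "What is your name?"}]
  }' | jq -r '.choices[0].message.content'
```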

## Clean Up

To delete a deployed model, execute the following command:

```bash
sllm-cli delete facebook/opt-1.3b
```

This command removes the specified model from the ServerlessLLM server.

To stop the ServerlessLLM services, use the following command:

```bash
docker compose down
```
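
If you also want to reclaim the disk space used by downloaded checkpoints, you can clear the model folder on the host. Double-check the path first, as this deletes the files permanently:

```bash
# Remove all downloaded model files from the host model folder (irreversible)
rm -rf "$MODEL_FOLDER"/*
```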