AlignedServe is a prototype service framework built on top of DistServe. All plugins provided by the framework are located in the alignedserve folder.
It uses SwiftTransformer, a high-performance C++ Transformer inference library, as the execution backend. SwiftTransformer supports features such as model/pipeline parallelism, FlashAttention, continuous batching, and PagedAttention.
# clone the project
# set up the distserve conda environment
conda env create -f environment.yml && conda activate alignedserve
# clone and build the SwiftTransformer library
git clone https://github.com/LLMServe/SwiftTransformer.git && cd SwiftTransformer && git submodule update --init --recursive
cmake -B build && cmake --build build -j$(nproc)
cd ..
# install distserve
pip install -e .

AlignedServe requires at least two GPUs to play with. We provide an inference example in examples/serving_example.py.
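Before running the example, it may help to confirm that the machine exposes at least two GPUs. The following is a minimal sketch, not part of AlignedServe: `count_gpus` is a hypothetical helper, and it assumes NVIDIA GPUs with `nvidia-smi` available on the PATH (it reports 0 otherwise).

```shell
# Hypothetical helper (not shipped with AlignedServe): count visible NVIDIA GPUs.
# Prints 0 when nvidia-smi is missing or reports no devices.
count_gpus() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        # `nvidia-smi -L` prints one line per device, each starting with "GPU ".
        # grep -c exits non-zero on zero matches, so `|| true` keeps the status clean.
        nvidia-smi -L 2>/dev/null | grep -c '^GPU ' || true
    else
        echo 0
    fi
}

gpu_count=$(count_gpus)
if [ "$gpu_count" -ge 2 ]; then
    echo "Found $gpu_count GPUs: enough for the serving example."
else
    echo "Found $gpu_count GPU(s): at least two are required."
fi
```

If the check reports fewer than two GPUs, also check whether `CUDA_VISIBLE_DEVICES` is hiding devices from your processes, since `nvidia-smi -L` lists all devices regardless of that variable.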