CacheBlend (Under Construction):

This is the code repo for CacheBlend: Fast Large Language Model Serving with Cached Knowledge Fusion. The current implementation is based on vLLM.

The newest updates will always be at LMCache. Stay tuned!

Installation

Python >= 3.9 and CUDA >= 12.1 are required. An NVIDIA GPU with >= 40 GB of memory is recommended. To install the CacheBlend dependencies:

git clone git@github.com:YaoJiayi/CacheBlend.git
cd CacheBlend/vllm_blend
pip install -e .
cd ..
pip install -r requirements.txt
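
Before installing, it can help to confirm that the machine meets the requirements listed above. The following is a minimal sketch, not part of the repository: the names check_environment and parse_version are hypothetical, and the detection step assumes PyTorch is already importable (it is skipped otherwise).

```python
import sys

# Thresholds taken from the requirements stated in this README.
MIN_PYTHON = (3, 9)
MIN_CUDA = (12, 1)
MIN_GPU_MEMORY_GB = 40  # recommended, not strictly required


def parse_version(text):
    """Turn a version string such as '12.1' into a tuple of ints, e.g. (12, 1)."""
    return tuple(int(part) for part in text.split(".")[:2])


def check_environment(python_version, cuda_version, gpu_memory_gb):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python >= 3.9 is required")
    if cuda_version is None or parse_version(cuda_version) < MIN_CUDA:
        problems.append("CUDA >= 12.1 is required")
    if gpu_memory_gb is None or gpu_memory_gb < MIN_GPU_MEMORY_GB:
        problems.append("a GPU with >= 40 GB memory is recommended")
    return problems


if __name__ == "__main__":
    cuda_version, gpu_memory_gb = None, None
    try:
        import torch  # assumption: PyTorch is installed in this environment

        cuda_version = torch.version.cuda  # None for CPU-only builds
        if torch.cuda.is_available():
            total_bytes = torch.cuda.get_device_properties(0).total_memory
            gpu_memory_gb = total_bytes / 1e9  # decimal GB, as GPUs are marketed
    except ImportError:
        pass  # without PyTorch, the CUDA and GPU checks report as unmet
    for problem in check_environment(sys.version_info, cuda_version, gpu_memory_gb):
        print("WARNING:", problem)
```

Saved under any file name and run with python, the script prints one warning per unmet requirement and prints nothing when all checks pass.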

Example run

Run LLM inference with CacheBlend

python example/blend.py

Run Musique dataset

Compare LLM inference with CacheBlend and normal prefill

python example/blend_musique.py

To run a dataset other than Musique, replace musique in the script name above with samsum or wikimqa (for example, python example/blend_samsum.py).
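
To run all three comparisons in one go, the naming pattern above can be wrapped in a small driver. This is a sketch under assumptions: build_command and run_all are hypothetical helpers, the script paths are derived purely from the example/blend_<dataset>.py pattern described above, and the driver is expected to be started from the repository root.

```python
import subprocess
import sys

# Dataset names mentioned in this README.
DATASETS = ("musique", "samsum", "wikimqa")


def build_command(dataset):
    """Build the command that runs the comparison script for one dataset."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    # Assumption: each script is named example/blend_<dataset>.py.
    return [sys.executable, f"example/blend_{dataset}.py"]


def run_all(dry_run=True):
    """Print each command and, unless dry_run is set, execute it in order."""
    commands = [build_command(name) for name in DATASETS]
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True stops at the first script that exits with an error.
            subprocess.run(command, check=True)
    return commands
```

Calling run_all() only prints the commands; run_all(dry_run=False) actually executes them one after another.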
