Commit 7901e49: Merge branch 'main' into add-es (2 parents: 8753cb3 + 5cb5061)

22 files changed: +1018 −159 lines

README.md (+143 −72)
<h4 align="center">
    <p>
        <a href="https://github.com/codefuse-ai/CodeFuse-ModelCache/blob/main/README_CN.md">中文</a> |
        <b>English</b>
    </p>
</h4>
</div>

## Contents

- [Contents](#contents)
- [News](#news)
- [Introduction](#introduction)
- [Architecture](#architecture)
- [Quick start](#quick-start)
- [Dependencies](#dependencies)
- [Start service](#start-service)
- [Start demo](#start-demo)
- [Start normal service](#start-normal-service)
- [Visit the service](#visit-the-service)
- [Write cache](#write-cache)
- [Query cache](#query-cache)
- [Clear cache](#clear-cache)
- [Function comparison](#function-comparison)
- [Features](#features)
- [Todo List](#todo-list)
- [Adapter](#adapter)
- [Embedding model\&inference](#embedding-modelinference)
- [Scalar Storage](#scalar-storage)
- [Vector Storage](#vector-storage)
- [Ranking](#ranking)
- [Service](#service)
- [Acknowledgements](#acknowledgements)
- [Contributing](#contributing)

## News

- 🔥🔥[2024.10.22] Added tasks for 1024 developer day.
- 🔥🔥[2024.04.09] Added Redis Search to store and retrieve embeddings in multi-tenant scenarios. This reduces the interaction time between the cache and vector databases to 10 ms.
- 🔥🔥[2023.12.10] Integrated LLM embedding frameworks such as 'llmEmb', 'ONNX', 'PaddleNLP', and 'FastText', along with the image embedding framework 'timm', to bolster embedding functionality.
- 🔥🔥[2023.11.20] Integrated local storage, such as SQLite and FAISS. This enables you to initiate quick and convenient tests.
- [2023.08.26] codefuse-ModelCache...
### Introduction

Codefuse-ModelCache is a semantic cache for large language models (LLMs). By caching pre-generated model results, it reduces response time for similar requests and improves user experience.

This project aims to optimize services by introducing a caching mechanism. It helps businesses and research institutions reduce the cost of inference deployment, improve model performance and efficiency, and provide scalable services for large models. Through open source, we aim to share and exchange technologies related to semantic caching for large models.

## Architecture

![modelcache modules](docs/modelcache_modules_20240409.png)

## Quick start

You can find the start scripts in `flask4modelcache.py` and `flask4modelcache_demo.py`.

- `flask4modelcache_demo.py`: A quick test service that embeds SQLite and FAISS. No database configuration required.
- `flask4modelcache.py`: The standard service that requires MySQL and Milvus configuration.

### Dependencies

- Python: V3.8 or above
- Package installation:

```shell
pip install -r requirements.txt
```

### Start service

#### Start demo

1. Download the embedding model bin file from [Hugging Face](https://huggingface.co/shibing624/text2vec-base-chinese/tree/main). Place it in the `model/text2vec-base-chinese` folder.
2. Start the backend service:

```shell
cd CodeFuse-ModelCache
```

```shell
python flask4modelcache_demo.py
```

#### Start normal service

Before you start the standard service, do these steps:

1. Install MySQL and import the SQL file from `reference_doc/create_table.sql`.
2. Install the vector database Milvus.
3. Configure database access in:
   - `modelcache/config/milvus_config.ini`
   - `modelcache/config/mysql_config.ini`
4. Download the embedding model bin file from [Hugging Face](https://huggingface.co/shibing624/text2vec-base-chinese/tree/main). Put it in `model/text2vec-base-chinese`.
5. Start the backend service:

```shell
python flask4modelcache.py
```

## Visit the service

The service provides three core RESTful API functionalities: Cache-Writing, Cache-Querying, and Cache-Clearing.

### Write cache

```python
import json
import requests

# ... (url, type, scope, and chat_info definitions are elided in this diff)
data = {'type': type, 'scope': scope, 'chat_info': chat_info}
headers = {"Content-Type": "application/json"}
res = requests.post(url, headers=headers, json=json.dumps(data))
```
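The fragment above omits the variable definitions, which fall outside this diff's hunks. As a rough, self-contained sketch, the payload might be assembled as follows; the URL, the "insert" type literal, the model name, and the chat content are all placeholders, not values from the source:

```python
import json

# Placeholder values; the original definitions are elided in the diff above.
url = "http://127.0.0.1:5000/modelcache"  # hypothetical endpoint
request_type = "insert"                   # hypothetical 'type' literal for writes
scope = {"model": "my-model"}             # tenant/model scope (placeholder)
chat_info = [{
    "query": [{"role": "user", "content": "hello"}],
    "answer": "Hi, how can I help you?",
}]

data = {"type": request_type, "scope": scope, "chat_info": chat_info}
payload = json.dumps(data)  # the snippet above posts a JSON-encoded string

print(json.loads(payload)["scope"]["model"])  # -> my-model
```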

### Query cache

```python
import json
import requests

# ... (url, type, scope, and query definitions are elided in this diff)
data = {'type': type, 'scope': scope, 'query': query}
headers = {"Content-Type": "application/json"}
res = requests.post(url, headers=headers, json=json.dumps(data))
```

### Clear cache

```python
import json
import requests

# ... (url, type, scope, and remove_type definitions are elided in this diff)
data = {'type': type, 'scope': scope, 'remove_type': remove_type}
headers = {"Content-Type": "application/json"}
res = requests.post(url, headers=headers, json=json.dumps(data))
```
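The three snippets differ only in the 'type' value and one payload field ('chat_info', 'query', or 'remove_type'). A small helper can factor out the shared shape; the 'type' literals and field values here are assumptions, since the diff elides the original definitions:

```python
import json

def build_body(req_type, model, **fields):
    """Build the JSON-string body shared by the write/query/clear snippets.

    The field names mirror the visible fragments; the req_type values and
    the scope layout are assumptions, as the diff elides the definitions.
    """
    return json.dumps({"type": req_type, "scope": {"model": model}, **fields})

# Usage sketch (endpoint URL is hypothetical):
#   requests.post("http://127.0.0.1:5000/modelcache",
#                 headers={"Content-Type": "application/json"},
#                 json=build_body("query", "my-model", query="hello"))
body = build_body("remove", "my-model", remove_type="truncate_by_model")  # assumed values
```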

## Function comparison

We've implemented several key updates to this repository. We resolved network issues with Hugging Face and improved inference speed by introducing local embedding inference. Because of limitations in SQLAlchemy, we redesigned our relational database interaction module for more flexible database operations. Since LLM products often need to serve multiple users and multiple models, we added multi-tenancy support to ModelCache. Finally, we made initial compatibility adjustments for system commands and multi-turn dialogues.

<table>
<tr>
<!-- ... comparison table rows elided in this diff ... -->
</tr>
</table>

## Features

In ModelCache, we incorporated the core principles of GPTCache. ModelCache has four modules: adapter, embedding, similarity, and data_manager.

- The adapter module orchestrates the business logic for various tasks and integrates the embedding, similarity, and data_manager modules.
- The embedding module converts text into semantic vector representations, transforming user queries into vector form.
- The rank module ranks and evaluates the similarity of recalled vectors.
- The data_manager module manages the databases.

To make ModelCache more suitable for industrial use, we made several improvements to its architecture and functionality:

- [x] Architectural adjustment (lightweight integration):
  - Embedded into LLM products in a Redis-like caching mode
  - Provides semantic caching without interfering with LLM calls, security audits, and other functions
  - Compatible with all LLM services
- [x] Multiple model loading:
  - Supports local embedding model loading, which resolves Hugging Face network connectivity issues
  - Supports loading embedding layers from various pre-trained models
- [x] Data isolation:
  - Environment isolation: Reads different database configurations based on the environment to isolate development, staging, and production.
  - Multi-tenant data isolation: Dynamically creates collections based on models, addressing data separation issues in multi-model/service scenarios within LLM products.
- [x] System instruction support: Uses a concatenation approach to resolve issues with system instructions in the prompt paradigm.
- [x] Long and short text differentiation: Long texts bring more challenges for similarity assessment, so we differentiate between long and short texts and allow separate threshold configurations for each.
- [x] Milvus performance optimization: Adjusted the Milvus consistency level to "Session" for better performance.
- [x] Data management:
  - One-click cache clearing for easy data management after model upgrades.
  - Recall of hit queries for subsequent data analysis and model iteration.
  - Asynchronous log write-back for data analysis and statistics.
  - Added model and data statistics fields to support feature expansion.
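The four-module split described above can be illustrated with a minimal in-memory sketch. All names here are illustrative, not ModelCache's actual API, and the toy histogram "embedding" stands in for a real model:

```python
import math

def embed(text):
    # Toy embedding: character-code histogram, a stand-in for a real model.
    vec = [0.0] * 8
    for ch in text:
        vec[ord(ch) % 8] += 1.0
    return vec

def rank(query_vec, candidate_vec):
    # Cosine similarity between the query vector and a recalled vector.
    dot = sum(a * b for a, b in zip(query_vec, candidate_vec))
    norm = (math.sqrt(sum(a * a for a in query_vec))
            * math.sqrt(sum(b * b for b in candidate_vec)))
    return dot / norm if norm else 0.0

class DataManager:
    # Stand-in for the scalar + vector stores.
    def __init__(self):
        self.store = []  # list of (vector, answer)

    def save(self, vec, answer):
        self.store.append((vec, answer))

    def search(self, vec):
        # Recall the most similar cached entry.
        return max(self.store, key=lambda item: rank(vec, item[0]), default=None)

def adapter_query(dm, text, threshold=0.99):
    # The adapter wires embedding, recall, and ranking together.
    vec = embed(text)
    hit = dm.search(vec)
    if hit and rank(vec, hit[0]) >= threshold:
        return hit[1]  # cache hit
    return None        # cache miss; the caller falls through to the LLM

dm = DataManager()
dm.save(embed("hello world"), "cached answer")
print(adapter_query(dm, "hello world"))  # -> cached answer
print(adapter_query(dm, "completely different query"))
```

The threshold parameter mirrors the separate long/short-text thresholds mentioned in the feature list: a stricter value yields fewer but more reliable cache hits.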
## Todo List

### Adapter

- [ ] Register adapter for Milvus: Based on the "model" parameter in the scope, initialize the corresponding collection and perform the load operation.

### Embedding model&inference

- [ ] Inference optimization: Optimize embedding inference speed and ensure compatibility with inference engines such as FasterTransformer, TurboTransformers, and ByteTransformer.
- [ ] Compatibility with Hugging Face models and ModelScope models, offering more methods for model loading.

### Scalar Storage

- [ ] Support MongoDB
- [ ] Support ElasticSearch

### Vector Storage

- [ ] Adapt FAISS storage to multimodal scenarios.

### Ranking

- [ ] Add a ranking model to refine the order of data after embedding recall.

### Service

- [ ] Support FastAPI.
- [ ] Add a visual interface to offer a more direct user experience.

## Acknowledgements

This project references the following open-source projects. We would like to express our gratitude to these projects and their developers for their contributions and research.

[GPTCache](https://github.com/zilliztech/GPTCache)

## Contributing

ModelCache is a captivating and invaluable project. Whether you are an experienced developer or a novice just starting out, your contributions are warmly welcomed. Your involvement, whether through raising issues, providing suggestions, writing code, or creating documentation and examples, will enhance the project's quality and make a significant contribution to the open-source community.
