Tair KVCache

Tair KVCache is Alibaba Cloud's high-performance KVCache system designed for Large Language Model (LLM) inference scenarios. Through technologies such as distributed memory pooling and dynamic multi-level caching, it accelerates inference and improves efficiency while reducing resource costs. Two components are currently open source: Tair KVCache Manager, a global KVCache management system, and Tair KVCache HiSim, an LLM inference simulation system.

Tair KVCache Manager

Tair KVCache Manager is one of the core components of Tair KVCache, designed to provide unified KVCache metadata management services for Large Language Model (LLM) inference scenarios.

System Architecture

Tair KVCache Manager is deployed in centralized mode. It is responsible for global KVCache metadata management and provides services such as KVCache queries and storage capacity management.
The Tair KVCache Manager Client/Connector interfaces with inference engines and handles metadata queries and KVCache data transmission.

Tair KVCache Manager mainly consists of the following components:

Access Layer (Server): Provides HTTP and gRPC services
Cache Logic (CacheManager): Responsible for implementing external interfaces and core business logic
- Provides multiple matching strategies, including prefix matching, sliding window matching, and KV matching.
- Implements a two-phase write mechanism to ensure data reliability: a write first obtains write addresses, then sends a notification after the write completes.
- Storage backend selection: dynamically selects a storage backend based on metrics such as backend availability.
Storage Management (DataStorage): Responsible for managing multiple storage backends
- Compatible with multiple storage systems: Encapsulates unified interfaces and data location descriptions for heterogeneous storage, supporting systems like HF3FS, Mooncake, NFS, etc.
- Storage system status management: Real-time monitoring of storage backend availability and storage water levels for use by CacheManager.
Index Management (MetaIndex)
- Implements metadata persistence based on external KV storage systems, ensuring metadata reliability if Tair KVCache Manager (KVCM) fails.
- Provides unified control of metadata query and update operations, supports batch processing to improve performance, and ensures update atomicity through mechanisms such as sharded locks.
Capacity Management (Reclaimer & Executor)
- Flexible control of storage capacity usage: supports capacity control along multiple dimensions, such as Instance Group.
- Controls backend storage water levels: prevents storage backend usage from exceeding capacity limits.
- KVCache eviction: evicts KVCache data based on Quota and water levels to keep storage usage under control.
- Asynchronous deletion through a background thread pool: deletion does not block foreground requests, and deletion performance is scalable.
Cache Simulation and Optimization (Optimizer)
- Replays KVCache access traces to efficiently simulate KVCache access behavior and analyze key metrics such as KVCache hit rate and capacity consumption.
- Uses simulation results to guide the tuning of parameters such as capacity, improving overall ROI.
- For more information, refer to the KVCacheManager Optimizer Usage Guide.
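
The trace replay idea behind the Optimizer can be illustrated with a minimal sketch. It assumes a trace is simply a sequence of block keys and that the cache evicts in LRU order; both are assumptions made for illustration, and the function name `replay_trace` is hypothetical. The real trace format and eviction policy are described in the Optimizer Usage Guide.

```python
from collections import OrderedDict


def replay_trace(trace, capacity_blocks):
    """Replay block-key accesses against an LRU cache and return the hit rate.

    `trace` is an iterable of hashable block keys (an assumed format);
    `capacity_blocks` is the maximum number of blocks kept in the cache.
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    total = 0
    for key in trace:
        total += 1
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = None
            if len(cache) > capacity_blocks:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / total if total else 0.0


if __name__ == "__main__":
    trace = ["a", "b", "c", "a", "b", "d", "a", "c"]
    # Sweep capacity to see how hit rate responds to cache size.
    for capacity in (1, 2, 3, 4):
        print(capacity, replay_trace(trace, capacity))
```

Sweeping the capacity like this is the simplest form of the capacity-versus-hit-rate analysis the Optimizer is described as providing.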

To learn more about the detailed design of the Manager, refer to: Alibaba Cloud Tair KVCache Manager: Architecture Design and Implementation of Enterprise-level Global KVCache Management Service
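
As a rough illustration of how the two-phase write and prefix matching described under Cache Logic fit together, here is a small in-memory sketch. The class `ToyCacheManager`, its method names, and the `mem://pool/...` location strings are all hypothetical and are not the KVCM API. The point is only that a block becomes visible to prefix matching after the second phase reports completion.

```python
class ToyCacheManager:
    """In-memory stand-in for a metadata service (hypothetical, not the KVCM API)."""

    def __init__(self):
        self._locations = {}     # block key -> storage location string
        self._committed = set()  # block keys whose data is fully written

    def start_write(self, keys):
        """Phase 1: reserve a write location for each block not yet known."""
        reserved = {}
        for key in keys:
            if key not in self._locations:
                self._locations[key] = f"mem://pool/{len(self._locations)}"
                reserved[key] = self._locations[key]
        return reserved

    def finish_write(self, keys):
        """Phase 2: mark blocks readable once the writer reports completion."""
        for key in keys:
            if key in self._locations:
                self._committed.add(key)

    def match_prefix(self, keys):
        """Return locations for the longest leading run of committed blocks."""
        matched = []
        for key in keys:
            if key not in self._committed:
                break  # prefix matching stops at the first missing block
            matched.append(self._locations[key])
        return matched


if __name__ == "__main__":
    manager = ToyCacheManager()
    blocks = ["h0", "h1", "h2"]
    manager.start_write(blocks)
    print(manager.match_prefix(blocks))  # no block has been committed yet
    manager.finish_write(["h0", "h1"])
    print(manager.match_prefix(blocks))  # only the committed leading blocks
```

Separating reservation from commit means a reader never receives the location of a block whose data is still being written.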

Tair KVCache Manager Client/Connector

Uses a unified transmission library to support KVCache transmission across multiple inference engines and storage backends. Currently supported engines include vLLM, SGLang, RTP-LLM, and TRT-LLM.

Tair KVCache HiSim

HiSim is a high-performance, CPU-based simulation system for LLM inference. By replaying real-world inference workload traces, it provides fast, low-cost, and high-fidelity prediction of key performance metrics (such as TTFT, TPOT, and throughput) across different models, target hardware, inference engines, and configurations, without requiring actual GPU resources. It currently supports SGLang v0.5.6.post2 with Qwen3 Dense series models on H20 GPUs, achieving prediction errors below 5%.
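
For readers unfamiliar with the metrics named above, the sketch below shows how TTFT (time to first token) and TPOT (time per output token) are commonly computed from per-request timestamps, and how a prediction can be compared with a measurement. The record layout and the function names `request_metrics` and `relative_error` are assumptions for illustration; this is not HiSim's interface or trace format.

```python
def request_metrics(arrival, token_times):
    """Compute TTFT and TPOT (in seconds) for one request.

    `arrival` is the request arrival time and `token_times` holds the
    emission time of each output token, all in seconds (assumed layout).
    """
    if not token_times:
        raise ValueError("request produced no output tokens")
    ttft = token_times[0] - arrival
    if len(token_times) > 1:
        # Average gap between consecutive output tokens after the first.
        tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)
    else:
        tpot = 0.0  # a single token has no inter-token interval
    return ttft, tpot


def relative_error(predicted, measured):
    """Relative prediction error, as a fraction of the measured value."""
    return abs(predicted - measured) / measured


if __name__ == "__main__":
    ttft, tpot = request_metrics(0.0, [0.5, 0.75, 1.0, 1.25])
    print(ttft, tpot)
    # Compare a hypothetical simulated TTFT against the measured one.
    print(relative_error(0.52, ttft) < 0.05)
```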

Contact Us

If you have technical questions or feature requests related to this project, feel free to submit an issue.
For inquiries about the commercial Tair KVCache cloud service, refer to the Alibaba Cloud Tair KVCache Product Page and the Alibaba Cloud Tair KVCache Product Documentation to contact us.
