|中文版本|
Tair KVCache is Alibaba Cloud's high-performance KVCache system designed for Large Language Model (LLM) inference scenarios. Through technologies such as distributed memory pooling and dynamic multi-level caching, it achieves acceleration and efficiency improvement while reducing resource costs. Currently, the global KVCache management system Tair KVCache Manager and the LLM inference simulation system Tair KVCache HiSim have been open-sourced.
Tair KVCache Manager is one of the core components of Tair KVCache, designed to provide unified KVCache metadata management services for Large Language Model (LLM) inference scenarios.
- Tair KVCache Manager is deployed in a centralized mode, responsible for global metadata management of KVCache, providing services such as KVCache queries and storage capacity management.
- Tair KVCache Manager Client/Connector is responsible for interfacing with inference engines, implementing metadata queries and KVCache data transmission.
Tair KVCache Manager mainly consists of the following components:
- Access Layer (Server): Provides HTTP and gRPC services
- Cache Logic (CacheManager): Responsible for implementing external interfaces and core business logic
- Provides multiple matching logics: prefix matching, sliding window matching, KV matching, etc.
- Implements two-phase write mechanism: obtaining write addresses + notifying after write completion. Ensures data reliability.
- Storage backend selection: Dynamically selects storage backends based on metrics such as storage backend availability.
- Storage Management (DataStorage): Responsible for managing multiple storage backends
- Compatible with multiple storage systems: Encapsulates unified interfaces and data location descriptions for heterogeneous storage, supporting systems like HF3FS, Mooncake, NFS, etc.
- Storage system status management: Real-time monitoring of storage backend availability and storage water levels for use by CacheManager.
- Index Management (MetaIndex)
- Implements metadata persistence based on external KV storage systems, ensuring metadata reliability during KVCM failures.
- Unified control of metadata query and update operations, supports batch processing to improve performance, while ensuring update atomicity through mechanisms like sharded locks.
- Capacity Management (Reclaimer & Executor)
- Flexible control of storage capacity usage: Supports multi-dimensional capacity control such as Instance Group
- Controls backend storage water levels: Prevents storage backend capacity from exceeding limits
- KVCache eviction: Evicts KVCache data based on Quota and water levels to control storage capacity water levels.
- Background thread pool implements asynchronous deletion: Deletion does not block foreground requests, deletion performance is scalable.
- Cache Simulation and Optimization (Optimizer):
- Replays KVCache access traces, efficiently simulates KVCache access behavior, analyzes key metrics such as KVCache hit rate and capacity consumption.
- Based on simulation results, guides optimization of parameters like capacity to improve overall ROI.
- For more information, refer to KVCacheManager Optimizer Usage Guide
If you want to learn more about the detailed design of Manager, refer to: Alibaba Cloud Tair KVCache Manager: Architecture Design and Implementation of Enterprise-level Global KVCache Management Service
Tair KVCache Manager Client/Connector
Uses a unified transmission library to support KVCache transmission for multiple inference engines and storage backends. Currently supports engines such as vLLM, SGLang, RTP-LLM, TRT-LLM, etc.
HiSim is a high-performance CPU-based simulation system for LLM inference. It enables fast, low-cost, and high-fidelity prediction of key performance metrics (such as TTFT, TPOT, and throughput) across different models, target hardware, inference engines, and configurations—by replaying real-world inference workload traces without requiring actual GPU resources. Currently supports SGLang v0.5.6.post2 with Qwen3 Dense series models on H20 GPUs, achieving prediction errors below 5%.
- If you have technical questions or feature requirements related to this project, feel free to submit issues.
- If you have commercial cloud service product consultation needs for KVCache, please refer to Alibaba Cloud Tair KVCache Product Page and Alibaba Cloud Tair KVCache Product Documentation to contact us.

