```python
from warp_cache import cache

@cache()
def expensive(x, y):
    # ... slow computation ...
    return x + y

expensive(1, 2)  # computes and caches
expensive(1, 2)  # returns cached result
```

Arguments must be hashable. Like `functools.lru_cache`, warp_cache uses
`hash()` to build cache keys. Passing unhashable types raises `TypeError`:
```python
@cache()
def process(data):
    # len() works for every hashable example below (sum() would fail on a str)
    return len(data)

process((1, 2, 3))  # ok - tuples are hashable
process("hello")    # ok - strings are hashable
process([1, 2, 3])  # TypeError - lists are not hashable
process({"a": 1})   # TypeError - dicts are not hashable
```

If you need to cache a function that takes unhashable arguments, convert them
to hashable equivalents before passing them (e.g. `tuple(my_list)`,
`tuple(sorted(my_dict.items()))`).
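The conversion described above can be wrapped in a small helper. The sketch below is only an illustration: `freeze` is a hypothetical helper, not part of warp_cache, and `functools.lru_cache` stands in for `@cache()` so the snippet runs without warp_cache installed (both require hashable arguments).

```python
from functools import lru_cache


def freeze(value):
    """Hypothetical helper: recursively turn lists and dicts into hashable tuples."""
    if isinstance(value, dict):
        # Sort by key so that insertion order does not change the cache key.
        return tuple(sorted((k, freeze(v)) for k, v in value.items()))
    if isinstance(value, (list, tuple)):
        return tuple(freeze(v) for v in value)
    return value


@lru_cache(maxsize=128)  # stand-in for warp_cache's @cache()
def total(frozen_items):
    # frozen_items is a tuple of (key, value) pairs
    return sum(v for _, v in frozen_items)


prices = {"b": 2, "a": 1}
result = total(freeze(prices))             # first call: computed
result_again = total(freeze({"a": 1, "b": 2}))  # same key after sorting: cached
```

Sorting the dict items means two dicts with the same contents produce the same key regardless of insertion order.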
warp_cache uses SIEVE eviction - an algorithm that typically achieves better hit rates than LRU at O(1) cost per access. There is no strategy parameter; SIEVE is used for both backends.
SIEVE works by maintaining a `visited` bit on each cache entry:

- On cache hit: the entry's `visited` bit is set to 1 (protecting it from eviction)
- On eviction: a rotating "hand" scans the cache. Entries with `visited=1` get a second chance (bit cleared to 0, hand advances). The first entry found with `visited=0` is evicted.

Frequently accessed entries stay protected, while entries that were cached but never re-accessed get evicted first. This is similar to LRU, but it handles sequential scans better and costs less per hit (no list reordering).
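To make the hand-and-bit mechanics concrete, here is a minimal pure-Python sketch of SIEVE. It is an illustration of the algorithm as described above, not warp_cache's Rust implementation; the class and attribute names (`SieveCache`, `nodes`, `hand`) are assumptions made for this example.

```python
class _Node:
    __slots__ = ("key", "value", "visited", "prev", "next")

    def __init__(self, key, value):
        self.key, self.value = key, value
        self.visited = False
        self.prev = None  # neighbour toward the head (newer entries)
        self.next = None  # neighbour toward the tail (older entries)


class SieveCache:
    """Illustrative SIEVE cache; max_size must be at least 1."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.nodes = {}
        self.head = self.tail = self.hand = None

    def get(self, key, default=None):
        node = self.nodes.get(key)
        if node is None:
            return default
        node.visited = True  # a hit only flips a bit, nothing is reordered
        return node.value

    def put(self, key, value):
        node = self.nodes.get(key)
        if node is not None:
            node.value = value
            return
        if len(self.nodes) >= self.max_size:
            self._evict()
        node = _Node(key, value)
        node.next = self.head  # new entries are inserted at the head
        if self.head is not None:
            self.head.prev = node
        self.head = node
        if self.tail is None:
            self.tail = node
        self.nodes[key] = node

    def _evict(self):
        node = self.hand or self.tail  # the hand starts at the oldest entry
        while node.visited:
            node.visited = False            # second chance: clear the bit
            node = node.prev or self.tail   # advance toward head, wrap to tail
        self.hand = node.prev  # the next scan resumes here (None = restart at tail)
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        else:
            self.tail = node.prev
        del self.nodes[node.key]


cache = SieveCache(max_size=3)
for k in "abc":
    cache.put(k, k.upper())
cache.get("a")       # marks "a" as visited
cache.put("d", "D")  # hand skips "a" (second chance) and evicts "b"
```

In this run the oldest entry, `"a"`, survives because it was re-accessed, while `"b"` (cached but never read again) is the first entry the hand finds with its bit cleared.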
Async functions are detected at decoration time - no special syntax needed.
Cache lookups still go through the Rust path; only cache misses await the
wrapped coroutine.
```python
import asyncio
from warp_cache import cache

@cache(max_size=256)
async def fetch_user(user_id: int) -> dict:
    # ... slow I/O ...
    return {"id": user_id}

async def main():
    user = await fetch_user(42)  # miss - awaits the coroutine
    user = await fetch_user(42)  # hit - returns cached result instantly

asyncio.run(main())
```

Pass `ttl` to make entries expire after a fixed number of seconds:

```python
@cache(max_size=128, ttl=60.0)  # entries expire after 60 seconds
def get_config(name):
    ...
```

The `Backend` enum selects where cached data is stored. `Backend` is an `IntEnum`, but the decorator also accepts the strings `"memory"` and `"shared"` for convenience.
```python
from warp_cache import cache, Backend

@cache(max_size=256, backend=Backend.MEMORY)  # enum
def by_enum(x): ...  # placeholder function

@cache(max_size=256, backend="memory")  # equivalent string
def by_string(x): ...  # placeholder function
```

| Backend | Value | Storage | Use case |
|---|---|---|---|
| `Backend.MEMORY` | `0` | In-process (default) | Single-process applications |
| `Backend.SHARED` | `1` | Memory-mapped file | Cross-process sharing via mmap |
The memory backend keeps cached data in the process heap. Keys are stored as Python objects directly (no serialization), and lookups go through a single Rust `__call__`: hash, lookup, equality check, and return all happen in one FFI crossing with no copying.

Thread safety uses a sharded `hashbrown::HashMap` with GIL-conditional locking: under GIL-enabled Python, `GilCell` has no overhead; under free-threaded Python, a per-shard `parking_lot::RwLock` allows parallel reads. A write lock is taken only on cache misses, for SIEVE eviction.
```python
@cache(max_size=256)  # backend="memory" is the default
def compute(x):
    return x ** 2
```

Use this backend when all callers live in the same process (web server threads, thread pools, async tasks, etc.).
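The sharded locking idea used by the memory backend can be illustrated in plain Python. This is a rough sketch only: `ShardedDict` is a hypothetical class, and because the standard library has no reader-writer lock, each shard uses a plain `threading.Lock` instead of the `RwLock` that warp_cache uses in Rust.

```python
import threading


class ShardedDict:
    """Illustrative sharded map: each shard has its own lock."""

    def __init__(self, num_shards=8):
        self._shards = [({}, threading.Lock()) for _ in range(num_shards)]

    def _shard(self, key):
        # Keys with different hashes usually land in different shards,
        # so threads rarely contend for the same lock.
        return self._shards[hash(key) % len(self._shards)]

    def get(self, key, default=None):
        data, lock = self._shard(key)
        with lock:
            return data.get(key, default)

    def set(self, key, value):
        data, lock = self._shard(key)
        with lock:
            data[key] = value

    def __len__(self):
        return sum(len(data) for data, _ in self._shards)


store = ShardedDict()

def worker(base):
    for i in range(100):
        store.set(base + i, i)

threads = [threading.Thread(target=worker, args=(n * 100,)) for n in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Splitting the map into shards means a writer blocks only the callers that hash to the same shard, rather than the whole cache.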
The shared backend stores cached data in memory-mapped files, making entries visible across multiple processes. This is useful for multi-process deployments (e.g. Gunicorn workers, Celery tasks) where you want to avoid recomputing the same expensive results in each process.
```python
@cache(max_size=1024, backend="shared")
def get_embedding(text: str) -> list[float]:
    # computed once, shared across all worker processes
    ...
```

How it works:

- Two mmap files are created per decorated function:
  - Data file - contains a header, a hash table (open addressing with linear probing), and a fixed-size slab arena for entries
  - Lock file - holds a seqlock (sequence counter + spinlock) for cross-process synchronization. Reads are optimistic (no lock taken); only writes acquire the spinlock
- File location: `/dev/shm/` on Linux, `$TMPDIR/warp_cache/` on macOS
- The file name is derived deterministically from the function's `__module__` and `__qualname__`, so the same function in different processes maps to the same cache automatically
- If an existing cache file has different parameters (capacity, key/value sizes, version), it is recreated
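The seqlock protocol mentioned in the list above (optimistic reads, locked writes) can be sketched in a few lines. This is a simplified, in-process illustration: `SeqLock` is a hypothetical class, a `threading.Lock` stands in for the cross-process spinlock, and a real implementation over shared memory would also need atomic operations and memory barriers.

```python
import threading


class SeqLock:
    """Illustrative seqlock: even sequence = stable, odd = write in progress."""

    def __init__(self, value):
        self._seq = 0
        self._value = value
        self._write_lock = threading.Lock()  # stand-in for the spinlock

    def write(self, value):
        with self._write_lock:
            self._seq += 1       # odd: tells readers a write is in progress
            self._value = value
            self._seq += 1       # even: the new value is published

    def read(self):
        while True:
            start = self._seq
            if start % 2:        # a writer is active, try again
                continue
            value = self._value
            if self._seq == start:
                return value     # no write happened while we were reading


entry = SeqLock(("key", 1))
entry.write(("key", 2))
current = entry.read()
```

Readers never take the lock; they only retry when the sequence counter shows that a write overlapped with their read.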
Serialization overhead:
Keys and values are serialized using a fast-path binary format for common primitives (None, bool, int, float, str, bytes, flat tuples) with pickle fallback for complex types. This adds per-operation cost compared to the memory backend, which stores Python objects directly. Expect roughly 2x lower throughput - the gap is unavoidable cross-process overhead: serialization, deterministic hashing, seqlock, and mmap copy. No Mutex is used; reads don't take any locks. The shared backend makes sense when the cached computation is expensive enough (network I/O, ML inference, heavy math) that serialization cost doesn't matter.
Size limits:
Each entry has a fixed slot size determined at creation time. Keys and values that exceed the configured limits are silently skipped (the function is called but the result is not cached). You can monitor skips via cache_info().oversize_skips.
| Parameter | Default | Description |
|---|---|---|
| `max_key_size` | 512 bytes | Maximum serialized size of the key (args tuple) |
| `max_value_size` | 4096 bytes | Maximum serialized size of the return value |
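When choosing these limits, it can help to estimate how large a typical value is once serialized. The sketch below uses `pickle` (protocol 5) as an approximation for values that take the pickle fallback path; `pickled_size` and `likely_fits` are hypothetical helpers, and the exact on-disk size may differ slightly from this estimate.

```python
import pickle


def pickled_size(obj) -> int:
    """Approximate serialized size of a value that takes the pickle fallback."""
    return len(pickle.dumps(obj, protocol=5))


def likely_fits(obj, limit: int = 4096) -> bool:
    """Rough check against a size limit (default mirrors max_value_size)."""
    return pickled_size(obj) <= limit


small = {"id": 42, "name": "ada"}
large = {"blob": "x" * 10_000}

small_ok = likely_fits(small)         # comfortably under 4096 bytes
large_ok = likely_fits(large)         # over the default limit
large_ok_64k = likely_fits(large, 65536)
```

Running a check like this on representative return values gives a starting point for `max_value_size` before relying on `cache_info().oversize_skips` in production.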
```python
# Large values: increase max_value_size
@cache(max_size=256, backend="shared", max_value_size=65536)
def get_large_result(query: str) -> dict:
    ...
```

| Platform | `backend="memory"` | `backend="shared"` |
|---|---|---|
| Linux (x86_64, aarch64) | Yes | Yes (`/dev/shm/`) |
| macOS (x86_64, arm64) | Yes | Yes (`$TMPDIR/warp_cache/`) |
| Windows (x86_64) | Yes | No |
The shared backend relies on POSIX mmap, which is not available on Windows. The seqlock uses portable atomics rather than platform-specific threading primitives. Using `backend="shared"` on Windows raises a `RuntimeError` at decoration time. The memory backend works on all platforms.
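Code that must run on every platform can pick the backend at import time instead of catching the `RuntimeError`. The helper below is a hypothetical sketch, not part of warp_cache; it requests the shared backend only on the two platforms listed as supported above.

```python
import sys


def pick_backend(platform: str = sys.platform) -> str:
    """Hypothetical helper: use "shared" only where the table above says it works."""
    if platform.startswith("linux") or platform == "darwin":
        return "shared"
    return "memory"


backend = pick_backend()
```

The result can be passed as `backend=pick_backend()` to the decorator. Note that falling back to `"memory"` changes behaviour: each process then keeps its own private cache.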
```python
@cache(max_size=100)
def compute(n):
    return n ** 2

compute(5)
compute(5)

info = compute.cache_info()
print(info)  # CacheInfo(hits=1, misses=1, max_size=100, current_size=1)

compute.cache_clear()  # removes all entries, resets counters
```

The cache is safe to use from multiple threads with no additional locking:
```python
from concurrent.futures import ThreadPoolExecutor
from warp_cache import cache

@cache(max_size=256)
def work(x):
    return x * x

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(work, range(100)))
```

Both backends build the cache key from `*args` and `**kwargs`:

- No kwargs (common path): The `args` tuple is used directly as the key.
- With kwargs: Keywords are sorted by name to ensure deterministic ordering, then combined with args as `(args, tuple(sorted(kwargs.items())))`. This means `fn(a=1, b=2)` and `fn(b=2, a=1)` always hit the same cache entry.
Arguments must be hashable (memory backend) or serializable (shared backend).
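The key-building rule described above fits in a few lines of Python. `make_key` is a hypothetical name used for illustration; the real logic lives in warp_cache's Rust code.

```python
def make_key(args: tuple, kwargs: dict):
    """Illustrative key builder following the two cases described above."""
    if not kwargs:
        return args  # common path: the args tuple is the key
    # Sorting by keyword name makes the key independent of call-site order.
    return (args, tuple(sorted(kwargs.items())))


key_positional = make_key((1, 2), {})
key_ab = make_key((), {"a": 1, "b": 2})
key_ba = make_key((), {"b": 2, "a": 1})
```

One consequence of this scheme is that `fn(1, 2)` and `fn(a=1, b=2)` produce different keys, so they are cached as separate entries even if they bind to the same parameters.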
In the memory backend, keys are stored as Python objects on the heap, with no serialization. Lookups use
Python's built-in `hash()` and `==` via the C API. This is fast, but it means the
cache is inherently single-process (Python object pointers are not meaningful
across processes).

In the shared backend, keys and values are serialized to bytes before storage. The serialization uses a fast-path binary format for common primitive types, falling back to pickle for everything else:

| Type | Format | Size |
|---|---|---|
| `None` | Tag byte | 1 byte |
| `bool` | Tag byte | 1 byte |
| `int` (fits i64) | Tag + little-endian i64 | 9 bytes |
| `float` | Tag + IEEE 754 f64 | 9 bytes |
| `str` | Tag + 4-byte length + UTF-8 | 5 + len bytes |
| `bytes` | Tag + 4-byte length + data | 5 + len bytes |
| Flat tuple of above | Tag + count + elements | varies |
| Everything else | Pickle (protocol 5) | varies |
The fast-path avoids pickle overhead entirely for the most common argument types. Large integers (outside i64 range), nested structures, dicts, sets, and custom objects fall back to pickle automatically.
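A tagged binary format of this kind can be sketched with the `struct` module. Everything specific here is an assumption made for illustration: the tag values, the little-endian byte order for lengths and floats, and the `encode` function name are not taken from warp_cache. Tuples are omitted for brevity and simply take the pickle path in this sketch.

```python
import pickle
import struct

# Hypothetical tag values; the real format's tags are not documented here.
TAG_NONE, TAG_FALSE, TAG_TRUE, TAG_INT, TAG_FLOAT, TAG_STR, TAG_BYTES, TAG_PICKLE = range(8)


def encode(obj) -> bytes:
    """Illustrative fast-path encoder with a pickle fallback."""
    if obj is None:
        return bytes([TAG_NONE])
    if isinstance(obj, bool):  # checked before int: bool is a subclass of int
        return bytes([TAG_TRUE if obj else TAG_FALSE])
    if isinstance(obj, int) and -2**63 <= obj < 2**63:
        return bytes([TAG_INT]) + struct.pack("<q", obj)   # 1 + 8 bytes
    if isinstance(obj, float):
        return bytes([TAG_FLOAT]) + struct.pack("<d", obj)  # 1 + 8 bytes
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        return bytes([TAG_STR]) + struct.pack("<I", len(data)) + data
    if isinstance(obj, bytes):
        return bytes([TAG_BYTES]) + struct.pack("<I", len(obj)) + obj
    # Large ints, containers, and custom objects fall back to pickle.
    return bytes([TAG_PICKLE]) + pickle.dumps(obj, protocol=5)


encoded_int = encode(42)
encoded_str = encode("héllo")   # 6 UTF-8 bytes, since "é" takes two
encoded_big = encode(2**70)     # outside the i64 range
```

The encoded sizes match the table: 9 bytes for the integer and 5 + 6 bytes for the string, while the oversized integer carries the pickle tag.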
The shared backend must ensure that the same function arguments produce the same
cache key in every process. Python's hash() is randomized per-process
(PYTHONHASHSEED), so the shared backend does not use it. Instead:
- Keys are serialized to a deterministic byte sequence (the binary format above)
- The bytes are hashed with ahash using fixed seeds (same seeds in every process)
- Lookups verify matches using byte-level comparison (`memcmp`), not Python equality

This makes the shared backend completely immune to `PYTHONHASHSEED`: different
processes with different hash seeds will always agree on cache entries.
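The same idea can be demonstrated in Python with a keyed hash whose key is a constant. This sketch substitutes `hashlib.blake2b` for ahash (which is a Rust crate), and the seed value and function names are hypothetical.

```python
import hashlib

# Hypothetical fixed seed; every process must use the same bytes.
FIXED_SEED = b"demo-fixed-seed"


def stable_hash(key_bytes: bytes) -> int:
    """Process-independent 64-bit hash of already-serialized key bytes."""
    digest = hashlib.blake2b(key_bytes, digest_size=8, key=FIXED_SEED).digest()
    return int.from_bytes(digest, "little")


def slot_for(key_bytes: bytes, capacity: int) -> int:
    """Starting slot in an open-addressing table with `capacity` slots."""
    return stable_hash(key_bytes) % capacity


key_bytes = "user:42".encode("utf-8")
slot = slot_for(key_bytes, capacity=1024)
```

Unlike the built-in `hash()` of a `str`, which varies between interpreter runs unless `PYTHONHASHSEED` is pinned, `stable_hash` returns the same value for the same bytes in every process.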
| Parameter | Type | Default | Description |
|---|---|---|---|
| `max_size` | `int` | `128` | Maximum number of cached entries |
| `ttl` | `float \| None` | `None` | Time-to-live in seconds (`None` = no expiry) |
| `backend` | `str \| int \| Backend` | `Backend.MEMORY` | `"memory"` for in-process, `"shared"` for cross-process |
| `max_key_size` | `int` | `512` | Max serialized key bytes (shared backend only) |
| `max_value_size` | `int` | `4096` | Max serialized value bytes (shared backend only) |