Layr/make batched cache by ronaldmannak · Pull Request #2 · Layr-Labs/mlx-swift-lm

ronaldmannak · 2026-05-02T23:36:16Z

Replace the silent fall-through in BatchGenerator.makeBatchedCache with explicit topology validation at init time. The previous code defaulted any unrecognized KVCache subclass to BatchKVCache, which silently produced incorrect caches for unsupported types.
Add BatchedCacheList, a batched analog of CacheList, so hybrid models that compose multiple caches per layer (e.g. Mamba + attention) preserve their per-layer topology under continuous batching.
Add an optional cacheParameters: GenerateParameters? argument to BatchGenerator.init so models that need cache-shaping options (e.g. maxKVSize) can receive them.
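
To illustrate the explicit validation described above, here is a minimal sketch that walks a per-layer probe cache, recurses into composite caches, and throws instead of defaulting to a batched KV cache. It is not the PR's code. The classes are empty stand-ins that only borrow names from mlx-swift-lm, and `UnsupportedTopology`, `validate`, and `validateTopology` are hypothetical names chosen for this example.

```swift
// Stand-in types for illustration only. The real classes in mlx-swift-lm
// carry tensors and state; these are empty shells that share the names.
class KVCache {}
final class KVCacheSimple: KVCache {}
final class QuantizedKVCache: KVCache {}
final class CacheList: KVCache {
    let children: [KVCache]
    init(_ children: [KVCache]) { self.children = children }
}

// Hypothetical error shape: where the offending cache sits and why it failed.
struct UnsupportedTopology: Error, Equatable {
    let layer: Int
    let path: [Int]        // child indices inside the layer, outermost first
    let cacheType: String
    let reason: String
}

// Validate one layer's cache. Composite caches are walked recursively so the
// reported path points at the exact nested child that is unsupported.
func validate(_ cache: KVCache, layer: Int, path: [Int] = []) throws {
    if let list = cache as? CacheList {
        for (index, child) in list.children.enumerated() {
            try validate(child, layer: layer, path: path + [index])
        }
        return
    }
    // In this sketch only KVCacheSimple is treated as batchable.
    if type(of: cache) == KVCacheSimple.self { return }
    // No silent default: anything unrecognized is an error.
    throw UnsupportedTopology(
        layer: layer,
        path: path,
        cacheType: String(describing: type(of: cache)),
        reason: "no batched implementation in this sketch")
}

// Validate every layer of a probe cache once, e.g. at initialization time.
func validateTopology(_ probeCache: [KVCache]) throws {
    for (layer, cache) in probeCache.enumerated() {
        try validate(cache, layer: layer)
    }
}

// Usage: layer 1 is a composite whose second child is unsupported.
let exampleProbe: [KVCache] = [
    KVCacheSimple(),
    CacheList([KVCacheSimple(), QuantizedKVCache()]),
]
do {
    try validateTopology(exampleProbe)
} catch let error as UnsupportedTopology {
    print("rejected layer \(error.layer), path \(error.path): \(error.reason)")
} catch {
    print("unexpected error: \(error)")
}
```

Failing at the first unsupported cache keeps the error specific. A caller that wants a full report could collect errors into an array instead of throwing on the first one.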

Behavior changes

BatchGenerator.init is now marked throws and rejects unsupported cache topologies up front via BatchGeneratorError.unsupportedCacheTopology. The error includes the layer index, a path within the layer, the offending cache type, and a human-readable reason.
BatchGenerator.init explicitly rejects QuantizedKVCache, ChunkedKVCache, and any RotatingKVCache configured with non-zero keep tokens, with type-specific error messages instead of a generic "no batched implementation" fallback.
Cache factories are built once at init from the model's probe cache and reused per batch, removing the per-batch model.newCache(parameters:) call and re-classification.
Exact-type matches (Swift.type(of:) == X.self) are used for MambaCache, ArraysCache, and KVCacheSimple to avoid misclassifying subclasses (MambaCache : ArraysCache, ChunkedKVCache : KVCacheSimple).
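
The difference between an exact-type match and an `is` check can be shown with a small sketch. The classes below are empty stand-ins that mirror only the subclass relations named above, and `isExactly` is a hypothetical helper, not part of the library.

```swift
// Stand-in hierarchy mirroring the subclass relations named in the PR.
class ArraysCache {}
class MambaCache: ArraysCache {}
class KVCacheSimple {}
class ChunkedKVCache: KVCacheSimple {}

// Hypothetical helper: true only when the dynamic type of `value` is exactly
// `expected`, not one of its subclasses.
func isExactly<T>(_ value: Any, _ expected: T.Type) -> Bool {
    Swift.type(of: value) == expected
}

let chunked: AnyObject = ChunkedKVCache()

// `is` accepts subclasses, so a ChunkedKVCache passes as a KVCacheSimple.
print(chunked is KVCacheSimple)                  // true

// The exact-type comparison tells the two apart.
print(isExactly(chunked, KVCacheSimple.self))    // false
print(isExactly(chunked, ChunkedKVCache.self))   // true
```

With an `is` check, the order of the cases decides which branch a subclass lands in. The exact-type comparison removes that ordering dependency, at the cost of also rejecting subclasses that were never listed.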

ronaldmannak added 2 commits May 2, 2026 16:26

Fix BatchKVCache fall through

5904c40

Simplify BatchKVCache

60c8705