Layr/make batched cache #2

Open
ronaldmannak wants to merge 2 commits into Layr-Labs:main from PicoMLX:layr/makeBatchedCache

@ronaldmannak

  • Replace the silent fall-through in BatchGenerator.makeBatchedCache with explicit topology validation at init time. The previous code defaulted any unrecognized KVCache subclass to BatchKVCache, which silently produced incorrect caches for unsupported types.
  • Add BatchedCacheList, a batched analog of CacheList, so hybrid models that compose multiple caches per layer (e.g. Mamba + attention) preserve their per-layer topology under continuous batching.
  • Add an optional cacheParameters: GenerateParameters? argument to BatchGenerator.init so models that need cache-shaping options (e.g. maxKVSize) can receive them.
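
To illustrate the per-layer topology that BatchedCacheList is meant to preserve, here is a minimal sketch. It does not use the real MLX types: BatchedCache, BatchedAttentionCache, BatchedStateCache, CacheShape, BatchedCacheListSketch, and makeBatched are all hypothetical stand-ins invented for this example, and the actual implementation in this PR may be structured differently.

```swift
// Hypothetical stand-ins; the real cache types come from the MLX Swift libraries.
protocol BatchedCache {
    var batchSize: Int { get }
}

struct BatchedAttentionCache: BatchedCache { let batchSize: Int }
struct BatchedStateCache: BatchedCache { let batchSize: Int }

// Describes what one layer's cache looks like for a single sequence.
enum CacheShape {
    case attention
    case state
    case list([CacheShape])   // a layer that composes several caches
}

// Batched analog of a cache list: one batched child per original child, same order.
struct BatchedCacheListSketch: BatchedCache {
    let children: [any BatchedCache]
    var batchSize: Int { children.first?.batchSize ?? 0 }
}

// Recursively mirrors the per-sequence shape instead of flattening it.
func makeBatched(_ shape: CacheShape, batchSize: Int) -> any BatchedCache {
    switch shape {
    case .attention:
        return BatchedAttentionCache(batchSize: batchSize)
    case .state:
        return BatchedStateCache(batchSize: batchSize)
    case .list(let children):
        return BatchedCacheListSketch(
            children: children.map { makeBatched($0, batchSize: batchSize) })
    }
}

// A hybrid layer that pairs a state-space cache with an attention cache.
let hybridLayer = makeBatched(.list([.state, .attention]), batchSize: 4)
if let list = hybridLayer as? BatchedCacheListSketch {
    print(list.children.count, list.children[0] is BatchedStateCache)
}
```

The point of the sketch is only that the batched structure keeps the same nesting and child order as the single-sequence cache, so a model that indexes into its per-layer list still finds each child where it expects it.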

Behavior changes

  • BatchGenerator.init now throws. It rejects unsupported cache topologies up front via BatchGeneratorError.unsupportedCacheTopology, which carries the layer index, a path within the layer, the offending cache type, and a human-readable reason.
  • Initialization explicitly rejects QuantizedKVCache, ChunkedKVCache, and any RotatingKVCache configured with non-zero keep tokens. Each gets a type-specific error message instead of a generic "no batched implementation" fallback.
  • Cache factories are built once at init from the model's probe cache and reused per batch, removing the per-batch model.newCache(parameters:) call and re-classification.
  • Exact-type matches (Swift.type(of:) == X.self) are used for MambaCache, ArraysCache, and KVCacheSimple to avoid misclassifying subclasses (MambaCache : ArraysCache, ChunkedKVCache : KVCacheSimple).
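
The validation and exact-type matching described above can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: the classes below are empty stand-ins that copy only the inheritance relationships named in the last bullet, and CacheBase, BatchedKind, UnsupportedCacheTopology, classify, and GeneratorSketch are hypothetical names. The reason strings are placeholders, not the PR's actual messages.

```swift
// Empty stand-ins that mirror only the subclass relationships mentioned above.
class CacheBase {}
class KVCacheSimple: CacheBase {}
class ChunkedKVCache: KVCacheSimple {}
class ArraysCache: CacheBase {}
class MambaCache: ArraysCache {}
class QuantizedKVCache: CacheBase {}

enum BatchedKind: Equatable { case attention, arrays, mamba }

// Hypothetical error payload with the fields the PR description lists.
struct UnsupportedCacheTopology: Error, Equatable {
    let layer: Int
    let path: String
    let cacheType: String
    let reason: String
}

func classify(_ cache: CacheBase, layer: Int, path: String) throws -> BatchedKind {
    // Compare metatypes exactly; `cache is KVCacheSimple` would also match ChunkedKVCache.
    let exact = Swift.type(of: cache)
    if exact == MambaCache.self { return .mamba }
    if exact == ArraysCache.self { return .arrays }
    if exact == KVCacheSimple.self { return .attention }

    let reason: String
    if exact == ChunkedKVCache.self {
        reason = "chunked caches are not supported under batching (placeholder text)"
    } else if exact == QuantizedKVCache.self {
        reason = "quantized caches are not supported under batching (placeholder text)"
    } else {
        reason = "no batched implementation"
    }
    throw UnsupportedCacheTopology(
        layer: layer, path: path, cacheType: String(describing: exact), reason: reason)
}

struct GeneratorSketch {
    // Classified once from the probe cache at init, then reused for every batch.
    let kinds: [BatchedKind]

    init(probe: [CacheBase]) throws {
        kinds = try probe.enumerated().map { index, cache in
            try classify(cache, layer: index, path: "root")
        }
    }
}

let supported = try GeneratorSketch(probe: [MambaCache(), KVCacheSimple()])
print(supported.kinds.count)

do {
    _ = try GeneratorSketch(probe: [KVCacheSimple(), ChunkedKVCache()])
} catch let error as UnsupportedCacheTopology {
    print("layer \(error.layer), \(error.cacheType): \(error.reason)")
} catch {
    print("unexpected error: \(error)")
}
```

Because the comparison is on the exact metatype, a ChunkedKVCache never slips through the KVCacheSimple branch, and the failure surfaces at construction time with the layer index attached instead of later as a subtly wrong cache.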
