Prerequisites
Feature Description
Gemma4 models do not reuse the prompt cache because this fork is missing PR ggml-org#22288 from upstream llama.cpp:
ggml-org#22288
Current behavior: the log reports "forcing full prompt re-processing due to lack of cache data".
Expected behavior: checkpoint restoration works with --swa-full.
Please sync with upstream master to include this fix.
Motivation
Official llama.cpp b9127 works correctly: cache reuse speeds up repeated prompts.
Possible Implementation
No response