Strix currently applies Claude prompt caching only to the system prompt, but most of the repeated token spend is in other parts of the input context.
We should extend caching to the other reusable prompt segments so that repeated calls within a run read those tokens from the cache, at the cheaper cached-read rate, instead of paying full input price each time.
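A minimal sketch of what this could look like, assuming the request is built in Anthropic's Messages format, where a content block can carry `cache_control: {"type": "ephemeral"}` and a request may hold up to four such breakpoints. The helper names (`add_cache_breakpoints`, `_mark_last_block`) and the choice to mark the last two messages are illustrative assumptions, not Strix's existing code.

```python
from copy import deepcopy
from typing import Any

CACHE_MARKER = {"type": "ephemeral"}
MAX_BREAKPOINTS = 4  # Anthropic allows at most four cache breakpoints per request


def _mark_last_block(message: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `message` whose final content block carries cache_control."""
    msg = deepcopy(message)
    content = msg.get("content")
    if not content:
        # Nothing to cache (empty string or empty block list): leave untouched.
        return msg
    if isinstance(content, str):
        # Plain-string content must become a block list to hold the marker.
        content = [{"type": "text", "text": content}]
    content[-1] = {**content[-1], "cache_control": dict(CACHE_MARKER)}
    msg["content"] = content
    return msg


def add_cache_breakpoints(
    system_prompt: str,
    messages: list[dict[str, Any]],
    message_breakpoints: int = 2,  # hypothetical default, tune per workload
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Mark the system prompt and the trailing messages as cacheable prefixes.

    One breakpoint is reserved for the system prompt; the rest of the budget
    goes to the most recent messages, so the next call in the same run can
    reuse the conversation prefix that ends at those messages.
    """
    system = [
        {"type": "text", "text": system_prompt, "cache_control": dict(CACHE_MARKER)}
    ]
    budget = max(0, min(message_breakpoints, MAX_BREAKPOINTS - 1))
    marked = [deepcopy(m) for m in messages]
    for i in range(max(0, len(marked) - budget), len(marked)):
        marked[i] = _mark_last_block(marked[i])
    return system, marked
```

The returned `system` and `messages` values are shaped to be passed as the `system` and `messages` arguments of a Messages API call. Note that prefixes shorter than the model's minimum cacheable length are not cached, so the savings depend on how large the repeated context actually is.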