Use weight cache for quantized tensor scale data (#14455)
Summary:
When enabling the XNNPACK weight cache and running a model with qb4 or
qc8-quantized linear weights, it triggers an assertion that is intended
to make sure all data is in the weight cache. This can be reproduced by
running the XNNPACK backend linear op tests with weight cache enabled.
The root cause appears to be that tensor scale data is bypassing the
weight cache - likely an oversight. This isn't a correctness issue, but
it does cause the aforementioned assert to fail and uses marginally more
memory than necessary.
This PR updates the XNNPACK compileModel call to use the weight cache
for scale data (instead of putting it in the unpacked_buffers list).
With this change, the linear op tests pass with weight cache enabled.
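The fix can be sketched in isolation. The snippet below is a minimal, hypothetical model of the pattern: a weight cache that owns buffers and can check pointer membership, plus a compile step that routes scale data into the cache instead of appending it to a separate unpacked-buffers list. The names (`WeightCache`, `register_scale_data`, `unpacked_buffers`) are illustrative stand-ins, not the actual ExecuTorch/XNNPACK APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Minimal stand-in for a backend weight cache: owns named buffers and can
// verify that a given pointer lives inside cached storage (the property the
// failing assertion checks for).
class WeightCache {
 public:
  const uint8_t* store(const std::string& key, const std::vector<uint8_t>& data) {
    auto& slot = cache_[key];
    slot = data;
    return slot.data();
  }

  bool contains(const uint8_t* ptr) const {
    for (const auto& entry : cache_) {
      const auto& buf = entry.second;
      if (!buf.empty() && ptr >= buf.data() && ptr < buf.data() + buf.size()) {
        return true;
      }
    }
    return false;
  }

 private:
  std::unordered_map<std::string, std::vector<uint8_t>> cache_;
};

// Before the fix (conceptually): scale data was copied into a side list that
// the cache knows nothing about, so the "all data is cached" assert fails.
const uint8_t* register_scale_data_bypassing_cache(
    std::vector<std::vector<uint8_t>>& unpacked_buffers,
    const std::vector<uint8_t>& scales) {
  unpacked_buffers.push_back(scales);
  return unpacked_buffers.back().data();
}

// After the fix (conceptually): scale data goes through the cache, so the
// pointer handed to the packed-weights path is cache-owned.
const uint8_t* register_scale_data_via_cache(
    WeightCache& cache,
    const std::string& tensor_name,
    const std::vector<uint8_t>& scales) {
  return cache.store(tensor_name + "/scales", scales);
}
```

Under this model, the assertion "every weight pointer belongs to the cache" holds for the second path and fails for the first, which mirrors the behavior described above.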
Differential Revision: D82862629
Co-authored-by: Gregory Comer <[email protected]>