-
Notifications
You must be signed in to change notification settings - Fork 11.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
imatrix: add option to display importance score statistics for a given imatrix file #12718
base: master
Are you sure you want to change the base?
Conversation
Nice idea, seems like something we discuss the last time? @bartowski1182 Btw is it possible to show importance score from an existing imatrix file @EAddario ? |
Thank you @ngxson. Yes, it will process any imatrix file produced by llama-imatrix, but it is restricted to single file (does not deal with multiple --in-file) |
Isn't this just related to the hidden state norms getting larger as you move through the different layers? If so, then it won't really account for the accumulation of errors caused by an early layer on the final output? |
Not sure if I'm understanding the comment correctly @jukofyork, but the logic I'm using to identify the most influential tensors/layers is to simply average the importance scores (IS) for each, add those averages together, and then compute their individual contributions from the total. The logic llama-imatrix uses to calculate the IS is to square the value of the corresponding weight during inference, keep a running total of how many times that particular value has been updated, and then save the average when inference has finished. This only applies to 2d or larger tensors, so it will ignore norms (1d), but since errors influence which weights get updated (and how frequently), the IS does account for errors, albeit indirectly. Make sense? |
I think the mean squared activations (which would be their variance assuming a mean of 0) cannot really be compared across tensors without some kind of normalization, because the values of the model weights can also affect the relative importance of the activations. ( The goal here is to find which layers need more precision, right? I'm not sure if the mean squared activations really are what you're looking for. There might be other measures like skewness and kurtosis which may be useful. But I'm not sure if taking only the activations into account is the right way to get the insights you seek. What I'd like to try eventually would be to use a simultaneous quantization algorithm to try multiple bit-widths at once in a reasonable amount of time so that the errors can be compared per tensor to help with the choice of quantization type. This would be possible for I still think it can be useful to have some way to visualize what is in In the paper you link (https://arxiv.org/pdf/2406.17415), the closest thing to what you propose would be the LIM (layer input modification) score, which is calculated as follows (in Section 3.1), where
|
Very clear now, thanks @compilade. You're correct, I'm using the mean squared activation averaged to identify which tensors/layers produce large magnitude activations and whilst agree it isn't as accurate as, say, correlation / covariance / LIM I think it's still a reasonable proxy, specially considering how the importance scores are actually used during quantization (quant_weights in ggml-quants.c) I had a quick look at your PRs. I definitely like the idea of storing imatrix data in GGUF format and can appreciate how it would improve the generation of these types of stats. #12557 is quite intriguing, but truth be told I haven't had a chance to really digest it fully (there's a lot going on!) but would love to see it merged specially if it improves ternary quants |
A new
--show-statistics
option generates a report highlighting which tensors/layers contribute the most in a model. The report is sorted from the highest influence to lowest. The process computes the average value of scores per tensor/layer and calculates their % contribution, exiting immediately after completion.This PR can be used along with quantize: Handle user-defined quantization levels for additional tensors to do layer-wise quantization similar, but not quite the same, to the process described in Layer-Wise Quantization: A Pragmatic and Effective Method for Quantizing LLMs Beyond Integer Bit-Levels
Output example:
llama-imatrix --in-file imatrix-DeepSeek-R1-Distill-Llama-8B-small.dat --show-statistics