
imatrix: add option to display importance score statistics for a given imatrix file #12718

Open · EAddario wants to merge 7 commits into master

Conversation

@EAddario commented Apr 2, 2025

A new --show-statistics option generates a report highlighting which tensors/layers contribute the most to a model, sorted from highest influence to lowest. The process computes the average importance score per tensor/layer, calculates each one's % contribution to the total, and exits immediately after printing the report.

This PR can be used along with quantize: Handle user-defined quantization levels for additional tensors to do layer-wise quantization similar to, but not quite the same as, the process described in Layer-Wise Quantization: A Pragmatic and Effective Method for Quantizing LLMs Beyond Integer Bit-Levels.

Output example:

llama-imatrix --in-file imatrix-DeepSeek-R1-Distill-Llama-8B-small.dat --show-statistics

Computing statistics for imatrix-DeepSeek-R1-Distill-Llama-8B-small.dat (225 tensors)

 Layer          Tensor    μ(Importance Scores)    Contribution
================================================================================
     -          output                 5523.92       13.9226 %
    27          attn_v                  356.58        0.8987 %
    27          attn_k                  356.58        0.8987 %
    27          attn_q                  356.58        0.8987 %
    24          attn_k                  347.19        0.8751 %
    24          attn_q                  347.19        0.8751 %
    24          attn_v                  347.19        0.8751 %
    25          attn_q                  346.77        0.8740 %
    25          attn_k                  346.77        0.8740 %
    25          attn_v                  346.77        0.8740 %
    29          attn_v                  344.46        0.8682 %
...
     0          ffn_down                  0.09        0.0002 %
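The aggregation behind the report can be sketched as follows. This is an illustrative Python model of the logic described above, not the actual C++ code in llama-imatrix; the tensor names and score values are made up:

```python
# Sketch of the --show-statistics aggregation: average the importance
# scores per tensor/layer, then express each average as a percentage of
# the grand total, sorted from highest influence to lowest.

def contributions(scores_per_tensor):
    """scores_per_tensor: dict mapping tensor name -> list of importance scores."""
    means = {name: sum(s) / len(s) for name, s in scores_per_tensor.items()}
    total = sum(means.values())
    # Sort from the highest contribution to the lowest, as the report does.
    return sorted(
        ((name, mu, 100.0 * mu / total) for name, mu in means.items()),
        key=lambda t: t[2],
        reverse=True,
    )
```

For example, `contributions({"output": [5000.0, 6047.84], "blk.27.attn_v": [300.0, 413.16]})` returns the per-tensor mean and its share of the summed means, ready to print as in the table above.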

@ngxson (Collaborator) commented Apr 2, 2025

Nice idea, seems like something we discussed last time? @bartowski1182

Btw, is it possible to show importance scores from an existing imatrix file, @EAddario?

@EAddario (Author) commented Apr 2, 2025

Thank you @ngxson. Yes, it will process any imatrix file produced by llama-imatrix, but it is restricted to a single file (it does not handle multiple --in-file arguments).

@jukofyork (Contributor) commented:
Isn't this just related to the hidden state norms getting larger as you move through the different layers? If so, then it won't really account for the accumulation of errors caused by an early layer on the final output?

@EAddario
Copy link
Author

EAddario commented Apr 6, 2025

Not sure if I'm understanding the comment correctly @jukofyork, but the logic I'm using to identify the most influential tensors/layers is to simply average the importance scores (IS) for each, add those averages together, and then compute their individual contributions from the total.

The logic llama-imatrix uses to calculate the IS is to square the value of the corresponding weight during inference, keep a running total of how many times that particular value has been updated, and then save the average when inference has finished.

This only applies to 2d or larger tensors, so it will ignore norms (1d), but since errors influence which weights get updated (and how frequently), the IS does account for errors, albeit indirectly.

Make sense?
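The bookkeeping described above can be modelled with a short sketch. This is only an illustration of the accumulation (running totals of squared values plus an update count, averaged at the end); the real implementation is the C++ code in llama-imatrix, and the class name here is invented:

```python
# Illustrative model of the importance-score accumulation: keep a running
# sum of squared values and a count of updates per channel, then store
# the average once inference has finished.

class RunningScore:
    def __init__(self, n_channels):
        self.sums = [0.0] * n_channels  # running sums of squared values
        self.count = 0                  # number of updates seen so far

    def update(self, values):
        # Called once per evaluation with the values seen by this tensor.
        for i, v in enumerate(values):
            self.sums[i] += v * v
        self.count += 1

    def averages(self):
        # The per-channel mean of squared values, i.e. the importance score.
        return [s / self.count for s in self.sums]
```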

@compilade (Collaborator) commented:

> Not sure if I'm understanding the comment correctly @jukofyork, but the logic I'm using to identify the most influential tensors/layers is to simply average the importance scores (IS) for each, add those averages together, and then compute their individual contributions from the total.

@EAddario

I think the mean squared activations (which would be their variance, assuming a mean of 0) cannot really be compared across tensors without some kind of normalization, because the values of the model weights can also affect the relative importance of the activations. (llama-imatrix calculates the sum of squared activations and their count; it doesn't directly take the model weights into account. They are only considered when quantizing, and even then it depends on the type.)

The goal here is to find which layers need more precision, right?

I'm not sure if the mean squared activations really are what you're looking for.

There might be other measures like skewness and kurtosis which may be useful. But I'm not sure if taking only the activations into account is the right way to get the insights you seek.


What I'd like to try eventually would be to use a simultaneous quantization algorithm to try multiple bit-widths at once in a reasonable amount of time so that the errors can be compared per tensor to help with the choice of quantization type.

This would be possible for x[i] ≈ q[i] * s types using a cumulative search similar to #12557, but I don't know how to do that with x[i] ≈ q[i] * s - m types yet.


I still think it can be useful to have some way to visualize what is in imatrix files and/or the distribution of the activations. But not all the necessary information is kept in imatrix files, only the per-channel sum of squared activations, which is a bit limiting for this purpose. Adding more measures (like the mean, skewness and kurtosis, either per-tensor or per-channel) in the file would be easier after #9400.
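The extra per-channel measures mentioned above could be computed as below. This is a hypothetical sketch only: none of these statistics are stored in current imatrix files (which keep just the per-channel sum of squared activations), and the function name is invented:

```python
# Hypothetical per-channel measures: mean, skewness, and excess kurtosis.
# Population (biased) definitions are used for simplicity.

def moments(xs):
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    std = var ** 0.5
    if std == 0.0:
        return mean, 0.0, 0.0
    skew = sum(((x - mean) / std) ** 3 for x in xs) / n
    kurt = sum(((x - mean) / std) ** 4 for x in xs) / n - 3.0  # excess kurtosis
    return mean, skew, kurt
```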

In the paper you link (https://arxiv.org/pdf/2406.17415), the closest thing to what you propose would be the LIM (layer input modification) score, which is calculated as follows (in Section 3.1), where $L_i$ is the i-th layer, and $L_i^I$ are the input activations and $L_i^O$ the corresponding output activations:

$$ LIM(L_i) = -\frac{L_i^I \cdot L_i^O}{\left|L_i^I\right| \left|L_i^O\right|} $$

llama-imatrix technically has access to both the input and output activations of a layer, but only uses its input.
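The quoted LIM formula is just the negative cosine similarity between a layer's input and output activations, which can be sketched as follows (vectors are flattened activations; this is an illustration, not code from llama.cpp):

```python
# LIM(L_i) = -(I . O) / (|I| |O|): negative cosine similarity between a
# layer's input activations I and its output activations O.

def lim_score(inp, out):
    dot = sum(a * b for a, b in zip(inp, out))
    norm_i = sum(a * a for a in inp) ** 0.5
    norm_o = sum(b * b for b in out) ** 0.5
    return -dot / (norm_i * norm_o)
```

A layer whose output points away from its input (large modification) scores close to 1, while a layer that mostly passes its input through scores close to -1.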

@EAddario (Author) commented Apr 7, 2025

Very clear now, thanks @compilade. You're correct: I'm using the averaged mean squared activations to identify which tensors/layers produce large-magnitude activations, and whilst I agree it isn't as accurate as, say, correlation / covariance / LIM, I think it's still a reasonable proxy, especially considering how the importance scores are actually used during quantization (quant_weights in ggml-quants.c).

I had a quick look at your PRs. I definitely like the idea of storing imatrix data in GGUF format and can appreciate how it would improve the generation of these types of stats. #12557 is quite intriguing, but truth be told I haven't had a chance to digest it fully (there's a lot going on!). I would love to see it merged, especially if it improves ternary quants.
