support multiheadattention int8 #3940

Open · wants to merge 58 commits into master
Commits
2a5a296
feat(tools/quantize): support toml
tpoisonooo Jun 13, 2022
e8ad914
apply code-format changes
tpoisonooo Jun 13, 2022
77e6546
feat(tools/quantize): add .ini parser
tpoisonooo Jun 15, 2022
8e2f806
apply code-format changes
tpoisonooo Jun 15, 2022
146b8ba
improvement(tools/quantize): add ini config
tpoisonooo Jun 16, 2022
12f075f
Merge branch 'master' of https://github.com/tencent/ncnn into ncnn-in…
tpoisonooo Jun 16, 2022
f719ee7
Merge branch 'ncnn-int8-toml' of https://github.com/tpoisonooo/ncnn i…
tpoisonooo Jun 16, 2022
9863b26
improvement(tools/quantize): refactor code
tpoisonooo Jun 16, 2022
1612caf
apply code-format changes
tpoisonooo Jun 16, 2022
be66fac
test(tools/quantize/ncnn2int8): test quant sqznet
tpoisonooo Jun 16, 2022
ba6640d
improvement(CMakeLists): downgrade to cxx11
tpoisonooo Jun 16, 2022
d106fc0
apply code-format changes
tpoisonooo Jun 16, 2022
fab112d
Update CMakeLists.txt
tpoisonooo Jun 16, 2022
77cf07a
Update ncnn2table.cpp
tpoisonooo Jun 16, 2022
9262515
Merge branch 'ncnn-int8-toml' of https://github.com/tpoisonooo/ncnn i…
tpoisonooo Jun 16, 2022
9d473f5
fix(CI): remove cxx17 grammar
tpoisonooo Jun 16, 2022
181714e
fix(tools/quantize): typo
tpoisonooo Jun 16, 2022
b32dd56
docs(ncnn2int8): add ini description
tpoisonooo Jun 17, 2022
12bef90
feat(ncnn2int8): parse mha
tpoisonooo Jun 17, 2022
c7641ca
feat(src/layer): add mha int8
tpoisonooo Jun 17, 2022
f20318b
apply code-format changes
tpoisonooo Jun 17, 2022
4de1aff
feat(src/layer): add mha int8
tpoisonooo Jun 18, 2022
acedd44
Merge branch 'master' of https://github.com/tencent/ncnn into support…
tpoisonooo Jun 18, 2022
9d743fe
Merge branch 'support-mha-int8' of https://github.com/tpoisonooo/ncnn…
tpoisonooo Jun 18, 2022
2428661
feat(src/layer): mha int8 input transform
tpoisonooo Jun 18, 2022
5305e50
apply code-format changes
tpoisonooo Jun 18, 2022
8d276f4
feat(src/layer/multiheadattention): add log_int_softmax
tpoisonooo Jun 19, 2022
a560617
Merge branch 'support-mha-int8' of https://github.com/tpoisonooo/ncnn…
tpoisonooo Jun 19, 2022
75061d9
apply code-format changes
tpoisonooo Jun 19, 2022
30d6388
feat(src/layer): log_int_softmax
tpoisonooo Jun 21, 2022
09db0c5
Merge branch 'support-mha-int8' of https://github.com/tpoisonooo/ncnn…
tpoisonooo Jun 21, 2022
33eaa02
apply code-format changes
tpoisonooo Jun 21, 2022
25ca6bc
fix(tools/quantize): value_get template specialization
tpoisonooo Jun 21, 2022
fe6ee36
apply code-format changes
tpoisonooo Jun 21, 2022
cb3ac68
fix(quantize/ncnn2int8): convert weight missing clone
tpoisonooo Jun 21, 2022
de0e76a
fix(multiheadattention.cpp): load bias
tpoisonooo Jun 21, 2022
449f9cb
fix(src/layer): model load size error
tpoisonooo Jun 22, 2022
3c96faa
fix(net_quantize.cpp): weight scale
tpoisonooo Jun 23, 2022
c81850e
apply code-format changes
tpoisonooo Jun 23, 2022
83e3368
fix(lis): scale error
tpoisonooo Jun 24, 2022
58df666
fix(mha): single opr precision
tpoisonooo Jun 25, 2022
b958cab
improvement(mha): fp32 version using fake quant
tpoisonooo Jun 25, 2022
0843acf
fix(mha): remove LIS and get good precision
tpoisonooo Jun 25, 2022
527b03a
Merge branch 'support-mha-int8' of https://github.com/tpoisonooo/ncnn…
tpoisonooo Jun 25, 2022
aa6e791
apply code-format changes
tpoisonooo Jun 25, 2022
bdf52ab
improvement(mha): quantize softmax output
tpoisonooo Jun 26, 2022
1bf72dc
apply code-format changes
tpoisonooo Jun 26, 2022
9258065
improvement(benchmark): clean code
tpoisonooo Jun 26, 2022
6c7d992
docs(operators.md): update mha
tpoisonooo Jun 26, 2022
3f1844b
revert(src/layer/mha): do not quantize softmax
tpoisonooo Jun 27, 2022
240137b
improvement(test): add mha test
tpoisonooo Jun 29, 2022
14d45ab
apply code-format changes
tpoisonooo Jun 29, 2022
c9f430f
fix(CI): rebase code
tpoisonooo Jul 28, 2022
66ed718
Merge branch 'support-mha-int8' of https://github.com/tpoisonooo/ncnn…
tpoisonooo Jul 28, 2022
435e380
apply code-format changes
tpoisonooo Jul 28, 2022
497dbd7
fix(CI): test mha exceeding
tpoisonooo Aug 1, 2022
5c5a586
fix(src/layer/mha): miss convert weight to int8
tpoisonooo Aug 3, 2022
8c44ccf
apply code-format changes
tpoisonooo Aug 3, 2022
4 changes: 4 additions & 0 deletions benchmark/benchncnn.cpp
@@ -320,6 +320,10 @@ int main(int argc, char** argv)

benchmark("vision_transformer", ncnn::Mat(384, 384, 3), opt);

benchmark("FastestDet", ncnn::Mat(352, 352, 3), opt);

benchmark("vision_transformer_int8", ncnn::Mat(384, 384, 3), opt);

#if NCNN_VULKAN
delete g_blob_vkallocator;
delete g_staging_vkallocator;
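
To try the new entry locally, the standard benchncnn invocation should work. A minimal sketch, assuming the binary was built under `build/benchmark` and is run from the `benchmark/` directory where the `.param` files live:

```shell
cd benchmark
# loop count, num threads, powersave, gpu device (-1 = CPU only), cooling down
../build/benchmark/benchncnn 8 4 0 -1 0
```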
146 changes: 146 additions & 0 deletions benchmark/vision_transformer_int8.param

Large diffs are not rendered by default.

13 changes: 11 additions & 2 deletions docs/developer-guide/operators.md
@@ -1054,9 +1054,10 @@ y = affine(out)
| 0 | embed_dim | int | 0 | |
| 1 | num_head | int | 1 | |
| 2 | weight_data_size| int | 0 | |
| 3 | int8_scale_term| int | 0 | |

| weight | type | shape |
| ------------- | ----- | --------------------- |
| weight | type | shape | description |
| ------------- | ----- | --- | --------------------- |
| q_weight_data | float/fp16/int8 | [weight_data_size] |
| q_bias_data | float | [embed_dim] |
| k_weight_data | float/fp16/int8 | [weight_data_size] |
@@ -1065,6 +1066,14 @@ y = affine(out)
| v_bias_data | float | [embed_dim] |
| out_weight_data| float/fp16/int8 | [weight_data_size] |
| out_bias_data | float | [embed_dim] |
| q_input_scale | float | [1] |
| k_input_scale | float | [1] |
| v_input_scale | float | [1] |
| q_weight_scales | float | [embed_dim] |
| k_weight_scales | float | [embed_dim] |
| v_weight_scales | float | [embed_dim] |
| internal_scales | float | [5] | scales for xq/xk/xv/before_softmax/before_output |
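
As a rough, illustrative sketch (inferred from the scale layout above, not copied from the implementation), each int8 projection combines its per-tensor input scale with its per-channel weight scales and then requantizes with the matching `internal_scales` entry; k and v follow the same pattern, and the last two internal scales cover the activations before softmax and before the output projection:

```
xq_int8    = round(x * q_input_scale)
acc_int32  = xq_int8 * q_weight_int8                          (int8 gemm)
xq_fp32    = acc_int32 / (q_input_scale * q_weight_scales) + q_bias_data
xq_requant = round(xq_fp32 * internal_scales[0])              (internal scale 0 = xq)
```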


# MVN
```
28 changes: 26 additions & 2 deletions docs/how-to-use-and-FAQ/quantized-int8-inference.md
@@ -20,7 +20,7 @@ Some imagenet sample images here https://github.com/nihui/imagenet-sample-images

```shell
find images/ -type f > imagelist.txt
./ncnn2table mobilenet-opt.param mobilenet-opt.bin imagelist.txt mobilenet.table mean=[104,117,123] norm=[0.017,0.017,0.017] shape=[224,224,3] pixel=BGR thread=8 method=kl format=txt
```

* mean and norm are the values you passed to ```Mat::substract_mean_normalize()```
@@ -35,6 +35,7 @@ find images/ -type f > imagelist.txt
* pixel is the pixel format of your model, image pixels will be converted to this type before ```Extractor::input()```
* thread is the CPU thread count that could be used for parallel inference
* method is the post training quantization algorithm, kl and aciq are currently supported
* format is the output file format for the quantization parameters; choose `ini` or `txt` (`txt` is the default, see the `ini` example below)
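
For example, a run that writes the table in `ini` format instead (same model and preprocessing options as above, only the last argument changes) would look like:

```shell
./ncnn2table mobilenet-opt.param mobilenet-opt.bin imagelist.txt mobilenet.table mean=[104,117,123] norm=[0.017,0.017,0.017] shape=[224,224,3] pixel=BGR thread=8 method=kl format=ini
```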

If your model has multiple input nodes, you can use multiple list files and other parameters

@@ -60,7 +61,7 @@ mobilenet.load_model("mobilenet-int8.bin");

## mixed precision inference

Before quantizing your model, comment out the layer's weight scale line in the `txt`-format table file; that layer will then run float32 inference

```
conv1_param_0 156.639840536
```

to

```
#conv1_param_0 156.639840536
```

If you are using the `ini` format, remove the layer's whole quantization parameter section instead. For example, change:

```
[conv0]
type = "Conv"
weight = [ 156.639840536 ]
input_scale = 1.23

[fire]
type = "Gemm"
weight = [ 156.639840536 ]
input_scale = 1.23
```

to

```
[fire]
type = "Gemm"
weight = [ 156.639840536 ]
input_scale = 1.23
```