Visual Attention Network (VAN)
This is a PyTorch implementation of VAN proposed in our paper "Visual Attention Network".
Figure 1: Comparison with different vision backbones on the ImageNet-1K validation set.
@article{guo2022visual,
title={Visual Attention Network},
author={Guo, Meng-Hao and Lu, Cheng-Ze and Liu, Zheng-Ning and Cheng, Ming-Ming and Hu, Shi-Min},
journal={arXiv preprint arXiv:2202.09741},
year={2022}
}
2022.03.15 Supported by Hugging Face.
2022.05.01 Supported by OpenMMLab.
While originally designed for natural language processing (NLP) tasks, the self-attention mechanism has recently taken various computer vision areas by storm. However, the 2D nature of images brings three challenges for applying self-attention in computer vision. (1) Treating images as 1D sequences neglects their 2D structures. (2) The quadratic complexity is too expensive for high-resolution images. (3) It only captures spatial adaptability but ignores channel adaptability. In this paper, we propose a novel large kernel attention (LKA) module to enable self-adaptive and long-range correlations in self-attention while avoiding the above issues. We further introduce a novel neural network based on LKA, namely Visual Attention Network (VAN). While extremely simple and efficient, VAN outperforms the state-of-the-art vision transformers (ViTs) and convolutional neural networks (CNNs) by a large margin in extensive experiments, including image classification, object detection, semantic segmentation, instance segmentation, etc.
Figure 2: Decomposition diagram of large-kernel convolution. A standard convolution can be decomposed into three parts: a depth-wise convolution (DW-Conv), a depth-wise dilation convolution (DW-D-Conv) and a 1×1 convolution (1×1 Conv).
Figure 3: The structure of different modules: (a) the proposed Large Kernel Attention (LKA); (b) a non-attention module; (c) the self-attention module; (d) a stage of our Visual Attention Network (VAN). CFF means convolutional feed-forward network. The difference between (a) and (b) is the element-wise multiplication. It is worth noting that (c) is designed for 1D sequences.
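For reference, here is a minimal PyTorch sketch of an LKA block following the decomposition in Figure 2, assuming a roughly 21×21 effective kernel split into a 5×5 depth-wise convolution, a 7×7 depth-wise convolution with dilation 3, and a 1×1 convolution (kernel sizes and the module name are illustrative; see the released code for the exact implementation):

```python
import torch
import torch.nn as nn

class LKA(nn.Module):
    """Large Kernel Attention sketch: DW-Conv -> DW-D-Conv -> 1x1 Conv, then element-wise gating."""
    def __init__(self, dim):
        super().__init__()
        # 5x5 depth-wise convolution (local context)
        self.dw_conv = nn.Conv2d(dim, dim, kernel_size=5, padding=2, groups=dim)
        # 7x7 depth-wise convolution with dilation 3 (long-range context)
        self.dw_d_conv = nn.Conv2d(dim, dim, kernel_size=7, padding=9, groups=dim, dilation=3)
        # 1x1 convolution (channel mixing, channel adaptability)
        self.pw_conv = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x):
        attn = self.pw_conv(self.dw_d_conv(self.dw_conv(x)))
        # The attention map modulates the input by element-wise multiplication (Figure 3a)
        return x * attn

# Example: a 64-channel feature map of size 56x56
x = torch.randn(1, 64, 56, 56)
print(LKA(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```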
Data preparation: ImageNet with the following folder structure (a quick loading sanity check is sketched after the tree).
│imagenet/
├──train/
│ ├── n01440764
│ │ ├── n01440764_10026.JPEG
│ │ ├── n01440764_10027.JPEG
│ │ ├── ......
│ ├── ......
├──val/
│ ├── n01440764
│ │ ├── ILSVRC2012_val_00000293.JPEG
│ │ ├── ILSVRC2012_val_00002138.JPEG
│ │ ├── ......
│ ├── ......
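As a quick sanity check of the layout above, both splits should load directly with torchvision's ImageFolder (the paths below are placeholders for your local ImageNet root):

```python
from torchvision import datasets, transforms

# Placeholder path; point it at the imagenet/ root shown above
train_set = datasets.ImageFolder(
    "/path/to/imagenet/train",
    transform=transforms.Compose([transforms.Resize(256),
                                  transforms.CenterCrop(224),
                                  transforms.ToTensor()]),
)
val_set = datasets.ImageFolder("/path/to/imagenet/val")
print(len(train_set.classes), len(val_set.classes))  # both should report 1000 classes
```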
Model | #Params (M) | GFLOPs | Top-1 Acc (%) | Download |
---|---|---|---|---|
VAN-Tiny | 4.1 | 0.9 | 75.4 | Google Drive, Tsinghua Cloud, Hugging Face 🤗 |
VAN-Small | 13.9 | 2.5 | 81.1 | Google Drive, Tsinghua Cloud, Hugging Face 🤗 |
VAN-Base | 26.6 | 5.0 | 82.8 | Google Drive, Tsinghua Cloud, Hugging Face 🤗 |
VAN-Large | 44.8 | 9.0 | 83.9 | Google Drive, Tsinghua Cloud, Hugging Face 🤗 |
VAN-Huge | TODO | TODO | TODO | TODO |
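A hedged sketch of loading one of the downloaded checkpoints into the model defined in this repo (the import path `van`, the constructor name `van_tiny`, and the checkpoint key `state_dict` are assumptions; adjust them to the released code and checkpoint format):

```python
import torch
from van import van_tiny  # assumed import path; adjust to this repo's layout

model = van_tiny()
ckpt = torch.load("/path/to/van_tiny.pth", map_location="cpu")
# Some checkpoints store weights under a "state_dict" key; fall back to the raw dict otherwise
state_dict = ckpt.get("state_dict", ckpt)
model.load_state_dict(state_dict)
model.eval()
```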
Unofficial Keras (TensorFlow) version.
1. PyTorch >= 1.7
2. timm == 0.4.12
We use 8 GPUs for training by default. Run the following command (it is also provided in train.sh):
MODEL=van_tiny # van_{tiny, small, base, large}
DROP_PATH=0.1 # drop path rates [0.1, 0.1, 0.1, 0.2] for [tiny, small, base, large]
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash distributed_train.sh 8 /path/to/imagenet \
--model $MODEL -b 128 --lr 1e-3 --drop-path $DROP_PATH
Run the following command (it is also provided in eval.sh):
MODEL=van_tiny # van_{tiny, small, base, large}
python3 validate.py /path/to/imagenet --model $MODEL \
--checkpoint /path/to/model -b 128
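Beyond the full validation run above, here is a minimal single-image inference sketch. It assumes this repo's model definitions are imported so the `van_*` architectures are registered with timm; the import name `van`, the image file, and the checkpoint path are placeholders:

```python
import timm
import torch
from PIL import Image
from timm.data import resolve_data_config, create_transform

import van  # assumed import so the van_* models are registered with timm

model = timm.create_model("van_tiny", checkpoint_path="/path/to/model")
model.eval()

# Build the preprocessing pipeline matching the model's default config
config = resolve_data_config({}, model=model)
transform = create_transform(**config)

img = Image.open("example.jpg").convert("RGB")
with torch.no_grad():
    logits = model(transform(img).unsqueeze(0))
print(logits.argmax(dim=1).item())  # predicted ImageNet class index
```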
Our implementation is mainly based on pytorch-image-models and PoolFormer. Thanks to their authors.
This repo is under the Apache-2.0 license. For commercial use, please contact the authors.