Dahyun Kang¹˒² · Piotr Koniusz³˒⁴ · Minsu Cho² · Naila Murray¹
This repo is the official implementation of the CVPR 2023 paper: Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation.
This project is built upon the following environment. The package requirements can be installed via `environment.yml`, which includes:

- `pytorch==1.12.0`
- `torchvision==0.13.0`
- `cudatoolkit==11.3`
- `pytorch-lightning==1.6.5`
- `einops==0.6.0`
```bash
conda env create --name pytorch1.12 --file environment.yml -p YOURCONDADIR/envs/pytorch1.12
conda activate pytorch1.12
```

Make sure to replace `YOURCONDADIR` in the installation path with your conda directory, e.g., `~/anaconda3`.
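For reference, a minimal `environment.yml` consistent with the pinned versions above might look like the following. This is a sketch, not the repo's actual file; the channel names are assumptions:

```yaml
# Hypothetical environment.yml matching the pinned versions listed above.
name: pytorch1.12
channels:        # assumed channels, not confirmed by the repo
  - pytorch
  - conda-forge
dependencies:
  - pytorch==1.12.0
  - torchvision==0.13.0
  - cudatoolkit==11.3
  - pytorch-lightning==1.6.5
  - einops==0.6.0
```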
Download the datasets following the file structure below and set `args.datapath=YOUR_DATASET_DIR`:
```
YOUR_DATASET_DIR/
├── VOC2012/
│   ├── Annotations/
│   ├── JPEGImages/
│   ├── ...
├── COCO2014/
│   ├── annotations/
│   ├── train2014/
│   ├── val2014/
│   ├── ...
├── ...
```
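Before training, it can help to verify that the dataset directory matches the layout above. The helper below is a hypothetical convenience script, not part of the repository:

```python
# Hypothetical helper (not part of the repo): sanity-check that
# YOUR_DATASET_DIR matches the expected layout before training.
from pathlib import Path

EXPECTED = {
    "VOC2012": ["Annotations", "JPEGImages"],
    "COCO2014": ["annotations", "train2014", "val2014"],
}

def check_dataset_dir(root):
    """Return a list of missing subdirectories; empty means the layout looks correct."""
    root = Path(root)
    missing = []
    for dataset, subdirs in EXPECTED.items():
        for sub in subdirs:
            path = root / dataset / sub
            if not path.is_dir():
                missing.append(str(path))
    return missing
```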
To train with pixel-level supervision (ground-truth masks):

```bash
python main.py --datapath YOUR_DATASET_DIR \
               --benchmark {pascal, coco} \
               --logpath YOUR_DIR_TO_SAVE_CKPT \
               --fold {0, 1, 2, 3} \
               --sup mask
```
To train with image-level supervision (pseudo-masks):

```bash
python main.py --datapath YOUR_DATASET_DIR \
               --benchmark {pascal, coco} \
               --logpath YOUR_DIR_TO_SAVE_CKPT \
               --fold {0, 1, 2, 3} \
               --sup pseudo
```
Performance results and links to download checkpoints (fold columns link to the checkpoint of each validation fold):

| methods | 1-way 1-shot cls. 0/1 ER | 1-way 1-shot seg. mIoU | 2-way 1-shot cls. 0/1 ER | 2-way 1-shot seg. mIoU | fold0 | fold1 | fold2 | fold3 |
|---|---|---|---|---|---|---|---|---|
| image-level supervised models | 79.9 | 33.2 | 64.6 | 31.9 | link | link | link | link |
| pixel-level supervised models | 85.7 | 55.5 | 70.4 | 53.7 | link | link | link | link |
Performance results and links to download checkpoints (fold columns link to the checkpoint of each validation fold):

| methods | 1-way 1-shot cls. 0/1 ER | 1-way 1-shot seg. mIoU | 2-way 1-shot cls. 0/1 ER | 2-way 1-shot seg. mIoU | fold0 | fold1 | fold2 | fold3 |
|---|---|---|---|---|---|---|---|---|
| image-level supervised models | 78.2 | 19.6 | 62.4 | 18.3 | link | link | link | link |
| pixel-level supervised models | 80.8 | 38.3 | 64.0 | 36.2 | link | link | link | link |
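The tables report a classification metric (0/1 exact ratio, ER) and a segmentation metric (mean IoU). The sketch below illustrates how such metrics are typically computed; it is a generic illustration, not the repository's evaluation code, and the function names are mine:

```python
# Hedged illustration of the two reported metric families; this is NOT the
# repository's evaluation code, just a generic sketch.

def exact_ratio(pred_labels, true_labels):
    # 0/1 exact ratio: fraction of episodes whose classification
    # prediction is exactly correct (simplified here to one label
    # per episode).
    correct = sum(p == t for p, t in zip(pred_labels, true_labels))
    return correct / len(true_labels)

def miou(pred_masks, true_masks, num_classes=2):
    # Mean IoU over classes (background + foreground for 1-way
    # episodes), with intersections and unions accumulated over
    # all query masks before averaging.
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred, true in zip(pred_masks, true_masks):
        for p, t in zip(pred, true):  # flat per-pixel class labels
            if p == t:
                inter[p] += 1
                union[p] += 1
            else:
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious)
```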
If you find our code or paper useful, please consider citing our paper:

```bibtex
@inproceedings{kang2023distilling,
  title={Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification \& Segmentation},
  author={Kang, Dahyun and Koniusz, Piotr and Cho, Minsu and Murray, Naila},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2023}
}
```