- Fine-tuning on ImageNet-1K
- Object detection and instance segmentation on COCO
- Semantic segmentation on ADE20K
- Transfer classification on smaller datasets
To fine-tune ViT on ImageNet-1K with multi-node distributed training, run the following scripts on 4 nodes with 8 GPUs each. The `effective_batch_size` is 1024, with `effective_batch_size = batch_size * nnodes * ngpus * accum_iter`.
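As a quick sanity check of that arithmetic for the ViT-B/16 recipe below (`--batch_size 32`, 4 nodes, 8 GPUs, no gradient accumulation):

```python
# Effective batch size = per-GPU batch size * nodes * GPUs per node * accum steps.
batch_size = 32   # --batch_size (per GPU)
nnodes     = 4    # number of nodes
ngpus      = 8    # GPUs per node
accum_iter = 1    # --accum_iter (default)
assert batch_size * nnodes * ngpus * accum_iter == 1024
```

The ViT-H/14 recipe keeps the same effective batch size by halving `--batch_size` to 16 and compensating with `--accum_iter 2`.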
**ViT-B/16**

```bash
./run.sh dbot_base_finetune \
    imagenet_finetune \
    vit_base_patch16 \
    8 \
    dbot_base_pre/checkpoint-1600.pth \
    --epochs 100 \
    --blr 3e-4 \
    --batch_size 32 \
    --data_path './data/imagenet'
```
**ViT-L/16**

```bash
./run.sh dbot_large_finetune \
    imagenet_finetune \
    vit_large_patch16 \
    8 \
    dbot_large_pre/checkpoint-1600.pth \
    --drop_path 0.25 \
    --clip_grad 1.0 \
    --blr 3e-4 \
    --layer_decay 0.75 \
    --epochs 50 \
    --batch_size 32 \
    --data_path './data/imagenet'
```
**ViT-H/14**

```bash
./run.sh dbot_huge_finetune_224 \
    imagenet_finetune \
    vit_huge_patch14 \
    8 \
    dbot_huge_pre/checkpoint-1600.pth \
    --drop_path 0.3 \
    --blr 3e-4 \
    --layer_decay 0.75 \
    --epochs 50 \
    --batch_size 16 \
    --accum_iter 2 \
    --data_path './data/imagenet'
```
**ViT-H/14-448**

```bash
./run.sh dbot_huge_finetune_448 \
    imagenet_finetune \
    vit_huge_patch14 \
    8 \
    dbot_huge_finetune_224/checkpoint-best.pth \
    --drop_path 0.4 \
    --blr 7e-5 \
    --layer_decay 0.9 \
    --input_size 448 \
    --batch_size 8 \
    --accum_iter 4 \
    --epochs 10 \
    --warmup_epochs 2 \
    --data_path './data/imagenet'
```
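Note that `--blr` is a base learning rate. In MAE-style fine-tuning code, which these recipes appear to follow, the absolute rate is derived by a linear scaling rule; a minimal sketch, assuming that convention holds here:

```python
# Linear LR scaling rule as used in MAE-style fine-tuning (an assumption
# about this codebase, not verified): lr = blr * effective_batch_size / 256.
def absolute_lr(blr: float, eff_batch_size: int) -> float:
    return blr * eff_batch_size / 256

print(absolute_lr(3e-4, 1024))  # 0.0012 for the ViT-B/16 recipe above
```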
To fine-tune ViT on COCO with Cascade Mask R-CNN as the task layer, run the following scripts:
**ViT-B/16**

```bash
./run.sh dbot_base_cascade-coco-det \
    cascade_coco_det \
    vit_base \
    8 \
    dbot_base_pre/checkpoint-1600.pth \
    data.samples_per_gpu=2 \
    lr_config.step=8,11 \
    runner.max_epochs=12 \
    optimizer.paramwise_cfg.layer_decay_rate=0.75
```
**ViT-L/16**

```bash
./run.sh dbot_large_cascade-coco-det \
    cascade_coco_det \
    vit_large \
    8 \
    dbot_large_pre/checkpoint-1600.pth \
    data.samples_per_gpu=2 \
    lr_config.step=8,11 \
    runner.max_epochs=12 \
    optimizer.paramwise_cfg.layer_decay_rate=0.75
```
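The trailing `key=value` pairs are mmdetection-style config overrides. Purely as an illustration, the four overrides above correspond to a config fragment like the following (a hypothetical Python view; the actual config file lives in the repo):

```python
# Illustrative mmdetection-style config fragment matching the overrides
# above (not the complete config).
cfg = dict(
    data=dict(samples_per_gpu=2),   # images per GPU
    lr_config=dict(step=[8, 11]),   # decay the LR after epochs 8 and 11
    runner=dict(max_epochs=12),     # standard 12-epoch (1x) schedule
    optimizer=dict(
        paramwise_cfg=dict(layer_decay_rate=0.75),  # layer-wise LR decay
    ),
)
```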
🎯 We opt for MAE's implementation of semantic segmentation, following the mae_segmentation codebase. Specifically, we train relative position embeddings from scratch on top of the pre-trained, fixed absolute position embeddings. We empirically find that this yields better segmentation results for both MAE and dBOT than iBOT's implementation does.
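As a rough illustration of that setup, below is a minimal sketch (not the repo's code; it assumes a square `grid x grid` layout of patch tokens and no [CLS] token) of attention with a learnable relative position bias trained from scratch, while the pre-trained absolute position embedding elsewhere in the model is kept frozen:

```python
import torch
import torch.nn as nn

class RelPosAttention(nn.Module):
    """Minimal sketch: attention with a learnable relative position bias.
    The pre-trained, fixed absolute position embedding would live outside
    this module, e.g. frozen via model.pos_embed.requires_grad_(False)."""

    def __init__(self, dim, num_heads, grid):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        # One learnable scalar per head for each 2D relative offset,
        # initialized to zero and trained from scratch.
        self.rel_bias = nn.Parameter(torch.zeros((2 * grid - 1) ** 2, num_heads))
        # Precompute, for every pair of tokens, which relative offset they form.
        coords = torch.stack(torch.meshgrid(
            torch.arange(grid), torch.arange(grid), indexing="ij"))
        coords = coords.flatten(1)                     # (2, N)
        rel = coords[:, :, None] - coords[:, None, :]  # (2, N, N)
        rel = rel.permute(1, 2, 0) + (grid - 1)        # shift offsets to >= 0
        self.register_buffer(
            "rel_index", rel[..., 0] * (2 * grid - 1) + rel[..., 1])

    def forward(self, x):                              # x: (B, N, C), N = grid**2
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, -1)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)           # each (B, H, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        # Add the per-head relative position bias to the attention logits.
        attn = attn + self.rel_bias[self.rel_index].permute(2, 0, 1)
        attn = attn.softmax(dim=-1)
        return self.proj((attn @ v).transpose(1, 2).reshape(B, N, C))
```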
To fine-tune ViT on ADE20K with UperNet as the task layer, run the following scripts:
**ViT-B/16**

```bash
./run.sh dbot_base_ade20k-seg \
    ade20k_seg \
    base \
    8 \
    dbot_base_pre/checkpoint-1600.pth \
    optimizer_config.use_fp16=True \
    optimizer.lr=1e-4 \
    optimizer.paramwise_cfg.layer_decay_rate=0.65 \
    model.backbone.drop_path_rate=0.2 \
    --deterministic
```
**ViT-L/16**

```bash
./run.sh dbot_large_ade20k-seg \
    ade20k_seg \
    large \
    8 \
    dbot_large_pre/checkpoint-1600.pth \
    optimizer_config.use_fp16=True \
    optimizer.lr=3e-5 \
    optimizer.paramwise_cfg.layer_decay_rate=0.95 \
    model.backbone.drop_path_rate=0.2 \
    --deterministic
```
To fine-tune ViT on smaller datasets, run the following script:
```bash
./run.sh dbot_base_transfer \
    cifar10_cls+cifar_cls+cars_cls+flwrs_cls+inat_cls+inat19_cls \
    vit_base \
    8 \
    dbot_base_pre/checkpoint-1600.pth
```
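The `+`-concatenated task list chains one transfer fine-tuning job per dataset; passing a single task name (e.g. `cifar10_cls`) should likewise fine-tune on just that dataset.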