This is the PyTorch Lightning code for DePerceiver. DePerceiver is a model that performs object detection by modifying the implementation of DETR and esceptio's implementation of Perceiver IO. Although DETR performs on par with state-of-the-art object detectors like Faster R-CNN, it lags behind in small object detection. This is due to the downsampling in the backbone: by the time the image features are fed into the transformer, their resolution is too low to reliably detect small objects. The transformer's self-attention is quadratic in the number of image features, which prevents DETR from using higher-resolution or multi-scale feature maps.
We propose DePerceiver, which uses the Perceiver IO architecture in place of the transformer. Perceiver IO is a transformer-like architecture with complexity linear in the input size, which allows the model to run more efficiently and to use larger (or multi-scale) feature maps.
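To make the complexity difference concrete, here is a minimal PyTorch sketch of the Perceiver-style bottleneck. It is illustrative only, not the actual module under deperceiver/ (the class name and sizes here are made up): a fixed array of learned latents cross-attends to the flattened image features, so attention cost grows linearly with the number of features rather than quadratically.

```python
import torch
import torch.nn as nn

class LatentCrossAttention(nn.Module):
    """Illustrative Perceiver-style encoder block: a fixed array of
    learned latents cross-attends to the input tokens, so the cost is
    O(num_latents * num_tokens) instead of O(num_tokens ** 2)."""

    def __init__(self, dim: int = 256, num_latents: int = 128, num_heads: int = 8):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_tokens, dim), e.g. a flattened CNN feature map
        q = self.latents.unsqueeze(0).expand(tokens.shape[0], -1, -1)
        out, _ = self.cross_attn(q, tokens, tokens)
        return out  # (batch, num_latents, dim)

# A 4x-downsampled feature map has 64x more tokens than a 32x one, but the
# latent bottleneck keeps the attention cost linear in that token count.
feats = torch.randn(2, 100 * 150, 256)
print(LatentCrossAttention()(feats).shape)  # torch.Size([2, 128, 256])
```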
This code is our attempt to improve on object detection baselines with the DePerceiver model. The COCO 2017 dataset must be downloaded to train and evaluate the models in this repository. Once that is done, the model can be trained with main_naive.py. The implementation details of the architecture are found under the deperceiver/ folder.
For a more detailed description of the project, see our paper here: Paper
This code was submitted as a final project by Edward Li and Aditya Kannan for Visual Learning and Recognition (16-824) in Fall 2021.
Download and extract COCO 2017 train and val images with annotations from http://cocodataset.org. We expect the directory structure to be the following:
path/to/coco/
annotations/ # annotation json files
train2017/ # train images
val2017/ # val images
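Once pycocotools is installed (see the setup below), a quick sanity check like the following confirms that the annotations and directory layout load correctly (the path is a placeholder):

```python
from pycocotools.coco import COCO

# Load the val2017 instance annotations from the expected layout.
coco = COCO('path/to/coco/annotations/instances_val2017.json')
print(len(coco.getImgIds()), 'validation images')  # COCO 2017 val has 5000
```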
First, clone the repository:
git clone https://github.com/mooey5775/DePerceiver.git
Then install PyTorch and the other necessary libraries in a Python 3.8 conda environment:
conda install -c pytorch pytorch torchvision
conda install cython scipy tqdm
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
conda install -c conda-forge wandb pytorch-lightning
pip install einops
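As a quick check that the environment is complete, all of the following imports should succeed:

```python
import torch
import torchvision
import pytorch_lightning as pl
import wandb
import einops
from pycocotools.coco import COCO

print('torch', torch.__version__, '| lightning', pl.__version__)
print('CUDA available:', torch.cuda.is_available())
```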
By default, wandb logging is enabled; you can pass --no-wandb to disable it. Before training, run wandb login in your command line to set up wandb.
Then we can train the default DePerceiver implementation, which uses 32x downsampling in the backbone. We suggest training with more than 1 GPU to speed things up; below we use 4 GPUs.
python -m torch.distributed.run --nproc_per_node=4 main_naive.py --gpus 4 --coco_path /path/to/coco --lr_drop 33 --epochs 50 --num_workers 3 --run_name default_run --output_dir /path/to/checkpoint/directory --amp --no_aux_loss
We can change the downsampling factor to be 4, 8, or 16:
python -m torch.distributed.run --nproc_per_node=4 main_naive.py --gpus 4 --coco_path /path/to/coco --lr_drop 33 --epochs 50 --num_workers 3 --run_name naive_16_run --output_dir /path/to/checkpoint/directory --amp --no_aux_loss --downsample_factor 16
We also attempted a multi-scale implementation, which can be trained as follows. It does not perform very well and likely needs more tuning.
python -m torch.distributed.run --nproc_per_node=4 main_naive.py --gpus 4 --coco_path /path/to/coco --lr_drop 33 --epochs 50 --num_workers 3 --run_name naive_multiscale_run --output_dir /path/to/checkpoint/directory --amp --no_aux_loss --multiscale
With the commands above, we can train and evaluate our models, and wandb can be used to visualize model performance.
Trained models with downsampling factors of 32, 16, and 8 after 50 epochs are available here: Models
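Checkpoints saved by PyTorch Lightning can generally be restored with load_from_checkpoint. Below is a sketch under the assumption that a LightningModule class is exposed under deperceiver/; the import path, class name, and checkpoint filename are all hypothetical.

```python
# Hypothetical import path: the actual LightningModule class lives under deperceiver/.
from deperceiver.models import DePerceiver

# Restore a downloaded checkpoint (the filename is a placeholder).
model = DePerceiver.load_from_checkpoint('/path/to/deperceiver_32.ckpt')
model.eval()
```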
DePerceiver is released under the Apache 2.0 license. Please see the LICENSE file for more information.