MICCAI 2024, the 27th International Conference on Medical Image Computing and Computer Assisted Intervention, Marrakech, Morocco, October 2024.
Keywords: Endoscopic Image Computing, Feature Selection Gates, Hard-Attention Gates, Gradient Routing, CNNs, Vision Transformers, Gastroenterological Polyp Size Estimation, Medical Image Analysis, Overfitting Reduction, Model Generalization.
This repository contains the official implementation of the paper "Feature Selection Gates with Gradient Routing for Endoscopic Image Computing", presented at MICCAI 2024. This toolbox provides implementations for CNNs, multistream CNNs, ViTs, and their augmented variants using Feature-Selection Gates (FSG) or Hard-Attention Gates (HAG) with Gradient Routing (GR). The primary objective is to enhance model generalization and reduce overfitting, specifically in the context of gastroenterological polyp size assessment.
If you find this toolbox useful in your research, please cite the following papers:
Accepted Publication:
@inproceedings{roffo2024FSG,
title={Feature Selection Gates with Gradient Routing for Endoscopic Image Computing},
author={Giorgio Roffo and Carlo Biffi and Pietro Salvagnini and Andrea Cherubini},
booktitle={MICCAI 2024, the 27th International Conference on Medical Image Computing and Computer Assisted Intervention, Marrakech, Morocco, October 2024},
year={2024},
organization={Springer}
}
Preprint Version:
@misc{roffo2024hardattention,
title={Hard-Attention Gates with Gradient Routing for Endoscopic Image Computing},
author={Giorgio Roffo and Carlo Biffi and Pietro Salvagnini and Andrea Cherubini},
year={2024},
eprint={2407.04400},
archivePrefix={arXiv},
primaryClass={eess.IV}
}
We extend our gratitude to the MICCAI community and all collaborators for their invaluable contributions and support.
In this work, we present Feature-Selection Gates (FSG), also known as Hard-Attention Gates (HAG), along with a novel approach called Gradient Routing (GR) for Online Feature Selection (OFS) in deep learning models. This method aims to enhance performance in endoscopic image computing by reducing overfitting and improving generalization.
Key contributions:
- FSG/HAG: Implements sparsification with learnable weights, serving as a regularization strategy to promote sparse connectivity in neural networks (Convolutional and Vision Transformer models).
- GR: Optimizes the FSG/HAG parameters through a dedicated second forward pass, routing their gradients independently of the main model's optimization to refine feature re-weighting (a minimal sketch follows this list).
- Performance Improvement: Validated across multiple datasets, including CIFAR-100 and specialized endoscopic datasets (REAL-Colon, Misawa, and SUN), showing significant gains in binary and triclass polyp size classification.
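To make the GR idea concrete, here is a minimal, self-contained sketch, not the repository's implementation: the gate parameters are updated by their own optimizer from a second forward pass, while the base weights are updated as usual. The `GatedLinear` module, the `gate_logits` name, and the learning rates are illustrative.

```python
import torch
import torch.nn as nn

class GatedLinear(nn.Module):
    """Linear layer whose outputs pass through learnable (hard-attention) gates."""
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.gate_logits = nn.Parameter(torch.zeros(out_features))  # gate parameters

    def forward(self, x: torch.Tensor, freeze_gates: bool = False) -> torch.Tensor:
        gates = torch.sigmoid(self.gate_logits)
        if freeze_gates:
            gates = gates.detach()  # block gradient flow into the gates
        return self.linear(x) * gates

model = GatedLinear(16, 4)
criterion = nn.CrossEntropyLoss()

# Two optimizers: one for the base weights, one for the gates,
# so the two parameter groups can use different learning rates.
base_params = [p for n, p in model.named_parameters() if n != "gate_logits"]
opt_base = torch.optim.SGD(base_params, lr=1e-2)
opt_gates = torch.optim.SGD([model.gate_logits], lr=1e-1)

x = torch.randn(8, 16)
y = torch.randint(0, 4, (8,))

# Pass 1: update the base weights while the gates are frozen.
opt_base.zero_grad()
criterion(model(x, freeze_gates=True), y).backward()
opt_base.step()

# Pass 2: a second forward pass whose update is routed to the gates only.
# (Gradients also reach the base weights here, but they are discarded:
# opt_base.zero_grad() clears them before the next base update.)
opt_gates.zero_grad()
criterion(model(x), y).backward()
opt_gates.step()
```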
Quick start: wrap a pretrained `torchvision` ViT with FSG and run a forward pass:

```python
import torch
from torchvision.models import vit_b_16, ViT_B_16_Weights

from vit_with_fsg import vit_with_fsg

print("Loading pretrained ViT...")
backbone = vit_b_16(weights=ViT_B_16_Weights.DEFAULT)

print("Injecting FSG into backbone...")
model = vit_with_fsg(vit_backbone=backbone)

dummy_input = torch.randn(1, 3, 224, 224)
output = model(dummy_input)
print("Output shape:", output.shape)
```
Feature Selection/Attention Gates with Gradient Routing for Online Feature Selection.
```
├── example_configs
├── gr_checkpoints
│   ├── miccai24_FSG_GR_vit.zip
│   └── pretrained_models.txt
├── MICCAI_2024_official_dataset_splits
│   ├── MICCAI2024-FSG-GR-datasets-official-splits-.zip
│   ├── per_object_gt_group_distribution_per_fold.png
│   ├── per_object_gt_group_distribution_per_unique_id_per_fold.png
│   └── per_object_kfold_distribution.png
├── modules
│   ├── analytics
│   │   ├── calculate_metrics.py
│   │   └── visualizations.py
│   ├── datasets
│   │   ├── dataset.py
│   │   └── sampler.py
│   ├── losses
│   │   ├── classification_loss.py
│   │   └── weighted_size_combined_loss.py
│   ├── models
│   │   ├── gr_transfutils
│   │   ├── fsg_vision_transformers.py
│   │   ├── multi_stream_nets.py
│   │   └── vision_transformers.py
│   ├── schedulers
│   │   └── cosine_annealing_warm_restarts.py
│   └── transforms
│       ├── transforms_sizing.py
│       └── base_params.py
├── runners
│   ├── build_configuration.py
│   └── trainer.py
├── vit_with_fsg.py              # * FSG-ViT integration (import this only)
├── demo_training_mnist.py
├── demo_inference_mnist.py
├── demo_training_imnet.py
├── demo_inference_imnet.py
├── visualize_dataset.py
└── README.md
```
- Download Link: REAL-colon Dataset on Figshare
- GitHub Repository: REAL-colon Dataset Code
The REAL (Real-world multi-center Endoscopy Annotated video Library) - colon dataset comprises 60 recordings of real-world colonoscopies from four different clinical studies, each contributing 15 videos. The dataset includes:
- Total Size: Approximately 880.78 GB
- Frames: 2,757,723 total frames
- Polyps: 132 removed colorectal polyps
- Annotations: 351,264 bounding box annotations
The dataset is organized as follows:
- 60 compressed folders named `{SSS}-{VVV}_frames` containing video frames (`SSS` indicates the clinical study, `VVV` represents the video name).
- 60 compressed folders named `{SSS}-{VVV}_annotations` containing video annotations for each recording.
- `video_info.csv`: metadata for each video.
- `lesion_info.csv`: metadata for each lesion, including endoscope brand, bowel cleanliness score, number of surgically removed colon lesions, and more.
- `dataset_description.md`: a README file with information about the dataset.
To download the dataset automatically, run `figshare_dataset.py` from the GitHub repository. The script downloads the dataset into the `./dataset` folder by default; you can change the output folder by setting the `DOWNLOAD_DIR` variable in `figshare_dataset.py`. Given the large size of the dataset, ensure you have sufficient storage and bandwidth before starting the download.
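Once downloaded, a quick way to inspect the metadata is to load the two CSV files. This is a small sketch, not part of the dataset's tooling; the `./dataset` location assumes the default download folder, and the printed columns depend on the dataset release:

```python
import pandas as pd

# Assumes the default ./dataset download location used by figshare_dataset.py.
video_info = pd.read_csv("./dataset/video_info.csv")
lesion_info = pd.read_csv("./dataset/lesion_info.csv")

print("Videos:", len(video_info), "| columns:", video_info.columns.tolist())
print("Lesions:", len(lesion_info), "| columns:", lesion_info.columns.tolist())
print(lesion_info.head())
```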
- Download Link: SUN Colonoscopy Video Database
- Request Download Link: Email [email protected]
The database is available for non-commercial research or educational purposes only; commercial use is prohibited without permission, and proper citation is required when using the dataset. For access, send a request email to [email protected].
The SUN (Showa University and Nagoya University) Colonoscopy Video Database is designed for evaluating automated colorectal-polyp detection systems. It includes:
- Total Frames: 158,690 frames
- Polyp Frames: 49,136 frames from 100 polyps, annotated with bounding boxes
- Non-Polyp Frames: 109,554 frames
The database is organized as follows:
- Polyp Frame Annotations: each polyp frame is annotated with bounding boxes provided in text files. Each line corresponds to one bounding box in the format `Filename min_Xcoordinate,min_Ycoordinate,max_Xcoordinate,max_Ycoordinate,class_id`, where class_id 0 denotes a polyp frame and class_id 1 a non-polyp frame (see the parsing sketch below).
- Image Formats: JPEG for images, text files for bounding box annotations.
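A minimal parser for this line format might look as follows; the `Box` record and function name are illustrative, not part of the database's tooling:

```python
from dataclasses import dataclass

@dataclass
class Box:
    filename: str
    min_x: int
    min_y: int
    max_x: int
    max_y: int
    class_id: int  # 0 = polyp frame, 1 = non-polyp frame

def parse_annotation_file(path: str) -> list[Box]:
    """Parse 'Filename min_X,min_Y,max_X,max_Y,class_id' lines into Box records."""
    boxes = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            filename, coords = line.split(maxsplit=1)
            min_x, min_y, max_x, max_y, class_id = map(int, coords.split(","))
            boxes.append(Box(filename, min_x, min_y, max_x, max_y, class_id))
    return boxes
```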
Database characteristics:
- Patients: 99 (71 males, 28 females)
- Median Age: 69 years (IQR: 58–74)
- Polyps: 100 polyps with details including size, morphology, location, and pathological diagnosis.
This script (`preprocess_raw_datasets.py`) preprocesses raw datasets such as the REAL-colon and SUN Colonoscopy Video Database, making them ready for deep learning model training.
- **Parameter File Handling:** reads a provided parameter file or falls back to a default configuration file; specifies dataset paths, the output folder, and other settings.
- **Dataset Preprocessing:** checks and prepares the output folder, processes the datasets to extract frames containing polyps, and saves them in CSV format; the REAL-colon and SUN datasets are handled by dedicated preprocessors.
- **K-Fold Split Creation:** generates K-fold splits for cross-validation when creating or recreating the dataset; you can download and use the official MICCAI 2024 splits or create new ones (a group-wise splitting sketch follows this section).
- **Data Statistics Generation:** produces and displays statistics about the dataset.
1. **Download the Datasets:**
   - Download the REAL-colon dataset from Figshare.
   - Request access to the SUN Colonoscopy Video Database by emailing [email protected].
2. **Run the Script:**
   - Ensure the datasets are downloaded to the specified paths.
   - Execute the script with a parameter file:
     ```bash
     python preprocess_raw_datasets.py -parFile path/to/your/parameter_file.yml
     ```
   - If no parameter file is provided, the script uses a default configuration file.
The script simplifies dataset preparation, enabling efficient training of deep learning models on standardized data.
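The per-object distribution plots shipped with the official splits suggest that frames are grouped by polyp, so a polyp never spans folds. As an illustration of that idea (not the repository's implementation), here is a sketch using scikit-learn's `GroupKFold` with hypothetical column names:

```python
import pandas as pd
from sklearn.model_selection import GroupKFold

# Hypothetical frame-level table: one row per frame, where 'polyp_id'
# identifies the object so that all frames of a polyp stay in one fold.
frames = pd.DataFrame({
    "frame_path": [f"frame_{i}.jpg" for i in range(12)],
    "polyp_id":   [i // 3 for i in range(12)],   # 4 polyps, 3 frames each
    "size_label": [0, 0, 0, 1, 1, 1, 2, 2, 2, 0, 0, 0],
})

gkf = GroupKFold(n_splits=4)
for fold, (train_idx, val_idx) in enumerate(
        gkf.split(frames, groups=frames["polyp_id"])):
    val_polyps = set(frames.loc[val_idx, "polyp_id"])
    print(f"fold {fold}: validation polyps {sorted(val_polyps)}")
```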
- ✅ Drop-in: easily wraps any `torchvision` ViT model (e.g. `vit_b_16`, `vit_l_16`; see the one-liner below)
- ✅ General-purpose: works on natural images, medical data, and even token sequences in NLP
- ✅ Regularizes ViTs for low-data regimes (tested on CIFAR-100, endoscopic videos, etc.)
- ✅ No ViT surgery: FSG wraps Transformer layers directly
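For example, swapping in a larger backbone should be the same one-line change, assuming (as in the quick-start above) that `vit_with_fsg` accepts any torchvision ViT instance:

```python
from torchvision.models import vit_l_16, ViT_L_16_Weights
from vit_with_fsg import vit_with_fsg

model = vit_with_fsg(vit_backbone=vit_l_16(weights=ViT_L_16_Weights.DEFAULT))
```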
While this method was originally proposed for polyp size estimation in colonoscopy, it is designed to generalize across:
- 🧬 Medical image analysis
- 🖼️ General image classification
- 📖 NLP Transformers (e.g. GPT, BERT)
| Dataset | Training Script | Inference Script | Checkpoint Path |
|---|---|---|---|
| MNIST | `demo_training_mnist.py` | `demo_inference_mnist.py` | `./checkpoints/fsg_vit_mnist_demo.pth` |
| Imagenette | `demo_training_imnet.py` | `demo_inference_imnet.py` | `./checkpoints/fsg_vit_imagenette_demo.pth` |
⚠️ These demos use reduced datasets and epochs to run quickly and demonstrate the API.
Train the ViT-B16 + FSG model using small demo datasets:
```bash
# Train on MNIST (test set used for speed)
python demo_training_mnist.py

# Train on Imagenette (ImageNet-mini val set)
python demo_training_imnet.py
```

This will save model checkpoints to:

```
./checkpoints/fsg_vit_mnist_demo.pth
./checkpoints/fsg_vit_imagenette_demo.pth
```
Run inference using a saved checkpoint or from scratch:
```bash
# Inference with a pretrained checkpoint
python demo_inference_mnist.py --checkpoint ./checkpoints/fsg_vit_mnist_demo.pth
python demo_inference_imnet.py --checkpoint ./checkpoints/fsg_vit_imagenette_demo.pth

# Inference from scratch (random weights)
python demo_inference_mnist.py
python demo_inference_imnet.py
```

ℹ️ When no `--checkpoint` is given, the model is evaluated without any fine-tuning.
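To use a saved checkpoint outside the demo scripts, a minimal loading sketch could look like the following. It assumes the checkpoint stores the model's plain `state_dict` and that the model is rebuilt with the same architecture (including any classification-head changes) used during demo training; check the demo scripts for the exact saving format.

```python
import torch
from torchvision.models import vit_b_16
from vit_with_fsg import vit_with_fsg

# Rebuild the same architecture that was trained in the demo.
model = vit_with_fsg(vit_backbone=vit_b_16())

# Assumption: the demo checkpoint is a plain state_dict.
state = torch.load("./checkpoints/fsg_vit_mnist_demo.pth", map_location="cpu")
model.load_state_dict(state)
model.eval()

with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print("Logits shape:", logits.shape)
```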
If you use this project, please cite our work:
@inproceedings{roffo2024FSG,
title={Feature Selection Gates with Gradient Routing for Endoscopic Image Computing},
author={Giorgio Roffo and Carlo Biffi and Pietro Salvagnini and Andrea Cherubini},
booktitle={MICCAI 2024, the 27th International Conference on Medical Image Computing and Computer Assisted Intervention, Marrakech, Morocco, October 2024},
year={2024},
organization={Springer}
}
For inquiries or support regarding the implementation or the paper, please reach out to the corresponding authors via the contact information provided in the paper.
Giorgio Roffo - [email protected]
Andrea Cherubini - [email protected]
v1.0, 2024/10/09
This project is subject to the terms outlined in the LICENSE file.