Capsule Networks against Medical Imaging Data Challenges

LABELS@MICCAI 2018

by Amelia Jiménez-Sánchez, Shadi Albarqouni, Diana Mateus

This repository provides a TensorFlow implementation of our work -> [Paper] [arXiv] [slides] [poster]

Overview

In this paper, we experimentally demonstrate that the equivariance properties of Capsule Networks (CapsNets) reduce the strong data requirements, and are therefore very promising for medical image analysis. Focusing on computer-aided diagnosis (classification) tasks, we address the problems of a limited amount of annotated data, class imbalance, and data augmentation.
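In a CapsNet, each class is represented by a vector-output capsule whose length encodes the class probability. The `squash` nonlinearity of Sabour et al. (2017) keeps that length below 1 while preserving the vector's direction; a minimal NumPy sketch of it (illustrative only, not the repository's TensorFlow code):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Squash nonlinearity (Sabour et al., 2017): rescales a capsule
    vector so its norm lies in [0, 1) while keeping its direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)          # maps norm into [0, 1)
    return scale * s / np.sqrt(sq_norm + eps)  # unit direction * scale

# A batch of two 16-D capsule vectors: one long, one short.
caps = np.array([[4.0] * 16, [0.01] * 16])
out = squash(caps)
# long input -> output norm close to 1; short input -> close to 0
```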

Here is a summary of our findings:

Requirements:

  • Python 3.5+
  • TensorFlow 1.4+
  • scikit-learn
  • OpenCV
  • SPAMS
  • pandas
  • NumPy

Usage

1. Cloning the repository

$ git clone https://github.com/ameliajimenez/capsule-networks-medical-data-challenges.git
$ cd capsule-networks-medical-data-challenges/

2. Downloading datasets

For the two vision datasets (MNIST, Fashion-MNIST), it is enough to set data_path to ./data/mnist or ./data/fashion, respectively. Data will be downloaded only the first time.
For the two medical datasets, mitosis detection (TUPAC16) and diabetic retinopathy detection (DIARETDB1), the images first have to be downloaded from their respective websites; afterwards, patches are extracted and stored.

  • Download the "Auxiliary dataset: mitoses" of TUPAC16 from the Dataset section, and move all the folders to raw_data/tupac16/. Please note that you need to register on their website and log in to be able to download the data.
  • Download DIARETDB1 dataset and move the content from resources/images into raw_data/diaretdb1/.
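After downloading, a quick sanity check that the expected folders are in place can save a failed pre-processing run. A small convenience sketch (not part of the repository), using the paths above:

```python
import os

def missing_raw_data(root="raw_data"):
    """Return the expected dataset folders under `root` that do not exist."""
    expected = [os.path.join(root, "tupac16"),
                os.path.join(root, "diaretdb1")]
    return [d for d in expected if not os.path.isdir(d)]

missing = missing_raw_data()
if missing:
    print("Missing folders:", missing)
```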

3. Pre-processing and extracting patches

Pre-processing consists of stain normalization using color deconvolution and keeping the hematoxylin channel for TUPAC16, and of applying contrast limited adaptive histogram equalization (CLAHE) in the Lab color space and keeping only the green channel for DIARETDB1. Running the following scripts performs patch extraction and stores the data in the data directory. The augment argument controls the use of data augmentation; you can set it to "True" or "False".

$ python preprocess_tupac16.py
$ python preprocess_diaretdb1.py

4. Loading the data

To perform experiments with a limited amount of data, change the percentage_train argument. For the class-imbalance experiments, set unbalance to True and define the degree of imbalance by changing the unbalance_dict argument. To compare the performance with and without data augmentation, use the appropriate data_path for the medical datasets; for the vision ones (MNIST and Fashion-MNIST), set the augment argument to True.

data_provider = read_datasets(data_path,
                              augment=False,
                              percentage_train=100.,
                              unbalance=False,
                              unbalance_dict={"percentage": 20, "label1": 0, "label2": 8})
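The exact semantics of unbalance_dict live in read_datasets; one plausible reading, used here purely for illustration, is that only percentage% of the samples whose label lies between label1 and label2 are kept. A hypothetical NumPy sketch of that subsampling:

```python
import numpy as np

def subsample_classes(x, y, percentage, label1, label2, seed=0):
    """Keep only `percentage`% of the samples whose label lies in
    [label1, label2]; other classes are kept in full.
    (Hypothetical reading of unbalance_dict, for illustration only.)"""
    rng = np.random.default_rng(seed)
    keep = [i for i, label in enumerate(y)
            if not (label1 <= label <= label2)
            or rng.random() < percentage / 100.0]
    keep = np.array(keep)
    return x[keep], y[keep]

x = np.arange(1000).reshape(-1, 1)
y = np.repeat(np.arange(10), 100)   # 100 samples per class, labels 0..9
x_sub, y_sub = subsample_classes(x, y, percentage=20, label1=0, label2=8)
# classes 0..8 shrink to roughly 20 samples each; class 9 stays at 100
```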

5. Definition of the network

To select the desired architecture for training and testing, define net in train.py and test.py, respectively. Specify the number of classes in the classification problem with n_class, and set is_training to True (training) or False (testing). For example, to train a CapsNet with 10 classes:

net = capsnet.CapsNet(n_class=10, is_training=True)

Please note that the trainer argument has to be changed accordingly.

6. Training

Specify the network as described in Step 5 and use the model_path argument to store your models, e.g. model_path = "./models/mnist/capsnet/".

$ python train.py

7. Test

To restore and test a model, define the network as described in Step 5 and specify where models should be restored from with the model_path argument.

$ python test.py

8. Visualization of CapsNet

You can compare the input images and their reconstructions by visualizing them.
You can also perturb the individual dimensions of the output vector of the secondary capsule to interpret what each dimension encodes.

$ python visualization.py
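The dimension-perturbation idea follows Sabour et al.: offset one dimension of a class capsule at a time, decode each perturbed vector, and inspect how the reconstruction changes. A NumPy sketch of building such a perturbation grid (the decoder itself comes from the trained model, so it is not shown here):

```python
import numpy as np

def perturbation_grid(capsule, steps=np.linspace(-0.25, 0.25, 11)):
    """For each dimension of `capsule`, produce copies with that single
    dimension offset by each value in `steps`.
    Returns an array of shape (dims, len(steps), dims)."""
    dims = capsule.shape[0]
    grid = np.tile(capsule, (dims, len(steps), 1)).astype(np.float64)
    for d in range(dims):
        grid[d, :, d] += steps  # perturb only dimension d
    return grid

v = np.zeros(16)            # e.g. a 16-D secondary-capsule output
grid = perturbation_grid(v)
# each grid[d, i] would then be fed to the decoder for reconstruction
```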

Citation

If this work is useful for your research, please cite our paper:

@inproceedings{JimnezSnchez2018CapsuleNA,
  title={Capsule Networks Against Medical Imaging Data Challenges},
  author={Amelia Jim{\'e}nez-S{\'a}nchez and Shadi Albarqouni and Diana Mateus},
  booktitle={CVII-STENT/LABELS@MICCAI},
  year={2018}
}

Acknowledgement

The code of Capsule Networks is based on the implementation by @ageron.