This repository accompanies the paper "A Temporal Convolutional Network-Based Approach and a Benchmark Dataset for Colonoscopy Video Temporal Segmentation" [1]. It provides the implementation of ColonTCN, a Temporal Convolutional Network-based approach for segmenting colonoscopy videos into anatomical sections and procedural phases. The project leverages a benchmark dataset derived from the annotated REAL-Colon (RC) dataset, which features 2.7 million frames across 60 full-procedure videos, and proposes two k-fold validation splits and evaluation metrics for benchmarking model performance.
Clone the repository and set up a virtual environment:
git clone https://github.com/YOUR_USERNAME/temporal_segmentation.git
cd temporal_segmentation
python -m venv venv && source venv/bin/activate # On macOS/Linux
venv\Scripts\activate # On Windows
Install the necessary dependencies from the requirements.txt file:
pip install -r requirements.txt
The benchmark dataset used in this project is the REAL-Colon (RC) dataset [2]. See data/README.md for instructions on automatically downloading, extracting, and preparing the data splits for benchmarking temporal segmentation models.
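After preparation, a quick sanity check can confirm that the expected folders are in place. The snippet below is illustrative and not part of the codebase; the paths are taken from the repository layout shown further down:

```python
# Illustrative sanity check: verify that the expected RC dataset folders
# exist after download and preparation (paths follow the repository layout).
from pathlib import Path

expected_dirs = [
    "data/dataset/RC_annotation",        # annotation CSVs released with this work
    "data/dataset/RC_dataset",           # raw RC dataset downloaded from Figshare
    "data/dataset/RC_embedded_dataset",  # videos embedded with a frame encoder
    "data/dataset/RC_lists",             # 4-fold and 5-fold data splits
]

for d in expected_dirs:
    print(f"{d}: {'ok' if Path(d).is_dir() else 'MISSING'}")
```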
The pretrained ColonTCN models obtained in [1] are available at the following link for both the 4-fold and 5-fold scenarios:
Google Drive: ColonTCN Checkpoints
To use them, download the entire folder and place its contents into experiments/model/. Then, run:
CUDA_VISIBLE_DEVICES=0 python3 src/test_shared_model.py -parFile ymls/inference/test_shared_4fold_colontcn.yml
CUDA_VISIBLE_DEVICES=0 python3 src/test_shared_model.py -parFile ymls/inference/test_shared_5fold_colontcn.yml
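If these scripts cannot find the models, a quick way to confirm that the checkpoint files actually ended up under experiments/model/ is a short listing such as the one below (illustrative only; it makes no assumption about the checkpoint filenames):

```python
# Illustrative check: list whatever files were placed under experiments/model/
# before running the shared-model test scripts above.
from pathlib import Path

ckpt_dir = Path("experiments/model")
files = sorted(p.name for p in ckpt_dir.rglob("*") if p.is_file())
print(f"{len(files)} file(s) found in {ckpt_dir}:")
for name in files:
    print(" -", name)
```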
Models are trained in the 4-fold or 5-fold setting on RC using the following command with the fold-specific configuration file:
CUDA_VISIBLE_DEVICES=0 python src/training.py -parFile ymls/training/colontcn_4fold/training_colontcn_4fold_fold1.yml
All configuration files for training a ColonTCN model in the 4-fold or 5-fold setting are located at:
ymls/training/colontcn_4fold/
ymls/training/colontcn_5fold/
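Assuming the remaining per-fold configuration files follow the same training_colontcn_4fold_fold<N>.yml naming pattern shown above (an assumption, not verified here), all folds of a setting can be launched sequentially with a short loop, for example:

```python
# Illustrative only: train the four folds of the 4-fold setting one after the
# other, assuming configs follow the training_colontcn_4fold_fold<N>.yml pattern.
import os
import subprocess

os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")  # pin to GPU 0, as in the command above

for fold in range(1, 5):
    cfg = f"ymls/training/colontcn_4fold/training_colontcn_4fold_fold{fold}.yml"
    subprocess.run(["python", "src/training.py", "-parFile", cfg], check=True)
```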
To test the trained models in the 4-fold or 5-fold setting on RC, use the following command with the corresponding configuration file:
CUDA_VISIBLE_DEVICES=0 python3 src/inference_testing_on_folds.py -parFile ymls/inference/inference_testing_4fold_colontcn.yml
CUDA_VISIBLE_DEVICES=0 python3 src/inference_testing_on_folds.py -parFile ymls/inference/inference_testing_5fold_colontcn.yml
To profile a model's computational efficiency (e.g., inference time and memory usage), run:
CUDA_VISIBLE_DEVICES=0 python src/profiling.py --config ymls/profiling/colontcn_4fold.yml
CUDA_VISIBLE_DEVICES=0 python src/profiling.py --config ymls/profiling/colontcn_5fold.yml
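For orientation, the kind of measurement such profiling involves can be sketched with a generic PyTorch snippet like the one below. This is illustrative only, not the repository's src/profiling.py, and the embedding dimension and sequence length of the dummy input are assumptions:

```python
# Illustrative sketch (not src/profiling.py): average inference latency and
# peak GPU memory for a PyTorch model on a dummy embedded-video input.
import time
import torch

def profile_model(model, input_shape=(1, 768, 10000), warmup=5, runs=20):
    """input_shape = (batch, embedding_dim, num_frames); these values are assumptions."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    x = torch.randn(*input_shape, device=device)

    with torch.no_grad():
        for _ in range(warmup):          # warm-up iterations, not timed
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
            torch.cuda.reset_peak_memory_stats()
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()

    latency_ms = (time.perf_counter() - start) / runs * 1000
    peak_mb = torch.cuda.max_memory_allocated() / 1e6 if device == "cuda" else float("nan")
    print(f"avg latency: {latency_ms:.1f} ms | peak GPU memory: {peak_mb:.1f} MB")
```

src/profiling.py and the configuration files under ymls/profiling/ remain the reference implementation for this repository.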
The following is an overview of the repository structure.
Files and directories marked as "(ignored)" are excluded from version control via the .gitignore file.
```
.
├── data/
│   ├── create_embeddings_datasets.py   # Script to embed RC videos into video latent representations using a frame encoder
│   ├── dataset/
│   │   ├── RC_annotation/              # RC dataset annotations (CSVs) released with this work (ignored)
│   │   ├── RC_dataset/                 # Raw RC dataset downloaded from Figshare (ignored)
│   │   ├── RC_embedded_dataset/        # RC dataset videos embedded with a frame encoder (ignored)
│   │   └── RC_lists/                   # Fold-based data splits (4-fold and 5-fold) for model benchmarking
│   ├── images/                         # Images used in the repository (e.g., visualizations, results)
│   ├── ymls/                           # YAML config files for dataset processing
│   └── README.md                       # Documentation for the `data/` directory
├── experiments/
│   ├── outputs/                        # Output training folders and inference/testing results (ignored)
│   ├── models/                         # ColonTCN models proposed in [1] (ignored)
│   ├── temp_datasets/                  # Temporary datasets saved here to speed up training and testing (ignored)
│   └── visualizations/                 # Output visualizations (ignored)
├── src/                                # Main source code directory
│   ├── data_loader/
│   │   └── embeddings_dataset.py       # Data loader for embedding-based datasets
│   ├── feature_extraction/
│   │   ├── feature_extraction.py       # Feature extraction module for processing RC videos
│   │   ├── frame_classification_model.py  # Frame-wise classification model
│   │   ├── video_loader.py             # Handles video file reading and frame extraction
│   │   └── ymls/                       # YAML config files for feature extraction
│   │       ├── feature_extraction_1x_RC.yml
│   │       └── feature_extraction_5x_aug_RC.yml
│   ├── inference.py                    # Script for performing inference with a trained model
│   ├── inference_testing_on_folds.py   # Script for testing inference across multiple data folds
│   ├── models/
│   │   ├── colontcn.py                 # Implementation of the ColonTCN model
│   │   ├── factory.py                  # Model factory for loading different architectures
│   │   └── layers.py                   # Custom model layers
│   ├── optimizers/
│   │   ├── builders.py                 # Optimizer builder functions
│   │   └── losses.py                   # Loss functions for training
│   ├── profiling.py                    # Profiling script to analyze performance
│   ├── testing.py                      # Unit tests for model evaluation
│   ├── training.py                     # Main training script
│   └── utils/
│       └── io.py                       # Utility functions for file I/O operations
├── .gitignore                          # Specifies ignored files for version control
├── README.md                           # Main project documentation
└── ymls/                               # Folder containing training/testing/profiling config files
```
If you find this repository useful, please consider citing the following in your work:
[1] Biffi, C., Roffo, G., Salvagnini, P., & Cherubini, A. (2025). A Temporal Convolutional Network-Based Approach and a Benchmark Dataset for Colonoscopy Video Temporal Segmentation. arXiv preprint arXiv:2502.03430.
[2] Biffi, C., Antonelli, G., Bernhofer, S., Hassan, C., Hirata, D., Iwatate, M., Maieron, A., Salvagnini, P., & Cherubini, A. (2024). REAL-Colon: A dataset for developing real-world AI applications in colonoscopy. Scientific Data, 11(1), 539. https://doi.org/10.1038/s41597-024-03359-0
For any inquiries, please open an issue in this repository or write to [email protected].