
PACS-like medical image repository management code, used for Anomaly Detection model generation


Fulmine-Labs/Fulmine-Labs-mini-PACS


Fulmine Labs mini-PACS (model generation for medical image Anomaly Detection)

Implement a basic Picture Archiving and Communication System (PACS) to manage DICOM images. Use these images to implement an anomaly detection system, to help with test automation.

Eyball

Date: 5/1/2024

Fulmine Labs LLC

Overview

The challenge: Fulmine Labs will use medical images for quality- and testing-related machine learning (ML) initiatives. The best practice for managing this data is to use Digital Imaging and Communications in Medicine (DICOM) standard-compliant images with a PACS-like system.

The code in this project implements and tests a basic PACS with the following architecture:

[ Orthanc Repository (Open Source component) ]
       |
       | (DICOM Images) <----------------------------------------->  [ OHIF Viewer (Open Source component) ]
       v
[ Fulmine-Labs-Mini-PACS - Data Setup Script ]      
       |								
       | (Metadata and generated images)   
       |                                          
[ SQLite Database ]							
       |								
       | (API Requests)						 
       v
[ Flask Application ]
       |
       | (HTTP Requests for Data)
       v
[ Client (Pytest, Browser) ]
       |
       | (Model Training Data)
       v
[ Anomaly Detection Model Training ]

The data setup script traverses all folders in a specified location and identifies DICOM images. If an image has appropriate Window Center and Window Width DICOM header information, the script converts it to a PNG file at another specified location and adds the related metadata to an SQLite database.
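As a minimal sketch of the conversion step, the function below maps raw pixel values to 8-bit grey levels using a simple linear window. It is an illustration only, not the notebook's actual code: the function names are hypothetical, the header is modelled as a plain dictionary, a positive window width is assumed, and the formula is a simplified version of the windowing defined in the DICOM standard. The header values are taken from the example API response later in this README.

```python
def window_to_8bit(raw_values, center, width, slope=1.0, intercept=0.0):
    """Map raw pixel values to 0-255 with a simple linear window (width > 0 assumed)."""
    lower = center - width / 2.0
    upper = center + width / 2.0
    out = []
    for raw in raw_values:
        value = raw * slope + intercept        # rescale to modality units (e.g. HU)
        value = min(max(value, lower), upper)  # clip to the window
        out.append(int((value - lower) / width * 255 + 0.5))  # scale and round
    return out


def has_window_info(header):
    """Return True if a header dict carries both window attributes."""
    return (header.get("WindowCenter") is not None
            and header.get("WindowWidth") is not None)


# Hypothetical header dict; the values come from the example API response.
header = {"WindowCenter": 40.0, "WindowWidth": 400.0,
          "RescaleSlope": 1.0, "RescaleIntercept": -1024.0}

if has_window_info(header):
    pixels = window_to_8bit([0, 864, 1064, 1264, 2000],
                            header["WindowCenter"], header["WindowWidth"],
                            header["RescaleSlope"], header["RescaleIntercept"])
    print(pixels)  # [0, 0, 128, 255, 255]
```

Images without both window attributes would simply be skipped by a check like `has_window_info`, which matches the behaviour described under Known issues.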

The database maintains the Patient -> Study -> Series -> Image relationship, as well as tracking the output image file names and parameters used in their creation, allowing PACS-like SQL queries to be constructed.
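The hierarchy can be pictured with a small schema sketch. The table and column names below are assumptions made for illustration and may differ from those in the generated medical_imaging.db; the sample values reuse the example API response later in this README, and the file name 'example.png' is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only

# Hypothetical schema: one table per level of the hierarchy.
conn.executescript("""
CREATE TABLE patients (patient_id TEXT PRIMARY KEY);
CREATE TABLE studies  (study_id TEXT PRIMARY KEY,
                       patient_id TEXT REFERENCES patients(patient_id),
                       study_description TEXT);
CREATE TABLE series   (series_id TEXT PRIMARY KEY,
                       study_id TEXT REFERENCES studies(study_id),
                       series_description TEXT);
CREATE TABLE images   (image_id TEXT PRIMARY KEY,
                       series_id TEXT REFERENCES series(series_id),
                       instance_number INTEGER,
                       png_file_name TEXT,
                       window_center REAL,
                       window_width REAL);
""")

conn.execute("INSERT INTO patients VALUES ('TCGA-34-7107')")
conn.execute("INSERT INTO studies VALUES ('study-1', 'TCGA-34-7107', 'PET / CT TUMOR IMAGING')")
conn.execute("INSERT INTO series VALUES ('series-1', 'study-1', 'STD CTAC')")
conn.execute("INSERT INTO images VALUES ('image-1', 'series-1', 165, 'example.png', 40.0, 400.0)")

# PACS-like query: walk from an output PNG back up to its patient.
row = conn.execute("""
    SELECT p.patient_id, st.study_description, se.series_description, i.instance_number
    FROM images i
    JOIN series se ON se.series_id = i.series_id
    JOIN studies st ON st.study_id = se.study_id
    JOIN patients p ON p.patient_id = st.patient_id
    WHERE i.png_file_name = ?
""", ("example.png",)).fetchone()
print(row)
```

Storing the window parameters alongside the output file name is what allows a generated PNG to be traced back to the settings used to create it.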

Currently supported endpoints (usually at http://127.0.0.1:5000) are:

  • '/' - welcome message
  • '/patients/<patient_id>' - get patient information
  • '/studies/<study_id>' - get study information
  • '/series/<series_id>' - get series information
  • '/images/<image_id>' - get image information
  • '/patients/<patient_id>/studies' - get studies for a patient
  • '/patients/<patient_id>/studycount' - get study count for a patient
  • '/patients/<patient_id>/seriescount' - get series count for a patient
  • '/patients/<patient_id>/imagecount' - get image count for a patient
  • '/patients/<patient_id>/counts' - get all counts for a patient
  • '/patients/count' - get total patient count
  • '/studies/count' - get total studies count
  • '/series/count' - get total series count
  • '/images/count' - get total images count
  • '/imageinfo/' - get image info by providing the file name
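A small client sketch for calling these endpoints, using only the Python standard library. The helper names are hypothetical, and the sketch assumes that the API returns JSON bodies, as the example response later in this README suggests.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:5000"  # usual address of the Flask application


def endpoint_url(*parts, base=BASE_URL):
    """Join path segments into an endpoint URL, percent-encoding each one."""
    return base + "/" + "/".join(quote(str(part), safe="") for part in parts)


def get_json(url, timeout=5):
    """Fetch a URL and decode the JSON body (assumes a JSON response)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


counts_url = endpoint_url("patients", "TCGA-34-7107", "counts")
print(counts_url)

# With the Flask server running, the counts could then be fetched with:
#     counts = get_json(counts_url)
```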

Once PNG images have been generated from the DICOM images, these are used as the basis of the 'valid' class in an ML image classifier. To reduce overfitting, additional images are generated and added to the valid class. These include:

  • The same images with random window centers and widths
  • The same images with light random blurring to simulate pixel interpolation or compression
  • The same images with flips and rotations
  • The same images zoomed in and out, also with random window centers and widths
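To illustrate the flip and rotation variants, here is a dependency-free sketch that treats an image as a list of pixel rows. It is not the notebook's implementation, which presumably relies on an image library, and the choice of 90-degree rotation steps is an assumption.

```python
def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Reverse the order of the rows."""
    return [row[:] for row in image[::-1]]


def rotate_90_clockwise(image):
    """Rotate a rectangular 2D image by 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]


image = [[1, 2, 3],
         [4, 5, 6]]

print(flip_horizontal(image))      # [[3, 2, 1], [6, 5, 4]]
print(flip_vertical(image))        # [[4, 5, 6], [1, 2, 3]]
print(rotate_90_clockwise(image))  # [[4, 1], [5, 2], [6, 3]]
```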

The 'invalid' class will consist of, for example:

  • Non-medical images selected from the Kaggle 'real and fake' dataset
  • The same valid images as above with simulated error/message boxes in order to help to detect anomalous conditions
  • Some custom anomalous images, including AI-generated medical images

All of the images above will be distributed randomly between training, validation and testing folders in order to train and test the model. In addition, in order to test how well the model recognizes previously unseen medical images of the same type, some custom images will be selected from Google searches and used only for testing.
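The random distribution can be sketched as follows. The function name is hypothetical and the logic is only one plausible reading of the description; the default ratios match the training_ratio and validation_ratio parameters listed under Usage, with the remainder going to testing.

```python
import random


def assign_splits(items, training_ratio=0.7, validation_ratio=0.15, seed=None):
    """Shuffle items and split them into train / validate / test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * training_ratio)
    n_validate = round(len(shuffled) * validation_ratio)
    return {
        "train": shuffled[:n_train],
        "validate": shuffled[n_train:n_train + n_validate],
        "test": shuffled[n_train + n_validate:],  # remainder goes to testing
    }


file_names = [f"image_{i:03d}.png" for i in range(100)]
splits = assign_splits(file_names, seed=42)
print({name: len(files) for name, files in splits.items()})
```

Passing a fixed seed makes the assignment repeatable between runs, which can help when comparing models trained on the same split.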

Overall the folder structure looks like this:

Orthanc (DICOM images)
  ├── subfolders

original training (PNG)
  ├──train
  ├──validate
  ├──test

Kaggle_real_and_fake_images (PNG)
  ├── subfolders

Custom_invalid (MIX)
  ├── subfolders

Custom_test_valid (from internet)

training_images
  ├── train
  │   ├── valid
  │   │   ├── original training (dummy)
  │   │   ├── blurred
  │   │   ├── window_leveled
  │   │   ├── rotate_and_flip
  │   │   └── zoomed
  │   │       └── window_leveled
  │   └── invalid
  │       ├── Kaggle_real_and_fake_images
  │       └── copied from Custom_invalid
  ├── validate
  │   ├── valid
  │   │   ├── original validate (dummy)
  │   │   ├── blurred
  │   │   ├── window_leveled
  │   │   ├── rotate_and_flip
  │   │   └── zoomed
  │   │       └── window_leveled
  │   └── invalid
  │       ├── Kaggle_real_and_fake_images
  │       └── copied from Custom_invalid
  └── test
      ├── valid
      │   ├── original test (dummy)
      │   └── copied from Custom_test_valid
      └── invalid
          ├── Kaggle_real_and_fake_images
          └── copied from Custom_invalid

pairs_data
  ├──train_pairs.txt
  ├──validate_pairs.txt
  └──test_pairs.txt  

Once the data is prepared, Siamese Network model training is initiated. The model and weights are saved, and the weights are then reloaded and used to classify the images selected for testing, producing Accuracy, Precision, Recall and F1 score metrics and displaying the images that failed, for analysis. Although this code trains and tests a Siamese Network architecture model, the functions and folder structure also support training Classifier and Encoder architecture models. The performance of these architectures has been compared: although the F1 scores are similar, the Classifier and Encoder models tend to perform less well than the Siamese Network models on medical images that were not part of the selected training studies. The Siamese Network model is therefore the suggested choice. On 04/30/24, this model had an F1 score as below:

  • 2024-04-29 16:15:49 - INFO - Siamese network F1 Score: 0.9850479803615264
  • 2024-04-29 16:15:49 - INFO - False Negatives detected by Siamese Network: 27

The model and weights are included in this repository and will also be available in the Eyball repository.

Datasources used

  • The Cancer Imaging Archive.
  • The Artifact 'Real and Fake' dataset from Kaggle.

Current Version

The current stable version of the project is 0.3.0. See the CHANGELOG.md file for details about this version.

Prerequisites

  • Anaconda, with an environment having the Python libraries listed in requirements.txt
  • The Orthanc open source DICOM server (optional)
  • The OHIF Viewer integrated with Orthanc (optional). For this you will need a GitHub account and Node Package Manager -> nodejs -> yarn.
  • DICOM images
  • The Artifact Real and Fake image dataset from Kaggle

Usage

Install Anaconda, create an environment and install the dependencies in requirements.txt

  1. Orthanc installation and configuration (if you need to download images):
  • Install the Orthanc open source DICOM server from https://www.orthanc-server.com/
  • Configure Orthanc by modifying the orthanc.json configuration file. This includes setting parameters such as storage directories, enabling the DICOM and HTTP servers, and specifying network settings such as the port (defaults to 8042).
  • Enable the CORS (Cross-Origin Resource Sharing) configuration in Orthanc by adding the following lines to orthanc.json:

    "HttpServer" : {
      "EnableCors" : true,
      "CorsAllowedOrigins" : [ "*" ],
      "CorsAllowedMethods" : [ "GET", "POST", "PUT", "DELETE", "HEAD", "OPTIONS" ],
      "CorsAllowedHeaders" : [ "*" ]
    }

  2. Integrating OHIF Viewer with Orthanc (if you need a reference viewer implementation):
  • Fork the OHIF Viewer repository on GitHub to your own GitHub account.
  • Clone the forked repository to your local machine using Git: git clone https://github.com/YOUR-USERNAME/Viewers.git
  • Navigate to the cloned directory: cd Viewers
  • Add the original OHIF Viewer repository as an upstream remote to your local repository: git remote add upstream https://github.com/OHIF/Viewers.git
  • Run yarn install to install all necessary dependencies.
  • Start the OHIF Viewer using the appropriate Yarn command: yarn run dev:orthanc
  • Viewing studies: once the viewer is running correctly, confirm that you are able to view the studies hosted on your Orthanc server.
  3. Clone the Fulmine-Labs-Mini-PACS repository to your local machine and navigate to the cloned directory in Anaconda Powershell Prompt: 'cd Fulmine-Labs-Mini-PACS'
  4. Install the dependencies listed in requirements.txt with 'pip install -r requirements.txt'
  5. Open Jupyter Notebook or Jupyter Lab from Anaconda. It should start in your default web browser.
  6. Open Fulmine-Labs-Mini-PACS.ipynb from the cloned directory inside Jupyter.
  7. Edit any test parameters in the second cell, as needed. Currently, the following parameters can be set:
  • verbose, True or False. True enables logging in the Jupyter Notebook output; all messages are written to the log file for the run in either case.
  • source_dir = r'D:\Orthanc', the location of the folder containing DICOM images
  • target_dir = r'D:\training', the location of the output DICOM files, for ML training. The PNG files will be written to the same folder name with '_images' appended
  • training_ratio, validation_ratio = 0.7, 0.15, the DICOM files will be randomly assigned for model training based on these ratios
  • delete_db = True - controls whether the database is deleted when the script is re-run
  • db_path = 'medical_imaging.db' - location of the created SQLite database
  • img_width, img_height = 152, 152 - image dimensions
  • batch_size = 32 - training batch size
  • epochs = 20 - training epochs. This can be increased, since early stopping will handle overfitting
  • threshold = 0.5 - classifier threshold
  • message_box_percentage = 100 - percentage of images to apply message boxes to
  • max_invalid_images = 15000 - maximum number of invalid images to process (needs to approximately balance the number of valid images)
  • max_custom_invalid_images = 100 - maximum number of custom invalid images to process
  • max_custom_valid_images = 100 - maximum number of custom valid images to process
  8. In Jupyter, 'Run All Cells'
  9. Start the provided Flask API interface to the database by opening an Anaconda command prompt and running python Flask_API.py (ensure that the database name in the Python file matches the one that you generated).

Navigate to http://127.0.0.1:5000/ You should see a message: 'Welcome to the Fulmine Labs mini-PACS API!'
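For reference, the notebook parameters listed under Usage could be set in a cell along these lines. This is a sketch assembled from the values quoted in this README; the real cell may be laid out differently. The verbose value shown is an arbitrary choice, and the derived values and sanity checks at the end are assumptions added for illustration.

```python
# Hypothetical parameter cell; values are the ones quoted in this README.
verbose = False                      # True or False (value chosen arbitrarily here)
source_dir = r'D:\Orthanc'           # folder containing the DICOM images
target_dir = r'D:\training'          # output folder for ML training
training_ratio, validation_ratio = 0.7, 0.15
delete_db = True                     # delete the database when the script is re-run
db_path = 'medical_imaging.db'       # location of the created SQLite database
img_width, img_height = 152, 152     # image dimensions
batch_size = 32
epochs = 20
threshold = 0.5                      # classifier threshold
message_box_percentage = 100
max_invalid_images = 15000
max_custom_invalid_images = 100
max_custom_valid_images = 100

# Derived values and checks (assumptions, not taken from the notebook).
test_ratio = round(1.0 - training_ratio - validation_ratio, 10)
png_dir = target_dir + '_images'     # PNG output folder, '_images' appended

assert 0 < training_ratio + validation_ratio < 1, "ratios must leave a test share"
assert 0 <= message_box_percentage <= 100
print(test_ratio, png_dir)
```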

Testing

This code was run in Jupyter Notebook and Jupyter Lab from Anaconda 2.5.2 on Windows 11.

The tests were run from a Jupyter Lab session in Brave 1.61.114 and from an Anaconda CMD.exe session.

The OHIF Viewer was version 3.8.0-beta.36.

Orthanc DICOM server was version 1.1.2, for Windows 64 bit

Anonymized lung CT images for 12 patients were downloaded from The Cancer Imaging Archive.

  • Downloaded studies for 12 lung CT patients from The Cancer Imaging Archive into Orthanc, including all studies for the patient with patient ID 'TCGA-34-7107'.
  • Tested the DICOM web interface of Orthanc by accessing http://localhost:8042/dicom-web/studies.
  • Ran the Fulmine-Labs-Mini-PACS.ipynb data setup script and tested the database and API after starting the provided Flask server using python flask_API.py.
  • Ran python -v test_API.py, which tests some of the API endpoints. This produces output similar to:

[Image: example test_API.py output]

  • Used the 'DB Browser for SQLite' tool to browse the created database contents
  • Used the OHIF Viewer as a reference, to visually compare the PNG training images created and managed by Fulmine Labs mini-PACS with the same images displayed in the viewer. Note: the API can be used to identify the patient, study, series and image number for a particular output PNG file name, as well as the image information used to generate the PNG image. For example, this endpoint:

http://127.0.0.1:5000/imageinfo/1fa2a798-770f-4542-b877-946c0757cac2

returns this data:

{
  "InstanceNumber": "165",
  "PatientID": "TCGA-34-7107",
  "RescaleIntercept": "-1024",
  "RescaleSlope": "1",
  "SeriesDescription": "STD CTAC",
  "StudyDescription": "PET / CT TUMOR IMAGING",
  "WindowCenter": "40.0",
  "WindowWidth": "400.0"
}
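Note that every value in this response is a string. A client that wants to reuse the window settings would need to convert them first, as in this sketch. The field names come from the response above; the conversion logic and the simple linear window bounds are assumptions made for illustration.

```python
import json

# Example response body, copied from this README.
response_text = """
{
  "InstanceNumber": "165",
  "PatientID": "TCGA-34-7107",
  "RescaleIntercept": "-1024",
  "RescaleSlope": "1",
  "SeriesDescription": "STD CTAC",
  "StudyDescription": "PET / CT TUMOR IMAGING",
  "WindowCenter": "40.0",
  "WindowWidth": "400.0"
}
"""

info = json.loads(response_text)

# Convert the numeric fields from strings.
NUMERIC_FIELDS = ("RescaleIntercept", "RescaleSlope", "WindowCenter", "WindowWidth")
params = {name: float(info[name]) for name in NUMERIC_FIELDS}
params["InstanceNumber"] = int(info["InstanceNumber"])

# Range of rescaled values covered by this window (simple linear window).
lower = params["WindowCenter"] - params["WindowWidth"] / 2
upper = params["WindowCenter"] + params["WindowWidth"] / 2
print(params["InstanceNumber"], lower, upper)  # 165 -160.0 240.0
```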

Two tests were used to verify model accuracy:

  1. Typical AI model training metrics, using reserved images from the valid and invalid datasets that were not used for training

At the time of writing the model has the following scores:

- True Positives: 3143
- False Positives: 6
- True Negatives: 2737
- False Negatives: 5
- Accuracy: 0.9981
- Precision: 0.9981
- Recall: 0.9984
- F1 Score: 0.9983
  2. Some random lung CT images, sampled from the internet. Of these 7 images, 1 is currently misclassified.
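The scores reported for the first test follow directly from the four confusion counts. The short sketch below recomputes them using the standard definitions; the function name is hypothetical and this is not the notebook's own metrics code.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F1 from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


metrics = classification_metrics(tp=3143, fp=6, tn=2737, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
# accuracy: 0.9981
# precision: 0.9981
# recall: 0.9984
# f1: 0.9983
```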

Known issues

  1. Even accounting for scaling differences, the images rendered by the OHIF Viewer and the Fulmine Labs output PNG training images are very similar but not identical at the pixel level. The reason should be investigated further; it is probably due to pixel interpolation performed by the OHIF Viewer that the PNG image creation does not currently perform. Examining the OHIF source code could help explain this and potentially enhance the Fulmine Labs PNG image creation process. See below for examples:

[Image: Fulmine]

[Image: OHIF]

[Image: Fulmine]

[Image: OHIF]

  2. The OHIF Viewer handles images that do not have Window Center and/or Window Width in the DICOM header. The PNG creation process currently ignores these images. Again, examining the OHIF source code and the DICOM standard could help explain the OHIF methodology and potentially allow a wider range of images to be included in the PNG training image creation process.

  3. There were issues with the model.fit function when training models if the TensorFlow version was greater than 2.15.0. We suggest staying on 2.15.0 for now and trying again with a later version.

  4. There were Jupyter crash issues when loading the saved Siamese Network model. Saving and loading just the weights seems to work around this.

Acknowledgements

This code was written collaboratively with GPT-4V. Thank you Assistant!

The Open Health Imaging Foundation

Orthanc open source DICOM server

DB Browser for SQLite

The Cancer Imaging Archive

Artifact

License

MIT open source license

Collaboration

We welcome contributions at all levels of experience, whether it's with code, documentation, tests, bug reports, feature requests, or other forms of feedback. If you're interested in helping improve this tool, here are some ways you can contribute:

Ideas for Improvements: Have an idea that could make the Fulmine Labs mini-PACS better? Open an issue with the tag enhancement to start a discussion about your idea.

Bug Reports: Notice something amiss? Submit a bug report under issues, and be sure to include as much detail as possible to help us understand the problem.

Feature Requests: If you have a suggestion for a new feature, describe it in an issue with the tag feature request.

Documentation: Good documentation is just as important as good code. Although this is currently a very simple tool, if you'd like to contribute documentation, we'd greatly appreciate it.

Code: If you're looking to update or write new code, check out the open issues and look for ones tagged with good first issue or help wanted.

Contact

Duncan Henderson, Fulmine Labs LLC [email protected]
