Early leukemia detection through explainable computer vision
An AI-powered system that detects leukemia cells from microscopic blood images and provides interpretable explanations for medical professionals. This project combines state-of-the-art deep learning with clinical explainability to support early diagnosis.
Early detection of leukemia is critical for successful treatment. This project aims to:
- Detect leukemia cells from microscopic images with high accuracy
- Explain which visual features the AI model uses for classification (Grad-CAM heatmaps)
- Communicate findings in natural language through LLM-generated medical explanations
The system is designed to be a decision support tool for hematologists, not a replacement for medical expertise.
- January 29, 2026 - Achieved 97.66% accuracy with ResNet18 on C-NMC dataset (SOTA-level performance)
- January 22, 2026 - Completed data pipeline and EDA-driven augmentation strategy
The ResNet18 model achieves state-of-the-art performance on the C-NMC dataset (Classification of Normal vs. Malignant Cells):
| Metric | Score | Clinical Relevance |
|---|---|---|
| Accuracy | 97.66% | Overall diagnostic accuracy |
| Recall (Sensitivity) | 98.32% | Only 1.68% of leukemic cell images missed (false negatives) |
| Precision | 98.18% | High confidence in positive diagnoses |
| F1-Score | 98.25% | Balanced performance |
🎯 Clinical Impact: With 98.32% sensitivity, the model detects roughly 98 of every 100 leukemic cell images, which makes it well suited to screening applications where minimizing false negatives is critical.
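The scores in the table are linked by the standard metric definitions, which makes them easy to sanity-check. The sketch below recomputes the F1-score and the miss rate from the reported precision and recall; the function names are illustrative and not part of the project's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def miss_rate(recall_percent: float) -> float:
    """Share of positives the model fails to flag, in percent."""
    return 100.0 - recall_percent


# Precision and recall as reported in the results table, in percent.
reported_precision = 98.18
reported_recall = 98.32

print(round(f1_score(reported_precision, reported_recall), 2))  # 98.25
print(round(miss_rate(reported_recall), 2))                     # 1.68
```

Both values agree with the F1-Score and Recall rows of the table.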
- 🧠 Deep Learning Classification: ResNet/EfficientNet/ViT architectures for cell classification
- 🔍 Visual Explainability: Grad-CAM heatmaps highlighting relevant cellular features
- 💬 Natural Language Explanations: LLM-powered descriptions of diagnostic reasoning
- 📊 Medical-Grade Metrics: Precision, Recall, F1-Score, AUC-ROC optimized for clinical use
- 🎨 Interactive Demo: Streamlit/Gradio interface for easy testing
```
leukocare-ai/
├── data/                        # Dataset storage
├── notebooks/                   # Jupyter notebooks for exploration
│   ├── 01_eda.ipynb             # Data exploration
│   ├── 02_preprocessing.ipynb
│   ├── 03_modeling.ipynb
│   └── 04_explainability.ipynb
├── src/                         # Source code (production-ready)
│   ├── data/                    # Data loading and augmentation
│   ├── models/                  # Model architectures
│   ├── training/                # Training loops and metrics
│   ├── explainability/          # Grad-CAM, SHAP, visualization
│   └── llm/                     # LLM explanation generation
├── scripts/                     # Executable scripts
│   ├── train.py                 # Training script
│   ├── evaluate.py              # Model evaluation
│   └── inference.py             # Single image inference
├── configs/                     # Configuration files
├── tests/                       # Unit tests
└── outputs/                     # Model checkpoints and results
```
3. Vision Transformer (ViT): State-of-the-art attention mechanism
- Transfer Learning: Pre-trained on ImageNet
- Fine-tuning: All layers unfrozen after initial training
- Loss Function: Focal Loss (handles class imbalance)
- Optimizer: Adam with a cosine-annealing learning-rate schedule
- Augmentation: Medical-specific (rotations, color jitter, no vertical flips)
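For readers unfamiliar with two pieces of this recipe, the following is a minimal, framework-free sketch of binary focal loss (Lin et al., 2017) and a cosine-annealing learning-rate schedule. The default `gamma`, `alpha`, and learning-rate values are illustrative assumptions, not the settings used in this project, and the function names are hypothetical; the actual training code presumably relies on a deep learning framework's implementations instead.

```python
import math


def binary_focal_loss(p: float, target: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Focal loss for one sample: -alpha_t * (1 - p_t)**gamma * log(p_t).

    p is the predicted probability of the positive class; target is 0 or 1.
    """
    eps = 1e-12  # guards against log(0)
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


def cosine_annealing_lr(step: int, total_steps: int, lr_max: float = 1e-3, lr_min: float = 0.0) -> float:
    """Learning rate that decays from lr_max to lr_min along half a cosine wave."""
    progress = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


# A confidently correct sample contributes far less loss than a hard one,
# which is how focal loss keeps easy majority-class samples from dominating.
easy = binary_focal_loss(p=0.95, target=1)
hard = binary_focal_loss(p=0.30, target=1)
print(easy < hard)

# The schedule starts at lr_max and ends at lr_min.
print(cosine_annealing_lr(0, 100), cosine_annealing_lr(100, 100))
```

With `gamma=0` and `alpha=1` the focal loss reduces to ordinary cross-entropy for positive samples, which is a convenient check when tuning these values.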
Grad-CAM (Gradient-weighted Class Activation Mapping)
Generates heatmaps showing which regions of the cell image influenced the model's decision.
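```python
from src.explainability.gradcam import GradCAM

# Generate explanation
# `model` (trained classifier) and `image` (preprocessed input) are assumed to be defined already
cam = GradCAM(model, target_layer='layer4')
heatmap = cam.generate_cam(image, target_class=1)
```

Key Features Analyzed: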
- Nucleus morphology (size, shape, chromatin pattern)
- Cytoplasm characteristics
- Nuclear-cytoplasmic ratio
- Presence of granulations
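To make the heatmap computation concrete, here is a minimal, framework-free sketch of the core Grad-CAM step from Selvaraju et al. (2017): average each channel's gradients into a weight, take the weighted sum of the feature maps, apply ReLU, and rescale to [0, 1]. It illustrates the technique only; it is not the project's `GradCAM` class, and the function name and nested-list inputs are assumptions chosen for readability. In practice the activations and gradients would be captured from the target layer with framework hooks.

```python
from typing import List

FeatureMap = List[List[float]]  # one H x W channel


def grad_cam(activations: List[FeatureMap], gradients: List[FeatureMap]) -> FeatureMap:
    """Combine K feature maps into one heatmap using gradient-derived channel weights."""
    height, width = len(activations[0]), len(activations[0][0])

    # Channel weight = global average of that channel's gradients.
    weights = [sum(map(sum, grad)) / (height * width) for grad in gradients]

    # Weighted sum over channels, followed by ReLU to keep positive evidence only.
    cam = [
        [
            max(0.0, sum(w * act[i][j] for w, act in zip(weights, activations)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Rescale to [0, 1] so the map can be overlaid on the image.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Toy input: channel 0 supports the class (positive gradients),
# channel 1 argues against it (negative gradients).
activations = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
gradients = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
print(grad_cam(activations, gradients))  # [[1.0, 0.0], [0.0, 0.0]]
```

Only the top-left location lights up, because the bottom-right activation belongs to a channel whose gradient is negative and is removed by the ReLU.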
LLM-powered explanations translate visual features into clinical language:
"The model classifies this cell as LEUKEMIC (confidence: 94.2%)
based on the following observations:
1. Enlarged nucleus with irregular chromatin pattern (highlighted
in red on the heatmap)
2. High nuclear-cytoplasmic ratio characteristic of blast cells
3. Absence of normal granulations in the cytoplasm
These features are consistent with acute lymphoblastic leukemia (ALL)
morphology. Clinical correlation and additional testing recommended."
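One plausible way to produce text like this is to pass the classifier's output and the heatmap-derived observations to the LLM in a structured prompt. The sketch below only assembles such a prompt; the function name, field labels, and wording are hypothetical, and the call to an LLM API is deliberately left out because the README does not specify a provider.

```python
from typing import Sequence


def build_explanation_prompt(label: str, confidence: float, findings: Sequence[str]) -> str:
    """Assemble a prompt asking an LLM to explain a classification (hypothetical format)."""
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(findings, start=1))
    return (
        "You are assisting a hematologist. Explain the following model output "
        "in cautious clinical language and recommend clinical correlation.\n"
        f"Predicted class: {label}\n"
        f"Confidence: {confidence:.1%}\n"
        "Regions highlighted by Grad-CAM:\n"
        f"{numbered}"
    )


prompt = build_explanation_prompt(
    label="LEUKEMIC",
    confidence=0.942,
    findings=[
        "Enlarged nucleus with irregular chromatin pattern",
        "High nuclear-cytoplasmic ratio",
    ],
)
print(prompt)
```

Keeping the prediction, confidence, and findings as explicit fields makes it easier to check that the generated explanation does not claim more than the model actually reported.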
- Set up project structure
- Download datasets (ALL-IDB, C-NMC)
- Notebook 01: EDA - visualize images, analyze class distribution
- Notebook 02: Preprocessing - normalization, augmentation, train/val/test split
- Build data pipeline (`src/data/`)
- Notebook 03: Train ResNet50 baseline
- Implement training loop with metrics
- First evaluation on validation set
- Notebook 03: Test EfficientNet & ViT
- Hyperparameter tuning (LR, optimizer, augmentation)
- Select best model and evaluate on test set
- Notebook 04: Implement Grad-CAM
- Generate heatmaps for validation set
- Validate model attention on medical features
- (Optional) Test SHAP
- Notebook 05: Set up and test the LLM API
- Design explanation prompts
- Build pipeline: Image → Prediction → Heatmap → LLM → Explanation
- Refine explanation quality
- Build Streamlit/Gradio demo
- Write production scripts (`train.py`, `inference.py`)
- Complete documentation
- Create presentation
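The Image → Prediction → Heatmap → LLM → Explanation pipeline listed in the roadmap could be composed roughly as follows. This is a structural sketch only: each stage is injected as a callable, and all names (`explain_image`, `ExplanationResult`, the stand-in stages) are hypothetical placeholders rather than functions from `src/`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class ExplanationResult:
    label: str
    confidence: float
    heatmap: Any
    explanation: str


def explain_image(
    image: Any,
    predict: Callable[[Any], Tuple[str, float]],
    make_heatmap: Callable[[Any, str], Any],
    describe: Callable[[str, float, Any], str],
) -> ExplanationResult:
    """Run the stages in order: prediction, then heatmap, then text explanation."""
    label, confidence = predict(image)
    heatmap = make_heatmap(image, label)
    explanation = describe(label, confidence, heatmap)
    return ExplanationResult(label, confidence, heatmap, explanation)


# Stand-in stages so the sketch runs without a model or an LLM.
result = explain_image(
    image="cell.png",
    predict=lambda img: ("LEUKEMIC", 0.94),
    make_heatmap=lambda img, label: [[0.0, 1.0], [0.0, 0.0]],
    describe=lambda label, conf, cam: f"{label} ({conf:.0%}); see highlighted region.",
)
print(result.explanation)
```

Injecting the stages keeps the orchestration testable without loading model weights or calling an external API.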
Contributions are welcome! Please read our Contributing Guidelines.
```bash
# Install dev dependencies
pip install -r requirements-dev.txt

# Run tests
pytest tests/

# Code formatting
black src/
flake8 src/
```

THIS SOFTWARE IS FOR RESEARCH AND EDUCATIONAL PURPOSES ONLY.
- ❌ NOT approved as a medical device
- ❌ NOT validated for clinical diagnosis
- ❌ NOT a replacement for professional medical judgment
- ❌ NOT suitable for patient care without proper validation
Always consult qualified healthcare professionals for medical decisions.
- Labati et al. (2011). "ALL-IDB: The Acute Lymphoblastic Leukemia Image Database for Image Processing"
- Gupta et al. (2019). "C-NMC Challenge Dataset"
- Selvaraju et al. (2017). "Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization"
- Lin et al. (2017). "Focal Loss for Dense Object Detection"
- Dosovitskiy et al. (2020). "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale"
- Terwilliger & Abdul-Hay (2017). "Acute lymphoblastic leukemia: a comprehensive review and 2017 update"
- WHO Classification of Tumours of Haematopoietic and Lymphoid Tissues (2017)
Charlie - ML Engineer & Computer Vision Specialist
- GitHub: @Ekliipce
- LinkedIn: Charles-André Arsenec
- Project: WearIT Paris - AI-powered virtual try-on
For questions, suggestions, or collaboration opportunities:
- Email: [email protected]
- Open an issue on GitHub