A Scalable Modular Framework for Multimodal AI in Oncology
HoneyBee has been officially published in npj Digital Medicine!
Tripathi, A., Waqas, A., Schabath, M.B. et al. HONeYBEE: enabling scalable multimodal AI in oncology through foundation model-driven embeddings. npj Digit. Med. 8, 622 (2025). https://doi.org/10.1038/s41746-025-02003-4
HoneyBee is a comprehensive multimodal AI framework designed specifically for oncology research and clinical applications. It integrates and processes diverse medical data types (clinical text, radiology images, pathology slides, and molecular data) through a unified, modular architecture. Built with scalability and extensibility in mind, HoneyBee helps researchers develop AI models for cancer diagnosis, prognosis, and treatment planning.
Warning
Alpha Release: This framework is currently in alpha. APIs may change, and some features are still under development.
- 3-Layer Design: Clean separation between data loaders, embedding models, and processors
- Unified API: Consistent interface across all modalities
- Extensible: Easy to add new models and data sources
- Production-Ready: Optimized for both research and clinical deployment
- Pathology: Whole Slide Images (WSI) - SVS, TIFF formats with tissue detection
- Radiology: DICOM, NIFTI processing with 3D support
- Preprocessing: Advanced augmentation and normalization pipelines
- Document Processing: PDF support with OCR for scanned documents
- NLP Pipeline: Cancer entity extraction, temporal parsing, medical ontology integration
- Database Integration: Native MINDS format support
- Long Document Handling: Multiple tokenization strategies for clinical notes
- Genomics: Support for expression data and mutation profiles
- Integration: Seamless combination with imaging and clinical data
- GatorTron: Domain-specific clinical language model
- BioBERT: Biomedical text understanding
- PubMedBERT: Scientific literature embeddings
- Clinical-T5: Text-to-text clinical transformers
- REMEDIS: Self-supervised medical image representations
- RadImageNet: Pre-trained radiological feature extractors
- UNI: Universal medical image encoder
- Custom Models: Easy integration of proprietary models
- Cross-Modal Learning: Unified representations across modalities
- Attention Mechanisms: Interpretable fusion strategies
- Patient-Level Aggregation: Comprehensive patient profiles
- Survival Analysis: Cox PH, Random Survival Forest, DeepSurv
- Classification: Multi-class cancer type prediction
- Retrieval: Similar patient identification
- Visualization: Interactive t-SNE dashboards
- Risk Stratification: Patient outcome prediction
- Treatment Planning: Personalized therapy recommendations
- Biomarker Discovery: Multi-omic pattern identification
- Python 3.8+
- PyTorch 2.0+
- CUDA 11.7+ (optional, for GPU acceleration)
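The prerequisites above can be checked from Python before installing anything else. The following is a minimal sketch, not part of HoneyBee: the function name `check_prerequisites` and the report keys are made up for illustration, and it only compares the Python version and the PyTorch major version, reporting the CUDA version for information rather than validating it against 11.7.

```python
import importlib.util
import sys


def check_prerequisites(min_python=(3, 8), min_torch_major=2):
    """Return a dict describing whether the listed prerequisites are met.

    Hypothetical helper for illustration; it is not a HoneyBee function.
    """
    report = {
        "python_ok": sys.version_info[:2] >= min_python,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_ok": False,
        "cuda_available": False,
        "cuda_version": None,
    }
    if report["torch_installed"]:
        try:
            import torch  # imported lazily so the check also runs without PyTorch
        except Exception:
            # A broken installation is treated the same as a missing one.
            report["torch_installed"] = False
            return report
        # Version strings look like "2.1.0+cu118"; only the major number is compared.
        major = int(torch.__version__.split(".")[0])
        report["torch_ok"] = major >= min_torch_major
        # CUDA is optional, so False here is not an error.
        report["cuda_available"] = bool(torch.cuda.is_available())
        # The CUDA version PyTorch was built against, or None for CPU-only builds.
        report["cuda_version"] = torch.version.cuda
    return report


if __name__ == "__main__":
    for name, value in check_prerequisites().items():
        print(f"{name}: {value}")
```

If `cuda_available` is false, the framework's CPU fallback would be the expected path; the check itself makes no assumption about which HoneyBee components need a GPU.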
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install -y openslide-tools tesseract-ocr
# macOS
brew install openslide tesseract
# Windows
# Install from official websites:
# - OpenSlide: https://openslide.org/download/
# - Tesseract: https://github.com/UB-Mannheim/tesseract/wiki

# Install the base package
pip install honeybee-ml
# Download required NLTK data for clinical processing
python -c "import nltk; nltk.download('punkt'); nltk.download('punkt_tab')"

# Clone the repository
git clone https://github.com/lab-rasool/HoneyBee.git
cd HoneyBee
# Install dependencies
pip install -r requirements.txt
# Download required NLTK data
python -c "import nltk; nltk.download('punkt'); nltk.download('punkt_tab')"
# Install in development mode
pip install -e .

Create a .env file in the project root:
# MINDS database credentials (if using MINDS format)
HOST=your_server
PORT=5433
DB_USER=postgres
PASSWORD=your_password
DATABASE=minds
# HuggingFace API (for some models)
HF_API_KEY=your_huggingface_api_key

HoneyBee has been successfully applied to:
- Cancer Subtype Classification: Automated identification of cancer subtypes from multimodal data
- Survival Prediction: Risk stratification and outcome prediction for treatment planning
- Similar Patient Retrieval: Finding patients with similar clinical profiles for precision medicine
- Biomarker Discovery: Identifying multimodal patterns associated with treatment response
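To make similar patient retrieval concrete, one common approach is nearest-neighbour search over patient-level embeddings. The sketch below is an illustration in plain Python under stated assumptions, not HoneyBee's API: the helper names (`mean_pool`, `cosine_similarity`, `most_similar`), the toy three-dimensional vectors, and the choice of mean pooling with cosine similarity are all made up for the example.

```python
import math


def mean_pool(vectors):
    """Average a list of equal-length embedding vectors into one vector."""
    if not vectors:
        raise ValueError("at least one vector is required")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same length")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_id, patients, top_k=3):
    """Rank all other patients by cosine similarity to the query patient."""
    query = patients[query_id]
    scores = [
        (pid, cosine_similarity(query, emb))
        for pid, emb in patients.items()
        if pid != query_id
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]


# Toy data: each patient has one or more per-sample embeddings (hypothetical
# values) that are pooled into a single patient-level vector.
patient_embeddings = {
    "patient_a": mean_pool([[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]),
    "patient_b": mean_pool([[0.9, 0.1, 0.0]]),
    "patient_c": mean_pool([[0.0, 0.0, 1.0]]),
}

ranked = most_similar("patient_a", patient_embeddings, top_k=2)
for patient_id, score in ranked:
    print(f"{patient_id}: {score:.3f}")
```

For real cohorts, a vectorized or approximate nearest-neighbour index would replace the Python loops; the ranking logic stays the same.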
We welcome contributions! Please see our Contributing Guidelines for details.
# Fork and clone your fork
git clone https://github.com/<your-username>/HoneyBee.git
cd HoneyBee
# Create a virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode
pip install -r requirements.txt
pip install -e .

- Alpha Status: Some features are still under development
- Memory Requirements: WSI processing requires significant RAM (16GB+ recommended)
- GPU Recommended: While CPU fallback exists, GPU acceleration significantly improves performance
- Limited Test Coverage: Comprehensive test suite is planned for future releases
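A rough calculation shows why WSI processing is memory-hungry and why slides are usually handled tile by tile. This is a back-of-the-envelope sketch, not a description of how HoneyBee manages memory: the 100,000 x 100,000 pixel slide, the 512-pixel tile size, and the function name `wsi_memory_estimate` are hypothetical, and the estimate assumes uncompressed 8-bit RGB at full resolution.

```python
import math


def wsi_memory_estimate(width_px, height_px, tile_px=512, channels=3, bytes_per_channel=1):
    """Estimate uncompressed memory for a full slide versus a single tile."""
    full_bytes = width_px * height_px * channels * bytes_per_channel
    tile_bytes = tile_px * tile_px * channels * bytes_per_channel
    # Edge tiles are counted as full tiles, hence the ceiling division.
    tile_count = math.ceil(width_px / tile_px) * math.ceil(height_px / tile_px)
    return {
        "full_gib": full_bytes / 2**30,
        "tile_mib": tile_bytes / 2**20,
        "tile_count": tile_count,
    }


# Hypothetical slide dimensions chosen only to illustrate the scale.
estimate = wsi_memory_estimate(100_000, 100_000)
print(f"Full slide: {estimate['full_gib']:.1f} GiB")
print(f"One tile:   {estimate['tile_mib']:.2f} MiB")
print(f"Tiles:      {estimate['tile_count']}")
```

Under these assumptions the full-resolution slide would not fit in 16 GB of RAM, while an individual tile takes under 1 MiB, which is why streaming tiles keeps peak memory manageable.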
See the LICENSE file for details.
If you use HoneyBee in your research, please cite our paper:
Tripathi, A., Waqas, A., Schabath, M.B. et al. HONeYBEE: enabling scalable multimodal AI in
oncology through foundation model-driven embeddings. npj Digit. Med. 8, 622 (2025).
https://doi.org/10.1038/s41746-025-02003-4