This is the repository for CLEAR-IT: Contrastive Learning to Capture the Immune Composition of Tumor Microenvironments.
For pre-trained models, embeddings, and model predictions, see our data repository (not published yet).
- Clone the repository and (optionally) create a fresh Python environment.

  ```bash
  git clone https://github.com/qnano/CLEAR-IT.git
  cd CLEAR-IT
  # (optional) create & activate a virtual environment
  python -m venv .venv && source .venv/bin/activate  # Windows: .venv\Scripts\activate
  ```

- Install dependencies and the package.

  ```bash
  pip install -r requirements.txt
  pip install -e .
  ```
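If the editable install succeeded, the package should be importable; a quick sanity check (prints the package name):

```bash
python -c "import clearit; print(clearit.__name__)"
```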
A single, ready-to-build Dockerfile is provided as `clearit.Dockerfile`. The image installs CLEAR-IT and, on first run, auto-creates a `config.yaml` if you mount your CLEAR-IT-Data folder at `/data`.
Build the image:
```bash
docker build -f clearit.Dockerfile -t clearit:latest .
```

Run (starts JupyterLab per the image entrypoint). Replace the path with your local CLEAR-IT-Data folder. Add `--gpus all` if you have NVIDIA GPUs set up:

```bash
docker run --rm -p 8888:8888 \
  -v /abs/path/to/CLEAR-IT-Data:/data \
  clearit:latest
```
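With GPUs (this assumes the NVIDIA Container Toolkit is installed on the host), the same command with `--gpus all` added:

```bash
docker run --rm --gpus all -p 8888:8888 \
  -v /abs/path/to/CLEAR-IT-Data:/data \
  clearit:latest
```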
What happens on first run?

- The container detects the `/data` mount and writes `/workspace/config.yaml` with all paths pointing to `/data/...` and `experiments_dir` pointing to the experiments bundled in the container install.
- You can override the file later if you want custom locations.
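For orientation, an illustrative sketch of the auto-generated file (the container writes the real one; exact values may differ):

```yaml
# Illustrative sketch of /workspace/config.yaml - written by the container on first run
paths:
  data_root: /data
  datasets_dir: /data/datasets
  raw_datasets_dir: /data/raw_datasets
  models_dir: /data/models
  outputs_dir: /data/outputs
  experiments_dir: /path/to/bundled/experiments  # the experiments shipped inside the image
```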
Need a shell instead of Jupyter? Use this command:
```bash
docker run --rm -it --entrypoint bash \
  -v /abs/path/to/CLEAR-IT-Data:/data \
  clearit:latest
```
The CLEAR-IT library exposes three driver scripts to (1) pre-train encoders, (2) train classification heads, and (3) perform linear evaluation. Each script takes a YAML recipe describing one or more experiments. Recipe files live under the `experiments/` folder in this repository.
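As a purely hypothetical sketch of the idea (the key names below are illustrative, not the real schema; see the YAML files under `experiments/` for actual recipes), a pre-training recipe sweeping batch size and temperature might look like:

```yaml
# Hypothetical recipe sketch - illustrative key names only;
# the real schema is defined by the files under experiments/.
experiments:
  - name: pretrain_b256_tau0.1
    batch_size: 256
    temperature: 0.1   # NT-Xent temperature
  - name: pretrain_b512_tau0.5
    batch_size: 512
    temperature: 0.5
```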
Using Docker? You can skip this step on first run: the container writes `/workspace/config.yaml` automatically when `/data` is mounted (see Installation → Docker). For local installs, copy and edit the template:
```bash
cp config_template.yaml config.yaml
# then open config.yaml and update the paths under `paths:`
```

The scripts and notebooks will look for a `config.yaml` in your working directory (typically the repo root). The locations set here determine where datasets are read from and where models and outputs are written.
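A minimal sketch of that lookup convention, assuming PyYAML (the actual loader lives inside the `clearit` package and may differ):

```python
# Illustrative only: resolve config.yaml from the current working directory.
from pathlib import Path

import yaml  # PyYAML

config_path = Path.cwd() / "config.yaml"
if not config_path.exists():
    raise FileNotFoundError(
        "config.yaml not found - copy config_template.yaml and edit the paths"
    )

with config_path.open() as fh:
    cfg = yaml.safe_load(fh)

datasets_dir = Path(cfg["paths"]["datasets_dir"])  # where datasets are read from
outputs_dir = Path(cfg["paths"]["outputs_dir"])    # where predictions are written
```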
Each recipe can define multiple encoders or heads to train, or parameter sets for linear evaluation. Below are three examples.
Pre-training all encoders used to investigate the effect of pre-training batch size and NT-Xent temperature on the TNBC1-MxIF8 dataset:
```bash
python -m clearit.scripts.run_pretrain --recipe ./experiments/01_hyperopt/tnbc1-mxif8/round01/01_pretrain/01_batch-tau.yaml
```

Training linear classification heads on top of those pre-trained encoders:
```bash
python -m clearit.scripts.run_train_heads --recipe ./experiments/01_hyperopt/tnbc1-mxif8/round01/02_classifier/01_batch-tau.yaml
```

Using the classification heads to perform linear evaluation of those pre-trained encoders:
```bash
python -m clearit.scripts.run_inference_pipeline --recipe ./experiments/01_hyperopt/tnbc1-mxif8/round01/03_linear-eval/01_batch-tau.yaml
```

Where do results go?
- Trained encoders and heads are saved under `models_dir`.
- Predictions are written under `outputs_dir`.
- All of these locations are defined in your `config.yaml` (see below).
If you want to train the models yourself and are starting from raw sources, use the scripts in `scripts/` to convert external datasets into the unified format expected by CLEAR-IT. These scripts read from `raw_datasets` and write to `datasets`, as configured in `config.yaml`.
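The exact entry points depend on the dataset; a hypothetical invocation (the script name below is a placeholder; check `scripts/` for the real filenames):

```bash
# Hypothetical example - the script name is a placeholder; the real
# conversion scripts live under scripts/ and are named per dataset.
# Each reads from raw_datasets and writes to datasets, as set in config.yaml.
python scripts/convert_example_dataset.py
```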
We recommend downloading the prepared data from our data repository, which contains the folder structure and instructions on how to obtain the raw datasets for conversion.
This repository's structure is as follows:
```
.
├── clearit               # CLEAR-IT Python library
├── clearit.Dockerfile    # Dockerfile for running CLEAR-IT in a Docker container
├── config_template.yaml  # Template config file. Modify and rename this to config.yaml
├── experiments           # YAML recipe files for training all models and performing linear evaluation
├── notebooks             # Jupyter notebooks for plotting
├── requirements.txt      # Dependency list for custom environments
├── scripts               # Scripts for converting external datasets used in the study to a unified format
└── setup.py              # setup.py for a local install of clearit
```
We recommend placing the contents of the data repository in this directory (or somewhere else on fast storage), extending the structure as follows:
```
├── datasets              # Converted datasets, ready to be used by CLEAR-IT
├── embeddings            # Pre-computed embeddings for benchmarking purposes
├── models                # Pre-trained CLEAR-IT encoders and linear classifiers
├── outputs               # Predictions made via linear evaluation or benchmarking, survival classifiers
└── raw_datasets          # Unconverted datasets - the conversion scripts in scripts/ move these to the datasets directory
```
The `config_template.yaml` file contains a template for a `config.yaml` file, which scripts and notebooks will look for:
```yaml
# config_template.yaml
# Create a copy of this file and name it `config.yaml` to point to custom paths
paths:
  # Absolute or relative path to the unpacked CLEAR-IT-Data directory
  data_root: /path/to/data/repository/CLEAR-IT                     # Corresponds to the GitHub repository's root directory
  datasets_dir: /path/to/data/repository/CLEAR-IT/datasets         # The datasets directory from the data repository
  raw_datasets_dir: /path/to/data/repository/CLEAR-IT/raw_datasets # The raw_datasets directory from the data repository
  models_dir: /path/to/data/repository/CLEAR-IT/models             # The models directory from the data repository
  outputs_dir: /path/to/data/repository/CLEAR-IT/outputs           # The outputs directory from the data repository
  experiments_dir: /path/to/data/repository/CLEAR-IT/experiments   # The experiments directory from the GitHub repository
```

By modifying `config.yaml`, you are free to choose where you place individual directories (if space is a concern). If you want to train models, we recommend putting the `datasets` directory on fast storage (for example, an SSD).
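For example (illustrative paths only), you could keep the bulk directories on an HDD while serving `datasets` from an SSD:

```yaml
# Illustrative split across drives - adjust paths to your machine.
paths:
  data_root: /mnt/hdd/CLEAR-IT
  raw_datasets_dir: /mnt/hdd/CLEAR-IT/raw_datasets
  models_dir: /mnt/hdd/CLEAR-IT/models
  outputs_dir: /mnt/hdd/CLEAR-IT/outputs
  experiments_dir: /mnt/hdd/CLEAR-IT/experiments
  datasets_dir: /mnt/ssd/CLEAR-IT/datasets   # fast storage speeds up training reads
```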
- `FileNotFoundError: config.yaml`: ensure you copied `config_template.yaml` to `config.yaml` and that you run commands from the repository root (or point your working directory accordingly).
- Docker can't see your data: double-check your `-v /host/path:/container/path` volume mounts and that `config.yaml` uses the container paths when running inside Docker.