The PlanktoShare classifier predicts different plankton and non-plankton classes from data captured by the Plankton Imager (PI-10) sensor.
- Download model weights from PLACEHOLDER. Two options are available: `ResNet50-detailed`, the more extensive model with 49 possible classes, and the `OSPAR` model, which predicts 12 classes. Store these in `/models/`.
- Store your raw, unaltered PI-10 data in a location of your choice. We recommend `/data/`, but any accessible location works if you pass it with the `--source_dir` argument.
- For map creation, download the "EEA coastline for analysis" from the European Environment Agency. Store it in `/data/`.
- For map creation, download the "Marine and land zones: the union of world country boundaries and EEZ's (version 4)" from Marineregions.org. Store it in `/data/`.
# Clone the repository
git clone git@github.com:geoJoost/planktoshare.git
# Set up the environment
conda create --name plankton_imager
conda activate plankton_imager
conda install pip
pip install fastai
# IMPORTANT: Modify this installation link to the correct CUDA/CPU version
# Check the CUDA version using `nvidia-smi` in the command-line
# If no CUDA is available, use the CPU installation. Be aware that this is significantly slower and discouraged for larger datasets
# See: https://pytorch.org/get-started/locally/
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
conda install -c conda-forge pandas numpy polars seaborn xlsxwriter chardet geopandas python-docx memory_profiler pyarrow fiona pyproj
# To start the entire pipeline, navigate to your working directory
cd PATH/TO/WORKING_DIRECTORY
# Run the classifier
# See options below
# Not implemented yet
python main.py --source_dir data/YOUR_DATA_PATH --model_name ResNet50-detailed --cruise_name SURVEY_NAME --batch_size 300
# For more detailed options, see `main.py`
Options available in `main.py`:
- `source_dir`: Path to your data folder exactly as it comes from the PI-10. We recommend storing it within the repository in `/data/`.
- `model_name`: The model to use for inference. Options are `ospar` for the OSPAR classifier (12 classes), or `ResNet50-detailed` for the ResNet50 model, which predicts 49 different plankton and non-plankton classes.
- `cruise_name`: Used for intermediate outputs and for generating the final report. Any string without spaces is accepted; use '-' or '_' instead.
- `batch_size`: Number of samples per batch in `inference.py`. This depends heavily on the memory available on your PC/HPC. The default value of 32 is recommended for local machines.
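The options above could be wired up roughly as follows. This is a minimal sketch using `argparse`, not the parser actually used in `main.py`. The helper names `build_parser` and `cruise_name`, the choice of which arguments are required, and the validation of spaces are all assumptions made for illustration.

```python
import argparse


def cruise_name(value: str) -> str:
    """Reject names with whitespace, as the option description asks (hypothetical check)."""
    if not value or any(ch.isspace() for ch in value):
        raise argparse.ArgumentTypeError(
            "cruise_name must not contain spaces; use '-' or '_' instead"
        )
    return value


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the four documented options."""
    parser = argparse.ArgumentParser(description="PlanktoShare pipeline (sketch)")
    parser.add_argument("--source_dir", required=True,
                        help="Path to the raw PI-10 data folder")
    parser.add_argument("--model_name", required=True,
                        choices=["ospar", "ResNet50-detailed"],
                        help="Classifier to use for inference")
    parser.add_argument("--cruise_name", required=True, type=cruise_name,
                        help="Survey name without spaces")
    parser.add_argument("--batch_size", type=int, default=32,
                        help="Number of images per inference batch")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Validating `cruise_name` at parse time means a name with spaces fails immediately, before any archives are processed.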
Use the original dataset structure as provided by the PI-10 imager without modifications.
CRUISE_NAME
├── 2024-06-24
│ ├── 1454.tar
│ ├── 1458.tar
│ ├── 1459.tar
│ ├── 1500.tar
│ ├── 1510.tar
│ ├── 1520.tar
│ ├── 1530.tar
│ ├── 1540.tar
│ ├── 1550.tar
│ ├── 1600.tar
│ ├── 1610.tar
│ ├── 1620.tar
│ ├── 1630.tar
│ └── 1640.tar
├── 2024-06-25
│ ├── 0000.tar
│ ├── 0010.tar
│ ├── 0020.tar
│ ├── 0030.tar
│ ├── 0040.tar
│ ├── 0050.tar
│ ├── 0100.tar
│ ├── 0110.tar
CRUISE_NAME_UNTARRED
├── 2024-06-24
│ ├── untarred_1454
│ │ ├── Background.tif
│ │ ├── Bubbles.txt
│ │ ├── Cameralog.txt
│ │ ├── HitsMisses.txt
│ │ ├── RawImages\pia7.2024-06-24.1454+N00000000.tif
│ │ ├── RawImages\pia7.2024-06-24.1454+N00000001.tif
│ │ ├── RawImages\pia7.2024-06-24.1454+N00000002.tif
│ │ ├── RawImages\pia7.2024-06-24.1454+N00000003.tif
│ │ ├── RawImages\pia7.2024-06-24.1454+N00000004.tif
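To illustrate how one 10-minute archive from the layout above might be unpacked temporarily, here is a minimal sketch using only the standard library. It is not the repository's own extraction code. The function name `extracted_tifs` is hypothetical, and the handling of backslash separators assumes that member names are stored as `RawImages\...`, as the listing suggests.

```python
import tarfile
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def extracted_tifs(tar_path):
    """Yield the .tif images of one archive, extracted into a temporary directory.

    The temporary directory is deleted when the `with` block exits.
    """
    with tempfile.TemporaryDirectory(prefix="untarred_") as tmp:
        tmp_dir = Path(tmp)
        images = []
        with tarfile.open(tar_path) as archive:
            for member in archive:
                if not member.isfile() or not member.name.lower().endswith(".tif"):
                    continue
                # Assumption: member names may use Windows separators
                # (RawImages\...). Keep only the final path component so
                # every image lands flat inside tmp_dir.
                filename = member.name.replace("\\", "/").rsplit("/", 1)[-1]
                source = archive.extractfile(member)
                if source is None:
                    continue
                target = tmp_dir / filename
                target.write_bytes(source.read())
                images.append(target)
        yield sorted(images)


if __name__ == "__main__":
    # Placeholder path following the documented CRUISE_NAME/DATE/HHMM.tar layout.
    with extracted_tifs("CRUISE_NAME/2024-06-24/1454.tar") as tifs:
        print(f"{len(tifs)} images extracted")
```

Note that this sketch would also pick up `Background.tif`, because it filters on the extension only. Filter on the `RawImages` prefix as well if only raw images are wanted.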
The PlanktoShare repository automates the processing of PI-10 data using custom classifiers. The inference script performs the following steps:
- Iterates through the directory to process each `.tar` file.
- Temporarily extracts all `.tif` images from each `.tar`.
- Uses the classifier defined by the `model_name` argument to classify images.
- Detects and discards corrupted images.
- Generates outputs. For each 10-minute bin, it creates:
  - Detailed CSV: includes per-image metadata:
    - Image details (filename, datetime, EXIF geodata)
    - Cruise information (cruise name, instrument code)
    - Model predictions (class ID/label, confidence scores)
  - Summarized CSV: provides aggregated statistics:
    - Total predicted images per class
    - Density and summary statistics (e.g., average confidence)
- Performs stratified random sampling (n=100) per class for manual validation and creating training data.
- Automatically generates a summary report (examples in `/reports/`).
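The per-class summary and the stratified sampling step can be sketched as below. This is an illustration under stated assumptions, not the repository's implementation. The record keys `label` and `confidence` and the function names are hypothetical. Density is left out because it depends on the sampled water volume, which is not described here.

```python
import random
from collections import defaultdict


def summarise(predictions):
    """Return {label: (image count, mean confidence)} from per-image predictions."""
    by_class = defaultdict(list)
    for row in predictions:
        by_class[row["label"]].append(row["confidence"])
    return {
        label: (len(scores), sum(scores) / len(scores))
        for label, scores in by_class.items()
    }


def stratified_sample(predictions, n_per_class=100, seed=0):
    """Draw up to n_per_class images per predicted class, without replacement."""
    rng = random.Random(seed)  # fixed seed so the validation set is reproducible
    by_class = defaultdict(list)
    for row in predictions:
        by_class[row["label"]].append(row)
    return {
        # Classes with fewer than n_per_class images are returned in full.
        label: rng.sample(rows, min(n_per_class, len(rows)))
        for label, rows in by_class.items()
    }


if __name__ == "__main__":
    # Made-up records purely for demonstration.
    demo = [{"filename": f"img_{i}.tif", "label": "copepod", "confidence": 0.5}
            for i in range(150)]
    demo += [{"filename": f"det_{i}.tif", "label": "detritus", "confidence": 0.5}
             for i in range(3)]
    print(summarise(demo))
    print({label: len(rows) for label, rows in stratified_sample(demo).items()})
```

Sampling within each predicted class, rather than across all images, keeps rare classes represented in the validation set even when a few classes dominate the bin.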
- Remove FastAI implementation
- An error in `learn.load(MODEL_FILENAME, weights_only=False)` can occur with older PyTorch versions. In this case, simply remove the `weights_only` argument.
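Instead of editing the call by hand, the `weights_only` workaround could be wrapped in a small fallback. This is a sketch only. The helper name `load_weights` is hypothetical, and it assumes that older versions reject the unknown keyword with a `TypeError`. Check the actual traceback before relying on it.

```python
def load_weights(learn, model_filename):
    """Call learn.load, retrying without weights_only if the keyword is rejected."""
    try:
        return learn.load(model_filename, weights_only=False)
    except TypeError:
        # Assumption: the failure on older PyTorch versions is a TypeError
        # caused by the unexpected keyword, so retry without it.
        return learn.load(model_filename)
```

Catching `TypeError` this broadly can hide unrelated bugs inside the loader, so pinning a recent PyTorch version is the cleaner fix where that is possible.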
If you use this code or dataset, please cite our paper. For questions, feedback, or collaborations, feel free to contact us.
