This repository provides the weights and scripts of the YOLO models tested in the study *A Geometric and Deep Learning Reproducible Pipeline for Monitoring Floating Anthropogenic Debris in Urban Rivers Using In Situ Camera*.
It includes several variants of YOLOv5, YOLOv8, and YOLOv11, trained to detect three classes:
- anthropogenic debris
- natural debris
- non-debris materials
These models are part of a reproducible methodology to monitor riverine anthropogenic debris pollution with in situ cameras, with applications ranging from embedded devices (Raspberry Pi) to GPU server environments.
This repository contains:
- Model architectures: YOLOv5-n & -m, YOLOv8-n & -m, YOLOv11-n & -m
- Model weights
- Training code
The training dataset used in this study is available upon request.
Please contact romain.wenger@live-cnrs.unistra.fr to obtain access.
The dataset will be publicly released after the official publication of the research article.
The YOLO architecture is divided into three main components. The backbone extracts hierarchical features from the input image through a series of convolutional and pooling layers. The neck aggregates features across different scales, using FPN/PANet-style modules in recent versions. The head performs the final detection by predicting bounding boxes and class probabilities. YOLOv5 also predicts an objectness score, whereas YOLOv8 and YOLOv11 use anchor-free heads without a separate objectness output. The figure shows a simplified backbone resembling early YOLO versions. The overall backbone, neck, and head structure is nevertheless the same in modern versions such as YOLOv5, YOLOv8, and YOLOv11, whose architectural refinements aim to improve speed and accuracy.
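To make the backbone, neck, and head split more concrete, the sketch below counts how many candidate boxes the detection head produces for the imgsz: 1280 setting used in training. It assumes the default three output scales of the standard detection heads (P3, P4, P5 at strides 8, 16, 32). It also assumes three anchors per grid cell for YOLOv5 and one prediction per cell for the anchor-free YOLOv8/YOLOv11 heads. The function names are illustrative only.

```python
# Minimal sketch: size of the detection-head output for a square input image.
# Assumption: default detection strides 8, 16 and 32 (P3, P4, P5 feature maps).
STRIDES = (8, 16, 32)


def grid_sizes(imgsz: int, strides=STRIDES) -> list[int]:
    """Side length of each square feature map fed to the head."""
    return [imgsz // s for s in strides]


def num_predictions(imgsz: int, per_cell: int, strides=STRIDES) -> int:
    """Total candidate boxes before confidence filtering and NMS.

    per_cell = 3 mimics YOLOv5 (three anchors per grid cell);
    per_cell = 1 mimics the anchor-free heads of YOLOv8 and YOLOv11.
    """
    return sum(g * g * per_cell for g in grid_sizes(imgsz, strides))


if __name__ == "__main__":
    print(grid_sizes(1280))                   # [160, 80, 40]
    print(num_predictions(1280, per_cell=1))  # 33600 (YOLOv8 / YOLOv11)
    print(num_predictions(1280, per_cell=3))  # 100800 (YOLOv5)
```

The count scales with the square of imgsz. Training at 1280 instead of the common 640 default therefore yields four times as many candidate boxes per image.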
The YOLO models in this repository were trained with the following configuration:
- General
  - epochs: 200
  - batch: 32
  - imgsz: 1280
  - device: [0, 1] (2 GPUs)
  - workers: 8
- Data
  - data: train3.yaml (the dataset will be made available later)
- Augmentations
  - flipud: 0.5
  - fliplr: 0.5
  - mosaic: 1.0
  - hsv_h: 0.015
  - hsv_s: 0.7
  - hsv_v: 0.4
  - perspective: 0.0005
  - scale: 0.5
  - shear: 0.0
  - translate: 0.1
- Optimization
  - lr0: 0.001
  - lrf: 0.2
  - momentum: 0.937
  - weight_decay: 0.0005
- Callbacks
  - patience: 10 (training stops early if the validation metrics do not improve for 10 consecutive epochs)
  - cos_lr: True (cosine learning-rate scheduler)
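The configuration above maps directly onto keyword arguments of the Ultralytics YOLO.train() method, which the sketch below builds as a dictionary. It assumes the ultralytics package is installed and that train3.yaml is available locally (it is not yet public). "yolov8n.pt" is only an example starting checkpoint, not necessarily the one used in the study. The cosine_lr helper is our reading of the cosine schedule Ultralytics applies when cos_lr=True. Under that schedule the learning rate decays from lr0 to lr0 * lrf over the run, and warmup epochs are ignored here.

```python
# Minimal sketch: the training configuration as Ultralytics train() kwargs.
# Assumptions: `ultralytics` is installed, "train3.yaml" exists locally, and
# "yolov8n.pt" is an example checkpoint only.
import math

TRAIN_ARGS = {
    # General
    "epochs": 200,
    "batch": 32,
    "imgsz": 1280,
    "device": [0, 1],  # two GPUs
    "workers": 8,
    # Data
    "data": "train3.yaml",
    # Augmentations
    "flipud": 0.5,
    "fliplr": 0.5,
    "mosaic": 1.0,
    "hsv_h": 0.015,
    "hsv_s": 0.7,
    "hsv_v": 0.4,
    "perspective": 0.0005,
    "scale": 0.5,
    "shear": 0.0,
    "translate": 0.1,
    # Optimization
    "lr0": 0.001,
    "lrf": 0.2,
    "momentum": 0.937,
    "weight_decay": 0.0005,
    # Early stopping and learning-rate schedule
    "patience": 10,
    "cos_lr": True,
}


def cosine_lr(epoch: int, lr0: float, lrf: float, epochs: int) -> float:
    """Cosine decay from lr0 (epoch 0) to lr0 * lrf (last epoch), no warmup."""
    factor = ((1 - math.cos(epoch * math.pi / epochs)) / 2) * (lrf - 1) + 1
    return lr0 * factor


def train(weights: str = "yolov8n.pt"):
    """Launch training with TRAIN_ARGS (requires GPUs 0 and 1)."""
    from ultralytics import YOLO  # imported lazily so the sketch loads without it

    model = YOLO(weights)
    return model.train(**TRAIN_ARGS)
```

With lr0 = 0.001 and lrf = 0.2, the schedule starts at 0.001, passes 0.0006 halfway through a 200-epoch run, and would end at 0.0002. Runs stopped early by patience never reach that final value.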
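The table below summarizes the results obtained for each model. FPS CPU is the reciprocal of the CPU inference time (1000 / inference time in ms).

| # | Model | Epoch | Proc. time (h) | Inf. time CPU (ms) | FPS CPU | mAP50 | mAP50-95 | Recall | Precision |
|---|---|---|---|---|---|---|---|---|---|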
| 1 | yolov5n | 80 | 2.085 | 135.27 | 7.39 | 0.933 | 0.655 | 0.885 | 0.936 |
| 2 | yolov5m | 104 | 11.234 | 466.17 | 2.15 | 0.951 | 0.708 | 0.908 | 0.949 |
| 3 | yolov8n | 87 | 2.332 | 143.12 | 6.99 | 0.939 | 0.672 | 0.888 | 0.970 |
| 4 | yolov8m | 112 | 14.537 | 517.54 | 1.93 | 0.965 | 0.728 | 0.942 | 0.954 |
| 5 | yolov11n | 104 | 3.312 | 137.10 | 7.29 | 0.946 | 0.694 | 0.912 | 0.925 |
| 6 | yolov11m | 134 | 21.508 | 537.08 | 1.86 | 0.962 | 0.747 | 0.937 | 0.955 |
| 7 | yolov8n-30neg | 186 | 7.081 | 136.23 | 7.34 | 0.958 | 0.713 | 0.933 | 0.962 |
| 8 | yolov11m-30neg | 112 | 24.692 | 549.80 | 1.82 | 0.936 | 0.720 | 0.929 | 0.922 |
| 9 | yolov8n-cluster | 26 | 0.679 | 142.33 | 7.03 | 0.608 | 0.318 | 0.562 | 0.757 |
| 10 | yolov11m-cluster | 39 | 6.080 | 560.82 | 1.78 | 0.631 | 0.371 | 0.588 | 0.695 |
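The relationship between the two CPU columns, and a summary score combining precision and recall, can be checked with a few lines of arithmetic. The rows below are copied from the table. The F1-score is not reported in the study and is computed here only as an illustration.

```python
# Minimal sketch: CPU throughput and F1-score derived from values in the table.
ROWS = {
    # model: (inference time CPU in ms, precision, recall)
    "yolov5n": (135.27, 0.936, 0.885),
    "yolov8m": (517.54, 0.954, 0.942),
    "yolov11m": (537.08, 0.955, 0.937),
}


def fps_from_latency(ms: float) -> float:
    """Frames per second for a given per-frame inference time in milliseconds."""
    return 1000.0 / ms


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for name, (ms, p, r) in ROWS.items():
    print(f"{name}: {fps_from_latency(ms):.2f} FPS, F1 = {f1_score(p, r):.3f}")
```

Applying fps_from_latency to each inference time reproduces the FPS CPU column to two decimals.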
The animation below shows the YOLOv11-m model running on a video sequence recorded on the Steingiessen river, detecting floating anthropogenic debris in real time.
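A similar video analysis could be scripted with the Ultralytics prediction API. The sketch below counts detections per class for every frame. WEIGHTS and VIDEO are hypothetical placeholder paths, not file names from this repository, and the ultralytics package is assumed to be installed. Class names are read from the model itself (result.names). The index order used in the usage example is an assumption.

```python
# Minimal sketch: per-frame class counts on a video with Ultralytics.
# Assumptions: `ultralytics` is installed; WEIGHTS and VIDEO are hypothetical
# placeholder paths, not files shipped with this repository.
from collections import Counter

WEIGHTS = "path/to/yolov11m.pt"    # hypothetical path to downloaded weights
VIDEO = "path/to/river_video.mp4"  # hypothetical input video


def count_classes(class_ids, names) -> Counter:
    """Map predicted class indices to class names and count them."""
    return Counter(names[int(c)] for c in class_ids)


def detect_video(weights: str = WEIGHTS, source: str = VIDEO, conf: float = 0.25):
    """Yield a Counter of detected classes for every frame of the video."""
    from ultralytics import YOLO  # imported lazily; requires the ultralytics package

    model = YOLO(weights)
    # stream=True returns a generator, so long videos are not held in memory
    for result in model.predict(source=source, stream=True, conf=conf):
        yield count_classes(result.boxes.cls.tolist(), result.names)
```

For example, with a hypothetical mapping {0: "anthropogenic debris", 1: "natural debris", 2: "non-debris materials"}, the class indices [0, 0, 2] give two anthropogenic debris items and one non-debris item for that frame.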
If you use this work, please cite:
@article{grimmer2025debris,
title = {A geometric and deep learning reproducible pipeline for monitoring floating anthropogenic debris in urban rivers using in situ camera},
journal = {Submitted},
volume = {},
pages = {},
year = {2025},
issn = {},
doi = {},
url = {},
author = {Gauthier Grimmer and Romain Wenger and Clément Flint and Germain Forestier and Gilles Rixhon and Valentin Chardon}
}

