This is the knshnb's part of the Preferred Dolphin's solution for Happywhale - Whale and Dolphin Identification.
Please prepare dataset according to input/README.md and place under input/
.
$ ls -F input
fullbody_test_charm.csv pseudo_labels/ test_backfin.csv* train_images/
fullbody_test.csv README.md test_images/ yolov5_test.csv
fullbody_train_charm.csv sample_submission.csv* train2.csv yolov5_train.csv
fullbody_train.csv species.npy* train_backfin.csv
individual_id.npy* test2.csv train.csv
Before the final training round, we repeated 2 rounds of Step 1-2 for pseudo labeling.
By default, input/pseudo_labels/round2.csv
(the pseudo labels we created) is specified in the config file so that you can skip the first two rounds.
You can train from scratch by setting None
in pseudo_label
field in config files.
By src/train.py
, we
- train model by whole train data.
- inference test data and save results under
result/{exp_name}/-1/
.
Several examples of config files are located in config/
.
Example: Training and inference efficientnet_b6 and efficientnet_b7
python -m src.train --config_path config/efficientnet_b6.yaml --exp_name b6
python -m src.train --config_path config/efficientnet_b7.yaml --exp_name b7
By src/ensemble.py
, we
- calculate mean of the predictions by knn and logit for each model.
- ensemble predictions of the models specified by
--model_dirs
. - save prediction as
submission/{out_prefix}-{new_ratio}-{threshold}.csv
. - save pseudo label as
submission/pseudo_label_{out_prefix}.csv
.
Predictions generated by charmq's repository are saved in the same format, so you can ensemble them by just specifying paths to model directories.
Example: Ensemble b6 and b7
python -m src.ensemble --model_dirs result/b6/-1 result/b7/-1 --out_prefix b6-b7
In our post submission, single model (efficientnet_b7) achieved a score that could rank 3rd place in the final leaderboard. We also confirmed that ensemble of only two models (efficientnet_b6 and efficientnet_b7) could win 1st place. Ensembling more backbones and charmq's modesl can achieve even better results.
@article{patton2023deep,
title={A deep learning approach to photo--identification demonstrates high performance on two dozen cetacean species},
author={Patton, Philip T and Cheeseman, Ted and Abe, Kenshin and Yamaguchi, Taiki and Reade, Walter and Southerland, Ken and Howard, Addison and Oleson, Erin M and Allen, Jason B and Ashe, Erin and others},
journal={Methods in ecology and evolution},
volume={14},
number={10},
pages={2611--2625},
year={2023},
publisher={Wiley Online Library}
}
- For an overview of our key ideas and detailed explanation, please also refer to 1st Place Solution in Kaggle discussion.
- My teammate charmq's repository.