1st Place Solution of Kaggle Happywhale Competition

This is knshnb's part of team Preferred Dolphin's solution for Happywhale - Whale and Dolphin Identification.

Dataset

Please prepare the dataset according to input/README.md and place it under input/.

$ ls -F input
fullbody_test_charm.csv   pseudo_labels/          test_backfin.csv*  train_images/
fullbody_test.csv         README.md               test_images/       yolov5_test.csv
fullbody_train_charm.csv  sample_submission.csv*  train2.csv         yolov5_train.csv
fullbody_train.csv        species.npy*            train_backfin.csv
individual_id.npy*        test2.csv               train.csv

Reproducing the winning score

Before the final training round, we ran two rounds of Steps 1-2 to generate pseudo labels. By default, the config files specify input/pseudo_labels/round2.csv (the pseudo labels we created), so you can skip the first two rounds. To train from scratch, set the pseudo_label field in the config files to None.
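The round structure described above can be sketched as a small plan. This is only an illustration: the helper plan_rounds is hypothetical, and the file name round1.csv is an assumption by analogy with round2.csv, which is the only pseudo-label file named in this README.

```python
def plan_rounds(n_pseudo_rounds=2):
    """Return (round name, pseudo_label setting, pseudo-label file produced) per round."""
    plan = []
    previous = None  # the first round trains without pseudo labels (pseudo_label: None)
    for i in range(1, n_pseudo_rounds + 1):
        # Assumed naming: only round2.csv is confirmed by the README.
        produced = f"input/pseudo_labels/round{i}.csv"
        plan.append((f"round{i}", previous, produced))
        previous = produced
    # The final round trains on the last pseudo labels and produces no new ones.
    plan.append(("final", previous, None))
    return plan


for name, pseudo_label, produced in plan_rounds():
    print(f"{name}: pseudo_label={pseudo_label} -> produces {produced}")
```

Each pseudo-labeling round corresponds to one pass through Step 1 (training and inference) and Step 2 (postprocess and ensemble) below.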

Step 1: Training and inference

src/train.py does the following:

Trains the model on the whole training data.
Runs inference on the test data and saves the results under result/{exp_name}/-1/.

Several examples of config files are located in config/.

Example: Training and inference with efficientnet_b6 and efficientnet_b7

python -m src.train --config_path config/efficientnet_b6.yaml --exp_name b6
python -m src.train --config_path config/efficientnet_b7.yaml --exp_name b7
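If you want to queue several experiments from one script, the commands above could be wrapped as follows. This is a minimal sketch: the helpers build_train_command and run_all are hypothetical and not part of the repository; only the src.train module and its --config_path and --exp_name flags come from the commands above.

```python
import subprocess
import sys

# Experiments from the example above: (config file, experiment name).
EXPERIMENTS = [
    ("config/efficientnet_b6.yaml", "b6"),
    ("config/efficientnet_b7.yaml", "b7"),
]


def build_train_command(config_path, exp_name):
    """Build the argument list for one `python -m src.train` run."""
    return [
        sys.executable, "-m", "src.train",
        "--config_path", config_path,
        "--exp_name", exp_name,
    ]


def run_all(experiments, dry_run=True):
    """Run the experiments in order; with dry_run=True only print the commands."""
    commands = [build_train_command(cfg, name) for cfg, name in experiments]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the queue as soon as one training run fails.
            subprocess.run(cmd, check=True)
    return commands
```

Calling run_all(EXPERIMENTS) prints the two commands without executing them; run_all(EXPERIMENTS, dry_run=False) launches the training runs from the repository root.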

Step 2: Postprocess and ensemble

src/ensemble.py does the following:

Calculates the mean of the knn and logit predictions for each model.
Ensembles the predictions of the models specified by --model_dirs.
Saves the predictions as submission/{out_prefix}-{new_ratio}-{threshold}.csv.
Saves the pseudo labels as submission/pseudo_label_{out_prefix}.csv.
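To make the steps above concrete, here is a rough sketch of two-stage averaging followed by a threshold rule for new_individual. It is not the code in src/ensemble.py: the function names, the dictionary-based score format, the use of a plain mean across models, and the exact threshold rule are all assumptions for illustration, and the new_ratio parameter is not modeled.

```python
from collections import defaultdict

NEW_ID = "new_individual"  # competition label for previously unseen individuals


def average_scores(score_dicts):
    """Average individual_id -> score dicts; an ID missing from a dict counts as 0."""
    totals = defaultdict(float)
    for scores in score_dicts:
        for individual_id, score in scores.items():
            totals[individual_id] += score
    return {k: v / len(score_dicts) for k, v in totals.items()}


def top5_with_new(scores, threshold):
    """Rank IDs by score and insert NEW_ID before the first score below threshold."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    preds, inserted = [], False
    for individual_id, score in ranked:
        if not inserted and score < threshold:
            preds.append(NEW_ID)
            inserted = True
        preds.append(individual_id)
    if not inserted:
        preds.append(NEW_ID)
    return preds[:5]


# Toy scores for one test image (made-up numbers, two hypothetical models).
b6 = average_scores([{"a": 0.9, "b": 0.3}, {"a": 0.7, "b": 0.1}])  # knn, logit
b7 = average_scores([{"a": 0.8, "c": 0.4}, {"a": 0.6, "c": 0.2}])  # knn, logit
ensemble = average_scores([b6, b7])
print(top5_with_new(ensemble, threshold=0.5))
```

In this toy example only "a" scores above the threshold, so new_individual is ranked second, ahead of the low-confidence candidates.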

Predictions generated by charmq's repository are saved in the same format, so you can include them in the ensemble simply by specifying the paths to their model directories.

Example: Ensemble b6 and b7

python -m src.ensemble --model_dirs result/b6/-1 result/b7/-1 --out_prefix b6-b7

In our post-competition submissions, a single model (efficientnet_b7) achieved a score that would have ranked 3rd on the final leaderboard. We also confirmed that an ensemble of only two models (efficientnet_b6 and efficientnet_b7) would have won 1st place. Ensembling more backbones and charmq's models can achieve even better results.

Citation

@article{patton2023deep,
  title={A deep learning approach to photo--identification demonstrates high performance on two dozen cetacean species},
  author={Patton, Philip T and Cheeseman, Ted and Abe, Kenshin and Yamaguchi, Taiki and Reade, Walter and Southerland, Ken and Howard, Addison and Oleson, Erin M and Allen, Jason B and Ashe, Erin and others},
  journal={Methods in ecology and evolution},
  volume={14},
  number={10},
  pages={2611--2625},
  year={2023},
  publisher={Wiley Online Library}
}

Links

For an overview of our key ideas and a detailed explanation, please also refer to the 1st Place Solution post in the Kaggle discussion.
My teammate charmq's repository.
