Contains helper scripts to preprocess training data. No dataset is provided.
This repository contains submodules. Clone it using:

```
git clone --recurse-submodules https://github.com/Seeneva/ml-scripts.git
```

Or init submodules in an existing clone using:

```
git submodule update --init
```

The repository contains `./setup.py` and `./requirements.txt` with the list of required Python dependencies:
```
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

The YOLO model is used to detect objects on comic book pages. It supports the following classes:
| Class ID | Name | Description |
|---|---|---|
| 0 | speech_baloon | Single speech balloon |
| 1 | panel | Single panel on the page |
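For reference, YOLO annotations are plain-text label files: each image gets a *.txt file with the same base name, containing one line per object in the form `<class_id> <x_center> <y_center> <width> <height>`, with all coordinates normalized to the image size. The numbers below are made-up values for a page with one panel that contains one speech balloon:

```
1 0.270000 0.260000 0.480000 0.470000
0 0.310000 0.180000 0.220000 0.140000
```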
- Put all your comic book pages dataset into the `./yolo/dataset` directory. This directory shouldn't contain any subdirectories.
- Annotate each comic book page in the directory using the YOLO format and the supported classes. You can use tools like labelImg.
- Run the `./yolo_width_split.py` script to split 'Double Page Spread' images into a few separate images. The original wide images will be moved into the `./yolo/data_wide_backup` directory.
- (Optionally) Run the `./yolo_stats.py` script to calculate dataset statistics.
- Run the `./yolo_train_data.py` script to generate the files required to train the YOLO model. All files will be placed into the `./yolo` directory.
- Create a YOLO `yolo-obj.cfg` file and put it into the `./yolo` directory.
- Train your model using YOLOv4-tiny darknet.
- (Optionally) Convert the model into TensorFlow Lite format:

  ```
  git clone -b config https://github.com/Seeneva/tensorflow-yolov4-tflite.git converter

  python ./converter/save_model.py --weights ${YOUR_YOLO_BACKUP_PATH}/yolo-obj_final.weights --output ${YOUR_TF_BACKUP_PATH}/tf --score_thres 0.7 --input_size 480x736 --model yolov4 --tiny --framework tflite

  python ./converter/convert_tflite.py --weights ${YOUR_TF_BACKUP_PATH}/tf --output ${YOUR_TF_BACKUP_PATH}/tf/seeneva.tflite --input_size 480x736 --quantize_mode float16
  ```
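If you want to sanity-check the converted `seeneva.tflite` file, you can load it with the TensorFlow Lite interpreter and run a dummy tensor through it. This is only a sketch: the model path is a placeholder for `${YOUR_TF_BACKUP_PATH}/tf/seeneva.tflite`, and the exact input normalization and output layout (boxes, scores, classes) depend on the converter, so read shapes and dtypes from the interpreter instead of hard-coding them.

```python
import numpy as np
import tensorflow as tf

# Load the converted model (placeholder path).
interpreter = tf.lite.Interpreter(model_path="tf/seeneva.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a dummy tensor matching the model's reported input shape and dtype
# (see the --input_size flag used during conversion above).
dummy = np.random.rand(*input_details[0]["shape"]).astype(input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()

# Print output tensor shapes; the exact layout depends on the converter.
for out in output_details:
    print(out["name"], interpreter.get_tensor(out["index"]).shape)
```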
The OCR model is used to recognize text inside speech balloons.

You should install Tesseract on your system and make sure that your environment can run `make` commands.
- Run `./yolo_extract_objects.py --class_id 0` to crop all speech balloons from the YOLO dataset and place them into the `./yolo/objects/0` directory.
- Now you need to crop each text line from the cropped speech balloons and save them as separate *.png files in the `./tesseract/${LANG_NAME}_seeneva-ground-truth` directory.
- Create a *.gt.txt file for each text line *.png file in `./tesseract/${LANG_NAME}_seeneva-ground-truth`. You can use `./tesseract_cteate_txt.py` to automate it (see the sketch after this list).
- Write the content of each line *.png into its *.gt.txt file. Usually all letters should be uppercased, so for a line image 1.png containing the word "PROGRAMMED" you should write PROGRAMMED into the 1.gt.txt file.
- Run `./tesseract_check_data.py` to check that the dataset is fine.
- Run `./tesseract_train.sh` to start training.
- (Optionally) Convert into fast (int) format using:

  ```
  combine_tessdata -c ./tesstrain/data/${LANG_NAME}_seeneva.traineddata
  ```
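As a rough illustration of the ground-truth preparation step above, a helper like `./tesseract_cteate_txt.py` might simply create a matching *.gt.txt file next to every line image so the transcriptions can be typed in afterwards. The sketch below is an assumption about what such a script does (the real script in this repository may behave differently), and `eng` is just an example value for `${LANG_NAME}`:

```python
from pathlib import Path

# Hypothetical sketch: create an empty .gt.txt transcription file for every
# line image in the ground-truth directory. "eng" stands in for ${LANG_NAME}.
ground_truth_dir = Path("./tesseract/eng_seeneva-ground-truth")

for png in sorted(ground_truth_dir.glob("*.png")):
    gt_file = png.parent / (png.stem + ".gt.txt")  # e.g. 1.png -> 1.gt.txt
    if not gt_file.exists():
        gt_file.touch()  # fill in the (usually uppercased) text by hand afterwards
```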
Copyright © 2021 Sergei Solodovnikov under the Apache License 2.0.

Note that dependencies may have different licenses. See 3RD-PARTY-LICENSES for more information.
