How to Diagnose Overfitting and Underfitting of Tesseract Models? #3587

@Mann1904

I'm using the following command for Tesseract training:
training/lstmtraining --debug_interval 100 \
  --continue_from ~/tesstutorial/eng_from_chi/eng.lstm \
  --traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
  --append_index 5 --net_spec '[Lfx256 O1c111]' \
  --model_output ~/tesstutorial/eng_from_chi/base \
  --train_listfile ~/tesstutorial/engtrain/eng.training_files.txt \
  --eval_listfile /tesstutorial/engeval/eng.validation_files.txt \
  --max_iterations 100000 &>/tesstutorial/eng_from_chi/basetrain.log

I'm passing --eval_listfile, but the log never reports an eval char error rate. How can I diagnose overfitting and underfitting of the Tesseract model, and how does Tesseract use the eval_listfile during training? The training log looks like this:

2 Percent improvement time=37, best error was 14.054 @ 807
At iteration 844/1300/1302, Mean rms=1.441%, delta=3.219%, char train=9.934%, word train=29.023%, skip ratio=0.2%, New best char error = 9.934 Transitioned to stage 1 wrote best model:/home/ocr/tesseract_annotated_images/training_data_phase_1/model/cn_id_in_9.934_844.checkpoint wrote checkpoint.

2 Percent improvement time=74, best error was 14.054 @ 807
At iteration 881/1400/1402, Mean rms=1.307%, delta=2.858%, char train=8.238%, word train=25.18%, skip ratio=0.2%, New best char error = 8.238 wrote best model:/home/ocr/tesseract_annotated_images/training_data_phase_1/model/cn_id_in_8.238_881.checkpoint wrote checkpoint.

2 Percent improvement time=76, best error was 9.934 @ 844
At iteration 920/1500/1502, Mean rms=1.205%, delta=2.464%, char train=6.985%, word train=22.086%, skip ratio=0.2%, New best char error = 6.985 wrote best model:/home/ocr/tesseract_annotated_images/training_data_phase_1/model/cn_id_in_6.985_920.checkpoint wrote checkpoint.

2 Percent improvement time=104, best error was 9.934 @ 844
At iteration 948/1600/1602, Mean rms=1.137%, delta=2.376%, char train=6.547%, word train=20.674%, skip ratio=0.2%, New best char error = 6.547 wrote best model:/home/ocr/tesseract_annotated_images/training_data_phase_1/model/cn_id_in_6.547_948.checkpoint wrote checkpoint.

2 Percent improvement time=86, best error was 8.238 @ 881
At iteration 967/1700/1702, Mean rms=1.038%, delta=1.926%, char train=5.528%, word train=18.608%, skip ratio=0.2%, New best char error = 5.528 wrote best model:/home/ocr/tesseract_annotated_images/training_data_phase_1/model/cn_id_in_5.528_967.checkpoint wrote checkpoint.

2 Percent improvement time=76, best error was 6.985 @ 920
At iteration 996/1800/1802, Mean rms=0.974%, delta=1.84%, char train=4.978%, word train=16.795%, skip ratio=0.2%, New best char error = 4.978 wrote best model:/home/ocr/tesseract_annotated_images/training_data_phase_1/model/cn_id_in_4.978_996.checkpoint wrote checkpoint.
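
One possible way to compare training and evaluation error is to pull the training figures and checkpoint paths out of a log like the one above, then evaluate each checkpoint on the held-out list with lstmeval. The Python sketch below is only an illustration: the names `LOG_LINE`, `LogEntry`, `parse_log`, and `lstmeval_command` are hypothetical helpers, the regular expression is fitted to the log lines shown in this post and may need adjusting for other Tesseract versions, and the sketch assumes the first of the three iteration numbers is the one used in the checkpoint file name.

```python
import re
from typing import List, NamedTuple, Optional

# Pattern fitted to the "At iteration ..." lines in the log above (an assumption,
# not an official format specification). "." does not match newlines, so each
# match stays within one log line.
LOG_LINE = re.compile(
    r"At iteration (?P<iteration>\d+)/\d+/\d+, "
    r".*?char train=(?P<char>[\d.]+)%, word train=(?P<word>[\d.]+)%"
    r"(?:.*?wrote best model:(?P<checkpoint>\S+))?"
)


class LogEntry(NamedTuple):
    iteration: int             # first of the three iteration counters
    char_train: float          # character error rate on training data, in percent
    word_train: float          # word error rate on training data, in percent
    checkpoint: Optional[str]  # path after "wrote best model:", if present


def parse_log(text: str) -> List[LogEntry]:
    """Extract training error figures and checkpoint paths from log text."""
    entries = []
    for match in LOG_LINE.finditer(text):
        entries.append(
            LogEntry(
                iteration=int(match.group("iteration")),
                char_train=float(match.group("char")),
                word_train=float(match.group("word")),
                checkpoint=match.group("checkpoint"),
            )
        )
    return entries


def lstmeval_command(checkpoint: str, traineddata: str, eval_listfile: str) -> List[str]:
    """Build (but do not run) an lstmeval invocation for one checkpoint."""
    return [
        "lstmeval",
        "--model", checkpoint,
        "--traineddata", traineddata,
        "--eval_listfile", eval_listfile,
    ]
```

Each command returned by `lstmeval_command` could be passed to `subprocess.run` to get the evaluation error for that checkpoint. If the training error keeps falling while the evaluation error stalls or rises, that suggests overfitting. If both stay high, that suggests underfitting.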
