Hello!
When training a model, I get the following error:
INFO 2024-11-03 19:13:30,783 FOLD 0: INFO 2024-11-03 19:13:30,684 calamari_ocr.ocr.training.trai: Training finished
INFO 2024-11-03 19:13:30,884 FOLD 0: 2024-11-03 19:13:30.833537: W tensorflow/core/kernels/data/generator_dataset_op.cc:107] Error occurred when finalizing GeneratorDataset iterator: Failed precondition: Python interpreter state is not initialized. The process may be terminated.
INFO 2024-11-03 19:13:30,884 FOLD 0: [[{{node PyFunc}}]]
CRITICAL 2024-11-03 19:13:39,935 tfaip.util.logging: Uncaught exception
Traceback (most recent call last):
  File "/home/fablab/miniconda3/envs/test_gpu/bin/calamari-cross-fold-train", line 8, in <module>
    sys.exit(run())
  File "/home/fablab/miniconda3/envs/test_gpu/lib/python3.7/site-packages/calamari_ocr/scripts/cross_fold_train.py", line 13, in run
    return main(parse_args())
  File "/home/fablab/miniconda3/envs/test_gpu/lib/python3.7/site-packages/calamari_ocr/scripts/cross_fold_train.py", line 31, in main
    trainer.run()
  File "/home/fablab/miniconda3/envs/test_gpu/lib/python3.7/site-packages/calamari_ocr/ocr/training/cross_fold_trainer.py", line 321, in run
    pool.map_async(train_individual_model, run_args).get()
  File "/home/fablab/miniconda3/envs/test_gpu/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
  File "/home/fablab/miniconda3/envs/test_gpu/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/fablab/miniconda3/envs/test_gpu/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/fablab/miniconda3/envs/test_gpu/lib/python3.7/site-packages/calamari_ocr/ocr/training/cross_fold_trainer.py", line 53, in train_individual_model
    verbose=run_args.get("verbose", False),
  File "/home/fablab/miniconda3/envs/test_gpu/lib/python3.7/site-packages/calamari_ocr/utils/multiprocessing.py", line 83, in run
    raise Exception("Error: Process finished with code {}".format(process.returncode))
Exception: Error: Process finished with code -9
Can you help me understand what might cause this error and what impact it has? Is it even relevant, given that it is printed after the "Training finished" message?
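For reference, the -9 in the last line of the traceback can be decoded with a few lines of Python. This is only a minimal sketch. It assumes that the `process.returncode` shown in the traceback follows the convention of Python's `subprocess` module, where a negative value -N on POSIX means the child process was terminated by signal N. `describe_returncode` is a hypothetical helper name used for illustration and is not part of calamari.

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Turn a subprocess-style return code into a readable description.

    Assumption: negative codes follow the POSIX convention of
    subprocess.Popen.returncode (-N means "killed by signal N").
    """
    if returncode == 0:
        return "exited normally"
    if returncode > 0:
        return f"exited with status {returncode}"
    signum = -returncode
    try:
        # Look up the symbolic name of the signal on this platform.
        name = signal.Signals(signum).name
    except ValueError:
        # The signal number has no name on this platform.
        name = "unknown signal"
    return f"terminated by signal {signum} ({name})"


if __name__ == "__main__":
    # The code reported in the traceback above.
    print(describe_returncode(-9))
```

On Linux, signal 9 is SIGKILL, which a process cannot catch or handle. The kernel's out-of-memory killer is one common sender of it, but whether that is what happened in this run is only a guess.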
I am using:
- WSL with Ubuntu 22.04.3 LTS.
- Python 3.7
- TensorFlow 2.6.0
- CUDA 11.2
- cuDNN 8
- Calamari 2.2.2
The training command I used was:
CUDA_VISIBLE_DEVICES=0 calamari-cross-fold-train \
--train PageXML \
--train.images "training_data_senat_reduced/*.png" \
--temporary_dir calamari_cd_training_output_warmstart_gothic_03_11 \
--keep_temporary_files True \
--scenario.tensorboard_logger_history_size 50 \
--device.gpus 0 \
--codec.include {string.digits + string.ascii_letters} \
--best_models_dir "calamari_cf_training_03_11" \
--weights "calamari_models_experimental/deep3_htr-gothic/0.ckpt.json" \
"calamari_models_experimental/deep3_htr-gothic/1.ckpt.json" \
"calamari_models_experimental/deep3_htr-gothic/2.ckpt.json" \
"calamari_models_experimental/deep3_htr-gothic/3.ckpt.json" \
"calamari_models_experimental/deep3_htr-gothic/4.ckpt.json" \
--n_augmentations=5 \
--network deep3 \
|& tee output_cf_03_11.txt
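A side note on the `--codec.include` argument in the command above: the shell does not evaluate Python expressions, so `{string.digits + string.ascii_letters}` is passed through as literal text. If the intent was to pass the characters themselves (an assumption, since the post does not say), the following sketch prints what that expression evaluates to in Python. How calamari expects the value of `--codec.include` to be formatted is not covered here.

```python
import string

# Evaluate the expression that appears inside the braces in the command.
chars = string.digits + string.ascii_letters

if __name__ == "__main__":
    # Prints the 62 characters: digits, then lowercase, then uppercase letters.
    print(chars)
    print(len(chars))
```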