Failed precondition: Python interpreter state is not initialized. #369

@MGJamJam

Hello!
When training a model, I get the following error:

INFO     2024-11-03 19:13:30,783                         FOLD 0: INFO     2024-11-03 19:13:30,684 calamari_ocr.ocr.training.trai: Training finished
INFO     2024-11-03 19:13:30,884                         FOLD 0: 2024-11-03 19:13:30.833537: W tensorflow/core/kernels/data/generator_dataset_op.cc:107] Error occurred when finalizing GeneratorDataset iterator: Failed precondition: Python interpreter state is not initialized. The process may be terminated.
INFO     2024-11-03 19:13:30,884                         FOLD 0: 	 [[{{node PyFunc}}]]
CRITICAL 2024-11-03 19:13:39,935             tfaip.util.logging: Uncaught exception
Traceback (most recent call last):
  File "/home/fablab/miniconda3/envs/test_gpu/bin/calamari-cross-fold-train", line 8, in <module>
    sys.exit(run())
  File "/home/fablab/miniconda3/envs/test_gpu/lib/python3.7/site-packages/calamari_ocr/scripts/cross_fold_train.py", line 13, in run
    return main(parse_args())
  File "/home/fablab/miniconda3/envs/test_gpu/lib/python3.7/site-packages/calamari_ocr/scripts/cross_fold_train.py", line 31, in main
    trainer.run()
  File "/home/fablab/miniconda3/envs/test_gpu/lib/python3.7/site-packages/calamari_ocr/ocr/training/cross_fold_trainer.py", line 321, in run
    pool.map_async(train_individual_model, run_args).get()
  File "/home/fablab/miniconda3/envs/test_gpu/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
  File "/home/fablab/miniconda3/envs/test_gpu/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/fablab/miniconda3/envs/test_gpu/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/fablab/miniconda3/envs/test_gpu/lib/python3.7/site-packages/calamari_ocr/ocr/training/cross_fold_trainer.py", line 53, in train_individual_model
    verbose=run_args.get("verbose", False),
  File "/home/fablab/miniconda3/envs/test_gpu/lib/python3.7/site-packages/calamari_ocr/utils/multiprocessing.py", line 83, in run
    raise Exception("Error: Process finished with code {}".format(process.returncode))
Exception: Error: Process finished with code -9

Can you help me understand what might cause this error and what its impact is? Is this error even relevant, given that it is printed after the "Training finished" message?

I am using:

  • WSL with Ubuntu 22.04.3 LTS.
  • Python 3.7
  • TensorFlow 2.6.0
  • CUDA 11.2
  • cuDNN 8
  • Calamari 2.2.2

The training command I used was:

CUDA_VISIBLE_DEVICES=0 calamari-cross-fold-train \
    --train PageXML \
    --train.images "training_data_senat_reduced/*.png" \
    --temporary_dir calamari_cd_training_output_warmstart_gothic_03_11 \
    --keep_temporary_files True \
    --scenario.tensorboard_logger_history_size 50 \
    --device.gpus 0 \
    --codec.include {string.digits + string.ascii_letters} \
    --best_models_dir "calamari_cf_training_03_11" \
    --weights "calamari_models_experimental/deep3_htr-gothic/0.ckpt.json" \
              "calamari_models_experimental/deep3_htr-gothic/1.ckpt.json" \
              "calamari_models_experimental/deep3_htr-gothic/2.ckpt.json" \
              "calamari_models_experimental/deep3_htr-gothic/3.ckpt.json" \
              "calamari_models_experimental/deep3_htr-gothic/4.ckpt.json" \
    --n_augmentations=5 \
    --network deep3 \
    |& tee output_cf_03_11.txt
