fix word/glyph ID's, bound boxes and add output precision option #372

Merged
andbue merged 10 commits into Calamari-OCR:master from jahtz:id_pos
May 12, 2025

@jahtz commented May 12, 2025

This PR improves the generation of word/glyph IDs, replaces the glyph output CLI option with a more general output precision option, and fixes word/glyph positions.

Changes

  • Word ID: The word counter now resets for each line and starts from 1. To keep IDs unique, it is appended to the line ID in the format: <line_id>_w<word_counter>
  • Glyph ID: The glyph counter now resets for each word and starts from 1, in the format: <line_id>_w<word_counter>_g<glyph_counter>
  • CLI Option Update:
    • The option --data.output_glyphs is replaced with --data.output_precision.
    • Accepted values: LINES, WORDS, GLYPHS (default: LINES).
  • Fixed glyph/word bounding boxes (@andbue)
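
To illustrate the ID scheme described above, here is a minimal sketch. It is not the code from this PR: the function name `build_ids` and the example line ID `l3` are hypothetical, and the words are assumed to be already split. Only the two ID formats are taken from the description.

```python
from typing import Dict, List


def build_ids(line_id: str, words: List[str]) -> Dict[str, List[str]]:
    """Map each word ID of one line to the glyph IDs of its characters.

    Hypothetical helper for illustration only, not part of Calamari.
    """
    ids: Dict[str, List[str]] = {}
    # The word counter starts at 1 and is scoped to this line.
    for word_counter, word in enumerate(words, start=1):
        word_id = f"{line_id}_w{word_counter}"
        # The glyph counter starts at 1 again for every word.
        ids[word_id] = [
            f"{word_id}_g{glyph_counter}"
            for glyph_counter, _ in enumerate(word, start=1)
        ]
    return ids


for word_id, glyph_ids in build_ids("l3", ["ab", "c"]).items():
    print(word_id, glyph_ids)
# l3_w1 ['l3_w1_g1', 'l3_w1_g2']
# l3_w2 ['l3_w2_g1']
```

Because every ID is prefixed with the line ID, the counters can restart per line and per word without producing duplicates, provided the line IDs themselves are unique within the page.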

codecov bot commented May 12, 2025

Codecov Report

Attention: Patch coverage is 0% with 72 lines in your changes missing coverage. Please review.

Please upload report for BASE (master@7b03ba9).

Files with missing lines                               Patch %  Lines
...amari_ocr/ocr/dataset/datareader/pagexml/reader.py  0.00%    19 Missing
...atareader/generated_line_dataset/line_generator.py  0.00%    15 Missing
calamari_ocr/ocr/predict/params.py                     0.00%    15 Missing
calamari_ocr/ocr/voting/confidence_voter.py            0.00%    10 Missing
...r/ocr/dataset/imageprocessors/final_preparation.py  0.00%    4 Missing
...r/dataset/imageprocessors/data_range_normalizer.py  0.00%    3 Missing
calamari_ocr/scripts/predict.py                        0.00%    2 Missing
calamari_ocr/ocr/model/ctcdecoder/ctc_decoder.py       0.00%    1 Missing
calamari_ocr/ocr/model/ensemblemodel.py                0.00%    1 Missing
calamari_ocr/ocr/model/model.py                        0.00%    1 Missing
... and 1 more
Additional details and impacted files
@@           Coverage Diff            @@
##             master    #372   +/-   ##
========================================
  Coverage          ?   0.00%           
========================================
  Files             ?     129           
  Lines             ?    6906           
  Branches          ?       0           
========================================
  Hits              ?       0           
  Misses            ?    6906           
  Partials          ?       0           

@andbue andbue merged commit 1c1b793 into Calamari-OCR:master May 12, 2025
7 checks passed