
@vlachvojta commented Mar 22, 2025

Old behavior

  • words are aligned into word bounding boxes using CTC logits, but only when exporting to ALTO XML
  • the alignment is not stored anywhere else
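For context on how CTC logits can yield word boxes at all, here is a minimal sketch of the general idea, not the code from this PR. A greedy (best-path) decode records which frames emit each character, characters are grouped into words at spaces, and frame indices are mapped to x-coordinates. The function name align_words_from_logits, the blank index 0, and the assumption that logit frames are spread uniformly across the line's horizontal extent are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def align_words_from_logits(
    logits: Sequence[Sequence[float]],
    chars: Dict[int, str],
    line_x0: float,
    line_x1: float,
    blank: int = 0,
) -> List[Tuple[str, float, float]]:
    """Hypothetical sketch: best-path CTC decoding that keeps frame positions.

    logits: one row of class scores per frame.
    chars: maps class index -> character (" " separates words).
    Assumes the frames cover [line_x0, line_x1] uniformly.
    """
    frame_width = (line_x1 - line_x0) / len(logits)
    best = [max(range(len(row)), key=row.__getitem__) for row in logits]

    # Collapse repeats and drop blanks, remembering [char, first_frame, last_frame].
    emitted: List[list] = []
    prev = blank
    for t, c in enumerate(best):
        if c != blank:
            if c == prev:
                emitted[-1][2] = t  # repeated label extends the same character
            else:
                emitted.append([chars[c], t, t])
        prev = c

    # Group characters into words at spaces and convert frames to x-coordinates.
    words: List[Tuple[str, float, float]] = []
    current: List[list] = []
    for item in emitted + [[" ", -1, -1]]:  # sentinel space flushes the last word
        if item[0] == " ":
            if current:
                text = "".join(ch for ch, _, _ in current)
                x_start = line_x0 + current[0][1] * frame_width
                x_end = line_x0 + (current[-1][2] + 1) * frame_width
                words.append((text, x_start, x_end))
            current = []
        else:
            current.append(item)
    return words


# Toy example: one-hot "logits" for the frame path h h _ i ' ' ' ' o _ _ h
chars = {1: "h", 2: "i", 3: " ", 4: "o"}
path = [1, 1, 0, 2, 3, 3, 4, 0, 0, 1]
logits = [[1.0 if k == c else 0.0 for k in range(5)] for c in path]
print(align_words_from_logits(logits, chars, 0, 100))
# [('hi', 0.0, 40.0), ('oh', 60.0, 100.0)]
```

Real models may put the blank at a different index, which is why it is a parameter here.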

New behavior

  • word alignment: words with their bounding boxes, transcription (and more) are stored as Word elements in core/layout (as a list inside each TextLine)
  • users can enable word alignment with OCR.ALIGN_WORDS = yes in the .ini config
    • (it is disabled by default because word alignment adds processing time)
  • aligned words are stored in PAGE XML as Word elements with id, polygon, transcription, and transcription confidence
  • TextLine in core/layout stores the aligned words as instances of the new Word class, together with how they were aligned (from_logits, or mean_width as the fallback on error)
  • word polygons are rendered to the image using a thin black line
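The data model and the opt-in flag described above might look roughly like this. It is a sketch under assumptions: the field names, the word_alignment_method attribute, and the word_alignment_enabled helper are hypothetical, and OCR.ALIGN_WORDS is assumed to mean key ALIGN_WORDS in an [OCR] section read with Python's configparser.

```python
import configparser
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Word:
    """Hypothetical mirror of the per-word data this PR stores."""
    id: str
    polygon: List[Tuple[float, float]]
    transcription: str
    transcription_confidence: Optional[float] = None


@dataclass
class TextLine:
    """Only the fields relevant to word alignment are sketched here."""
    id: str
    transcription: str
    words: List[Word] = field(default_factory=list)
    # "from_logits", "mean_width" (error fallback), or None if not aligned.
    word_alignment_method: Optional[str] = None


def word_alignment_enabled(config: configparser.ConfigParser) -> bool:
    # Off unless the [OCR] section sets ALIGN_WORDS to a true value (yes/true/1/on).
    return config.getboolean("OCR", "ALIGN_WORDS", fallback=False)


config = configparser.ConfigParser()
config.read_string("[OCR]\nALIGN_WORDS = yes\n")
print(word_alignment_enabled(config))  # True
```

Using fallback=False keeps the feature disabled both when the key is missing and when the whole [OCR] section is absent.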

…ts alignment.

The new PAGE XML contains both the text line transcription and the individual words with their bounding boxes.
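A TextLine carrying both levels of transcription could be serialized roughly as below with xml.etree.ElementTree. This is an illustrative sketch, not the PR's exporter: build_text_line and the (id, polygon, text, conf) word tuples are hypothetical, the PAGE namespace and Baseline are omitted for brevity, and child order follows the PAGE schema (Coords, then Word elements, then the line's own TextEquiv).

```python
import xml.etree.ElementTree as ET
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def polygon_to_points(polygon: Sequence[Point]) -> str:
    # PAGE stores polygons as "x1,y1 x2,y2 ..." with integer coordinates.
    return " ".join(f"{int(round(x))},{int(round(y))}" for x, y in polygon)


def build_text_line(
    line_id: str,
    line_polygon: Sequence[Point],
    line_text: str,
    words: List[Tuple[str, Sequence[Point], str, float]],
) -> ET.Element:
    line_el = ET.Element("TextLine", id=line_id)
    ET.SubElement(line_el, "Coords", points=polygon_to_points(line_polygon))
    # One Word element per aligned word: id, polygon, transcription, confidence.
    for word_id, polygon, text, conf in words:
        word_el = ET.SubElement(line_el, "Word", id=word_id)
        ET.SubElement(word_el, "Coords", points=polygon_to_points(polygon))
        word_equiv = ET.SubElement(word_el, "TextEquiv", conf=f"{conf:.2f}")
        ET.SubElement(word_equiv, "Unicode").text = text
    # The whole-line transcription stays alongside the words.
    line_equiv = ET.SubElement(line_el, "TextEquiv")
    ET.SubElement(line_equiv, "Unicode").text = line_text
    return line_el


line = build_text_line(
    "l1",
    [(0, 0), (100, 0), (100, 20), (0, 20)],
    "hi oh",
    [
        ("l1_w0", [(0, 0), (40, 0), (40, 20), (0, 20)], "hi", 0.98),
        ("l1_w1", [(60, 0), (100, 0), (100, 20), (60, 20)], "oh", 0.91),
    ],
)
print(ET.tostring(line, encoding="unicode"))
```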

- when alignment is impossible, the fallback is still to assign the average width to every word
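A minimal sketch of such a mean-width fallback, assuming "average width" means the line's horizontal extent is split evenly among its whitespace-separated words; the function name mean_width_fallback and the (text, x_start, x_end) return format are hypothetical.

```python
from typing import List, Tuple


def mean_width_fallback(
    transcription: str, line_x0: float, line_x1: float
) -> List[Tuple[str, float, float]]:
    """Give every word the same width when logit-based alignment fails."""
    words = transcription.split()
    if not words:
        return []
    width = (line_x1 - line_x0) / len(words)
    return [
        (word, line_x0 + i * width, line_x0 + (i + 1) * width)
        for i, word in enumerate(words)
    ]


print(mean_width_fallback("hi oh", 0, 100))
# [('hi', 0.0, 50.0), ('oh', 50.0, 100.0)]
```

The boxes ignore actual character widths, so they are only a rough placement, which is why the alignment method is worth recording per line.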
@vlachvojta self-assigned this Mar 22, 2025
@vlachvojta added the enhancement (New feature or request) label Mar 22, 2025