
Text behind drawing #1775

Jun 25, 2022 · 2 comments · 1 reply

I am not sure I understand what you actually want:

  • Do you want to extract text regardless of whether it is visible?
  • Or do you want to find out whether text is invisible, even if it is not marked as such by text render mode 3 (Tr 3)?

If it is the first:
This is straightforward: text extraction always works as long as the content actually is text. Keep in mind that not everything that looks like text is text.

The second is a little trickier, because text extraction will not tell you anything about visibility.
But there are still ways to find out:
page.get_bboxlog() returns a list of rectangles for everything shown on the page, each together with the type of content it wraps: text (including whether it uses Tr 3), images or drawings.
The sequence in the list repr…
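
A minimal sketch of how such a list could be checked for hidden text. It does not import PyMuPDF. It works on plain `(type, bbox)` tuples, which is the shape `page.get_bboxlog()` is understood to return. The type names used here (`"fill-text"`, `"stroke-text"`, `"ignore-text"`, `"fill-path"`, `"fill-image"`, `"fill-shade"`) are assumptions taken from the PyMuPDF documentation and should be checked against your installed version. The helper names `covers` and `hidden_text_boxes` are made up for illustration. The sketch also assumes that later entries in the list are painted on top of earlier ones.

```python
# Assumed type names; verify against your PyMuPDF version.
TEXT_TYPES = {"fill-text", "stroke-text", "ignore-text"}
COVERING_TYPES = {"fill-path", "fill-image", "fill-shade"}


def covers(outer, inner):
    """Return True if rectangle `outer` fully contains rectangle `inner`.

    Rectangles are (x0, y0, x1, y1) tuples.
    """
    return (
        outer[0] <= inner[0]
        and outer[1] <= inner[1]
        and outer[2] >= inner[2]
        and outer[3] >= inner[3]
    )


def hidden_text_boxes(bboxlog):
    """Return (bbox, reason) pairs for text that is probably not visible.

    reason is "Tr 3" for text marked invisible, or "covered" for text whose
    bbox lies inside the bbox of a later filled drawing, image or shading.
    """
    hidden = []
    for i, (kind, bbox) in enumerate(bboxlog):
        if kind not in TEXT_TYPES:
            continue
        if kind == "ignore-text":
            hidden.append((bbox, "Tr 3"))
            continue
        # Only entries that come later in the list are considered,
        # on the assumption that they are drawn on top.
        for later_kind, later_bbox in bboxlog[i + 1:]:
            if later_kind in COVERING_TYPES and covers(later_bbox, bbox):
                hidden.append((bbox, "covered"))
                break
    return hidden
```

With PyMuPDF installed, the input would presumably come from `page.get_bboxlog()` on a loaded page. The "covered" result is only a hint: a bounding box containing the text does not prove that the drawing is opaque or that its actual shape covers the text.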

Answer selected by caiocesarrm