Is there a way to detect header and footer coordinates? #1804

Answered by JorjMcKie
ghost asked this question in Q&A

PDF has no concept of a "header" or a "footer".
You have to rely on outside knowledge to locate them.
Once you have the respective bboxes, though, you can of course redact the text inside them.
So either you know beforehand that doc type x has headers/footers at positions y and z, or you must extract the full text first and apply heuristics of your own to find and eliminate those parts from the extraction output.
In the latter case you obviously no longer need to redact anything.
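One possible heuristic for the second route, sketched in plain Python: treat text as a header or footer if it sits in the top or bottom margin and repeats (ignoring digits such as page numbers) on most pages. Everything here is an assumption for illustration: the function names, the 10% margin, and the 60% repetition threshold. The input tuples `(x0, y0, x1, y1, text)` mirror the leading fields of what PyMuPDF's `page.get_text("blocks")` returns, so you could feed it `b[:5]` for each block together with `page.rect.height`.

```python
import re
from collections import Counter


def normalize(text):
    # Collapse whitespace and digits so "Page 3" and "Page 4" compare equal.
    return re.sub(r"\d+", "#", " ".join(text.split()))


def in_margin(y0, y1, height, margin):
    # True if the block lies entirely in the top or bottom margin band.
    return y1 <= height * margin or y0 >= height * (1 - margin)


def find_repeated_margin_text(pages, margin=0.1, min_share=0.6):
    """pages: list of (page_height, blocks), blocks being (x0, y0, x1, y1, text)."""
    counts = Counter()
    for height, blocks in pages:
        seen = set()  # count each candidate at most once per page
        for x0, y0, x1, y1, text in blocks:
            if text.strip() and in_margin(y0, y1, height, margin):
                seen.add(normalize(text))
        counts.update(seen)
    threshold = min_share * len(pages)
    return {key for key, n in counts.items() if n >= threshold}


def strip_headers_footers(pages, margin=0.1, min_share=0.6):
    """Return the text blocks of each page with suspected headers/footers dropped."""
    repeated = find_repeated_margin_text(pages, margin, min_share)
    result = []
    for height, blocks in pages:
        kept = [
            text
            for x0, y0, x1, y1, text in blocks
            if not (in_margin(y0, y1, height, margin) and normalize(text) in repeated)
        ]
        result.append(kept)
    return result
```

The thresholds will need tuning per document type. A heuristic like this can misfire on very short documents or on pages whose body text legitimately repeats near the page edge.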

This discussion was converted from issue #1803 on July 07, 2022 18:26.