In the script I'm working with, I need to redact these parts before extracting the text from the document, because they break up the text. I can't remove them with a regex, since the script has to work with different documents. Is there a way to find the header/footer coordinates so I can put them in a Rect and redact them?
Replies: 1 comment
PDF knows nothing about such things as "header" or "footer".
You have to rely on "outside" knowledge.
But once you have the respective bboxes, you can of course redact those text contents away.
So either you know beforehand that doc type x has headers/footers at positions y and z, or you must extract the full text first and apply whatever heuristics you can to find and eliminate those parts from the extraction output.
In the latter case you obviously no longer need to redact anything.
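
To illustrate the second option, here is a minimal sketch of one possible heuristic, not a definitive method: treat a text block as header/footer if it sits in the top or bottom margin band and its text (with digits normalized, so page numbers match) repeats on most pages. The function name `strip_repeated_margins`, the `margin` and `min_share` parameters, and the 10% / 60% defaults are assumptions made up for this example. Coordinates are assumed to have the origin at the top-left with y growing downward.

```python
import math
import re
from collections import Counter


def strip_repeated_margins(pages, page_height, margin=0.1, min_share=0.6):
    """Drop margin blocks whose text repeats across pages.

    pages: list of pages, each a list of (x0, y0, x1, y1, text) tuples.
    Returns one string of remaining text per page.
    All names and defaults here are hypothetical, chosen for illustration.
    """
    top = page_height * margin           # lower edge of the header band
    bottom = page_height * (1 - margin)  # upper edge of the footer band

    def in_margin(block):
        _, y0, _, y1, _ = block
        return y1 <= top or y0 >= bottom

    def key(text):
        # Replace digit runs so "Page 1" and "Page 2" compare equal.
        return re.sub(r"\d+", "#", text.strip())

    # Count on how many pages each normalized margin text occurs.
    counts = Counter()
    for blocks in pages:
        counts.update({key(b[4]) for b in blocks if in_margin(b)})

    # A margin text must appear on at least two pages, and on at least
    # min_share of all pages, to be treated as a header/footer.
    threshold = max(2, math.ceil(min_share * len(pages)))
    repeated = {k for k, n in counts.items() if n >= threshold}

    return [
        "\n".join(
            b[4].strip()
            for b in blocks
            if not (in_margin(b) and key(b[4]) in repeated)
        )
        for blocks in pages
    ]
```

With PyMuPDF, the input could be built from `page.get_text("blocks")`, whose tuples start with `x0, y0, x1, y1, text`, and `page.rect.height`. Documents with mixed page sizes or headers that change per chapter would need a more careful variant.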