Extracting underline texts from PDF #1756

Answered by JorjMcKie
jhines84 asked this question in Q&A
If, however, we have regular text and drawn lines, we can extract the text in the rectangle just above each line:

import fitz  # PyMuPDF

# "page" is assumed to be an already loaded fitz.Page
drawn_lines = [...]  # your identified lines, as (p1, p2) point pairs
blocks = page.get_text("dict", flags=fitz.TEXTFLAGS_TEXT)["blocks"]
max_lineheight = 0
for b in blocks:
    for l in b["lines"]:
        bbox = fitz.Rect(l["bbox"])
        if bbox.height > max_lineheight:
            max_lineheight = bbox.height
# we now have the max lineheight on this page
for p1, p2 in drawn_lines:
    # the rectangle "above" a drawn line, one max lineheight tall
    rect = fitz.Rect(p1.x, p1.y - max_lineheight, p2.x, p2.y)
    text = page.get_textbox(rect)  # text located inside that rectangle
    print(f"Underlined: '{text}'.")
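The snippet leaves open how the drawn lines get identified. One possible way, as a rough sketch only: filter the vector graphics of the page for near-horizontal line segments. The helper name `horizontal_lines` and its `tolerance` / `min_length` parameters are made up for illustration, and the input is assumed to be shaped like the list returned by `page.get_drawings()` (dicts whose "items" hold tuples such as `("l", p1, p2)`). `Point` is only a stand-in so the sketch runs without PyMuPDF; real `fitz.Point` objects also have `.x` and `.y`.

```python
from collections import namedtuple

# Stand-in for fitz.Point so this sketch runs without PyMuPDF installed.
Point = namedtuple("Point", "x y")


def horizontal_lines(drawings, tolerance=1.0, min_length=5.0):
    """Collect (p1, p2) pairs for near-horizontal line segments.

    Assumption: `drawings` looks like the output of page.get_drawings(),
    i.e. a list of dicts, each with an "items" list of tuples whose first
    element names the item type ("l" for a straight line).
    """
    lines = []
    for path in drawings:
        for item in path.get("items", []):
            if item[0] != "l":  # skip rectangles, curves, etc.
                continue
            p1, p2 = item[1], item[2]
            if abs(p1.y - p2.y) > tolerance:  # not horizontal enough
                continue
            if abs(p2.x - p1.x) < min_length:  # too short to be an underline
                continue
            if p1.x > p2.x:  # order the end points left to right
                p1, p2 = p2, p1
            lines.append((p1, p2))
    return lines


# Hand-made sample data in the assumed shape
sample = [
    {"items": [
        ("l", Point(100, 200), Point(50, 200.5)),  # horizontal, right to left
        ("l", Point(10, 10), Point(10, 100)),      # vertical, ignored
        ("re", (0, 0, 20, 20)),                    # rectangle, ignored
    ]}
]
drawn_lines = horizontal_lines(sample)
```

With PyMuPDF this would be called as `drawn_lines = horizontal_lines(page.get_drawings())`. Note that some PDFs draw underlines as very thin rectangles ("re" items) instead of lines; this sketch does not cover that case.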

Answer selected by JorjMcKie
This discussion was converted from issue #1755 on June 16, 2022 17:58.