Unstructured inference (Layout parsing)

Input

The input is a PDF file whose layout you want to parse. sample.pdf is provided as a sample.

Input

(Image from https://github.com/Unstructured-IO/unstructured-inference/blob/main/sample-docs/layout-parser-paper.pdf)

Output

The outputs are images annotated with the bounding boxes of the detected layout elements.

Output

Usage

The ONNX and prototxt files are downloaded automatically on the first run, so an Internet connection is required at that time.
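The download-on-first-run behavior described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the helper name `ensure_model_files`, its parameters, and any file names or URLs passed to it are assumptions made for the example.

```python
import os
import urllib.request


def ensure_model_files(filenames, remote_dir, local_dir=".",
                       fetch=urllib.request.urlretrieve):
    """Download each file in `filenames` unless it already exists locally.

    `remote_dir` is a hypothetical base URL; the real location of the
    model files is not stated here. Returns the names that were
    downloaded during this call.
    """
    downloaded = []
    for name in filenames:
        local_path = os.path.join(local_dir, name)
        if os.path.exists(local_path):
            # Present from an earlier run: no network access is needed.
            continue
        fetch(remote_dir.rstrip("/") + "/" + name, local_path)
        downloaded.append(name)
    return downloaded
```

On a second run the files are already on disk, so the function returns an empty list and no connection is needed. The `fetch` parameter exists only so the download step can be replaced, for example in an offline test.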

To run on the sample PDF:

$ python layout_parsing.py --savepath <image folder>

To parse the layout of an image:

$ python layout_parsing.py --savepath <image folder> --input xxx.jpg

To specify an input PDF:

$ python layout_parsing.py --savepath <image folder> --input xxx.pdf -fp
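When processing several files, it may be convenient to build these command lines from Python. The sketch below uses only the options shown above (`--savepath`, `--input`, `-fp`). The helper `build_command` is hypothetical, and appending `-fp` only for `.pdf` inputs is an assumption inferred from the examples, since the flag's exact meaning is not documented here.

```python
import sys


def build_command(savepath, input_path=None, script="layout_parsing.py"):
    """Return the argument list for one run of the layout parsing script."""
    cmd = [sys.executable, script, "--savepath", savepath]
    if input_path is not None:
        cmd += ["--input", input_path]
        if input_path.lower().endswith(".pdf"):
            # Assumption: -fp accompanies PDF inputs, as in the example above.
            cmd.append("-fp")
    return cmd
```

A command built this way can be passed to `subprocess.run(build_command("out", "xxx.pdf"), check=True)`. Calling `build_command("out")` with no input corresponds to the sample PDF run.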

Reference

Framework

PyTorch

Model format

ONNX opset = 11

Netron