The input is a PDF file whose layout you want to parse. sample.pdf is provided as a sample.
(Image from https://github.com/Unstructured-IO/unstructured-inference/blob/main/sample-docs/layout-parser-paper.pdf)
The output is a set of images with bounding boxes drawn on them.
The onnx and prototxt files are downloaded automatically on the first run. An Internet connection is required during the download.
To run on the sample PDF,
$ python layout_parsing.py --savepath <image folder>
To parse the layout of an image,
$ python layout_parsing.py --savepath <image folder> --input xxx.jpg
To specify an input PDF,
$ python layout_parsing.py --savepath <image folder> --input xxx.pdf -fp
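If you run the script from another Python program, the three invocations above can be assembled with a small helper. This is only a sketch: `build_command` is a hypothetical name, not part of the script, and it assumes that `-fp` is passed whenever the input is a PDF, which is inferred from the examples above and not from the script's documentation.

```python
import shlex
from pathlib import Path
from typing import List, Optional

SCRIPT = "layout_parsing.py"  # script name as used in the sample command


def build_command(savepath: str, input_path: Optional[str] = None) -> List[str]:
    """Build the argument list for one run (hypothetical helper)."""
    cmd = ["python", SCRIPT, "--savepath", savepath]
    if input_path is not None:
        cmd += ["--input", input_path]
        # Assumption: -fp accompanies PDF input, as in the PDF example.
        if Path(input_path).suffix.lower() == ".pdf":
            cmd.append("-fp")
    return cmd


if __name__ == "__main__":
    # Print the command line for each of the three cases.
    for path in (None, "xxx.jpg", "xxx.pdf"):
        print(shlex.join(build_command("out", path)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to run the script.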
PyTorch
ONNX opset = 11