Unstructured inference (Layout parsing)

Input

The input is a PDF file whose layout you want to parse. sample.pdf is provided as an example.

(Image from https://github.com/Unstructured-IO/unstructured-inference/blob/main/sample-docs/layout-parser-paper.pdf)

Output

The outputs are images with bounding boxes drawn on them.

Usage

The ONNX and prototxt files are downloaded automatically on the first run, so an Internet connection is required for that run.
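The download-on-first-run behavior can be pictured with the minimal sketch below. It is a generic illustration, not the code this script actually uses: the helper name `ensure_file` and its arguments are hypothetical, and the real download location is not given in this README.

```python
import urllib.request
from pathlib import Path


def ensure_file(dest, url):
    """Download url to dest unless dest already exists; return the path."""
    dest = Path(dest)
    if not dest.exists():
        # Only a run without a cached copy reaches this branch,
        # which is why network access is needed on the first run only.
        dest.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(dest))
    return dest
```

Later runs find the file already on disk and skip the network entirely.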

To parse the sample PDF:

$ python layout_parsing.py --savepath <image folder>

To parse the layout of an image:

$ python layout_parsing.py --savepath <image folder> --input xxx.jpg

To specify an input PDF:

$ python layout_parsing.py --savepath <image folder> --input xxx.pdf -fp
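To process several files in one go, the commands above can be wrapped in a small driver. This is a sketch under stated assumptions: `build_command` and `run_all` are hypothetical helper names, only the flags shown in the usage above are used, and `-fp` is assumed to be needed for every PDF input because that is the only case in which the usage shows it.

```python
import subprocess
import sys
from pathlib import Path


def build_command(savepath, input_path=None):
    """Build the argument list for layout_parsing.py from the usage above."""
    cmd = [sys.executable, "layout_parsing.py", "--savepath", str(savepath)]
    if input_path is not None:
        cmd += ["--input", str(input_path)]
        # Assumption: -fp accompanies PDF inputs, as in the usage example.
        if Path(input_path).suffix.lower() == ".pdf":
            cmd.append("-fp")
    return cmd


def run_all(inputs, savepath):
    """Run the script once per input file, stopping at the first failure."""
    for path in inputs:
        subprocess.run(build_command(savepath, path), check=True)
```

For example, `run_all(["xxx.jpg", "xxx.pdf"], "results")` would invoke the script twice, and should be run from the directory that contains layout_parsing.py.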

Reference

unstructured-inference

Framework

PyTorch

Model format

ONNX opset = 11

Netron

layout_parsing_yolox.onnx.prototxt
