Commit f2ad4ae

refactor: layout and model (#56)
Updated so that YoloX uses the same code paths as other models.

* Removed code that was duplicating existing functions
* Integrated the YoloX model into the existing pipeline
* This means YoloX now has an associated UnstructuredModel called UnstructuredYoloXModel
* The output of YoloX inference is now a layoutparser Layout object, like the other UnstructuredModels, so that it can integrate with the DocumentLayout, PageLayout, and LayoutElement classes
* Other functionality is provided by the DocumentLayout, PageLayout, and LayoutElement classes
* Removed the jsons dependency that was used in testing and updated deps. The dependency had a cool feature that allowed it to construct non-JSON-serializable classes from JSON, but since that was only being used to check content, content can be checked through another method.
* One piece of functionality that was in the code prior to this change and isn't now is the ability to create test artifacts that can be viewed for debugging purposes. This can be added back in, but artifacts must be created from the new outputs of the model (either a layoutparser Layout object or a PageLayout object).

This PR also makes it possible to create a PageLayout with a fixed layout, i.e. the layout and location of text elements are specified at construction rather than inferred by a model. This is useful for use cases where the layout of a particular form is known.
1 parent 90875b1 commit f2ad4ae
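
The two ideas in the commit message (every model returns the same layout type, and a page can be built with a fixed layout instead of running inference) can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `Box`, `Layout`, `LayoutModel`, `StubDetector`, and `Page` are all hypothetical stand-ins invented here, and they do not reproduce the signatures of layoutparser's Layout, UnstructuredModel, or PageLayout.

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol


@dataclass(frozen=True)
class Box:
    """Hypothetical stand-in for one detected layout region."""
    x1: float
    y1: float
    x2: float
    y2: float
    label: str


# Hypothetical stand-in for a shared layout container type.
Layout = List[Box]


class LayoutModel(Protocol):
    """Assumed common interface: every model maps an image to a Layout."""

    def predict(self, image: object) -> Layout: ...


class StubDetector:
    """Placeholder for any detector (YoloX or otherwise); returns canned regions."""

    def predict(self, image: object) -> Layout:
        return [Box(0, 0, 100, 20, "Title"), Box(0, 30, 100, 200, "Text")]


class Page:
    """Hypothetical page wrapper that either infers its layout or uses a fixed one."""

    def __init__(
        self,
        image: object,
        model: Optional[LayoutModel] = None,
        fixed_layout: Optional[Layout] = None,
    ) -> None:
        if model is None and fixed_layout is None:
            raise ValueError("provide either a model or a fixed layout")
        self.image = image
        self.model = model
        self.fixed_layout = fixed_layout

    def get_layout(self) -> Layout:
        # A fixed layout takes precedence, so no inference runs for known forms.
        if self.fixed_layout is not None:
            return list(self.fixed_layout)
        assert self.model is not None
        return self.model.predict(self.image)


# Usage: the calling code is identical whichever way the layout was obtained.
inferred = Page(image=None, model=StubDetector()).get_layout()
fixed = Page(image=None, fixed_layout=[Box(10, 10, 90, 40, "Name field")]).get_layout()
print([box.label for box in inferred])  # ['Title', 'Text']
print([box.label for box in fixed])     # ['Name field']
```

Because both paths return the same type, downstream code does not need to know which model produced the regions, which is presumably why the YoloX-specific duplicates could be removed.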

17 files changed, +612 -591 lines changed

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -134,3 +134,6 @@ dmypy.json

  # Mac stuff
  .DS_Store
+
+ # VSCode
+ .vscode/

CHANGELOG.md

Lines changed: 2 additions & 4 deletions
@@ -1,5 +1,6 @@
- ## 0.2.8-dev0
+ ## 0.2.8

+ * Refactored YoloX inference code to integrate better with framework
  * Improved testing time

  ## 0.2.7
@@ -14,9 +15,6 @@
  ## 0.2.5

  * Add YoloX model for images and PDFs
-
- ## 0.2.5-dev0
-
  * Add generic model interface

  ## 0.2.4

pytest.ini

Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
  [pytest]
  markers =
- slow: marks tests as slow (deselect with '-m "not long"')
+ slow: marks tests as slow (deselect with '-m "not long"')

requirements/base.txt

Lines changed: 13 additions & 13 deletions
@@ -28,7 +28,7 @@ cycler==0.11.0
  # via matplotlib
  effdet==0.3.0
  # via layoutparser
- fastapi==0.89.1
+ fastapi==0.92.0
  # via unstructured-inference (setup.py)
  filelock==3.9.0
  # via
@@ -40,7 +40,7 @@ fonttools==4.38.0
  # via matplotlib
  h11==0.14.0
  # via uvicorn
- huggingface-hub==0.12.0
+ huggingface-hub==0.12.1
  # via
  # timm
  # transformers
@@ -51,15 +51,15 @@ idna==3.4
  # via
  # anyio
  # requests
+ importlib-resources==5.12.0
+ # via matplotlib
  iopath==0.1.10
  # via layoutparser
- jsons==1.6.3
- # via unstructured-inference (setup.py)
  kiwisolver==1.4.4
  # via matplotlib
  layoutparser[layoutmodels,tesseract]==0.3.4
  # via unstructured-inference (setup.py)
- matplotlib==3.6.3
+ matplotlib==3.7.0
  # via pycocotools
  mpmath==1.2.1
  # via sympy
@@ -96,7 +96,7 @@ pdf2image==1.16.2
  # via layoutparser
  pdfminer-six==20221105
  # via pdfplumber
- pdfplumber==0.7.6
+ pdfplumber==0.8.0
  # via layoutparser
  pillow==9.4.0
  # via
@@ -108,13 +108,13 @@ pillow==9.4.0
  # torchvision
  portalocker==2.7.0
  # via iopath
- protobuf==4.21.12
+ protobuf==4.22.0
  # via onnxruntime
  pycocotools==2.0.6
  # via effdet
  pycparser==2.21
  # via cffi
- pydantic==1.10.4
+ pydantic==1.10.5
  # via fastapi
  pyparsing==3.0.9
  # via matplotlib
@@ -150,7 +150,7 @@ six==1.16.0
  # python-multipart
  sniffio==1.3.0
  # via anyio
- starlette==0.22.0
+ starlette==0.25.0
  # via fastapi
  sympy==1.11.1
  # via onnxruntime
@@ -174,21 +174,21 @@ tqdm==4.64.1
  # huggingface-hub
  # iopath
  # transformers
- transformers==4.26.0
+ transformers==4.26.1
  # via unstructured-inference (setup.py)
- typing-extensions==4.4.0
+ typing-extensions==4.5.0
  # via
  # huggingface-hub
  # iopath
  # pydantic
  # starlette
  # torch
  # torchvision
- typish==1.9.3
- # via jsons
  urllib3==1.26.14
  # via requests
  uvicorn==0.20.0
  # via unstructured-inference (setup.py)
  wand==0.6.11
  # via pdfplumber
+ zipp==3.14.0
+ # via importlib-resources

requirements/dev.txt

Lines changed: 34 additions & 11 deletions
@@ -6,6 +6,10 @@
  #
  anyio==3.6.2
  # via jupyter-server
+ appnope==0.1.3
+ # via
+ # ipykernel
+ # ipython
  argon2-cffi==21.3.0
  # via
  # jupyter-server
@@ -49,7 +53,13 @@ idna==3.4
  # via
  # anyio
  # jsonschema
- ipykernel==6.21.1
+ importlib-metadata==6.0.0
+ # via
+ # jupyter-client
+ # nbconvert
+ importlib-resources==5.12.0
+ # via jsonschema
+ ipykernel==6.21.2
  # via
  # ipywidgets
  # jupyter
@@ -59,7 +69,7 @@ ipykernel==6.21.1
  # qtconsole
  ipython==8.10.0
  # via
- # -r dev.in
+ # -r requirements/dev.in
  # ipykernel
  # ipywidgets
  # jupyter-console
@@ -87,8 +97,8 @@ jsonschema[format-nongpl]==4.17.3
  # jupyter-events
  # nbformat
  jupyter==1.0.0
- # via -r dev.in
- jupyter-client==8.0.2
+ # via -r requirements/dev.in
+ jupyter-client==8.0.3
  # via
  # ipykernel
  # jupyter-console
@@ -97,12 +107,13 @@ jupyter-client==8.0.2
  # nbclient
  # notebook
  # qtconsole
- jupyter-console==6.4.4
+ jupyter-console==6.5.1
  # via jupyter
  jupyter-core==5.2.0
  # via
  # ipykernel
  # jupyter-client
+ # jupyter-console
  # jupyter-server
  # nbclassic
  # nbclient
@@ -112,7 +123,7 @@ jupyter-core==5.2.0
  # qtconsole
  jupyter-events==0.6.3
  # via jupyter-server
- jupyter-server==2.2.1
+ jupyter-server==2.3.0
  # via
  # nbclassic
  # notebook-shim
@@ -130,7 +141,7 @@ matplotlib-inline==0.1.6
  # via
  # ipykernel
  # ipython
- mistune==2.0.4
+ mistune==2.0.5
  # via nbconvert
  nbclassic==0.5.1
  # via notebook
@@ -174,8 +185,10 @@ pexpect==4.8.0
  pickleshare==0.7.5
  # via ipython
  pip-tools==6.12.2
- # via -r dev.in
- platformdirs==2.6.2
+ # via -r requirements/dev.in
+ pkgutil-resolve-name==1.3.10
+ # via jsonschema
+ platformdirs==3.0.0
  # via jupyter-core
  prometheus-client==0.16.0
  # via
@@ -210,14 +223,15 @@ python-dateutil==2.8.2
  # via
  # arrow
  # jupyter-client
- python-json-logger==2.0.4
+ python-json-logger==2.0.6
  # via jupyter-events
  pyyaml==6.0
  # via jupyter-events
  pyzmq==25.0.0
  # via
  # ipykernel
  # jupyter-client
+ # jupyter-console
  # jupyter-server
  # nbclassic
  # notebook
@@ -247,7 +261,7 @@ six==1.16.0
  # rfc3339-validator
  sniffio==1.3.0
  # via anyio
- soupsieve==2.3.2.post1
+ soupsieve==2.4
  # via beautifulsoup4
  stack-data==0.6.2
  # via ipython
@@ -259,6 +273,10 @@ terminado==0.17.1
  # notebook
  tinycss2==1.2.1
  # via nbconvert
+ tomli==2.0.1
+ # via
+ # build
+ # pyproject-hooks
  tornado==6.2
  # via
  # ipykernel
@@ -274,6 +292,7 @@ traitlets==5.9.0
  # ipython
  # ipywidgets
  # jupyter-client
+ # jupyter-console
  # jupyter-core
  # jupyter-events
  # jupyter-server
@@ -300,6 +319,10 @@ wheel==0.38.4
  # via pip-tools
  widgetsnbextension==4.0.5
  # via ipywidgets
+ zipp==3.14.0
+ # via
+ # importlib-metadata
+ # importlib-resources

  # The following packages are considered to be unsafe in a requirements file:
  # pip

requirements/test.txt

Lines changed: 13 additions & 7 deletions
@@ -6,6 +6,8 @@
  #
  anyio==3.6.2
  # via httpcore
+ appdirs==1.4.4
+ # via label-studio-tools
  attrs==22.2.0
  # via pytest
  black==23.1.0
@@ -37,7 +39,7 @@ httpcore==0.16.3
  # via httpx
  httpx==0.23.3
  # via -r requirements/test.in
- huggingface-hub==0.12.0
+ huggingface-hub==0.12.1
  # via -r requirements/test.in
  idna==3.4
  # via
@@ -47,15 +49,19 @@ idna==3.4
  # yarl
  iniconfig==2.0.0
  # via pytest
- label-studio-sdk==0.0.18
+ label-studio-sdk==0.0.19
  # via -r requirements/test.in
- lxml==4.9.2
+ label-studio-tools==0.0.2
  # via label-studio-sdk
+ lxml==4.9.2
+ # via
+ # label-studio-sdk
+ # label-studio-tools
  mccabe==0.7.0
  # via flake8
  multidict==6.0.4
  # via yarl
- mypy==0.991
+ mypy==1.0.1
  # via -r requirements/test.in
  mypy-extensions==1.0.0
  # via
@@ -72,13 +78,13 @@ pdf2image==1.16.2
  # via -r requirements/test.in
  pillow==9.4.0
  # via pdf2image
- platformdirs==2.6.2
+ platformdirs==3.0.0
  # via black
  pluggy==1.0.0
  # via pytest
  pycodestyle==2.10.0
  # via flake8
- pydantic==1.10.4
+ pydantic==1.10.5
  # via label-studio-sdk
  pyflakes==3.0.1
  # via flake8
@@ -111,7 +117,7 @@ tomli==2.0.1
  # pytest
  tqdm==4.64.1
  # via huggingface-hub
- typing-extensions==4.4.0
+ typing-extensions==4.5.0
  # via
  # black
  # huggingface-hub

setup.py

Lines changed: 1 addition & 2 deletions
@@ -58,8 +58,7 @@
  # ref: https://github.com/opencv/opencv-python/issues/772
  "opencv-python==4.6.0.66",
  "onnxruntime",
- "jsons",
- "transformers"
+ "transformers",
  ],
  extras_require={},
  )
