CUTIE

TensorFlow implementation of the paper "CUTIE: Learning to Understand Documents with Convolutional Universal Text Information Extractor" (Xiaohui Zhao, arXiv 2019).

Results

Results are evaluated on 4,484 receipt documents, including taxi receipts, meals and entertainment receipts, and hotel receipts, with 9 different key information classes. Scores are reported as AP / softAP.

Method	#Params	Taxi	Hotel
CloudScan	-	82.0 / -	60.0 / -
BERT	110M	88.1 / -	71.7 / -
CUTIE	14M	94.0 / 97.3	74.6 / 87.0
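
To make the two numbers in each cell easier to read, here is a minimal sketch of how AP and softAP could be computed. It assumes that AP counts a key information class as correct only on an exact match with the ground truth, and that softAP also accepts predictions that cover every ground-truth token but include extra false positives. That is a reading of the paper's description, not code from this repository, so check the paper for the exact definition. The function names and the set-of-token-indices representation are hypothetical.

```python
def class_is_correct(predicted, truth, soft=False):
    """Decide whether one key information class counts as correct.

    predicted, truth: sets of token indices assigned to the class.
    """
    if soft:
        # softAP (assumed): every ground-truth token is found; extras are tolerated.
        return truth <= predicted
    # AP (assumed): exact match, no false positives and no false negatives.
    return predicted == truth


def average_precision(samples, soft=False):
    """samples: iterable of (predicted_set, truth_set), one pair per class per document."""
    samples = list(samples)
    if not samples:
        return 0.0
    hits = sum(class_is_correct(p, t, soft) for p, t in samples)
    return hits / len(samples)
```

Under these assumptions softAP can never be lower than AP on the same predictions, which matches the ordering of the CUTIE scores in the table.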

Installation & Usage

pip install -r requirements.txt

Generate your own dictionary with main_build_dict.py / main_data_tokenizer.py
Train your model with main_train_json.py
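
For readers unfamiliar with the dictionary step, the sketch below shows one generic way to build a token-to-id dictionary from raw text. It is not taken from main_build_dict.py or main_data_tokenizer.py; the function name, the whitespace tokenization, the lowercasing, and the special tokens `[PAD]` and `[UNK]` are all assumptions made for illustration.

```python
from collections import Counter


def build_dictionary(texts, min_count=1, specials=("[PAD]", "[UNK]")):
    """Map each sufficiently frequent word to an integer id.

    texts: iterable of strings, e.g. the text lines of your receipts.
    """
    # Count lowercased, whitespace-separated words (an assumed tokenization).
    counts = Counter(word.lower() for text in texts for word in text.split())
    # Reserve the first ids for the special tokens.
    vocab = {tok: i for i, tok in enumerate(specials)}
    # Most frequent words get the smallest ids; ties are broken alphabetically.
    for word, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_count:
            vocab[word] = len(vocab)
    return vocab
```

Calling `build_dictionary(["Taxi fare 12.50", "taxi total 12.50"])` returns a dictionary with six entries, two of them the special tokens.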

CUTIE achieves its best performance when rows/cols are well configured for your data. For more insight, refer to the statistics in others/TrainingStatistic.xlsx.
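
To illustrate why rows/cols matter, the following sketch places tokens on a grid by the centre of their bounding boxes and counts how many tokens collide in the same cell. It assumes that rows/cols are the dimensions of the grid that document text is mapped onto, as described in the paper. The function name, the input tuple layout, and the keep-the-first collision policy are hypothetical and do not come from this repository.

```python
def to_grid(tokens, page_width, page_height, rows, cols):
    """Place tokens on a rows x cols grid by bounding-box centre.

    tokens: list of (text, x_min, y_min, x_max, y_max) in page coordinates.
    Returns (grid, collisions), where grid[r][c] holds a token text or None.
    """
    grid = [[None] * cols for _ in range(rows)]
    collisions = 0
    for text, x0, y0, x1, y1 in tokens:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        # Scale the centre to grid coordinates and clamp to the last row/column.
        r = min(int(cy / page_height * rows), rows - 1)
        c = min(int(cx / page_width * cols), cols - 1)
        if grid[r][c] is not None:
            # Two tokens landed in one cell; this sketch keeps the first one.
            collisions += 1
            continue
        grid[r][c] = text
    return grid, collisions
```

With too few columns, neighbouring tokens on one text line fall into the same cell and information is lost; with far too many, the grid becomes sparse. Counting collisions on a sample of your documents is one simple way to choose a setting.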

TLDR

For information about the input example, refer to the issue discussions.

The project has been refreshed with all history removed. All programs are runnable, except that the data example is not uploaded. Since the project was built at my previous workplace, the data format cannot be uploaded without permission right now. However, you may infer the correct data format from the data_loader_json.py file. Pull requests that make the project runnable out of the box are welcome.
